ggml-org / llama.cpp Public

Pull requests: ggml-org/llama.cpp

829 Open 9,720 Closed

common : gpt-oss handle builtins and unsolicited tool calls
Labels: testing
#21213 opened Mar 31, 2026 by aldehir

opencl: fix leak in Adreno q8_0 path
Labels: ggml, OpenCL
#21212 opened Mar 31, 2026 by lhez • Draft

vendor : update BoringSSL to 0.20260327.0
#21211 opened Mar 31, 2026 by angt

arg: fix incorrect default for backend sampling
#21210 opened Mar 31, 2026 by Galunid

sampler: Disable backend sampling if reasoning budget is enabled
#21209 opened Mar 31, 2026 by Galunid

CI: Enable CPU and Vulkan ARM64 Release
Labels: devops
#21207 opened Mar 31, 2026 by ehfd

webui: fix syntax highlighting lost after streaming for non-common languages
Labels: examples, server
#21206 opened Mar 31, 2026 by hmblair

CANN: Add suport for Qwen35 ops
Labels: Ascend NPU, ggml, testing
#21204 opened Mar 31, 2026 by hipudding • Draft

server: respect the ignore eos flag
Labels: examples, python, server
#21203 opened Mar 31, 2026 by ykhrustalev

Fix undefined timing measurement errors in server context
Labels: examples, server
#21201 opened Mar 30, 2026 by thedanhoffman

fix: include API key in CORS proxy requests for MCP connections
Labels: examples, server
#21193 opened Mar 30, 2026 by satishkc7

llama-server: translating structured generation request parameters from responses API format to completions API format
Labels: examples, python, server
#21187 opened Mar 30, 2026 by earslap

[SYCL] Enhance flash-attention performance
Labels: ggml, SYCL
#21185 opened Mar 30, 2026 by arthw

tests: allow exporting graph ops from HF file without downloading weights
Labels: testing
#21182 opened Mar 30, 2026 by 0cc4m

Add API key server support with optional arguments --api-key and --ju…
Labels: examples, python
#21180 opened Mar 30, 2026 by gelim

common : init in params parser, add Windows UTF-8 support
Labels: examples, server, testing
#21176 opened Mar 30, 2026 by angt

server: improve Responses API compliance and Codex CLI compatibility
Labels: examples, python, server
#21174 opened Mar 30, 2026 by krystophny • Draft • 8 tasks done

ggml-cuda: fix ROCm multi-GPU illegal memory access in recurrent state restore
Labels: ggml, Nvidia GPU
#21170 opened Mar 30, 2026 by uaruss

ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels
Labels: ggml, Nvidia GPU
#21168 opened Mar 30, 2026 by iacopPBK

contrib : clarify code origin guidelines
#21165 opened Mar 29, 2026 by ddh0

cpp: Adding new arch RUGPT3XL
Labels: model, python
#21161 opened Mar 29, 2026 by EvilFreelancer

Cross-backend profiler
Labels: Apple Metal, Ascend NPU, documentation, examples, ggml, Hexagon, IBM zDNN, Nvidia GPU, OpenCL, OpenVINO, python, SYCL, Vulkan, WebGPU
#21160 opened Mar 29, 2026 by pwilkin • Draft

[CUDA ] Write an optimized flash_attn_stream_k_fixup kernel
Labels: ggml, Nvidia GPU
#21159 opened Mar 29, 2026 by gaugarg-nv

ggml-cpu: fix fallback for RVV kernels without zvfh
Labels: ggml
#21157 opened Mar 29, 2026 by taimur-10x

examples : add llama-eval
Labels: examples, python
#21152 opened Mar 29, 2026 by ggerganov • Draft • 5 tasks