Pull requests: ggml-org/llama.cpp
common : gpt-oss handle builtins and unsolicited tool calls
testing
Everything test related
#21213
opened Mar 31, 2026 by
aldehir
opencl: fix leak in Adreno q8_0 path
ggml
changes relating to the ggml tensor library for machine learning
OpenCL
Issues specific to the OpenCL backend
sampler: Disable backend sampling if reasoning budget is enabled
#21209
opened Mar 31, 2026 by
Galunid
CI: Enable CPU and Vulkan ARM64 Release
devops
improvements to build systems and github actions
#21207
opened Mar 31, 2026 by
ehfd
webui: fix syntax highlighting lost after streaming for non-common languages
examples
server
#21206
opened Mar 31, 2026 by
hmblair
CANN: Add support for Qwen35 ops
Ascend NPU
issues specific to Ascend NPUs
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
server: respect the ignore eos flag
examples
python
python script changes
server
#21203
opened Mar 31, 2026 by
ykhrustalev
Fix undefined timing measurement errors in server context
examples
server
#21201
opened Mar 30, 2026 by
thedanhoffman
[SYCL] Enhance flash-attention performance
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#21185
opened Mar 30, 2026 by
arthw
tests: allow exporting graph ops from HF file without downloading weights
testing
Everything test related
#21182
opened Mar 30, 2026 by
0cc4m
Add API key server support with optional arguments --api-key and --ju…
examples
python
python script changes
#21180
opened Mar 30, 2026 by
gelim
server: improve Responses API compliance and Codex CLI compatibility
examples
python
python script changes
server
#21174
opened Mar 30, 2026 by
krystophny
Draft
8 tasks done
ggml-cuda: fix ROCm multi-GPU illegal memory access in recurrent state restore
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21170
opened Mar 30, 2026 by
uaruss
ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21168
opened Mar 30, 2026 by
iacopPBK
cpp: Adding new arch RUGPT3XL
model
Model specific
python
python script changes
#21161
opened Mar 29, 2026 by
EvilFreelancer
Cross-backend profiler
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
Ascend NPU
issues specific to Ascend NPUs
documentation
Improvements or additions to documentation
examples
ggml
changes relating to the ggml tensor library for machine learning
Hexagon
IBM zDNN
issues specific to IBM zDNN Accelerator
Nvidia GPU
Issues specific to Nvidia GPUs
OpenCL
Issues specific to the OpenCL backend
OpenVINO
python
python script changes
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
Vulkan
Issues specific to the Vulkan backend
WebGPU
[CUDA] Write an optimized flash_attn_stream_k_fixup kernel
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21159
opened Mar 29, 2026 by
gaugarg-nv
ggml-cpu: fix fallback for RVV kernels without zvfh
ggml
changes relating to the ggml tensor library for machine learning
#21157
opened Mar 29, 2026 by
taimur-10x