Name and Version
master branch
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --model /path/to/llama3.2 --host 0.0.0.0 --port 11434 --n-gpu-layers 99
Problem description & steps to reproduce
I think #16736 introduced a performance regression in llama-server when it is backed by ggml-vulkan (but not with ggml-metal, which seems strange).
Note that llama-bench isn't affected: pp512=223t/s and tg128=28t/s didn't degrade.
I looked further at the performance of the commits between these two days, 2025-11-02 and 2025-11-03 (the tests ran on another system, so the absolute values are different):
b6924..b6927
- b6924
- cd5e3b575 # b6925
- 2f966b8ed # b6926
- b6927
and they show that the performance drop occurred with cd5e3b5, when #16736 was merged.
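The comparison between builds can be sketched as a small helper that walks an ordered list of per-build timings and reports the first build that is noticeably slower than its predecessor. This is only an illustration: the function name `first_bad` and the 1.5x threshold are assumptions, not part of llama.cpp, and the sample list uses only the two endpoint timings quoted in this report (first timing value, in ms), not the per-commit measurements.

```python
from typing import Optional, Sequence, Tuple


def first_bad(samples: Sequence[Tuple[str, float]], factor: float = 1.5) -> Optional[str]:
    """Return the first build whose timing is at least `factor` times
    the timing of the build just before it, or None if there is none."""
    for (_, prev_ms), (build, cur_ms) in zip(samples, samples[1:]):
        if cur_ms >= factor * prev_ms:
            return build
    return None


# Endpoint measurements from this report (hypothetical threshold of 1.5x).
samples = [("b6923", 3777.0), ("b6933", 7459.0)]

print(first_bad(samples))          # b6933
print(round(7459.0 / 3777.0, 2))   # 1.97, i.e. roughly twice as slow
```

Feeding in one timing per build from b6924 to b6927 would narrow the result down to a single commit in the same way.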
First Bad Commit
cd5e3b5
Relevant log output
Timings measured with llama-server backed by ggml-vulkan:
- on 2025-11-02, b6923: 3777ms, 41ms
- on 2025-11-03, b6933: 7459ms, 62ms