Misc. bug: performance regression in llama-server (ggml-vulkan) #17033

Description

@kpouget

Name and Version

master branch

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server --model /path/to/llama3.2 --host 0.0.0.0 --port 11434 --n-gpu-layers 99

Problem description & steps to reproduce

I think #16736 introduced a performance regression in llama-server when it is backed by ggml-vulkan. Oddly, there is no regression with ggml-metal.

Note that llama-bench isn't affected: pp512 = 223 t/s and tg128 = 28 t/s did not degrade.

I looked further at the performance of the commits between these two days (the tests were run on another system, so the absolute values are different).

b6924..b6927

- b6924
- cd5e3b575 # b6925
- 2f966b8ed # b6926
- b6927

These results show that the performance drop occurred at cd5e3b5, when #16736 was merged.
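The per-commit comparison above can be sketched as a small helper that, given one throughput measurement per commit in chronological order, reports the first commit that falls noticeably below the known-good baseline. This is only an illustrative sketch: the function name `first_bad_commit`, the `max_drop` threshold, and the numbers in the example are assumptions, not values or tooling from this issue.

```python
from typing import Optional, Sequence, Tuple


def first_bad_commit(
    results: Sequence[Tuple[str, float]],
    max_drop: float = 0.10,
) -> Optional[str]:
    """Return the first commit whose throughput (tokens/s) is more than
    `max_drop` (a fraction, e.g. 0.10 = 10%) below the baseline.

    `results` is ordered oldest to newest; the first entry is assumed to be
    the known-good baseline. Returns None if no commit regresses.
    """
    if not results:
        return None
    _, baseline_tps = results[0]
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    for commit, tps in results[1:]:
        # Relative drop of this commit compared with the baseline.
        if (baseline_tps - tps) / baseline_tps > max_drop:
            return commit
    return None


if __name__ == "__main__":
    # Placeholder throughput values for illustration only;
    # these are NOT the measurements from this issue.
    example = [
        ("b6924", 100.0),
        ("cd5e3b575", 60.0),
        ("2f966b8ed", 61.0),
        ("b6927", 59.0),
    ]
    print(first_bad_commit(example))
```

Comparing against the baseline rather than the previous commit keeps the check robust when a regression accumulates over several small steps; the threshold should be chosen above the run-to-run noise of the benchmark.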

First Bad Commit

cd5e3b5

Relevant log output
