UPSTREAM PR #18180: vulkan: fix im2col overflowing maxWorkgroupCount (#618)
Conversation
Performance Analysis Summary: PR #618

Overview: This PR addresses a Vulkan backend crash when processing large tensors in the im2col operation. The fix clamps workgroup counts to hardware limits and implements grid-stride loops in the shader. Changes are isolated to the Vulkan backend, with no impact on CPU inference paths or tokenization functions.

Key Findings

Performance-Critical Areas Impact: The modifications affect only the Vulkan backend's im2col operation, used in convolution preprocessing for vision models. Core inference functions (llama_decode, llama_encode, llama_tokenize) show no changes in response time or throughput. The CPU backend, which handles text-only inference, is unaffected.

Tokens Per Second Impact: No impact on tokens per second for LLM inference; tokenization and decode execute on CPU backend paths that are unchanged by this PR. Vision model processing may see 5-10% overhead in overflow scenarios, but this enables execution where crashes previously occurred rather than degrading existing performance.

Power Consumption Analysis: No measurable power consumption change for the llama-cli binary during text inference workloads. The Vulkan backend modifications only activate during vision model convolution operations, which are not part of the standard LLM token generation pipeline.

Modified Functions: The changes affect ggml_vk_im2col and ggml_vk_dispatch_pipeline within the Vulkan backend. These functions are not on the critical path for text generation; typical text-token workloads bypass them entirely, maintaining baseline performance.
Force-pushed e8bf2a6 to 9c8623e
Force-pushed 048ad94 to 6c1fde6
Force-pushed 823244c to bab7d39
Force-pushed 9ea4a65 to c001e9f
Mirrored from ggml-org/llama.cpp#18180
Fixes #18164.