UPSTREAM PR #18180: vulkan: fix im2col overflowing maxworkgroupcount #618

Open
loci-dev wants to merge 1 commit into main from upstream-PR18180-branch_jeffbolznv-im2col_wglimit

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18180

Fixes #18164.

@loci-review

loci-review bot commented Dec 18, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #618

Overview

This PR addresses a Vulkan backend crash when processing large tensors in the im2col operation. The fix clamps dispatched workgroup counts to the device's maxComputeWorkGroupCount limit and adds grid-stride loops to the shader so each invocation can process multiple elements. Changes are isolated to the Vulkan backend, with no impact on CPU inference paths or tokenization functions.
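The clamp-plus-grid-stride pattern described above can be sketched as a CPU simulation. All names, sizes, and limits below are illustrative assumptions for exposition, not the actual llama.cpp or Vulkan shader code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host side: round up to the needed workgroup count, then clamp it to the
// device limit (Vulkan's maxComputeWorkGroupCount) instead of overflowing it.
uint32_t clamped_workgroups(uint64_t total, uint32_t wg_size, uint32_t max_wg) {
    uint64_t groups = (total + wg_size - 1) / wg_size;
    return static_cast<uint32_t>(groups < max_wg ? groups : max_wg);
}

// Shader side, simulated on the CPU: each "invocation" starts at its global
// id and strides by the total number of launched invocations, so every
// element is still visited even when the dispatch was clamped.
void im2col_like(std::vector<uint8_t>& out, uint32_t wg_size, uint32_t max_wg) {
    const uint64_t total  = out.size();
    const uint64_t stride = uint64_t(clamped_workgroups(total, wg_size, max_wg)) * wg_size;
    for (uint64_t tid = 0; tid < stride; ++tid) {       // one pass per invocation
        for (uint64_t i = tid; i < total; i += stride) {
            out[i] = 1;                                 // stand-in for the real im2col write
        }
    }
}
```

The extra inner loop is the source of the small overhead mentioned later: when the dispatch is clamped, each invocation does more than one element's worth of work instead of the kernel crashing outright.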

Key Findings

Performance-Critical Areas Impact:

The modifications affect only the Vulkan backend's im2col operation used in convolution preprocessing for vision models. Core inference functions (llama_decode, llama_encode, llama_tokenize) show no changes in response time or throughput. The CPU backend, which handles text-only inference, remains completely unaffected.

Tokens Per Second Impact:

No impact on tokens per second for LLM inference. The tokenization and decode functions execute on CPU backend paths that are unchanged by this PR. Vision model processing may experience 5-10% overhead in overflow scenarios, but this represents enabling execution where crashes previously occurred rather than degrading existing performance.

Power Consumption Analysis:

No measurable power consumption changes for the llama-cli binary during text inference workloads. The Vulkan backend modifications only activate during vision model convolution operations, which are not part of standard LLM token generation pipelines.

Modified Functions:

The changes affect ggml_vk_im2col and ggml_vk_dispatch_pipeline within the Vulkan backend. These functions are not in the critical path for text generation. For typical LLM workloads processing text tokens, the execution flow bypasses these functions entirely, maintaining baseline performance characteristics.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from e8bf2a6 to 9c8623e on December 22, 2025
@loci-dev loci-dev force-pushed the main branch 19 times, most recently from 048ad94 to 6c1fde6 on February 3, 2026
@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 823244c to bab7d39 on February 19, 2026
@loci-dev loci-dev force-pushed the main branch 3 times, most recently from 9ea4a65 to c001e9f on February 22, 2026