[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel #21086

Closed
gaugarg-nv wants to merge 3 commits into ggml-org:master from gaugarg-nv:reduce_stream_k_block

Commits