[CUDA] Reduce the number of stream-k blocks to reduce the overhead of the flash_attn_stream_k_fixup kernel #21086
Closed
gaugarg-nv wants to merge 3 commits into ggml-org:master