
[CuTe,Flex] varlen blocksparsity #2224

Open
reubenconducts wants to merge 5 commits into Dao-AILab:main from reubenconducts:rstern/varlen-blocksparsity

Conversation

@reubenconducts
Contributor

@reubenconducts reubenconducts commented Feb 2, 2026

This PR extends blocksparsity to the variable sequence length case. Whereas in batched blocksparsity the metadata tensors take the shapes

*_block_cnt: [batch_size, num_heads, num_m_blocks]
*_block_idx: [batch_size, num_heads, num_m_blocks, num_n_blocks],

in varlen blocksparsity, we pack our metadata tensors to take the shapes

*_block_cnt: [num_heads, total_m_blocks]
*_block_idx: [num_heads, total_n_blocks]

where total_m_blocks is the sum of all m blocks per head (equivalently, the number of work tiles per head) and total_n_blocks is the total number of n blocks potentially processed per head across all sequences in the batch. For example, consider a varlen batch with sequence lengths contained in seqlens_q and seqlens_k. At batch index b, we let

num_m(b) = ceildiv(seqlens_q[b], tile_m)
num_n(b) = ceildiv(seqlens_k[b], tile_n)

and define

total_m_blocks = sum_{b \in B} num_m(b)
total_n_blocks = sum_{b \in B} num_m(b) * num_n(b)
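
The two totals above can be checked with a short worked example. The helper below is purely illustrative (its name and signature are not part of this PR); it just applies the ceildiv-and-sum definitions to a small batch:

```python
def ceildiv(a: int, b: int) -> int:
    """Ceiling division: smallest integer >= a / b."""
    return -(-a // b)

def blocksparse_totals(seqlens_q, seqlens_k, tile_m, tile_n):
    """Compute (total_m_blocks, total_n_blocks) for a varlen batch.

    total_m_blocks = sum_b ceildiv(seqlens_q[b], tile_m)
    total_n_blocks = sum_b ceildiv(seqlens_q[b], tile_m) * ceildiv(seqlens_k[b], tile_n)
    """
    num_m = [ceildiv(sq, tile_m) for sq in seqlens_q]
    num_n = [ceildiv(sk, tile_n) for sk in seqlens_k]
    total_m = sum(num_m)
    total_n = sum(m * n for m, n in zip(num_m, num_n))
    return total_m, total_n

# With tile_m = tile_n = 64, seqlens_q = [100, 50], seqlens_k = [120, 70]:
# num_m = [2, 1], num_n = [2, 2], so totals are (3, 2*2 + 1*2) = (3, 6).
print(blocksparse_totals([100, 50], [120, 70], 64, 64))  # → (3, 6)
```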

To properly index into the blocksparsity tensors, we use auxiliary mCuTotalMBlocks and mCuTotalNBlocks tensors, which can be prepared on the host.
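
A sketch of the host-side preparation, under the assumption (not confirmed by this PR) that mCuTotalMBlocks and mCuTotalNBlocks hold exclusive prefix sums with a leading zero, analogous to cu_seqlens:

```python
from itertools import accumulate

def prepare_cu_block_offsets(seqlens_q, seqlens_k, tile_m, tile_n):
    """Hypothetical host-side prep of the auxiliary offset tensors.

    Assumed layout: cu_m[b] is the number of m blocks in sequences 0..b-1,
    cu_n[b] is the number of (m, n) block pairs in sequences 0..b-1, so that
    sequence b's metadata occupies cu_m[b]:cu_m[b+1] and cu_n[b]:cu_n[b+1].
    """
    cdiv = lambda a, b: (a + b - 1) // b
    num_m = [cdiv(sq, tile_m) for sq in seqlens_q]
    num_mn = [cdiv(sq, tile_m) * cdiv(sk, tile_n)
              for sq, sk in zip(seqlens_q, seqlens_k)]
    cu_m = list(accumulate(num_m, initial=0))   # exclusive prefix sum over m blocks
    cu_n = list(accumulate(num_mn, initial=0))  # exclusive prefix sum over (m, n) pairs
    return cu_m, cu_n
```

For seqlens_q = [100, 50], seqlens_k = [120, 70] with 64x64 tiles, this yields cu_m = [0, 2, 3] and cu_n = [0, 4, 6], whose final entries match total_m_blocks and total_n_blocks.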

cc @drisspg @v0i0

NOT INTENDED FOR THIS PR:

  • bwd support

@SeanLi-OI

Hi there, @reubenconducts! Thank you so much for your draft.

Since I also need this feature eagerly, I tried to continue development based on your branch, fixing some grammar issues (SeanLi-OI@7ccfc5e). It runs, but returns wrong results when batch_size > 1.

I completely understand you may be busy with other priorities. If you have a moment, I’d be truly grateful for any guidance:
Are my modifications heading in the right direction? Or do you plan to continue updating this PR?

@reubenconducts
Contributor Author

@SeanLi-OI Yes, I will be continuing this, but not until next week.

@reubenconducts reubenconducts force-pushed the rstern/varlen-blocksparsity branch from a4f3021 to bc15c46 Compare February 15, 2026 19:53
@reubenconducts reubenconducts force-pushed the rstern/varlen-blocksparsity branch from bc15c46 to 03f2f92 Compare February 25, 2026 19:03
@reubenconducts reubenconducts changed the title [WIP] varlen blocksparsity [CuTe,Flex] varlen blocksparsity Feb 25, 2026
@reubenconducts reubenconducts marked this pull request as ready for review February 25, 2026 19:05
@reubenconducts reubenconducts marked this pull request as draft February 25, 2026 19:17
@reubenconducts reubenconducts force-pushed the rstern/varlen-blocksparsity branch from 03f2f92 to 04d3016 Compare February 26, 2026 16:42
@reubenconducts reubenconducts marked this pull request as ready for review February 26, 2026 16:42
Comment thread flash_attn/cute/block_sparse_utils.py Outdated
Comment thread flash_attn/cute/block_sparse_utils.py
@reubenconducts reubenconducts force-pushed the rstern/varlen-blocksparsity branch from ab9bbeb to 9c4370b Compare March 16, 2026 17:41
@wqwqazwsxedc

Hi @drisspg, @reubenconducts, just checking in on this PR. Are there any remaining blockers or changes needed before it can be merged?
I think this feature could be useful to implement something akin to PrefixGrouper for varlen sequences, which would be useful for RL training. I'd be glad to help if necessary.

@reubenconducts reubenconducts force-pushed the rstern/varlen-blocksparsity branch from 6b85348 to 6f736e9 Compare April 10, 2026 17:39