[CuTe,Flex] varlen blocksparsity by reubenconducts · Pull Request #2224 · Dao-AILab/flash-attention

reubenconducts · 2026-02-02T21:45:00Z

This PR extends blocksparsity to the variable sequence length case. Whereas in batched blocksparsity the metadata tensors take the shapes

*_block_cnt: [batch_size, num_heads, num_m_blocks]
*_block_idx: [batch_size, num_heads, num_m_blocks, num_n_blocks],

in varlen blocksparsity, we pack our metadata tensors to take the shapes

*_block_cnt: [num_heads, total_m_blocks]
*_block_idx: [num_heads, total_n_blocks]

where total_m_blocks is the total number of m blocks per head across all sequences in the batch (equivalently, the number of work tiles per head), and total_n_blocks is the total number of n blocks potentially processed per head across all sequences in the batch. For example, consider a varlen batch with sequence lengths given by seqlens_q and seqlens_k. At batch index b, we let

num_m(b) = ceildiv(seqlens_q[b], tile_m)
num_n(b) = ceildiv(seqlens_k[b], tile_n)

and define

total_m_blocks = sum_{b \in B} num_m(b)
total_n_blocks = sum_{b \in B} num_m(b) * num_n(b)
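As a quick sanity check of the definitions above, here is a minimal host-side sketch that evaluates both totals for a small batch. The helper name `total_blocks` and the example lengths are illustrative assumptions and are not taken from this PR's code.

```python
def ceildiv(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def total_blocks(seqlens_q, seqlens_k, tile_m, tile_n):
    """Hypothetical helper: evaluate total_m_blocks and total_n_blocks.

    num_m(b) = ceildiv(seqlens_q[b], tile_m)
    num_n(b) = ceildiv(seqlens_k[b], tile_n)
    """
    num_m = [ceildiv(s, tile_m) for s in seqlens_q]
    num_n = [ceildiv(s, tile_n) for s in seqlens_k]
    total_m_blocks = sum(num_m)
    # Each m block of sequence b may touch up to num_n(b) n blocks.
    total_n_blocks = sum(m * n for m, n in zip(num_m, num_n))
    return total_m_blocks, total_n_blocks


# Three sequences with 128x128 tiles:
# num_m = [3, 1, 1], num_n = [4, 1, 1]
totals = total_blocks([300, 128, 65], [512, 100, 65], tile_m=128, tile_n=128)
print(totals)  # (5, 14)
```

Note that total_n_blocks grows with the product num_m(b) * num_n(b) per sequence, so it is an upper bound on the stored indices rather than the number of blocks actually visited.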

To properly index into the blocksparsity tensors, we use auxiliary mCuTotalMBlocks and mCuTotalNBlocks tensors, which can be prepared on host.
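One way the auxiliary offsets could be prepared on the host is sketched below. This is a sketch under stated assumptions, not the PR's implementation: it assumes the offsets are exclusive prefix sums of length batch_size + 1, and that within each sequence the n-block indices are stored row by row (one row of num_n(b) entries per m block). The function names `cu_block_offsets` and `packed_slots` are hypothetical.

```python
from itertools import accumulate


def ceildiv(a: int, b: int) -> int:
    return -(-a // b)


def cu_block_offsets(seqlens_q, seqlens_k, tile_m, tile_n):
    """Hypothetical host-side prep of cumulative block offsets.

    Assumed layout: exclusive prefix sums with the grand total appended,
    so both lists have length batch_size + 1.
    """
    num_m = [ceildiv(s, tile_m) for s in seqlens_q]
    num_n = [ceildiv(s, tile_n) for s in seqlens_k]
    cu_total_m_blocks = [0, *accumulate(num_m)]
    cu_total_n_blocks = [0, *accumulate(m * n for m, n in zip(num_m, num_n))]
    return num_n, cu_total_m_blocks, cu_total_n_blocks


def packed_slots(b, m_block, num_n, cu_m, cu_n):
    """Map (batch index, m block) to positions in the packed tensors.

    Returns the column into *_block_cnt and the [start, end) range into
    *_block_idx, assuming m-block-major storage within each sequence.
    """
    cnt_col = cu_m[b] + m_block
    idx_start = cu_n[b] + m_block * num_n[b]
    return cnt_col, idx_start, idx_start + num_n[b]


num_n, cu_m, cu_n = cu_block_offsets(
    [300, 128, 65], [512, 100, 65], tile_m=128, tile_n=128
)
print(cu_m)  # [0, 3, 4, 5]
print(cu_n)  # [0, 12, 13, 14]
print(packed_slots(0, 2, num_n, cu_m, cu_n))  # (2, 8, 12)
```

With this layout the last entries of the two offset lists equal total_m_blocks and total_n_blocks, so the same lists can size the packed tensors and locate each work tile's slice.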

cc @drisspg @v0i0

NOT INTENDED FOR THIS PR:

bwd support

SeanLi-OI · 2026-02-11T06:53:13Z

Hi there, @reubenconducts ! Thank you so much for your draft.

Since I also need this feature urgently, I tried to continue development on top of your branch with a commit that fixes some grammar issues (SeanLi-OI@7ccfc5e). It runs, but it returns wrong results when batch_size > 1.

I completely understand you may be busy with other priorities. If you have a moment, I’d be truly grateful for any guidance:
Are my modifications heading in the right direction? Or do you plan to continue updating this PR?

reubenconducts · 2026-02-11T15:16:57Z

@SeanLi-OI Yes, I will be continuing this, but not until next week.

wqwqazwsxedc · 2026-04-09T14:50:49Z

Hi @drisspg, @reubenconducts, just checking in on this PR. Are there any remaining blockers or changes needed before it can be merged?
I think this feature could be useful to implement something akin to PrefixGrouper for varlen sequences, which would be useful for RL training. I'd be glad to help if necessary.

reubenconducts mentioned this pull request Feb 2, 2026

[broken benchmark] benchmark_mask_mod.py is broken with ImportError: cannot import name 'compute_block_sparsity' from 'flash_attn.cute.block_sparsity' #2184

Closed

reubenconducts force-pushed the rstern/varlen-blocksparsity branch from a4f3021 to bc15c46 on February 15, 2026 19:53

reubenconducts force-pushed the rstern/varlen-blocksparsity branch from bc15c46 to 03f2f92 on February 25, 2026 19:03

reubenconducts changed the title ~~[WIP] varlen blocksparsity~~ [CuTe,Flex] varlen blocksparsity Feb 25, 2026

reubenconducts marked this pull request as ready for review February 25, 2026 19:05

reubenconducts marked this pull request as draft February 25, 2026 19:17

reubenconducts force-pushed the rstern/varlen-blocksparsity branch from 03f2f92 to 04d3016 on February 26, 2026 16:42

reubenconducts marked this pull request as ready for review February 26, 2026 16:42

reubenconducts mentioned this pull request Mar 14, 2026

[Feature Request] Support for mask_mod and block_sparsity in varlen sequences #2182

Open

drisspg reviewed Mar 15, 2026

Comment thread flash_attn/cute/block_sparse_utils.py Outdated

Comment thread flash_attn/cute/block_sparse_utils.py

reubenconducts force-pushed the rstern/varlen-blocksparsity branch from ab9bbeb to 9c4370b on March 16, 2026 17:41

reubenconducts added 3 commits April 10, 2026 17:16

[WIP] varlen blocksparsity

d740509

unify naming in block_sparse_utils and fix _flash_attn_fwd input names

a7af3a0

rebase varlen blocksparsity on main

6f736e9

reubenconducts force-pushed the rstern/varlen-blocksparsity branch from 6b85348 to 6f736e9 on April 10, 2026 17:39

reubenconducts and others added 2 commits April 10, 2026 18:38

add is_fake_mode to compute_block_sparsity

f05059b

Merge branch 'main' into rstern/varlen-blocksparsity

a2be20c

[CuTe,Flex] varlen blocksparsity #2224
reubenconducts wants to merge 5 commits into Dao-AILab:main from reubenconducts:rstern/varlen-blocksparsity


4 participants
