Commit b01ba0c
[Cute] Add block-sparsity support to SM100
- Implement block-sparse attention in flash_fwd_sm100.py
- Update interface.py to handle SM100 block size calculations
  (2x multiplier for m_block_size since 1 CTA handles 2*tile_m rows)
- Add mask_mod parameter support in mask.py for block-sparse masking
- Add SM100 test fixtures and tile size handling in test_mask_mod.py
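The block size accounting and block-sparse masking described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from this commit: the helper names (`effective_m_block_size`, `num_m_blocks`, `classify_block`), the `mask_mod(batch, head, q_idx, kv_idx)` signature, and the tile size of 128 are all hypothetical. Only the 2x multiplier on SM100 comes from the commit message.

```python
from typing import Callable

# Hypothetical mask_mod signature, assumed for illustration only:
# returns True where query row q_idx may attend to key column kv_idx.
MaskMod = Callable[[int, int, int, int], bool]


def effective_m_block_size(tile_m: int, is_sm100: bool) -> int:
    """Rows of Q covered by one CTA (2 * tile_m on SM100, per the commit message)."""
    return 2 * tile_m if is_sm100 else tile_m


def num_m_blocks(seqlen_q: int, tile_m: int, is_sm100: bool) -> int:
    """Number of row blocks needed to cover seqlen_q (ceiling division)."""
    m_block_size = effective_m_block_size(tile_m, is_sm100)
    return (seqlen_q + m_block_size - 1) // m_block_size


def classify_block(
    mask_mod: MaskMod,
    m_block: int,
    n_block: int,
    m_block_size: int,
    n_block_size: int,
    seqlen_q: int,
    seqlen_k: int,
    batch: int = 0,
    head: int = 0,
) -> str:
    """Label one (m_block, n_block) tile as 'empty', 'full', or 'partial'.

    'empty' tiles can be skipped, 'full' tiles need no per-element masking,
    and 'partial' tiles need mask_mod applied element by element.
    """
    rows = range(m_block * m_block_size, min((m_block + 1) * m_block_size, seqlen_q))
    cols = range(n_block * n_block_size, min((n_block + 1) * n_block_size, seqlen_k))
    allowed = sum(mask_mod(batch, head, q, k) for q in rows for k in cols)
    if allowed == 0:
        return "empty"
    if allowed == len(rows) * len(cols):
        return "full"
    return "partial"


def causal(batch: int, head: int, q_idx: int, kv_idx: int) -> bool:
    """Example mask_mod: each query attends only to keys at or before its position."""
    return kv_idx <= q_idx


if __name__ == "__main__":
    tile_m, tile_n = 128, 128  # assumed tile sizes
    m_size = effective_m_block_size(tile_m, is_sm100=True)
    print(m_size, num_m_blocks(1000, tile_m, is_sm100=True))
    print(classify_block(causal, 0, 1, m_size, tile_n, 1024, 1024))
```

The point of the sketch is that a block-sparsity layout computed with `tile_m` instead of `2 * tile_m` would describe twice as many row blocks as the SM100 kernel actually iterates over, so the block size on the interface side has to match the rows handled per CTA.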
This enables block-sparsity on SM 10.0 architecture, including
mask_mod support and proper block size accounting.
1 parent fbf24f6 commit b01ba0c
5 files changed
Lines changed: 785 additions & 163 deletions
File tree
- flash_attn/cute
- tests/cute