Pull requests: Dao-AILab/flash-attention
Add SM120 (Blackwell GeForce / DGX Spark) forward pass support
#2268 opened Feb 20, 2026 by blake-snc (2 of 4 tasks)
[WIP,Cute,Flex,Sm100] vectorized mask mod application
#2261 opened Feb 17, 2026 by reubenconducts (Draft)
[Cute] Handle window_size=(-1, -1) for non-local attention
#2251 opened Feb 11, 2026 by henrylhtsang
Add two-level accumulation for SM90 FP8 FWD to mitigate long-context degradation
#2250 opened Feb 11, 2026 by jmkuebler
[ROCm] Add Infinity Cache (LLC) awareness for performance improvement [PR #2147 rebased on PR #2178]
#2217 opened Jan 29, 2026 by tianwyan
Add shift scheduler for deterministic full-mask FA3 bwd on Hopper (sm90)
#2207 opened Jan 23, 2026 by tie-pilot-qxw
Fix compute_block_sparsity import in benchmark_mask_mod
#2190 opened Jan 17, 2026 by blueberrycongee
[Cute,Fwd,Sm100] support irregular qhead / kvhead ratios
#2186 opened Jan 16, 2026 by timmy-feng (Draft)
Update mha_fwd.cpp: normalize the commented-out parameters
#2160 opened Jan 9, 2026 by breakfei