[core] Enable CP for kernels-based attention backends #12812
Conversation
wrapped_forward_attr="flash_attn_interface._wrapped_flash_attn_forward",
wrapped_backward_attr="flash_attn_interface._wrapped_flash_attn_backward",
Only FA2 provides these.
If you take a closer look, there is an equivalent for FA3. FA2 just renames its backward to the wrapped_xxx form:
- The original backward is noted in https://github.com/Dao-AILab/flash-attention/blob/a8780f2a17099fc1a3e7b00d7f5d9e08c5b71142/flash_attn/flash_attn_interface.py#L330-L333 (which is essentially just fancy ABI wrapping)
- On lower torch versions this leads to https://github.com/Dao-AILab/flash-attention/blob/a8780f2a17099fc1a3e7b00d7f5d9e08c5b71142/flash_attn/flash_attn_interface.py#L242

So I expect that once torch comes around to FA3 we will get the same standardization, but for now the FA3 equivalent is just the unwrapped backward.
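For reference, a minimal sketch (not the PR's implementation) of how such dotted attribute paths can be resolved lazily to the underlying flash-attention ops; the FA2 path comes from the diff above, and the same mechanism would simply point at FA3's unwrapped backward instead:

```python
# Illustrative sketch only. Resolves a string such as
# "flash_attn_interface._wrapped_flash_attn_backward" (FA2, from the diff above)
# to a callable; an FA3 config would carry the unwrapped backward's path instead.
import importlib


def resolve_attn_op(attr_path: str):
    """Resolve a dotted 'module.attr' path to the attention op it names."""
    module_path, op_name = attr_path.rsplit(".", 1)
    module = importlib.import_module(module_path)  # e.g. flash_attn_interface
    return getattr(module, op_name)                # e.g. _wrapped_flash_attn_backward
```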
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@DN6 a gentle ping.
DN6 left a comment
One comment about the FA3 backward. Not a merge blocker since it mostly affects CP-based training.
key_r = key.detach().requires_grad_(True)
value_r = value.detach().requires_grad_(True)

out = kernel_fn(
This would result in a second forward pass during the backward op, right? Would it make sense to just raise an error here, similar to sage attention?
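For context, the pattern being questioned is a recompute-in-backward scheme: the kernel's forward is re-run with grad-enabled leaf tensors so that torch.autograd can derive the input gradients. A hedged sketch of that pattern follows; `kernel_fn` is a placeholder for the kernels-provided attention forward, not a real diffusers symbol.

```python
# Hedged sketch of the recompute-in-backward pattern referenced above;
# `kernel_fn` stands in for the kernels-provided attention forward.
import torch


class RecomputeAttention(torch.autograd.Function):
    @staticmethod
    def forward(ctx, query, key, value, kernel_fn):
        ctx.save_for_backward(query, key, value)
        ctx.kernel_fn = kernel_fn
        with torch.no_grad():
            return kernel_fn(query, key, value)

    @staticmethod
    def backward(ctx, grad_out):
        query, key, value = ctx.saved_tensors
        # Re-run the forward with grad-enabled leaves: this is the "second
        # forward pass during the backward op" the comment points at.
        query_r = query.detach().requires_grad_(True)
        key_r = key.detach().requires_grad_(True)
        value_r = value.detach().requires_grad_(True)
        with torch.enable_grad():
            out = ctx.kernel_fn(query_r, key_r, value_r)
        grad_q, grad_k, grad_v = torch.autograd.grad(
            out, (query_r, key_r, value_r), grad_out
        )
        return grad_q, grad_k, grad_v, None
```

The alternative raised in the comment is to skip the extra compute entirely and raise an error when backward is attempted, as is done for sage attention.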
What does this PR do?
Adds CP support to the kernels-based attention backends.

Our CP support is quickly gaining traction. Currently, we have a few attention backends that are fully based on kernels. In order for their adoption to grow and make them a bit more complete in terms of feature parity, I think we should make them CP-compatible, too.

Code to test:
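The original test snippet is not reproduced in this conversation view. Purely as an illustration, a CP run exercising a kernels-based backend might look roughly like the sketch below; the checkpoint, the backend string, and the ContextParallelConfig / enable_parallelism / set_attention_backend names are assumptions based on diffusers' context-parallel API, not the author's original script.

```python
# Illustrative sketch only, not the author's test code. Launch with:
#   torchrun --nproc_per_node=2 test_cp_kernels.py
# API names and the backend string below are assumptions.
import torch
import torch.distributed as dist
from diffusers import AutoModel, ContextParallelConfig

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Any supported DiT checkpoint; FLUX is only used as an example here.
transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Pick a kernels-based attention backend and shard the sequence across ranks.
transformer.set_attention_backend("_flash_3_hub")  # assumed backend name
transformer.enable_parallelism(config=ContextParallelConfig(ring_degree=2))

# ... run the pipeline / a forward pass and compare against the single-GPU output ...

dist.destroy_process_group()
```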
Outputs: