[Community][Dev] feat(moe): Adding context parallel support to eager attention implementation #1859
Merged
ko3n1g merged 13 commits into NVIDIA:dev on Nov 18, 2025
Conversation
Force-pushed from 10ddbd8 to 5c981b7
Contributor
/ok to test c2410fc
Contributor
Thank you for your contribution! NVIDIA Megatron-LM is currently transitioning to development on GitHub. We will aim to review your PR after we complete our transition and stabilize our GitHub development process. Thank you for your understanding.
Contributor
/ok to test 158e53b
hxbai
reviewed
Nov 4, 2025
        raise AssertionError("use_te_activation_func not compatible with using kitchen.")
    else:
-       backend = TESpecProvider()
+       backend = TESpecProvider(fallback_to_eager_attn=fallback_to_eager_attn)
Contributor
I think it is better to handle this in get_attention_module_spec_for_backend rather than modifying TESpecProvider, since we have other backends like Kitchen. Similar to the code here.
hxbai
reviewed
Nov 4, 2025
            f"the number of layers ({self.num_layers})"
        )

+   if self.fallback_to_eager_attn:
Contributor
Please also check that --cp-comm-type matches the implementation when CP is enabled.
hxbai
reviewed
Nov 4, 2025
        return attn_output


+   def test_eager_attention_function():
Contributor
Please modify this test to a parallel version that covers the CP and TPxCP cases.
fix pipeline
yanring
approved these changes
Nov 18, 2025
For some attention and mask variants it is difficult to write a fused, optimized implementation on short notice, yet we still need to run experiments to verify their effectiveness. In those cases we need to fall back to eager mode, so I added a switch that falls back to the eager implementation of attention.
Additionally, since Megatron Core's eager attention does not support context parallelism, I provided a distributed attention implementation similar to the one described in the Llama 3 paper.
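A minimal single-process sketch of the two ideas above, not the PR's actual Megatron code (function names here are hypothetical): eager attention is plain unfused scaled-dot-product attention, and Llama 3-style context parallelism shards the queries across CP ranks, all-gathers K and V, and runs eager attention locally on each shard. Causal masking is omitted for brevity, and the all-gather is simulated by simply passing the already-global K/V to every shard.

```python
import numpy as np

def eager_attention(q, k, v):
    """Unfused scaled-dot-product attention; q, k, v are [seq, head_dim]."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def cp_eager_attention(q, k, v, cp_size):
    """Simulated context parallelism: each 'rank' holds a shard of q,
    all-gathers the full k/v (already global here), attends locally,
    and the per-rank outputs concatenate to the full-sequence result."""
    shards = np.array_split(np.arange(q.shape[0]), cp_size)
    return np.concatenate([eager_attention(q[idx], k, v) for idx in shards])

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
full = eager_attention(q, k, v)
sharded = cp_eager_attention(q, k, v, cp_size=2)
assert np.allclose(full, sharded)  # CP result matches single-device attention
```

Because attention is row-wise independent in the queries, sharding q while keeping K/V global is exact, which is why the eager fallback composes cleanly with this form of context parallelism.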