[Dev] fix(moe): Support HybridEP and reduce memory overhead for 1F1B A2A overlap #2201
yanring merged 47 commits into NVIDIA:dev from
Conversation
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Thanks for the PR. Please mark the title with [Dev] fix(moe): xxx and label this PR with
/ok to test 32fc988

/ok to test 776d224
/ok to test 487eea9

/ok to test c568c37
@lhb8125 could you fix the API checks?

/ok to test 36648e3
```python
if g is not None:
    g.record_stream(self.stream)
    if not self.delay_grads_release:
        g.untyped_storage().resize_(0)
```
Could you add some explanation here?
Fixed in lhb8125#50, @lhb8125 can you help take a look~
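The reviewed snippet frees gradient storage eagerly rather than waiting for Python garbage collection. A minimal sketch of that pattern is below; the `release_grad_storage` helper and its parameter names are hypothetical, and only the `record_stream` / `untyped_storage().resize_(0)` calls come from the quoted diff:

```python
import torch

def release_grad_storage(g, stream=None, delay_grads_release=False):
    """Hypothetical helper mirroring the reviewed snippet."""
    if g is None:
        return
    if stream is not None and g.is_cuda:
        # Record the tensor on the communication stream so the caching
        # allocator does not hand its memory to another allocation while
        # that stream may still be reading it.
        g.record_stream(stream)
    if not delay_grads_release:
        # Shrinking the untyped storage to zero bytes releases the memory
        # immediately instead of waiting for the tensor object to die.
        g.untyped_storage().resize_(0)

g = torch.randn(1024)
release_grad_storage(g)
print(g.untyped_storage().size())  # → 0
```

On CPU tensors the `record_stream` branch is skipped, which is why the storage release itself can be exercised without a GPU.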
```python
"""Delay the weight gradient computation to improve batch-level communication overlapping"""
```
```python
ep_overlap_early_attn_memory_release: bool = False
"""Release the memory of the attention module early in EP overlap. Note this flag has
```
This description is a bit vague: when exactly should users enable or disable this feature? Also, the connection to `overlap_moe_expert_parallel_comm` isn't clear here, which will likely confuse users.
Fixed in lhb8125#50, @lhb8125 can you help take a look~
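One way to make the relationship between these flags explicit is to validate it at construction time. The sketch below is hypothetical: the field names follow the quoted diff, but the dependency between `ep_overlap_early_attn_memory_release` and `overlap_moe_expert_parallel_comm` is an assumption drawn from this review thread, not from the actual Megatron config:

```python
from dataclasses import dataclass

@dataclass
class MoEOverlapConfig:
    """Hypothetical sketch of the overlap-related flags discussed above."""

    overlap_moe_expert_parallel_comm: bool = False
    """Enable 1F1B A2A expert-parallel communication overlap."""

    delay_wgrad_compute: bool = False
    """Delay the weight gradient computation to improve batch-level
    communication overlapping."""

    ep_overlap_early_attn_memory_release: bool = False
    """Release the memory of the attention module early in EP overlap.
    Assumed to be meaningful only when overlap_moe_expert_parallel_comm
    is enabled."""

    def __post_init__(self):
        # Fail fast on the assumed dependency instead of silently ignoring
        # the flag, so users learn when the option actually applies.
        if (self.ep_overlap_early_attn_memory_release
                and not self.overlap_moe_expert_parallel_comm):
            raise ValueError(
                "ep_overlap_early_attn_memory_release requires "
                "overlap_moe_expert_parallel_comm=True"
            )
```

Validation like this also doubles as documentation: the error message answers the reviewer's question about when the flag should be enabled.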
/ok to test 0708cc1

/ok to test 2cfaec1

/ok to test 0f8663b
fix comments of dev 2201
/ok to test 97de523

/ok to test 12a2a22
What does this PR do?

PR for main:

- `enable_deepep` with `use_flex_dispatcher`, so that DeepEP and HybridEP will be treated in the same way in 1F1B A2A overlap.

Contribution process
```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```

Pre-checks

Core 0.8)

Code review

The following process is enforced via the CODEOWNERS file for changes into `megatron/core`. For changes outside of `megatron/core`, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

(Step 1): Add PR label `Expert Review`

(Step 2): Collect the expert reviewers' reviews

Add the `Expert Review` label when your PR is ready for review. Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

Add the `Final Review` label.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into `core_r*` release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.