feat: Add mistral small3 support in bridge#764

Merged
ananthsub merged 2 commits into NVIDIA-NeMo:main from eagle705:add-mistral-small3-24b
Oct 14, 2025

Conversation

@eagle705 (Contributor) commented Sep 25, 2025

Part of #493

This PR adds support for the Mistral and Mistral Small3 models to Megatron Bridge.
(Note: the pretraining recipe could be further optimized.)

Example HF model: mistralai/Mistral-Small-24B-Instruct-2501

Training example

from megatron.bridge import AutoBridge

import megatron.bridge.recipes.mistral.mistral_small3_24b as mistral_small3_24b
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain

if __name__ == "__main__":
    # Load Mistral Small3 24B from Hugging Face Hub and convert to Megatron
    bridge = AutoBridge.from_hf_pretrained("/work/checkpoints/hf/Mistral-Small-24B-Instruct-2501")
    
    model_provider = bridge.to_megatron_provider()
    model_provider.finalize()

    # Get defaults for other configuration from an existing mistral small3 24b recipe
    cfg = mistral_small3_24b.pretrain_config()
    cfg.train.train_iters = 5
    cfg.train.eval_iters = 2
    cfg.model.seq_length = 8192
    cfg.dataset.sequence_length = cfg.model.seq_length
    cfg.tokenizer.vocab_size = cfg.model.vocab_size
    cfg.logger.log_interval = 1
    pretrain(cfg, forward_step)
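The recipe pattern above, where `pretrain_config()` returns a nested config object whose fields are overridden before launch, can be mimicked with plain dataclasses. The classes and defaults below are illustrative stand-ins, not the actual Megatron-Bridge types:

```python
from dataclasses import dataclass, field


@dataclass
class TrainConfig:
    train_iters: int = 1000
    eval_iters: int = 10


@dataclass
class ModelConfig:
    seq_length: int = 4096


@dataclass
class PretrainConfig:
    train: TrainConfig = field(default_factory=TrainConfig)
    model: ModelConfig = field(default_factory=ModelConfig)


def pretrain_config() -> PretrainConfig:
    # Stand-in for mistral_small3_24b.pretrain_config(): returns recipe
    # defaults that the caller is free to override before training starts.
    return PretrainConfig()


cfg = pretrain_config()
cfg.train.train_iters = 5    # short smoke run, as in the PR example
cfg.model.seq_length = 8192  # must match the dataset sequence length
print(cfg.train.train_iters, cfg.model.seq_length)
```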

Checkpoint export example

from megatron.bridge import AutoBridge
from transformers import AutoConfig

if __name__ == "__main__":
    # 1) Create a bridge from a Hugging Face model (hub or local path)
    bridge = AutoBridge.from_hf_pretrained("/work/checkpoints/hf/Mistral-Small-24B-Instruct-2501")
    print(f"bridge:{bridge}")

    # 2) Get a Megatron provider and configure parallelism before instantiation
    provider = bridge.to_megatron_provider()
    provider.tensor_model_parallel_size = 1
    provider.pipeline_model_parallel_size = 1
    provider.finalize()
    
    # 3) Materialize Megatron Core model(s)
    model = provider.provide_distributed_model(wrap_with_ddp=False)

    # 4a) Export Megatron → Hugging Face (full HF folder with config/tokenizer/weights)
    bridge.save_hf_pretrained(model, "./hf_exports/mistral-small-24b-instruct-2501")

    # 4b) Or stream only weights (Megatron → HF)
    for name, weight in bridge.export_hf_weights(model, cpu=True):
        print(name, tuple(weight.shape))
...
model.layers.39.mlp.up_proj.weight (32768, 5120)
model.layers.39.mlp.down_proj.weight (5120, 32768)
model.layers.39.self_attn.o_proj.weight (5120, 4096)
model.layers.39.input_layernorm.weight (5120,)
model.layers.39.self_attn.q_proj.weight (4096, 5120)
model.layers.39.self_attn.k_proj.weight (1024, 5120)
model.layers.39.self_attn.v_proj.weight (1024, 5120)
Converting to HuggingFace ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 (243/243) MistralBridge
[rank0]:[W925 10:36:36.424335029 ProcessGroupNCCL.cpp:1505] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
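The exported shapes can be sanity-checked against the model's attention geometry. `hidden_size` and `intermediate_size` are readable directly from the shapes printed above; the head counts and `head_dim` below are inferred from those shapes and should be verified against the model's `config.json`:

```python
# Cross-check exported projection shapes against the attention geometry.
# These values are inferred from the printed weight shapes above, not
# read from the actual checkpoint config.
hidden_size = 5120
intermediate_size = 32768
num_attention_heads = 32
num_key_value_heads = 8   # grouped-query attention: fewer KV heads
head_dim = 128            # note: not hidden_size // num_attention_heads

q_proj = (num_attention_heads * head_dim, hidden_size)
kv_proj = (num_key_value_heads * head_dim, hidden_size)
o_proj = (hidden_size, num_attention_heads * head_dim)
up_proj = (intermediate_size, hidden_size)

# Matches the printed q_proj (4096, 5120), k/v_proj (1024, 5120),
# o_proj (5120, 4096), and up_proj (32768, 5120) shapes.
print(q_proj, kv_proj, o_proj, up_proj)
```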

@copy-pr-bot bot commented Sep 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ananthsub (Contributor) left a comment:

thanks for the contribution! for tests, could you add functional tests to ensure that the HF <-> Megatron mapping works as expected? You can use these tests as a reference:

https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_provider.py

https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_conversion.py

@ananthsub (Contributor) commented on the diff (@@ -0,0 +1,214 @@):
our recipe structure is being streamlined based on #607

I think it'd make sense to directly use that structure here, as the current configs are very verbose.

you can still follow the unit tests for these recipes to sanity check the default configurations. for example: https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/tests/unit_tests/recipes/qwen

@eagle705 (Contributor Author) replied:

@ananthsub
Could you share an example of a new recipe structure?
Is this llama2 recipe file the one you meant?

@ananthsub (Contributor) replied:

@eagle705 yes, that's the one. We can remove the function that gets the model provider with parallelism overrides and go directly to a pretrain config; you can see the latest design in #607. If you'd like to unblock this sooner, we can split up the changes: merge the model provider + bridge first (so users can begin converting between HF <-> Megatron), and add the pretraining recipe later.

@eagle705 (Contributor Author) commented Sep 26, 2025

> thanks for the contribution! for tests, could you add functional tests to ensure that the HF <-> Megatron mapping works as expected? You can use these tests as a reference:
>
> https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_provider.py
>
> https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_conversion.py

@ananthsub
I added some functions and unit tests. If you have time, please take a look at them.

@ananthsub (Contributor):

/ok to test d947b89

@eagle705 (Contributor Author) commented Sep 27, 2025

@ananthsub I applied lint fixes via pre-commit.

@ananthsub (Contributor):

/ok to test 9742299

@eagle705 (Contributor Author) commented Sep 29, 2025

@ananthsub Fixed errors in some test cases (vocab_size, kv_channels, ...).

@ananthsub (Contributor):

/ok to test c7908a7

@eagle705 (Contributor Author):

@ananthsub
I modified a few more cases and tested them with the following commands:

pytest tests/unit_tests/models/mistral/test_mistral_model_provider.py

(53 durations < 0.005s hidden.  Use -vv to show these durations.)
===================================================== 18 passed, 4 warnings in 2.79s =====================================================


pytest tests/unit_tests/models/mistral/test_mistral_model_bridge.py

(59 durations < 0.005s hidden.  Use -vv to show these durations.)
===================================================== 20 passed, 4 warnings in 2.46s =====================================================


pytest tests/functional_tests/models/test_mistral_conversion.py
================================================ 3 passed, 4 warnings in 78.87s (0:01:18) ================================================


pytest tests/functional_tests/models/test_mistral_provider.py
===================================================== 2 passed, 4 warnings in 13.23s =====================================================

@eagle705 eagle705 force-pushed the add-mistral-small3-24b branch from e2e6530 to d51f618 Compare September 30, 2025 04:14
@ananthsub (Contributor):

/ok to test a0b4f1b

@eagle705 eagle705 force-pushed the add-mistral-small3-24b branch from 3cf8050 to 270d068 Compare October 10, 2025 17:29
@ananthsub (Contributor):

/ok to test 270d068

@eagle705 eagle705 force-pushed the add-mistral-small3-24b branch from 270d068 to 4c27177 Compare October 11, 2025 00:29
@eagle705 (Contributor Author) commented Oct 11, 2025

@ananthsub
I've removed the pretrain recipe from PR #764 to focus on the model provider + bridge; I'll add the pretrain recipe later in a separate PR.

@ananthsub (Contributor):

/ok to test 4c27177

@ananthsub ananthsub merged commit 9d849fc into NVIDIA-NeMo:main Oct 14, 2025
44 of 46 checks passed
paul-gibbons pushed a commit to paul-gibbons/Megatron-Bridge that referenced this pull request Oct 29, 2025
* add mistral small3

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

* add func and unit tests with pre-commit lint fixes to existing commits

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

---------

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
Signed-off-by: Paul Gibbons <pgibbons@nvidia.com>
nv-mollys pushed a commit that referenced this pull request Oct 31, 2025
* add mistral small3

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

* add func and unit tests with pre-commit lint fixes to existing commits

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

---------

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
