feat: Add mistral small3 support in bridge#764

Merged
ananthsub merged 2 commits into NVIDIA-NeMo:main from eagle705:add-mistral-small3-24b
Oct 14, 2025

Conversation

@eagle705 (Contributor) commented Sep 25, 2025

Part of #493

This PR adds support for the Mistral and Mistral Small3 models to Megatron Bridge.
(Note: the pretraining recipe could be further optimized.)

Example HF model: mistralai/Mistral-Small-24B-Instruct-2501

Training example

from megatron.bridge import AutoBridge

import megatron.bridge.recipes.mistral.mistral_small3_24b as mistral_small3_24b
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain

if __name__ == "__main__":
    # Load Mistral Small3 24B from Hugging Face Hub and convert to Megatron
    bridge = AutoBridge.from_hf_pretrained("/work/checkpoints/hf/Mistral-Small-24B-Instruct-2501")
    
    model_provider = bridge.to_megatron_provider()
    model_provider.finalize()

    # Get defaults for other configuration from an existing mistral small3 24b recipe
    cfg = mistral_small3_24b.pretrain_config()
    cfg.train.train_iters = 5
    cfg.train.eval_iters = 2
    cfg.model.seq_length = 8192
    cfg.dataset.sequence_length = cfg.model.seq_length
    cfg.tokenizer.vocab_size = cfg.model.vocab_size
    cfg.logger.log_interval = 1
    pretrain(cfg, forward_step)
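The recipe pattern above, where `pretrain_config()` returns a nested config object whose fields are overridden before launch, can be mimicked with plain dataclasses. The classes and defaults below are illustrative stand-ins, not the actual Megatron-Bridge types:

```python
from dataclasses import dataclass, field


@dataclass
class TrainConfig:
    train_iters: int = 1000
    eval_iters: int = 10


@dataclass
class ModelConfig:
    seq_length: int = 4096


@dataclass
class PretrainConfig:
    train: TrainConfig = field(default_factory=TrainConfig)
    model: ModelConfig = field(default_factory=ModelConfig)


def pretrain_config() -> PretrainConfig:
    # Stand-in for mistral_small3_24b.pretrain_config(): returns recipe
    # defaults that the caller is free to override before training starts.
    return PretrainConfig()


cfg = pretrain_config()
cfg.train.train_iters = 5    # short smoke run, as in the PR example
cfg.model.seq_length = 8192  # must match the dataset sequence length
print(cfg.train.train_iters, cfg.model.seq_length)
```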

Checkpoint export example

from megatron.bridge import AutoBridge
from transformers import AutoConfig

if __name__ == "__main__":
    # 1) Create a bridge from a Hugging Face model (hub or local path)
    bridge = AutoBridge.from_hf_pretrained("/work/checkpoints/hf/Mistral-Small-24B-Instruct-2501")
    print(f"bridge:{bridge}")

    # 2) Get a Megatron provider and configure parallelism before instantiation
    provider = bridge.to_megatron_provider()
    provider.tensor_model_parallel_size = 1
    provider.pipeline_model_parallel_size = 1
    provider.finalize()
    
    # 3) Materialize Megatron Core model(s)
    model = provider.provide_distributed_model(wrap_with_ddp=False)

    # 4a) Export Megatron → Hugging Face (full HF folder with config/tokenizer/weights)
    bridge.save_hf_pretrained(model, "./hf_exports/mistral-small-24b-instruct-2501")

    # 4b) Or stream only weights (Megatron → HF)
    for name, weight in bridge.export_hf_weights(model, cpu=True):
        print(name, tuple(weight.shape))
...
model.layers.39.mlp.up_proj.weight (32768, 5120)
model.layers.39.mlp.down_proj.weight (5120, 32768)
model.layers.39.self_attn.o_proj.weight (5120, 4096)
model.layers.39.input_layernorm.weight (5120,)
model.layers.39.self_attn.q_proj.weight (4096, 5120)
model.layers.39.self_attn.k_proj.weight (1024, 5120)
model.layers.39.self_attn.v_proj.weight (1024, 5120)
Converting to HuggingFace ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 (243/243) MistralBridge
[rank0]:[W925 10:36:36.424335029 ProcessGroupNCCL.cpp:1505] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
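The exported shapes can be sanity-checked against the model's attention geometry. `hidden_size` and `intermediate_size` are readable directly from the shapes printed above; the head counts and `head_dim` below are inferred from those shapes and should be verified against the model's `config.json`:

```python
# Cross-check exported projection shapes against the attention geometry.
# These values are inferred from the printed weight shapes above, not
# read from the actual checkpoint config.
hidden_size = 5120
intermediate_size = 32768
num_attention_heads = 32
num_key_value_heads = 8   # grouped-query attention: fewer KV heads
head_dim = 128            # note: not hidden_size // num_attention_heads

q_proj = (num_attention_heads * head_dim, hidden_size)
kv_proj = (num_key_value_heads * head_dim, hidden_size)
o_proj = (hidden_size, num_attention_heads * head_dim)
up_proj = (intermediate_size, hidden_size)

# Matches the printed q_proj (4096, 5120), k/v_proj (1024, 5120),
# o_proj (5120, 4096), and up_proj (32768, 5120) shapes.
print(q_proj, kv_proj, o_proj, up_proj)
```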

@copy-pr-bot bot commented Sep 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ananthsub (Contributor) left a comment:

thanks for the contribution! for tests, could you add functional tests to ensure that the HF <-> Megatron mapping works as expected? You can use these tests as a reference:

https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_provider.py

https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_conversion.py

@ananthsub (Contributor) commented on the diff (@@ -0,0 +1,214 @@):
our recipe structure is being streamlined based on #607

I think it'd make sense to directly use that structure here, as the current configs are very verbose.

you can still follow the unit tests for these recipes to sanity check the default configurations. for example: https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/tests/unit_tests/recipes/qwen

@eagle705 (Contributor Author) replied:

@ananthsub
Could you share an example of a new recipe structure?
Is this llama2 recipe file the one you meant?

@ananthsub (Contributor) replied:

@eagle705 yes, that's the one. We can remove the function that gets the model provider with parallelism overrides and go directly to a pretrain config; you can see the latest design in #607. If you'd like to unblock this sooner, we can split up the changes: merge the model provider + bridge first (so users can begin converting between HF <-> Megatron), and add the pretraining recipe later.

@eagle705 (Contributor Author) commented Sep 26, 2025

> thanks for the contribution! for tests, could you add functional tests to ensure that the HF <-> Megatron mapping works as expected? You can use these tests as a reference:
>
> https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_provider.py
>
> https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/tests/functional_tests/models/test_qwen2_conversion.py

@ananthsub
I added some functions and unit tests. If you have time, please take a look at them.

@ananthsub (Contributor):

/ok to test d947b89

@eagle705 (Contributor Author) commented Sep 27, 2025

@ananthsub I applied lint fixes via pre-commit.

@ananthsub (Contributor):

/ok to test 9742299

@eagle705 (Contributor Author) commented Sep 29, 2025

@ananthsub Fixed errors in some test cases (vocab_size, kv_channels, ...).

@ananthsub (Contributor):

/ok to test c7908a7

@eagle705 (Contributor Author):

@ananthsub
I modified a few more cases and tested them with the following commands:

pytest tests/unit_tests/models/mistral/test_mistral_model_provider.py

(53 durations < 0.005s hidden.  Use -vv to show these durations.)
===================================================== 18 passed, 4 warnings in 2.79s =====================================================


pytest tests/unit_tests/models/mistral/test_mistral_model_bridge.py

(59 durations < 0.005s hidden.  Use -vv to show these durations.)
===================================================== 20 passed, 4 warnings in 2.46s =====================================================


pytest tests/functional_tests/models/test_mistral_conversion.py
================================================ 3 passed, 4 warnings in 78.87s (0:01:18) ================================================


pytest tests/functional_tests/models/test_mistral_provider.py
===================================================== 2 passed, 4 warnings in 13.23s =====================================================

@eagle705 eagle705 force-pushed the add-mistral-small3-24b branch from e2e6530 to d51f618 Compare September 30, 2025 04:14
@ananthsub (Contributor):

/ok to test a0b4f1b

@eagle705 eagle705 force-pushed the add-mistral-small3-24b branch from 3cf8050 to 270d068 Compare October 10, 2025 17:29
@ananthsub (Contributor):

/ok to test 270d068

@eagle705 eagle705 force-pushed the add-mistral-small3-24b branch from 270d068 to 4c27177 Compare October 11, 2025 00:29
@eagle705 (Contributor Author) commented Oct 11, 2025

@ananthsub
I've removed the pretrain recipe from PR #764 to focus on the model provider + bridge; I'll add the pretrain recipe later in a separate PR.

@ananthsub (Contributor):

/ok to test 4c27177

@ananthsub ananthsub merged commit 9d849fc into NVIDIA-NeMo:main Oct 14, 2025
44 of 46 checks passed
paul-gibbons pushed a commit to paul-gibbons/Megatron-Bridge that referenced this pull request Oct 29, 2025
* add mistral small3

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

* add func and unit tests with pre-commit lint fixes to existing commits

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

---------

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
Signed-off-by: Paul Gibbons <pgibbons@nvidia.com>
nv-mollys pushed a commit that referenced this pull request Oct 31, 2025
* add mistral small3

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

* add func and unit tests with pre-commit lint fixes to existing commits

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>

---------

Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
