feat: Add mistral small3 support in bridge #764
ananthsub
left a comment
Thanks for the contribution! For tests, could you add functional tests to ensure that the HF <-> Megatron mapping works as expected? You can use these tests as a reference:
@@ -0,0 +1,214 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
Our recipe structure is being streamlined based on #607.
I think it'd make sense to use that structure directly here, as the current configs are very verbose.
You can still follow the unit tests for these recipes to sanity check the default configurations. For example: https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/tests/unit_tests/recipes/qwen
@ananthsub
Could you share an example of a new recipe structure?
Is this llama2 recipe file the one you meant?
@eagle705 Yes, that's the one. We can remove the function that gets the model provider with the parallelism overrides and go directly into a pretrain config. You can see the latest design in #607. If you'd like to unblock this sooner, we can split up the changes so the model provider + bridge merge first (so users can begin converting between HF <-> Megatron), and add the pretraining recipe later.
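For the HF <-> Megatron conversion mentioned above, a functional test usually boils down to a round-trip check: map every parameter name one way, map it back, and confirm nothing changed. The following is only a minimal stdlib sketch of that idea. The `HF_TO_MEGATRON` table, `translate`, and `roundtrip_ok` are hypothetical names made up for illustration, and the name patterns are a small assumed subset, not the mapping registry that Megatron-Bridge actually uses.

```python
import re

# Hypothetical name patterns; "*" stands for the layer index.
# Illustrative only: this is not the mapping registry Megatron-Bridge uses.
HF_TO_MEGATRON = {
    "model.embed_tokens.weight": "embedding.word_embeddings.weight",
    "model.layers.*.self_attn.o_proj.weight": "decoder.layers.*.self_attention.linear_proj.weight",
    "model.layers.*.mlp.down_proj.weight": "decoder.layers.*.mlp.linear_fc2.weight",
    "model.norm.weight": "decoder.final_layernorm.weight",
    "lm_head.weight": "output_layer.weight",
}
# The reverse direction is the inverted table.
MEGATRON_TO_HF = {dst: src for src, dst in HF_TO_MEGATRON.items()}


def translate(name: str, table: dict) -> str:
    """Rewrite a parameter name using the first pattern that matches."""
    for src, dst in table.items():
        # Turn "a.*.b" into the regex "^a\.(\d+)\.b$".
        pattern = "^" + re.escape(src).replace(r"\*", r"(\d+)") + "$"
        match = re.match(pattern, name)
        if match:
            # Re-insert the captured layer index when the pattern has one.
            return dst.replace("*", match.group(1)) if match.groups() else dst
    raise KeyError(f"no mapping for parameter: {name}")


def roundtrip_ok(hf_names) -> bool:
    """True if HF -> Megatron -> HF returns every name unchanged."""
    return all(
        translate(translate(name, HF_TO_MEGATRON), MEGATRON_TO_HF) == name
        for name in hf_names
    )
```

A real functional test would also compare tensor values after conversion, since fused weights (for example a combined QKV projection) cannot be checked by name alone.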
/ok to test d947b89
@ananthsub I updated lint fixes through pre-commit
/ok to test 9742299
force-pushed from 9742299 to c7908a7
@ananthsub Fixed errors from some test cases (vocab_size, kv_channels, ...)
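One plausible way a `kv_channels` mismatch can surface, sketched here as an illustration rather than a record of the actual failure: many transformer configs derive the per-head dimension as `hidden_size // num_attention_heads` when it is not set explicitly. The `AttnShape` class below is hypothetical, and the numbers (hidden size 5120, 32 heads, head dimension 128) are assumed values for Mistral-Small-24B that should be checked against the model's `config.json`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttnShape:
    """Hypothetical stand-in for the attention fields of a model config."""

    hidden_size: int
    num_attention_heads: int
    kv_channels: Optional[int] = None  # per-head dimension, if set explicitly

    def head_dim(self) -> int:
        # Fall back to the derived value only when kv_channels is unset.
        if self.kv_channels is not None:
            return self.kv_channels
        return self.hidden_size // self.num_attention_heads


# Assumed values; verify against the HF config before relying on them.
derived = AttnShape(hidden_size=5120, num_attention_heads=32).head_dim()
explicit = AttnShape(hidden_size=5120, num_attention_heads=32, kv_channels=128).head_dim()

print(derived, explicit)
```

Under these assumptions the derived and explicit head dimensions differ, so the value has to be carried over from the HF config instead of being left to the default.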
/ok to test c7908a7
force-pushed from c7908a7 to e2e6530
force-pushed from e2e6530 to d51f618
/ok to test a0b4f1b
force-pushed from 3cf8050 to 270d068
/ok to test 270d068
Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
force-pushed from 270d068 to 4c27177
/ok to test 4c27177
* add mistral small3
  Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
* add func and unit tests with pre-commit lint fixes to existing commits
  Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
---------
Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
Signed-off-by: Paul Gibbons <pgibbons@nvidia.com>
* add mistral small3
  Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
* add func and unit tests with pre-commit lint fixes to existing commits
  Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
---------
Signed-off-by: Joosung Yoon <joosungy@nvidia.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
Part of #493
This PR adds support for the Mistral and Mistral Small3 models to MegatronBridge.
(Note: The pretraining recipe could be further optimized.)
Example HF model: mistralai/Mistral-Small-24B-Instruct-2501
Training example
Export ckpt