
model: support Ministral3 #17644

Merged

ngxson merged 11 commits into ggml-org:master from ngxson:xsn/ministral3 on Dec 1, 2025

Conversation

@ngxson (Collaborator) commented on Dec 1, 2025

Ref upstream PR: huggingface/transformers#42498

Disclosure: This PR was made in collaboration with Mistral. Huge thanks to @juliendenize for the coordination!

Note: The model weights are not yet released.

PPL results: for the 14B model (-Instruct variant, f16, ctx=32000, batch=8192), the final estimate is PPL = 5.5389 +/- 0.03163.
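For reference, a perplexity figure like this is just the exponential of the mean negative log-likelihood per token; here is a minimal sketch of that relationship (the log-probabilities below are made up for illustration, this is not the llama-perplexity implementation):

```python
# Minimal sketch: how a PPL figure relates to per-token log-likelihoods.
# The log-probabilities here are invented purely for illustration.
import math

token_logprobs = [-1.9, -1.4, -2.1, -1.6, -1.5]   # hypothetical natural-log probs

nll = -sum(token_logprobs) / len(token_logprobs)  # mean negative log-likelihood
ppl = math.exp(nll)                               # perplexity
print(f"PPL = {ppl:.4f}")
```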

@ModelBase.register("Mistral3ForConditionalGeneration")
class Mistral3Model(LlamaModel):
-    model_arch = gguf.MODEL_ARCH.LLAMA
+    model_arch = gguf.MODEL_ARCH.MISTRAL3
@ngxson (Collaborator, Author)
Note for maintainers: while Ministral3 and the older Mistral models have almost the same cgraph, the hparams handling in llama_model::load_hparams is considerably more complicated. It is therefore better to separate the two archs to keep the code readable.

This also makes the code more future-proof, in case future Mistral models diverge significantly from the traditional llama arch.
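As a side note on why the arch name matters: in GGUF, hyperparameter keys are namespaced by the architecture string, so the LLAMA/MISTRAL3 split also decides which metadata keys llama_model::load_hparams will read. A minimal sketch of that convention (illustrative only; block_count is just one example key):

```python
# Illustrative only: GGUF hyperparameter keys are prefixed with the
# architecture name, e.g. "llama.block_count" vs "mistral3.block_count",
# which is what lets llama.cpp handle the two archs' hparams separately.
def hparam_key(arch: str, name: str) -> str:
    return f"{arch}.{name}"

print(hparam_key("llama", "block_count"))     # llama.block_count
print(hparam_key("mistral3", "block_count"))  # mistral3.block_count
```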

Comment on lines 2821 to 2827
# for compatibility, we use LLAMA arch for older models
# TODO: remove this once everyone has migrated to newer version of llama.cpp
if self.hparams.get("model_type") != "ministral3":
    self.model_arch = gguf.MODEL_ARCH.LLAMA
    self.gguf_writer.arch = str(self.model_arch)
    self.gguf_writer.add_architecture()
    self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)
@ngxson (Collaborator, Author) commented on Dec 1, 2025

I think a time frame of ~1 week would be a reasonable timeline for removing this.

This covers the case where users run a new version of the conversion script (e.g. via gguf-my-repo) on older models while their local llama.cpp build is probably not yet up to date.
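A plain-Python sketch of that fallback decision (illustrative only, not the actual converter code; the returned strings are the arch names that end up in general.architecture):

```python
# Illustrative only: mirrors the compatibility fallback in the snippet above.
def pick_arch(model_type: str | None) -> str:
    if model_type != "ministral3":
        # older Mistral checkpoints keep the LLAMA arch so existing
        # llama.cpp builds can still load the converted GGUF
        return "llama"
    return "mistral3"

assert pick_arch("mistral") == "llama"        # old model -> backward compatible
assert pick_arch("ministral3") == "mistral3"  # new model -> needs recent llama.cpp
```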

@ngxson ngxson marked this pull request as ready for review December 1, 2025 10:05
@ngxson ngxson requested review from CISC and ggerganov as code owners December 1, 2025 10:05
ngxson and others added 2 commits December 1, 2025 11:44
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@github-actions bot added the model (Model specific) and python (python script changes) labels on Dec 1, 2025
@ngxson ngxson merged commit cd3c118 into ggml-org:master Dec 1, 2025
67 of 69 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* conversion script

* support ministral 3

* maybe this is better?

* add TODO for rope_yarn_log_mul

* better ppl (tested on 14B-Instruct)

* Add Ministral3 support to Mistral format

* improve arch handling

* add sizes

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* nits

---------

Co-authored-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@Kabyik-Kayal

Hi @ngxson and team,

Thank you for the excellent work on adding Ministral 3 support to llama.cpp!

I wanted to follow up regarding the Ministral 3 8B Reasoning model variant. While the current implementation might work well for the base and instruct variants, I'm encountering issues when trying to load the quantized GGUF versions of the 8B Reasoning model (e.g., Ministral-3-8B-Reasoning-2512-Q8_0.gguf).

The error I'm seeing is:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mistral3'

This occurs even with the latest llama-cpp-python built from the current master branch (after commit cd3c118).

Is support for the Ministral 3 Reasoning variant's multimodal architecture planned for a future PR?

Are there any workarounds or alternative approaches you'd recommend for running the Reasoning model with llama.cpp in the meantime?

Thank you again for your work on this project!

@CISC (Collaborator) commented on Jan 28, 2026

@Kabyik-Kayal llama-cpp-python has not been updated in months and does not support Ministral 3.

@Kabyik-Kayal

@CISC Is there any way I can contribute towards it??

@CISC (Collaborator) commented on Jan 28, 2026

> @CISC Is there any way I can contribute towards it??

No, it seems to be dead; PRs have been ignored for ages. You can check out this fork though, it seems to be quite alive:
https://github.com/JamePeng/llama-cpp-python

blime4 pushed a commit to blime4/llama.cpp that referenced this pull request Feb 5, 2026

Labels

model (Model specific), python (python script changes)


5 participants