Conversation
@ModelBase.register("Mistral3ForConditionalGeneration")
class Mistral3Model(LlamaModel):
-   model_arch = gguf.MODEL_ARCH.LLAMA
+   model_arch = gguf.MODEL_ARCH.MISTRAL3
Note for maintainers: while ministral3 and the old mistral models have almost the same cgraph, the hparams handling in llama_model::load_hparams is considerably more complicated. It is therefore better to separate the two archs to keep the code readable.
This also makes the code more future-proof, in case future mistral models become significantly more complicated than the traditional llama arch.
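To make the rationale concrete, here is a rough sketch (not the actual code from this PR) of what the split enables on the conversion side: with a dedicated MODEL_ARCH.MISTRAL3, the subclass can emit arch-specific metadata without adding branches to the shared LLAMA path. The imports assume llama.cpp's gguf-py tooling and convert_hf_to_gguf are importable, and the rope_scaling handling below is a hypothetical example.

    # Illustrative sketch only; not the code from this PR.
    # Assumes llama.cpp's gguf-py (with MODEL_ARCH.MISTRAL3) and convert_hf_to_gguf on the path.
    import gguf
    from convert_hf_to_gguf import ModelBase, LlamaModel


    @ModelBase.register("Mistral3ForConditionalGeneration")
    class Mistral3Model(LlamaModel):
        model_arch = gguf.MODEL_ARCH.MISTRAL3

        def set_gguf_parameters(self):
            super().set_gguf_parameters()  # reuse the llama-style metadata as-is
            # Hypothetical arch-specific handling that would otherwise clutter the
            # generic LLAMA path (e.g. YaRN-related rope parameters):
            rope_scaling = self.hparams.get("rope_scaling") or {}
            if "factor" in rope_scaling:
                self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])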
# for compatibility, we use LLAMA arch for older models
# TODO: remove this once everyone has migrated to newer version of llama.cpp
if self.hparams.get("model_type") != "ministral3":
    self.model_arch = gguf.MODEL_ARCH.LLAMA
    self.gguf_writer.arch = str(self.model_arch)
    self.gguf_writer.add_architecture()
    self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)
I think a time frame of ~1 week could be a reasonable timeline for removing this.
This covers the case where users run a new version of the script (e.g. via gguf-my-repo) to convert old models while their local llama.cpp is probably not yet up-to-date.
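For anyone checking which architecture a converted file actually ended up with (i.e. whether the compatibility fallback wrote llama or the new mistral3), here is a minimal sketch using the gguf-py reader; the field-decoding detail may differ between gguf-py versions, and the path is a placeholder:

    # Minimal sketch; assumes the gguf-py package and that reader internals match current versions.
    from gguf import GGUFReader

    reader = GGUFReader("model-f16.gguf")              # placeholder path to a converted file
    field = reader.get_field("general.architecture")
    arch = bytes(field.parts[-1]).decode("utf-8")      # the string value is stored in the last part
    print(arch)                                        # "llama" (compat fallback) or "mistral3"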
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Hi @ngxson and team,

Thank you for the excellent work on adding Ministral 3 support to llama.cpp! I wanted to follow up regarding the Ministral 3 8B Reasoning model variant. While the current implementation might work well for the base and instruct variants, I'm encountering issues when trying to load the quantized GGUF versions of the 8B Reasoning model (e.g., Ministral-3-8B-Reasoning-2512-Q8_0.gguf). The error I'm seeing is:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mistral3'

This occurs even with the latest llama-cpp-python built from the current master branch (after commit cd3c118).

Is support for the Ministral 3 Reasoning variant's multimodal architecture planned for a future PR? Are there any workarounds or alternative approaches you'd recommend for running the Reasoning model with llama.cpp in the meantime?

Thank you again for your work on this project!
@Kabyik-Kayal
@CISC Is there any way I can contribute towards it?
No, it seems to be dead; PRs have been ignored for ages. You can check out this fork though, it seems to be quite alive:
* conversion script
* support ministral 3
* maybe this is better?
* add TODO for rope_yarn_log_mul
* better ppl (tested on 14B-Instruct)
* Add Ministral3 support to Mistral format
* improve arch handling
* add sizes
* Apply suggestions from code review

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* nits

---------

Co-authored-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Ref upstream PR: huggingface/transformers#42498
Disclosure: This PR was made in collaboration with Mistral. Huge thanks to @juliendenize for coordination!
Note: The model weights are not yet released.
PPL results for the 14B model (-Instruct variant, f16, ctx=32000, batch=8192):
Final estimate: PPL = 5.5389 +/- 0.03163
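For reference, a run like the above can be reproduced with llama.cpp's perplexity tool; here is a minimal sketch, assuming a built llama-perplexity binary, a converted f16 GGUF, and an evaluation text file (both file names are placeholders, not from this PR):

    # Reproduction sketch; paths are placeholders, flags follow llama.cpp's perplexity tool.
    import subprocess

    subprocess.run(
        [
            "./llama-perplexity",
            "-m", "model-f16.gguf",   # placeholder: converted 14B -Instruct file
            "-f", "wiki.test.raw",    # placeholder: evaluation corpus
            "-c", "32000",            # context size from the note above
            "-b", "8192",             # batch size from the note above
        ],
        check=True,
    )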