Conversation
@ModelBase.register("Mistral3ForConditionalGeneration")
class Mistral3Model(LlamaModel):
-   model_arch = gguf.MODEL_ARCH.LLAMA
+   model_arch = gguf.MODEL_ARCH.MISTRAL3
Note for maintainers: while ministral3 and the old mistral models have almost the same cgraph, the hparams handling in llama_model::load_hparams is considerably more complicated. It is therefore better to separate the two archs to keep the code readable.
This also makes the code more future-proof, in case future mistral models become significantly more complicated than the traditional llama arch.
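To make the rationale concrete, here is a rough sketch (not the actual code from this PR) of what the split enables on the conversion side: with a dedicated MODEL_ARCH.MISTRAL3, the subclass can emit arch-specific metadata without adding branches to the shared LLAMA path. The imports assume llama.cpp's gguf-py tooling and convert_hf_to_gguf are importable, and the rope_scaling handling below is a hypothetical example.

    # Illustrative sketch only; not the code from this PR.
    # Assumes llama.cpp's gguf-py (with MODEL_ARCH.MISTRAL3) and convert_hf_to_gguf on the path.
    import gguf
    from convert_hf_to_gguf import ModelBase, LlamaModel


    @ModelBase.register("Mistral3ForConditionalGeneration")
    class Mistral3Model(LlamaModel):
        model_arch = gguf.MODEL_ARCH.MISTRAL3

        def set_gguf_parameters(self):
            super().set_gguf_parameters()  # reuse the llama-style metadata as-is
            # Hypothetical arch-specific handling that would otherwise clutter the
            # generic LLAMA path (e.g. YaRN-related rope parameters):
            rope_scaling = self.hparams.get("rope_scaling") or {}
            if "factor" in rope_scaling:
                self.gguf_writer.add_rope_scaling_factor(rope_scaling["factor"])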
# for compatibility, we use LLAMA arch for older models
# TODO: remove this once everyone has migrated to newer version of llama.cpp
if self.hparams.get("model_type") != "ministral3":
    self.model_arch = gguf.MODEL_ARCH.LLAMA
    self.gguf_writer.arch = str(self.model_arch)
    self.gguf_writer.add_architecture()
    self.tensor_map = gguf.get_tensor_name_map(self.model_arch, self.block_count)
I think a time frame of ~1 week could be a reasonable timeline for removing this.
This covers the case where users run a new version of the script (e.g. via gguf-my-repo) to convert old models while their local llama.cpp is probably not yet up-to-date.
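For anyone checking which architecture a converted file actually ended up with (i.e. whether the compatibility fallback wrote llama or the new mistral3), here is a minimal sketch using the gguf-py reader; the field-decoding detail may differ between gguf-py versions, and the path is a placeholder:

    # Minimal sketch; assumes the gguf-py package and that reader internals match current versions.
    from gguf import GGUFReader

    reader = GGUFReader("model-f16.gguf")              # placeholder path to a converted file
    field = reader.get_field("general.architecture")
    arch = bytes(field.parts[-1]).decode("utf-8")      # the string value is stored in the last part
    print(arch)                                        # "llama" (compat fallback) or "mistral3"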
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Hi @ngxson and team,

Thank you for the excellent work on adding Ministral 3 support to llama.cpp! I wanted to follow up regarding the Ministral 3 8B Reasoning model variant. While the current implementation might work well for the base and instruct variants, I'm encountering issues when trying to load the quantized GGUF versions of the 8B Reasoning model (e.g., Ministral-3-8B-Reasoning-2512-Q8_0.gguf). The error I'm seeing is:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mistral3'

This occurs even with the latest llama-cpp-python built from the current master branch (after commit cd3c118).

Is support for the Ministral 3 Reasoning variant's multimodal architecture planned for a future PR? Are there any workarounds or alternative approaches you'd recommend for running the Reasoning model with llama.cpp in the meantime?

Thank you again for your work on this project!
@Kabyik-Kayal
@CISC Is there any way I can contribute towards it?
No, it seems to be dead; PRs have been ignored for ages. You can check out this fork though, it seems to be quite alive:
* conversion script
* support ministral 3
* maybe this is better?
* add TODO for rope_yarn_log_mul
* better ppl (tested on 14B-Instruct)
* Add Ministral3 support to Mistral format
* improve arch handling
* add sizes
* Apply suggestions from code review

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* nits

---------

Co-authored-by: Julien Denize <julien.denize@mistral.ai>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Ref upstream PR: huggingface/transformers#42498
Disclosure: This PR was made in collaboration with Mistral. Huge thanks to @juliendenize for coordination!
Note: The model weights are not yet released.
PPL results for the 14B model (-Instruct variant, f16, ctx=32000, batch=8192):
Final estimate: PPL = 5.5389 +/- 0.03163
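For reference, a run like the above can be reproduced with llama.cpp's perplexity tool; here is a minimal sketch, assuming a built llama-perplexity binary, a converted f16 GGUF, and an evaluation text file (both file names are placeholders, not from this PR):

    # Reproduction sketch; paths are placeholders, flags follow llama.cpp's perplexity tool.
    import subprocess

    subprocess.run(
        [
            "./llama-perplexity",
            "-m", "model-f16.gguf",   # placeholder: converted 14B -Instruct file
            "-f", "wiki.test.raw",    # placeholder: evaluation corpus
            "-c", "32000",            # context size from the note above
            "-b", "8192",             # batch size from the note above
        ],
        check=True,
    )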