Description
Feature request description
Mistral released Devstral 2 last month. I'd like to run it with ramalama but I can't get it working with ramalama version 0.16.0.
Attempting to use a quantised version in GGUF format gives:
$ ramalama serve hf://unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
...
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mistral3'
llama_model_load_from_file_impl: failed to load model
Suggest potential solution
It looks to me like we would need to update llama.cpp to at least release b7371, which appears to include the PR adding support for the 'mistral3' architecture.
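As a quick way to check which llama.cpp build a given ramalama image ships, something like this should work (a sketch, assuming `llama-server` is on the image's PATH; the tag matches the image ramalama selected in the debug log below):

```shell
# Print the llama.cpp build baked into the ramalama CUDA image.
# The reported build number would need to be >= 7371 for 'mistral3' support.
podman run --rm quay.io/ramalama/cuda:0.16 llama-server --version
```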
Have you considered any alternatives?
I initially tried running Mistral's unquantised (non-GGUF) release from Hugging Face, but this fails:
$ ramalama serve hf://mistralai/Devstral-Small-2-24B-Instruct-2512
...
main: loading model
srv load_model: loading model '/mnt/models'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX PRO 4500 Blackwell) (0000:e1:00.0) - 29042 MiB free
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/models
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/models', try reducing --n-gpu-layers if you're running out of VRAM
srv load_model: failed to load model, '/mnt/models'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
I presume this is expected, since llama.cpp requires the GGUF format, but it wasn't apparent from the ramalama documentation: the section on transports gives the impression that any HF URI would work, and I was hoping ramalama might be able to use vLLM in this case.
Indeed, Mistral recommends using vLLM. I tried selecting this runtime explicitly, but I get a different error:
$ ramalama --runtime=vllm --debug serve hf://mistralai/Devstral-Small-2-24B-Instruct-2512
2026-01-06 14:12:35 - DEBUG - Checking if 8080 is available
2026-01-06 14:12:35 - DEBUG - run_cmd: nvidia-smi
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: False
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - Command finished with return code: 0
2026-01-06 14:12:35 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.16
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: True
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - run_cmd: nvidia-smi
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: False
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - Command finished with return code: 0
2026-01-06 14:12:35 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.16
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: True
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://mistralai/Devstral-Small-2-24B-Instruct-2512 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri --device /dev/kfd --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 -p 8080:8080 --label ai.ramalama --name ramalama-IKWCeY3WQm --env=HOME=/tmp --init --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-88664799f03f24dde112fe0005bb4529abf2198d,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-ce24962642faa5680cd421e65d94a4d67c905433,destination=/mnt/models/VIBE_SYSTEM_PROMPT.txt,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-1392a509a427ef57d8ba43608925e55b424cf2aa,destination=/mnt/models/CHAT_SYSTEM_PROMPT.txt,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-01c8776b5b3496af72e92a53a3bf92e113f66f2c,destination=/mnt/models/chat_template.jinja,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-54437e69136d9f46c140dd9cec6162e1bb87bc44,destination=/mnt/models/consolidated.safetensors.index.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-319bf12a84cdcdc5445cc039d4f3d0ef20ab4f9a,destination=/mnt/models/generation_config.json,ro 
--mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-8efdf4d1c2425a2a7956bf43ae343f44a825a90a87e341ff02f708da2923a0b1,destination=/mnt/models/model-00006-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-8180612f9e5a296d012b5e11bec7d5cca4606ce0,destination=/mnt/models/model.safetensors.index.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-b96ca6fc9cf937078113af615ddc15c89ff0f4d3,destination=/mnt/models/params.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-a37d728b12fd27ac60a437894bd51de83449bf30,destination=/mnt/models/processor_config.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-380035da60c7cc474cb7358888a1c50c70679bb3fb7f70870c2400f93ac51d70,destination=/mnt/models/model-00001-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-286acad9b0e27fce778ac429763536accf618ccb6ed72963b6f94685e531c5c7,destination=/mnt/models/tokenizer.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-a7843c180f2b39d43303e7eba55d2e34fd600a8f,destination=/mnt/models/tokenizer_config.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-e29d19ea32eb7e26e6c0572d57cb7f9eca0f4420e0e0fe6ae1cf3be94da1c0d6,destination=/mnt/models/tekken.json,ro 
--mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-2caed6d3fb5af9c97b8c70e1424a9e517454e01451332834fba4fdb4e7a18280,destination=/mnt/models/model-00002-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-63c422f7a5c1460967068c0ceff65eb31f136f64872e281841313e8c669e7c50,destination=/mnt/models/model-00004-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-ec99fd6a7faf35b43e38e60f531e9ee5d67c4292773d71246038b9eb508e373a,destination=/mnt/models/model-00005-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-dfa96c3ccb824ac308eeeaa86fd1ce01aca4e3311e1aaa27a498ec3b7302e165,destination=/mnt/models/consolidated-00001-of-00002.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-e2bab376f49baa1da58c0a737f688cbfe185dc6a994fa2870d62b7c8b36e3360,destination=/mnt/models/model-00003-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-b783163b5ee6fb9595fde29d6072e81be8fcc24ea576d09ecc3dc7611ababb97,destination=/mnt/models/consolidated-00002-of-00002.safetensors,ro quay.io/ramalama/cuda:latest "/opt/venv/bin/python3 -m vllm.entrypoints.openai.api_server" --model /mnt/models --max_model_len 2048 --port 8080
ERROR (catatonit:51): failed to exec pid1: No such file or directory
I get this same error for the GGUF URI too.
Presumably this is the same issue as #1948.
I noticed a comment on #1204 suggesting we might need to specify a different image. I found this cuda-vllm/Containerfile, which looked promising, but it doesn't appear to have been published to quay.io yet.
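If such an image does get published, I would expect to be able to point ramalama at it with the global `--image` option, roughly like this (the image name and tag here are my guess, not a published artifact):

```shell
# Hypothetical: select a vLLM-capable CUDA image explicitly,
# assuming one is eventually published under quay.io/ramalama.
ramalama --runtime=vllm --image quay.io/ramalama/cuda-vllm:latest \
    serve hf://mistralai/Devstral-Small-2-24B-Instruct-2512
```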
I also tried running on the host:
$ ramalama --nocontainer serve hf://unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
Since I installed ramalama with Homebrew (as per the Aurora instructions), this picks up the Homebrew formula for llama.cpp, which is a recent enough build (7640) to load and serve the model successfully. Sadly, that build only supports BLAS/CPU inference (a very slow 2 t/s), which defeats the point of using ramalama to provide CUDA support in the first place (I'm guessing this option is mainly intended for testing).
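A possible workaround might be a host build of llama.cpp with CUDA enabled, following the upstream build docs (sketch below; requires the NVIDIA CUDA toolkit). If that binary ends up first on PATH, `ramalama --nocontainer` should in principle pick it up and give GPU offload, though I haven't verified this:

```shell
# Sketch: build llama.cpp with CUDA support on the host,
# per the upstream build instructions (GGML_CUDA flag).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```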
Additional context
Linux aurora 6.17.8-300.fc43.x86_64