
Add support for Devstral 2 Small #2277

@Robsteranium

Feature request description

Mistral released Devstral 2 last month. I'd like to run it with ramalama, but I can't get it working with version 0.16.0.

Attempting to use a quantised version in GGUF format gives:

$ ramalama serve hf://unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
...
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'mistral3'
llama_model_load_from_file_impl: failed to load model

Suggest potential solution

It looks to me like we would need to update llama.cpp to at least release b7371, which appears to include the PR adding support for the mistral3 architecture.
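For reference, llama.cpp tags its releases as plain sequential build numbers (b7371, b7640, ...), so checking whether a given build is new enough is just a numeric comparison. A small sketch, assuming b7371 really is the first build with mistral3 support (my reading of the merged PR, not a confirmed threshold):

```shell
# llama.cpp releases are numbered sequentially, so a build either
# predates or includes a given merge. b7371 is the assumed threshold
# for mistral3 support; 7640 is the Homebrew build mentioned later.
required=7371
current=7640

if [ "$current" -ge "$required" ]; then
    echo "build b$current should include mistral3 support"
else
    echo "build b$current predates mistral3 support"
fi
```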

Have you considered any alternatives?

I initially tried running Mistral's (unquantised, non-GGUF) release on Hugging Face, but this fails:

$ ramalama serve hf://mistralai/Devstral-Small-2-24B-Instruct-2512

...
main: loading model
srv    load_model: loading model '/mnt/models'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX PRO 4500 Blackwell) (0000:e1:00.0) - 29042 MiB free
gguf_init_from_file_impl: failed to read magic
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/models
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/models', try reducing --n-gpu-layers if you're running out of VRAM
srv    load_model: failed to load model, '/mnt/models'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

I presume this is expected, since llama.cpp requires the GGUF format, but it wasn't apparent from the ramalama documentation. The section on transports gives the impression that any HF URI would work, and I was hoping ramalama might be able to use vLLM in this case.
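For what it's worth, the "failed to read magic" line above is llama.cpp rejecting the safetensors files: GGUF files begin with the 4-byte ASCII magic "GGUF", which safetensors files lack. A quick check along these lines distinguishes the two (the temp file here is a stand-in header, not a real model):

```shell
# GGUF files start with the ASCII magic "GGUF"; anything else triggers
# llama.cpp's "gguf_init_from_file_impl: failed to read magic" error.
model_file="$(mktemp)"
printf 'GGUFxxxx' > "$model_file"   # stand-in for a real .gguf header

magic="$(head -c 4 "$model_file")"
if [ "$magic" = "GGUF" ]; then
    echo "looks like a GGUF file"
else
    echo "not GGUF (magic: $magic)"
fi
rm -f "$model_file"
```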

Indeed, Mistral recommends using vLLM. I tried selecting this runtime explicitly, but I get a different error:

$ ramalama --runtime=vllm --debug serve hf://mistralai/Devstral-Small-2-24B-Instruct-2512
2026-01-06 14:12:35 - DEBUG - Checking if 8080 is available
2026-01-06 14:12:35 - DEBUG - run_cmd: nvidia-smi
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: False
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - Command finished with return code: 0
2026-01-06 14:12:35 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.16
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: True
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - run_cmd: nvidia-smi
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: False
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - Command finished with return code: 0
2026-01-06 14:12:35 - DEBUG - run_cmd: podman inspect quay.io/ramalama/cuda:0.16
2026-01-06 14:12:35 - DEBUG - Working directory: None
2026-01-06 14:12:35 - DEBUG - Ignore stderr: False
2026-01-06 14:12:35 - DEBUG - Ignore all: True
2026-01-06 14:12:35 - DEBUG - env: None
2026-01-06 14:12:35 - DEBUG - exec_cmd: podman run --rm --label ai.ramalama.model=hf://mistralai/Devstral-Small-2-24B-Instruct-2512 --label ai.ramalama.engine=podman --label ai.ramalama.runtime=vllm --label ai.ramalama.port=8080 --label ai.ramalama.command=serve --runtime /usr/bin/nvidia-container-runtime --security-opt=label=disable --cap-drop=all --security-opt=no-new-privileges --pull newer --device /dev/dri --device /dev/kfd --device nvidia.com/gpu=all -e CUDA_VISIBLE_DEVICES=0 -p 8080:8080 --label ai.ramalama --name ramalama-IKWCeY3WQm --env=HOME=/tmp --init --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-88664799f03f24dde112fe0005bb4529abf2198d,destination=/mnt/models/config.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-ce24962642faa5680cd421e65d94a4d67c905433,destination=/mnt/models/VIBE_SYSTEM_PROMPT.txt,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-1392a509a427ef57d8ba43608925e55b424cf2aa,destination=/mnt/models/CHAT_SYSTEM_PROMPT.txt,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-01c8776b5b3496af72e92a53a3bf92e113f66f2c,destination=/mnt/models/chat_template.jinja,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-54437e69136d9f46c140dd9cec6162e1bb87bc44,destination=/mnt/models/consolidated.safetensors.index.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-319bf12a84cdcdc5445cc039d4f3d0ef20ab4f9a,destination=/mnt/models/generation_config.json,ro 
--mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-8efdf4d1c2425a2a7956bf43ae343f44a825a90a87e341ff02f708da2923a0b1,destination=/mnt/models/model-00006-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-8180612f9e5a296d012b5e11bec7d5cca4606ce0,destination=/mnt/models/model.safetensors.index.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-b96ca6fc9cf937078113af615ddc15c89ff0f4d3,destination=/mnt/models/params.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-a37d728b12fd27ac60a437894bd51de83449bf30,destination=/mnt/models/processor_config.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-380035da60c7cc474cb7358888a1c50c70679bb3fb7f70870c2400f93ac51d70,destination=/mnt/models/model-00001-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-286acad9b0e27fce778ac429763536accf618ccb6ed72963b6f94685e531c5c7,destination=/mnt/models/tokenizer.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-a7843c180f2b39d43303e7eba55d2e34fd600a8f,destination=/mnt/models/tokenizer_config.json,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-e29d19ea32eb7e26e6c0572d57cb7f9eca0f4420e0e0fe6ae1cf3be94da1c0d6,destination=/mnt/models/tekken.json,ro 
--mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-2caed6d3fb5af9c97b8c70e1424a9e517454e01451332834fba4fdb4e7a18280,destination=/mnt/models/model-00002-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-63c422f7a5c1460967068c0ceff65eb31f136f64872e281841313e8c669e7c50,destination=/mnt/models/model-00004-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-ec99fd6a7faf35b43e38e60f531e9ee5d67c4292773d71246038b9eb508e373a,destination=/mnt/models/model-00005-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-dfa96c3ccb824ac308eeeaa86fd1ce01aca4e3311e1aaa27a498ec3b7302e165,destination=/mnt/models/consolidated-00001-of-00002.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-e2bab376f49baa1da58c0a737f688cbfe185dc6a994fa2870d62b7c8b36e3360,destination=/mnt/models/model-00003-of-00006.safetensors,ro --mount=type=bind,src=/var/home/robin/.local/share/ramalama/store/huggingface/mistralai/Devstral-Small-2-24B-Instruct-2512/blobs/sha256-b783163b5ee6fb9595fde29d6072e81be8fcc24ea576d09ecc3dc7611ababb97,destination=/mnt/models/consolidated-00002-of-00002.safetensors,ro quay.io/ramalama/cuda:latest "/opt/venv/bin/python3 -m vllm.entrypoints.openai.api_server" --model /mnt/models --max_model_len 2048 --port 8080
ERROR (catatonit:51): failed to exec pid1: No such file or directory

I get the same error with the GGUF URI too.

Presumably this is the same issue as #1948.

I noticed a comment on #1204 suggesting we might need to specify a different image. I found this cuda-vllm/Containerfile, which looked promising, but if this is the image I need, it doesn't appear to have been published to quay.io yet.
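If such an image does get published, my understanding is that ramalama's `--image` option could point at it. A sketch of what that invocation might look like (the image path and tag are guesses on my part, and the command is only echoed here, not run):

```shell
# Hypothetical invocation: assumes the cuda-vllm Containerfile eventually
# lands at this quay.io path (unverified) and uses ramalama's --image flag.
image="quay.io/ramalama/cuda-vllm:latest"
echo ramalama --runtime=vllm --image "$image" \
    serve hf://mistralai/Devstral-Small-2-24B-Instruct-2512
```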

I also tried running on the host:

$ ramalama --nocontainer serve hf://unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF

Since I installed ramalama with Homebrew (as per the Aurora instructions), this picks up the Homebrew formula for llama.cpp, which is a recent enough build (7640) to successfully load and serve the model. Sadly, that build only supports BLAS/CPU inference (a very slow 2 t/s), which defeats the point of using ramalama for CUDA support in the first place (I'm guessing this option is mainly for testing).

Additional context

Linux aurora 6.17.8-300.fc43.x86_64


Labels: enhancement (New feature or request)
