feat: add Ollama as runtime backend for Metal agent #258
Merged
Add Ollama as a third runtime option for the Metal agent alongside
llama-server and oMLX. Ollama is the most widely adopted local LLM
runtime (200K+ GitHub stars) and recently switched to an MLX backend
for Apple Silicon, providing significant speedups.
The OllamaExecutor manages models through Ollama's REST API:
- Model pull via POST /api/pull (handles downloads internally)
- Pre-load via POST /api/generate with empty prompt
- Readiness check via GET /api/ps
- Unload via POST /api/generate with keep_alive: 0
- Health via GET / ("Ollama is running")
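The request payloads behind those lifecycle steps are plain JSON. A minimal sketch in Go (the payload shapes follow Ollama's documented REST API; the helper names are illustrative, not actual LLMKube code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pullBody builds the POST /api/pull payload; Ollama handles the
// model download internally once it receives this request.
func pullBody(tag string) []byte {
	b, _ := json.Marshal(map[string]any{"model": tag})
	return b
}

// preloadBody builds a POST /api/generate payload with an empty
// prompt, which blocks until the model is loaded into memory.
func preloadBody(tag string) []byte {
	b, _ := json.Marshal(map[string]any{"model": tag, "prompt": ""})
	return b
}

// unloadBody is the same call with keep_alive: 0, telling Ollama
// to evict the model immediately.
func unloadBody(tag string) []byte {
	b, _ := json.Marshal(map[string]any{"model": tag, "prompt": "", "keep_alive": 0})
	return b
}

func main() {
	fmt.Println(string(pullBody("llama3.2:3b")))    // download
	fmt.Println(string(preloadBody("llama3.2:3b"))) // pre-load
	fmt.Println(string(unloadBody("llama3.2:3b")))  // unload
}
```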
Includes model name mapping from LLMKube catalog names to Ollama
tags (e.g., llama-3.2-3b -> llama3.2:3b) for all 16 catalog models.
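That mapping amounts to a simple lookup table. A sketch in Go showing only the one pairing given above (the fallback behavior and everything beyond that entry are assumptions for illustration, not the actual 16-model table):

```go
package main

import "fmt"

// ollamaTags maps LLMKube catalog names to Ollama model tags.
// Only the pairing cited in this PR is shown here.
var ollamaTags = map[string]string{
	"llama-3.2-3b": "llama3.2:3b",
}

// ollamaTag resolves a catalog name to an Ollama tag, falling back
// to the name itself for unknown models (an assumption of this
// sketch, so unmapped names can still be passed through).
func ollamaTag(catalog string) string {
	if tag, ok := ollamaTags[catalog]; ok {
		return tag
	}
	return catalog
}

func main() {
	fmt.Println(ollamaTag("llama-3.2-3b")) // llama3.2:3b
}
```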
No CRD changes needed since Ollama uses GGUF format natively.
Usage:
```
llmkube-metal-agent --runtime ollama
llmkube deploy llama-3.2-3b --gpu --accelerator metal
```
Signed-off-by: Christopher Maher <chris@mahercode.io>
Documents the `--runtime ollama` flag, prerequisites, usage, model name mapping, and differences from the llama-server and oMLX backends.
Signed-off-by: Christopher Maher <chris@mahercode.io>
Summary
Adds Ollama as a third runtime option for the Metal agent (`--runtime ollama`) alongside the existing llama-server and omlx backends.

Ollama is the most widely adopted local LLM runtime (200K+ stars) and recently switched to MLX for Apple Silicon inference. Most Mac users already have it installed, making this the lowest-friction path to fast local inference with LLMKube.
Key advantages over the other backends:
- Built-in model management and downloads (`/api/pull`)

How it works:
- `OllamaExecutor` manages models through Ollama's REST API
- Pull: `POST /api/pull` with a 5-min timeout for large downloads
- Pre-load: `POST /api/generate` with an empty prompt (blocks until loaded)
- Readiness: `GET /api/ps` verifies the model is in memory
- Unload: `POST /api/generate` with `keep_alive: 0`
- Maps catalog names to Ollama tags (e.g., `llama-3.2-3b` -> `llama3.2:3b`)

Usage: `llmkube-metal-agent --runtime ollama`, then `llmkube deploy llama-3.2-3b --gpu --accelerator metal`.
No CRD changes. Default runtime remains `llama-server`. Fully backward compatible.

Test plan
- `make test` passes
- `--runtime ollama`: deploy a model, verify inference
- Verify unload (`/api/ps` empty)
- `--runtime llama-server` unchanged
- `--runtime omlx` still works

Runtime comparison
(Ollama is the backend with built-in model management via `/api/pull`.)

Builds on #257 (ProcessExecutor interface). Closes #248 (MLX backend goal achieved through both oMLX and Ollama paths).
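The unload verification in the test plan above can be sketched as a check on the `GET /api/ps` response: the endpoint lists the models currently loaded in memory, so an empty list means the model was evicted. A small Go illustration (response shape per Ollama's API; the helper is a sketch, not agent code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// psResponse mirrors the relevant part of GET /api/ps, which
// reports the models currently loaded in memory.
type psResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// isUnloaded reports whether a /api/ps response body shows no
// loaded models.
func isUnloaded(body []byte) (bool, error) {
	var ps psResponse
	if err := json.Unmarshal(body, &ps); err != nil {
		return false, err
	}
	return len(ps.Models) == 0, nil
}

func main() {
	empty, _ := isUnloaded([]byte(`{"models":[]}`))
	loaded, _ := isUnloaded([]byte(`{"models":[{"name":"llama3.2:3b"}]}`))
	fmt.Println(empty, loaded) // true false
}
```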