
feat: expose llama.cpp parallel slots in InferenceService CRD #133

Merged
Defilan merged 1 commit into main from feat/parallel-slots on Feb 7, 2026

Conversation

Defilan (Member) commented on Feb 7, 2026

Summary

  • Add parallelSlots field to InferenceServiceSpec (int32, min 1, max 64)
  • Map to --parallel flag on llama.cpp server args (skipped when nil or 1)
  • Add --parallel CLI flag to llmkube deploy

Test plan

  • make test passes (3 new controller test cases)
  • make build-cli succeeds
  • ./bin/llmkube deploy --help shows --parallel flag
  • Deploy with --parallel 4 and verify container args

Closes #132

Without parallel slots, llama.cpp processes requests sequentially —
every concurrent request queues behind the current one. This adds a
parallelSlots field to InferenceServiceSpec that maps to the --parallel
flag, plus a --parallel CLI flag on `llmkube deploy`.

Closes #132

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan force-pushed the feat/parallel-slots branch from 1e1754f to 1566151 on February 7, 2026 at 19:04
Defilan merged commit cae7b52 into main on Feb 7, 2026
14 checks passed
Defilan deleted the feat/parallel-slots branch on February 7, 2026 at 19:11
github-actions bot mentioned this pull request on Feb 7, 2026
github-actions bot mentioned this pull request on Mar 4, 2026


Development

Successfully merging this pull request may close these issues.

feat: expose llama.cpp parallel slots (--parallel) in InferenceService CRD