
feat: expose llama.cpp parallel slots in InferenceService CRD #133

Merged
Defilan merged 1 commit into main from feat/parallel-slots on Feb 7, 2026

Conversation

Defilan (Member) commented on Feb 7, 2026

Summary

  • Add parallelSlots field to InferenceServiceSpec (int32, min 1, max 64)
  • Map to --parallel flag on llama.cpp server args (skipped when nil or 1)
  • Add --parallel CLI flag to llmkube deploy

Test plan

  • make test passes (3 new controller test cases)
  • make build-cli succeeds
  • ./bin/llmkube deploy --help shows --parallel flag
  • Deploy with --parallel 4 and verify container args

Closes #132

Without parallel slots, llama.cpp processes requests sequentially —
every concurrent request queues behind the current one. This adds a
parallelSlots field to InferenceServiceSpec that maps to the --parallel
flag, plus a --parallel CLI flag on `llmkube deploy`.

Closes #132

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan force-pushed the feat/parallel-slots branch from 1e1754f to 1566151 on February 7, 2026 at 19:04
Defilan merged commit cae7b52 into main on Feb 7, 2026
14 checks passed
Defilan deleted the feat/parallel-slots branch on February 7, 2026 at 19:11
github-actions bot mentioned this pull request on Feb 7, 2026
github-actions bot mentioned this pull request on Mar 4, 2026


Development

Successfully merging this pull request may close these issues.

feat: expose llama.cpp parallel slots (--parallel) in InferenceService CRD