
feat: add 32B models to catalog with --context flag#88

Merged
Defilan merged 2 commits into main from feat/32b-models-and-benchmark-context
Dec 2, 2025

Conversation


@Defilan Defilan commented Dec 2, 2025

Summary

  • Add three 32B Qwen models to catalog for 32GB+ VRAM setups
  • Add --context flag to benchmark command for VRAM-constrained testing
  • Set 32B models to 8K context default to prevent OOM

Models Added

| Model | Description | VRAM |
| --- | --- | --- |
| qwen-2.5-32b | Reasoning + multilingual | 18-24GB |
| qwen-2.5-coder-32b | GPT-4o level coding | 18-24GB |
| qwen-3-32b | Hybrid thinking modes | 18-24GB |

Test Plan

  • Benchmarked all 3 models on Shadowstack (2x RTX 5060 Ti)
  • Verified ~16.5 tok/s generation across all models
  • Confirmed zero OOM errors with 8K context
  • Tested --context flag override
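The ~16.5 tok/s figure above is the standard generated-tokens-over-wall-time measurement. A minimal illustration of that arithmetic (the repository's actual benchmark code is not shown here, so this is an assumed formulation):

```python
def generation_speed(num_tokens: int, elapsed_s: float) -> float:
    """Tokens per second of generation: tokens produced / wall time."""
    return num_tokens / elapsed_s

# Example: 512 tokens generated in 31 seconds is roughly the 16.5 tok/s
# throughput reported in the test plan.
print(round(generation_speed(512, 31.0), 1))  # 16.5
```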

Closes #87

Add three 32B Qwen models for 32GB+ VRAM setups:
- qwen-2.5-32b: reasoning and multilingual
- qwen-2.5-coder-32b: GPT-4o level coding
- qwen-3-32b: hybrid thinking modes

Also adds --context flag to benchmark command for overriding
context size during VRAM-constrained testing.

Closes #87
@Defilan Defilan merged commit 6c06602 into main Dec 2, 2025
13 checks passed
@Defilan Defilan deleted the feat/32b-models-and-benchmark-context branch December 2, 2025 07:17
@github-actions github-actions bot mentioned this pull request Dec 2, 2025
@github-actions github-actions bot mentioned this pull request Mar 4, 2026


Development

Successfully merging this pull request may close these issues.

feat: Add 32B models to catalog with VRAM-safe context defaults

1 participant