
feat: add 32B models to catalog with --context flag#88

Merged
Defilan merged 2 commits into main from feat/32b-models-and-benchmark-context
Dec 2, 2025

Conversation


@Defilan Defilan commented Dec 2, 2025

Summary

  • Add three 32B Qwen models to catalog for 32GB+ VRAM setups
  • Add --context flag to benchmark command for VRAM-constrained testing
  • Set 32B models to 8K context default to prevent OOM

Models Added

| Model | Description | VRAM |
| --- | --- | --- |
| qwen-2.5-32b | Reasoning + multilingual | 18-24GB |
| qwen-2.5-coder-32b | GPT-4o level coding | 18-24GB |
| qwen-3-32b | Hybrid thinking modes | 18-24GB |

Test Plan

  • Benchmarked all 3 models on Shadowstack (2x RTX 5060 Ti)
  • Verified ~16.5 tok/s generation across all models
  • Confirmed zero OOM errors with 8K context
  • Tested --context flag override
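The ~16.5 tok/s figure above is the standard generated-tokens-over-wall-time measurement. A minimal illustration of that arithmetic (the repository's actual benchmark code is not shown here, so this is an assumed formulation):

```python
def generation_speed(num_tokens: int, elapsed_s: float) -> float:
    """Tokens per second of generation: tokens produced / wall time."""
    return num_tokens / elapsed_s

# Example: 512 tokens generated in 31 seconds is roughly the 16.5 tok/s
# throughput reported in the test plan.
print(round(generation_speed(512, 31.0), 1))  # 16.5
```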

Closes #87

Add three 32B Qwen models for 32GB+ VRAM setups:
- qwen-2.5-32b: reasoning and multilingual
- qwen-2.5-coder-32b: GPT-4o level coding
- qwen-3-32b: hybrid thinking modes

Also adds --context flag to benchmark command for overriding
context size during VRAM-constrained testing.

Closes #87
@Defilan Defilan merged commit 6c06602 into main Dec 2, 2025
13 checks passed
@Defilan Defilan deleted the feat/32b-models-and-benchmark-context branch December 2, 2025 07:17
@github-actions github-actions bot mentioned this pull request Dec 2, 2025
@github-actions github-actions bot mentioned this pull request Mar 4, 2026


Development

Successfully merging this pull request may close these issues.

feat: Add 32B models to catalog with VRAM-safe context defaults

1 participant