
feat: add teacher config/sft loss for hosted training SFT #514

Open
eligotts wants to merge 3 commits into main from eli/hosted-sft

Conversation


@eligotts eligotts commented Apr 14, 2026

Summary

Implements APR-157 SFT distillation support through the existing prime train config path.

  • Adds top-level loss = "rl" | "sft" with [teacher] and [teacher.sampling] config models.
  • Validates that SFT requires a teacher, RL rejects teacher config, and teacher.save = true is not supported.
  • Defaults omitted SFT rollouts_per_example to 1 while preserving explicit overrides.
  • Forwards loss and teacher to the platform in the public payload shape.
  • Updates the confirmation summary so Training, Teacher, and Run Config render as separate sections.
  • Keeps the generated template TOML-safe when SFT teacher config and checkpoint_id are uncommented together.
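
The validation rules listed above can be sketched with stdlib dataclasses. This is a minimal illustration of the described behavior under stated assumptions, not the actual prime-cli models; all class and field names here are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeacherSampling:
    max_tokens: Optional[int] = None
    reasoning_effort: Optional[str] = None


@dataclass
class TeacherConfig:
    model: str
    save: bool = False
    sampling: Optional[TeacherSampling] = None


@dataclass
class TrainConfig:
    model: str
    loss: str = "rl"  # top-level loss = "rl" | "sft"
    teacher: Optional[TeacherConfig] = None
    rollouts_per_example: Optional[int] = None

    def __post_init__(self) -> None:
        # SFT requires a teacher; RL rejects teacher config.
        if self.loss == "sft" and self.teacher is None:
            raise ValueError('loss = "sft" requires a [teacher] config')
        if self.loss == "rl" and self.teacher is not None:
            raise ValueError('loss = "rl" does not accept a [teacher] config')
        # teacher.save = true is not supported.
        if self.teacher is not None and self.teacher.save:
            raise ValueError("teacher.save = true is not supported")
        # Default omitted SFT rollouts_per_example to 1; keep explicit overrides.
        if self.loss == "sft" and self.rollouts_per_example is None:
            self.rollouts_per_example = 1
```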

Config Shape

model = "openai/gpt-oss-20b"
loss = "sft"

[teacher]
model = "openai/gpt-oss-120b"
save = false

[teacher.sampling]
max_tokens = 2048
reasoning_effort = "medium"

[[env]]
id = "primeintellect/wordle"

API Payload

{
  "loss": "sft",
  "teacher": {
    "model": "openai/gpt-oss-120b",
    "save": false,
    "sampling": {
      "max_tokens": 2048,
      "reasoning_effort": "medium"
    }
  }
}

Tests

  • uv run pytest packages/prime/tests/test_rl_config.py packages/prime/tests/test_train_cli.py -q


eligotts and others added 2 commits May 3, 2026 22:46
Add TeacherRolloutModelConfig to the RL config schema so users can
specify an external teacher model for SFT hard distill via TOML:

  [teacher_rollout_model]
  base_url = ["https://..."]
  api_key_var = "PRIME_API_KEY"
  name = "model-name"

The field flows through the API client to the platform, which merges it
into the orchestrator's run_config as CLI overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
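
The merge described in this commit message — the platform folding the config into the orchestrator's run_config as CLI-style overrides — might look roughly like the following. The function and dictionary names are assumptions, not the platform's actual code:

```python
def merge_overrides(run_config: dict, teacher_rollout_model: dict) -> dict:
    """Merge a [teacher_rollout_model] table into run_config as overrides.

    Existing keys under "teacher_rollout_model" are kept unless the
    override supplies a new value; run_config itself is not mutated.
    """
    merged = dict(run_config)
    merged["teacher_rollout_model"] = {
        **merged.get("teacher_rollout_model", {}),
        **teacher_rollout_model,
    }
    return merged
```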

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 5da06aa.

@willccbb willccbb changed the title feat: add teacher_rollout_model config for hosted SFT hard distill feat: add teacher config/sft loss for hosted training SFT May 4, 2026
