Add renderer keep_thinking config plumbing #2404

Closed
rasdani wants to merge 3 commits into main from daniel/renderer-keep-thinking-config

@rasdani rasdani commented May 3, 2026

Summary

  • Add orchestrator.renderer.keep_thinking as the generic prime-rl config knob.
  • Preserve renderer defaults when the TOML key is omitted, pass true to force keeping historical reasoning, and pass false to force disabling it.
  • Pass the tri-state value into local renderer construction and through static/elastic renderer client setup.
  • Update the pinned verifiers/renderers git rev to PrimeIntellect-ai/verifiers#1281 ("Plumb keep_thinking through renderer factory") so that vf.ClientConfig.renderer_keep_thinking and renderer factory support are available.
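
The tri-state semantics in the list above can be sketched as follows. This is an illustration only, not the code in this PR: the function name `resolve_keep_thinking` and the `renderer_default` argument are hypothetical stand-ins for whatever the renderer factory does internally.

```python
from typing import Optional


def resolve_keep_thinking(override: Optional[bool], renderer_default: bool) -> bool:
    """Return the effective setting for a tri-state override (hypothetical helper).

    None  -> key omitted, fall back to the renderer's own default
    True  -> force keeping historical reasoning
    False -> force dropping historical reasoning
    """
    # Compare against None explicitly: `override or renderer_default`
    # would silently ignore an explicit False.
    if override is None:
        return renderer_default
    return override
```

The explicit `is None` check is the point of the sketch: a plain truthiness test cannot distinguish "omitted" from "set to false", which is exactly the case the second usage example relies on.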

Usage

[orchestrator.renderer]
name = "nemotron3"
keep_thinking = true

To explicitly disable a keep-thinking renderer alias:

[orchestrator.renderer]
name = "qwen3-keep-thinking"
keep_thinking = false

Omitting keep_thinking uses the selected renderer's default. This avoids model-name aliases like nemotron3_keep_thinking; the renderer name stays the actual renderer, and the historical-reasoning behavior is configured independently.

Dependency

Depends on PrimeIntellect-ai/verifiers#1281. If that PR is merged and GitHub creates a different merge SHA, the verifiers pin can be moved to the merged upstream commit before merging this PR.

Tests

  • uv sync --all-extras
  • uv run pytest tests/unit/test_configs.py -k 'renderer_keep_thinking' tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/utils/test_client.py tests/unit/utils/test_elastic.py -k 'renderer or elastic_clients_preserve_renderer_model_name or renderer_keep_thinking'
  • uv run ruff check pyproject.toml src/prime_rl/configs/shared.py src/prime_rl/configs/orchestrator.py src/prime_rl/orchestrator/orchestrator.py src/prime_rl/utils/client.py src/prime_rl/utils/elastic.py tests/unit/test_configs.py tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/utils/test_client.py tests/unit/utils/test_elastic.py
  • uv run ruff format --check src/prime_rl/configs/shared.py src/prime_rl/configs/orchestrator.py src/prime_rl/orchestrator/orchestrator.py src/prime_rl/utils/client.py src/prime_rl/utils/elastic.py tests/unit/test_configs.py tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/utils/test_client.py tests/unit/utils/test_elastic.py
  • git diff --check
  • uv lock --locked

Note

Medium Risk
Medium risk because it changes renderer client configuration plumbing and modifies inference /v1/generate error handling and scheduler I/O (new JSONL dumps), which can affect rollout behavior and server responses under load.

Overview
Adds a tri-state orchestrator.renderer.keep_thinking config knob and threads it through renderer creation and both static/elastic inference client setup (renderer_keep_thinking), while validating it is only set when use_renderer=true.
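
The validation rule described above (the knob may only be set when the renderer is in use) could look roughly like this. It is a sketch using a plain dataclass; the class name `OrchestratorSketch` and its field layout are assumptions, and the real config classes in prime-rl may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrchestratorSketch:
    """Hypothetical stand-in for the orchestrator config."""

    use_renderer: bool = False
    renderer_keep_thinking: Optional[bool] = None  # None means "not set"

    def __post_init__(self) -> None:
        # Any explicit value, including False, counts as "set".
        if self.renderer_keep_thinking is not None and not self.use_renderer:
            raise ValueError("renderer.keep_thinking requires use_renderer=true")
```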

Hardens vLLM’s /v1/generate path by detecting non-finite logprob values before JSON serialization and returning a structured ErrorResponse (400) that the server now propagates with the correct status code.
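
The idea of checking for non-finite logprobs before serialization can be illustrated in isolation. The sketch below is not the vLLM or prime-rl handler: the function names and the error payload shape are hypothetical. It relies only on standard-library behavior, namely that `json.dumps` emits `NaN`/`Infinity` (which are not valid JSON) unless `allow_nan=False` is passed.

```python
import json
import math
from typing import Sequence


def non_finite_positions(logprobs: Sequence[float]) -> list[int]:
    """Indices of NaN, +inf, or -inf entries."""
    return [i for i, lp in enumerate(logprobs) if not math.isfinite(lp)]


def build_response(logprobs: Sequence[float]) -> tuple[int, str]:
    """Return (status_code, json_body); the payload shape is illustrative."""
    bad = non_finite_positions(logprobs)
    if bad:
        error = {"error": {"message": f"non-finite logprobs at positions {bad}", "code": 400}}
        return 400, json.dumps(error)
    # allow_nan=False makes json.dumps raise instead of emitting NaN/Infinity,
    # so a missed value fails loudly rather than producing invalid JSON.
    return 200, json.dumps({"logprobs": list(logprobs)}, allow_nan=False)
```

Checking up front lets the caller receive a structured error with a meaningful status code instead of a serialization failure deep in the response path.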

Improves rollout debuggability by appending per-attempt rollout/exception records (with metadata) to train_rollout_attempts.jsonl, and updates SLURM templates to set TRITON_CACHE_DIR plus default VLLM_COMPUTE_NANS_IN_LOGITS=1; also bumps the pinned verifiers/renderers git rev to pick up required support.
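
Appending one record per attempt to a JSONL file is a simple pattern; a minimal sketch is shown below. The helper names and the record fields used in the test are hypothetical, and the actual schema written to train_rollout_attempts.jsonl is not described in this PR summary.

```python
import json
from pathlib import Path
from typing import Any


def append_attempt(path: Path, record: dict[str, Any]) -> None:
    """Append one JSON object as a single line (hypothetical helper)."""
    # default=str stringifies values json cannot encode, e.g. exceptions.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, default=str) + "\n")


def read_attempts(path: Path) -> list[dict[str, Any]]:
    """Load every non-empty line back as a dict."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is an independent JSON object, a partially written final line after a crash does not corrupt earlier records.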

Reviewed by Cursor Bugbot for commit d318adf.

@rasdani rasdani requested review from hallerite and samsja May 3, 2026 19:10
@rasdani rasdani force-pushed the daniel/renderer-keep-thinking-config branch from d318adf to fdc1cc8 Compare May 3, 2026 22:07
@hallerite

Superseded by #2433 (and the underlying renderers PRs #5 / #6 + verifiers #1298). Closing!

@hallerite hallerite closed this May 7, 2026