Closed
Summary
- Add `orchestrator.renderer.keep_thinking` as the generic prime-rl config knob.
- Pass `true` to force keeping historical reasoning, and pass `false` to force disabling it.
- Bump the `verifiers`/`renderers` git rev to PrimeIntellect-ai/verifiers#1281 ("Plumb keep_thinking through renderer factory") so `vf.ClientConfig.renderer_keep_thinking` and renderer factory support are available.

Usage
To explicitly disable a keep-thinking renderer alias:
Omitting `keep_thinking` uses the selected renderer's default. This avoids model-name aliases like `nemotron3_keep_thinking`; the renderer name stays the actual renderer, and the historical-reasoning behavior is configured independently.

Dependency
Depends on PrimeIntellect-ai/verifiers#1281. If that PR is merged and GitHub creates a different merge SHA, the `verifiers` pin can be moved to the merged upstream commit before merging this PR.

Tests
- `uv sync --all-extras`
- `uv run pytest tests/unit/test_configs.py -k 'renderer_keep_thinking' tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/utils/test_client.py tests/unit/utils/test_elastic.py -k 'renderer or elastic_clients_preserve_renderer_model_name or renderer_keep_thinking'`
- `uv run ruff check pyproject.toml src/prime_rl/configs/shared.py src/prime_rl/configs/orchestrator.py src/prime_rl/orchestrator/orchestrator.py src/prime_rl/utils/client.py src/prime_rl/utils/elastic.py tests/unit/test_configs.py tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/utils/test_client.py tests/unit/utils/test_elastic.py`
- `uv run ruff format --check src/prime_rl/configs/shared.py src/prime_rl/configs/orchestrator.py src/prime_rl/orchestrator/orchestrator.py src/prime_rl/utils/client.py src/prime_rl/utils/elastic.py tests/unit/test_configs.py tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/utils/test_client.py tests/unit/utils/test_elastic.py`
- `git diff --check`
- `uv lock --locked`

Note
Medium Risk
Medium risk because it changes renderer client configuration plumbing and modifies inference `/v1/generate` error handling and scheduler I/O (new JSONL dumps), which can affect rollout behavior and server responses under load.

Overview
Adds a tri-state `orchestrator.renderer.keep_thinking` config knob and threads it through renderer creation and both static/elastic inference client setup (`renderer_keep_thinking`), while validating it is only set when `use_renderer=true`.

Hardens vLLM's `/v1/generate` path by detecting non-finite logprob values before JSON serialization and returning a structured `ErrorResponse` (400) that the server now propagates with the correct status code.

Improves rollout debuggability by appending per-attempt rollout/exception records (with metadata) to `train_rollout_attempts.jsonl`, and updates SLURM templates to set `TRITON_CACHE_DIR` plus default `VLLM_COMPUTE_NANS_IN_LOGITS=1`; also bumps the pinned `verifiers`/`renderers` git rev to pick up required support.

Reviewed by Cursor Bugbot for commit d318adf.
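The non-finite logprob check described in the overview could look roughly like the following. This is a standalone sketch rather than the PR's code: the function names and the error payload shape are hypothetical, and it does not use vLLM's actual `ErrorResponse` type. The check matters because Python's `json.dumps` emits `NaN` and `Infinity` by default, which are not valid JSON.

```python
import json
import math
from typing import Iterable, Optional


def first_non_finite(logprobs: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite value, or None if all are finite."""
    for index, value in enumerate(logprobs):
        if not math.isfinite(value):
            return index
    return None


def serialize_logprobs(logprobs: list[float]) -> tuple[int, str]:
    """Return (status_code, json_body); the error shape is a hypothetical stand-in."""
    bad_index = first_non_finite(logprobs)
    if bad_index is not None:
        error = {"error": {"message": f"non-finite logprob at index {bad_index}", "code": 400}}
        return 400, json.dumps(error)
    # allow_nan=False makes json.dumps raise instead of emitting NaN/Infinity.
    return 200, json.dumps({"logprobs": logprobs}, allow_nan=False)


status, body = serialize_logprobs([-0.25, float("nan"), -1.5])
print(status, body)
```

Checking before serialization lets the caller receive a structured client error instead of a serialization failure surfacing later in the request path.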