…sion property

The compaction summary call dropped `tools=` entirely, so the model could only respond with text. vLLM's chat-completions layer renders a `# Tools` block into the system message only when `tools=` is set in the request, so dropping it makes the summary call's system prompt diverge from every regular turn.

Downstream, prime-rl's trajectory walker (`interleave_rollout`) checks whether each step's `prompt_ids` extend any active sample's prefix; the diverging system prompt fails that check at the compaction summary turn, and the post-compaction continuation then fails it a second time. One compaction event therefore opens *two* training-sample splits where structurally one is enough, observable as `samples_per_rollout / rlm_compactions_count == 2.0` across all envs in RL runs.

Fix: forward the active tool list with `tool_choice="none"`. Tools are still advertised, so the system prompt is identical to regular turns, and `tool_choice="none"` preserves the original "text-only summary" behaviour by forbidding tool calls on this turn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
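A minimal sketch of the fixed request shape, assuming an OpenAI-style chat-completions payload. The function and argument names below are illustrative stand-ins, not the actual prime-rl call site:

```python
# Sketch of the fix: build the compaction-summary request with the active
# tool list still attached, but tool calls forbidden via tool_choice="none".
# `build_summary_request` and its arguments are hypothetical names for
# illustration; they are not the real prime-rl API.
def build_summary_request(system_prompt, summary_instruction, active_tools):
    """Request body for the compaction summary turn.

    Keeping tools= identical to regular turns means vLLM renders the same
    "# Tools" block into the system message, so prompt_ids at this turn
    still extend the pre-compaction token prefix.
    """
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": summary_instruction},
        ],
        "tools": active_tools,   # still advertised on this turn
        "tool_choice": "none",   # text-only summary: no tool calls allowed
    }
```

The key property is that `tools` is byte-identical to the regular turns, while `tool_choice="none"` alone enforces the text-only behaviour.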
eligotts approved these changes on May 7, 2026.
Summary

The compaction summary call drops `tools=` from the request entirely, so the model can only respond with text. Side effect we missed: vLLM's chat-completions layer injects a `# Tools\n<tools>{...}</tools>` block into the rendered system message only when `tools=` is in the request. Dropping `tools=` therefore makes the compaction call's system prompt diverge from every regular turn.

Downstream, prime-rl's RL trajectory walker (`interleave_rollout` in `src/prime_rl/orchestrator/trajectories.py`) opens a new training sample whenever a step's `prompt_ids` doesn't prefix-match any active sample. One compaction event creates two prefix-incompatible boundaries:

1. The compaction summary turn (no `tools=` in the request, hence no `# Tools` block in the rendered system prompt) does not extend the prior pre-compaction sample.
2. The post-compaction continuation (`# Tools` block back) does not extend the summary turn.

So one compaction event splits the rollout into three samples where structurally two would be enough.
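The walker's prefix check can be sketched as follows. This is a simplified stand-in with hypothetical names, not the real `interleave_rollout` in `src/prime_rl/orchestrator/trajectories.py`:

```python
# Simplified stand-in for the walker's prefix check: a step joins the
# active sample only if its prompt_ids extend that sample's token prefix.
# Names and structure are hypothetical; see
# src/prime_rl/orchestrator/trajectories.py for the real implementation.
def extends(prefix: list[int], prompt_ids: list[int]) -> bool:
    """True if prompt_ids starts with the active sample's tokens."""
    return len(prompt_ids) >= len(prefix) and prompt_ids[: len(prefix)] == prefix

def split_into_samples(steps: list[dict]) -> list[list[dict]]:
    samples: list[list[dict]] = []
    active_tokens: list[int] = []
    for step in steps:
        if samples and extends(active_tokens, step["prompt_ids"]):
            samples[-1].append(step)   # step continues the active sample
        else:
            samples.append([step])     # prefix broke: open a new sample
        active_tokens = step["prompt_ids"] + step["completion_ids"]
    return samples
```

With a diverging system prompt at the summary turn and a second divergence when the `# Tools` block returns, three steps land in three separate samples; with identical system prompts throughout, consecutive turns stay in one sample.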
Symptoms in production

RLM-GLM5.1 run (TITO, `use_token_client=true`), all four envs: `samples_per_rollout / rlm_compactions_count` is 2.0 across every env. That is too clean to be tokenization noise; the cause is structural.
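A back-of-envelope accounting of why the ratio pins near 2.0, assuming one base sample per rollout plus one sample per prefix break (the counts below are illustrative, not taken from the run):

```python
# Illustrative accounting: before the fix each compaction event causes
# 2 prefix breaks; after the fix, 1. Assumes one base sample per rollout
# plus one new sample per break. Numbers are hypothetical, not run data.
def expected_samples(compactions: int, breaks_per_compaction: int) -> int:
    return 1 + breaks_per_compaction * compactions

def samples_per_compaction(compactions: int, breaks_per_compaction: int) -> float:
    return expected_samples(compactions, breaks_per_compaction) / compactions
```

Under this model the ratio approaches 2.0 in compaction-heavy rollouts before the fix, and 1.0 after it.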
Fix

Forward the active tool list to the compaction call as `tools=active_tools` with `tool_choice="none"`. The system prompt now renders identically to regular turns (the extension property holds at the summary turn, so there is 1 break per compaction instead of 2), while `tool_choice="none"` preserves the original "text-only summary" behaviour by forbidding tool calls.