
fix(compaction): forward tools= with tool_choice="none" to keep extension property #73

Merged
samsja merged 1 commit into main from fix/compaction-tools-keep-extension-property on May 7, 2026

Conversation

@samsja (Member) commented May 7, 2026

Summary

The compaction summary call drops tools= from the request entirely, so the model can only respond with text. A side effect we missed: vLLM's
chat-completions layer injects a # Tools\n<tools>{...}</tools> block into the rendered system message only when tools= is present in the request.
Dropping tools= therefore makes the compaction call's system prompt diverge from every regular turn.
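The conditional rendering described above can be sketched as follows. This is a minimal stand-in, not vLLM's actual template code; render_system and the toy tool list are assumptions for illustration:

```python
import json

def render_system(system_text, tools):
    """Mimic a chat template that appends a tools block only when
    tools= is present in the request (hypothetical sketch)."""
    if not tools:
        return system_text
    return system_text + "\n\n# Tools\n<tools>" + json.dumps(tools) + "</tools>"

base = "You are a helpful agent."
tools = [{"type": "function", "function": {"name": "read_file"}}]

with_tools = render_system(base, tools)
without_tools = render_system(base, None)

# The two rendered system prompts differ, so their token ids cannot be
# prefix-compatible past the shared base text.
assert without_tools != with_tools
assert with_tools.startswith(base)
```

The key point is that the divergence is deterministic: any request without tools= renders a strictly shorter system message, regardless of the rest of the conversation.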

Downstream, prime-rl's RL trajectory walker (interleave_rollout in src/prime_rl/orchestrator/trajectories.py) opens a new training sample
whenever a step's prompt_ids doesn't prefix-match any active sample. One compaction event creates two prefix-incompatible boundaries:

  1. Compaction summary turn (no tools= in request → no # Tools block in rendered system) does not extend the prior pre-compaction sample.
  2. Post-compaction continuation turn (tools back in request → # Tools block back) does not extend the summary turn.

So one compaction event splits the rollout into three samples where structurally two would be enough.
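The split behaviour described above reduces to a prefix check per step. A minimal sketch, assuming a simplified walker (the real interleave_rollout in src/prime_rl/orchestrator/trajectories.py tracks multiple active samples and is more involved); the toy token ids are made up for illustration:

```python
def count_samples(steps):
    """Open a new sample whenever a step's prompt_ids do not extend
    the prefix of the previous step (simplified sketch)."""
    samples = 0
    active = None
    for prompt_ids in steps:
        if active is None or prompt_ids[: len(active)] != active:
            samples += 1  # prefix break: open a new training sample
        active = prompt_ids
    return samples

# Toy ids: [1, 2, 3] is a prompt whose system message carries the
# "# Tools" block; [9, ...] is the tool-less compaction system prompt.
pre_compaction = [1, 2, 3]
summary_turn = [9, 4]        # no tools= -> divergent system prefix
continuation = [1, 2, 3, 5]  # tools= back -> doesn't extend the summary
assert count_samples([pre_compaction, summary_turn, continuation]) == 3

# With the fix, the summary turn extends the prior sample; only the
# compacted (rewritten) history after it is a genuine break.
fixed_summary = [1, 2, 3, 6]
fixed_continuation = [1, 2, 7]  # history rewritten by compaction
assert count_samples([pre_compaction, fixed_summary, fixed_continuation]) == 2
```

This also shows why one break per compaction is the structural floor: compaction rewrites the history, so the post-compaction continuation can never prefix-extend the summary turn.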

Symptoms in production

RLM-GLM5.1 run (TITO, use_token_client=true), all four envs:

| env | samples/rollout | compactions/rollout | breaks (s−1) | breaks/compaction |
| --- | --- | --- | --- | --- |
| rlm-swe | 3.16 | 1.07 | 2.16 | 2.01 |
| rlm-deepdive | 1.99 | 0.49 | 0.99 | 2.02 |
| rlm-swerebench-v2 | 2.62 | 0.81 | 1.62 | 2.00 |
| general-agent-rlm | 2.23 | 0.61 | 1.23 | 2.02 |

Ratio is 2.0 across all envs — too clean to be tokenization noise; structural.
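The derived columns above follow directly from the first two: breaks per rollout is samples − 1, and breaks/compaction divides that by compactions/rollout. A quick check (values copied from the table; small rounding differences against the printed table are expected since it was likely computed from unrounded metrics):

```python
rows = {
    "rlm-swe":           (3.16, 1.07),
    "rlm-deepdive":      (1.99, 0.49),
    "rlm-swerebench-v2": (2.62, 0.81),
    "general-agent-rlm": (2.23, 0.61),
}
for env, (samples, compactions) in rows.items():
    breaks = samples - 1
    ratio = breaks / compactions
    # Every env sits within rounding distance of exactly 2.0
    assert abs(ratio - 2.0) < 0.05, (env, ratio)
```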

Fix

Forward the active tool list to the compaction call as tools=active_tools with tool_choice="none". The system prompt now renders identically to
regular turns (extension property holds at the summary turn → 1 break per compaction instead of 2), while tool_choice="none" preserves the
original "text-only summary" behaviour by forbidding tool calls.

```python
request_kwargs = {"model": self.model, "messages": messages}
if active_tools:
    request_kwargs["tools"] = active_tools
    request_kwargs["tool_choice"] = "none"
response = await call_with_retries(self.client.chat.completions.create, **request_kwargs)
```
                                            
Why tool_choice="none" over dropping tools=

tool_choice="none" is the OpenAI-spec way to say "you have tools, I just don't want you to call them this turn." It keeps the request
schema-consistent with regular turns (same system-prompt rendering on the server), so prime-rl's interleave_rollout keeps merging steps into one
sample across the compaction boundary instead of opening a third one.
                                                                                                                                                   
When this regressed

It hasn't: _compact_branch has called chat.completions.create without tools= since compaction was first introduced in #54 (2026-04-23, commit
f7cda58). The bug has been silent the whole time. It only surfaced now because we noticed samples_per_rollout / rlm_compactions_count == 2.0 and
traced the prefix break from the prime-rl orchestrator side.
                                                                                                                                                   
Scope
                                                                                                                                                   
- One file changed (src/rlm/engine.py), 23 insertions / 6 deletions.
- _compact_branch's signature gains an active_tools argument; its only caller is updated.
- No test changes: tests/test_metrics.py exercises metrics shape, not the compaction request.

fix(compaction): forward tools= with tool_choice="none" to keep extension property

The compaction summary call dropped tools= entirely so the model could only
respond with text. vLLM's chat-completions layer renders a "# Tools" block
into the system message only when tools= is set in the request, so dropping
it makes the summary call's system prompt diverge from every regular turn.

Downstream, prime-rl's trajectory walker (interleave_rollout) checks whether
each step's prompt_ids extend any active sample's prefix; the diverging
system prompt fails that check at the compaction summary turn, and the
post-compaction continuation then fails it a second time. One compaction
event therefore opens *two* training-sample splits where structurally one
is enough — observable as samples_per_rollout / rlm_compactions_count == 2.0
across all envs in RL runs.

Fix: forward the active tool list with tool_choice="none". Tools are still
advertised so the system prompt is identical to regular turns, and
tool_choice="none" preserves the original "text-only summary" behaviour by
forbidding tool calls on this turn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@samsja samsja merged commit 509bffe into main May 7, 2026
6 checks passed