perf: reduce completion eval overhead and fix follow-up quality #179
Merged
rockfordlhotka merged 5 commits into main on Mar 20, 2026
Subagent tool loops no longer run the completion evaluator. The primary agent's evaluator catches incomplete results when it synthesises the subagent output, which eliminates 2 redundant LLM round-trips per subagent task (10-30s savings observed in production logs).

Default MaxCompletionReprompts reduced from 2 to 1. The second re-prompt rarely improved results and added 15-30s of latency. Model-specific overrides still work via ModelBehavior.MaxCompletionRepromptsOverride.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
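
The two rules in this commit can be illustrated with a small sketch. This is not the project's code: it is a Python approximation under stated assumptions, and `should_run_completion_evaluator`, `effective_max_reprompts`, and the `is_subagent_loop` flag are hypothetical names chosen for illustration. Only the default of 1 and the existence of a per-model override come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Default after this change (previously 2), per the commit message.
DEFAULT_MAX_COMPLETION_REPROMPTS = 1


@dataclass
class ModelBehavior:
    # Hypothetical Python mirror of ModelBehavior.MaxCompletionRepromptsOverride.
    max_completion_reprompts_override: Optional[int] = None


def should_run_completion_evaluator(is_subagent_loop: bool) -> bool:
    """Subagent tool loops skip the evaluator; the primary agent's
    evaluator is assumed to review the subagent output afterwards."""
    return not is_subagent_loop


def effective_max_reprompts(behavior: ModelBehavior) -> int:
    """Use the model-specific override when set, else the global default."""
    if behavior.max_completion_reprompts_override is not None:
        return behavior.max_completion_reprompts_override
    return DEFAULT_MAX_COMPLETION_REPROMPTS
```

Keeping the override optional means a model that benefits from a second re-prompt can opt back in without raising the default for every model.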
Follow-up responses that contain only narration, refusals, or re-statements of the original answer are now discarded before concatenation. This catches the "split personality" pattern, where the follow-up evaluator finds an opportunity but the LLM refuses or scope-polices instead of acting, producing contradictory content appended to an otherwise clean response.

The check counts FunctionCallContent (native path) and [Tool result for ...] messages (text-based path) added during the follow-up loop. Zero tool calls means the follow-up added no new information and is dropped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
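
A minimal sketch of the zero-tool-call check, assuming a simplified message model. The `Message` type, the Python `FunctionCallContent` stand-in, and both function names are hypothetical; the real implementation and its message types are not shown in this PR view. The `[Tool result for ` prefix is taken from the commit message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FunctionCallContent:
    # Stand-in for a native tool-call content item.
    name: str


@dataclass
class Message:
    role: str
    text: str = ""
    contents: List[object] = field(default_factory=list)


# Marker used by the text-based tool path, per the commit message.
TOOL_RESULT_PREFIX = "[Tool result for "


def count_follow_up_tool_calls(messages: List[Message]) -> int:
    """Count tool activity added during the follow-up loop on both paths."""
    native = sum(
        1
        for m in messages
        for c in m.contents
        if isinstance(c, FunctionCallContent)
    )
    text_based = sum(1 for m in messages if m.text.startswith(TOOL_RESULT_PREFIX))
    return native + text_based


def merge_follow_up(primary: str, follow_up: str, added: List[Message]) -> str:
    """Append the follow-up only if the follow-up loop actually used a tool."""
    if count_follow_up_tool_calls(added) == 0:
        return primary  # narration or refusal only: discard
    return primary + "\n\n" + follow_up
```

Counting tool calls is a cheap proxy for "did the follow-up do any work"; it does not judge the quality of the follow-up text itself.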
Add skill creation/refinement to the good follow-ups list so the evaluator suggests reusable learnings.

Add two new bad follow-up patterns that were causing split-personality responses: implementing server-side logic/rules (the agent can't change service behavior at runtime) and searching unrelated systems to double-check work already completed via the authoritative source.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The follow-up evaluator now classifies the user's original request as closed/specific ("what is on my todo list?") vs open/exploratory ("find emails from Richard and see if I have outstanding requests") before considering follow-ups. Closed requests almost never warrant follow-ups: the user asked for X, got X, done. Exploratory requests may benefit from connecting dots across systems.

This addresses the root cause of unnecessary follow-up passes on simple queries that were adding 18-50s of latency and sometimes producing contradictory "split personality" responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the agent loop has called spawn_subagent or invoke_agent, the completion evaluator now skips evaluation instead of re-prompting. The SubagentResultHandler will deliver the result; re-prompting races with it and produces duplicate answers (the user sees both the subagent result and the re-prompted primary response).

Also updates directives to allow direct handling of simple closed questions that need only 1-2 tool calls. The subagent overhead is counterproductive for "when does my class end?" style queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
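
The delegation guard might look roughly like the following. This is a sketch, not the project's code: `DELEGATION_TOOLS` and `should_skip_completion_evaluator` are hypothetical names, and the input is assumed to be the list of tool names invoked during the current agent loop. The two tool names come from the commit message.

```python
from typing import Iterable

# Tools that hand work off to another agent; the result is assumed to
# arrive later via a separate handler (SubagentResultHandler in the PR).
DELEGATION_TOOLS = frozenset({"spawn_subagent", "invoke_agent"})


def should_skip_completion_evaluator(tool_names_called: Iterable[str]) -> bool:
    """Skip evaluation (and any re-prompt) once work has been delegated,
    so the re-prompt cannot race the delegated result."""
    return any(name in DELEGATION_TOOLS for name in tool_names_called)
```

For example, a loop that called `["search_email", "spawn_subagent"]` would skip the evaluator, while one that called only `["get_calendar"]` would be evaluated as usual (both tool names other than `spawn_subagent` are made up for the example).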
Summary
Test plan
… (rockbot.agent.turn.duration)
… (Follow-up pass made no tool calls; discarding)

🤖 Generated with Claude Code