
perf: reduce completion eval overhead and fix follow-up quality#179

Merged
rockfordlhotka merged 5 commits into main from perf/reduce-completion-eval-overhead
Mar 20, 2026
Conversation

@rockfordlhotka
Member

Summary

  • Skip completion evaluation in subagents — the primary agent's evaluator catches incomplete results when synthesising subagent output, so the subagent's own eval was redundant (saved 10-30s per subagent task)
  • Reduce the completion re-prompt cap from 2 to 1 — the second re-prompt rarely improved results and added 15-30s of latency
  • Discard follow-up passes that made no tool calls — prevents "split personality" responses where the follow-up narrates/refuses instead of acting, then gets concatenated to a clean response
  • Tighten follow-up evaluator prompt — steer toward skill refinement, away from suggesting server-side logic changes or redundant cross-system verification

Test plan

  • All 560+ unit tests pass
  • Deployed to cluster — verify faster turn times in logs (rockbot.agent.turn.duration)
  • Verify follow-up discards appear in logs (Follow-up pass made no tool calls; discarding)
  • Verify no split-personality responses in multi-turn conversations
  • Verify subagent tasks still produce useful results without their own completion eval
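The log checks above can be scripted. A minimal sketch, assuming a plain-text log file — the path and line format here are illustrative, not the deployment's real log layout; only the quoted discard message comes from this PR:

```shell
# Hypothetical log file; path and surrounding format are assumptions.
log=/tmp/rockbot-sample.log
printf '%s\n' \
  'INFO rockbot.agent Follow-up pass made no tool calls; discarding' \
  'INFO rockbot.agent.turn.duration 4.2s' > "$log"

# Count how many follow-up passes were discarded
grep -c 'Follow-up pass made no tool calls; discarding' "$log"
```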

🤖 Generated with Claude Code

rockfordlhotka and others added 5 commits March 20, 2026 00:22
Subagent tool loops no longer run the completion evaluator — the primary
agent's evaluator catches incomplete results when it synthesises the
subagent output, eliminating 2 redundant LLM round-trips per subagent
task (10-30s savings observed in production logs).

Default MaxCompletionReprompts reduced from 2 to 1. The second re-prompt
rarely improved results and added 15-30s of latency. Model-specific
overrides still work via ModelBehavior.MaxCompletionRepromptsOverride.
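The two changes above can be sketched together. This is an illustrative outline only — `maybe_reprompt`, `MAX_COMPLETION_REPROMPTS`, and the callback signatures are hypothetical names, not the project's real API:

```python
# Hypothetical sketch of the completion-eval path described above.
MAX_COMPLETION_REPROMPTS = 1  # was 2; the second re-prompt rarely helped

def maybe_reprompt(response: str, is_subagent: bool, evaluate, reprompt) -> str:
    """Run the completion evaluator, re-prompting at most once."""
    if is_subagent:
        # Subagents skip their own eval; the primary agent's evaluator
        # catches incomplete results when it synthesises the output.
        return response
    for _ in range(MAX_COMPLETION_REPROMPTS):
        if evaluate(response):  # True => response judged complete
            break
        response = reprompt(response)
    return response
```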

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow-up responses that contain only narration, refusals, or re-statements
of the original answer are now discarded before concatenation. This catches
the "split personality" pattern where the follow-up evaluator finds an
opportunity but the LLM refuses or scope-polices instead of acting —
producing contradictory content appended to an otherwise clean response.

The check counts FunctionCallContent (native path) and [Tool result for ...]
messages (text-based path) added during the follow-up loop. Zero tool calls
means the follow-up added no new information and is dropped.
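The check described above can be sketched as follows. The message shapes are assumptions modeled on the description (`FunctionCallContent` on the native path, `[Tool result for ...]` text on the text-based path); `merge_followup` is an illustrative name:

```python
# Illustrative sketch of the zero-tool-call discard check.
def followup_made_tool_calls(new_messages: list) -> bool:
    """Return True if the follow-up loop added at least one tool call."""
    count = 0
    for msg in new_messages:
        if msg.get("type") == "FunctionCallContent":  # native path
            count += 1
        elif msg.get("text", "").startswith("[Tool result for"):  # text path
            count += 1
    return count > 0

def merge_followup(response: str, followup: str, new_messages: list) -> str:
    # Zero tool calls => the follow-up only narrated, refused, or
    # re-stated the answer; drop it instead of concatenating.
    if not followup_made_tool_calls(new_messages):
        return response
    return response + "\n\n" + followup
```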

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add skill creation/refinement to the good follow-ups list so the evaluator
suggests reusable learnings. Add two new bad follow-up patterns that were
causing split-personality responses: implementing server-side logic/rules
(agent can't change service behavior at runtime) and searching unrelated
systems to double-check work already completed via the authoritative source.
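A config-style sketch of the prompt lists described above — the wording of the actual evaluator prompt is not shown in this PR, so these entries are paraphrases, not the real prompt text:

```python
# Illustrative prompt fragments only; paraphrased from the commit message.
GOOD_FOLLOWUPS = [
    "Create or refine a skill capturing a reusable learning from this task",
]
BAD_FOLLOWUPS = [
    "Implement server-side logic or rules "
    "(the agent cannot change service behavior at runtime)",
    "Search unrelated systems to double-check work already "
    "completed via the authoritative source",
]
```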

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The follow-up evaluator now classifies the user's original request as
closed/specific ("what is on my todo list?") vs open/exploratory ("find
emails from Richard and see if I have outstanding requests") before
considering follow-ups. Closed requests almost never warrant follow-ups —
the user asked for X, got X, done. Exploratory requests may benefit from
connecting dots across systems.

This addresses the root cause of unnecessary follow-up passes on simple
queries that were adding 18-50s of latency and sometimes producing
contradictory "split personality" responses.
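The gating described above can be sketched as follows. `classify_request` stands in for the LLM-based classifier with a toy keyword heuristic — the real classification happens in the evaluator prompt, and these function names are hypothetical:

```python
# Toy stand-in for the LLM classifier: short, single-question asks
# are treated as "closed"; connective/exploratory phrasing as "open".
def classify_request(user_request: str) -> str:
    open_markers = (" and ", "see if", "find", "explore")
    if any(m in user_request.lower() for m in open_markers):
        return "open"
    return "closed"

def should_consider_followup(user_request: str) -> bool:
    # Closed requests almost never warrant follow-ups: the user asked
    # for X, got X, done. Only exploratory requests proceed to the
    # follow-up evaluator.
    return classify_request(user_request) == "open"
```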

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the agent loop called spawn_subagent or invoke_agent, the completion
evaluator now skips rather than re-prompting. The SubagentResultHandler
will deliver the result — re-prompting races with it and produces
duplicate answers (the user sees both the subagent result and the
re-prompted primary response).

Also updates directives to allow direct handling of simple closed
questions that need only 1-2 tool calls. The subagent overhead is
counterproductive for "when does my class end?" style queries.
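The skip condition above reduces to a simple membership check. A minimal sketch — `SUBAGENT_TOOLS` and the list-of-names input shape are assumptions based on the tool names quoted in the commit message:

```python
# Hypothetical sketch: skip the completion evaluator when a subagent
# tool was called this turn, since the SubagentResultHandler will
# deliver the result and re-prompting would race it.
SUBAGENT_TOOLS = {"spawn_subagent", "invoke_agent"}

def should_skip_completion_eval(tool_calls_this_turn: list) -> bool:
    return any(name in SUBAGENT_TOOLS for name in tool_calls_this_turn)
```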

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 897efd4 into main Mar 20, 2026
2 checks passed
@rockfordlhotka rockfordlhotka deleted the perf/reduce-completion-eval-overhead branch March 20, 2026 15:33