Surface input-blocked Symphony sessions#66
Conversation
Summary: - Treat Codex MCP elicitation and input-required events as blocked sessions instead of retryable worker exits. - Keep blocked issues claimed, visible in snapshots, and listed on the dashboard/API until their Linear state changes. - Add presenter and dashboard payloads for blocked issue details. - Cover app-server MCP elicitation and orchestrator blocked-state paths. Rationale: - Operator-input blockers should pause for human action instead of burning retries or moving issues to terminal states. - Demo operators need blocked sessions visible in the dashboard and API. Tests: - mise exec -- mix format - mise exec -- mix test test/symphony_elixir/app_server_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/orchestrator_status_test.exs Co-authored-by: Codex <codex@openai.com>
f61a471 to
ea86b01
Compare
gpt-cmdr
left a comment
There was a problem hiding this comment.
Code Review: Surface input-blocked Symphony sessions
Architecture is solid — blocking instead of retrying input-waiting sessions is the right call. A few things to address before merge:
Critical
1. Stall timeout may fire on sessions that should be blocked (high confidence)
reconcile_stalled_running_issues compares elapsed_ms > stall_timeout_ms without guarding for the input-blocked condition. If a running session emits :turn_input_required but hasn't exited yet, the stall path will fire and call restart_stalled_issue before the blocked state is populated.
Fix: restart_stalled_issue/5 should check input_required_blocker?(running_entry.last_codex_event) and route to stop_and_block_issue rather than schedule_issue_retry in that case. Also suppress stall fire entirely for entries already in state.blocked.
2. Blocked state is ephemeral — undocumented behavior on restart
state.blocked is in-memory only (%{}). On orchestrator restart, blocked sessions silently re-enter as dispatch candidates. This is acceptable but should be documented. Consider logging a warning when a session has been blocked longer than a configurable threshold.
3. MCP elicitation clause ordering — verify placement
The new needs_input?("mcpServer/elicitation/request", payload) clause must appear before the generic defp needs_input?(_method, _payload), do: false fallback. CI green suggests it's correct, but worth a visual confirm since Elixir won't error on an unreachable clause.
Test Coverage Gaps
4. Normal-exit :input_required → blocked path not tested
Tests exercise the stall path but not the happy path: agent exits normally with {:input_required, issue} → handle_normal_agent_exit → issue appears in state.blocked. Add a test that injects a running_entry with completion: %{outcome: :input_required}, sends :DOWN with :normal, and asserts the issue is in state.blocked.
5. Presenter counts map may be missing blocked field
Dashboard adds a blocked table, but verify the metric cards include counts.blocked so the summary numbers don't undercount visible work.
6. static_snapshot fixture missing blocked key
extensions_test.exs static_snapshot/0 doesn't include a blocked key, so the dashboard liveview rendering of blocked entries has no test coverage.
Overall: approve with the above addressed. The stall-guard issue (#1) is the highest-priority fix since without it sessions can oscillate between stalling and retrying — the exact problem this PR solves.
Review sandbox-denied Codex approvals
|
BLOCKED-PROOF: not auto-merged. The PR is clean and checks are green, but an explicit review comment flags critical/coverage work before merge. I verified the narrow fix shape locally: guard |
Summary: - Route normal agent exits with input-required completion into blocked state instead of continuation retry. - Keep stalled reconciliation from restarting entries already represented as blocked. - Add blocked snapshot/API/dashboard coverage and document blocked state restart behavior. Rationale: - Human-input blockers are not transient failures, including when the worker exits normally after reporting input required. Tests: - mise exec -- mix format --check-formatted - mise exec -- mix test test/symphony_elixir/app_server_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/orchestrator_status_test.exs Co-authored-by: Codex <codex@openai.com>
5f167cc to
a1730f3
Compare
|
Addressed the review feedback in
Validation:
|
Codex app-server sessions can request operator input or MCP elicitation. These should pause visibly instead of retrying until exhausted. *Show input-blocked Symphony sessions in state, API, and dashboard.* - Treat Codex input-required and MCP elicitation events as blocked sessions. - Keep blocked issues claimed until Linear state/routing changes. - Add blocked counts and per-issue blocked details to presenter payloads. - Render blocked sessions in the dashboard. - Add regression coverage for app-server and orchestrator blocked flows. - Retrying was rejected because human-input blockers are not transient failures. - Moving Linear to Done was rejected because Done is post-merge terminal state. - [ ] `make -C elixir all` - [x] `mise exec -- mix format` - [x] `mise exec -- mix test test/symphony_elixir/app_server_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/orchestrator_status_test.exs` --------- Co-authored-by: Codex <codex@openai.com> (cherry picked from commit 3365695)
Context
Codex app-server sessions can request operator input or MCP elicitation. These should pause visibly instead of retrying until exhausted.
TL;DR
Show input-blocked Symphony sessions in state, API, and dashboard.
Summary
Alternatives
Test Plan
make -C elixir allmise exec -- mix formatmise exec -- mix test test/symphony_elixir/app_server_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/orchestrator_status_test.exs