feat: improve eval criteria, clarity in improvement process #1035

Merged
Henry-811 merged 6 commits into dev/v0.1.70 from dev/v0.1.69-p1
Mar 30, 2026

ncrispino (Collaborator) commented Mar 30, 2026

PR Title Format

Your PR title must follow the format: <type>: <brief description>

Valid types:

  • fix: - Bug fixes
  • feat: - New features
  • breaking: - Breaking changes
  • docs: - Documentation updates
  • refactor: - Code refactoring
  • test: - Test additions/modifications
  • chore: - Maintenance tasks
  • perf: - Performance improvements
  • style: - Code style changes
  • ci: - CI/CD configuration changes

Examples:

  • fix: resolve memory leak in data processing
  • feat: add export to CSV functionality
  • breaking: change API response format
  • docs: update installation guide

Description

Brief description of the changes in this PR

Type of change

  • Bug fix (fix:) - Non-breaking change which fixes an issue
  • New feature (feat:) - Non-breaking change which adds functionality
  • Breaking change (breaking:) - Fix or feature that would cause existing functionality to not work as expected
  • Documentation (docs:) - Documentation updates
  • Code refactoring (refactor:) - Code changes that neither fix a bug nor add a feature
  • Tests (test:) - Adding missing tests or correcting existing tests
  • Chore (chore:) - Maintenance tasks, dependency updates, etc.
  • Performance improvement (perf:) - Code changes that improve performance
  • Code style (style:) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
  • CI/CD (ci:) - Changes to CI/CD configuration files and scripts

Checklist

  • I have run pre-commit on my changed files and all checks pass
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Pre-commit status

# Paste the output of running pre-commit on your changed files:
# uv run pre-commit install
# git diff --name-only HEAD~1 | xargs uv run pre-commit run --files # for last commit
# git diff --name-only origin/<base branch>...HEAD | xargs uv run pre-commit run --files # for all commits in PR
# git add <your file> # if any fixes were applied
# git commit -m "chore: apply pre-commit fixes"
# git push origin <branch-name>

How to Test

Add test method for this PR.

Test CLI Command

Write down the bash command used for testing. If there are prerequisites, call them out.

Expected Results

Description/screenshots of expected results.

Additional context

Add any other context about the PR here.

Summary by CodeRabbit

  • New Features

    • Added checklist-gated evaluation workflow enabling iterative submission cycles with scoring and improvement proposals before final voting
    • Introduced fast iteration mode streamlining multi-round submission phases
    • Added web UI review modal for approving and commenting on outputs
    • Enabled background trace analysis starting from round 2 onwards
    • Redesigned evaluation criteria system with three-tier categorization and anti-pattern definitions
  • Improvements

    • Enhanced workspace cleanup and isolation between rounds
    • Refined per-round token usage tracking
    • Improved evaluation criteria generation with aspiration statements
  • Documentation

    • Added comprehensive guides for checklist-gated workflows and new configuration options

Copilot AI review requested due to automatic review settings March 30, 2026 16:01

coderabbitai bot commented Mar 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 11914265-f45d-4d3e-aa4e-1bd3b87b7dbc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR introduces a tool renaming (propose_improvements → draft_approach), adds three new coordination configuration flags (fast_iteration_mode, web_review, auto_trace_analysis), migrates evaluation criteria categories from must/should/could to primary/standard/stretch, implements a WebUI-based review modal for final answers, improves workspace path routing and cleanup, and provides new documentation and example configuration for the fast iteration mode.
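
The category migration mentioned in the walkthrough can be pictured with a small sketch. The pairing used here (must to primary, should to standard, could to stretch) is an assumption read off the order in which the tiers are listed, and `normalize_category` is a hypothetical helper, not a function taken from massgen. The "standard" fallback mirrors the default category named elsewhere in this PR.

```python
# Hypothetical sketch: the names below are illustrative, not taken from massgen.
LEGACY_TO_TIER = {"must": "primary", "should": "standard", "could": "stretch"}
VALID_TIERS = frozenset(LEGACY_TO_TIER.values())


def normalize_category(raw: str, default: str = "standard") -> str:
    """Map a legacy or new-style category string onto the new tier names."""
    value = raw.strip().lower()
    if value in VALID_TIERS:
        return value
    # Legacy names are translated; anything unrecognized falls back to the default.
    return LEGACY_TO_TIER.get(value, default)


if __name__ == "__main__":
    for raw in ("must", "Should", "could", "primary", "unknown"):
        print(raw, "->", normalize_category(raw))
```

Accepting both vocabularies in one function is one way to keep older configs loading while new ones use the three-tier names.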

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation Updates | docs/modules/coordination_workflow.md, docs/modules/injection.md, docs/modules/subagents.md, docs/source/reference/yaml_schema.rst, docs/source/user_guide/concepts.rst | Updated checklist-gated workflow documentation to replace propose_improvements with draft_approach, added fast_iteration_mode schema parameter with phase-skipping behavior, and introduced a new "Checklist-Gated Evaluation" concept section describing the voting workflow and default criteria. |
| Configuration Schema & Validation | massgen/agent_config.py, massgen/config_validator.py, massgen/config_builder.py, massgen/configs/features/fast_iteration.yaml | Added three new CoordinationConfig boolean flags (fast_iteration_mode, web_review, auto_trace_analysis), refactored Docker backend defaults into a shared constant, updated category validation to accept new primary/standard/stretch values, and provided a new fast-iteration example configuration. |
| Evaluation Criteria Refactoring | massgen/evaluation_criteria_generator.py | Major schema migration: added anti_patterns field, changed category semantics from must/should/could to primary/standard/stretch with legacy mapping, updated default criteria texts, modified _parse_criteria_response to return tuple with aspiration, enforced single-primary constraint, and added fast_iteration_mode parameter propagation. |
| CLI & Coordination Integration | massgen/cli.py, massgen/coordination_tracker.py | Added workspace-path routing (_route_workspace_path) for relative cwd values, introduced --web-review CLI flag, extended coordination config parsing for new flags, updated default criteria categories from "should" to "standard", and added review-pending status reporting with finish_reason="waiting_for_review". |
| Workspace Management | massgen/filesystem_manager/_filesystem_manager.py, massgen/filesystem_manager/_isolation_context_manager.py, massgen/filesystem_manager/_path_permission_manager.py | Extended workspace cleanup to prune stale agent directories under .massgen/workspaces/, added _is_massgen_workspace() helper and conditional cleanup of prefixed workspaces, implemented round-by-round workspace clearing (excluding .git/), and added type-safety guard for non-dict JSON parsing in tool arguments. |
| Web UI & Server Review Modal | massgen/frontend/web/server.py, massgen/frontend/web_display.py | Introduced review-modal state management in WebDisplay with show_final_answer_modal() method that emits WebSocket events and awaits user decision (up to 600s), added resolve_review() idempotent handler, introduced REST endpoints (GET /review, POST /review-response, GET /subagent/*/events), added WebSocket review_response action, plumbed web_review flag from coordination config, and updated docker-backend override to use shared defaults. |
| Display Updates | massgen/frontend/displays/textual/widgets/modals/content_modals.py, massgen/frontend/displays/rich_terminal_display.py | Updated EvaluationCriteriaModal category badge colors for new primary/standard/stretch categories with backward-compatible legacy support, changed default category from "should" to "standard", and clarified context-window usage metric label to "Peak round input vs context window" with per-round calculation explanation. |
| Token & Parameter Updates | massgen/backend/base.py, massgen/api_params_handler/_api_params_handler_base.py | Changed context-window usage calculation to use per-round input tokens (token_usage.input_tokens - round_start_snapshot["input_tokens"]) instead of cumulative total, and updated improvements parameter comment from propose_improvements to draft_approach gate semantics. |
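
To make the per-round calculation in the last row concrete, here is a minimal sketch. Only the subtraction `token_usage.input_tokens - round_start_snapshot["input_tokens"]` comes from the summary; the `TokenUsage` dataclass, the function name, and the zero clamp are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Cumulative token counters for one agent (illustrative stand-in)."""

    input_tokens: int = 0


def context_usage_pct(usage: TokenUsage, round_start_snapshot: dict, context_window: int) -> float:
    """Percentage of the context window consumed by this round's input tokens."""
    if context_window <= 0:
        return 0.0
    round_input = usage.input_tokens - round_start_snapshot.get("input_tokens", 0)
    # Assumption: clamp at zero in case the snapshot is newer than the counter.
    return max(round_input, 0) * 100.0 / context_window


if __name__ == "__main__":
    usage = TokenUsage(input_tokens=150_000)
    snapshot = {"input_tokens": 120_000}
    print(context_usage_pct(usage, snapshot, 200_000))  # per-round view
    print(context_usage_pct(usage, {}, 200_000))        # cumulative view, for contrast
```

With these numbers the per-round figure is 15% while the cumulative figure would be 75%, which is the kind of overstatement the change appears intended to avoid.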

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant Orchestrator
    participant WebServer
    participant Client as Client/WebUI
    participant User

    Agent->>Orchestrator: submit_answer()
    Orchestrator->>WebServer: show_final_answer_modal()
    WebServer->>Client: emit review_request event + display modal
    Client->>User: show answer in review modal
    User->>Client: approve/reject with comments
    Client->>WebServer: POST /api/sessions/{id}/review-response
    WebServer->>Orchestrator: resolve_review(result_data)
    Orchestrator->>Orchestrator: decide continuation based on approval
    Orchestrator->>Agent: proceed to next round or finalize
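
The wait-then-resolve handshake in the diagram can be sketched with a single asyncio future. This is not the WebDisplay implementation: the `ReviewGate` class, the shape of the result dict, and the behavior on timeout are assumptions. Only the method names, the 600 s ceiling, and the idempotent resolve come from the walkthrough.

```python
import asyncio
from typing import Optional


class ReviewGate:
    """Illustrative stand-in for the review-modal state described above."""

    def __init__(self, timeout_s: float = 600.0) -> None:
        self.timeout_s = timeout_s
        self._future: Optional[asyncio.Future] = None

    async def show_final_answer_modal(self, answer: str) -> dict:
        """Open a review and wait for a decision, or time out."""
        loop = asyncio.get_running_loop()
        self._future = loop.create_future()
        # A real server would emit a WebSocket event carrying `answer` here.
        try:
            return await asyncio.wait_for(self._future, timeout=self.timeout_s)
        except asyncio.TimeoutError:
            # Assumption: a timeout counts as "no decision"; the PR does not say.
            return {"approved": False, "reason": "timeout"}
        finally:
            self._future = None

    def resolve_review(self, result: dict) -> bool:
        """Deliver the decision; repeated or late calls are harmless no-ops."""
        if self._future is None or self._future.done():
            return False
        self._future.set_result(result)
        return True
```

Returning a boolean from `resolve_review` lets both the REST endpoint and the WebSocket action call it without coordinating; whichever arrives second simply gets `False`.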

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • feat: v0.1.58 #957: Overlapping modifications to coordination/checklist workflow, evaluation criteria schema, and filesystem management across the same modules.
  • feat: v0.1.61 #985: Related changes to coordination config, tool naming (propose_improvements → draft_approach), and subagent documentation updates.
  • feat: v0.1.67 #1015: Code-level overlap in coordination config serialization, CLI web_review flag parsing, and WebUI review handling logic.

Suggested reviewers

  • a5507203
🚥 Pre-merge checks | ✅ 3 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is entirely a template with no actual content provided by the author; it contains only blank sections and placeholder instructions without describing the changes, testing approach, or implementation details. | Replace the template with a concrete description of changes, including a summary of evaluation criteria improvements, fast iteration mode, draft_approach renaming, and testing approach. |
| Documentation Updated | ⚠️ Warning | PR introduces user-facing changes without complete documentation coverage including missing migration guide for breaking category system change, no design documentation for review modal and fast_iteration_mode features, and missing runnable examples and --web-review flag documentation. | Add migration guide for category change, create design documents in docs/dev_notes/, add runnable examples to checklist-gated evaluation section, and document --web-review flag in user guide. |
| Config Parameter Sync | ⚠️ Warning | Three new coordination parameters (auto_trace_analysis, web_review, fast_iteration_mode) were added to CoordinationConfig but not added to the exclusion lists in get_base_excluded_config_params() and get_base_excluded_params(). | Add the three new coordination parameters to exclusion sets in both backend/base.py and api_params_handler/_api_params_handler_base.py. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feat: improve eval criteria, clarity in improvement process' clearly and specifically describes the main changes across the changeset, which focus on evaluation criteria enhancements and renaming propose_improvements to draft_approach. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.73% which is sufficient. The required threshold is 80.00%. |
| Capabilities Registry Check | ✅ Passed | PR introduces only documentation, configuration, and workflow changes without new backend or model implementations requiring capabilities registry updates. |

@ncrispino
Collaborator Author

@coderabbitai pause

Contributor

Copilot AI left a comment

Pull request overview

Adds stronger, more explicit evaluation/improvement semantics (new criterion tiers + draft_approach), plus new WebUI flows for (1) pre-collaboration phase visibility and (2) a git-diff review gate before applying write-mode changes.

Changes:

  • Introduces WebUI pre-collab phase tracking (sidebar + results panel) with subagent event polling, and adds a full-screen review modal driven by new WebSocket events.
  • Updates checklist-gated evaluation flow across backend/tests/docs (criterion categories, rename propose_improvements → draft_approach, new anti-pattern/aspiration shape, fast-iteration option).
  • Improves operational robustness: toolArgs normalization for Copilot hooks/PPM, workspace routing/cleanup, review-pending status surfaced in status.json.
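
As an illustration of the toolArgs normalization in the last bullet, the sketch below accepts a dict, a JSON string, or anything else and always hands back a dict. The function name and the choice to fall back to an empty dict are assumptions; the PR only states that non-dict arguments no longer cause crashes and that JSON strings are not double-encoded.

```python
import json
from typing import Any


def normalize_tool_args(raw: Any) -> dict:
    """Return tool arguments as a dict, whatever shape they arrived in."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, (str, bytes, bytearray)):
        try:
            parsed = json.loads(raw)
        except ValueError:
            # Covers json.JSONDecodeError and bad byte encodings alike.
            return {}
        # Valid JSON can still be a list, number, or string; only dicts are usable.
        return parsed if isinstance(parsed, dict) else {}
    return {}


if __name__ == "__main__":
    print(normalize_tool_args('{"path": "a.txt"}'))
    print(normalize_tool_args("[1, 2]"))
    print(normalize_tool_args(None))
```

Decoding a string once at the boundary also means downstream code never needs to re-serialize it, which is where double-encoding tends to creep in.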

Reviewed changes

Copilot reviewed 54 out of 191 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| webui/src/types/index.ts | Adds WS event types + review event payload types for the WebUI review modal. |
| webui/src/stores/v2/statusStore.ts | Updates eval criterion category comment to reflect new tiering scheme. |
| webui/src/stores/v2/reviewStore.ts | New Zustand store for parsing diffs, per-file approval selection, and sending review decisions. |
| webui/src/stores/v2/preCollabStore.ts | New Zustand store modeling pre-collab lifecycle + storing generated personas/criteria/prompt artifacts. |
| webui/src/stores/v2/preCollabStore.test.ts | Unit tests validating pre-collab event handling, results storage, and panel state. |
| webui/src/stores/v2/messageStore.ts | Routes specific structured events to preCollabStore via dynamic import. |
| webui/src/stores/agentStore.ts | Notes that pre-collab events are handled in messageStore → preCollabStore path. |
| webui/src/hooks/useWebSocket.ts | Dispatches review WS events to reviewStore and registers a send function for responses. |
| webui/src/hooks/useSubagentEvents.ts | New polling hook to fetch subagent events and feed them into the message store. |
| webui/src/components/v2/tiles/SubagentTile.tsx | Enables subagent event polling when viewing a running pre-collab phase tile. |
| webui/src/components/v2/tiles/PromptBanner.tsx | Adds colors for new criterion categories with legacy fallbacks. |
| webui/src/components/v2/sidebar/Sidebar.tsx | Adds PreCollabSection above channels when phases exist. |
| webui/src/components/v2/sidebar/PreCollabSection.tsx | New sidebar section rendering phase status and linking to tiles/results. |
| webui/src/components/v2/layout/PreCollabResultsPanel.tsx | New modal panel showing personas/criteria/prompt outputs by tab. |
| webui/src/components/v2/layout/AppShell.tsx | Mounts PreCollabResultsPanel and ReviewModal; suppresses LaunchIndicator during active pre-collab. |
| massgen/token_manager/token_manager.py | Clarifies semantics for context usage percentage as a per-round cost proxy. |
| massgen/tests/unit/test_orchestrator_unit.py | Updates expected prompt injection text to reference draft_approach. |
| massgen/tests/test_task_decomposer.py | Updates expected decomposition criteria categories to new scheme. |
| massgen/tests/test_system_prompt_sections.py | Updates expectations for new tool naming + evaluator availability flag usage. |
| massgen/tests/test_subagent_mcp_server.py | Adjusts tests for deferred config loading behavior. |
| massgen/tests/test_standalone_mcp_servers.py | Renames propose-improvements tests to draft-approach equivalents. |
| massgen/tests/test_specialized_subagents.py | Updates prompt assertions to reference draft_approach tool name. |
| massgen/tests/test_round_evaluator_loop.py | Updates expected injected block wording to draft_approach. |
| massgen/tests/test_read_media_analysis.py | Adjusts expected wording (“foundation” → “fundamental”). |
| massgen/tests/test_planning_injection.py | Renames propose-improvements references to draft-approach in planning injection tests/docs. |
| massgen/tests/test_novelty_injection.py | Updates novelty messaging expectations (“CONVERGENCE SIGNAL”) and added guidance assertions. |
| massgen/tests/test_injection_checklist_guidance.py | Updates injection guidance expectations to reference draft_approach. |
| massgen/tests/test_gepa_evaluation_flow.py | Updates default criteria count/category expectations and parsing return shape. |
| massgen/tests/test_enforcement_observability.py | Ensures orchestrator mocks include _review_pending. |
| massgen/tests/test_decomposition_mode.py | Updates _get_active_criteria return tuple shape and category expectations. |
| massgen/tests/test_convergence_novelty.py | Updates changedoc category expectations and revised wording assertions. |
| massgen/tests/test_checklist_criteria_presets.py | Updates valid categories assertions and related preset expectations. |
| massgen/tests/test_changedoc_system_prompt.py | Updates changedoc system prompt expectations and evaluator availability setup. |
| massgen/tests/backend/test_copilot_permission_integration.py | Adds regression test ensuring PPM hook tolerates non-dict tool args. |
| massgen/tests/backend/test_copilot_hooks.py | Expands hook adapter tests to cover toolArgs being JSON strings vs dicts. |
| massgen/task_decomposer.py | Updates decomposition schema example and maps legacy criterion categories to new tiers; threads fast-iteration flag. |
| massgen/system_message_builder.py | Threads anti-patterns, evaluator availability, and fast-iteration flags into system prompt sections. |
| massgen/subagent_types/execution_trace_analyzer/SUBAGENT.md | Refactors subagent spec to DO/DON’T oriented guidance format. |
| massgen/subagent/launch_watcher.py | Adds API to allow additional workspace roots. |
| massgen/skills/massgen/scripts/review_watcher.sh | New watcher script that emits structured markers when review becomes pending/resolved. |
| massgen/skills/massgen/scripts/massgen_run.sh | Adds --web-review wrapper behavior to run MassGen + review watcher together. |
| massgen/skills/massgen/SKILL.md | Updates skill guidance for criteria format, cwd-context usage, and web review workflow. |
| massgen/skills/massgen-log-analyzer/SKILL.md | Updates log analyzer guidance/tool names from propose-improvements to draft-approach. |
| massgen/prompt_improver.py | Threads fast-iteration flag into pre-collab prompt improvement coordination. |
| massgen/persona_generator.py | Threads fast-iteration flag into pre-collab persona generation coordination. |
| massgen/message_templates.py | Tightens final presentation system message to require fully polished output. |
| massgen/mcp_tools/standalone/quality_server.py | Renames propose_improvements → draft_approach, adjusts defaults, and expands payload. |
| massgen/mcp_tools/planning/_planning_mcp_server.py | Renames injection references to draft-approach in comments/logging/help text. |
| massgen/mcp_tools/native_hook_adapters/copilot_adapter.py | Prevents double-encoding when toolArgs arrives as a JSON string. |
| massgen/frontend/web/static/index.html | Updates bundled asset filenames for rebuilt WebUI static output. |
| massgen/frontend/web/static/assets/stateDiagram-v2-4FDKWEC3-CSg-juUb.js | Removes old built asset. |
| massgen/frontend/web/static/assets/stateDiagram-v2-4FDKWEC3-BZytH76p.js.map | Updates source map for rebuilt asset. |
| massgen/frontend/web/static/assets/stateDiagram-v2-4FDKWEC3-BZytH76p.js | Adds rebuilt asset. |
| massgen/frontend/web/static/assets/pieDiagram-ADFJNKIX-nSgxIlGJ.js | Updates rebuilt asset to reference new bundle/chunk names. |
| massgen/frontend/web/static/assets/min-D6HVG0bB.js | Updates rebuilt asset. |
| massgen/frontend/web/static/assets/infoDiagram-WHAUD3N6-CRJbwCDr.js.map | Updates rebuilt asset map. |
| massgen/frontend/web/static/assets/infoDiagram-WHAUD3N6-CRJbwCDr.js | Updates rebuilt asset. |
| massgen/frontend/web/static/assets/flowDiagram-NV44I4VS-DvfftS_T.js | Updates rebuilt asset. |
| massgen/frontend/web/static/assets/diagram-S2PKOQOG-BxWqGtM2.js | Updates rebuilt asset. |
| massgen/frontend/web/static/assets/clone-Dk4sOD41.js.map | Updates rebuilt asset map. |
| massgen/frontend/web/static/assets/clone-Dk4sOD41.js | Adds rebuilt asset. |
| massgen/frontend/web/static/assets/clone-BbJZWq-S.js | Removes old built asset. |
| massgen/frontend/web/static/assets/classDiagram-v2-WZHVMYZB-cXQvqT0Z.js.map | Updates rebuilt asset map. |
| massgen/frontend/web/static/assets/classDiagram-v2-WZHVMYZB-cXQvqT0Z.js | Adds rebuilt asset. |
| massgen/frontend/web/static/assets/classDiagram-v2-WZHVMYZB-DWh8I7Vs.js | Removes old built asset. |
| massgen/frontend/web/static/assets/classDiagram-2ON5EDUG-cXQvqT0Z.js.map | Updates rebuilt asset map. |
| massgen/frontend/web/static/assets/classDiagram-2ON5EDUG-cXQvqT0Z.js | Adds rebuilt asset. |
| massgen/frontend/web/static/assets/classDiagram-2ON5EDUG-DWh8I7Vs.js | Removes old built asset. |
| massgen/frontend/web/static/assets/chunk-TZMSLE5B-CNqpE2vn.js | Updates rebuilt chunk to reference new bundle. |
| massgen/frontend/web/static/assets/chunk-QZHKN3VN-T8lBmuOM.js.map | Updates rebuilt chunk map. |
| massgen/frontend/web/static/assets/chunk-QZHKN3VN-T8lBmuOM.js | Updates rebuilt chunk to reference new bundle. |
| massgen/frontend/web/static/assets/chunk-QN33PNHL-BfOaT5Ka.js.map | Updates rebuilt chunk map. |
| massgen/frontend/web/static/assets/chunk-QN33PNHL-BfOaT5Ka.js | Updates rebuilt chunk to reference new bundle. |
| massgen/frontend/web/static/assets/chunk-FMBD7UC4-DUvoIIV_.js.map | Updates rebuilt chunk map. |
| massgen/frontend/web/static/assets/chunk-FMBD7UC4-DUvoIIV_.js | Updates rebuilt chunk to reference new bundle. |
| massgen/frontend/web/static/assets/chunk-55IACEB6-BcS3BSyB.js.map | Updates rebuilt chunk map. |
| massgen/frontend/web/static/assets/chunk-55IACEB6-BcS3BSyB.js | Updates rebuilt chunk to reference new bundle. |
| massgen/frontend/web/static/assets/chunk-4BX2VUAB-B2PxL9r5.js.map | Updates rebuilt chunk map. |
| massgen/frontend/web/static/assets/chunk-4BX2VUAB-B2PxL9r5.js | Updates rebuilt chunk to reference new bundle. |
| massgen/frontend/web/static/assets/channel-JJR5B7js.js | Removes old built asset. |
| massgen/frontend/web/static/assets/channel-0bNEMcha.js.map | Adds rebuilt asset map. |
| massgen/frontend/web/static/assets/channel-0bNEMcha.js | Adds rebuilt asset. |
| massgen/frontend/web/static/assets/base-80a1f760-DCEq9MnE.js.map | Updates rebuilt asset map. |
| massgen/frontend/web/static/assets/base-80a1f760-DCEq9MnE.js | Updates rebuilt asset. |
| massgen/frontend/web/static/assets/arc-gU3KohN6.js | Updates rebuilt asset. |
| massgen/frontend/displays/web_display.py | Adds review modal lifecycle support (WS events + REST resolution + status markers/files). |
| massgen/frontend/displays/textual/widgets/modals/content_modals.py | Adds new category badges and defaults for criteria modal (with legacy fallbacks). |
| massgen/frontend/displays/rich_terminal_display.py | Improves context-usage explanation and warning text. |
| massgen/filesystem_manager/_path_permission_manager.py | Hardens tool-args parsing to avoid crashes when args aren’t dict-shaped. |
| massgen/filesystem_manager/_isolation_context_manager.py | Clears non-git workspace files between rounds in workspace mode. |
| massgen/filesystem_manager/_filesystem_manager.py | Prunes stale workspaces and optionally cleans up main workspace under .massgen/workspaces/. |
| massgen/coordination_tracker.py | Defaults criteria category to standard and adds review_pending/waiting state to status output. |
| massgen/configs/features/fast_iteration.yaml | New example config enabling fast-iteration mode + docker/code-mode defaults. |
| massgen/config_validator.py | Allows new category values and validates fast_iteration_mode. |
| massgen/config_builder.py | Centralizes docker backend defaults into a shared constant and uses it for quickstart config generation. |
| massgen/cli.py | Adds workspace routing under .massgen/workspaces/, propagates --web-review, and updates criteria parsing defaults. |
| massgen/backend/base.py | Computes context_usage_pct based on per-round input tokens rather than cumulative. |
| massgen/api_params_handler/_api_params_handler_base.py | Updates excluded param comment to reference draft_approach flow. |
| massgen/agent_config.py | Adds auto_trace_analysis, web_review, and fast_iteration_mode to coordination config. |
| docs/source/user_guide/concepts.rst | Adds checklist-gated evaluation section and updates tool naming (draft_approach). |
| docs/source/reference/yaml_schema.rst | Documents fast_iteration_mode under coordination config schema. |
| docs/modules/subagents.md | Updates subagent documentation for tool naming changes. |
| docs/modules/injection.md | Updates injection docs to reference draft_approach. |
| docs/modules/coordination_workflow.md | Updates workflow docs and describes fast-iteration mode behavior. |
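
For the _isolation_context_manager.py entry above (clearing non-git workspace files between rounds), a minimal sketch of the idea follows. The function name, the `keep` parameter, and the choice to work only on top-level entries are assumptions; the PR states only that `.git/` is excluded from the clearing.

```python
import shutil
from pathlib import Path


def clear_workspace(workspace: Path, keep: frozenset = frozenset({".git"})) -> list:
    """Delete every top-level entry in `workspace` except the names in `keep`."""
    removed = []
    for entry in sorted(workspace.iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            # Regular files and symlinks (including links to directories).
            entry.unlink()
        removed.append(entry.name)
    return removed
```

Leaving `.git` in place means the next round starts from an empty tree while history and diffs against earlier rounds remain available.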


Comment on lines +954 to +957
    request_json = {
        "review_pending": True,
        "url": "http://localhost:8000/?v=2",
        "api_url": f"http://localhost:8000/api/sessions/{self.session_id}/review-response",

Copilot AI Mar 30, 2026

The review URL/API URL written to review_request.json is hard-coded to http://localhost:8000, which will be wrong when the WebUI is served on a different host/port (e.g., --web-port, remote server). Consider deriving this from the server’s configured base URL (passed into WebDisplay via kwargs) so external watchers and logs point to the correct address.

Suggested change
    -request_json = {
    -    "review_pending": True,
    -    "url": "http://localhost:8000/?v=2",
    -    "api_url": f"http://localhost:8000/api/sessions/{self.session_id}/review-response",
    +base_url = (
    +    getattr(self, "web_base_url", None)
    +    or getattr(self, "base_url", None)
    +    or getattr(self, "server_base_url", None)
    +    or "http://localhost:8000"
    +)
    +base_url = base_url.rstrip("/")
    +request_json = {
    +    "review_pending": True,
    +    "url": f"{base_url}/?v=2",
    +    "api_url": f"{base_url}/api/sessions/{self.session_id}/review-response",

Comment on lines +98 to +105
    # Wait briefly for LOG_DIR to be printed, then extract it from status.json
    sleep 3
    # Find the latest log dir from ~/.massgen/massgen_logs
    LOG_DIR=$(ls -td ~/.massgen/massgen_logs/*/ 2>/dev/null | head -1)
    if [[ -n "$LOG_DIR" ]]; then
        "$SKILL_DIR/scripts/review_watcher.sh" "$LOG_DIR" &
        WATCHER_PID=$!
    fi

Copilot AI Mar 30, 2026

When --web-review is enabled, the script picks LOG_DIR via ls -td ~/.massgen/massgen_logs/* | head -1. If multiple runs are active (or another run finishes later), the watcher can attach to the wrong session. Prefer parsing the actual LOG_DIR from MassGen’s stdout (it prints [WebUI] LOG_DIR: ...) or accept it explicitly as an argument/output from the MassGen process you just spawned.
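
One way to picture the stdout-parsing alternative, written in Python rather than bash for illustration. The regular expression assumes the marker looks exactly like the `[WebUI] LOG_DIR: ...` text quoted in the comment, and `find_log_dir` is a hypothetical helper, not part of MassGen.

```python
import re
from typing import Iterable, Optional

# Assumed line shape, based on the "[WebUI] LOG_DIR: ..." marker quoted above.
LOG_DIR_PATTERN = re.compile(r"^\[WebUI\] LOG_DIR:\s*(?P<path>.+?)\s*$")


def find_log_dir(lines: Iterable[str]) -> Optional[str]:
    """Return the first LOG_DIR path announced in the output, or None."""
    for line in lines:
        match = LOG_DIR_PATTERN.match(line.rstrip("\n"))
        if match:
            return match.group("path")
    return None


if __name__ == "__main__":
    sample = ["starting up\n", "[WebUI] LOG_DIR: /tmp/logs/run_1\n", "ready\n"]
    print(find_log_dir(sample))
```

Because the function takes any iterable of lines, it could be fed the stdout of the spawned process directly, which ties the watcher to that process instead of to whichever directory happens to be newest.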

Comment on lines +8 to +10
    # Usage:
    # review_watcher.sh <LOG_DIR> [--poll-interval SECONDS]
    #

Copilot AI Mar 30, 2026

The usage text advertises review_watcher.sh <LOG_DIR> [--poll-interval SECONDS], but the script actually treats $2 as the poll interval and does not parse a --poll-interval flag. Either implement flag parsing (recommended) or update the usage/docs to match the positional-arg behavior.

Comment on lines +86 to +90
    export const useReviewStore = create<ReviewState & ReviewActions>(
      (set, get) => ({
        // Internal: WebSocket send function (set via setSendFn)
        _sendFn: null as ((data: Record<string, unknown>) => void) | null,

Copilot AI Mar 30, 2026

useReviewStore is typed as ReviewState & ReviewActions, but the initializer adds _sendFn, which isn’t declared in either interface. This typically triggers an excess-property type error and forces the later any casts. Consider modeling _sendFn in the store type (e.g., a separate internal interface) or keeping the send function in a module-level variable/closure so the store remains fully typed without any casts.

Comment on lines +61 to +63
    // Match "diff --git a/path b/path" or "+++ b/path"
    const diffMatch = line.match(/^diff --git a\/(.+?) b\/(.+)/);
    if (diffMatch) {

Copilot AI Mar 30, 2026

The comment says it matches both diff --git ... and +++ b/path, but the implementation only matches ^diff --git .... Either update the comment to reflect the actual behavior, or extend the parser to handle diffs that start at +++/--- (some patch formats omit the diff --git line).
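
The second option (handling both header styles) could look roughly like the sketch below. It is a Python illustration of the matching logic, not a port of reviewStore.ts; `changed_paths` is a hypothetical name, and the sketch does not guard against a hunk body line that happens to begin with `+++ b/`.

```python
import re

DIFF_GIT = re.compile(r"^diff --git a/(.+?) b/(.+)$")
PLUS_HEADER = re.compile(r"^\+\+\+ b/(.+)$")


def changed_paths(diff_text: str) -> list:
    """Collect new-side file paths from a unified diff, in order, without duplicates."""
    paths = []
    for line in diff_text.splitlines():
        path = None
        git_match = DIFF_GIT.match(line)
        if git_match:
            path = git_match.group(2)
        else:
            plus_match = PLUS_HEADER.match(line)
            if plus_match:
                path = plus_match.group(1)
        # A git-style diff names each file twice; record it only once.
        if path is not None and path not in paths:
            paths.append(path)
    return paths
```

De-duplicating on the path keeps git-style input (which carries both headers for the same file) from producing two entries, while patches that omit the `diff --git` line are still picked up through `+++ b/`.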

@Henry-811 Henry-811 changed the base branch from main to dev/v0.1.70 March 30, 2026 16:13
@Henry-811 Henry-811 merged commit fab6ffb into dev/v0.1.70 Mar 30, 2026
20 checks passed

coderabbitai bot commented Mar 30, 2026

✅ Actions performed

Reviews paused.

@coderabbitai coderabbitai bot mentioned this pull request Apr 1, 2026
18 tasks
3 participants