feat: improve eval criteria, clarity in improvement process by ncrispino · Pull Request #1035 · massgen/MassGen

ncrispino · 2026-03-30T16:01:41Z

PR Title Format

Your PR title must follow the format: <type>: <brief description>

Valid types:

fix: - Bug fixes
feat: - New features
breaking: - Breaking changes
docs: - Documentation updates
refactor: - Code refactoring
test: - Test additions/modifications
chore: - Maintenance tasks
perf: - Performance improvements
style: - Code style changes
ci: - CI/CD configuration changes

Examples:

fix: resolve memory leak in data processing
feat: add export to CSV functionality
breaking: change API response format
docs: update installation guide

Description

Brief description of the changes in this PR

Type of change

Checklist

I have run pre-commit on my changed files and all checks pass
My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Pre-commit status

# Paste the output of running pre-commit on your changed files:
# uv run pre-commit install
# git diff --name-only HEAD~1 | xargs uv run pre-commit run --files # for last commit
# git diff --name-only origin/<base branch>...HEAD | xargs uv run pre-commit run --files # for all commits in PR
# git add <your file> # if any fixes were applied
# git commit -m "chore: apply pre-commit fixes"
# git push origin <branch-name>

How to Test

Add test method for this PR.

Test CLI Command

Write down the test bash command. If there is pre-requests, please emphasize.

Expected Results

Description/screenshots of expected results.

Additional context

Add any other context about the PR here.

Summary by CodeRabbit

New Features
- Added checklist-gated evaluation workflow enabling iterative submission cycles with scoring and improvement proposals before final voting
- Introduced fast iteration mode streamlining multi-round submission phases
- Added web UI review modal for approving and commenting on outputs
- Enabled background trace analysis starting from round 2 onwards
- Redesigned evaluation criteria system with three-tier categorization and anti-pattern definitions
Improvements
- Enhanced workspace cleanup and isolation between rounds
- Refined per-round token usage tracking
- Improved evaluation criteria generation with aspiration statements
Documentation
- Added comprehensive guides for checklist-gated workflows and new configuration options

coderabbitai · 2026-03-30T16:02:02Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 11914265-f45d-4d3e-aa4e-1bd3b87b7dbc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Note

Reviews paused

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR introduces a tool renaming (propose_improvements → draft_approach), adds three new coordination configuration flags (fast_iteration_mode, web_review, auto_trace_analysis), migrates evaluation criteria categories from must/should/could to primary/standard/stretch, implements a WebUI-based review modal for final answers, improves workspace path routing and cleanup, and provides new documentation and example configuration for the fast iteration mode.

Changes

Cohort / File(s)	Summary
Documentation Updates `docs/modules/coordination_workflow.md`, `docs/modules/injection.md`, `docs/modules/subagents.md`, `docs/source/reference/yaml_schema.rst`, `docs/source/user_guide/concepts.rst`	Updated checklist-gated workflow documentation to replace `propose_improvements` with `draft_approach`, added `fast_iteration_mode` schema parameter with phase-skipping behavior, and introduced a new "Checklist-Gated Evaluation" concept section describing the voting workflow and default criteria.
Configuration Schema & Validation `massgen/agent_config.py`, `massgen/config_validator.py`, `massgen/config_builder.py`, `massgen/configs/features/fast_iteration.yaml`	Added three new `CoordinationConfig` boolean flags (`fast_iteration_mode`, `web_review`, `auto_trace_analysis`), refactored Docker backend defaults into a shared constant, updated category validation to accept new `primary/standard/stretch` values, and provided a new fast-iteration example configuration.
Evaluation Criteria Refactoring `massgen/evaluation_criteria_generator.py`	Major schema migration: added `anti_patterns` field, changed `category` semantics from `must/should/could` to `primary/standard/stretch` with legacy mapping, updated default criteria texts, modified `_parse_criteria_response` to return tuple with aspiration, enforced single-primary constraint, and added `fast_iteration_mode` parameter propagation.
CLI & Coordination Integration `massgen/cli.py`, `massgen/coordination_tracker.py`	Added workspace-path routing (`_route_workspace_path`) for relative `cwd` values, introduced `--web-review` CLI flag, extended coordination config parsing for new flags, updated default criteria categories from `"should"` to `"standard"`, and added review-pending status reporting with `finish_reason="waiting_for_review"`.
Workspace Management `massgen/filesystem_manager/_filesystem_manager.py`, `massgen/filesystem_manager/_isolation_context_manager.py`, `massgen/filesystem_manager/_path_permission_manager.py`	Extended workspace cleanup to prune stale agent directories under `.massgen/workspaces/`, added `_is_massgen_workspace()` helper and conditional cleanup of prefixed workspaces, implemented round-by-round workspace clearing (excluding `.git/`), and added type-safety guard for non-dict JSON parsing in tool arguments.
Web UI & Server Review Modal `massgen/frontend/web/server.py`, `massgen/frontend/web_display.py`	Introduced review-modal state management in `WebDisplay` with `show_final_answer_modal()` method that emits WebSocket events and awaits user decision (up to 600s), added `resolve_review()` idempotent handler, introduced REST endpoints (`GET /review`, `POST /review-response`, `GET /subagent/*/events`), added WebSocket `review_response` action, plumbed `web_review` flag from coordination config, and updated docker-backend override to use shared defaults.
Display Updates `massgen/frontend/displays/textual/widgets/modals/content_modals.py`, `massgen/frontend/displays/rich_terminal_display.py`	Updated `EvaluationCriteriaModal` category badge colors for new `primary/standard/stretch` categories with backward-compatible legacy support, changed default category from `"should"` to `"standard"`, and clarified context-window usage metric label to "Peak round input vs context window" with per-round calculation explanation.
Token & Parameter Updates `massgen/backend/base.py`, `massgen/api_params_handler/_api_params_handler_base.py`	Changed context-window usage calculation to use per-round input tokens (`token_usage.input_tokens - round_start_snapshot["input_tokens"]`) instead of cumulative total, and updated `improvements` parameter comment from `propose_improvements` to `draft_approach` gate semantics.

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant Orchestrator
    participant WebServer
    participant Client as Client/WebUI
    participant User

    Agent->>Orchestrator: submit_answer()
    Orchestrator->>WebServer: show_final_answer_modal()
    WebServer->>Client: emit review_request event + display modal
    Client->>User: show answer in review modal
    User->>Client: approve/reject with comments
    Client->>WebServer: POST /api/sessions/{id}/review-response
    WebServer->>Orchestrator: resolve_review(result_data)
    Orchestrator->>Orchestrator: decide continuation based on approval
    Orchestrator->>Agent: proceed to next round or finalize

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: v0.1.58 #957: Overlapping modifications to coordination/checklist workflow, evaluation criteria schema, and filesystem management across the same modules.
feat: v0.1.61 #985: Related changes to coordination config, tool naming (propose_improvements → draft_approach), and subagent documentation updates.
feat: v0.1.67 #1015: Code-level overlap in coordination config serialization, CLI web_review flag parsing, and WebUI review handling logic.

Suggested reviewers

a5507203

🚥 Pre-merge checks | ✅ 3 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is entirely a template with no actual content provided by the author—it contains only blank sections and placeholder instructions without describing the changes, testing approach, or implementation details.	Replace the template with a concrete description of changes, including a summary of evaluation criteria improvements, fast iteration mode, draft_approach renaming, and testing approach.
Documentation Updated	⚠️ Warning	PR introduces user-facing changes without complete documentation coverage including missing migration guide for breaking category system change, no design documentation for review modal and fast_iteration_mode features, and missing runnable examples and --web-review flag documentation.	Add migration guide for category change, create design documents in docs/dev_notes/, add runnable examples to checklist-gated evaluation section, and document --web-review flag in user guide.
Config Parameter Sync	⚠️ Warning	Three new coordination parameters (auto_trace_analysis, web_review, fast_iteration_mode) were added to CoordinationConfig but not added to the exclusion lists in get_base_excluded_config_params() and get_base_excluded_params().	Add the three new coordination parameters to exclusion sets in both backend/base.py and api_params_handler/_api_params_handler_base.py.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat: improve eval criteria, clarity in improvement process' clearly and specifically describes the main changes across the changeset, which focus on evaluation criteria enhancements and renaming propose_improvements to draft_approach.
Docstring Coverage	✅ Passed	Docstring coverage is 92.73% which is sufficient. The required threshold is 80.00%.
Capabilities Registry Check	✅ Passed	PR introduces only documentation, configuration, and workflow changes without new backend or model implementations requiring capabilities registry updates.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dev/v0.1.69-p1

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ncrispino · 2026-03-30T16:02:07Z

@coderabbitai pause

Copilot

Pull request overview

Adds stronger, more explicit evaluation/improvement semantics (new criterion tiers + draft_approach), plus new WebUI flows for (1) pre-collaboration phase visibility and (2) a git-diff review gate before applying write-mode changes.

Changes:

Introduces WebUI pre-collab phase tracking (sidebar + results panel) with subagent event polling, and adds a full-screen review modal driven by new WebSocket events.
Updates checklist-gated evaluation flow across backend/tests/docs (criterion categories, rename propose_improvements → draft_approach, new anti-pattern/aspiration shape, fast-iteration option).
Improves operational robustness: toolArgs normalization for Copilot hooks/PPM, workspace routing/cleanup, review-pending status surfaced in status.json.

Reviewed changes

Copilot reviewed 54 out of 191 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
webui/src/types/index.ts	Adds WS event types + review event payload types for the WebUI review modal.
webui/src/stores/v2/statusStore.ts	Updates eval criterion category comment to reflect new tiering scheme.
webui/src/stores/v2/reviewStore.ts	New Zustand store for parsing diffs, per-file approval selection, and sending review decisions.
webui/src/stores/v2/preCollabStore.ts	New Zustand store modeling pre-collab lifecycle + storing generated personas/criteria/prompt artifacts.
webui/src/stores/v2/preCollabStore.test.ts	Unit tests validating pre-collab event handling, results storage, and panel state.
webui/src/stores/v2/messageStore.ts	Routes specific structured events to `preCollabStore` via dynamic import.
webui/src/stores/agentStore.ts	Notes that pre-collab events are handled in messageStore → preCollabStore path.
webui/src/hooks/useWebSocket.ts	Dispatches review WS events to `reviewStore` and registers a send function for responses.
webui/src/hooks/useSubagentEvents.ts	New polling hook to fetch subagent events and feed them into the message store.
webui/src/components/v2/tiles/SubagentTile.tsx	Enables subagent event polling when viewing a running pre-collab phase tile.
webui/src/components/v2/tiles/PromptBanner.tsx	Adds colors for new criterion categories with legacy fallbacks.
webui/src/components/v2/sidebar/Sidebar.tsx	Adds `PreCollabSection` above channels when phases exist.
webui/src/components/v2/sidebar/PreCollabSection.tsx	New sidebar section rendering phase status and linking to tiles/results.
webui/src/components/v2/layout/PreCollabResultsPanel.tsx	New modal panel showing personas/criteria/prompt outputs by tab.
webui/src/components/v2/layout/AppShell.tsx	Mounts `PreCollabResultsPanel` and `ReviewModal`; suppresses LaunchIndicator during active pre-collab.
massgen/token_manager/token_manager.py	Clarifies semantics for context usage percentage as a per-round cost proxy.
massgen/tests/unit/test_orchestrator_unit.py	Updates expected prompt injection text to reference `draft_approach`.
massgen/tests/test_task_decomposer.py	Updates expected decomposition criteria categories to new scheme.
massgen/tests/test_system_prompt_sections.py	Updates expectations for new tool naming + evaluator availability flag usage.
massgen/tests/test_subagent_mcp_server.py	Adjusts tests for deferred config loading behavior.
massgen/tests/test_standalone_mcp_servers.py	Renames propose-improvements tests to draft-approach equivalents.
massgen/tests/test_specialized_subagents.py	Updates prompt assertions to reference `draft_approach` tool name.
massgen/tests/test_round_evaluator_loop.py	Updates expected injected block wording to `draft_approach`.
massgen/tests/test_read_media_analysis.py	Adjusts expected wording (“foundation” → “fundamental”).
massgen/tests/test_planning_injection.py	Renames propose-improvements references to draft-approach in planning injection tests/docs.
massgen/tests/test_novelty_injection.py	Updates novelty messaging expectations (“CONVERGENCE SIGNAL”) and added guidance assertions.
massgen/tests/test_injection_checklist_guidance.py	Updates injection guidance expectations to reference `draft_approach`.
massgen/tests/test_gepa_evaluation_flow.py	Updates default criteria count/category expectations and parsing return shape.
massgen/tests/test_enforcement_observability.py	Ensures orchestrator mocks include `_review_pending`.
massgen/tests/test_decomposition_mode.py	Updates `_get_active_criteria` return tuple shape and category expectations.
massgen/tests/test_convergence_novelty.py	Updates changedoc category expectations and revised wording assertions.
massgen/tests/test_checklist_criteria_presets.py	Updates valid categories assertions and related preset expectations.
massgen/tests/test_changedoc_system_prompt.py	Updates changedoc system prompt expectations and evaluator availability setup.
massgen/tests/backend/test_copilot_permission_integration.py	Adds regression test ensuring PPM hook tolerates non-dict tool args.
massgen/tests/backend/test_copilot_hooks.py	Expands hook adapter tests to cover toolArgs being JSON strings vs dicts.
massgen/task_decomposer.py	Updates decomposition schema example and maps legacy criterion categories to new tiers; threads fast-iteration flag.
massgen/system_message_builder.py	Threads anti-patterns, evaluator availability, and fast-iteration flags into system prompt sections.
massgen/subagent_types/execution_trace_analyzer/SUBAGENT.md	Refactors subagent spec to DO/DON’T oriented guidance format.
massgen/subagent/launch_watcher.py	Adds API to allow additional workspace roots.
massgen/skills/massgen/scripts/review_watcher.sh	New watcher script that emits structured markers when review becomes pending/resolved.
massgen/skills/massgen/scripts/massgen_run.sh	Adds `--web-review` wrapper behavior to run MassGen + review watcher together.
massgen/skills/massgen/SKILL.md	Updates skill guidance for criteria format, cwd-context usage, and web review workflow.
massgen/skills/massgen-log-analyzer/SKILL.md	Updates log analyzer guidance/tool names from propose-improvements to draft-approach.
massgen/prompt_improver.py	Threads fast-iteration flag into pre-collab prompt improvement coordination.
massgen/persona_generator.py	Threads fast-iteration flag into pre-collab persona generation coordination.
massgen/message_templates.py	Tightens final presentation system message to require fully polished output.
massgen/mcp_tools/standalone/quality_server.py	Renames `propose_improvements` → `draft_approach`, adjusts defaults, and expands payload.
massgen/mcp_tools/planning/_planning_mcp_server.py	Renames injection references to draft-approach in comments/logging/help text.
massgen/mcp_tools/native_hook_adapters/copilot_adapter.py	Prevents double-encoding when toolArgs arrives as a JSON string.
massgen/frontend/web/static/index.html	Updates bundled asset filenames for rebuilt WebUI static output.
massgen/frontend/web/static/assets/stateDiagram-v2-4FDKWEC3-CSg-juUb.js	Removes old built asset.
massgen/frontend/web/static/assets/stateDiagram-v2-4FDKWEC3-BZytH76p.js.map	Updates source map for rebuilt asset.
massgen/frontend/web/static/assets/stateDiagram-v2-4FDKWEC3-BZytH76p.js	Adds rebuilt asset.
massgen/frontend/web/static/assets/pieDiagram-ADFJNKIX-nSgxIlGJ.js	Updates rebuilt asset to reference new bundle/chunk names.
massgen/frontend/web/static/assets/min-D6HVG0bB.js	Updates rebuilt asset.
massgen/frontend/web/static/assets/infoDiagram-WHAUD3N6-CRJbwCDr.js.map	Updates rebuilt asset map.
massgen/frontend/web/static/assets/infoDiagram-WHAUD3N6-CRJbwCDr.js	Updates rebuilt asset.
massgen/frontend/web/static/assets/flowDiagram-NV44I4VS-DvfftS_T.js	Updates rebuilt asset.
massgen/frontend/web/static/assets/diagram-S2PKOQOG-BxWqGtM2.js	Updates rebuilt asset.
massgen/frontend/web/static/assets/clone-Dk4sOD41.js.map	Updates rebuilt asset map.
massgen/frontend/web/static/assets/clone-Dk4sOD41.js	Adds rebuilt asset.
massgen/frontend/web/static/assets/clone-BbJZWq-S.js	Removes old built asset.
massgen/frontend/web/static/assets/classDiagram-v2-WZHVMYZB-cXQvqT0Z.js.map	Updates rebuilt asset map.
massgen/frontend/web/static/assets/classDiagram-v2-WZHVMYZB-cXQvqT0Z.js	Adds rebuilt asset.
massgen/frontend/web/static/assets/classDiagram-v2-WZHVMYZB-DWh8I7Vs.js	Removes old built asset.
massgen/frontend/web/static/assets/classDiagram-2ON5EDUG-cXQvqT0Z.js.map	Updates rebuilt asset map.
massgen/frontend/web/static/assets/classDiagram-2ON5EDUG-cXQvqT0Z.js	Adds rebuilt asset.
massgen/frontend/web/static/assets/classDiagram-2ON5EDUG-DWh8I7Vs.js	Removes old built asset.
massgen/frontend/web/static/assets/chunk-TZMSLE5B-CNqpE2vn.js	Updates rebuilt chunk to reference new bundle.
massgen/frontend/web/static/assets/chunk-QZHKN3VN-T8lBmuOM.js.map	Updates rebuilt chunk map.
massgen/frontend/web/static/assets/chunk-QZHKN3VN-T8lBmuOM.js	Updates rebuilt chunk to reference new bundle.
massgen/frontend/web/static/assets/chunk-QN33PNHL-BfOaT5Ka.js.map	Updates rebuilt chunk map.
massgen/frontend/web/static/assets/chunk-QN33PNHL-BfOaT5Ka.js	Updates rebuilt chunk to reference new bundle.
massgen/frontend/web/static/assets/chunk-FMBD7UC4-DUvoIIV_.js.map	Updates rebuilt chunk map.
massgen/frontend/web/static/assets/chunk-FMBD7UC4-DUvoIIV_.js	Updates rebuilt chunk to reference new bundle.
massgen/frontend/web/static/assets/chunk-55IACEB6-BcS3BSyB.js.map	Updates rebuilt chunk map.
massgen/frontend/web/static/assets/chunk-55IACEB6-BcS3BSyB.js	Updates rebuilt chunk to reference new bundle.
massgen/frontend/web/static/assets/chunk-4BX2VUAB-B2PxL9r5.js.map	Updates rebuilt chunk map.
massgen/frontend/web/static/assets/chunk-4BX2VUAB-B2PxL9r5.js	Updates rebuilt chunk to reference new bundle.
massgen/frontend/web/static/assets/channel-JJR5B7js.js	Removes old built asset.
massgen/frontend/web/static/assets/channel-0bNEMcha.js.map	Adds rebuilt asset map.
massgen/frontend/web/static/assets/channel-0bNEMcha.js	Adds rebuilt asset.
massgen/frontend/web/static/assets/base-80a1f760-DCEq9MnE.js.map	Updates rebuilt asset map.
massgen/frontend/web/static/assets/base-80a1f760-DCEq9MnE.js	Updates rebuilt asset.
massgen/frontend/web/static/assets/arc-gU3KohN6.js	Updates rebuilt asset.
massgen/frontend/displays/web_display.py	Adds review modal lifecycle support (WS events + REST resolution + status markers/files).
massgen/frontend/displays/textual/widgets/modals/content_modals.py	Adds new category badges and defaults for criteria modal (with legacy fallbacks).
massgen/frontend/displays/rich_terminal_display.py	Improves context-usage explanation and warning text.
massgen/filesystem_manager/_path_permission_manager.py	Hardens tool-args parsing to avoid crashes when args aren’t dict-shaped.
massgen/filesystem_manager/_isolation_context_manager.py	Clears non-git workspace files between rounds in workspace mode.
massgen/filesystem_manager/_filesystem_manager.py	Prunes stale workspaces and optionally cleans up main workspace under `.massgen/workspaces/`.
massgen/coordination_tracker.py	Defaults criteria category to `standard` and adds `review_pending`/waiting state to status output.
massgen/configs/features/fast_iteration.yaml	New example config enabling fast-iteration mode + docker/code-mode defaults.
massgen/config_validator.py	Allows new category values and validates `fast_iteration_mode`.
massgen/config_builder.py	Centralizes docker backend defaults into a shared constant and uses it for quickstart config generation.
massgen/cli.py	Adds workspace routing under `.massgen/workspaces/`, propagates `--web-review`, and updates criteria parsing defaults.
massgen/backend/base.py	Computes `context_usage_pct` based on per-round input tokens rather than cumulative.
massgen/api_params_handler/_api_params_handler_base.py	Updates excluded param comment to reference `draft_approach` flow.
massgen/agent_config.py	Adds `auto_trace_analysis`, `web_review`, and `fast_iteration_mode` to coordination config.
docs/source/user_guide/concepts.rst	Adds checklist-gated evaluation section and updates tool naming (`draft_approach`).
docs/source/reference/yaml_schema.rst	Documents `fast_iteration_mode` under coordination config schema.
docs/modules/subagents.md	Updates subagent documentation for tool naming changes.
docs/modules/injection.md	Updates injection docs to reference `draft_approach`.
docs/modules/coordination_workflow.md	Updates workflow docs and describes fast-iteration mode behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-30T16:09:59Z

massgen/frontend/displays/web_display.py

+            request_json = {
+                "review_pending": True,
+                "url": "http://localhost:8000/?v=2",
+                "api_url": f"http://localhost:8000/api/sessions/{self.session_id}/review-response",


The review URL/API URL written to review_request.json is hard-coded to http://localhost:8000, which will be wrong when the WebUI is served on a different host/port (e.g., --web-port, remote server). Consider deriving this from the server’s configured base URL (passed into WebDisplay via kwargs) so external watchers and logs point to the correct address.

Suggested change

request_json = {

"review_pending": True,

"url": "http://localhost:8000/?v=2",

"api_url": f"http://localhost:8000/api/sessions/{self.session_id}/review-response",

base_url = (

getattr(self, "web_base_url", None)

or getattr(self, "base_url", None)

or getattr(self, "server_base_url", None)

or "http://localhost:8000"

)

base_url = base_url.rstrip("/")

request_json = {

"review_pending": True,

"url": f"{base_url}/?v=2",

"api_url": f"{base_url}/api/sessions/{self.session_id}/review-response",

Copilot · 2026-03-30T16:10:00Z

massgen/skills/massgen/scripts/massgen_run.sh

+    # Wait briefly for LOG_DIR to be printed, then extract it from status.json
+    sleep 3
+    # Find the latest log dir from ~/.massgen/massgen_logs
+    LOG_DIR=$(ls -td ~/.massgen/massgen_logs/*/  2>/dev/null | head -1)
+    if [[ -n "$LOG_DIR" ]]; then
+        "$SKILL_DIR/scripts/review_watcher.sh" "$LOG_DIR" &
+        WATCHER_PID=$!
+    fi


When --web-review is enabled, the script picks LOG_DIR via ls -td ~/.massgen/massgen_logs/* | head -1. If multiple runs are active (or another run finishes later), the watcher can attach to the wrong session. Prefer parsing the actual LOG_DIR from MassGen’s stdout (it prints [WebUI] LOG_DIR: ...) or accept it explicitly as an argument/output from the MassGen process you just spawned.

Copilot · 2026-03-30T16:10:00Z

massgen/skills/massgen/scripts/review_watcher.sh

+# Usage:
+#   review_watcher.sh <LOG_DIR> [--poll-interval SECONDS]
+#


The usage text advertises review_watcher.sh <LOG_DIR> [--poll-interval SECONDS], but the script actually treats $2 as the poll interval and does not parse a --poll-interval flag. Either implement flag parsing (recommended) or update the usage/docs to match the positional-arg behavior.

massgen/config_builder.py

Copilot · 2026-03-30T16:10:01Z

webui/src/stores/v2/reviewStore.ts

+export const useReviewStore = create<ReviewState & ReviewActions>(
+  (set, get) => ({
+    // Internal: WebSocket send function (set via setSendFn)
+    _sendFn: null as ((data: Record<string, unknown>) => void) | null,
+


useReviewStore is typed as ReviewState & ReviewActions, but the initializer adds _sendFn, which isn’t declared in either interface. This typically triggers an excess-property type error and forces the later any casts. Consider modeling _sendFn in the store type (e.g., a separate internal interface) or keeping the send function in a module-level variable/closure so the store remains fully typed without any casts.

Copilot · 2026-03-30T16:10:01Z

webui/src/stores/v2/reviewStore.ts

+    // Match "diff --git a/path b/path" or "+++ b/path"
+    const diffMatch = line.match(/^diff --git a\/(.+?) b\/(.+)/);
+    if (diffMatch) {


The comment says it matches both diff --git ... and +++ b/path, but the implementation only matches ^diff --git .... Either update the comment to reflect the actual behavior, or extend the parser to handle diffs that start at +++/--- (some patch formats omit the diff --git line).

webui/src/types/index.ts

coderabbitai · 2026-03-30T16:22:21Z

✅ Actions performed

Reviews paused.

ncrispino added 5 commits March 27, 2026 20:43

Improve eval criteria; add review modal for working in git

fb91ec6

Refactor quickstart

ca0c770

General fixes for subagent

4e666b0

Fix path issues

808c31f

Rename for clearer role; adjust sys prompts

0eb5428

Copilot AI review requested due to automatic review settings March 30, 2026 16:01

Copilot started reviewing on behalf of ncrispino March 30, 2026 16:02 View session

Copilot AI reviewed Mar 30, 2026

View reviewed changes

Fix precommit

d5a1a34

Henry-811 changed the base branch from main to dev/v0.1.70 March 30, 2026 16:13

Henry-811 merged commit fab6ffb into dev/v0.1.70 Mar 30, 2026
20 checks passed

coderabbitai bot mentioned this pull request Apr 1, 2026

feat: v0.1.71 #1040

Merged

18 tasks

-            request_json = {
-                "review_pending": True,
-                "url": "http://localhost:8000/?v=2",
-                "api_url": f"http://localhost:8000/api/sessions/{self.session_id}/review-response",
+            base_url = (
+                getattr(self, "web_base_url", None)
+                or getattr(self, "base_url", None)
+                or getattr(self, "server_base_url", None)
+                or "http://localhost:8000"
+            )
+            base_url = base_url.rstrip("/")
+            request_json = {
+                "review_pending": True,
+                "url": f"{base_url}/?v=2",
+                "api_url": f"{base_url}/api/sessions/{self.session_id}/review-response",

Conversation

ncrispino commented Mar 30, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Title Format

Description

Type of change

Checklist

Pre-commit status

How to Test

Test CLI Command

Expected Results

Additional context

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (3 warnings)

Uh oh!

ncrispino commented Mar 30, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ncrispino commented Mar 30, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 30, 2026 •

edited

Loading