Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues#4381
Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues#4381
Conversation
Integrate findings from CS-10672 implementation directly into the relevant plan sections rather than appending notes: - Selection algorithm uses backlog (not ready) to match darkfactory.gts - Priority enum includes critical - Issue lifecycle uses backlog → in_progress → done/blocked/review - Exhausted issues tracked via exclusion set to prevent re-picking - RealmIssueStore uses absolute darkfactory module URL (env-sensitive) - Boxel linksToMany uses dotted-key format for blockedBy relationships - Loop outcome disambiguation table for edge cases - Migration path references actual file organization and CS-10708 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a modular validation pipeline that runs after every inner-loop agent turn. The pipeline runs all steps concurrently via Promise.allSettled() and aggregates results. Each step is a separate module implementing the ValidationStepRunner interface — adding a new step is one file plus one line in createDefaultPipeline(). Key changes: - ValidationPipeline class implementing the Validator interface with concurrent step execution and per-step exception capture - TestValidationStep wrapping executeTestRunFromRealm() with detailed failure data read back from the completed TestRun card - NoOpStepRunner placeholders for parse, lint, evaluate, instantiate (child tickets CS-10713 through CS-10716) - Step-specific failure shapes via details field on ValidationStepResult - formatForContext() on each step for LLM-friendly failure output - Max iterations + failing validation now blocks the issue with the reason and failure context in the issue description - Extract fetchRealmFilenames() from pullRealmFiles() for reuse - Optional updateIssue() on IssueStore for max-iteration blocking - Updated phase-2-plan.md with validation architecture documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…es is standalone pullRealmFiles is dead code (only used in its own tests) and doesn't need to be refactored. fetchRealmFilenames stands on its own for the test validation step's file discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ersistence warning - Fix test-step error mapping: fall back to handle.errorMessage when details.failures is empty (infrastructure errors with readable TestRun) - Remove incorrect "no-op" log suffix in smoke test - Improve createDefaultPipeline test to actually verify 5 steps in order - Add warning log when IssueStore lacks updateIssue for max-iteration blocking Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lint - Make IssueStore.updateIssue required (not optional) - Implement updateIssue on RealmIssueStore: read card, mutate status/description, write back to realm - Update all MockIssueStore implementations to include updateIssue - Fix prettier formatting and non-null assertion lint errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update orchestrator pseudocode to reflect max-iterations→blocked behavior - Document IssueStore.updateIssue and orchestrator realm writes - Update Issue Lifecycle section with max-iteration blocking - Clean up --jwt reference in boxel-cli auth section - Fix NoOpStepRunner to declare parameter names (TS strict mode) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The validation pipeline's TestValidationStep already exercises executeTestRunFromRealm internally, so the separate direct call was redundant. The smoke test now goes straight from writing test files to running the ValidationPipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er interface Add missing _targetRealmUrl parameter to run() and _result parameter to formatForContext() on MockStepRunner and ThrowingStepRunner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move exitReason = 'blocked' inside the try block so it's only set after issueStore.updateIssue succeeds. If the realm write fails, exitReason stays max_iterations to accurately reflect that the issue wasn't actually blocked in the realm. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- factory-seed.spec.ts: fixture realm has pre-existing issues, so use find() instead of asserting exactly 1 issue from listIssues() - factory-target-realm.spec.ts: update from old summary.bootstrap shape to new summary.seedIssue shape, verify seed issue exists in realm instead of Project card (Project is now created by the agent, not the entrypoint) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ecific issues Replace `validator: Validator` with `createValidator: (issueId: string) => Validator` in IssueLoopConfig. The outer loop now creates a fresh validator per issue, passing the issue ID so artifacts like TestRun cards are scoped per-issue instead of shared. Closes CS-10723 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f74ab93bc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ionpipeline-issue-scoped-so-testrun-cards
…slug - Pass issueId as issueURL from TestValidationStep to executeTestRunFn so the TestRun card gets a proper linksTo relationship to the driving issue - Add issue slug to the QUnit isolated template <title> for observability - Scope getNextSequenceNumber by slug (using _mtimes filename listing) so each issue's test runs start from sequence 1 independently - Show issue.summary in the TestRun fitted/embedded view when linked - Merge upstream main (includes test-results.gts module grouping changes) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR makes validation (especially test execution) issue-scoped in the phase-2 issue loop, so TestRun cards can be correlated back to the specific Issue that triggered them and have per-issue naming/sequence behavior.
Changes:
- Replace a shared
validatorinstance withcreateValidator(issueId)to instantiate a fresh validator per issue (enabling per-issue scoping). - Make TestRun creation issue-aware by threading
issueURLand using an issue-derived slug, plus slug-scoped sequence numbering via_mtimesfilename listing. - Improve observability and coverage: show linked issue summary in TestRun views, add new unit tests for the validation pipeline/test step, and update smoke tests/docs accordingly.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/software-factory/src/issue-loop.ts | Switch to createValidator(issueId) and block issues on max-iterations + failing validation with formatted context persisted to the issue description. |
| packages/software-factory/src/issue-scheduler.ts | Extend IssueStore with updateIssue() and implement realm-backed read-modify-write updates for Issue cards. |
| packages/software-factory/src/validators/validation-pipeline.ts | Add modular ValidationPipeline with concurrent step execution and context formatting; provide createDefaultPipeline(). |
| packages/software-factory/src/validators/test-step.ts | Add TestValidationStep that discovers .test.gts, runs tests, links TestRun → Issue, and formats rich failure context from the TestRun card. |
| packages/software-factory/src/validators/noop-step.ts | Add NoOpStepRunner placeholder for unimplemented steps. |
| packages/software-factory/src/test-run-execution.ts | Scope TestRun sequence numbering by slug using _mtimes filename listing; include slug in the isolated test page <title>. |
| packages/software-factory/src/realm-operations.ts | Add fetchRealmFilenames() helper backed by _mtimes (reused by validation and test-run sequencing). |
| packages/software-factory/src/factory-agent-types.ts | Add details?: Record<string, unknown> to ValidationStepResult for step-specific structured context. |
| packages/software-factory/realm/test-results.gts | Show linked issue summary in TestRun fitted/embedded rendering when present. |
| packages/software-factory/realm/tsconfig.json | Add a realm-local TS config for typechecking realm sources. |
| packages/software-factory/tests/validation-pipeline.test.ts | New unit tests for pipeline concurrency, error capture, ordering, and formatting. |
| packages/software-factory/tests/test-step.test.ts | New unit tests for TestValidationStep behavior including slug derivation from issue ID. |
| packages/software-factory/tests/issue-loop.test.ts | Update issue-loop tests for createValidator, max-iteration blocking behavior, and new factory-call ordering test. |
| packages/software-factory/tests/issue-scheduler.test.ts | Update mock IssueStore to implement updateIssue() required by the scheduler interface. |
| packages/software-factory/tests/factory-test-realm.test.ts | Extend fetch mock to support _mtimes endpoint used by sequence-number logic. |
| packages/software-factory/tests/factory-target-realm.spec.ts | Update E2E expectations to new “seedIssue” summary shape and validate seed issue exists/fields. |
| packages/software-factory/tests/index.ts | Include the new/updated test modules in the test entrypoint. |
| packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts | Update smoke tests to use createValidator; add a scenario exercising ValidationPipeline integration with NoOp steps. |
| packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts | Update smoke test to run the full validation pipeline instead of directly calling test execution, and validate formatted context output. |
| packages/software-factory/docs/phase-2-plan.md | Update docs to reflect the implemented scheduler semantics and the modular validation pipeline architecture. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Replace fetchRealmFilenames (O(all files)) with searchRealm filtered to TestRun cards. Filter by card ID matching the slug prefix to get per-issue sequence numbers. More targeted — only fetches TestRun cards, not the entire realm file listing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test expected `summary.seedIssue.seedIssueId` but the actual entrypoint output uses `summary.bootstrap.activeIssue.id`. Updated assertions to match the real FactoryEntrypointSummary structure: - activeIssue.id = 'Issues/sticky-note-define-core' (derived from brief title) - issueType = 'implementation' (not 'bootstrap') - status = 'in_progress' (bootstrap patches the first issue) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lementation' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
IssueLoopConfig.validator: ValidatorwithcreateValidator: (issueId: string) => Validator— a factory that creates a fresh validator per issue in the outer loopcreateValidator(issue.id)for each issue, so TestRun cards are namedTest Runs/<issue-slug>-<n>instead of the sharedTest Runs/validation-<n>issueURLfromTestValidationSteptoexecuteTestRunFnso the TestRun card gets a properlinksTorelationship to the driving Issue card<title>for debugging visibilitygetNextSequenceNumberby slug (via_mtimesfilename listing) so each issue's test runs start from sequence 1 independentlycreateValidatorreceives correct issue IDs in orderCloses CS-10723
Test plan
resolveTestRuntests for slug-scoped sequences)createValidator receives issue IDverifies factory is called with correct issue IDs🤖 Generated with Claude Code