Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues by habdelra · Pull Request #4381 · cardstack/boxel

habdelra · 2026-04-10T17:25:28Z

Summary

Replace IssueLoopConfig.validator: Validator with createValidator: (issueId: string) => Validator — a factory that creates a fresh validator per issue in the outer loop
The outer loop now calls createValidator(issue.id) for each issue, so TestRun cards are named Test Runs/<issue-slug>-<n> instead of the shared Test Runs/validation-<n>
Thread issueURL from TestValidationStep to executeTestRunFn so the TestRun card gets a proper linksTo relationship to the driving Issue card
Show the linked issue's summary in the TestRun fitted/embedded view for at-a-glance observability
Add the issue slug to the QUnit isolated template <title> for debugging visibility
Scope getNextSequenceNumber by slug (via _mtimes filename listing) so each issue's test runs start from sequence 1 independently
Add test verifying createValidator receives correct issue IDs in order

Closes CS-10723

Depends on: #4375 must be reviewed and merged first — this branch is based on cs-10675-validation-pipeline.

Test plan

All 451 node unit tests pass (including updated resolveTestRun tests for slug-scoped sequences)
All 22 issue-loop smoke test checks pass
ESLint + Prettier pass
New test createValidator receives issue ID verifies factory is called with correct issue IDs
Merged upstream main (includes test-results.gts module grouping changes)
Verify TestRun fitted view shows issue summary when linked (requires realm with Issue cards)

🤖 Generated with Claude Code

Integrate findings from CS-10672 implementation directly into the relevant plan sections rather than appending notes: - Selection algorithm uses backlog (not ready) to match darkfactory.gts - Priority enum includes critical - Issue lifecycle uses backlog → in_progress → done/blocked/review - Exhausted issues tracked via exclusion set to prevent re-picking - RealmIssueStore uses absolute darkfactory module URL (env-sensitive) - Boxel linksToMany uses dotted-key format for blockedBy relationships - Loop outcome disambiguation table for edge cases - Migration path references actual file organization and CS-10708 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a modular validation pipeline that runs after every inner-loop agent turn. The pipeline runs all steps concurrently via Promise.allSettled() and aggregates results. Each step is a separate module implementing the ValidationStepRunner interface — adding a new step is one file plus one line in createDefaultPipeline(). Key changes: - ValidationPipeline class implementing the Validator interface with concurrent step execution and per-step exception capture - TestValidationStep wrapping executeTestRunFromRealm() with detailed failure data read back from the completed TestRun card - NoOpStepRunner placeholders for parse, lint, evaluate, instantiate (child tickets CS-10713 through CS-10716) - Step-specific failure shapes via details field on ValidationStepResult - formatForContext() on each step for LLM-friendly failure output - Max iterations + failing validation now blocks the issue with the reason and failure context in the issue description - Extract fetchRealmFilenames() from pullRealmFiles() for reuse - Optional updateIssue() on IssueStore for max-iteration blocking - Updated phase-2-plan.md with validation architecture documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…es is standalone pullRealmFiles is dead code (only used in its own tests) and doesn't need to be refactored. fetchRealmFilenames stands on its own for the test validation step's file discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ersistence warning - Fix test-step error mapping: fall back to handle.errorMessage when details.failures is empty (infrastructure errors with readable TestRun) - Remove incorrect "no-op" log suffix in smoke test - Improve createDefaultPipeline test to actually verify 5 steps in order - Add warning log when IssueStore lacks updateIssue for max-iteration blocking Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… lint - Make IssueStore.updateIssue required (not optional) - Implement updateIssue on RealmIssueStore: read card, mutate status/description, write back to realm - Update all MockIssueStore implementations to include updateIssue - Fix prettier formatting and non-null assertion lint errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update orchestrator pseudocode to reflect max-iterations→blocked behavior - Document IssueStore.updateIssue and orchestrator realm writes - Update Issue Lifecycle section with max-iteration blocking - Clean up --jwt reference in boxel-cli auth section - Fix NoOpStepRunner to declare parameter names (TS strict mode) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The validation pipeline's TestValidationStep already exercises executeTestRunFromRealm internally, so the separate direct call was redundant. The smoke test now goes straight from writing test files to running the ValidationPipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…er interface Add missing _targetRealmUrl parameter to run() and _result parameter to formatForContext() on MockStepRunner and ThrowingStepRunner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move exitReason = 'blocked' inside the try block so it's only set after issueStore.updateIssue succeeds. If the realm write fails, exitReason stays max_iterations to accurately reflect that the issue wasn't actually blocked in the realm. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- factory-seed.spec.ts: fixture realm has pre-existing issues, so use find() instead of asserting exactly 1 issue from listIssues() - factory-target-realm.spec.ts: update from old summary.bootstrap shape to new summary.seedIssue shape, verify seed issue exists in realm instead of Project card (Project is now created by the agent, not the entrypoint) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ecific issues Replace `validator: Validator` with `createValidator: (issueId: string) => Validator` in IssueLoopConfig. The outer loop now creates a fresh validator per issue, passing the issue ID so artifacts like TestRun cards are scoped per-issue instead of shared. Closes CS-10723 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f74ab93bc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

packages/software-factory/src/issue-scheduler.ts

…ionpipeline-issue-scoped-so-testrun-cards

…slug - Pass issueId as issueURL from TestValidationStep to executeTestRunFn so the TestRun card gets a proper linksTo relationship to the driving issue - Add issue slug to the QUnit isolated template <title> for observability - Scope getNextSequenceNumber by slug (using _mtimes filename listing) so each issue's test runs start from sequence 1 independently - Show issue.summary in the TestRun fitted/embedded view when linked - Merge upstream main (includes test-results.gts module grouping changes) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR makes validation (especially test execution) issue-scoped in the phase-2 issue loop, so TestRun cards can be correlated back to the specific Issue that triggered them and have per-issue naming/sequence behavior.

Changes:

Replace a shared validator instance with createValidator(issueId) to instantiate a fresh validator per issue (enabling per-issue scoping).
Make TestRun creation issue-aware by threading issueURL and using an issue-derived slug, plus slug-scoped sequence numbering via _mtimes filename listing.
Improve observability and coverage: show linked issue summary in TestRun views, add new unit tests for the validation pipeline/test step, and update smoke tests/docs accordingly.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
packages/software-factory/src/issue-loop.ts	Switch to `createValidator(issueId)` and block issues on max-iterations + failing validation with formatted context persisted to the issue description.
packages/software-factory/src/issue-scheduler.ts	Extend `IssueStore` with `updateIssue()` and implement realm-backed read-modify-write updates for Issue cards.
packages/software-factory/src/validators/validation-pipeline.ts	Add modular `ValidationPipeline` with concurrent step execution and context formatting; provide `createDefaultPipeline()`.
packages/software-factory/src/validators/test-step.ts	Add `TestValidationStep` that discovers `.test.gts`, runs tests, links TestRun → Issue, and formats rich failure context from the TestRun card.
packages/software-factory/src/validators/noop-step.ts	Add `NoOpStepRunner` placeholder for unimplemented steps.
packages/software-factory/src/test-run-execution.ts	Scope TestRun sequence numbering by slug using `_mtimes` filename listing; include slug in the isolated test page `<title>`.
packages/software-factory/src/realm-operations.ts	Add `fetchRealmFilenames()` helper backed by `_mtimes` (reused by validation and test-run sequencing).
packages/software-factory/src/factory-agent-types.ts	Add `details?: Record<string, unknown>` to `ValidationStepResult` for step-specific structured context.
packages/software-factory/realm/test-results.gts	Show linked issue summary in TestRun fitted/embedded rendering when present.
packages/software-factory/realm/tsconfig.json	Add a realm-local TS config for typechecking realm sources.
packages/software-factory/tests/validation-pipeline.test.ts	New unit tests for pipeline concurrency, error capture, ordering, and formatting.
packages/software-factory/tests/test-step.test.ts	New unit tests for TestValidationStep behavior including slug derivation from issue ID.
packages/software-factory/tests/issue-loop.test.ts	Update issue-loop tests for `createValidator`, max-iteration blocking behavior, and new factory-call ordering test.
packages/software-factory/tests/issue-scheduler.test.ts	Update mock IssueStore to implement `updateIssue()` required by the scheduler interface.
packages/software-factory/tests/factory-test-realm.test.ts	Extend fetch mock to support `_mtimes` endpoint used by sequence-number logic.
packages/software-factory/tests/factory-target-realm.spec.ts	Update E2E expectations to new “seedIssue” summary shape and validate seed issue exists/fields.
packages/software-factory/tests/index.ts	Include the new/updated test modules in the test entrypoint.
packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts	Update smoke tests to use `createValidator`; add a scenario exercising `ValidationPipeline` integration with NoOp steps.
packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts	Update smoke test to run the full validation pipeline instead of directly calling test execution, and validate formatted context output.
packages/software-factory/docs/phase-2-plan.md	Update docs to reflect the implemented scheduler semantics and the modular validation pipeline architecture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/software-factory/src/test-run-execution.ts

packages/software-factory/src/validators/test-step.ts

Replace fetchRealmFilenames (O(all files)) with searchRealm filtered to TestRun cards. Filter by card ID matching the slug prefix to get per-issue sequence numbers. More targeted — only fetches TestRun cards, not the entire realm file listing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test expected `summary.seedIssue.seedIssueId` but the actual entrypoint output uses `summary.bootstrap.activeIssue.id`. Updated assertions to match the real FactoryEntrypointSummary structure: - activeIssue.id = 'Issues/sticky-note-define-core' (derived from brief title) - issueType = 'implementation' (not 'bootstrap') - status = 'in_progress' (bootstrap patches the first issue) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…lementation' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra and others added 14 commits April 9, 2026 18:24

glint support for software factory realm gts

97cd9f2

Fix setup-logger import path in smoke-test-realm.ts

ef78431

The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Set updatedAt timestamp when updating issue via RealmIssueStore

8c073b7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector bot reviewed Apr 10, 2026

View reviewed changes

packages/software-factory/src/issue-scheduler.ts Show resolved Hide resolved

habdelra and others added 2 commits April 10, 2026 13:38

Merge remote-tracking branch 'origin/main' into cs-10723-make-validat…

3714067

…ionpipeline-issue-scoped-so-testrun-cards

habdelra requested a review from Copilot April 10, 2026 17:49

Copilot started reviewing on behalf of habdelra April 10, 2026 17:50 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

packages/software-factory/src/test-run-execution.ts Outdated Show resolved Hide resolved

packages/software-factory/src/validators/test-step.ts Show resolved Hide resolved

habdelra and others added 3 commits April 10, 2026 14:04

Fix issueType assertion: bootstrap creates 'feature' issues, not 'imp…

e7fd709

…lementation' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra requested a review from a team April 10, 2026 19:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues#4381

Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues#4381
habdelra wants to merge 19 commits intomainfrom
cs-10723-make-validationpipeline-issue-scoped-so-testrun-cards

habdelra commented Apr 10, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

habdelra commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

habdelra commented Apr 10, 2026 •

edited

Loading