Skip to content

Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues#4381

Open
habdelra wants to merge 19 commits intomainfrom
cs-10723-make-validationpipeline-issue-scoped-so-testrun-cards
Open

Make ValidationPipeline issue-scoped so TestRun cards correlate to specific issues#4381
habdelra wants to merge 19 commits intomainfrom
cs-10723-make-validationpipeline-issue-scoped-so-testrun-cards

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented Apr 10, 2026

Summary

  • Replace IssueLoopConfig.validator: Validator with createValidator: (issueId: string) => Validator — a factory that creates a fresh validator per issue in the outer loop
  • The outer loop now calls createValidator(issue.id) for each issue, so TestRun cards are named Test Runs/<issue-slug>-<n> instead of the shared Test Runs/validation-<n>
  • Thread issueURL from TestValidationStep to executeTestRunFn so the TestRun card gets a proper linksTo relationship to the driving Issue card
  • Show the linked issue's summary in the TestRun fitted/embedded view for at-a-glance observability
  • Add the issue slug to the QUnit isolated template <title> for debugging visibility
  • Scope getNextSequenceNumber by slug (via _mtimes filename listing) so each issue's test runs start from sequence 1 independently
  • Add test verifying createValidator receives correct issue IDs in order

Closes CS-10723

Depends on: #4375 must be reviewed and merged first — this branch is based on cs-10675-validation-pipeline.

Test plan

  • All 451 node unit tests pass (including updated resolveTestRun tests for slug-scoped sequences)
  • All 22 issue-loop smoke test checks pass
  • ESLint + Prettier pass
  • New test createValidator receives issue ID verifies factory is called with correct issue IDs
  • Merged upstream main (includes test-results.gts module grouping changes)
  • Verify TestRun fitted view shows issue summary when linked (requires realm with Issue cards)

🤖 Generated with Claude Code

habdelra and others added 14 commits April 9, 2026 18:24
Integrate findings from CS-10672 implementation directly into the
relevant plan sections rather than appending notes:

- Selection algorithm uses backlog (not ready) to match darkfactory.gts
- Priority enum includes critical
- Issue lifecycle uses backlog → in_progress → done/blocked/review
- Exhausted issues tracked via exclusion set to prevent re-picking
- RealmIssueStore uses absolute darkfactory module URL (env-sensitive)
- Boxel linksToMany uses dotted-key format for blockedBy relationships
- Loop outcome disambiguation table for edge cases
- Migration path references actual file organization and CS-10708

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a modular validation pipeline that runs after every inner-loop agent
turn. The pipeline runs all steps concurrently via Promise.allSettled()
and aggregates results. Each step is a separate module implementing the
ValidationStepRunner interface — adding a new step is one file plus one
line in createDefaultPipeline().

Key changes:
- ValidationPipeline class implementing the Validator interface with
  concurrent step execution and per-step exception capture
- TestValidationStep wrapping executeTestRunFromRealm() with detailed
  failure data read back from the completed TestRun card
- NoOpStepRunner placeholders for parse, lint, evaluate, instantiate
  (child tickets CS-10713 through CS-10716)
- Step-specific failure shapes via details field on ValidationStepResult
- formatForContext() on each step for LLM-friendly failure output
- Max iterations + failing validation now blocks the issue with the
  reason and failure context in the issue description
- Extract fetchRealmFilenames() from pullRealmFiles() for reuse
- Optional updateIssue() on IssueStore for max-iteration blocking
- Updated phase-2-plan.md with validation architecture documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…es is standalone

pullRealmFiles is dead code (only used in its own tests) and doesn't
need to be refactored. fetchRealmFilenames stands on its own for the
test validation step's file discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ersistence warning

- Fix test-step error mapping: fall back to handle.errorMessage when
  details.failures is empty (infrastructure errors with readable TestRun)
- Remove incorrect "no-op" log suffix in smoke test
- Improve createDefaultPipeline test to actually verify 5 steps in order
- Add warning log when IssueStore lacks updateIssue for max-iteration blocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lint

- Make IssueStore.updateIssue required (not optional)
- Implement updateIssue on RealmIssueStore: read card, mutate
  status/description, write back to realm
- Update all MockIssueStore implementations to include updateIssue
- Fix prettier formatting and non-null assertion lint errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update orchestrator pseudocode to reflect max-iterations→blocked behavior
- Document IssueStore.updateIssue and orchestrator realm writes
- Update Issue Lifecycle section with max-iteration blocking
- Clean up --jwt reference in boxel-cli auth section
- Fix NoOpStepRunner to declare parameter names (TS strict mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The import was ../setup-logger (resolving to scripts/setup-logger which
doesn't exist). Fixed to ../../src/setup-logger matching all other
smoke test scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The validation pipeline's TestValidationStep already exercises
executeTestRunFromRealm internally, so the separate direct call
was redundant. The smoke test now goes straight from writing test
files to running the ValidationPipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er interface

Add missing _targetRealmUrl parameter to run() and _result parameter
to formatForContext() on MockStepRunner and ThrowingStepRunner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move exitReason = 'blocked' inside the try block so it's only set
after issueStore.updateIssue succeeds. If the realm write fails,
exitReason stays max_iterations to accurately reflect that the
issue wasn't actually blocked in the realm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- factory-seed.spec.ts: fixture realm has pre-existing issues, so use
  find() instead of asserting exactly 1 issue from listIssues()
- factory-target-realm.spec.ts: update from old summary.bootstrap shape
  to new summary.seedIssue shape, verify seed issue exists in realm
  instead of Project card (Project is now created by the agent, not the
  entrypoint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ecific issues

Replace `validator: Validator` with `createValidator: (issueId: string) => Validator`
in IssueLoopConfig. The outer loop now creates a fresh validator per issue, passing
the issue ID so artifacts like TestRun cards are scoped per-issue instead of shared.

Closes CS-10723

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f74ab93bc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

habdelra and others added 2 commits April 10, 2026 13:38
…slug

- Pass issueId as issueURL from TestValidationStep to executeTestRunFn so
  the TestRun card gets a proper linksTo relationship to the driving issue
- Add issue slug to the QUnit isolated template <title> for observability
- Scope getNextSequenceNumber by slug (using _mtimes filename listing) so
  each issue's test runs start from sequence 1 independently
- Show issue.summary in the TestRun fitted/embedded view when linked
- Merge upstream main (includes test-results.gts module grouping changes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes validation (especially test execution) issue-scoped in the phase-2 issue loop, so TestRun cards can be correlated back to the specific Issue that triggered them and have per-issue naming/sequence behavior.

Changes:

  • Replace a shared validator instance with createValidator(issueId) to instantiate a fresh validator per issue (enabling per-issue scoping).
  • Make TestRun creation issue-aware by threading issueURL and using an issue-derived slug, plus slug-scoped sequence numbering via _mtimes filename listing.
  • Improve observability and coverage: show linked issue summary in TestRun views, add new unit tests for the validation pipeline/test step, and update smoke tests/docs accordingly.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/software-factory/src/issue-loop.ts Switch to createValidator(issueId) and block issues on max-iterations + failing validation with formatted context persisted to the issue description.
packages/software-factory/src/issue-scheduler.ts Extend IssueStore with updateIssue() and implement realm-backed read-modify-write updates for Issue cards.
packages/software-factory/src/validators/validation-pipeline.ts Add modular ValidationPipeline with concurrent step execution and context formatting; provide createDefaultPipeline().
packages/software-factory/src/validators/test-step.ts Add TestValidationStep that discovers .test.gts, runs tests, links TestRun → Issue, and formats rich failure context from the TestRun card.
packages/software-factory/src/validators/noop-step.ts Add NoOpStepRunner placeholder for unimplemented steps.
packages/software-factory/src/test-run-execution.ts Scope TestRun sequence numbering by slug using _mtimes filename listing; include slug in the isolated test page <title>.
packages/software-factory/src/realm-operations.ts Add fetchRealmFilenames() helper backed by _mtimes (reused by validation and test-run sequencing).
packages/software-factory/src/factory-agent-types.ts Add details?: Record<string, unknown> to ValidationStepResult for step-specific structured context.
packages/software-factory/realm/test-results.gts Show linked issue summary in TestRun fitted/embedded rendering when present.
packages/software-factory/realm/tsconfig.json Add a realm-local TS config for typechecking realm sources.
packages/software-factory/tests/validation-pipeline.test.ts New unit tests for pipeline concurrency, error capture, ordering, and formatting.
packages/software-factory/tests/test-step.test.ts New unit tests for TestValidationStep behavior including slug derivation from issue ID.
packages/software-factory/tests/issue-loop.test.ts Update issue-loop tests for createValidator, max-iteration blocking behavior, and new factory-call ordering test.
packages/software-factory/tests/issue-scheduler.test.ts Update mock IssueStore to implement updateIssue() required by the scheduler interface.
packages/software-factory/tests/factory-test-realm.test.ts Extend fetch mock to support _mtimes endpoint used by sequence-number logic.
packages/software-factory/tests/factory-target-realm.spec.ts Update E2E expectations to new “seedIssue” summary shape and validate seed issue exists/fields.
packages/software-factory/tests/index.ts Include the new/updated test modules in the test entrypoint.
packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts Update smoke tests to use createValidator; add a scenario exercising ValidationPipeline integration with NoOp steps.
packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts Update smoke test to run the full validation pipeline instead of directly calling test execution, and validate formatted context output.
packages/software-factory/docs/phase-2-plan.md Update docs to reflect the implemented scheduler semantics and the modular validation pipeline architecture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

habdelra and others added 3 commits April 10, 2026 14:04
Replace fetchRealmFilenames (O(all files)) with searchRealm filtered to
TestRun cards. Filter by card ID matching the slug prefix to get per-issue
sequence numbers. More targeted — only fetches TestRun cards, not the
entire realm file listing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test expected `summary.seedIssue.seedIssueId` but the actual
entrypoint output uses `summary.bootstrap.activeIssue.id`. Updated
assertions to match the real FactoryEntrypointSummary structure:
- activeIssue.id = 'Issues/sticky-note-define-core' (derived from brief title)
- issueType = 'implementation' (not 'bootstrap')
- status = 'in_progress' (bootstrap patches the first issue)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lementation'

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team April 10, 2026 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants