Skip to content

Surface skipped/todo QUnit tests in TestRun results (CS-10651)#4380

Open
habdelra wants to merge 15 commits intomainfrom
cs-10651-surface-skippedtodo-qunit-tests-in-testrun-results-instead
Open

Surface skipped/todo QUnit tests in TestRun results (CS-10651)#4380
habdelra wants to merge 15 commits intomainfrom
cs-10651-surface-skippedtodo-qunit-tests-in-testrun-results-instead

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented Apr 10, 2026

Summary

  • Adds skipped status to TestResultStatusField enum and TestResultEntryData type so QUnit skipped/todo tests are no longer silently mapped to passed
  • Updates parseQunitResults() to map QUnit skipped/todoskipped, and treats all-skipped runs as failed (nothing actually verified)
  • Surfaces skipped tests in the TestRun card UI with a icon, muted italic styling, dedicated "Skipped" section in isolated view, and skippedCount in summaries
  • Updates formatTestResultSummary, validation pipeline context, and prompt loader to include skipped counts for agent visibility
  • Updates agent SKILL.md to explicitly prohibit QUnit.skip() / QUnit.todo() usage

Dependencies

Merge #4375 first — this branch is based on cs-10675-validation-pipeline and will conflict without it.

image

Test plan

  • All 454 existing node tests pass (no regressions)
  • New test: skipped/todo QUnit tests are mapped to skipped status
  • New test: all-skipped run is treated as failed
  • New test: formatTestResultSummary includes skipped count when present
  • New test: skipped count omitted when zero
  • ESLint passes on all changed files

Closes CS-10651

🤖 Generated with Claude Code

habdelra and others added 14 commits April 9, 2026 18:24
Integrate findings from CS-10672 implementation directly into the
relevant plan sections rather than appending notes:

- Selection algorithm uses backlog (not ready) to match darkfactory.gts
- Priority enum includes critical
- Issue lifecycle uses backlog → in_progress → done/blocked/review
- Exhausted issues tracked via exclusion set to prevent re-picking
- RealmIssueStore uses absolute darkfactory module URL (env-sensitive)
- Boxel linksToMany uses dotted-key format for blockedBy relationships
- Loop outcome disambiguation table for edge cases
- Migration path references actual file organization and CS-10708

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a modular validation pipeline that runs after every inner-loop agent
turn. The pipeline runs all steps concurrently via Promise.allSettled()
and aggregates results. Each step is a separate module implementing the
ValidationStepRunner interface — adding a new step is one file plus one
line in createDefaultPipeline().

Key changes:
- ValidationPipeline class implementing the Validator interface with
  concurrent step execution and per-step exception capture
- TestValidationStep wrapping executeTestRunFromRealm() with detailed
  failure data read back from the completed TestRun card
- NoOpStepRunner placeholders for parse, lint, evaluate, instantiate
  (child tickets CS-10713 through CS-10716)
- Step-specific failure shapes via details field on ValidationStepResult
- formatForContext() on each step for LLM-friendly failure output
- Max iterations + failing validation now blocks the issue with the
  reason and failure context in the issue description
- Extract fetchRealmFilenames() from pullRealmFiles() for reuse
- Optional updateIssue() on IssueStore for max-iteration blocking
- Updated phase-2-plan.md with validation architecture documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…es is standalone

pullRealmFiles is dead code (only used in its own tests) and doesn't
need to be refactored. fetchRealmFilenames stands on its own for the
test validation step's file discovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ersistence warning

- Fix test-step error mapping: fall back to handle.errorMessage when
  details.failures is empty (infrastructure errors with readable TestRun)
- Remove incorrect "no-op" log suffix in smoke test
- Improve createDefaultPipeline test to actually verify 5 steps in order
- Add warning log when IssueStore lacks updateIssue for max-iteration blocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lint

- Make IssueStore.updateIssue required (not optional)
- Implement updateIssue on RealmIssueStore: read card, mutate
  status/description, write back to realm
- Update all MockIssueStore implementations to include updateIssue
- Fix prettier formatting and non-null assertion lint errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update orchestrator pseudocode to reflect max-iterations→blocked behavior
- Document IssueStore.updateIssue and orchestrator realm writes
- Update Issue Lifecycle section with max-iteration blocking
- Clean up --jwt reference in boxel-cli auth section
- Fix NoOpStepRunner to declare parameter names (TS strict mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The import was ../setup-logger (resolving to scripts/setup-logger which
doesn't exist). Fixed to ../../src/setup-logger matching all other
smoke test scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The validation pipeline's TestValidationStep already exercises
executeTestRunFromRealm internally, so the separate direct call
was redundant. The smoke test now goes straight from writing test
files to running the ValidationPipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er interface

Add missing _targetRealmUrl parameter to run() and _result parameter
to formatForContext() on MockStepRunner and ThrowingStepRunner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move exitReason = 'blocked' inside the try block so it's only set
after issueStore.updateIssue succeeds. If the realm write fails,
exitReason stays max_iterations to accurately reflect that the
issue wasn't actually blocked in the realm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… them as passed

QUnit skipped and todo tests were previously mapped to 'passed', creating a
blind spot where the agent could write QUnit.skip() tests and see "all passed".
Now they are mapped to a new 'skipped' status that is:
- Terminal (not pending) so resume/isComplete logic still works
- Visually distinct in the TestRun card UI (→ icon, muted italic styling)
- Surfaced in counts (skippedCount on TestModuleResult, TestRun, TestResult)
- Treated as failure when ALL tests are skipped (nothing actually verified)
- Included in formatTestResultSummary and validation context for the agent
- Documented in SKILL.md to tell the agent not to use QUnit.skip()/todo()

Closes CS-10651

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Main refactored the isolated template to use module-group rendering and
moved statusIcon/isComplete/totalCount to class-level getters. Resolved
by keeping main's structure and adding the skipped case to statusIcon,
skipped count display in module-group headers, and status-skipped CSS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ensures QUnit skipped/todo tests are no longer silently treated as passing by extending the test-result model/UI to surface skipped outcomes, and by adding a modular validation pipeline (with a real test step) that can feed richer validation context back into the issue loop and agent prompts.

Changes:

  • Add skipped to test result status modeling, parse QUnit skipped/todo as skipped, and treat “all skipped” runs as failed.
  • Introduce a modular ValidationPipeline with step runners (NoOp placeholders + implemented TestValidationStep) and improve issue-loop behavior when max iterations are reached with failing validation (auto-block + write context).
  • Surface skipped counts and display affordances in TestRun UI, agent prompt context, smoke tests, and documentation.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/software-factory/tests/validation-pipeline.test.ts Adds unit coverage for ValidationPipeline concurrency/error capture/formatting.
packages/software-factory/tests/test-step.test.ts Adds unit coverage for TestValidationStep execution + context formatting.
packages/software-factory/tests/issue-scheduler.test.ts Updates mock store to satisfy new IssueStore contract.
packages/software-factory/tests/issue-loop.test.ts Updates max-iteration behavior expectations + verifies issue blocking update.
packages/software-factory/tests/index.ts Registers new test modules.
packages/software-factory/tests/factory-test-realm.test.ts Adds coverage for skipped/todo mapping and summary formatting.
packages/software-factory/src/validators/validation-pipeline.ts Implements modular validation pipeline with Promise.allSettled aggregation + context formatting.
packages/software-factory/src/validators/test-step.ts Implements test validation step: discover tests, run QUnit via Playwright, read TestRun card, format details.
packages/software-factory/src/validators/noop-step.ts Adds placeholder step runner for unimplemented validation steps.
packages/software-factory/src/test-run-types.ts Extends test run/result types to support skipped counts and statuses.
packages/software-factory/src/test-run-parsing.ts Maps QUnit skipped/todo → skipped, computes skippedCount, and fails all-skipped runs; updates summary formatting.
packages/software-factory/src/realm-operations.ts Adds fetchRealmFilenames() helper via _mtimes for test discovery.
packages/software-factory/src/issue-scheduler.ts Extends IssueStore with updateIssue() and implements read-modify-write updates in RealmIssueStore.
packages/software-factory/src/issue-loop.ts Adds optional formatForContext() to Validator, re-exports validation pipeline, and blocks issues on max-iterations + failing validation.
packages/software-factory/src/factory-prompt-loader.ts Includes skippedCount in prompt context test result payload.
packages/software-factory/src/factory-agent-types.ts Extends TestResult and ValidationStepResult to carry skippedCount and structured details.
packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts Updates smoke test to exercise ValidationPipeline + formatted context output.
packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts Updates max-iteration scenario expectation and adds pipeline integration scenario.
packages/software-factory/realm/tsconfig.json Adds TS config for realm sources.
packages/software-factory/realm/test-results.gts Adds skipped status option, skipped icon/styling, and skippedCount rollups in TestRun/TestModuleResult UI.
packages/software-factory/docs/phase-2-plan.md Updates architecture docs for validation pipeline + max-iteration blocking semantics.
packages/software-factory/.agents/skills/software-factory-operations/SKILL.md Prohibits QUnit.skip()/QUnit.todo() usage; documents how skipped is treated.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Adds skippedCount: 0 to existing formatForContext test details fixtures
and adds two new tests verifying that skipped counts are included in
both passing and failing formatForContext output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested a review from a team April 10, 2026 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants