Surface skipped/todo QUnit tests in TestRun results (CS-10651) by habdelra · Pull Request #4380 · cardstack/boxel

habdelra · 2026-04-10T17:23:11Z

Summary

Adds skipped status to TestResultStatusField enum and TestResultEntryData type so QUnit skipped/todo tests are no longer silently mapped to passed
Updates parseQunitResults() to map QUnit skipped/todo → skipped, and treats all-skipped runs as failed (nothing actually verified)
Surfaces skipped tests in the TestRun card UI with a → icon, muted italic styling, dedicated "Skipped" section in isolated view, and skippedCount in summaries
Updates formatTestResultSummary, validation pipeline context, and prompt loader to include skipped counts for agent visibility
Updates agent SKILL.md to explicitly prohibit QUnit.skip() / QUnit.todo() usage

Dependencies

Merge #4375 first — this branch is based on cs-10675-validation-pipeline and will conflict without it.

Test plan

All 454 existing node tests pass (no regressions)
New test: skipped/todo QUnit tests are mapped to skipped status
New test: all-skipped run is treated as failed
New test: formatTestResultSummary includes skipped count when present
New test: skipped count omitted when zero
ESLint passes on all changed files

Closes CS-10651

🤖 Generated with Claude Code

Integrate findings from CS-10672 implementation directly into the relevant plan sections rather than appending notes: - Selection algorithm uses backlog (not ready) to match darkfactory.gts - Priority enum includes critical - Issue lifecycle uses backlog → in_progress → done/blocked/review - Exhausted issues tracked via exclusion set to prevent re-picking - RealmIssueStore uses absolute darkfactory module URL (env-sensitive) - Boxel linksToMany uses dotted-key format for blockedBy relationships - Loop outcome disambiguation table for edge cases - Migration path references actual file organization and CS-10708 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a modular validation pipeline that runs after every inner-loop agent turn. The pipeline runs all steps concurrently via Promise.allSettled() and aggregates results. Each step is a separate module implementing the ValidationStepRunner interface — adding a new step is one file plus one line in createDefaultPipeline(). Key changes: - ValidationPipeline class implementing the Validator interface with concurrent step execution and per-step exception capture - TestValidationStep wrapping executeTestRunFromRealm() with detailed failure data read back from the completed TestRun card - NoOpStepRunner placeholders for parse, lint, evaluate, instantiate (child tickets CS-10713 through CS-10716) - Step-specific failure shapes via details field on ValidationStepResult - formatForContext() on each step for LLM-friendly failure output - Max iterations + failing validation now blocks the issue with the reason and failure context in the issue description - Extract fetchRealmFilenames() from pullRealmFiles() for reuse - Optional updateIssue() on IssueStore for max-iteration blocking - Updated phase-2-plan.md with validation architecture documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…es is standalone pullRealmFiles is dead code (only used in its own tests) and doesn't need to be refactored. fetchRealmFilenames stands on its own for the test validation step's file discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ersistence warning - Fix test-step error mapping: fall back to handle.errorMessage when details.failures is empty (infrastructure errors with readable TestRun) - Remove incorrect "no-op" log suffix in smoke test - Improve createDefaultPipeline test to actually verify 5 steps in order - Add warning log when IssueStore lacks updateIssue for max-iteration blocking Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… lint - Make IssueStore.updateIssue required (not optional) - Implement updateIssue on RealmIssueStore: read card, mutate status/description, write back to realm - Update all MockIssueStore implementations to include updateIssue - Fix prettier formatting and non-null assertion lint errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update orchestrator pseudocode to reflect max-iterations→blocked behavior - Document IssueStore.updateIssue and orchestrator realm writes - Update Issue Lifecycle section with max-iteration blocking - Clean up --jwt reference in boxel-cli auth section - Fix NoOpStepRunner to declare parameter names (TS strict mode) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The validation pipeline's TestValidationStep already exercises executeTestRunFromRealm internally, so the separate direct call was redundant. The smoke test now goes straight from writing test files to running the ValidationPipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…er interface Add missing _targetRealmUrl parameter to run() and _result parameter to formatForContext() on MockStepRunner and ThrowingStepRunner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move exitReason = 'blocked' inside the try block so it's only set after issueStore.updateIssue succeeds. If the realm write fails, exitReason stays max_iterations to accurately reflect that the issue wasn't actually blocked in the realm. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… them as passed QUnit skipped and todo tests were previously mapped to 'passed', creating a blind spot where the agent could write QUnit.skip() tests and see "all passed". Now they are mapped to a new 'skipped' status that is: - Terminal (not pending) so resume/isComplete logic still works - Visually distinct in the TestRun card UI (→ icon, muted italic styling) - Surfaced in counts (skippedCount on TestModuleResult, TestRun, TestResult) - Treated as failure when ALL tests are skipped (nothing actually verified) - Included in formatTestResultSummary and validation context for the agent - Documented in SKILL.md to tell the agent not to use QUnit.skip()/todo() Closes CS-10651 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main refactored the isolated template to use module-group rendering and moved statusIcon/isComplete/totalCount to class-level getters. Resolved by keeping main's structure and adding the skipped case to statusIcon, skipped count display in module-group headers, and status-skipped CSS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR ensures QUnit skipped/todo tests are no longer silently treated as passing by extending the test-result model/UI to surface skipped outcomes, and by adding a modular validation pipeline (with a real test step) that can feed richer validation context back into the issue loop and agent prompts.

Changes:

Add skipped to test result status modeling, parse QUnit skipped/todo as skipped, and treat “all skipped” runs as failed.
Introduce a modular ValidationPipeline with step runners (NoOp placeholders + implemented TestValidationStep) and improve issue-loop behavior when max iterations are reached with failing validation (auto-block + write context).
Surface skipped counts and display affordances in TestRun UI, agent prompt context, smoke tests, and documentation.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
packages/software-factory/tests/validation-pipeline.test.ts	Adds unit coverage for ValidationPipeline concurrency/error capture/formatting.
packages/software-factory/tests/test-step.test.ts	Adds unit coverage for TestValidationStep execution + context formatting.
packages/software-factory/tests/issue-scheduler.test.ts	Updates mock store to satisfy new IssueStore contract.
packages/software-factory/tests/issue-loop.test.ts	Updates max-iteration behavior expectations + verifies issue blocking update.
packages/software-factory/tests/index.ts	Registers new test modules.
packages/software-factory/tests/factory-test-realm.test.ts	Adds coverage for skipped/todo mapping and summary formatting.
packages/software-factory/src/validators/validation-pipeline.ts	Implements modular validation pipeline with `Promise.allSettled` aggregation + context formatting.
packages/software-factory/src/validators/test-step.ts	Implements test validation step: discover tests, run QUnit via Playwright, read TestRun card, format details.
packages/software-factory/src/validators/noop-step.ts	Adds placeholder step runner for unimplemented validation steps.
packages/software-factory/src/test-run-types.ts	Extends test run/result types to support skipped counts and statuses.
packages/software-factory/src/test-run-parsing.ts	Maps QUnit skipped/todo → skipped, computes skippedCount, and fails all-skipped runs; updates summary formatting.
packages/software-factory/src/realm-operations.ts	Adds `fetchRealmFilenames()` helper via `_mtimes` for test discovery.
packages/software-factory/src/issue-scheduler.ts	Extends IssueStore with `updateIssue()` and implements read-modify-write updates in RealmIssueStore.
packages/software-factory/src/issue-loop.ts	Adds optional `formatForContext()` to Validator, re-exports validation pipeline, and blocks issues on max-iterations + failing validation.
packages/software-factory/src/factory-prompt-loader.ts	Includes skippedCount in prompt context test result payload.
packages/software-factory/src/factory-agent-types.ts	Extends TestResult and ValidationStepResult to carry skippedCount and structured `details`.
packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts	Updates smoke test to exercise ValidationPipeline + formatted context output.
packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts	Updates max-iteration scenario expectation and adds pipeline integration scenario.
packages/software-factory/realm/tsconfig.json	Adds TS config for realm sources.
packages/software-factory/realm/test-results.gts	Adds skipped status option, skipped icon/styling, and skippedCount rollups in TestRun/TestModuleResult UI.
packages/software-factory/docs/phase-2-plan.md	Updates architecture docs for validation pipeline + max-iteration blocking semantics.
packages/software-factory/.agents/skills/software-factory-operations/SKILL.md	Prohibits `QUnit.skip()`/`QUnit.todo()` usage; documents how skipped is treated.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

packages/software-factory/src/validators/test-step.ts

packages/software-factory/tests/test-step.test.ts

Adds skippedCount: 0 to existing formatForContext test details fixtures and adds two new tests verifying that skipped counts are included in both passing and failing formatForContext output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra and others added 14 commits April 9, 2026 18:24

glint support for software factory realm gts

97cd9f2

Fix setup-logger import path in smoke-test-realm.ts

ef78431

The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Set updatedAt timestamp when updating issue via RealmIssueStore

8c073b7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

habdelra requested a review from Copilot April 10, 2026 17:36

Copilot started reviewing on behalf of habdelra April 10, 2026 17:36 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

packages/software-factory/src/validators/test-step.ts Show resolved Hide resolved

packages/software-factory/tests/test-step.test.ts Show resolved Hide resolved

packages/software-factory/tests/test-step.test.ts Show resolved Hide resolved

habdelra requested a review from a team April 10, 2026 18:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surface skipped/todo QUnit tests in TestRun results (CS-10651)#4380

Surface skipped/todo QUnit tests in TestRun results (CS-10651)#4380
habdelra wants to merge 15 commits intomainfrom
cs-10651-surface-skippedtodo-qunit-tests-in-testrun-results-instead

habdelra commented Apr 10, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

habdelra commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Dependencies

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

habdelra commented Apr 10, 2026 •

edited

Loading