Surface skipped/todo QUnit tests in TestRun results (CS-10651)#4380
Open
Surface skipped/todo QUnit tests in TestRun results (CS-10651)#4380
Conversation
Integrate findings from CS-10672 implementation directly into the relevant plan sections rather than appending notes: - Selection algorithm uses backlog (not ready) to match darkfactory.gts - Priority enum includes critical - Issue lifecycle uses backlog → in_progress → done/blocked/review - Exhausted issues tracked via exclusion set to prevent re-picking - RealmIssueStore uses absolute darkfactory module URL (env-sensitive) - Boxel linksToMany uses dotted-key format for blockedBy relationships - Loop outcome disambiguation table for edge cases - Migration path references actual file organization and CS-10708 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a modular validation pipeline that runs after every inner-loop agent turn. The pipeline runs all steps concurrently via Promise.allSettled() and aggregates results. Each step is a separate module implementing the ValidationStepRunner interface — adding a new step is one file plus one line in createDefaultPipeline(). Key changes: - ValidationPipeline class implementing the Validator interface with concurrent step execution and per-step exception capture - TestValidationStep wrapping executeTestRunFromRealm() with detailed failure data read back from the completed TestRun card - NoOpStepRunner placeholders for parse, lint, evaluate, instantiate (child tickets CS-10713 through CS-10716) - Step-specific failure shapes via details field on ValidationStepResult - formatForContext() on each step for LLM-friendly failure output - Max iterations + failing validation now blocks the issue with the reason and failure context in the issue description - Extract fetchRealmFilenames() from pullRealmFiles() for reuse - Optional updateIssue() on IssueStore for max-iteration blocking - Updated phase-2-plan.md with validation architecture documentation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…es is standalone pullRealmFiles is dead code (only used in its own tests) and doesn't need to be refactored. fetchRealmFilenames stands on its own for the test validation step's file discovery. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ersistence warning - Fix test-step error mapping: fall back to handle.errorMessage when details.failures is empty (infrastructure errors with readable TestRun) - Remove incorrect "no-op" log suffix in smoke test - Improve createDefaultPipeline test to actually verify 5 steps in order - Add warning log when IssueStore lacks updateIssue for max-iteration blocking Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… lint - Make IssueStore.updateIssue required (not optional) - Implement updateIssue on RealmIssueStore: read card, mutate status/description, write back to realm - Update all MockIssueStore implementations to include updateIssue - Fix prettier formatting and non-null assertion lint errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update orchestrator pseudocode to reflect max-iterations→blocked behavior - Document IssueStore.updateIssue and orchestrator realm writes - Update Issue Lifecycle section with max-iteration blocking - Clean up --jwt reference in boxel-cli auth section - Fix NoOpStepRunner to declare parameter names (TS strict mode) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The import was ../setup-logger (resolving to scripts/setup-logger which doesn't exist). Fixed to ../../src/setup-logger matching all other smoke test scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The validation pipeline's TestValidationStep already exercises executeTestRunFromRealm internally, so the separate direct call was redundant. The smoke test now goes straight from writing test files to running the ValidationPipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er interface Add missing _targetRealmUrl parameter to run() and _result parameter to formatForContext() on MockStepRunner and ThrowingStepRunner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move exitReason = 'blocked' inside the try block so it's only set after issueStore.updateIssue succeeds. If the realm write fails, exitReason stays max_iterations to accurately reflect that the issue wasn't actually blocked in the realm. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… them as passed QUnit skipped and todo tests were previously mapped to 'passed', creating a blind spot where the agent could write QUnit.skip() tests and see "all passed". Now they are mapped to a new 'skipped' status that is: - Terminal (not pending) so resume/isComplete logic still works - Visually distinct in the TestRun card UI (→ icon, muted italic styling) - Surfaced in counts (skippedCount on TestModuleResult, TestRun, TestResult) - Treated as failure when ALL tests are skipped (nothing actually verified) - Included in formatTestResultSummary and validation context for the agent - Documented in SKILL.md to tell the agent not to use QUnit.skip()/todo() Closes CS-10651 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Main refactored the isolated template to use module-group rendering and moved statusIcon/isComplete/totalCount to class-level getters. Resolved by keeping main's structure and adding the skipped case to statusIcon, skipped count display in module-group headers, and status-skipped CSS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR ensures QUnit skipped/todo tests are no longer silently treated as passing by extending the test-result model/UI to surface skipped outcomes, and by adding a modular validation pipeline (with a real test step) that can feed richer validation context back into the issue loop and agent prompts.
Changes:
- Add
skippedto test result status modeling, parse QUnitskipped/todoasskipped, and treat “all skipped” runs asfailed. - Introduce a modular
ValidationPipelinewith step runners (NoOp placeholders + implementedTestValidationStep) and improve issue-loop behavior when max iterations are reached with failing validation (auto-block + write context). - Surface skipped counts and display affordances in TestRun UI, agent prompt context, smoke tests, and documentation.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/software-factory/tests/validation-pipeline.test.ts | Adds unit coverage for ValidationPipeline concurrency/error capture/formatting. |
| packages/software-factory/tests/test-step.test.ts | Adds unit coverage for TestValidationStep execution + context formatting. |
| packages/software-factory/tests/issue-scheduler.test.ts | Updates mock store to satisfy new IssueStore contract. |
| packages/software-factory/tests/issue-loop.test.ts | Updates max-iteration behavior expectations + verifies issue blocking update. |
| packages/software-factory/tests/index.ts | Registers new test modules. |
| packages/software-factory/tests/factory-test-realm.test.ts | Adds coverage for skipped/todo mapping and summary formatting. |
| packages/software-factory/src/validators/validation-pipeline.ts | Implements modular validation pipeline with Promise.allSettled aggregation + context formatting. |
| packages/software-factory/src/validators/test-step.ts | Implements test validation step: discover tests, run QUnit via Playwright, read TestRun card, format details. |
| packages/software-factory/src/validators/noop-step.ts | Adds placeholder step runner for unimplemented validation steps. |
| packages/software-factory/src/test-run-types.ts | Extends test run/result types to support skipped counts and statuses. |
| packages/software-factory/src/test-run-parsing.ts | Maps QUnit skipped/todo → skipped, computes skippedCount, and fails all-skipped runs; updates summary formatting. |
| packages/software-factory/src/realm-operations.ts | Adds fetchRealmFilenames() helper via _mtimes for test discovery. |
| packages/software-factory/src/issue-scheduler.ts | Extends IssueStore with updateIssue() and implements read-modify-write updates in RealmIssueStore. |
| packages/software-factory/src/issue-loop.ts | Adds optional formatForContext() to Validator, re-exports validation pipeline, and blocks issues on max-iterations + failing validation. |
| packages/software-factory/src/factory-prompt-loader.ts | Includes skippedCount in prompt context test result payload. |
| packages/software-factory/src/factory-agent-types.ts | Extends TestResult and ValidationStepResult to carry skippedCount and structured details. |
| packages/software-factory/scripts/smoke-tests/smoke-test-realm.ts | Updates smoke test to exercise ValidationPipeline + formatted context output. |
| packages/software-factory/scripts/smoke-tests/issue-loop-smoke.ts | Updates max-iteration scenario expectation and adds pipeline integration scenario. |
| packages/software-factory/realm/tsconfig.json | Adds TS config for realm sources. |
| packages/software-factory/realm/test-results.gts | Adds skipped status option, skipped icon/styling, and skippedCount rollups in TestRun/TestModuleResult UI. |
| packages/software-factory/docs/phase-2-plan.md | Updates architecture docs for validation pipeline + max-iteration blocking semantics. |
| packages/software-factory/.agents/skills/software-factory-operations/SKILL.md | Prohibits QUnit.skip()/QUnit.todo() usage; documents how skipped is treated. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Adds skippedCount: 0 to existing formatForContext test details fixtures and adds two new tests verifying that skipped counts are included in both passing and failing formatForContext output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
skippedstatus toTestResultStatusFieldenum andTestResultEntryDatatype so QUnitskipped/todotests are no longer silently mapped topassedparseQunitResults()to map QUnitskipped/todo→skipped, and treats all-skipped runs asfailed(nothing actually verified)→icon, muted italic styling, dedicated "Skipped" section in isolated view, andskippedCountin summariesformatTestResultSummary, validation pipeline context, and prompt loader to include skipped counts for agent visibilityQUnit.skip()/QUnit.todo()usageDependencies
Test plan
skippedstatusfailedformatTestResultSummaryincludes skipped count when presentCloses CS-10651
🤖 Generated with Claude Code