fix(ci): fix testhost hang by draining ProcessSessionManager I/O tasks before coverage finalization#447
Conversation
…overage steps Agent-Logs-Url: https://github.com/JerrettDavis/JD.AI/sessions/6f63b1e3-5636-4131-a267-a52da0b0783c Co-authored-by: JerrettDavis <2610199+JerrettDavis@users.noreply.github.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
Test Results6 616 tests 6 597 ✅ 2m 39s ⏱️ Results for commit 6404a3c. ♻️ This comment has been updated with latest results. |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@copilot ci / pr-checks workflow failed. https://github.com/JerrettDavis/JD.AI/actions/runs/23871170903/job/69603906298?pr=447 |
Agent-Logs-Url: https://github.com/JerrettDavis/JD.AI/sessions/88d47ad4-645d-4fca-86c1-1a34429babd2 Co-authored-by: JerrettDavis <2610199+JerrettDavis@users.noreply.github.com>
|
@copilot fix the conflicts with main on ci.yml |
Investigated the failure. The test step timed out at exactly 15 minutes with zero output — tests actually complete in ~2 min but the testhost hangs afterward, preventing coverlet from writing coverage. Root cause: Fix (commit
Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Code Coverage |
Description
The CI "Test with coverage" step timed out at 15 minutes with zero test output. Investigation confirmed tests complete normally in ~2 minutes, but the testhost then hangs indefinitely — preventing coverlet's CLR profiler from writing
coverage.cobertura.xml.Root cause (confirmed):
ProcessSessionManager's background async tasks (PumpOutputAsync,MonitorExitAsync) remain blocked on native I/O (reading child process stdout/stderr pipes) afterDispose()kills the child processes. While these threads are in a native-wait state, coverlet's CLR profiler cannot reach a GC safepoint to finalize and flush coverage data, causing an indefinite deadlock.The earlier
forceExit: trueapproach was incorrect — it exits the process before the coverage writer runs, so coverage is never generated.Changes
src/JD.AI.Core/Tools/ProcessSessionManager.cs— Addprivate readonly ConcurrentQueue<Task> _allExitMonitors. EveryExitMonitortask created inExecAsync()is enqueued, keeping it accessible even afterClear()removes its session from the dictionary. NewWaitForIdleAsync(CancellationToken ct = default)method drains the queue and awaits any still-incomplete tasks with a safety-valve cancellation token.tests/JD.AI.Tests/ProcessSessionManagerTests.cs— ConvertIDisposable→IAsyncLifetime.DisposeAsync()kills all scopes (same logic as before) and then callsawait _manager.WaitForIdleAsync(cts.Token)with a 10-second timeout. This ensures every stdout/stderr pump task has reached a terminal state before xunit reports the test complete, giving coverlet a clean CLR in which to finalize coverage output.tests/JD.AI.Tests/xunit.runner.json— RemoveforceExit: true(incorrect fix that exits before coverage is written)..github/workflows/ci.yml— Merged with main (PR fix: resolve test infrastructure hang and align PR validation CI #448): removes the background polling hack, replaces with standarddotnet test+--blame-hang-timeout 5msafety net, fixes invalidif:conditions, and adds Codecov resilience options. Conflicts with main have been resolved;ci.ymlis now identical to main.Related Issue
Type of Change
Checklist
.editorconfig)dotnet test)dotnet build)Testing
All 23
ProcessSessionManagerTestspass and the process exits cleanly in under 4 seconds — previously this would hang indefinitely waiting for the coverage profiler.WaitForIdleAsyncis validated by the same tests: afterDisposeAsync()completes, no background I/O threads remain.Screenshots / Logs
Previous failure — testhost hung for exactly 15 minutes with zero output:
After fix —
ProcessSessionManagerTests(23 tests) complete and exit cleanly:Original prompt