tests: raise Coverage (Debug) timeout for world allocation shards (fix flaky main CI) #3195

Closed
jslee02 wants to merge 1 commit into main from fix/coverage-world-shard-timeout

@jslee02 jslee02 commented Jun 26, 2026

Member

Fixes the flaky Coverage (Debug) failure on main.

Symptom: Coverage (Debug) failed on main (currently red on dcb5c688260/#3189). CTest exited with code 8 after test_world_raw_malloc timed out at 3600.13s.

Root cause (not a code regression): test_world_raw_malloc is one of five sharded world allocation gates that already get a coverage-specific TIMEOUT 3600, because the gcov-instrumented build is much slower. It ran ~0.13s over the limit, so it sits right at the boundary and flakes under runner load. The same Coverage (Debug) job also failed on the preceding commit #3127 (an actions/checkout version bump) and passed on the two commits before that. Neither that bump nor the docs-only #3189 can affect test runtime, so this is a flaky coverage timeout surfaced by runner load, not something those commits introduced.

Fix: raise the coverage-only timeout for the world allocation shards from 3600s to 5400s (a 50% increase over the old limit, for headroom). The change is coverage-only and has no effect on normal CI timing or runtime behavior.
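For reference, the timeout arithmetic can be checked with a short sketch. It uses only the figures quoted in this description; the helper name `slack_seconds` is illustrative and not part of the repository, and the slack is relative to the single observed run, so other runs under load may differ.

```python
# Figures quoted in the PR description; names here are illustrative only.
OLD_TIMEOUT_S = 3600.0   # existing coverage-specific TIMEOUT
NEW_TIMEOUT_S = 5400.0   # proposed coverage-specific TIMEOUT
OBSERVED_S = 3600.13     # test_world_raw_malloc runtime reported by CTest


def slack_seconds(timeout_s: float, runtime_s: float) -> float:
    """Seconds left before the timeout fires (negative means a timeout)."""
    return timeout_s - runtime_s


# How far the observed run overshot the old limit.
overrun = -slack_seconds(OLD_TIMEOUT_S, OBSERVED_S)
# How much room the same run would have under the new limit.
new_slack = slack_seconds(NEW_TIMEOUT_S, OBSERVED_S)
# Relative increase of the limit itself.
increase = (NEW_TIMEOUT_S - OLD_TIMEOUT_S) / OLD_TIMEOUT_S

print(f"overrun under old limit: {overrun:.2f} s")
print(f"slack under new limit:   {new_slack:.2f} s")
print(f"timeout increase:        {increase:.0%}")
```

The observed run would finish with roughly 1800s to spare under the new limit, which is the headroom this change relies on.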

Verified: check-lint-cmake (gersemi) and codespell clean. The PR's own Coverage (Debug) run exercises the new headroom.

@codex review

test_world_raw_malloc timed out at ~3600.1s on the gcov-instrumented
Coverage (Debug) job (a sharded allocation gate that already gets a
coverage-specific TIMEOUT 3600). It sits right at the limit and load-flakes —
the same job also failed on the preceding commit (#3127, an actions/checkout
bump), so this is a flaky coverage timeout, not a code regression. Raise the
world-shard coverage timeout to 5400s for headroom.
@jslee02 jslee02 added this to the DART 7.0 milestone Jun 26, 2026
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

Reviewed commit: d1db96fbc6

jslee02 commented Jun 26, 2026

Member Author

Closing in favor of #3196, which fixes this at the root instead of raising the timeout.

The timeout bump here was a band-aid. The real cause is that BakedBasicDeformableRowsDoNotMallocOnHeap bundled ~13 deformable scenes into one atomic gtest test, one of which (a 17×17 matrix-free self-contact production grid) runs ~3429s under the Debug+gcov raw-malloc interposer and alone consumed the shard budget. #3196 splits the monolith into focused gates and right-sizes that one scene to 8×8 under DART_CODECOV (the property is grid-size independent; full 17×17 stays in normal CI), taking the shard from ~3600s to 358s locally.
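A rough back-of-the-envelope check of these numbers, as a sketch only: the runtimes are the ones quoted above, and the node counts assume that "17×17" and "8×8" denote N×N node grids, which is an assumption rather than something stated here. The shard figures also mix a CI measurement (~3600s) with a local one (358s), so the ratio is indicative at best.

```python
# Runtimes quoted in the comment above (seconds).
SCENE_S = 3429.0         # 17x17 self-contact scene under Debug+gcov
SHARD_BEFORE_S = 3600.0  # approximate shard runtime before the split
SHARD_AFTER_S = 358.0    # shard runtime after the split (local measurement)

# Share of the old shard budget consumed by the single large scene.
share = SCENE_S / SHARD_BEFORE_S

# Overall shard speedup reported for the follow-up change.
speedup = SHARD_BEFORE_S / SHARD_AFTER_S

# Assumption: an N x N grid has N * N nodes.
nodes_full = 17 * 17     # grid kept in normal CI
nodes_cov = 8 * 8        # grid used under DART_CODECOV
node_ratio = nodes_full / nodes_cov

print(f"scene share of old budget: {share:.1%}")
print(f"shard speedup:             {speedup:.1f}x")
print(f"node-count ratio:          {node_ratio:.2f}x")
```

The one scene accounts for about 95% of the old 3600s budget, which is why a larger timeout alone leaves the shard dominated by a single test.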

@jslee02 jslee02 closed this Jun 26, 2026
@jslee02 jslee02 deleted the fix/coverage-world-shard-timeout branch June 26, 2026 19:48