Skip to content

fix: stabilize flaky CI tests#426

Merged
9seconds merged 3 commits into
9seconds:masterfrom
dolonet:fix/flaky-ci-race-and-bloom
Mar 31, 2026
Merged

fix: stabilize flaky CI tests#426
9seconds merged 3 commits into
9seconds:masterfrom
dolonet:fix/flaky-ci-race-and-bloom

Conversation

@dolonet
Copy link
Copy Markdown
Collaborator

@dolonet dolonet commented Mar 30, 2026

Fixes #425

Summary

  • Data race in ScoutConnCollected: add sync.Mutex to protect Add()/MarkWrite() from concurrent access by HTTP transport readLoop. Add Snapshot() method so learn() reads a consistent copy instead of iterating live slices.

  • Bloom filter false negatives: increase test filter size from 500 to 100,000. With 500 bytes the random eviction has ~1.7% chance per assertion of destroying a previously inserted key — matches the observed ~1-in-8 CI failure rate. Larger filter keeps the same test logic without the noise.

  • nil-pointer panic in TestHTTPSRequest: replace suite.NoError(err) with suite.Require().NoError(err) so the test stops immediately on connection error instead of dereferencing nil resp.

Test plan

  • All tests pass with -race -count=1
  • Bloom filter test passes with -count=2

dolonet added 2 commits March 30, 2026 14:50
1. Add sync.Mutex to ScoutConnCollected to eliminate data race between
   Add()/MarkWrite() in readLoop and learn() iterating results.
   Introduce Snapshot() for safe read access.

2. Increase bloom filter test size from 500 to 100000 to prevent
   false negatives from random eviction in the stable bloom filter.

3. Use Require().NoError() in TestHTTPSRequest to prevent nil-pointer
   panic on resp.Body.Close() when the request fails.

Fixes 9seconds#425
- Move error check before Snapshot() to avoid unnecessary allocation
- Update existing tests to use Snapshot() instead of direct field access
- Add TestConcurrentAddSnapshot to explicitly exercise the mutex
@dolonet dolonet force-pushed the fix/flaky-ci-race-and-bloom branch from c63a0d1 to 73c6a3a Compare March 30, 2026 12:05
Copy link
Copy Markdown
Owner

@9seconds 9seconds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Все хорошо, я оставил пару совсем минорных комментариев. Но если что, вмержу через пару часов и так

Comment thread mtglib/proxy_test.go

resp, err := client.Get(addr) //nolint: noctx
suite.NoError(err)
suite.Require().NoError(err)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

О, оказывается такое в testify завезли

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Да, удобная штука — сразу останавливает тест вместо продолжения с nil.


for i := 0; i < 1000; i++ {
data, _ := collected.Snapshot()
_ = len(data)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

А зачем эта операция?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Была идея заставить горутину обратиться к данным слайса, чтобы race detector зафиксировал конкурентный доступ. Но по факту сам вызов Snapshot() уже достаточен — убрал, оставил комментарий.

func (s *ScoutConnCollected) Snapshot() ([]ScoutConnResult, int) {
s.mu.Lock()
snapshot := make([]ScoutConnResult, len(s.data))
copy(snapshot, s.data)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Можно, кстати, и slices.Clone использовать.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Поправил, спасибо.

- Replace manual make+copy with slices.Clone in Snapshot()
- Remove redundant _ = len(data); Snapshot() call alone is
  sufficient to exercise the lock under -race
@9seconds 9seconds merged commit 8993063 into 9seconds:master Mar 31, 2026
5 of 6 checks passed
@dolonet dolonet deleted the fix/flaky-ci-race-and-bloom branch April 1, 2026 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky CI: data race в ScoutConnCollected + false negative в bloom filter

2 participants