Skip to content

[Filestore] Eliminate race in test TFileSystemTest::ShouldFlushAutomaticallyWithServerWriteBackCacheEnabled#5720

Merged
e673 merged 4 commits intomainfrom
users/nasonov/wbc-auto-flush-race-fix
Apr 15, 2026
Merged

[Filestore] Eliminate race in test TFileSystemTest::ShouldFlushAutomaticallyWithServerWriteBackCacheEnabled#5720
e673 merged 4 commits intomainfrom
users/nasonov/wbc-auto-flush-race-fix

Conversation

@e673
Copy link
Copy Markdown
Collaborator

@e673 e673 commented Apr 13, 2026

Notes

The test hangs if TBootstrap is destroyed at the moment WriteBackCache is processing WriteData response in the scheduler thread.

Change the logic: the scheduler thread should be destroyed synchronously instead of Loop->StopAsync() callback.

2026-04-11T01:23:16.490315Z :NFS_FUSE DEBUG: : unique: 2, opcode: WRITE (16), nodeid: 123, insize: 80, pid: 0
2026-04-11T01:23:16.490328Z :NFS_FUSE TRACE: cloud/filestore/libs/diagnostics/request_stats.cpp:99: WriteData #2 [f:fs1] REQUEST
2026-04-11T01:23:16.490333Z :NFS_FUSE DEBUG: cloud/filestore/libs/vfs_fuse/fs_impl_data.cpp:691: WriteBuf #123 @456 offset:0 size:4096
2026-04-11T01:23:16.490386Z :NFS_FUSE TRACE: cloud/filestore/libs/vfs_fuse/fs_impl.cpp:205: invalidating node: 123, version: 2
2026-04-11T01:23:16.490392Z :NFS_FUSE DEBUG: :    unique: 2, success, outsize: 24
2026-04-11T01:23:16.490394Z :NFS_VHOST DEBUG: : virtio_send_msg:570: response with 2 desc of length 24
2026-04-11T01:23:16.490425Z :NFS_FUSE TRACE: cloud/filestore/libs/diagnostics/request_stats.cpp:138: WriteData #2 [f:fs1][c:] RESPONSE request completed(total_time: 77us, execution_time: 77us, predicted_postponed_time: 0, postponed_time: 0, backoff_time: 0, size: 4.00 KiB, error: S_OK)
2026-04-11T01:23:16.490433Z :NFS_VHOST DEBUG: : virtq_push:676: /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952[0]: head = 3
2026-04-11T01:23:16.506419Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:225: [f:fs1] StopAsync: completing left: 0, requests left: 0, fuse cancellation code: 0
2026-04-11T01:23:16.506545Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:1297: [f:"fs1"][c:""] WriteBackCache is not empty, starting FlushAllData
CRITICAL_EVENT:AppCriticalEvents/WriteBackCacheDataLossError:[f:"fs1"][c:""] WriteBackCache was not emptied after successful FlushAllData at DestroySession, possible data loss
2026-04-11T01:23:16.512986Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:1338: [f:"fs1"][c:""] completed FlushAllData
2026-04-11T01:23:16.512994Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:652: stopping FUSE loop
2026-04-11T01:23:16.513000Z :NFS_VHOST INFO: : virtio_session_exit:517: unregister device /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952
2026-04-11T01:23:16.513054Z :NFS_VHOST INFO: : vdev_disconnect:2087: /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952: Close connection with client, sock = 12
2026-04-11T01:23:16.513102Z :NFS_VHOST INFO: : vring_mark_stopped:287: /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952[0]: stopped vring with 0 in-flight requests
2026-04-11T01:23:16.513108Z :NFS_VHOST INFO: : vring_mark_stopped:287: /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952[1]: stopped vring with 0 in-flight requests
2026-04-11T01:23:16.513169Z :NFS_VHOST INFO: : unregister_complete:332: stopping device /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952
2026-04-11T01:23:16.513177Z :NFS_VHOST INFO: : unregister_complete:336: finished stopping device /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952
2026-04-11T01:23:16.513181Z :NFS_VHOST INFO: : virtio_session_exit:526: finished unregister device
2026-04-11T01:23:16.513192Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:582: stopping FUSE loop thread 6.0
2026-04-11T01:23:16.513224Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:589: stopped FUSE loop thread 6.0
2026-04-11T01:23:16.513226Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:664: stopped FUSE loop
2026-04-11T01:23:16.513228Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:676: unmounting FUSE session
2026-04-11T01:23:16.513232Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:1428: [f:"fs1"][c:""] got destroy request
2026-04-11T01:23:16.513235Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:1438: [f:"fs1"][c:""][l:0] resetting session state
2026-04-11T01:23:16.513258Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/loop.cpp:1458: [f:"fs1"][c:""] session reset completed: S_OK
2026-04-11T01:23:16.513262Z :NFS_FUSE INFO: cloud/filestore/libs/vfs_fuse/fs_impl.cpp:94: resetting filesystem cache
2026-04-11T01:23:16.513265Z :NFS_FUSE DEBUG: cloud/filestore/libs/vfs_fuse/fs_impl.cpp:97: clear directory cache of size 0
2026-04-11T01:23:16.513269Z :NFS_VHOST INFO: : virtio_session_close:496: destroying device /home/nbsci/.ya/build/build_root/o1kt/000372/r3tmp/vhost.socket_385352015545342952
2026-04-11T01:23:16.513286Z :NFS_VHOST INFO: : virtio_session_close:505: finished destroying device
2026-04-11T01:23:16.513321Z :NFS_FUSE TRACE: cloud/filestore/libs/diagnostics/request_stats.cpp:99: DestroySession #4495979746816372457 [f:fs1] REQUEST
2026-04-11T01:23:16.513329Z :NFS_CLIENT INFO: cloud/filestore/libs/client/session.cpp:557: [f:""][c:""][s:"f394537c-494e7fab-b0371e37-270f78e8"][n:0] destroying session
2026-04-11T01:23:16.513336Z :NFS_CLIENT INFO: cloud/filestore/libs/client/session.cpp:580: [f:""][c:""][s:"f394537c-494e7fab-b0371e37-270f78e8"][n:0] session destroyed
2026-04-11T01:23:16.513352Z :NFS_FUSE TRACE: cloud/filestore/libs/diagnostics/request_stats.cpp:138: DestroySession #4495979746816372457 [f:fs1][c:] RESPONSE request completed(total_time: 25us, execution_time: 25us, predicted_postponed_time: 0, postponed_time: 0, backoff_time: 0, size: 0 B, error: S_OK)
Uncaught exception: (TSystemError) (Error 35: Resource deadlock avoided) util/system/thread.cpp:198: can not join thread
??+0 (0x16E6421)
std::terminate()+41 (0x1886609)
??+0 (0x16E019E)
NCloud::NFileStore::NFuse::NWriteBackCache::TWriteBackCacheState::FlushSucceeded(unsigned long, unsigned long)+357 (0x374D6A5)
NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::CompleteFlush(std::__y1::shared_ptr<NCloud::NFileStore::NFuse::NWriteBackCache::TNodeFlushState>)+57 (0x374A7B9)
auto NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::ExecuteFlush(std::__y1::shared_ptr<NCloud::NFileStore::NFuse::NWriteBackCache::TNodeFlushState>)::'lambda'(auto const&)::operator()<NThreading::TFuture<NCloud::NFileStore::NProto::TWriteDataResponse>>(auto const&)+148 (0x374A174)
NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::ExecuteFlush(std::__y1::shared_ptr<NCloud::NFileStore::NFuse::NWriteBackCache::TNodeFlushState>)+368 (0x37497F0)
NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::ScheduleFlushNode(unsigned long)+193 (0x3748261)
NCloud::NFileStore::NFuse::NWriteBackCache::TQueuedOperations::Release()+129 (0x3754321)
NCloud::NFileStore::NFuse::NWriteBackCache::TWriteBackCacheState::TriggerPeriodicFlushAll()+51 (0x374CC63)
NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::RequestAutomaticFlush()+26 (0x373EF5A)
std::__y1::__function::__func<NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::ScheduleAutomaticFlushIfNeeded()::'lambda'(), std::__y1::allocator<NCloud::NFileStore::NFuse::TWriteBackCache::TImpl::ScheduleAutomaticFlushIfNeeded()::'lambda'()>, void ()>::operator()()+46 (0x373EEAE)
??+0 (0x1EA6663)
??+0 (0x1EA606A)
??+0 (0x199C73F)
??+0 (0x7F7F14231AC3)
??+0 (0x7F7F142C38D0)

@e673 e673 requested a review from SvartMetal April 13, 2026 10:20
@e673 e673 added the filestore Add this label to run only cloud/filestore build and tests on PR label Apr 13, 2026
@e673 e673 force-pushed the users/nasonov/wbc-auto-flush-race-fix branch from 07b2daa to 672fd68 Compare April 13, 2026 10:22
@github-actions
Copy link
Copy Markdown
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 4722s): all tests PASSED for commit 672fd68.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
3538 3538 0 0 0 0 0

SvartMetal
SvartMetal previously approved these changes Apr 14, 2026
@e673 e673 requested a review from yegorskii April 14, 2026 18:19
@github-actions
Copy link
Copy Markdown
Contributor

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo target: cloud/filestore/ (test time: 4790s): all tests PASSED for commit 649f596.

TESTS PASSED ERRORS FAILED FAILED BUILD SKIPPED MUTED?
3539 3539 0 0 0 0 0

@e673
Copy link
Copy Markdown
Collaborator Author

e673 commented Apr 15, 2026

Checked that the test TFileSystemTest::ShouldNotCrashWhileStoppingWhenForgetRequestIsInFlight is not affected.

@e673 e673 merged commit fc7eae9 into main Apr 15, 2026
11 of 13 checks passed
@e673 e673 deleted the users/nasonov/wbc-auto-flush-race-fix branch April 15, 2026 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

filestore Add this label to run only cloud/filestore build and tests on PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants