[Filestore] Resolve accessing Singleton<TGlobalLogsStorage> after it has been destroyed#5903
Conversation
…has been destroyed
There was a problem hiding this comment.
Pull request overview
This PR changes how the FUSE loop defers StopAsync/SuspendAsync completion handling to avoid executing teardown logic via SystemThreadFactory() after global logging storage teardown, and updates unit tests to account for the new scheduling behavior.
Changes:
- Route
CompletionQueue->StopAsync(...).Subscribe(...)continuations through the providedISchedulerinstead ofSystemThreadFactory()->Run(...)in the FUSE loop. - Update
TBootstrap::Stop()in FUSE unit tests to drain tasks when aTTestScheduleris used. - Refactor the “deadlock when flush completion fired in scheduler” unit test to control flush completion via a promise and explicitly coordinate stop/flush sequencing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| cloud/filestore/libs/vfs_fuse/loop.cpp | Moves stop/suspend continuations from system thread factory to IScheduler::Schedule to avoid teardown-time singleton access. |
| cloud/filestore/libs/vfs_fuse/fs_ut.cpp | Adjusts test bootstrap shutdown behavior and refactors a write-back-cache stop/deadlock test to align with the new scheduling model. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I've also checked that the changes in test |
There was a problem hiding this comment.
AI Review Summary
Verdict: ✅ No critical issues found
Critical issues
No critical issues found.
Other findings
- Minor | Medium: Implicit scheduler lifecycle dependency in
StopAsync/SuspendAsync—cloud/filestore/libs/vfs_fuse/loop.cpp:819 - Minor | Medium:
RunAllScheduledTasks()addition inTBootstrap::Stop()affects all tests usingTTestScheduler, not just the modified test —cloud/filestore/libs/vfs_fuse/fs_ut.cpp:308 - Nit | Low: Misleading comment in WriteDataHandler —
cloud/filestore/libs/vfs_fuse/fs_ut.cpp:4814
Analysis notes:
The fix correctly addresses the singleton-after-destruction problem reported in the PR body. The root cause was SystemThreadFactory()->Run() creating a detached thread whose lifetime wasn't managed — during process teardown, this thread could access destroyed globals (e.g., TGlobalLogsStorage's mutex). Replacing this with scheduler->Schedule() ensures the callback runs on the managed scheduler thread, which is properly joined during shutdown.
I verified the production shutdown order for both deployment paths:
-
VHOST daemon (
TBootstrapCommon::Stop):Drain()callsSuspendAsync()on all endpoints while the scheduler is alive → scheduler processes the callbacks →Drain()returns →Scheduler->Stop()→StopComponents()(loops are already suspended,FuseLoop == nullptr, soStopAsync()returns immediately). -
Mount command (
TMountCommand::Stop):FileSystemLoop->StopAsync().GetValueSync()is called before parentStop(), so the scheduler is alive when the callback fires.Scheduler->Stop()happens later in the parent chain.
Both paths are safe — the scheduler is always alive when the scheduler->Schedule() call is made from the Subscribe callback.
The test refactoring is well-motivated: the old test blocked the scheduler thread inside WriteDataHandler (via WaitForCondition), which is incompatible with the new design where onStop runs on the scheduler thread. The new pattern uses promise-based control flow that doesn't block the scheduler.
This review was generated automatically. Critical issues require attention; other findings are advisory.
If this comment was useful, please give it a 👍 — it helps us improve the review bot.
Inline comment:
|
Inline comment:
|
Inline comment:
|
Notes
Problem:
Root cause - the code is executed in a thread that is not joined:
Future returned by
StopAsyncis completed insideonStopthat is executed on a thread created bySystemThreadFactory. MethodonStopholdsstd::shared_ptrtoTFileSystem. A race may happen - we may destroy everything after waiting forStopAsyncwhile the thread is still running.Suggestion: replace
SystemThreadFactorywith normal scheduler + fix tests.