
Refactor AsyncThreadCtx to use shared_ptr and state management#201

Merged
fzhedu merged 23 commits into bytedance:main from Ashutosh0x:refactor/async-ctx-shared-ptr
Feb 12, 2026

Conversation

@Ashutosh0x
Contributor

This PR refactors AsyncThreadCtx to use std::shared_ptr for shared ownership, ensuring context survival during async tasks even if TableScan closes. It introduces State::kActive and State::kClosed to handle graceful shutdown. It updates AsyncLoadHolder and TableScan to check isClosed() and create Guard only within the async task, preventing the main thread from waiting on queued tasks. It also reverts manual in() calls in TableScan::preload and ensures AsyncThreadCtx is properly managed.
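
The ownership pattern described here can be sketched as follows. This is a simplified illustration, not the actual bolt code: the `AsyncThreadCtx` stand-in, the `TaskQueue` alias, and `submitPreload` are all assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the real AsyncThreadCtx and executor queue;
// the point is the ownership pattern from the PR description, not the
// exact bolt API.
struct AsyncThreadCtx {
  bool closed{false};
  bool isClosed() const { return closed; }
};

using TaskQueue = std::vector<std::function<void()>>;

// Submit an async preload: the lambda copies the shared_ptr, so the
// context survives even if TableScan closes first, and it checks
// isClosed() inside the task instead of making close() wait on it.
inline void submitPreload(const std::shared_ptr<AsyncThreadCtx>& ctx,
                          TaskQueue& queue,
                          bool& didWork) {
  queue.push_back([ctx, &didWork] {
    if (ctx->isClosed()) {
      return;  // graceful shutdown: skip the work
    }
    didWork = true;  // ...real code would perform the buffered read here...
  });
}
```

Because the queued lambda co-owns the context via the copied `shared_ptr`, closing the scan never leaves the task holding a dangling pointer.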

@Ashutosh0x
Contributor Author

cc @yangzhg @fzhedu - this is the follow-up PR implementing the shared_ptr refactor discussed in #167. This resolves the blocking behavior while maintaining safety.

@Ashutosh0x
Contributor Author

Fixed formatting issues reported by CI. The clang-tidy errors appear to be related to a missing compilation database in the CI environment (unable to find folly headers), which is unrelated to my changes.

@frankobe frankobe requested a review from fzhedu February 4, 2026 22:55
@frankobe
Collaborator

frankobe commented Feb 4, 2026

@Ashutosh0x Do you mind fixing the build issue in bolt/dwio/dwrf/test/CMakeFiles/bolt_dwio_cache_test.dir/DirectBufferedInputTest.cpp.o?

It needs to be updated to use std::shared_ptr instead of the raw pointer:

 /__w/bolt/bolt/bolt/dwio/common/DirectBufferedInput.h:171:50: note:   no known conversion for argument 9 from ‘bytedance::bolt::connector::AsyncThreadCtx*’ to ‘std::shared_ptr<bytedance::bolt::connector::AsyncThreadCtx>’
  171 |       std::shared_ptr<connector::AsyncThreadCtx> asyncThreadCtx)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
      

@Ashutosh0x
Contributor Author

Fixed the build issue in DirectBufferedInputTest.cpp. It was still using the raw pointer instead of passing the shared_ptr directly. Changes pushed.

@Ashutosh0x
Contributor Author

@fzhedu Addressed all comments.

Summary of changes:

  1. Updated AsyncThreadCtx::in() to return bool and check the kClosed state under lock to prevent race conditions.
  2. Modified Guard to check in() result; if false, it invalidates itself (ctx_ = nullptr).
  3. Moved explicit state transition state_ = State::kClosed inside AsyncThreadCtx::wait() under lock, and removed the separate close() method.
  4. Updated TableScan.cpp and DirectBufferedInput.cpp to use the safe Guard check or manual in()/out() pattern where appropriate.
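
The combined behavior of points 1–3 can be sketched roughly like this. The member names, locking details, and Guard layout below are assumptions for illustration; the real class lives in bolt's connector code.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>

// Hypothetical sketch of the refactored AsyncThreadCtx.
class AsyncThreadCtx {
 public:
  enum class State { kActive, kClosed };

  // Returns false if the context is already closed (checked under the
  // lock, per point 1), so a queued task can bail out safely.
  bool in() {
    std::lock_guard<std::mutex> l(mu_);
    if (state_ == State::kClosed) {
      return false;
    }
    ++running_;
    return true;
  }

  void out() {
    std::lock_guard<std::mutex> l(mu_);
    if (--running_ == 0) {
      cv_.notify_all();
    }
  }

  // Per point 3: the state transition to kClosed happens inside wait(),
  // under the lock; there is no separate close() method. Only tasks that
  // already entered via in() are waited for.
  void wait() {
    std::unique_lock<std::mutex> l(mu_);
    state_ = State::kClosed;
    cv_.wait(l, [this] { return running_ == 0; });
  }

  bool isClosed() {
    std::lock_guard<std::mutex> l(mu_);
    return state_ == State::kClosed;
  }

  // Per point 2: the Guard checks in()'s result and invalidates itself
  // (ctx_ = nullptr) when the context is already closed.
  class Guard {
   public:
    explicit Guard(const std::shared_ptr<AsyncThreadCtx>& ctx)
        : ctx_(ctx && ctx->in() ? ctx.get() : nullptr) {}
    ~Guard() {
      if (ctx_) {
        ctx_->out();
      }
    }
    bool valid() const { return ctx_ != nullptr; }

   private:
    AsyncThreadCtx* ctx_;
  };

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  State state_{State::kActive};
  int running_{0};
};
```

A task that runs after wait() has completed constructs an invalid Guard and returns immediately, so it never touches resources freed by the closing scan.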

Ready for re-review.

Collaborator

@fzhedu fzhedu left a comment


@Ashutosh0x
Contributor Author

@fzhedu Reverted changes to TableScanTest.cpp as requested. Also removed empty lines in Connector.h.

@fzhedu
Collaborator

fzhedu commented Feb 9, 2026

> @fzhedu Reverted changes to TableScanTest.cpp as requested. Also removed empty lines in Connector.h.

@Ashutosh0x Thank you for your PR and for addressing those comments. Would you like to continue and fix some new comments?

@Ashutosh0x
Contributor Author

Ashutosh0x commented Feb 9, 2026

@fzhedu Thanks for the review! I've addressed your comments:

  1. Changed AsyncThreadCtx_.get() to AsyncThreadCtx_ - Now passing the shared_ptr directly to createConnectorQueryCtx instead of a raw pointer.

  2. Updated createConnectorQueryCtx signature - Changed from connector::AsyncThreadCtx* const asyncThreadCtx to std::shared_ptr<connector::AsyncThreadCtx> asyncThreadCtx in both Operator.h and Operator.cpp.

  3. Handled hiveSplit->length default value - When the length is std::numeric_limits<uint64_t>::max() (the default value), we now treat it as 0 for the preloadBytes calculation.
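
Point 3's clamping of the default split length can be sketched as below; the function name is hypothetical and the real preloadBytes computation in TableScan is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// The split's default length is uint64_t max, meaning "unspecified";
// treating it literally would corrupt the preloadBytes math, so it is
// clamped to 0. (Illustrative sketch, not the actual bolt code.)
inline uint64_t effectivePreloadLength(uint64_t splitLength) {
  if (splitLength == std::numeric_limits<uint64_t>::max()) {
    return 0;
  }
  return splitLength;
}
```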

Ready for re-review!

@fzhedu
Collaborator

fzhedu commented Feb 9, 2026

Thanks for your quick reply! Please fix the following failed tests in CI, @Ashutosh0x:

The following tests FAILED:
	353 - bolt_dwio_cache_test (SEGFAULT)
	407 - bolt_tool_trace_test (SEGFAULT)

@Ashutosh0x
Contributor Author

@fzhedu I've applied additional fixes in the latest commit:

  1. Task Counting Logic: Restored the inGuard_ mechanism in AsyncLoadHolder (and TableScan preloader) to ensure that task counting starts immediately when a task is submitted to the executor.
  2. Safe Lifetime Management: By storing a shared_ptr as a member of AsyncLoadHolder before the inGuard_, we ensure the AsyncThreadCtx object remains valid throughout the life of the queued and executing task, safely resolving the lifetime issue with the Guard's internal raw pointer.
  3. Lint/Formatting: Fixed all clang-format violations reported by the CI.

Tests are running now!

@fzhedu
Collaborator

fzhedu commented Feb 11, 2026

We should revert commit c748524, as it actually restores the previous behavior where the main thread has to wait for all async threads. That blocking is exactly the problem this PR is meant to solve; as the PR description says, "It updates AsyncLoadHolder and TableScan to check isClosed() and create Guard only within the async task, preventing the main thread from waiting on queued tasks."

@Ashutosh0x
Copy link
Contributor Author

@fzhedu I've addressed the latest feedback:

  1. Reverted the blocking logic: Removed inGuard_ from AsyncLoadHolder and moved the connector::AsyncThreadCtx::Guard creation back inside the executor's async task in both DirectBufferedInput.cpp and TableScan.cpp. This ensures the main thread doesn't wait for queued tasks and resolves the issue introduced in commit c748524.
  2. Formatting: Applied clang-format to all modified files as requested.
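
The revert in point 1 can be illustrated as follows. The `Ctx`, `Guard`, and `Executor` types and the `scheduleLoad` helper are minimal stand-ins invented for this sketch; the key is that the Guard is constructed inside the executor's task rather than when the task is queued, so close() never waits on work that has not started.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Minimal stand-ins (assumptions) for the real connector types.
struct Ctx {
  bool closed{false};
  int running{0};
  bool in() { if (closed) return false; ++running; return true; }
  void out() { --running; }
};

struct Guard {
  explicit Guard(const std::shared_ptr<Ctx>& c)
      : ctx_(c && c->in() ? c.get() : nullptr) {}
  ~Guard() { if (ctx_) ctx_->out(); }
  bool valid() const { return ctx_ != nullptr; }
  Ctx* ctx_;
};

using Executor = std::vector<std::function<void()>>;

inline void scheduleLoad(const std::shared_ptr<Ctx>& ctx, Executor& ex,
                         int& loads) {
  // No inGuard_ member here: queuing alone does not count as a running
  // task, so the main thread can close without draining the queue.
  ex.push_back([ctx, &loads] {
    Guard g(ctx);   // created only once the task actually runs
    if (!g.valid()) {
      return;       // ctx already closed; skip the load
    }
    ++loads;        // ...perform the actual async load here...
  });
}
```

The captured shared_ptr keeps the context alive for the queued task, so the Guard's internal raw pointer never dangles even when the scan closes first.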

The PR is now ready for re-review!

Collaborator

@fzhedu fzhedu left a comment


LGTM, thanks for your contribution, nice work!

@fzhedu fzhedu enabled auto-merge February 12, 2026 01:59
@fzhedu fzhedu disabled auto-merge February 12, 2026 02:33
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 12, 2026
@Ashutosh0x
Contributor Author

> LGTM, thanks for your contribution, nice work!

Thank you so much

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 12, 2026
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 12, 2026
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 12, 2026
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
@fzhedu fzhedu removed this pull request from the merge queue due to a manual request Feb 12, 2026
@fzhedu fzhedu enabled auto-merge February 12, 2026 06:37
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 12, 2026
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 12, 2026
@fzhedu fzhedu added this pull request to the merge queue Feb 12, 2026
Merged via the queue into bytedance:main with commit d81f3d6 Feb 12, 2026
7 checks passed
