fix: prevent crash when KnnSearch and Tune run concurrently#1764
Conversation
The crash occurred because Tune() modified basic_flatten_codes_, high_precise_codes_, and use_reorder_ under add_mutex_, while search methods (KnnSearch, SearchWithRequest) held shared global_mutex_; these are different mutexes, providing no synchronization.

Changes:
- In Tune(), defer member variable mutations using local variables, build the new flatten codes first, then atomically swap all shared state under an exclusive global_mutex_ lock.
- In RangeSearch(), add the missing shared_lock(global_mutex_) for consistency with KnnSearch and SearchWithRequest.

Agent-Logs-Url: https://github.com/antgroup/vsag/sessions/ff0a358d-0694-4cb2-a40e-daf1830bd6f4
Co-authored-by: wxyucs <12595343+wxyucs@users.noreply.github.com>
Pull request overview
Fixes a concurrency crash in VSAG’s HGraph index when Tune() runs concurrently with search operations by ensuring shared search state is swapped under the same mutex that protects readers.
Changes:
- Update `HGraph::Tune()` to rebuild flatten codes off-lock and then atomically swap `basic_flatten_codes_`, `high_precise_codes_`, and `use_reorder_` under an exclusive `global_mutex_`.
- Add the missing `shared_lock(global_mutex_)` in `HGraph::RangeSearch()` to synchronize with `Tune()`.
```cpp
// Acquire exclusive global lock to atomically swap flatten codes,
// preventing concurrent searches from accessing partially updated state.
{
    std::scoped_lock<std::shared_mutex> wlock(this->global_mutex_);
    basic_flatten_codes_ = new_basic;
    if (drop_precise_codes) {
        high_precise_codes_.reset();
    } else {
        high_precise_codes_ = new_precise;
    }
    use_reorder_ = new_use_reorder;
}

// set status
if (disable_future_tuning) {
    this->index_feature_list_->SetFeature(IndexFeature::SUPPORT_TUNE, false);
    this->raw_vector_.reset();
    has_raw_vector_ = false;
    create_new_raw_vector_ = false;
    check_and_init_raw_vector(param->raw_vector_param, common_param, false);
    init_resize_bit_and_reorder();
```
Tune now swaps basic_flatten_codes_/high_precise_codes_ under global_mutex_, but other public read paths (e.g., CalcDistanceById/CalDistanceById) access these members without taking global_mutex_. That means concurrent Tune() can still race with those APIs and potentially trigger the same use-after-free. Consider taking a std::shared_lock on global_mutex_ in those distance-by-id methods (and any other readers of these pointers), or otherwise ensuring the swapped state is read atomically everywhere.
```cpp
auto new_basic = tune_and_rebuild(is_tune_base_code, basic_flatten_codes_, new_basic_code);
auto new_precise =
    tune_and_rebuild(is_tune_precise_code, high_precise_codes_, new_precise_code);

check_and_init_raw_vector(param->raw_vector_param, common_param, false);
init_resize_bit_and_reorder();
// Acquire exclusive global lock to atomically swap flatten codes,
// preventing concurrent searches from accessing partially updated state.
{
    std::scoped_lock<std::shared_mutex> wlock(this->global_mutex_);
    basic_flatten_codes_ = new_basic;
    if (drop_precise_codes) {
        high_precise_codes_.reset();
    } else {
        high_precise_codes_ = new_precise;
    }
```
This PR fixes a specific concurrent crash scenario (Tune() vs searches) but doesn't add a regression test that runs Tune() concurrently with KnnSearch()/RangeSearch() to ensure the race stays fixed. There are already concurrent tests for read/write in tests/test_hgraph.cpp, so adding a focused Tune-vs-search stress/regression test would help prevent future reintroductions.
Change Type
Linked Issue
What Changed
- Root cause: `Tune()` mutates `basic_flatten_codes_`, `high_precise_codes_`, and `use_reorder_` under `add_mutex_`, while search methods hold `global_mutex_`. Different mutexes → no synchronization → use-after-free in SIMD distance computation when Tune destroys the old flatten codes mid-search.
- `Tune()`: Defer member mutations to local variables (`new_use_reorder`, `drop_precise_codes`). Build the new flatten codes fully, then atomically swap all shared state under an exclusive `global_mutex_`.
- `RangeSearch()`: Add the missing `shared_lock(global_mutex_)`, which was present in `KnnSearch`/`SearchWithRequest` but absent here.

Lock order
- `add_mutex_` → `global_mutex_` is preserved (consistent with `add_one_point`).

Test Evidence
- `make fmt`
- `make lint`
- `make test`
- `make cov`, run tests, and collect coverage

Test details:
Compatibility Impact
- `Tune()` now blocks concurrent searches only during the pointer swap (not during training/rebuild).
- `RangeSearch()` now correctly synchronizes with `Tune()`.

Performance and Concurrency Impact
- The exclusive hold on `global_mutex_` in Tune is minimal (pointer swaps only; training happens outside the lock).

Documentation Impact
- `README.md`
- `DEVELOPMENT.md`
- `CONTRIBUTING.md`

Risk and Rollback
Checklist
(`[skip ci]` prefix)