[bug fix] set finish before notifying load in LanceArrowWriter setFinished method#1985
Merged
luoyuxia merged 1 commit intoapache:mainfrom Nov 18, 2025
Merged
[bug fix] set finish before notifying load in LanceArrowWriter setFinished method#1985luoyuxia merged 1 commit intoapache:mainfrom
luoyuxia merged 1 commit intoapache:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a critical race condition bug in LanceArrowWriter.setFinished() that could cause loadNextBatch() to hang indefinitely. The fix reorders operations to set the finished flag before releasing the semaphore, ensuring proper thread synchronization.
Key Changes:
- Reordered
finished = trueto execute beforeloadToken.release()insetFinished()method to prevent race condition - Added explicit
falseinitialization to thefinishedfield
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private final int batchSize; | ||
|
|
||
| private volatile boolean finished; | ||
| private volatile boolean finished = false; |
There was a problem hiding this comment.
[nitpick] The explicit initialization = false is redundant in Java. Boolean instance fields are automatically initialized to false by default. While this doesn't cause any issues, it's unnecessary and can be removed for cleaner code.
Suggested change
| private volatile boolean finished = false; | |
| private volatile boolean finished; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
… method
Purpose
Linked issue: close #xxx
This commit is copied from lance-format/lance-spark#92.
I found that the Lance writer has a certain probability of hanging. After some troubleshooting, I discovered this is related to the LanceArrowWriter.setFinished method.
The original code appears to have a bug where it sets the finished status after notifying loadNextBatch, which could cause loadNextBatch to hang.
Root Cause
The ideal flow should be:
(thread 1) loadToken.release
(thread 1) finished = true
(thread 2) loadNextBatch
(thread 2) finished is true and count is 0 so return false
However, there's a chance it becomes:
(thread 1) loadToken.release
(thread 2) loadNextBatch
(thread 2) finished is false so return true and waiting
(thread 1) finished = false
If the second scenario occurs, thread 2 will hang indefinitely and cannot receive new notifications. jstack will show stacks hanging in LanceDataWriter.commit.
Brief change log
Tests
API and Format
Documentation