[AURON #1792] Keep the null result in the reverse connection result #1793
richox merged 9 commits into apache:master
Pull request overview
This PR fixes a bug in Left Anti Join handling where NULL join keys were incorrectly included in the result set. The fix ensures NULL join keys are filtered out from Anti join results, which aligns with SQL NOT IN semantics where NULL NOT IN (...) evaluates to NULL (treated as false).
Key Changes
- Added explicit NULL join key handling for Anti joins that filters out rows with NULL keys
- Refactored the key validity check to be computed once before the conditional logic
The failing tests are all LeftAnti join tests whose test data contains NULLs. I believe the reason is the following: there are two different kinds of LeftAnti in Spark, 'NOT IN' and 'NOT EXISTS'. Before this PR, Auron's semi_join.rs implemented 'NOT EXISTS', the standard LeftAnti. That produces incorrect results for NOT IN queries, like the query you mentioned in #1792. In this PR, you changed the implementation from 'NOT EXISTS' to 'NOT IN' (actually not a fully implemented 'NOT IN'), so the 'NOT EXISTS' (standard LeftAnti) tests that involve NULLs failed. I think we should revert this PR and find a better way to handle both NOT IN and NOT EXISTS.
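To make the difference between the two semantics concrete, here is a minimal sketch on single-column `Option<i32>` keys, where `None` stands for SQL NULL. The function names `anti_join_not_exists` and `anti_join_not_in` are hypothetical and chosen for illustration only; this is not Auron's semi_join.rs code, just the SQL semantics as I understand them.

```rust
use std::collections::HashSet;

/// Standard left anti join (NOT EXISTS semantics), hypothetical helper.
/// A left key is kept when no right key equals it. NULL never equals
/// anything, so left rows with NULL keys are always kept.
fn anti_join_not_exists(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    // NULL keys on the right can never match, so only non-NULL keys are indexed.
    let right_keys: HashSet<i32> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|key| match key {
            Some(k) => !right_keys.contains(k),
            None => true,
        })
        .collect()
}

/// Null-aware left anti join (NOT IN semantics), hypothetical helper.
fn anti_join_not_in(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    // `x NOT IN (empty set)` is true for every x, including NULL.
    if right.is_empty() {
        return left.to_vec();
    }
    // If the right side contains a NULL, `x NOT IN (...)` is either false
    // (a match exists) or NULL (no match), so no left row survives.
    if right.iter().any(|k| k.is_none()) {
        return Vec::new();
    }
    let right_keys: HashSet<i32> = right.iter().flatten().copied().collect();
    // A NULL left key makes the predicate NULL, so the row is dropped.
    left.iter()
        .copied()
        .filter(|key| matches!(key, Some(k) if !right_keys.contains(k)))
        .collect()
}
```

With left keys `[1, NULL, 3]` and right keys `[1, 2]`, the NOT EXISTS variant keeps the NULL row while the NOT IN variant drops it. Filtering only NULL left keys covers just one of the three NOT IN cases above, which may be what "not a fully implemented 'NOT IN'" refers to.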
…ction result (apache#1793)" This reverts commit 283b5da.
Which issue does this PR close?
Closes #1792
Rationale for this change
Keep the null result in the reverse connection result
What changes are included in this PR?
Are there any user-facing changes?
How was this patch tested?
cluster test