
[AURON #1792] Keep the null result in the reverse connection result #1793

Merged: richox merged 9 commits into apache:master from dh20:master_fix_left on Dec 30, 2025

dh20 (Contributor) commented Dec 26, 2025

Which issue does this PR close?

Closes #1792

Rationale for this change

Keep the null result in the reverse connection result

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

cluster test

cxzl25 requested a review from Copilot on December 26, 2025 10:57
cxzl25 changed the title from "Keep the null result in the reverse connection result" to "[AURON #1792] Keep the null result in the reverse connection result" on Dec 26, 2025
Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes a bug in Left Anti Join handling where NULL join keys were incorrectly included in the result set. The fix ensures NULL join keys are filtered out from Anti join results, which aligns with SQL NOT IN semantics where NULL NOT IN (...) evaluates to NULL (treated as false).

Key Changes

  • Added explicit NULL join key handling for Anti joins that filters out rows with NULL keys
  • Refactored the key validity check to be computed once before the conditional logic
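The NOT IN semantics mentioned in the overview can be illustrated with a small sketch of SQL three-valued logic. This is not code from the PR or from Auron; the `Tri` enum and the `not_in` and `where_keeps` functions are hypothetical names introduced only for illustration, and `None` stands in for SQL NULL.

```rust
/// SQL three-valued truth value (hypothetical helper type, not from Auron).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tri {
    True,
    False,
    Unknown,
}

/// Evaluates `key NOT IN (values)` under SQL three-valued logic.
/// `None` models SQL NULL.
fn not_in(key: Option<i64>, values: &[Option<i64>]) -> Tri {
    // `x NOT IN (empty set)` is TRUE, even when x is NULL.
    if values.is_empty() {
        return Tri::True;
    }
    // A NULL key compared against a non-empty list is UNKNOWN.
    let Some(k) = key else {
        return Tri::Unknown;
    };
    let mut saw_null = false;
    for v in values {
        match v {
            // A definite match makes NOT IN false.
            Some(x) if *x == k => return Tri::False,
            // A NULL in the list leaves the comparison undecided.
            None => saw_null = true,
            _ => {}
        }
    }
    if saw_null {
        Tri::Unknown
    } else {
        Tri::True
    }
}

/// A WHERE clause keeps a row only when the predicate is TRUE,
/// so UNKNOWN behaves like FALSE for filtering purposes.
fn where_keeps(t: Tri) -> bool {
    t == Tri::True
}
```

Under this model, `not_in(None, &[Some(1)])` is `Tri::Unknown`, so a filter built on `where_keeps` drops the NULL-keyed row, which is the behavior the overview describes.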


richox merged commit 283b5da into apache:master on Dec 30, 2025
98 checks passed
@xumingming (Contributor)

The following two tests failed after this PR was merged:

#1810
#1811

@dh20 Can you take a look?

@xumingming (Contributor)

The failing tests are all LeftAnti join tests whose test data involves NULLs.

I believe the reason is the following: there are two different kinds of LeftAnti in Spark, 'NOT IN' and 'NOT EXISTS'. Before this PR, Auron's semi_join.rs implemented 'NOT EXISTS', the standard LeftAnti. That produced incorrect results for NOT IN queries, like the query you mentioned in #1792.

In this PR, you changed the implementation from 'NOT EXISTS' to 'NOT IN' (actually not a fully implemented 'NOT IN'), so the 'NOT EXISTS' (standard LeftAnti) tests that involve NULLs failed.

I think we should revert this PR and find a better way to handle both NOT IN and NOT EXISTS.
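To make the difference between the two flavors concrete, here is a minimal sketch over single-column integer keys, with `None` standing in for NULL. It is not Auron's semi_join.rs logic; the function names `left_anti_not_exists` and `left_anti_not_in` are hypothetical, and the NOT IN variant follows the usual null-aware anti join rules as I understand them.

```rust
use std::collections::HashSet;

/// NOT EXISTS flavor (standard LeftAnti): keep a left row when no right row
/// has an equal key. NULL never equals anything, so NULL-keyed left rows
/// never find a match and are kept.
fn left_anti_not_exists(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    // Only non-NULL right keys can ever match.
    let right_keys: HashSet<i64> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|k| match k {
            Some(v) => !right_keys.contains(v),
            None => true,
        })
        .collect()
}

/// NOT IN flavor (null-aware anti join): keep a left row only when
/// `key NOT IN (right keys)` is definitely TRUE.
fn left_anti_not_in(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    // Empty right side: NOT IN is TRUE for every left row, NULL keys included.
    if right.is_empty() {
        return left.to_vec();
    }
    // Any NULL on the right side makes NOT IN either FALSE or UNKNOWN,
    // so no left row qualifies.
    if right.iter().any(|k| k.is_none()) {
        return Vec::new();
    }
    let right_keys: HashSet<i64> = right.iter().flatten().copied().collect();
    // A NULL left key yields UNKNOWN and is dropped.
    left.iter()
        .copied()
        .filter(|k| matches!(k, Some(v) if !right_keys.contains(v)))
        .collect()
}
```

With left keys `[1, 2, NULL]` and right keys `[2]`, the NOT EXISTS flavor keeps `1` and the NULL row, while the NOT IN flavor keeps only `1`. That divergence on NULL keys is the kind of mismatch described above.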

cxzl25 added a commit to cxzl25/auron that referenced this pull request Jan 7, 2026

Linked issue (may be closed by this pull request): Left connection failure
