[AURON #1898] Fix NULL handling for NOT IN subquery by yew1eb · Pull Request #1899 · apache/auron

yew1eb · 2026-01-15T18:16:49Z

Which issue does this PR close?

Closes #1898

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

How was this patch tested?

yew1eb · 2026-01-16T02:11:04Z

@cxzl25 PTAL

Copilot

Pull request overview

This pull request aims to fix NULL handling for NOT IN subquery operations (issue #1898) by implementing proper null-aware anti-join semantics. The changes ensure that when a NOT IN subquery contains NULL values, the query returns the correct (empty) result set as per SQL semantics.
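The SQL semantics referred to above can be sketched with three-valued logic. This is a minimal illustration, not the Auron implementation: `not_in` and `filter_not_in` are hypothetical names, and `Option<i32>` stands in for a nullable integer column, with `None` in the return value meaning UNKNOWN.

```rust
/// Evaluates `value NOT IN (set)` under SQL three-valued logic.
/// Returns Some(true), Some(false), or None for UNKNOWN.
/// Hypothetical helper for illustration only.
fn not_in(value: Option<i32>, set: &[Option<i32>]) -> Option<bool> {
    // NOT IN over an empty set is TRUE, even for a NULL probe value.
    if set.is_empty() {
        return Some(true);
    }
    // A NULL probe value compares as UNKNOWN against every element.
    let v = value?;
    // A definite match makes NOT IN false.
    if set.iter().any(|s| *s == Some(v)) {
        return Some(false);
    }
    // No match, but a NULL element leaves the comparison UNKNOWN.
    if set.iter().any(|s| s.is_none()) {
        return None;
    }
    Some(true)
}

/// Keeps only the rows for which the predicate is definitely TRUE,
/// which is how a WHERE clause treats UNKNOWN.
fn filter_not_in(probe: &[Option<i32>], set: &[Option<i32>]) -> Vec<Option<i32>> {
    probe
        .iter()
        .copied()
        .filter(|v| not_in(*v, set) == Some(true))
        .collect()
}
```

With the data used in this PR's test (`a` = 1, 2, 3 and `b` = 1, 2, NULL), `filter_not_in` keeps no rows: 1 and 2 match, and 3 compares as UNKNOWN against the NULL.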

Changes:

Added logic in the Rust semi-join implementation to detect NULL keys in the build side and filter probe rows accordingly
Added a test case to verify NOT IN subquery behavior with NULL values

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
native-engine/datafusion-ext-plans/src/joins/bhj/semi_join.rs	Implements null-aware anti-join logic by checking for NULL keys in build side and filtering probe rows when NULL keys are present
spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala	Adds test case for NOT IN subquery with NULL values to verify correct null-aware anti-join behavior

Comments suppressed due to low confidence (1)

native-engine/datafusion-ext-plans/src/joins/bhj/semi_join.rs:210

There's a potential index synchronization issue. When a probe row has a valid key (key_is_valid is true) but build_has_null_keys is also true, the code continues at line 209 without incrementing hashes_idx. However, this valid probe key would have been included in map_values during the lookup phase (lines 175-186). This causes hashes_idx to become out of sync with map_values, potentially leading to incorrect lookups or index out of bounds errors in subsequent iterations.

The fix should ensure that when key_is_valid is true and we early exit due to build_has_null_keys, we still increment hashes_idx. One approach is to move the hashes_idx increment before the probe key lookup, or restructure the logic to handle this case correctly.

            if P.mode == Anti
                && P.probe_is_join_side
                && (!key_is_valid || build_has_null_keys) // Filter if probe row is NULL or build side has any NULL
                && self.join_params.is_null_aware_anti_join
            {
                probed_joined.set(row_idx, true);
                continue;
            }
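The restructuring suggested in this comment can be sketched as follows. This is a simplified model, not the code in semi_join.rs: `mark_joined`, `key_valid`, `lookups`, and `lookup_idx` are hypothetical names, and the sketch assumes that lookup results are stored only for probe rows with a valid key. The lookup slot is consumed before any early exit, so the cursor cannot fall behind.

```rust
/// Hypothetical simplification of a null-aware anti-join probe loop.
/// `key_valid[i]` says whether probe row i has a non-NULL key.
/// `lookups` holds one entry per valid-key row, in row order
/// (true = the key was found on the build side).
/// Returns per-row flags; a flagged row is excluded from the anti-join output.
fn mark_joined(key_valid: &[bool], lookups: &[bool], build_has_null_keys: bool) -> Vec<bool> {
    let mut joined = vec![false; key_valid.len()];
    let mut lookup_idx = 0; // cursor into `lookups`
    for (row_idx, &valid) in key_valid.iter().enumerate() {
        // Consume this row's lookup slot first, so the early exit
        // below cannot skip the cursor advance.
        let found = if valid {
            let f = lookups[lookup_idx];
            lookup_idx += 1;
            f
        } else {
            false
        };
        // Null-aware anti join: drop the row if its key is NULL
        // or if the build side contains any NULL key.
        if !valid || build_has_null_keys {
            joined[row_idx] = true;
            continue;
        }
        joined[row_idx] = found;
    }
    // Every stored lookup must have been consumed exactly once.
    debug_assert_eq!(lookup_idx, lookups.len());
    joined
}
```

Advancing the cursor at the top of the loop body costs nothing extra and keeps the invariant local: each valid-key row consumes exactly one lookup slot, whichever branch follows.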

cxzl25 · 2026-01-16T03:40:12Z

@yew1eb Nice catch, thank you for your contribution!

cxzl25 · 2026-01-16T07:29:47Z

spark-extension-shims-spark/src/test/scala/org/apache/auron/AuronQuerySuite.scala

+    Seq(row((1, 1)), row((2, 2)), row((3, null)))
+      .toDF("a", "b")
+      .createOrReplaceTempView("tbl")
+    val df = checkSparkAnswer("select * from tbl where a not in (select b from tbl)")


== Results ==
!== Correct Answer - 0 ==   == Spark Answer - 1 ==
!struct<>                   struct<a:int,b:int>
!                           [3,null]
(QueryTest.scala:244)

github-actions bot added spark native labels Jan 15, 2026

[AURON apache#1898] Fix NULL handling for NOT IN subquery

bb9f600

yew1eb force-pushed the fix_null_not_in_subquery branch from 1303630 to bb9f600 Compare January 15, 2026 19:14

cxzl25 requested a review from Copilot January 16, 2026 03:07

Copilot started reviewing on behalf of cxzl25 January 16, 2026 03:08

Copilot AI reviewed Jan 16, 2026

cxzl25 approved these changes Jan 16, 2026

cxzl25 reviewed Jan 16, 2026

cxzl25 merged commit db984c1 into apache:master Jan 16, 2026
104 checks passed
