Skip to content

[AURON #1646] isNan semantics are aligned with Spark#1647

Merged
zuston merged 12 commits intoapache:masterfrom
ShreyeshArangath:sarangat/fix-is-nan
Dec 5, 2025
Merged

[AURON #1646] isNan semantics are aligned with Spark#1647
zuston merged 12 commits intoapache:masterfrom
ShreyeshArangath:sarangat/fix-is-nan

Conversation

@ShreyeshArangath
Copy link
Contributor

@ShreyeshArangath ShreyeshArangath commented Nov 20, 2025

Which issue does this PR close?

Closes #1646

Rationale for this change

isnan currently uses math::isnan which propagates nulls and can mismatch Spark semantics. In Spark, isnan(null) must return false, not null. Additionally, computed NaN values like log(-3) may not be handled consistently during Parquet round-trip.

What changes are included in this PR?

new implementation for is_nan native function made for spark

Are there any user-facing changes?

  • isnan(null) → false (arrays and scalars)
  • isnan(non-float) → false (arrays: all false; scalars: false)
  • No API changes

How was this patch tested?

  • Existing unit tests

sarangat_LinkedIn added 2 commits November 20, 2025 15:50
@ShreyeshArangath ShreyeshArangath marked this pull request as ready for review November 21, 2025 21:26
@zuston zuston changed the title [AURON #1646] isNaN currently uses math::isnan which propagates nulls and can mismatch Spark semantics [AURON #1646] isNan semantics are aligned with Spark Nov 28, 2025
@ShreyeshArangath
Copy link
Contributor Author

@cxzl25 Could you take a look as well? Thanks!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aligns the isNaN function implementation with Spark semantics, addressing issue #1646. The key semantic change is that isnan(null) now returns false instead of propagating null, matching Spark's behavior. The implementation switches from the built-in DataFusion IsNaN function to a custom Spark-compatible implementation.

Key changes:

  • Implemented custom Spark_IsNaN function that returns false for null values instead of propagating nulls
  • Added nulls_to_false helper utility to convert null values in boolean arrays to false
  • Updated tests to verify correct null handling and removed computed NaN test case (log(-3)) due to Parquet round-trip consistency issues

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala Changed IsNaN registration from built-in function to custom Spark extension
spark-extension-shims-spark/src/test/scala/org.apache.auron/AuronFunctionSuite.scala Enabled IsNaN test (previously ignored), removed computed NaN test case, added more null value test cases
native-engine/datafusion-ext-functions/src/spark_isnan.rs New Spark-compatible IsNaN implementation with correct null semantics (null→false)
native-engine/datafusion-ext-functions/src/lib.rs Registered Spark_IsNaN function and updated error message for consistency
native-engine/datafusion-ext-commons/src/arrow/mod.rs Added boolean module
native-engine/datafusion-ext-commons/src/arrow/boolean.rs New helper utility to convert nulls to false in boolean arrays

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +44 to +49
_other => {
// For non-float arrays, Spark's isnan is effectively false.
let len = array.len();
let out = ScalarValue::Boolean(Some(false)).to_array_of_size(len)?;
Ok(ColumnarValue::Array(out))
}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code handles non-float types (lines 44-49) by returning an array of all false values, but there are no unit tests covering this behavior. Consider adding test cases for non-float types (e.g., integers, strings) to ensure isnan correctly returns false for these types, matching Spark's semantics.

Copilot uses AI. Check for mistakes.
match sv {
ScalarValue::Float64(a) => a.map(|x| x.is_nan()).unwrap_or(false),
ScalarValue::Float32(a) => a.map(|x| x.is_nan()).unwrap_or(false),
_ => false,
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code handles non-float scalar types (line 55) by returning false, but there are no unit tests covering this behavior. Consider adding test cases for non-float scalar types (e.g., integers, strings) to ensure isnan correctly returns false for these types, matching Spark's semantics.

Copilot uses AI. Check for mistakes.
Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Merged @ShreyeshArangath thanks for your nice contribution!

@zuston zuston merged commit 027cffa into apache:master Dec 5, 2025
104 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

isNaN currently uses math::isnan which propagates nulls and can mismatch Spark semantics

3 participants