-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
This issue servers to document some existing behaviour, as well as discuss if we can change this to something more intuitive/ergonomic.
Problem
When using TypeSignature::Coercible, there is some interesting behaviour when we have DataType::Null, DataType::Dictionary and DataType::RunEndEncoded passed in.
Given a function with signature like so:
Signature::coercible(
vec![Coercion::new_exact(TypeSignatureClass::Native(logical_int64())],
Volatility::Immutable
)It expects only a single Int64 argument. However, if we pass in DataType::Null or Dictionary(Int8, Int64) argument, these are actually valid and in-fact are casted to Int64 (the function's invoke method would see the input array of type Int64; it wouldn't see arrays of type Null or Dictionary(Int8, Int64).
This is the same for the following signature:
Signature::coercible(
vec![Coercion::new_implicit(
TypeSignatureClass::Native(logical_int64()),
vec![],
NativeType::Int64,
)],
Volatility::Immutable
)If a signature specifies TypeSignatureClass::Native then not only is the native type accepted, but also Null and Dictionary/RunEndEncoded types of this native type are allowed in (but casted to the specified type).
Using TypeSignatureClass::Integer
If we instead use any of these APIs:
Signature::coercible(
vec![Coercion::new_exact(TypeSignatureClass::Integer],
Volatility::Immutable
)Signature::coercible(
vec![Coercion::new_implicit(
TypeSignatureClass::Integer,
vec![],
NativeType::Int64,
)],
Volatility::Immutable
)That is, using TypeSignatureClass::Integer which encompasses the type we're looking for, then Null/Dictionary/REE types are passed through without being casted. Only if we use TypeSignatureClass::Native do they get casted.
Why is this a problem?
It makes it annoying for function implementations to handle, as depending on which API they use they must provide an implementation that considers Null/Dictionary/REE type arrays or not. This is subtle behaviour that is hard to spot without appropriate testing.
What to do
- See if we can make this behaviour more obvious
- Enhance signature API to tune this behaviour; let functions choose if:
- Dictionary/REE arrays are allowed as in (function implementation must handle these array types)
- Dictionary/REE arrays are materialized (function implementation doesn't need to consider Dictionary/REE array types as they'll be casted)
- Can we universally handle
Nulltype arrays? Majority of functions likely just return null anyway, it's tedious to implement this handling per function
Example of null propagation from log:
datafusion/datafusion/functions/src/math/log.rs
Lines 330 to 341 in 5fedb84
| let mut arg_types = args | |
| .iter() | |
| .map(|arg| info.get_data_type(arg)) | |
| .collect::<Result<Vec<_>>>()?; | |
| let return_type = self.return_type(&arg_types)?; | |
| // Null propagation | |
| if arg_types.iter().any(|dt| dt.is_null()) { | |
| return Ok(ExprSimplifyResult::Simplified(lit( | |
| ScalarValue::Null.cast_to(&return_type)? | |
| ))); | |
| } |