Skip to content

How Null/Dictionary/REE datatypes interact with Coercion API #19458

@Jefffrey

Description

@Jefffrey

This issue servers to document some existing behaviour, as well as discuss if we can change this to something more intuitive/ergonomic.

Problem

When using TypeSignature::Coercible, there is some interesting behaviour when we have DataType::Null, DataType::Dictionary and DataType::RunEndEncoded passed in.

Given a function with signature like so:

Signature::coercible(
    vec![Coercion::new_exact(TypeSignatureClass::Native(logical_int64())],
    Volatility::Immutable
)

It expects only a single Int64 argument. However, if we pass in DataType::Null or Dictionary(Int8, Int64) argument, these are actually valid and in-fact are casted to Int64 (the function's invoke method would see the input array of type Int64; it wouldn't see arrays of type Null or Dictionary(Int8, Int64).

This is the same for the following signature:

Signature::coercible(
    vec![Coercion::new_implicit(
        TypeSignatureClass::Native(logical_int64()),
        vec![],
        NativeType::Int64,
    )],
    Volatility::Immutable
)

If a signature specifies TypeSignatureClass::Native then not only is the native type accepted, but also Null and Dictionary/RunEndEncoded types of this native type are allowed in (but casted to the specified type).

Using TypeSignatureClass::Integer

If we instead use any of these APIs:

Signature::coercible(
    vec![Coercion::new_exact(TypeSignatureClass::Integer],
    Volatility::Immutable
)
Signature::coercible(
    vec![Coercion::new_implicit(
        TypeSignatureClass::Integer,
        vec![],
        NativeType::Int64,
    )],
    Volatility::Immutable
)

That is, using TypeSignatureClass::Integer which encompasses the type we're looking for, then Null/Dictionary/REE types are passed through without being casted. Only if we use TypeSignatureClass::Native do they get casted.

Why is this a problem?

It makes it annoying for function implementations to handle, as depending on which API they use they must provide an implementation that considers Null/Dictionary/REE type arrays or not. This is subtle behaviour that is hard to spot without appropriate testing.

What to do

  • See if we can make this behaviour more obvious
  • Enhance signature API to tune this behaviour; let functions choose if:
    • Dictionary/REE arrays are allowed as in (function implementation must handle these array types)
    • Dictionary/REE arrays are materialized (function implementation doesn't need to consider Dictionary/REE array types as they'll be casted)
  • Can we universally handle Null type arrays? Majority of functions likely just return null anyway, it's tedious to implement this handling per function

Example of null propagation from log:

let mut arg_types = args
.iter()
.map(|arg| info.get_data_type(arg))
.collect::<Result<Vec<_>>>()?;
let return_type = self.return_type(&arg_types)?;
// Null propagation
if arg_types.iter().any(|dt| dt.is_null()) {
return Ok(ExprSimplifyResult::Simplified(lit(
ScalarValue::Null.cast_to(&return_type)?
)));
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions