feat(extensions): support int arguments with std_dev and variance functions by nielspardon · Pull Request #1012 · substrait-io/substrait

nielspardon · 2026-03-17T12:36:11Z

This PR depends on #1010 and #1011 since it uses the new enum argument syntax in the test cases and adds additional generated test cases using integer arguments to the generated test cases in #1011.

The main changes for this PR are in commit 714d363.

This PR adds new function signatures for std_dev and variance that accept integer arguments. Most SQL systems accept any numeric argument for these functions, not just the floating-point arguments that the current Substrait function signatures suggest.
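
As a rough illustration of what an integer-accepting std_dev or variance means for a consumer, here is a minimal Python sketch. It assumes each integer input is promoted to a 64-bit float and that the result is a 64-bit float (the thread states that the integer versions return fp64). The function names, the string-valued `distribution` parameter, and returning `None` when there are too few rows are illustrative assumptions, not behavior defined by Substrait.

```python
import math


def variance(values, distribution="SAMPLE"):
    """Variance of integer (or float) inputs, computed in double precision."""
    if distribution not in ("SAMPLE", "POPULATION"):
        raise ValueError(f"unknown distribution: {distribution}")
    xs = [float(v) for v in values]  # promote every input to a 64-bit float
    n = len(xs)
    divisor = n - 1 if distribution == "SAMPLE" else n
    if divisor <= 0:
        return None  # too few rows; modelled here as a null result (assumption)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / divisor


def std_dev(values, distribution="SAMPLE"):
    """Standard deviation as the square root of the variance above."""
    var = variance(values, distribution)
    return None if var is None else math.sqrt(var)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # plain integers, no cast written by the caller
print(std_dev(data, "POPULATION"))  # 2.0
print(std_dev(data, "SAMPLE"))
```

The caller passes integers directly; the promotion to floating point happens inside the function, which is the convenience the new signatures are meant to provide.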

Integer arguments are used in TPC-DS and are required so we can correctly convert TPC-DS queries with isthmus in substrait-java. See: substrait-io/substrait-java#68

mbwhite

LGTM - good to get the complete implementation - and following the pattern of avg

extensions/functions_arithmetic.yaml

yongchul

+1 for supporting integral types.

nielspardon · 2026-03-23T18:07:18Z

rebased on latest #1011 to remove the deprecation

vbarua · 2026-03-23T22:29:51Z

Do we actually need to support calls like std_dev(<i8>), or should we be pushing producers to generate std_dev(<i8>::fp64) to force them to capture their type promotion logic?

> following the pattern of avg

This is slightly different from the avg case IMO. The avg functions output the same numeric type as the input. So avg:i8 outputs i8, avg:i16 outputs i16, etc.

For the std_dev and variance functions with numeric inputs, the output is always fp64. There is no difference in output types that we need to capture.

nielspardon · 2026-03-24T08:50:32Z

> For the std_dev and variance functions with numeric inputs, the output is always fp64. There is no difference in output types that we need to capture.

Currently, the fp32 argument versions also return an fp32 result:

substrait/extensions/functions_arithmetic.yaml

Lines 1395 to 1404 in 7b39d4c

    
    - args:
        - name: x
          value: fp32
      options:
        rounding:
          values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR ]
        distribution:
          values: [ SAMPLE, POPULATION ]
      nullability: DECLARED_OUTPUT
      return: fp32?

substrait/extensions/functions_arithmetic.yaml

Lines 1418 to 1427 in 7b39d4c

    
    - args:
        - name: x
          value: fp32
      options:
        rounding:
          values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR ]
        distribution:
          values: [ SAMPLE, POPULATION ]
      nullability: DECLARED_OUTPUT
      return: fp32?

nielspardon · 2026-03-24T09:10:37Z

> Do we actually need to support calls like std_dev(<i8>), or should we be pushing producers to generate std_dev(<i8>::fp64) to force them to capture their type promotion logic?

I think not offering function signatures for all numeric types only makes things less convenient for Substrait adopters, since they need to add and parse extra cast expressions that they otherwise would not need. Sure, we could force them to do so, but I also feel that the Substrait specification is under-specified on the expected casting behavior. For example, when casting from fp32 to i64, does it round up or down?
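
To show why the rounding question matters, the following Python sketch converts the same floating-point value to an integer under several rounding rules. The mode names are borrowed from the `rounding` option in the signatures quoted above purely for illustration; nothing here claims that a Substrait cast uses these modes, and `cast_to_int` is a hypothetical helper.

```python
import math


def tie_away_from_zero(x):
    """Round to nearest integer, resolving exact halves away from zero."""
    t = math.trunc(x)
    if abs(x - t) >= 0.5:
        t += 1 if x > 0 else -1
    return t


# Hypothetical mapping from mode name to a Python rounding function.
ROUNDING = {
    "TIE_TO_EVEN": round,  # Python's round() resolves halves to the even integer
    "TIE_AWAY_FROM_ZERO": tie_away_from_zero,
    "TRUNCATE": math.trunc,
    "CEILING": math.ceil,
    "FLOOR": math.floor,
}


def cast_to_int(x, mode):
    """Convert a float to an int using the named rounding rule."""
    return ROUNDING[mode](x)


for mode in ROUNDING:
    print(mode, cast_to_int(2.5, mode), cast_to_int(-2.5, mode))
```

For the inputs 2.5 and -2.5, the five rules produce four distinct result pairs, so a plan that does not state the rule can legitimately be evaluated differently by different consumers.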

vbarua · 2026-03-24T14:46:36Z

> Currently, the fp32 argument versions also return an fp32 result:

Should have been clearer. The integer args versions all return fp64.

> I also feel that the Substrait specification is under-specified on the expected casting behavior. For example, when casting from fp32 to i64, does it round up or down?

I agree actually, but that sounds like something worth pinning down as well.

> I think not offering function signatures for all numeric types only makes things less convenient for Substrait adopters, since they need to add and parse extra cast expressions that they otherwise would not need.

It makes the producer's life more difficult, but lessens the burden on the consumer because the surface area of functions to implement and support is smaller. I haven't figured out a good criterion for when we should push a plan producer to cast, but I don't think we need to provide 1-1 functions for everything in SQL. This actually might be a case where it's worth it, but forcing producers to cast is also a mechanism to get them to capture their type coercion behaviors explicitly.
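
The producer-side alternative described here can be sketched in Python as a rewrite that wraps integer arguments in an explicit cast before calling the floating-point signature. Everything in this sketch is hypothetical: the `FieldRef`, `Cast` and `AggCall` classes, the `with_explicit_cast` helper, and the choice of fp64 as the cast target are stand-ins for illustration and do not correspond to any real Substrait or isthmus API.

```python
from dataclasses import dataclass

INTEGER_TYPES = {"i8", "i16", "i32", "i64"}


@dataclass(frozen=True)
class FieldRef:
    index: int  # position of the field in the input record
    type: str   # type name of the field, e.g. "i8"


@dataclass(frozen=True)
class Cast:
    input: object  # expression being cast
    target: str    # target type name


@dataclass(frozen=True)
class AggCall:
    name: str    # aggregate function name
    arg: object  # single argument expression


def arg_type(expr):
    """Type produced by an expression in this toy model."""
    return expr.target if isinstance(expr, Cast) else expr.type


def with_explicit_cast(call):
    """Make the integer-to-float promotion visible in the plan."""
    if call.name in {"std_dev", "variance"} and arg_type(call.arg) in INTEGER_TYPES:
        return AggCall(call.name, Cast(call.arg, "fp64"))
    return call


plan = AggCall("std_dev", FieldRef(0, "i8"))
print(with_explicit_cast(plan))
```

Note that the rewrite needs the argument's type, not just its field index, to decide whether a cast is required.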

nielspardon · 2026-03-24T14:57:04Z

> > I also feel that the Substrait specification is under-specified on the expected casting behavior. For example, when casting from fp32 to i64, does it round up or down?
>
> I agree actually, but that sounds like something worth pinning down as well.

created an issue for the casting docs improvement: #1023

nielspardon · 2026-03-25T07:25:04Z

> It makes the producer's life more difficult, but lessens the burden on the consumer because the surface area of functions to implement and support is smaller. I haven't figured out a good criterion for when we should push a plan producer to cast, but I don't think we need to provide 1-1 functions for everything in SQL. This actually might be a case where it's worth it, but forcing producers to cast is also a mechanism to get them to capture their type coercion behaviors explicitly.

If you wanted to enforce such a behavior you would have to analyze all of the existing function signatures and deprecate/remove the ones that are "too convenient", not forcing producers to expose their type coercion behavior.

Also, this would require a major rewrite of the isthmus aggregate function mapping (at least for std_dev and variance, but prospectively also for scalar and window functions). The way it is implemented today does not allow explicitly coercing types using casts, because the function conversion logic only has access to the indices of the fields used in aggregate functions, not their types.

nielspardon requested review from EpsilonPrime, cpcloud, jacques-n, vbarua, westonpace and yongchul as code owners March 17, 2026 12:36

nielspardon self-assigned this Mar 17, 2026

mbwhite approved these changes Mar 20, 2026

yongchul reviewed Mar 20, 2026

nielspardon force-pushed the par-stddev-int branch 2 times, most recently from 76ff67d to 82379c0 Compare March 20, 2026 20:08

yongchul approved these changes Mar 20, 2026

nielspardon added the "PMC Ready" label (PRs ready for review by PMCs) Mar 23, 2026

nielspardon added 8 commits March 23, 2026 18:50

fix(extensions)!: change distribution option to enum arg for std_dev …

26ddf93

…and variance BREAKING CHANGE: changes the function signature of existing functions std_dev and variance Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: add deprecated flag and deprecate old sigs

9b5f945

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

doc: add deprecated field to docs

eed9e41

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: use new deprecation mechanism

02dd601

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: also change variance function to new deprecation

8e570e2

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: test case nullability

1330e33

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: update deprecation version

b364972

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: remove deprecated field

6275d23

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

nielspardon force-pushed the par-stddev-int branch from 82379c0 to 7219621 Compare March 23, 2026 18:06

nielspardon force-pushed the par-stddev-int branch from 7219621 to 9b21b2b Compare March 24, 2026 08:36

nielspardon added 3 commits March 24, 2026 09:39

fix: reverse argument order

8175a73

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

feat(extensions): support int arguments with std_dev and variance fun…

2911777

…ctions Signed-off-by: Niels Pardon <par@zurich.ibm.com>

fix: reverse argument order

a9b2d92

Signed-off-by: Niels Pardon <par@zurich.ibm.com>

nielspardon force-pushed the par-stddev-int branch from 9b21b2b to a9b2d92 Compare March 24, 2026 08:43

nielspardon mentioned this pull request Mar 24, 2026

docs: improve documentation of casting behavior #1023

Open

nielspardon mentioned this pull request Mar 26, 2026

fix(isthmus): std_dev, variance function mappings substrait-io/substrait-java#780

Open

nielspardon wants to merge 11 commits into substrait-io:main from nielspardon:par-stddev-int