feat(extensions): support int arguments with std_dev and variance functions#1012
feat(extensions): support int arguments with std_dev and variance functions#1012nielspardon wants to merge 11 commits intosubstrait-io:mainfrom
Conversation
mbwhite
left a comment
There was a problem hiding this comment.
LGTM - good to get the complete implementation - and following the pattern of avg
76ff67d to
82379c0
Compare
This comment was marked as outdated.
This comment was marked as outdated.
yongchul
left a comment
There was a problem hiding this comment.
+1 for supporting integral types.
…and variance BREAKING CHANGE: changes the function signature of existing functions std_dev and variance Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
82379c0 to
7219621
Compare
|
rebased on latest #1011 to remove the deprecation |
|
Do we actually need to support calls like
This is slightly different from the For the |
7219621 to
9b21b2b
Compare
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
…ctions Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
9b21b2b to
a9b2d92
Compare
Currently, the fp32 argument versions also returns an fp32 result: substrait/extensions/functions_arithmetic.yaml Lines 1395 to 1404 in 7b39d4c substrait/extensions/functions_arithmetic.yaml Lines 1418 to 1427 in 7b39d4c |
I think not offering function signatures with all numeric types makes it only less convenient for Substrait adopters since they need to add and parse extra cast expressions which they otherwise would not need. Sure, we could force them to do so but then I also feel that the Substrait specification is under-specified on the expected casting behavior like e.g. when casting from fp32 to i64 does it round up or down? |
Should have been clearer. The integer args versions all return fp64.
I agree actually, but that sounds like something worth pinning down as well.
It makes the producers life more difficult, but lessens the burden on the consumer because the surface area of functions to implement/support is smaller. I haven't figured out a good criteria to express when we should push for a plan producer to cast, but I don't think think we need to provide 1-1 functions for everything in SQL. This actually might be a case where it's worth it, but forcing producers to cast is also a mechanism to get them to capture their type coercion behaviors explicitly. |
created an issue for the casting docs improvement: #1023 |
If you wanted to enforce such a behavior you would have to analyze all of the existing function signatures and deprecate/remove the ones that are "too convenient", not forcing producers to expose their type coercion behavior. Also this will require a major rewrite of the isthmus aggregate function mapping (at least for std_dev, variance but also for scalar, windowing functions prospectively) since the way it is implemented today does not allow to explicitly coerce types using casts since in the function conversion logic we have only access to field indices of fields used in aggregate functions but not their types. |
This PR depends on #1010 and #1011 since it uses the new enum argument syntax in the test cases and adds additional generated test cases using integer arguments to the generated test cases in #1011.
The main changes for this PR are in commit 714d363.
This PR adds new function signatures for
std_devandvariancewhich accept integer arguments. Most SQL systems support any numeric arguments not just floating point arguments as the current Substrait function signatures suggest.Integer arguments are used in TPC-DS and are required so we can correctly convert TPC-DS queries with isthmus in substrait-java. See: substrait-io/substrait-java#68
This change is