feat!: require window function bounds and support multiple order-by columns#932
feat!: require window function bounds and support multiple order-by columns#932benbellick wants to merge 3 commits intomainfrom
Conversation
asolimando
left a comment
There was a problem hiding this comment.
LGTM mostly but there are a couple of open questions
| | Lower Bound | Bound Following(int64), Bound Trailing(int64) or CurrentRow. | False, defaults to start of partition | | ||
| | Upper Bound | Bound Following(int64), Bound Trailing(int64) or CurrentRow. | False, defaults to end of partition | | ||
| | Order By | A list of ordering expressions with sort directions. | False, defaults to unordered | | ||
| | Bounds Type | ROWS or RANGE. ROWS bounds count physical rows. RANGE bounds consider value equivalence based on ordering columns. | False, defaults to RANGE | |
There was a problem hiding this comment.
There is also GROUP, even if not implemented at the moment, would it be worth mentioning it and specifying is not supported yet?
There was a problem hiding this comment.
@asolimando could you cite the reference of GROUP you are referring to?
There was a problem hiding this comment.
It's described in "Section 4.15.14 Windowed tables" of the "Part 2: Foundation (SQL/Foundation)" in my SQL standard 2011 (draft version):
A window ordering group is a maximal set of rows in a window partition that are peers according to the window
ordering. Although the ordering of rows within a window ordering group is implementation-dependent, it is
possible to totally order the window ordering groups of a window partition, as follows: if WOG1 and WOG2
are two window ordering groups contained in the same window partition P, then WOG1 precedes WOG2 if
and only if some row of WOG1 precedes some row of WOG2.
Optionally, a window may define a window frame for each row R. A window frame is always defined relative
to the current row. A window frame is specified by up to four syntactic elements:
- The choice of RANGE, to indicate a logical definition of the window frame by offsetting forward or
backward from the current row by an increment or decrement to the value of the <sort key>; GROUPS,
to indicate a logical definition of the window frame by counting forward or backward a number of window
ordering groups (as defined by the <sort key>s) from the window ordering group containing the current
row; or ROWS, to indicate a physical definition of the window frame, by counting rows forward or backward
from the current row.
- ...
[...]
For what I could check, and up to my best understanding, Postgres' follows the standard in that respect, so its documentation can be referred to.
Please note that this is an optional feature, citing from conformance rules from the SQL:2011 standard:
Without Feature T620, “WINDOW clause: GROUPS option”, conforming SQL language shall not contain
<window frame units> that specifies GROUPS.
| | Lower Bound | Preceding(int64), Following(int64), CurrentRow, or Unbounded. | False, defaults to Unbounded (start of partition) | | ||
| | Upper Bound | Preceding(int64), Following(int64), CurrentRow, or Unbounded. | False, defaults to Unbounded (end of partition) | |
There was a problem hiding this comment.
For defaults, it actually depends on a few additional factors:
Quoting Postgres doc:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
There was a problem hiding this comment.
We don't have to match Postgres default behaviours here, especially given that we've already assigned default behaviours for these and changing these would have implications for systems that currently implement the defaults correctly as given.
Not for this PR, but my actual inclination for window functions would be to get rid of all of the defaults. This stuff is already complicated enough, and trying to reason about it in the presence of defaults just makes it harder. Forcing plan producers to be explicit about this seems easier than trying to pick the correct defaults for everyone to use.
| | Lower Bound | Preceding(int64), Following(int64), CurrentRow, or Unbounded. | False, defaults to Unbounded (start of partition) | | ||
| | Upper Bound | Preceding(int64), Following(int64), CurrentRow, or Unbounded. | False, defaults to Unbounded (end of partition) | |
There was a problem hiding this comment.
We don't have to match Postgres default behaviours here, especially given that we've already assigned default behaviours for these and changing these would have implications for systems that currently implement the defaults correctly as given.
Not for this PR, but my actual inclination for window functions would be to get rid of all of the defaults. This stuff is already complicated enough, and trying to reason about it in the presence of defaults just makes it harder. Forcing plan producers to be explicit about this seems easier than trying to pick the correct defaults for everyone to use.
|
ACTION NEEDED Substrait follows the Conventional Commits The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
|
I would normally finagle the title to pass the conventional commit CI check, but in this case it's complaining about the capitalization of |
|
This PR has been automatically marked as stale because it has not had |
|
This is still relevant, not stale |
Closes #930
BREAKING CHANGE: Window function bindings no longer have default values.
This change is