feat!: remove deprecated time, timestamp and timestamp_tz types#994
feat!: remove deprecated time, timestamp and timestamp_tz types#994nielspardon wants to merge 3 commits intosubstrait-io:mainfrom
Conversation
250d6ab to
de5bf21
Compare
extensions/functions_datetime.yaml
Outdated
| - name: precision | ||
| value: i8 | ||
| return: |- | ||
| precision = integer_parameter(precision) | ||
| precision_timestamp<precision> |
There was a problem hiding this comment.
introducing a new precision argument here, to make the return type precision predictable
There was a problem hiding this comment.
I have no idea what the level of support is for interger_parameter actually is in the downstream libraries. It doesn't seem very well specified, considering that the only line about it is:
integer_parameter(string name) => integer
I propose we tackle this in a separate PR where we can have dedicated discussion about it.
There was a problem hiding this comment.
btw I had already created an issue for improving the documentation of these return type functions and had asked @jacques-n if he wants to contribute since he originally introduced these functions (unfortunately did not get any reaction from him) #959
| - name: precision | ||
| value: i8 | ||
| return: |- | ||
| precision = integer_parameter(precision) | ||
| precision_timestamp_tz<precision> |
There was a problem hiding this comment.
introducing a new precision argument here, to make the return type precision predictable
| - name: precision | ||
| value: i8 | ||
| return: |- | ||
| precision = integer_parameter(precision) | ||
| precision_time<precision> |
There was a problem hiding this comment.
introducing a new precision argument here, to make the return type precision predictable
| - name: precision | ||
| value: i8 | ||
| return: |- | ||
| precision = integer_parameter(precision) | ||
| precision_timestamp_tz<precision> |
There was a problem hiding this comment.
introducing a new precision argument here, to make the return type precision predictable
| - name: precision | ||
| value: i8 | ||
| return: |- | ||
| precision = integer_parameter(precision) | ||
| precision_timestamp_tz<precision> |
There was a problem hiding this comment.
introducing a new precision argument here, to make the return type precision predictable
bc6f5bc to
a371761
Compare
| string=re.compile(r'"(?:[^"\\]|\\.)*"'), | ||
| number=re.compile(r"[0-9]+"), | ||
| symbol=re.compile(r"[=;{}\[\]]"), | ||
| symbol=re.compile(r"[=;,{}\[\]]"), |
There was a problem hiding this comment.
since we now have the reserved 14, 17, 27; statements in the proto files we need to add comma , to the symbol regex
There was a problem hiding this comment.
Overall, okay with the change but I want us to discuss and document deprecation and release policy... for instance, @vbarua proposed to hold off a breaking change and make an explicit release with the breaking change (#971 (comment)) -- which to me seems a good idea.
Will create a discussion item for this.
a371761 to
4dc6368
Compare
|
I haven't investigated myself yet but has this migration been managed within the reasonably maintained implementation libraries? (For us, this means |
Here the problem is that there is no clean migration path right now since some datetime function variants simply do not exist for the new types in the current default extensions. We would need to at least introduce the new function variants. Even then I would argue the time and timestamp types have been deprecated for a long time and it should be ok to force a migration to the new types by removing the old types. |
|
My previous comment still holds so I'm okay to do this breaking change. But since we have deprecation mechanism, that is another option and delay this removal a few releases (i.e., make this PR as official deprecation then remove). |
I guess you mean what you introduced in #1014 with deprecation mechanism which is for extensions. This PR mainly removes the data types from the protobuf files which have been deprecated for a while now and as consequence of that removes / updates functions using the data types. The protobuf deprecation already led to deprecation warnings in the code generated from the protobufs (at least in Java) ever since they were marked deprecated. Marking the extensions deprecated does not lead to any warnings right now and is mainly informational. Marking the extensions using long deprecated types as deprecated first would satisfy the new process but feels a bit redundant for such a longstanding deprecation especially since there are no warning mechanisms in place anyway. |
|
Yeah I'm +1 with @nielspardon about just doing this breaking change. I just need to validate that Datafusion consumer actually accepts the new value. I will do that right now! |
|
Seems like we are good on the DF side of things: Type Handling
Literal Handling
|
benbellick
left a comment
There was a problem hiding this comment.
Mostly looks good to me. Only a few minor things I flagged in comments. Thanks!
| "num_aggregate_functions": 29, | ||
| "num_scalar_functions": 170, | ||
| "num_window_functions": 11, | ||
| "num_function_overloads": 533 |
There was a problem hiding this comment.
Why did this one go up so much?
There was a problem hiding this comment.
I've been asking Claude for an explanation and it gave a very detailed one:
When the deprecated time, timestamp, and timestamp_tz types were removed, they were replaced with parameterized types like precision_time
and precision_timestamp
.
The massive increase in overloads (533 → 4,482) is due to parametric type expansion. Previously, functions like extract(HOUR, x: timestamp) had a single overload. Now with precision_timestamp
, the type system generates separate overloads for each supported precision value (microsecond, millisecond, etc.), causing a combinatorial explosion in the number of overload signatures.
For example:
Old: 1 overload for extract(HOUR, x: timestamp)
New: Multiple overloads for extract(HOUR, x: precision_timestamp), extract(HOUR, x: precision_timestamp), etc.
This pattern repeated across many datetime functions explains the ~8.4× increase. The later jump from 4,482 → 8,466 in commit 63f3fd5 likely comes from additional function additions or test coverage adjustments.
There was a problem hiding this comment.
Hmm... if that is the case, what do you think about only deleting the old impls in this PR, and we can add new ones if we need in a subsequent one?
There was a problem hiding this comment.
I'm only adding new impls when there would be no equivalent impls after removing the old ones. This PR retain parity in terms of functionality with before the change aka if there was an impl with time, timestamp, timestamp_tz before and there wouldn't be one after removing all impls using those then I'm adding the equivalent precision impl. If we do these in separate PRs then we risk having a Substrait release that has less functionality than before.
There was a problem hiding this comment.
I have no idea what's going on here. The explanation that Claude Code gave does not match the current code on main. Currently, num_function_overloads is essentially the total number of impls across all extension YAMLs. I rebased this PR branch on latest main and now num_function_overloads decreases now from 533 to 498 which sounds more reasonable than the previous 8,466.
extensions/functions_datetime.yaml
Outdated
| - name: precision | ||
| value: i8 | ||
| return: |- | ||
| precision = integer_parameter(precision) | ||
| precision_timestamp<precision> |
There was a problem hiding this comment.
I have no idea what the level of support is for interger_parameter actually is in the downstream libraries. It doesn't seem very well specified, considering that the only line about it is:
integer_parameter(string name) => integer
I propose we tackle this in a separate PR where we can have dedicated discussion about it.
4dc6368 to
0eecb71
Compare
BREAKING CHANGE Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
63f3fd5 to
220f284
Compare
BREAKING CHANGE: removes the deprecated
time,timestampandtimestamp_tztypes from:closes #980
This change is