Skip to content

feat!: remove deprecated time, timestamp and timestamp_tz types#994

Open
nielspardon wants to merge 3 commits intosubstrait-io:mainfrom
nielspardon:par-rm-depr-datetime
Open

feat!: remove deprecated time, timestamp and timestamp_tz types#994
nielspardon wants to merge 3 commits intosubstrait-io:mainfrom
nielspardon:par-rm-depr-datetime

Conversation

@nielspardon
Copy link
Copy Markdown
Member

@nielspardon nielspardon commented Mar 5, 2026

BREAKING CHANGE: removes the deprecated time, timestamp and timestamp_tz types from:

  • proto files
  • dialect schema
  • extension yamls
  • ANTLR grammar
  • test cases
  • coverage python code
  • documentation

closes #980


This change is Reviewable

@nielspardon nielspardon force-pushed the par-rm-depr-datetime branch 4 times, most recently from 250d6ab to de5bf21 Compare March 5, 2026 15:27
Comment on lines +205 to +209
- name: precision
value: i8
return: |-
precision = integer_parameter(precision)
precision_timestamp<precision>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing a new precision argument here, to make the return type precision predictable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what the level of support is for interger_parameter actually is in the downstream libraries. It doesn't seem very well specified, considering that the only line about it is:

  • integer_parameter(string name) => integer

I propose we tackle this in a separate PR where we can have dedicated discussion about it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I had already created an issue for improving the documentation of these return type functions and had asked @jacques-n if he wants to contribute since he originally introduced these functions (unfortunately did not get any reaction from him) #959

Comment on lines +510 to +514
- name: precision
value: i8
return: |-
precision = integer_parameter(precision)
precision_timestamp_tz<precision>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing a new precision argument here, to make the return type precision predictable

Comment on lines +542 to +546
- name: precision
value: i8
return: |-
precision = integer_parameter(precision)
precision_time<precision>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing a new precision argument here, to make the return type precision predictable

Comment on lines +579 to +583
- name: precision
value: i8
return: |-
precision = integer_parameter(precision)
precision_timestamp_tz<precision>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing a new precision argument here, to make the return type precision predictable

Comment on lines +589 to +593
- name: precision
value: i8
return: |-
precision = integer_parameter(precision)
precision_timestamp_tz<precision>
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

introducing a new precision argument here, to make the return type precision predictable

@nielspardon nielspardon force-pushed the par-rm-depr-datetime branch 2 times, most recently from bc6f5bc to a371761 Compare March 5, 2026 16:04
string=re.compile(r'"(?:[^"\\]|\\.)*"'),
number=re.compile(r"[0-9]+"),
symbol=re.compile(r"[=;{}\[\]]"),
symbol=re.compile(r"[=;,{}\[\]]"),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we now have the reserved 14, 17, 27; statements in the proto files we need to add comma , to the symbol regex

Copy link
Copy Markdown
Contributor

@yongchul yongchul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, okay with the change but I want us to discuss and document deprecation and release policy... for instance, @vbarua proposed to hold off a breaking change and make an explicit release with the breaking change (#971 (comment)) -- which to me seems a good idea.

Will create a discussion item for this.

@nielspardon nielspardon force-pushed the par-rm-depr-datetime branch from a371761 to 4dc6368 Compare March 9, 2026 15:32
@benbellick
Copy link
Copy Markdown
Member

I haven't investigated myself yet but has this migration been managed within the reasonably maintained implementation libraries? (For us, this means substrait-{go, java, py, rs})

@nielspardon
Copy link
Copy Markdown
Member Author

I haven't investigated myself yet but has this migration been managed within the reasonably maintained implementation libraries? (For us, this means substrait-{go, java, py, rs})

Here the problem is that there is no clean migration path right now since some datetime function variants simply do not exist for the new types in the current default extensions. We would need to at least introduce the new function variants.

Even then I would argue the time and timestamp types have been deprecated for a long time and it should be ok to force a migration to the new types by removing the old types.

@yongchul
Copy link
Copy Markdown
Contributor

My previous comment still holds so I'm okay to do this breaking change. But since we have deprecation mechanism, that is another option and delay this removal a few releases (i.e., make this PR as official deprecation then remove).

@nielspardon
Copy link
Copy Markdown
Member Author

My previous comment still holds so I'm okay to do this breaking change. But since we have deprecation mechanism, that is another option and delay this removal a few releases (i.e., make this PR as official deprecation then remove).

I guess you mean what you introduced in #1014 with deprecation mechanism which is for extensions. This PR mainly removes the data types from the protobuf files which have been deprecated for a while now and as consequence of that removes / updates functions using the data types. The protobuf deprecation already led to deprecation warnings in the code generated from the protobufs (at least in Java) ever since they were marked deprecated. Marking the extensions deprecated does not lead to any warnings right now and is mainly informational. Marking the extensions using long deprecated types as deprecated first would satisfy the new process but feels a bit redundant for such a longstanding deprecation especially since there are no warning mechanisms in place anyway.

@benbellick
Copy link
Copy Markdown
Member

Yeah I'm +1 with @nielspardon about just doing this breaking change. I just need to validate that Datafusion consumer actually accepts the new value. I will do that right now!

@benbellick
Copy link
Copy Markdown
Member

Seems like we are good on the DF side of things:

Type Handling

Literal Handling

Copy link
Copy Markdown
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me. Only a few minor things I flagged in comments. Thanks!

"num_aggregate_functions": 29,
"num_scalar_functions": 170,
"num_window_functions": 11,
"num_function_overloads": 533
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this one go up so much?

Copy link
Copy Markdown
Member Author

@nielspardon nielspardon Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been asking Claude for an explanation and it gave a very detailed one:

When the deprecated time, timestamp, and timestamp_tz types were removed, they were replaced with parameterized types like precision_time

and precision_timestamp

.

The massive increase in overloads (533 → 4,482) is due to parametric type expansion. Previously, functions like extract(HOUR, x: timestamp) had a single overload. Now with precision_timestamp

, the type system generates separate overloads for each supported precision value (microsecond, millisecond, etc.), causing a combinatorial explosion in the number of overload signatures.

For example:

Old: 1 overload for extract(HOUR, x: timestamp)
New: Multiple overloads for extract(HOUR, x: precision_timestamp), extract(HOUR, x: precision_timestamp), etc.
This pattern repeated across many datetime functions explains the ~8.4× increase. The later jump from 4,482 → 8,466 in commit 63f3fd5 likely comes from additional function additions or test coverage adjustments.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... if that is the case, what do you think about only deleting the old impls in this PR, and we can add new ones if we need in a subsequent one?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only adding new impls when there would be no equivalent impls after removing the old ones. This PR retain parity in terms of functionality with before the change aka if there was an impl with time, timestamp, timestamp_tz before and there wouldn't be one after removing all impls using those then I'm adding the equivalent precision impl. If we do these in separate PRs then we risk having a Substrait release that has less functionality than before.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what's going on here. The explanation that Claude Code gave does not match the current code on main. Currently, num_function_overloads is essentially the total number of impls across all extension YAMLs. I rebased this PR branch on latest main and now num_function_overloads decreases now from 533 to 498 which sounds more reasonable than the previous 8,466.

Comment on lines +205 to +209
- name: precision
value: i8
return: |-
precision = integer_parameter(precision)
precision_timestamp<precision>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what the level of support is for interger_parameter actually is in the downstream libraries. It doesn't seem very well specified, considering that the only line about it is:

  • integer_parameter(string name) => integer

I propose we tackle this in a separate PR where we can have dedicated discussion about it.

@nielspardon nielspardon force-pushed the par-rm-depr-datetime branch from 4dc6368 to 0eecb71 Compare March 20, 2026 19:48
BREAKING CHANGE

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
@nielspardon nielspardon force-pushed the par-rm-depr-datetime branch from 63f3fd5 to 220f284 Compare March 23, 2026 12:43
@nielspardon nielspardon added the PMC Ready PRs ready for review by PMCs label Mar 23, 2026
@nielspardon nielspardon self-assigned this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PMC Ready PRs ready for review by PMCs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

drop support for deprecated time types

3 participants