-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add option to disable evaluation of stable expressions in optimizer #19426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Could you elaborate a bit more on the motivating use case? I'm having a bit of trouble understanding it from the original issue. |
Sure yes. I am working on a feature to cache / materialize scan subtrees of queries. I am essentially transforming a query like: select time_bucket('1 minute', ts)
from t
where ts > now() - interval '1 hour';Into a materializable subtree: select time_bucket('1 minute', ts) as __0, ts
from t;And a query to apply on top of that: select __0 as "time_bucket('1 minute', ts)"
from __mv
where ts > now() - interval '1 hour';As part of this I want to be able to pass the SQL text / string through optimizers to generally normalize / optimize the query, but this currently evaluates select time_bucket('1 minute', ts) as __0, ts
from t
where ts > '2025-...';Because I have no way of differentiating a literal date from an evaluated I added this context to the original issue. |
|
Would that cached query then be passed through the optimizer again when executed (with the config reset to default), or would it be directly executed? Because datafusion/datafusion/functions/src/datetime/now.rs Lines 115 to 120 in d8e68a4
datafusion/datafusion/functions/src/datetime/current_time.rs Lines 87 to 94 in d8e68a4
datafusion/datafusion/functions/src/datetime/current_date.rs Lines 87 to 94 in d8e68a4
|
That's a good point, I had not noticed that. Personally I would have them implement invoke (I'm guessing it's not too hard?). But I don't really need that: I pass the non cached / materialized portion of the query through the optimizer again right before it's executed. But more generally: it seems reasonable to me to want to pass a query through the optimizer without having any state (is it just time?) dependent evaluation done. |
This adds a new configuration option `datafusion.optimizer.evaluate_stable_expressions` (default: true) that controls whether stable functions like `now()`, `current_date()`, and `current_time()` are evaluated to literal values during query planning. When set to false, stable functions are preserved in the plan rather than being converted to literals. This is useful for query rewrites that need to preserve stable function calls. Closes apache#19418 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
e1420f4 to
ed2277b
Compare
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'm finding interesting as I look into this, is does Volatility::Stable actually do anything for the 'stable' UDFs in the DataFusion codebase? (not considering UDFs of downstream users)
Because it seems the stable functions we have (now, current_time, current_date) don't necessarily get 'evaluated' via the volatility check, but instead via simplify, which is why in this PR you need to introduce the config check in each simplify implementation which doesn't seem ideal 🤔
I guess what I'm getting at, is maybe we should remove the simplify implementations entirely and rely only on invoke, and assume the invoke will get evaluated during optimization, meaning theres only a single place to check this new config.
(As for having this config itself, I don't have any strong feelings on it; it makes sense in terms of your use case, but curious to see what others might think)
| } | ||
|
|
||
| #[test] | ||
| fn test_evaluate_stable_expressions_disabled() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think this test is particularly useful, can probably just remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed removed
| } | ||
|
|
||
| #[test] | ||
| fn test_evaluate_stable_expressions_enabled_by_default() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Not sure, but I think better place for these test cases should be, datafusion/core/tests/expr_api/simplification.rs
As far as I can tell |
I think we'd have to go all the way to updating |
|
I suppose we can keep the current implementation as is, since we don't have that many stable functions currently; could be worth raising as a separate issue I reckon for further discussion. |
|
I think this setting should also disable volatile expressions. Essentially it should make the optimizer compatible with prepared statements. I'll at least add a test for this. |
Summary
This PR adds a new configuration option
datafusion.optimizer.evaluate_stable_expressions(default:true) that controls whether stable functions likenow(),current_date(), andcurrent_time()are evaluated to literal values during query planning.When set to
false, stable functions are preserved in the plan rather than being converted to literals. This is useful for query rewrites that need to preserve stable function calls.Changes
evaluate_stable_expressionstoOptimizerOptionsConstEvaluator.volatility_ok()to check the configsimplify()methods innow(),current_date(),current_time()to respect the configUsage
Closes #19418
🤖 Generated with Claude Code