@adriangb
Contributor

Summary

This PR adds a new configuration option datafusion.optimizer.evaluate_stable_expressions (default: true) that controls whether stable functions like now(), current_date(), and current_time() are evaluated to literal values during query planning.

When set to false, stable functions are preserved in the plan rather than being converted to literals. This is useful for query rewrites that need to preserve stable function calls.

Changes

  1. Added new config option evaluate_stable_expressions to OptimizerOptions
  2. Modified ConstEvaluator.volatility_ok() to check the config
  3. Updated simplify() methods in now(), current_date(), current_time() to respect the config
  4. Added unit tests and SLT test
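As a rough illustration of change 2, the gate could look like the sketch below. This is a toy model, not the DataFusion source: `Volatility` and `OptimizerOptions` here are simplified stand-ins defined locally, and the free function `volatility_ok` only mirrors the name of the method mentioned above.

```rust
/// Toy stand-in for DataFusion's volatility classification (assumption:
/// only the three variants matter for this illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

/// Hypothetical options struct carrying only the flag added by this PR.
struct OptimizerOptions {
    evaluate_stable_expressions: bool,
}

/// Returns true when an expression of the given volatility may be
/// folded to a literal at plan time.
fn volatility_ok(volatility: Volatility, options: &OptimizerOptions) -> bool {
    match volatility {
        // Same inputs always give the same output: always safe to fold.
        Volatility::Immutable => true,
        // Constant within one query: fold only if the config allows it.
        Volatility::Stable => options.evaluate_stable_expressions,
        // May change on every call: never fold.
        Volatility::Volatile => false,
    }
}
```

With the flag at its default of `true` the stable case behaves as before; setting it to `false` only changes the `Stable` arm.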

Usage

-- Disable stable expression evaluation
SET datafusion.optimizer.evaluate_stable_expressions = false;

-- now() will remain as a function call in the plan
EXPLAIN SELECT now();

Closes #19418

🤖 Generated with Claude Code

@github-actions bot added the logical-expr, optimizer, sqllogictest, common, and functions labels Dec 20, 2025
@adriangb requested a review from Jefffrey December 20, 2025 14:36
@github-actions bot added the documentation label Dec 20, 2025
@Jefffrey
Contributor

Could you elaborate a bit more on the motivating use case? I'm having a bit of trouble understanding it from the original issue.

@adriangb
Contributor Author

adriangb commented Dec 20, 2025

> Could you elaborate a bit more on the motivating use case? I'm having a bit of trouble understanding it from the original issue.

Sure yes. I am working on a feature to cache / materialize scan subtrees of queries. I am essentially transforming a query like:

select time_bucket('1 minute', ts)
from t
where ts > now() - interval '1 hour';

Into a materializable subtree:

select time_bucket('1 minute', ts) as __0, ts
from t;

And a query to apply on top of that:

select __0 as "time_bucket('1 minute', ts)"
from __mv
where ts > now() - interval '1 hour';

As part of this I want to be able to pass the SQL text / string through optimizers to generally normalize / optimize the query, but this currently evaluates now(), which would result in materializing:

select time_bucket('1 minute', ts) as __0, ts
from t
where ts > '2025-...';

Because I have no way of differentiating a literal date from an evaluated now().
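To make the problem concrete, here is a toy model of the folding step. It is only a sketch: the `Expr` enum, the `simplify` function, and the integer timestamps are hypothetical simplifications, not DataFusion's types.

```rust
/// Toy expression tree; timestamps and intervals are plain seconds.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Now,
    Sub(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

/// Folds constants bottom-up. `now()` becomes a literal only when
/// `evaluate_stable` is true (the hypothetical analogue of the new config).
fn simplify(expr: Expr, query_start: i64, evaluate_stable: bool) -> Expr {
    match expr {
        Expr::Now if evaluate_stable => Expr::Literal(query_start),
        Expr::Sub(l, r) => {
            let l = simplify(*l, query_start, evaluate_stable);
            let r = simplify(*r, query_start, evaluate_stable);
            match (l, r) {
                // Both sides are known: fold the subtraction.
                (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a - b),
                (l, r) => Expr::Sub(Box::new(l), Box::new(r)),
            }
        }
        Expr::Gt(l, r) => Expr::Gt(
            Box::new(simplify(*l, query_start, evaluate_stable)),
            Box::new(simplify(*r, query_start, evaluate_stable)),
        ),
        // Columns, literals, and an unevaluated `now()` pass through.
        other => other,
    }
}

/// Builds the filter `ts > now() - 3600`.
fn example_filter() -> Expr {
    Expr::Gt(
        Box::new(Expr::Column("ts".to_string())),
        Box::new(Expr::Sub(Box::new(Expr::Now), Box::new(Expr::Literal(3600)))),
    )
}
```

With `evaluate_stable` set to `true`, the right-hand side collapses into a single `Literal`, and nothing in the result records that it came from `now()`. With `false`, the filter comes back unchanged, so a later rewrite can still recognize the `now()` call.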

I added this context to the original issue.

@Jefffrey
Contributor

Would that cached query then be passed through the optimizer again when executed (with the config reset to default), or would it be directly executed?

I ask because I believe now(), current_date() and current_time() don't actually implement invoke, so if they aren't simplified they can't execute physically:

fn invoke_with_args(
    &self,
    _args: datafusion_expr::ScalarFunctionArgs,
) -> Result<ColumnarValue> {
    internal_err!("invoke should not be called on a simplified now() function")
}

fn invoke_with_args(
    &self,
    _args: datafusion_expr::ScalarFunctionArgs,
) -> Result<ColumnarValue> {
    internal_err!(
        "invoke should not be called on a simplified current_time() function"
    )
}

fn invoke_with_args(
    &self,
    _args: datafusion_expr::ScalarFunctionArgs,
) -> Result<ColumnarValue> {
    internal_err!(
        "invoke should not be called on a simplified current_date() function"
    )
}

@adriangb
Contributor Author

adriangb commented Dec 20, 2025

> Would that cached query then be passed through the optimizer again when executed (with the config reset to default), or would it be directly executed?
>
> Because now(), current_date() and current_time() don't actually implement invoke, so if they aren't simplified they can't execute physically I believe:

That's a good point; I had not noticed that. Personally I would have them implement invoke (I'm guessing it's not too hard?). But I don't really need that: I pass the non-cached / non-materialized portion of the query through the optimizer again right before it's executed.

But more generally: it seems reasonable to me to want to pass a query through the optimizer without any state-dependent (is it just time?) evaluation being done.

adriangb and others added 2 commits December 20, 2025 10:21
@adriangb force-pushed the disable-stable-eval branch from e1420f4 to ed2277b December 20, 2025 16:31
Contributor

@Jefffrey left a comment

One thing I find interesting as I look into this: does Volatility::Stable actually do anything for the 'stable' UDFs in the DataFusion codebase (not considering UDFs of downstream users)?

It seems the stable functions we have (now, current_time, current_date) don't necessarily get 'evaluated' via the volatility check, but instead via simplify. That is why this PR needs to introduce the config check in each simplify implementation, which doesn't seem ideal 🤔

I guess what I'm getting at is that maybe we should remove the simplify implementations entirely and rely only on invoke, assuming that invoke will get evaluated during optimization, so that there's only a single place to check this new config.

(As for having this config itself, I don't have any strong feelings on it; it makes sense in terms of your use case, but curious to see what others might think)

}

#[test]
fn test_evaluate_stable_expressions_disabled() -> Result<()> {
Contributor

nit: I don't think this test is particularly useful, can probably just remove it

Contributor Author

Agreed, removed.

}

#[test]
fn test_evaluate_stable_expressions_enabled_by_default() -> Result<()> {
Contributor

nit: Not sure, but I think a better place for these test cases would be datafusion/core/tests/expr_api/simplification.rs

@adriangb
Contributor Author

> I guess what I'm getting at, is maybe we should remove the simplify implementations entirely and rely only on invoke, and assume the invoke will get evaluated during optimization, meaning there's only a single place to check this new config.

As far as I can tell invoke_with_args doesn't get the current execution time. Any thoughts as to how we'd pipe that in?

@adriangb
Contributor Author

> I guess what I'm getting at, is maybe we should remove the simplify implementations entirely and rely only on invoke, and assume the invoke will get evaluated during optimization, meaning there's only a single place to check this new config.
>
> As far as I can tell invoke_with_args doesn't get the current execution time. Any thoughts as to how we'd pipe that in?

I think we'd have to go all the way to changing PhysicalExpr::evaluate from taking a RecordBatch to taking an EvaluateArgs, which would then contain this runtime info, so that ScalarFunctionExpr can pipe it into invoke_with_args.
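A minimal sketch of that shape, to show what I mean. Everything here is hypothetical: `EvaluateArgs`, the `ScalarFn` trait, and the `i64` slice standing in for a RecordBatch are illustrations only and do not exist in DataFusion in this form.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical bundle handed to evaluation instead of a bare batch.
struct EvaluateArgs<'a> {
    /// Stand-in for a RecordBatch: one column of integers.
    batch: &'a [i64],
    /// Runtime info captured once, when the query starts.
    query_start: SystemTime,
}

/// Hypothetical function interface that receives the runtime info.
trait ScalarFn {
    fn invoke(&self, args: &EvaluateArgs<'_>) -> Vec<i64>;
}

/// A `now()` that can run physically because the start time is piped in.
struct Now;

impl ScalarFn for Now {
    fn invoke(&self, args: &EvaluateArgs<'_>) -> Vec<i64> {
        // Seconds since the Unix epoch; falls back to 0 for pre-epoch times.
        let secs = args
            .query_start
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs() as i64)
            .unwrap_or(0);
        // One value per input row, identical across the whole query.
        vec![secs; args.batch.len()]
    }
}

/// Helper to build a fixed start time for examples and tests.
fn at_epoch_secs(secs: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_secs(secs)
}
```

The point of the sketch is that the value is still stable within a query, since every batch sees the same `query_start`, but it no longer has to be baked into the plan as a literal.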

@Jefffrey
Contributor

I suppose we can keep the current implementation as is, since we don't have that many stable functions currently; could be worth raising as a separate issue I reckon for further discussion.

@adriangb
Contributor Author

#19470

@adriangb
Contributor Author

adriangb commented Dec 24, 2025

I think this setting should also disable volatile expressions. Essentially it should make the optimizer compatible with prepared statements. I'll at least add a test for this.

Development

Successfully merging this pull request may close these issues.

option to disable evaluation of stable expressions in optimizer rules

3 participants