-
Notifications
You must be signed in to change notification settings - Fork 2k
SQL Unparser loses subquery structure when SubqueryAlias directly wraps an Aggregate (PR to follow) #21098
Description
Describe the bug
When the SQL unparser encounters a SubqueryAlias node whose direct child is an Aggregate (without an intermediate Projection), it flattens the subquery into a simple table alias, losing the aggregate entirely.
Root cause
The SubqueryAlias handler in select_to_sql_recursively (datafusion/sql/src/unparser/plan.rs) calls
subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only
handles TableScan/SubqueryAlias/Projection). When both return nothing useful for an Aggregate child, the code recurses directly
into the Aggregate, merging its GROUP BY into the outer SELECT instead of emitting a derived subquery.
Additional context
This also affects other plan types that build their own SELECT clauses (Window, Sort, Limit, Union) when directly wrapped by
SubqueryAlias.
To Reproduce
Manually construct a logical plan with a join where the right side is SubqueryAlias > Aggregate:
SELECT j1.j1_string FROM j1 JOIN (SELECT max(j2_id) AS max_id FROM j2) AS b ON j1.j1_id = b.max_idThis bug doesn't manifest when the SQL is parsed from a string, as the parser inserts a Projection between the SubqueryAlias and Aggregate, so the roundtrip works. The bug only manifests with manually constructed plans where SubqueryAlias directly wraps Aggregate.
let right_scan = table_scan(Some("j2"), &j2_schema, None)?.build()?;
let right_agg = LogicalPlanBuilder::from(right_scan)
.aggregate(vec![] as Vec<Expr>, vec![max(col("j2.j2_id")).alias("max_id")])?
.build()?;
let right_subquery = subquery_alias(right_agg, "b")?;
let left_scan = table_scan(Some("j1"), &j1_schema, None)?.build()?;
let plan = LogicalPlanBuilder::from(left_scan)
.join(right_subquery, JoinType::Inner,
(vec![Column::from_qualified_name("j1.j1_id")],
vec![Column::from_qualified_name("b.max_id")]),
None)?
.project(vec![col("j1.j1_string")])?
.build()?;
let sql = Unparser::default().plan_to_sql(&plan)?.to_string();Current behavior
SELECT j1.j1_string FROM j1 INNER JOIN j2 AS b ON j1.j1_id = b.max_idThe aggregate subquery is completely dropped — (SELECT max(j2_id) AS max_id FROM j2) AS b becomes just j2 AS b.
Expected behavior
SELECT j1.j1_string FROM j1 INNER JOIN (SELECT max(j2.j2_id) AS max_id FROM j2) AS b ON j1.j1_id = b.max_idAdditional context
No response