Skip to content

SQL Unparser loses subquery structure when SubqueryAlias directly wraps an Aggregate (PR to follow) #21098

@yonatan-sevenai

Description

@yonatan-sevenai

Describe the bug

When the SQL unparser encounters a SubqueryAlias node whose direct child is an Aggregate (without an intermediate Projection), it flattens the subquery into a simple table alias, losing the aggregate entirely.

Root cause

The SubqueryAlias handler in select_to_sql_recursively (datafusion/sql/src/unparser/plan.rs) calls
subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only
handles TableScan/SubqueryAlias/Projection). When both return nothing useful for an Aggregate child, the code recurses directly
into the Aggregate, merging its GROUP BY into the outer SELECT instead of emitting a derived subquery.

Additional context

This also affects other plan types that build their own SELECT clauses (Window, Sort, Limit, Union) when directly wrapped by
SubqueryAlias.

To Reproduce

Manually construct a logical plan with a join where the right side is SubqueryAlias > Aggregate:

SELECT j1.j1_string FROM j1 JOIN (SELECT max(j2_id) AS max_id FROM j2) AS b ON j1.j1_id = b.max_id

This bug doesn't manifest when the SQL is parsed from a string, as the parser inserts a Projection between the SubqueryAlias and Aggregate, so the roundtrip works. The bug only manifests with manually constructed plans where SubqueryAlias directly wraps Aggregate.

let right_scan = table_scan(Some("j2"), &j2_schema, None)?.build()?;
let right_agg = LogicalPlanBuilder::from(right_scan)
    .aggregate(vec![] as Vec<Expr>, vec![max(col("j2.j2_id")).alias("max_id")])?
    .build()?;
let right_subquery = subquery_alias(right_agg, "b")?;

let left_scan = table_scan(Some("j1"), &j1_schema, None)?.build()?;
let plan = LogicalPlanBuilder::from(left_scan)
    .join(right_subquery, JoinType::Inner,
        (vec![Column::from_qualified_name("j1.j1_id")],
         vec![Column::from_qualified_name("b.max_id")]),
        None)?
    .project(vec![col("j1.j1_string")])?
    .build()?;

let sql = Unparser::default().plan_to_sql(&plan)?.to_string();

Current behavior

SELECT j1.j1_string FROM j1 INNER JOIN j2 AS b ON j1.j1_id = b.max_id

The aggregate subquery is completely dropped — (SELECT max(j2_id) AS max_id FROM j2) AS b becomes just j2 AS b.

Expected behavior

SELECT j1.j1_string FROM j1 INNER JOIN (SELECT max(j2.j2_id) AS max_id FROM j2) AS b ON j1.j1_id = b.max_id

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions