Fix unionByName to properly handle missing columns from both DataFrames #243

mariotaddeucci · 2026-01-02T23:27:00Z

When allowMissingColumns=True, the method now correctly handles missing columns from both the left and right DataFrames by:

- Adding columns that exist only in the right DataFrame to the left side as NULL
- Ensuring all columns from the left DataFrame are present in the right (as NULL where absent)
- Aligning the column order to match Spark's behavior

This ensures the union result contains all columns from both DataFrames, with NULL values where columns are missing, matching PySpark behavior.
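A minimal sketch of the semantics described above, modelled on plain Python lists of dicts rather than DuckDB relations. The function name `union_by_name_rows` is hypothetical and not part of the duckdb API. It assumes exact, case-sensitive column-name matching, and the `ValueError` on a mismatch is a stand-in for the analysis error Spark raises.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def union_by_name_rows(
    left_cols: Sequence[str],
    left_rows: List[Row],
    right_cols: Sequence[str],
    right_rows: List[Row],
    allow_missing_columns: bool = False,
) -> Tuple[List[str], List[Row]]:
    """Hypothetical model of unionByName on plain rows (illustrative only)."""
    left_set, right_set = set(left_cols), set(right_cols)
    if not allow_missing_columns and left_set != right_set:
        # Stand-in for Spark's analysis error when the column sets differ.
        raise ValueError(
            f"column mismatch: left-only {sorted(left_set - right_set)}, "
            f"right-only {sorted(right_set - left_set)}"
        )
    # Left schema first, then right-only columns in the right side's order.
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_set]
    # A column missing on either side is filled with None (SQL NULL).
    result = [{c: row.get(c) for c in out_cols} for row in left_rows]
    result += [{c: row.get(c) for c in out_cols} for row in right_rows]
    return out_cols, result


cols, rows = union_by_name_rows(
    ["id", "name"], [{"id": 1, "name": "a"}],
    ["id", "score"], [{"id": 2, "score": 9.5}],
    allow_missing_columns=True,
)
print(cols)  # ['id', 'name', 'score']
print(rows)  # [{'id': 1, 'name': 'a', 'score': None}, {'id': 2, 'name': None, 'score': 9.5}]
```

Each side contributes NULLs for the columns it lacks, which is exactly the bidirectional case: `name` is missing on the right and `score` is missing on the left.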


Copilot

Pull request overview

This PR fixes the unionByName method to properly handle missing columns from both DataFrames when allowMissingColumns=True. Previously, the method only handled missing columns from the right DataFrame, but not from the left one.

Key changes:

- Updated the logic to add NULL columns for columns missing from either DataFrame
- Column order now matches Spark's behavior by prioritizing the left DataFrame's schema
- Added a test case to verify that the reversed scenario works correctly
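One way to picture the bidirectional fix is as two aligned projections joined by `UNION ALL`: each side selects the output columns in the same order and substitutes `NULL` for any column it lacks. The sketch below only builds the SQL text; `union_by_name_sql` and its helpers are hypothetical names, not the code in `dataframe.py`, and it assumes the untyped `NULL` literals are coerced to the other branch's column type by the union.

```python
from typing import List, Sequence


def quote_ident(name: str) -> str:
    # Double-quote an identifier, escaping embedded quotes (standard SQL).
    return '"' + name.replace('"', '""') + '"'


def aligned_select_list(own_cols: Sequence[str], out_cols: Sequence[str]) -> str:
    # Emit every output column in order; columns this side lacks become NULL.
    own = set(own_cols)
    parts: List[str] = []
    for c in out_cols:
        q = quote_ident(c)
        parts.append(q if c in own else f"NULL AS {q}")
    return ", ".join(parts)


def union_by_name_sql(
    left_rel: str, left_cols: Sequence[str], right_rel: str, right_cols: Sequence[str]
) -> str:
    # Hypothetical helper: relation names are assumed to be trusted identifiers.
    left_set = set(left_cols)
    # Left schema first, then right-only columns (Spark-style ordering).
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_set]
    return (
        f"SELECT {aligned_select_list(left_cols, out_cols)} FROM {left_rel} "
        f"UNION ALL "
        f"SELECT {aligned_select_list(right_cols, out_cols)} FROM {right_rel}"
    )


# Reversed scenario: the DataFrame with fewer columns is on the left.
print(union_by_name_sql("t1", ["id"], "t2", ["id", "name"]))
# SELECT "id", NULL AS "name" FROM t1 UNION ALL SELECT "id", "name" FROM t2
```

Swapping the arguments moves the `NULL` placeholder to the other branch, while the output column order is still driven by whichever side is on the left.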

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
duckdb/experimental/spark/sql/dataframe.py	Rewrote the `unionByName` implementation to handle missing columns bidirectionally and align columns properly before performing the union
tests/fast/spark/test_spark_union_by_name.py	Added test case `test_union_by_name_allow_missing_cols_rev` to verify the fix works when the DataFrame with fewer columns is on the left side


duckdb/experimental/spark/sql/dataframe.py

Co-authored-by: Copilot <[email protected]>

evertlammerts · 2026-01-06T17:32:43Z

Can you fix the linting and formatting errors please? See https://duckdb.org/docs/stable/dev/building/python#3-enable-pre-commit-hooks for guidance.

evertlammerts

formatting

Copilot AI review requested due to automatic review settings January 2, 2026 23:27

Copilot started reviewing on behalf of mariotaddeucci January 2, 2026 23:27

Copilot AI reviewed Jan 2, 2026


duckdb/experimental/spark/sql/dataframe.py (outdated, resolved)

duckdb/experimental/spark/sql/dataframe.py (outdated, resolved)

mariotaddeucci and others added 3 commits January 2, 2026 20:33

Update duckdb/experimental/spark/sql/dataframe.py

b8767db

Co-authored-by: Copilot <[email protected]>

Update duckdb/experimental/spark/sql/dataframe.py

cabba0d

Co-authored-by: Copilot <[email protected]>

Merge branch 'main' into fix-unionbyname-missing-columns

6794f0a

evertlammerts requested changes Jan 6, 2026


mariotaddeucci and others added 2 commits January 6, 2026 21:38

fix formatting

8956d0e

Merge branch 'main' into fix-unionbyname-missing-columns

654b5b5

mariotaddeucci requested a review from evertlammerts January 7, 2026 01:31


2 participants
