DONT-MERGE GSOC26: Fix flat-construction early-return in NestedFrame.from_flat/from_lists; add regression tests #459
…ists with regression tests
hombit left a comment
PR message looks 100% AI-generated (Claude Code?), which made me change the project description on Slack and introduce the AI policy. Since this PR was submitted before the policy was introduced, we will consider it, but I kindly ask you to make a short, human-scale PR description instead of the AI agent report, which is addressed to you, not to us.
There is a discussion at #413 on how this method should work with default arguments. From my perspective, it should do the opposite of what was implemented here: assume that all columns are supposed to be nested. The current implementation is no different from `NestedFrame(flat_df)`, which seems like a weird interface duplication to me. But don't take my word for it until the discussion at #413 is resolved.

    if nested_columns is None:
        nested_columns = [col for col in df.columns if col not in base_columns]
    if len(nested_columns) == 0:
        return cls(df[base_columns].copy())

    if base_columns is not None:
        return cls(df[base_columns].copy())
    return cls(df.copy())

Fixed #413
Summary
This PR fixes flat-construction behavior in `NestedFrame.from_flat` and `NestedFrame.from_lists` when no nested/list columns are assigned, and adds regression tests to lock in the intended behavior.

This change is submitted as part of GSOC26 (DONT-MERGE as requested).
Problem
Previously:
- `from_flat` always flowed into `join_nested` logic even when no nested columns were assigned.
- `from_lists` raised a `ValueError` when no list columns were assigned.
- The flat early-return used `cls(df.copy())`, which could include unintended columns when `base_columns` was a subset.

This led to inconsistent behavior for flat input cases.
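The `cls(df.copy())` issue can be illustrated with plain pandas, independent of nested-pandas. This is only a sketch: the frame and the column names `a`, `b`, `c` are made up for the example.

```python
import pandas as pd

# Illustrative frame; the column names are assumptions for this example.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
base_columns = ["a", "b"]

# Copying the whole frame keeps every column, including "c".
whole = df.copy()

# Selecting first keeps only the requested base columns.
subset = df[base_columns].copy()

print(list(whole.columns))   # ['a', 'b', 'c']
print(list(subset.columns))  # ['a', 'b']
```

When `base_columns` is a strict subset of the frame's columns, only the second form respects the caller's selection.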
What This PR Changes
1. `NestedFrame.from_flat`

- `base_columns` is now optional (`None` by default).
- If both `base_columns` and `nested_columns` are `None`, treat all columns as base.
- If `base_columns` is `None` and `nested_columns` is provided, infer base as the complement.
- If `len(nested_columns) == 0`, return `cls(df[base_columns].copy())`.

2. `NestedFrame.from_lists`

- Returns a flat `NestedFrame` instead of raising `ValueError` when no list columns are assigned.
- Handles `base_columns is None`.

3. Regression Tests
Added/updated tests in:
`tests/nested_pandas/nestedframe/test_nestedframe.py`

- `test_from_flat_all_base_columns_returns_flat`
- `test_from_flat_defaults_to_all_base_columns`
- Updated `test_from_lists` to validate a flat return instead of an error

Correctness & Safety
- `from_flat` call-sites with nested columns remain unaffected.
- `on` index handling remains intact.
- Consistent with `NestedFrame(df)` construction for flat cases.

Testing
Targeted tests:

    python -m pytest tests/nested_pandas/nestedframe/test_nestedframe.py -k "from_flat or from_lists"

Test results:
Targeted tests: 11/11 passed
Full test_nestedframe.py: 73/74 passed (1 pre-existing failure)
All series tests (test_accessor, test_dtype, test_ext_array, test_packer): 267/267 passed
All doctests in core.py: 19/19 passed
All series-related tests pass.
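For reviewers, here is a minimal sketch of the `from_flat` column-resolution rules listed under "What This PR Changes", written in pure Python over column names only. `resolve_columns` is a hypothetical helper for illustration, not a function in nested-pandas, and the case where both arguments are given is assumed to pass through unchanged.

```python
def resolve_columns(columns, base_columns=None, nested_columns=None):
    """Hypothetical helper: return (base, nested, is_flat) for a list of column names."""
    columns = list(columns)
    if base_columns is None and nested_columns is None:
        # Both omitted: treat every column as a base column.
        base, nested = columns, []
    elif base_columns is None:
        # Only nested given: base is the complement of nested.
        nested = list(nested_columns)
        base = [c for c in columns if c not in nested]
    elif nested_columns is None:
        # Only base given: nested is the complement of base.
        base = list(base_columns)
        nested = [c for c in columns if c not in base]
    else:
        # Both given (assumption): use them as provided.
        base, nested = list(base_columns), list(nested_columns)
    # With no nested columns, the result would be the flat frame
    # restricted to the base columns.
    return base, nested, len(nested) == 0


print(resolve_columns(["a", "b", "c"]))
print(resolve_columns(["a", "b", "c"], nested_columns=["c"]))
print(resolve_columns(["a", "b", "c"], base_columns=["a"]))
```

The third element of the result marks the cases where this PR returns a flat `NestedFrame` instead of going through the nesting path. Whether these defaults are the right ones is still subject to the discussion in #413.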
Please let me know if there are any issues or suggestions