DONT-MERGE GSOC26: Fix flat-construction early-return in NestedFrame.from_flat/from_lists; add regression tests #459

Draft
Akshat1000Sharma wants to merge 2 commits into lincc-frameworks:main from Akshat1000Sharma:gsoc26-fix-from-flat-early-return



@Akshat1000Sharma

Fixed #413

Summary

This PR fixes flat-construction behavior in NestedFrame.from_flat and NestedFrame.from_lists when no nested/list columns are assigned, and adds regression tests to lock in the intended behavior.

This change is submitted as part of GSOC26 (DONT-MERGE as requested).


Problem

Previously:

  • from_flat always flowed into join_nested logic even when no nested columns were assigned.
  • from_lists raised a ValueError when no list columns were assigned.
  • An early-return path incorrectly returned cls(df.copy()), which could include unintended columns when base_columns was a subset.

This led to inconsistent behavior for flat input cases.
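
The third point can be illustrated with a plain-Python stand-in, where a dict of columns plays the role of the DataFrame. The helper names old_early_return and fixed_early_return are hypothetical and are not part of nested-pandas; they only model which columns each return path would keep.

```python
def old_early_return(table, base_columns):
    # Models cls(df.copy()): every column is kept and base_columns is ignored.
    return {name: list(values) for name, values in table.items()}


def fixed_early_return(table, base_columns):
    # Models cls(df[base_columns].copy()): only the requested columns are kept.
    return {name: list(table[name]) for name in base_columns}


# Hypothetical input: three columns, of which only two are requested as base.
table = {"id": [1, 2], "ra": [10.0, 20.0], "note": ["x", "y"]}
base_columns = ["id", "ra"]

old_cols = list(old_early_return(table, base_columns))
new_cols = list(fixed_early_return(table, base_columns))

print(old_cols)  # ['id', 'ra', 'note'] (the unrequested "note" column leaks through)
print(new_cols)  # ['id', 'ra']
```

With a subset passed as base_columns, the old path carries the extra column into the result, while the fixed path matches the selection the caller asked for.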


What This PR Changes

1. NestedFrame.from_flat

  • base_columns is now optional (None by default).
  • Resolution logic:
    • If both base_columns and nested_columns are None, treat all columns as base.
    • If base_columns is None and nested_columns is provided, infer base as the complement.
  • Added early-return when len(nested_columns) == 0:
    • Returns cls(df[base_columns].copy())
    • Ensures column selection matches normal path behavior.
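
The resolution rules above can be sketched on column names alone. This is a minimal model under stated assumptions, not the code in the PR: resolve_from_flat_columns is a hypothetical helper, and the branch for "base_columns given, nested_columns missing" follows the complement rule visible in the diff context quoted in the review.

```python
def resolve_from_flat_columns(all_columns, base_columns=None, nested_columns=None):
    """Return (base, nested, is_flat) computed from column names only."""
    all_columns = list(all_columns)
    if base_columns is None and nested_columns is None:
        # Neither argument given: treat every column as a base column.
        base_columns = all_columns
        nested_columns = []
    elif base_columns is None:
        # Only nested_columns given: base is the complement.
        base_columns = [c for c in all_columns if c not in nested_columns]
    elif nested_columns is None:
        # Only base_columns given: nested is the complement.
        nested_columns = [c for c in all_columns if c not in base_columns]
    # An empty nested list is the condition for the flat early return.
    is_flat = len(nested_columns) == 0
    return list(base_columns), list(nested_columns), is_flat


cols = ["id", "t", "flux"]  # hypothetical column names
print(resolve_from_flat_columns(cols))
print(resolve_from_flat_columns(cols, nested_columns=["t", "flux"]))
print(resolve_from_flat_columns(cols, base_columns=["id"]))
```

Only the first call takes the flat path here; the other two leave nested columns to pack, so they would continue into the normal join logic.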

2. NestedFrame.from_lists

  • If no list columns are assigned, return a flat NestedFrame instead of raising ValueError.
  • Added a guard so the flat return path also works when base_columns is None.
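
As a rough sketch of that guard, again on column names only: flat_result_columns is a hypothetical helper that reports which columns the flat result would contain, or None when list columns exist and the normal packing path applies. It assumes the list columns have already been resolved before the guard runs.

```python
def flat_result_columns(all_columns, base_columns=None, list_columns=()):
    """Columns of the flat result, or None if the flat path does not apply."""
    if len(list_columns) > 0:
        # List columns were assigned: the normal packing path handles this.
        return None
    if base_columns is not None:
        # Models cls(df[base_columns].copy()).
        return list(base_columns)
    # base_columns is None: models cls(df.copy()), keeping every column.
    return list(all_columns)


cols = ["id", "ra", "dec"]  # hypothetical column names
print(flat_result_columns(cols))                        # ['id', 'ra', 'dec']
print(flat_result_columns(cols, base_columns=["id"]))   # ['id']
print(flat_result_columns(cols, list_columns=["dec"]))  # None
```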

3. Regression Tests

Added/updated tests in:
tests/nested_pandas/nestedframe/test_nestedframe.py

  • test_from_flat_all_base_columns_returns_flat
  • test_from_flat_defaults_to_all_base_columns
  • Updated test_from_lists to validate flat return instead of error

Correctness & Safety

  • All existing from_flat call-sites with nested columns remain unaffected.
  • Early-return now mirrors the standard column-selection logic.
  • Handling of the on argument (the index used for joining) remains intact.
  • Behavior is consistent with plain NestedFrame(df) construction for flat cases.

Testing

Targeted tests:

python -m pytest tests/nested_pandas/nestedframe/test_nestedframe.py -k "from_flat or from_lists"

Test results:

  • Targeted tests: 11/11 passed
  • Full test_nestedframe.py: 73/74 passed (1 pre-existing failure)
  • All series tests (test_accessor, test_dtype, test_ext_array, test_packer): 267/267 passed
  • All doctests in core.py: 19/19 passed


Please let me know if there are any issues or suggestions.

@hombit (Collaborator) left a comment

PR message looks 100% AI-generated (Claude Code?), which made me change the project description on Slack and introduce the AI policy. Since this PR was submitted before the policy was introduced, we will consider it, but I kindly ask you to make a short, human-scale PR description instead of the AI agent report, which is addressed to you, not to us.

There is a discussion at #413 on how this method should work with default arguments. From my perspective, it should do the opposite of what was implemented here: assume that all columns are supposed to be nested. The current implementation is not different from NestedFrame(flat_df), which seems to be a weird interface duplication for me. But don't take my words for it until the discussion at #413 is resolved.

if nested_columns is None:
    nested_columns = [col for col in df.columns if col not in base_columns]
if len(nested_columns) == 0:
    return cls(df[base_columns].copy())

Why copy() here?

Comment on lines +693 to +695
if base_columns is not None:
    return cls(df[base_columns].copy())
return cls(df.copy())

And here

@delucchi-cmu added the "GSOC26: WIP" label (In-progress PRs for Google Summer of Code 2026 applicants) on Mar 3, 2026
@hombit marked this pull request as draft on March 4, 2026 15:38


Development

Successfully merging this pull request may close these issues.

Failed to create flat dataframe

3 participants