DEPR: warn on silent dtype changes during setitem-with-expansion#64678
Draft
jbrockmendel wants to merge 4 commits into
Draft
DEPR: warn on silent dtype changes during setitem-with-expansion#64678jbrockmendel wants to merge 4 commits into
jbrockmendel wants to merge 4 commits into
Conversation
18c931e to
9e4ed27
Compare
Member
Author
|
On the dev call @jorisvandenbossche had a preference for using the concat behavior rather than the reindex behavior, i.e. always-cast rather than PDEP6-only-casting. |
9e4ed27 to
c77b1cd
Compare
Deprecate cases where `ser.loc[new_key] = value` or `df.loc[new_key] = value` silently changes the dtype of the Series/DataFrame. In a future version, the existing dtype will be retained instead of being silently changed. The PDEP-6 exception is preserved: int/uint -> float is still allowed when the value introduces NaN. The warning is issued from `infer_and_maybe_downcast`, which is the function that decides whether values can be held in the original dtype. The check runs from `_post_expansion_casting` (the definitive answer after all intermediate dtype changes have been resolved), not from the forward-cast calls in `_setitem_with_indexer_missing` or `_append_internal`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c77b1cd to
817819d
Compare
Member
Author
|
Revisiting this I'm still not comfortable with the concat behavior. It makes it really easy to get object dtype. And code-wise reindex seems much simpler (e.g. we dont need the cast-back logic we currently have) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Needs disc, as in the issue there was not 100% consensus on whether to enforce PDEP6 behavior (which this PR does) or to allow concat behavior (the status quo in some but not all code paths).
Summary
ser.loc[new_key] = valueordf.loc[new_key] = valuesilently changes the dtype of the Series/DataFrameinfer_and_maybe_downcast, which is the function that decides whether values can be held in the original dtype_post_expansion_casting(the definitive answer after all intermediate dtype changes have been resolved), not from the forward-cast calls in_setitem_with_indexer_missingor_append_internalpivot_tablemargin calculationsMotivation
This is the first step toward simplifying the expansion codepath in
core.indexing. Currently expansion usesconcat_compat(Series) and_append_internal/concat(DataFrame) which do implicit dtype promotion. Once this deprecation is enforced, the expansion can be refactored to usereindexthroughout, which preserves the existing dtype and is simpler.Test plan
🤖 Generated with Claude Code