PERF: preserve block memory layout in Block.copy (GH#60469) by jbrockmendel · Pull Request #65302 · pandas-dev/pandas

jbrockmendel · 2026-04-19T20:00:33Z

closes PERF: regression on mean(axis=1) compared to old pandas version #60469
supersedes PERF: DataFrame.copy preserve row/column order #44871 (closed as stale in 2022)

Block.copy calls values.copy() without specifying order. The default for ndarray.copy is order="C", which flips the memory layout of any block that is not C-contiguous. Because BlockManager stores data transposed relative to the user-facing shape, a user-facing C-contiguous DataFrame has F-contiguous block storage. After .copy() the block becomes C-contiguous, so the user ends up with F-contiguous .values, and operations such as mean(axis=1) walk the slow stride. Passing order="K" preserves the original layout.
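The layout flip described above can be reproduced with NumPy alone. The following is a minimal sketch, not the pandas implementation: it assumes a block is simply the transpose of the user-facing array, and the variable names are made up for illustration.

```python
import numpy as np

# User-facing data: 4 rows x 3 columns, C-contiguous (row-major).
user_values = np.arange(12, dtype=np.float64).reshape(4, 3)

# Assumption for illustration: the block holds the transpose, shape (n_cols, n_rows).
# The transpose of a C-contiguous 2D array is an F-contiguous view.
block_values = user_values.T

# ndarray.copy() defaults to order="C", so this copy is row-major ...
default_copy = block_values.copy()
# ... while order="K" matches the layout of the source as closely as possible.
layout_preserving_copy = block_values.copy(order="K")

# Transposing back gives what the user would see as .values.
user_after_default = default_copy.T
user_after_k = layout_preserving_copy.T

print(block_values.flags["F_CONTIGUOUS"])        # True
print(user_after_default.flags["C_CONTIGUOUS"])  # False
print(user_after_k.flags["C_CONTIGUOUS"])        # True
```

Both copies hold the same values; only the in-memory ordering differs, which is why the change shows up as a performance difference rather than a behavioural one.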

Repro from #60469 on this branch:

df_nan.mean(axis=1): 7.24 ms → 7.24 ms
df_nan_copy.mean(axis=1): 18.73 ms → 7.43 ms
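The gap in the second timing is consistent with the stride a row-wise reduction has to step over. A small NumPy-only sketch follows; the array sizes are arbitrary and are not the ones used in the #60469 repro.

```python
import numpy as np

n_rows, n_cols = 1000, 50
c_ordered = np.zeros((n_rows, n_cols), dtype=np.float64)  # row-major
f_ordered = np.asfortranarray(c_ordered)                  # column-major, same values

# strides[i] is the number of bytes to step when moving one position along axis i.
# Reducing along axis=1 moves along axis 1 within each row.
print(c_ordered.strides)  # (400, 8): neighbouring elements of a row are adjacent
print(f_ordered.strides)  # (8, 8000): neighbouring elements of a row are 8000 bytes apart

# The results are identical; only the memory access pattern differs.
same_result = np.array_equal(c_ordered.mean(axis=1), f_ordered.mean(axis=1))
```

NumPy is free to reorder its inner loops, so the stride is an indicator of the access pattern rather than a timing guarantee.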

ASV results

This change was previously attempted in #44871, which was closed as stale in 2022 after an ASV run showed real wide-frame arithmetic regressions (FrameWithFrameWide.time_op_different_blocks was 2.06× slower). A full asv continuous run on the current tree shows those regressions are no longer present, presumably resolved by internals refactoring over the last four years.

Current run: ~120 benchmarks improved (0.45×–0.91×), 3 apparent regressions. Re-running the 3 regressions with more repeats showed all were noise from concurrent machine activity (BENCHMARKS NOT SIGNIFICANTLY CHANGED).

Notable improvements (sampling):

| Ratio | Benchmark |
|-------|-----------|
| 0.45× | `index_cached_properties.IndexCache.time_engine('TimedeltaIndex')` |
| 0.62× | `strings.Methods.time_wrap('string[pyarrow]')` |
| 0.68× | `indexing.Setitem.time_setitem_list` |
| 0.72× | `arithmetic.OpWithFillValue.time_frame_op_with_fill_value_no_nas` |
| 0.75× | `sparse.Arithmetic.time_make_union` |
| 0.76× | `multiindex_object.Unique.time_unique_dups(('Int64', <NA>))` |
| 0.78× | `arithmetic.NumericInferOps.time_add(float64)` |
| 0.78× | `series_methods.ToNumpy.time_to_numpy_copy` |

Caveat: the ASV run had some concurrent machine activity, so per-benchmark ratios are directional, not quantitative. No 2×-style regression like the 2021 one appears; the targeted re-run of the three flagged regressions cleared them.

Pass the Fortran-ordered transpose to DataFrame so per-column .values remain contiguous, matching the layout of the DataFrame that was written (GH#22073, GH#60469).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mroeschke · 2026-04-21T17:22:23Z

Thanks @jbrockmendel

jbrockmendel added the Performance (Memory or execution speed performance) label Apr 19, 2026

jbrockmendel marked this pull request as ready for review April 20, 2026 20:04

jbrockmendel and others added 3 commits April 20, 2026 13:07

PERF: preserve block memory layout in Block.copy (GH#60469)

f63b5e8

whatsnew entry for GH#60469

b99ef19

jbrockmendel force-pushed the perf-60469 branch from d222ebe to f976aa2 Compare April 20, 2026 20:08

mroeschke approved these changes Apr 21, 2026

mroeschke added this to the 3.1 milestone Apr 21, 2026

mroeschke merged commit 7a3e2dc into pandas-dev:main Apr 21, 2026
45 checks passed

jbrockmendel deleted the perf-60469 branch April 21, 2026 17:28
