
PERF: preserve block memory layout in Block.copy (GH#60469)#65302

Merged
mroeschke merged 3 commits intopandas-dev:mainfrom
jbrockmendel:perf-60469
Apr 21, 2026

Conversation

@jbrockmendel
Member

Block.copy calls values.copy() without specifying order. NumPy's default for ndarray.copy is order="C", which flips the memory layout of any block that is not C-contiguous. Because BlockManager stores data transposed relative to the user-facing shape, a user-facing C-contiguous DataFrame has F-contiguous block storage. After .copy(), the user therefore ends up with F-contiguous .values, and operations such as mean(axis=1) walk the slow stride. Passing order="K" preserves the original layout.
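The layout flip can be reproduced with NumPy alone. The sketch below assumes, purely for illustration, that a "block" is the transpose of the user-facing array; it does not call pandas internals. It shows that ndarray.copy() with the default order="C" turns F-contiguous block storage into C-contiguous storage (so the user-facing transpose loses C-contiguity), while order="K" keeps the source layout.

```python
import numpy as np

# User-facing, C-contiguous 2D values: shape (n_rows, n_cols).
user_values = np.arange(12, dtype=np.float64).reshape(4, 3)

# Illustrative assumption: block storage is the transpose, shape (n_cols, n_rows).
# Transposing a C-contiguous array yields an F-contiguous view.
block_values = user_values.T
assert block_values.flags["F_CONTIGUOUS"]

# ndarray.copy() defaults to order="C", so the copy becomes C-contiguous...
copy_c = block_values.copy()
# ...and its user-facing transpose is no longer C-contiguous.
print(copy_c.T.flags["C_CONTIGUOUS"])  # False

# order="K" matches the source layout as closely as possible (F here),
# so the user-facing transpose stays C-contiguous.
copy_k = block_values.copy(order="K")
print(copy_k.T.flags["C_CONTIGUOUS"])  # True
```

Both copies hold identical values; only the stride pattern differs, which is what matters for axis-wise reductions.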

Timings for the repro from #60469 (before → after, on this branch):

  • df_nan.mean(axis=1): 7.24 ms → 7.24 ms
  • df_nan_copy.mean(axis=1): 18.73 ms → 7.43 ms
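A rough way to check the user-visible effect is to inspect the layout flags of .values before and after .copy() and to time mean(axis=1) on both frames. This is a sketch, not the original #60469 script: the frame size and NaN pattern are arbitrary assumptions, and which flags are printed (and how large the timing gap is) depends on the installed pandas version and the machine.

```python
import timeit
from functools import partial

import numpy as np
import pandas as pd

# Arbitrary sizes and NaN pattern; not the exact repro from GH#60469.
rng = np.random.default_rng(0)
arr = rng.random((1_000, 50))
arr[::7, ::3] = np.nan  # sprinkle some NaNs

df_nan = pd.DataFrame(arr)
df_nan_copy = df_nan.copy()

for name, frame in [("df_nan", df_nan), ("df_nan_copy", df_nan_copy)]:
    flags = frame.values.flags  # layout of the user-facing 2D values
    seconds = timeit.timeit(partial(frame.mean, axis=1), number=10) / 10
    print(
        f"{name}: C={flags['C_CONTIGUOUS']} F={flags['F_CONTIGUOUS']} "
        f"mean(axis=1) {seconds * 1e3:.2f} ms"
    )

# Layout affects speed only; the results are identical.
row_means = df_nan.mean(axis=1)
row_means_copy = df_nan_copy.mean(axis=1)
```

On a build that includes this change, the PR description implies both frames should report the same layout.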

ASV results

This change was previously proposed in #44871, which was closed as stale in 2022 after an ASV run showed real regressions in wide-frame arithmetic (FrameWithFrameWide.time_op_different_blocks was 2.06× slower). A full asv continuous run on the current tree shows those regressions are no longer present, presumably because of internals refactoring over the last four years.

Current run: ~120 benchmarks improved (0.45×–0.91×), 3 apparent regressions. Re-running the 3 regressions with more repeats showed all were noise from concurrent machine activity (BENCHMARKS NOT SIGNIFICANTLY CHANGED).

Notable improvements (sampling):

Ratio Benchmark
0.45× index_cached_properties.IndexCache.time_engine('TimedeltaIndex')
0.62× strings.Methods.time_wrap('string[pyarrow]')
0.68× indexing.Setitem.time_setitem_list
0.72× arithmetic.OpWithFillValue.time_frame_op_with_fill_value_no_nas
0.75× sparse.Arithmetic.time_make_union
0.76× multiindex_object.Unique.time_unique_dups(('Int64', <NA>))
0.78× arithmetic.NumericInferOps.time_add(float64)
0.78× series_methods.ToNumpy.time_to_numpy_copy

Caveat: the ASV run had some concurrent machine activity, so per-benchmark ratios are directional, not quantitative. No 2×-style regression like the 2021 one appears; the targeted re-run of the three flagged regressions cleared them.

@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Apr 19, 2026
@jbrockmendel jbrockmendel marked this pull request as ready for review April 20, 2026 20:04
jbrockmendel and others added 3 commits April 20, 2026 13:07
Pass the fortran-ordered transpose to DataFrame so per-column .values
remain contiguous, matching the layout of the DataFrame that was
written (GH#22073, GH#60469).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
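The idea in this commit message, that the Fortran-ordered transpose keeps each column's data contiguous, can be illustrated in NumPy. The sketch assumes a hypothetical source array with one contiguous row per output column; it does not reproduce the code the commit actually changed.

```python
import numpy as np

# Hypothetical source: one contiguous row per output column,
# shape (n_cols, n_rows), C-ordered.
per_column = np.arange(12, dtype=np.float64).reshape(3, 4)

# Its transpose has the user-facing shape (n_rows, n_cols) and is F-ordered,
# so each column occupies a contiguous run of memory.
fortran_view = per_column.T
assert fortran_view.flags["F_CONTIGUOUS"]
col_f = fortran_view[:, 0]

# A C-ordered copy of the same values: each column is strided across rows.
c_copy = np.ascontiguousarray(fortran_view)
col_c = c_copy[:, 0]

print(col_f.flags["C_CONTIGUOUS"], col_c.flags["C_CONTIGUOUS"])  # True False
```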
@mroeschke mroeschke added this to the 3.1 milestone Apr 21, 2026
@mroeschke mroeschke merged commit 7a3e2dc into pandas-dev:main Apr 21, 2026
45 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the perf-60469 branch April 21, 2026 17:28
Sharl0tteIsTaken added a commit to Sharl0tteIsTaken/pandas that referenced this pull request Apr 22, 2026
…h-origin

* upstream/main: (31 commits)
  DOC:Missing r in your (pandas-dev#65323)
  DOC: fix grammar in the .dt accessor section (pandas-dev#65325)
  REGR: restore rank() for ExtensionArrays with custom values for sorting (pandas-dev#64976)
  BUG: MultiIndex.get_loc returns scalar for unique key in non-unique index (pandas-dev#65234)
  BUG/TST: add test for _cast_pointwise_result robustness + fix some cases (pandas-dev#65318)
  BUG: fix .loc with tuple key on MultiIndex with IntervalIndex level (pandas-dev#65239)
  BUG: permit building from source with mingw (pandas-dev#64849)
  BUG: DataFrame.loc setitem with list-like value on single-column EA DataFrame (pandas-dev#65241)
  PERF: preserve block memory layout in Block.copy (GH#60469) (pandas-dev#65302)
  PERF: short-circuit sort_index(level=...) on monotonic non-MultiIndex (pandas-dev#65279)
  BUG: fix FloatingArray.astype(str) crash with distinguish_nan_and_na=True (pandas-dev#65038)
  BUG: fix to_timedelta ignoring unit for mixed round/non-round floats (pandas-dev#65170)
  BUG: DataFrame.loc preserves original index name when key is an Index (pandas-dev#65229)
  REF: continue moving freq management off DatetimeArray/TimedeltaArray (GH#24566) (pandas-dev#65285)
  REF: remove redundant BaseMaskedArray.map override (pandas-dev#65297)
  Bump github/codeql-action from 4.35.1 to 4.35.2 (pandas-dev#65310)
  Bump actions/setup-node from 6.3.0 to 6.4.0 (pandas-dev#65309)
  BUG: Fix formatters applied to wrong columns in truncated DataFrame.to_string (GH#35410) (pandas-dev#65288)
  PERF: optimize block consolidation (pandas-dev#64574)
  CLN: Replace no_default signature with False for allow_duplicates in insert and reset_index (pandas-dev#65146)
  ...

Labels

Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PERF: regression on mean(axis=1) compared to old pandas version

2 participants