Closed
What is your issue?
I have a workload that calls ds.to_dataframe(), which takes several seconds because the output DataFrame has more than 10 million rows. On my machine, more than 99% of the time spent in xr.Dataset.to_dataframe() goes to constructing the pd.MultiIndex. Within that, more than 80% of the time is spent calling tolist() and then having the pd.MultiIndex constructor iterate through a list rather than an ndarray.
On line 180:
https://github.com/pydata/xarray/blob/main/xarray/core/coordinates.py#L180
Is there a reason to call .tolist() rather than keeping the object as an ndarray? Removing .tolist() gives me a significant performance improvement.
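To make the comparison concrete, here is a minimal standalone sketch that does not touch xarray itself. It assumes the codes are built with a repeat-and-tile pattern and passed to pd.MultiIndex(levels=..., codes=...); the helper names build_index and time_both are hypothetical, and the sizes are only illustrative. Timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def build_index(sizes, as_list):
    """Build a product-style MultiIndex over integer levels.

    Hypothetical helper: with as_list=True each codes array is converted
    with .tolist() before being handed to pandas; with as_list=False the
    ndarray is passed through unchanged.
    """
    levels = [np.arange(n) for n in sizes]
    codes = []
    for i, n in enumerate(sizes):
        repeat = int(np.prod(sizes[i + 1:]))  # rows covered by one value of this level
        tile = int(np.prod(sizes[:i]))        # times the whole pattern repeats
        code = np.tile(np.repeat(np.arange(n), repeat), tile)
        codes.append(code.tolist() if as_list else code)
    names = [f"dim_{i}" for i in range(len(sizes))]
    return pd.MultiIndex(levels=levels, codes=codes, names=names)


def time_both(sizes):
    """Return (seconds with lists, seconds with ndarrays) for one build each."""
    start = time.perf_counter()
    build_index(sizes, as_list=True)
    with_lists = time.perf_counter() - start

    start = time.perf_counter()
    build_index(sizes, as_list=False)
    with_arrays = time.perf_counter() - start
    return with_lists, with_arrays


if __name__ == "__main__":
    list_s, array_s = time_both((100, 100, 100))  # 1M rows, illustrative only
    print(f"codes as lists:    {list_s:.3f} s")
    print(f"codes as ndarrays: {array_s:.3f} s")
```

Both variants should produce equal indexes, so the only difference being measured is the cost of the list conversion and of pandas consuming lists instead of arrays.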