New chunking approach that never splits encoded chunks #11060
jsignell wants to merge 16 commits into pydata:main
* Move ``preserve_chunks`` to base ChunkManager class
* Get target size from dask config options for DaskManager
* Add test for open_zarr
        ({"x": "preserve", "y": -1}, (160, 500)),
    ],
)
def test_open_dataset_chunking_zarr_with_preserve(
These tests are kind of slow.
@dcherian is this something you would be able to review? I'd love to get more people trying it.
New Features
~~~~~~~~~~~~

- Adds a new option ``chunks="preserve"`` when opening a dataset. This option
IMO this should just be "auto". Are we really working around a dask bug?
I thought about doing this work in dask, but I ended up deciding that this is a sufficiently different goal from dask auto. The goal of dask auto is to guarantee that the chunksize will be under a configurable limit while preserving the aspect ratio of previous_chunks. We don't really want either of those things.
But maybe you are just saying: this is what xarray should mean by "auto" in which case I definitely agree. I'm just not sure how to make the transition from the old version of "auto" to the new version. Maybe it would be easier to give it a new name ("preserve") and then change the default value in kwargs from chunks="auto" to chunks="preserve" at some point. If we just change what "auto" means then there is no way for people to get the dask auto behavior.
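For anyone following along, here is a rough sketch of the kind of strategy the PR description talks about: start from the encoded chunks, only ever grow them in whole multiples, and keep enlarging whichever dim currently has the most chunks while the result stays under a target size. This is not the code in this PR. The function name `preserve_chunks_sketch`, its parameters, and the "add one encoded chunk at a time" growth rule are all assumptions made for illustration.

```python
import math


def preserve_chunks_sketch(shape, encoded_chunks, itemsize, target_bytes):
    """Hypothetical helper, NOT the implementation in this PR.

    Returns chunk sizes that are whole multiples of ``encoded_chunks``
    and whose size in bytes stays at or below ``target_bytes`` when
    possible. Encoded chunks are never split, even if a single encoded
    chunk is already larger than the target.
    """
    chunks = list(encoded_chunks)

    def nbytes(candidate):
        # A chunk can never be larger than the dimension it lives on.
        return itemsize * math.prod(min(c, s) for c, s in zip(candidate, shape))

    while True:
        # Number of chunks along each dim for the current chunk sizes.
        nchunks = [math.ceil(s / c) for s, c in zip(shape, chunks)]
        # Dims that still have more than one chunk, most chunks first.
        candidates = sorted(
            (i for i, n in enumerate(nchunks) if n > 1),
            key=lambda i: nchunks[i],
            reverse=True,
        )
        for i in candidates:
            trial = chunks.copy()
            trial[i] += encoded_chunks[i]  # next whole multiple (assumed rule)
            if nbytes(trial) <= target_bytes:
                chunks = trial
                break
        else:
            # No dim can grow without exceeding the target: done.
            return tuple(chunks)
```

With a 1000 x 1000 array of 8-byte values, encoded chunks of (100, 100) and a target of 1,000,000 bytes, this sketch returns (400, 300): both values are multiples of the encoded chunk size, and the aspect ratio is not preserved. With shape (10, 100000), encoded chunks (10, 1000), 4-byte values and a 400,000-byte target, only the second dim grows and the result is (10, 10000).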
`chunks={}` or `chunks="auto"`. Since "Remove special mapping of `auto` to {} in `open_zarr`" #11010 got in, xarray could eventually change the default on `open_zarr` to map it to `"preserve"` rather than to `{}` if dask is available.
whats-new.rst
api.rst

Proposal
A new `chunks` option that is only allowed to use encoded chunks or multiples of them. No chunk splitting allowed.

Demo
Current behavior when `chunks="auto"`
This PR introduces a new option: `chunks="preserve"`

Context
I originally set out to update the `auto_chunks` function in dask, but it felt like my goals were actually quite different. The goal of the dask `auto_chunks` function is to guarantee that the chunksize will be under a configurable limit while preserving the aspect ratio of `previous_chunks` (`previous_chunks==encoding`). This PR instead guarantees that encoded chunks are never split, but it will multiply them by some factor to try to get the chunksize close to a target size. It doesn't try to preserve the aspect ratio of the chunks. Instead it goes after the dim where there is the greatest number of chunks and tries to take those in bigger bites.

Also:
"preserve".