Write out zarr with combined variables #2
abarciauskas-bgse merged 13 commits into developmentseed:main from
Looks like we're already getting a pretty large performance boost opening things locally; excited to see how this translates to opening a cloud-based store, so let us know how it goes! @jbusecke, for this issue with the time dimension, I'm wondering if we can take an approach similar to what pandas does when joining two dataframes (say one representing … This is obviously harder to do if the mismatch is in the other direction (i.e., if …
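As a rough illustration of the pandas join behaviour being referenced, here is a sketch with two hypothetical time-indexed frames where one is missing the final timestep; the column names, dates and values are invented. It shows that a left join only keeps every timestep when the longer frame is on the left, while an outer join keeps the union either way. Which case the "other direction" above refers to is an assumption here.

```python
import pandas as pd

# Hypothetical data: "a" has 4 daily timesteps, "b" is missing the last one.
times = pd.date_range("2000-01-01", periods=4, freq="D")
df_a = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=times)
df_b = pd.DataFrame({"b": [10.0, 20.0, 30.0]}, index=times[:3])

# Longer frame on the left: every timestamp is kept and "b" is padded with NaN.
left_long = df_a.join(df_b, how="left")

# Shorter frame on the left: the extra timestep of "a" is dropped,
# so data is lost unless a different join is chosen.
left_short = df_b.join(df_a, how="left")

# Outer join: union of timestamps, regardless of which frame is on the left.
outer = df_b.join(df_a, how="outer")

print(left_long)
print(left_short)
print(outer)
```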
Hey @espg, I think we can basically choose from xarray's built-in options for join: … I believe you are pointing to the …
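A sketch of how xarray's join argument (accepted by xr.merge and xr.align, among others, with values such as "outer", "inner", "left", "right", "exact" and "override") handles a mismatched time axis. The datasets and variable names are hypothetical. "outer" pads with NaN, "inner" drops the extra timestep, and "exact" raises a ValueError instead of aligning silently, which could be one way to flag problem files rather than patching them.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example: two variables from the same model/experiment whose
# time axes disagree (one has an extra trailing timestep).
time_long = pd.date_range("2000-01-01", periods=4, freq="D")
time_short = time_long[:3]

ds_a = xr.Dataset({"a": ("time", np.arange(4.0))}, coords={"time": time_long})
ds_b = xr.Dataset({"b": ("time", np.arange(3.0))}, coords={"time": time_short})

# join="outer": union of time values, missing entries of "b" filled with NaN.
merged_outer = xr.merge([ds_a, ds_b], join="outer")

# join="inner": intersection of time values, the extra timestep is dropped.
merged_inner = xr.merge([ds_a, ds_b], join="inner")

# join="exact": refuses to align indexes that are not identical.
try:
    xr.merge([ds_a, ds_b], join="exact")
    exact_ok = True
except ValueError as err:
    exact_ok = False
    print("exact join refused:", err)

print(merged_outer)
print(merged_inner)
```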
@jbusecke, in order to keep working incrementally, do you think we can merge this, close #1, and then open issues for the ongoing work? That ongoing work would be the outstanding errors you have already enumerated above, plus putting the output store in Google Cloud Storage with a demonstration notebook.
Yeah, we totally could, but maybe I should first verify that it actually runs via cloud functions?
@jbusecke yes, or, if it's running fine locally on your laptop, perhaps just document how to run it locally if re-running it in the cloud is not a priority (yet).
OK, I have wrapped this up locally for now:
@abarciauskas-bgse, should I point this to #1? I based this PR on yours, so we can either merge this here or merge them in order. No preference on my end.
I think we should just merge this one into …
This PR is built on top of #1 and aims to produce a 'collapsed' group structure by combining all variables that belong to a given model/experiment combination into a single group.
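A minimal sketch of the collapsed layout, assuming each source file can be described by a (model, experiment, variable, path) record. The records, model names and path strings are hypothetical placeholders, not the actual file inventory, and collapse_groups is an illustrative helper rather than code from this PR. Every variable sharing a model/experiment pair ends up under one group key.

```python
from collections import defaultdict

# Hypothetical inventory: one (model, experiment, variable, path) record per file.
records = [
    ("MODEL_A", "exp01", "var1", "path/to/file1.nc"),
    ("MODEL_A", "exp01", "var2", "path/to/file2.nc"),
    ("MODEL_A", "exp02", "var1", "path/to/file3.nc"),
    ("MODEL_B", "exp01", "var1", "path/to/file4.nc"),
]


def collapse_groups(records):
    """Map each 'model/experiment' group path to {variable: source path}."""
    groups = defaultdict(dict)
    for model, experiment, variable, path in records:
        group = f"{model}/{experiment}"
        # Two files claiming the same variable in one group would collide.
        if variable in groups[group]:
            raise ValueError(f"duplicate variable {variable!r} in group {group!r}")
        groups[group][variable] = path
    return dict(groups)


groups = collapse_groups(records)
for group, variables in sorted(groups.items()):
    print(group, sorted(variables))
```

Each group's variables would then be combined into one dataset per group, which is where alignment questions like the time-dimension mismatch discussed above come in.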
Modifications
Results
I am currently running the code in a notebook like this:
… xr.open_datatree() call (which builds all the indices) in ~35s, which is a significant improvement over the more complex group structure.
Remaining errors
There are errors, mostly during virtualization, that I have started to fix but also wanted to get some feedback on (@espg, @abarciauskas-bgse, @maxrjones, @sharkinsspatial). I believe most of these will carry over to #1 too!
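Because the same failure tends to recur across many files, it can help to bucket exception messages by substring before reporting them. This is a hypothetical sketch: the two matched substrings are the messages this PR flags as possible signs of corrupted source files, while the category label, the file names and the helper functions are invented for illustration.

```python
from collections import defaultdict

# Substrings taken from error messages quoted in this PR; the category
# label is a hypothetical name chosen for illustration.
ERROR_CATEGORIES = {
    "file signature not found": "possibly corrupted source file",
    "early eof": "possibly corrupted source file",
}


def categorize(message):
    """Return the first matching category for an error message."""
    lowered = message.lower()
    for needle, category in ERROR_CATEGORIES.items():
        if needle in lowered:
            return category
    return "uncategorized"


def summarize(errors):
    """Group (path, message) pairs by category, preserving input order."""
    summary = defaultdict(list)
    for path, message in errors:
        summary[categorize(message)].append(path)
    return dict(summary)


# Hypothetical failures collected while virtualizing a batch of files.
errors = [
    ("file1.nc", "Unable to synchronously open file (file signature not found)"),
    ("file2.nc", "early eof"),
    ("file3.nc", "some other failure"),
]
summary = summarize(errors)
for category, paths in summary.items():
    print(f"{category}: {paths}")
```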
I can group the remaining errors like this:
There are a few general categories:
- … 'LSCE_GRISLI2' model. @espg, I would be curious if you have thoughts on how to handle this?
- … 'Unable to synchronously open file (file signature not found)' and 'early eof' as signs of possibly corrupted source files. I need to confirm that, but I wonder if it is within scope to re-download/check certain files via Globus.
Todo