diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index e7d5a8567c7..e85c0f435dc 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -24,24 +24,24 @@ repos: - id: rst-inline-touching-normal - id: text-unicode-replacement-char - repo: https://github.com/astral-sh/ruff-pre-commit - rev: v0.12.1 + rev: v0.12.2 hooks: - id: ruff-format - id: ruff args: ["--fix", "--show-fixes"] - repo: https://github.com/keewis/blackdoc - rev: v0.3.9 + rev: v0.4.1 hooks: - id: blackdoc exclude: "generate_aggregations.py" additional_dependencies: ["black==24.8.0"] - repo: https://github.com/rbubley/mirrors-prettier - rev: v3.5.3 + rev: v3.6.2 hooks: - id: prettier args: [--cache-location=.prettier_cache/cache] - repo: https://github.com/pre-commit/mirrors-mypy - rev: v1.16.0 + rev: v1.16.1 hooks: - id: mypy # Copied from setup.cfg diff --git a/HOW_TO_RELEASE.md b/HOW_TO_RELEASE.md index 15be8c5d0f4..786ef8f2f18 100644 --- a/HOW_TO_RELEASE.md +++ b/HOW_TO_RELEASE.md @@ -110,7 +110,6 @@ upstream https://github.com/pydata/xarray (push) (Note that repo branch restrictions prevent pushing to `main`, so you have to just-self-merge this.) 13. Update the version available on pyodide: - - Open the PyPI page for [Xarray downloads](https://pypi.org/project/xarray/#files) - Edit [`pyodide/packages/xarray/meta.yaml`](https://github.com/pyodide/pyodide/blob/main/packages/xarray/meta.yaml) to update the - version number @@ -121,7 +120,6 @@ upstream https://github.com/pydata/xarray (push) 14. Issue the release announcement to mailing lists & Twitter (X). For bug fix releases, I usually only email xarray@googlegroups.com. 
For major/feature releases, I will email a broader list (no more than once every 3-6 months): - - pydata@googlegroups.com - xarray@googlegroups.com - numpy-discussion@scipy.org diff --git a/asv_bench/benchmarks/README_CI.md b/asv_bench/benchmarks/README_CI.md index 9c35e8a93b2..8461b5cd548 100644 --- a/asv_bench/benchmarks/README_CI.md +++ b/asv_bench/benchmarks/README_CI.md @@ -115,8 +115,10 @@ To minimize the time required to run the full suite, we trimmed the parameter ma ```python from . import _skip_slow # this function is defined in benchmarks.__init__ + def time_something_slow(): pass + time_something_slow.setup = _skip_slow ``` diff --git a/design_notes/flexible_indexes_notes.md b/design_notes/flexible_indexes_notes.md index 382911c18de..2a3a1cccc40 100644 --- a/design_notes/flexible_indexes_notes.md +++ b/design_notes/flexible_indexes_notes.md @@ -97,12 +97,12 @@ The new `indexes` argument of Dataset/DataArray constructors may be used to spec ```python >>> da = xr.DataArray( ... data=[[275.2, 273.5], [270.8, 278.6]], -... dims=('x', 'y'), +... dims=("x", "y"), ... coords={ -... 'lat': (('x', 'y'), [[45.6, 46.5], [50.2, 51.6]]), -... 'lon': (('x', 'y'), [[5.7, 10.5], [6.2, 12.8]]), +... "lat": (("x", "y"), [[45.6, 46.5], [50.2, 51.6]]), +... "lon": (("x", "y"), [[5.7, 10.5], [6.2, 12.8]]), ... }, -... indexes={('lat', 'lon'): SpatialIndex}, +... indexes={("lat", "lon"): SpatialIndex}, ... ) array([[275.2, 273.5], @@ -120,7 +120,7 @@ More formally, `indexes` would accept `Mapping[CoordinateNames, IndexSpec]` wher Currently index objects like `pandas.MultiIndex` can be passed directly to `coords`, which in this specific case results in the implicit creation of virtual coordinates. With the new `indexes` argument this behavior may become even more confusing than it currently is.
For the sake of clarity, it would be appropriate to eventually drop support for this specific behavior and treat any given mapping value given in `coords` as an array that can be wrapped into an Xarray variable, i.e., in the case of a multi-index: ```python ->>> xr.DataArray([1.0, 2.0], dims='x', coords={'x': midx}) +>>> xr.DataArray([1.0, 2.0], dims="x", coords={"x": midx}) array([1., 2.]) Coordinates: @@ -169,8 +169,8 @@ Like for the indexes, explicit coordinate creation should be preferred over impl For example, it is currently possible to pass a `pandas.MultiIndex` object as a coordinate to the Dataset/DataArray constructor: ```python ->>> midx = pd.MultiIndex.from_arrays([['a', 'b'], [0, 1]], names=['lvl1', 'lvl2']) ->>> da = xr.DataArray([1.0, 2.0], dims='x', coords={'x': midx}) +>>> midx = pd.MultiIndex.from_arrays([["a", "b"], [0, 1]], names=["lvl1", "lvl2"]) +>>> da = xr.DataArray([1.0, 2.0], dims="x", coords={"x": midx}) >>> da array([1., 2.]) @@ -201,7 +201,9 @@ Besides `pandas.MultiIndex`, there may be other situations where we would like t The example given here is quite confusing, though: this is not an easily predictable behavior. We could entirely avoid the implicit creation of coordinates, e.g., using a helper function that generates coordinate + index dictionaries that we could then pass directly to the DataArray/Dataset constructor: ```python ->>> coords_dict, index_dict = create_coords_from_index(midx, dims='x', include_dim_coord=True) +>>> coords_dict, index_dict = create_coords_from_index( +... midx, dims="x", include_dim_coord=True +... 
) >>> coords_dict {'x': array([('a', 0), ('b', 1)], dtype=object), @@ -211,7 +213,7 @@ The example given here is quite confusing, though: this is not an easily predict array([0, 1])} >>> index_dict {('lvl1', 'lvl2'): midx} ->>> xr.DataArray([1.0, 2.0], dims='x', coords=coords_dict, indexes=index_dict) +>>> xr.DataArray([1.0, 2.0], dims="x", coords=coords_dict, indexes=index_dict) array([1., 2.]) Coordinates: diff --git a/design_notes/grouper_objects.md b/design_notes/grouper_objects.md index ca6f099377f..f702dc17d0b 100644 --- a/design_notes/grouper_objects.md +++ b/design_notes/grouper_objects.md @@ -8,7 +8,7 @@ I propose the addition of Grouper objects to Xarray's public API so that ```python -Dataset.groupby(x=BinGrouper(bins=np.arange(10, 2)))) +Dataset.groupby(x=BinGrouper(bins=np.arange(10, 2))) ``` is identical to today's syntax: @@ -27,7 +27,7 @@ results = [] for element in unique_labels: subset = ds.sel(x=(ds.x == element)) # split # subset = ds.where(ds.x == element, drop=True) # alternative - result = subset.mean() # apply + result = subset.mean() # apply results.append(result) xr.concat(results) # combine @@ -36,7 +36,7 @@ xr.concat(results) # combine to ```python -ds.groupby('x').mean() # splits, applies, and combines +ds.groupby("x").mean() # splits, applies, and combines ``` Efficient vectorized implementations of this pattern are implemented in numpy's [`ufunc.at`](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.at.html), [`ufunc.reduceat`](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html), [`numbagg.grouped`](https://github.com/numbagg/numbagg/blob/main/numbagg/grouped.py), [`numpy_groupies`](https://github.com/ml31415/numpy-groupies), and probably more. 
@@ -110,11 +110,13 @@ All Grouper objects will subclass from a Grouper object ```python import abc + class Grouper(abc.ABC): @abc.abstractmethod def factorize(self, by: DataArray): raise NotImplementedError + class CustomGrouper(Grouper): def factorize(self, by: DataArray): ... diff --git a/design_notes/named_array_design_doc.md b/design_notes/named_array_design_doc.md index 455ba72ef87..3c331c76f71 100644 --- a/design_notes/named_array_design_doc.md +++ b/design_notes/named_array_design_doc.md @@ -75,7 +75,6 @@ The named-array package is designed to be interoperable with other scientific Py - Delete the ExplicitIndexer objects (`BasicIndexer`, `VectorizedIndexer`, `OuterIndexer`) - Remove explicit support for `pd.Index`. When provided with a `pd.Index` object, Variable will coerce to an array using `np.array(pd.Index)`. For Xarray's purposes, Xarray can use `as_variable` to explicitly wrap these in PandasIndexingAdapter and pass them to `Variable.__init__`. 3. Define a minimal variable interface that the rest of Xarray can use: - 1. `dims`: tuple of dimension names 2. `data`: numpy/dask/duck arrays 3. `attrs`: dictionary of attributes @@ -194,134 +193,132 @@ Questions: ```python # Sorting - Variable.argsort - Variable.searchsorted +Variable.argsort +Variable.searchsorted # NaN handling - Variable.fillna - Variable.isnull - Variable.notnull +Variable.fillna +Variable.isnull +Variable.notnull # Lazy data handling - Variable.chunk # Could instead have accessor interface and recommend users use `Variable.dask.chunk` and `Variable.cubed.chunk`? - Variable.to_numpy() - Variable.as_numpy() +Variable.chunk # Could instead have accessor interface and recommend users use `Variable.dask.chunk` and `Variable.cubed.chunk`?
+Variable.to_numpy() +Variable.as_numpy() # Xarray-specific - Variable.get_axis_num - Variable.isel - Variable.to_dict +Variable.get_axis_num +Variable.isel +Variable.to_dict # Reductions - Variable.reduce - Variable.all - Variable.any - Variable.argmax - Variable.argmin - Variable.count - Variable.max - Variable.mean - Variable.median - Variable.min - Variable.prod - Variable.quantile - Variable.std - Variable.sum - Variable.var +Variable.reduce +Variable.all +Variable.any +Variable.argmax +Variable.argmin +Variable.count +Variable.max +Variable.mean +Variable.median +Variable.min +Variable.prod +Variable.quantile +Variable.std +Variable.sum +Variable.var # Accumulate - Variable.cumprod - Variable.cumsum +Variable.cumprod +Variable.cumsum # numpy-like Methods - Variable.astype - Variable.copy - Variable.clip - Variable.round - Variable.item - Variable.where +Variable.astype +Variable.copy +Variable.clip +Variable.round +Variable.item +Variable.where # Reordering/Reshaping - Variable.squeeze - Variable.pad - Variable.roll - Variable.shift - +Variable.squeeze +Variable.pad +Variable.roll +Variable.shift ``` #### methods to be renamed from xarray.Variable ```python # Xarray-specific - Variable.concat # create two functions, one as the equivalent of `np.stack` and other for `np.concat` +Variable.concat # create two functions, one as the equivalent of `np.stack` and other for `np.concat` - # Given how niche these are, these would be better as functions than methods. - # We could also keep these in Xarray, at least for now. If we don't think people will use functionality outside of Xarray it probably is not worth the trouble of porting it (including documentation, etc). - Variable.coarsen # This should probably be called something like coarsen_reduce. - Variable.coarsen_reshape - Variable.rolling_window +# Given how niche these are, these would be better as functions than methods. +# We could also keep these in Xarray, at least for now. 
If we don't think people will use functionality outside of Xarray it probably is not worth the trouble of porting it (including documentation, etc). +Variable.coarsen # This should probably be called something like coarsen_reduce. +Variable.coarsen_reshape +Variable.rolling_window - Variable.set_dims # split this into broadcast_to and expand_dims +Variable.set_dims # split this into broadcast_to and expand_dims # Reordering/Reshaping - Variable.stack # To avoid confusion with np.stack, let's call this stack_dims. - Variable.transpose # Could consider calling this permute_dims, like the [array API standard](https://data-apis.org/array-api/2022.12/API_specification/manipulation_functions.html#objects-in-api) - Variable.unstack # Likewise, maybe call this unstack_dims? +Variable.stack # To avoid confusion with np.stack, let's call this stack_dims. +Variable.transpose # Could consider calling this permute_dims, like the [array API standard](https://data-apis.org/array-api/2022.12/API_specification/manipulation_functions.html#objects-in-api) +Variable.unstack # Likewise, maybe call this unstack_dims? ``` #### methods to be removed from xarray.Variable ```python # Testing - Variable.broadcast_equals - Variable.equals - Variable.identical - Variable.no_conflicts +Variable.broadcast_equals +Variable.equals +Variable.identical +Variable.no_conflicts # Lazy data handling - Variable.compute # We can probably omit this method for now, too, given that dask.compute() uses a protocol. The other concern is that different array libraries have different notions of "compute" and this one is rather Dask specific, including conversion from Dask to NumPy arrays. For example, in JAX every operation executes eagerly, but in a non-blocking fashion, and you need to call jax.block_until_ready() to ensure computation is finished. - Variable.load # Could remove? compute vs load is a common source of confusion. 
+Variable.compute # We can probably omit this method for now, too, given that dask.compute() uses a protocol. The other concern is that different array libraries have different notions of "compute" and this one is rather Dask specific, including conversion from Dask to NumPy arrays. For example, in JAX every operation executes eagerly, but in a non-blocking fashion, and you need to call jax.block_until_ready() to ensure computation is finished. +Variable.load # Could remove? compute vs load is a common source of confusion. # Xarray-specific - Variable.to_index - Variable.to_index_variable - Variable.to_variable - Variable.to_base_variable - Variable.to_coord +Variable.to_index +Variable.to_index_variable +Variable.to_variable +Variable.to_base_variable +Variable.to_coord - Variable.rank # Uses bottleneck. Delete? Could use https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rankdata.html instead +Variable.rank # Uses bottleneck. Delete? Could use https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rankdata.html instead # numpy-like Methods - Variable.conjugate # .conj is enough - Variable.__array_wrap__ # This is a very old NumPy protocol for duck arrays. We don't need it now that we have `__array_ufunc__` and `__array_function__` +Variable.conjugate # .conj is enough +Variable.__array_wrap__ # This is a very old NumPy protocol for duck arrays. 
We don't need it now that we have `__array_ufunc__` and `__array_function__` # Encoding - Variable.reset_encoding - +Variable.reset_encoding ``` #### Attributes to be preserved from xarray.Variable ```python # Properties - Variable.attrs - Variable.chunks - Variable.data - Variable.dims - Variable.dtype - - Variable.nbytes - Variable.ndim - Variable.shape - Variable.size - Variable.sizes - - Variable.T - Variable.real - Variable.imag - Variable.conj +Variable.attrs +Variable.chunks +Variable.data +Variable.dims +Variable.dtype + +Variable.nbytes +Variable.ndim +Variable.shape +Variable.size +Variable.sizes + +Variable.T +Variable.real +Variable.imag +Variable.conj ``` #### Attributes to be renamed from xarray.Variable @@ -333,12 +330,10 @@ Questions: #### Attributes to be removed from xarray.Variable ```python - - Variable.values # Probably also remove -- this is a legacy from before Xarray supported dask arrays. ".data" is enough. +Variable.values # Probably also remove -- this is a legacy from before Xarray supported dask arrays. ".data" is enough. # Encoding - Variable.encoding - +Variable.encoding ``` ### Appendix: Implementation Details @@ -347,17 +342,16 @@ Questions: ```python class VariableArithmetic( - ImplementsArrayReduce, - IncludeReduceMethods, - IncludeCumMethods, - IncludeNumpySameMethods, - SupportsArithmetic, - VariableOpsMixin, + ImplementsArrayReduce, + IncludeReduceMethods, + IncludeCumMethods, + IncludeNumpySameMethods, + SupportsArithmetic, + VariableOpsMixin, ): - __slots__ = () - # prioritize our operations over those of numpy.ndarray (priority=0) - __array_priority__ = 50 - + __slots__ = () + # prioritize our operations over those of numpy.ndarray (priority=0) + __array_priority__ = 50 ``` - Move over `_typed_ops.VariableOpsMixin` @@ -369,7 +363,6 @@ class VariableArithmetic( - The Variable constructor will need to be rewritten to no longer accept tuples, encodings, etc. These details should be handled at the Xarray data structure level. 
- What happens to `duck_array_ops`? - What about Variable.chunk and "chunk managers"? - - Could this functionality be left in Xarray proper for now? Alternative array types like JAX also have some notion of "chunks" for parallel arrays, but the details differ in a number of ways from Dask/Cubed. - Perhaps variable.chunk/load methods should become functions defined in xarray that convert Variable objects. This is easy so long as xarray can reach in and replace `.data`. diff --git a/doc/contribute/contributing.rst b/doc/contribute/contributing.rst index 339050a7f8a..6afd844f84b 100644 --- a/doc/contribute/contributing.rst +++ b/doc/contribute/contributing.rst @@ -72,6 +72,7 @@ If you are reporting a bug, please use the provided template which includes the ```python import xarray as xr + ds = xr.Dataset(...) ... @@ -82,6 +83,7 @@ If you are reporting a bug, please use the provided template which includes the ```python import xarray as xr + xr.show_versions() ...