Allow Variable type as dim argument to concat#8384
Conversation
|
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
max-sixty
left a comment
There was a problem hiding this comment.
Thanks @maresb !
In order to test this, would you be up for finding a test which passes a Variable, and adding a -> None to its return type? Then mypy will check the types. (and mypy should raise an error on the old code).
|
Thanks @max-sixty for the response! If I understand what you mean, then this test already meets your criteria except for mypy erroring on old code: [EDIT: See code snipped from the comment below because I selected the wrong function.] So now the mystery is why mypy allows this? By replacing the annotations which I modified in this PR with In contrast, if I among other errors, which is more logical to me. Any idea what's going on? |
|
That is odd! BTW I think the function that tests this is the one above: xarray/xarray/tests/test_concat.py Lines 898 to 903 in d40609a I had a look through — it's possible that I tried removing the overload and using Detailsdiff --git a/xarray/core/concat.py b/xarray/core/concat.py
index a136480b..acbd8c80 100644
--- a/xarray/core/concat.py
+++ b/xarray/core/concat.py
@@ -1,7 +1,7 @@
from __future__ import annotations
from collections.abc import Hashable, Iterable
-from typing import TYPE_CHECKING, Any, Union, overload
+from typing import TYPE_CHECKING, Any, Iterator, Sequence, Union, overload
import numpy as np
import pandas as pd
@@ -16,7 +16,7 @@
merge_attrs,
merge_collected,
)
-from xarray.core.types import T_DataArray, T_Dataset
+from xarray.core.types import T_DataArray, T_Dataset, T_Xarray
from xarray.core.variable import Variable
from xarray.core.variable import concat as concat_vars
@@ -31,47 +31,17 @@
T_DataVars = Union[ConcatOptions, Iterable[Hashable]]
-@overload
def concat(
- objs: Iterable[T_Dataset],
+ objs: Iterable[T_Xarray],
dim: Hashable | T_DataArray | pd.Index,
data_vars: T_DataVars = "all",
- coords: ConcatOptions | list[Hashable] = "different",
- compat: CompatOptions = "equals",
- positions: Iterable[Iterable[int]] | None = None,
- fill_value: object = dtypes.NA,
- join: JoinOptions = "outer",
- combine_attrs: CombineAttrsOptions = "override",
-) -> T_Dataset:
- ...
-
-
-@overload
-def concat(
- objs: Iterable[T_DataArray],
- dim: Hashable | T_DataArray | pd.Index,
- data_vars: T_DataVars = "all",
- coords: ConcatOptions | list[Hashable] = "different",
- compat: CompatOptions = "equals",
- positions: Iterable[Iterable[int]] | None = None,
- fill_value: object = dtypes.NA,
- join: JoinOptions = "outer",
- combine_attrs: CombineAttrsOptions = "override",
-) -> T_DataArray:
- ...
-
-
-def concat(
- objs,
- dim,
- data_vars: T_DataVars = "all",
coords="different",
compat: CompatOptions = "equals",
positions=None,
fill_value=dtypes.NA,
join: JoinOptions = "outer",
combine_attrs: CombineAttrsOptions = "override",
-):
+) -> T_Xarray:
"""Concatenate xarray objects along a new or existing dimension.
Parameters
@@ -449,7 +419,7 @@ def _parse_datasets(
def _dataset_concat(
- datasets: list[T_Dataset],
+ datasets: Iterable[T_Dataset],
dim: str | T_DataArray | pd.Index,
data_vars: T_DataVars,
coords: str | list[str],
diff --git a/xarray/plot/dataarray_plot.py b/xarray/plot/dataarray_plot.py
index 61f2014f..bdb97cac 100644
--- a/xarray/plot/dataarray_plot.py
+++ b/xarray/plot/dataarray_plot.py
@@ -193,8 +193,9 @@ def _prepare_plot1d_data(
for v in ["z", "x"]:
dim = coords_to_plot.get(v, None)
if (dim is not None) and (dim in darray.dims):
- darray_nan = np.nan * darray.isel({dim: -1})
- darray = concat([darray, darray_nan], dim=dim)
+ darray_nan: T_DataArray = np.nan * darray.isel({dim: -1})
+ arrays: list[T_DataArray] = [darray, darray_nan]
+ darray = concat(arrays, dim=dim)
dims_T.append(coords_to_plot[v])
# Lines should never connect to the same coordinate when stacked,Regardless, let's merge, thanks for the change! |
|
Thanks @max-sixty for the merge! I had a look at your diff and respective mypy error in First of all, one easy fix for darray_nan = np.nan * darray.isel({dim: -1})which gives a end_of_v = darray.isel({dim: -1})
darray_nan = end_of_v.copy(data=np.full_like(darray.data, np.nan))Then the error is that In the TypeVar docs, it states:
...
...
Looking at the definition Line 162 in f63ede9 we see that Thus according to the docs, One almost-solution is to switch from Lines 164 to 167 in f63ede9 But this would be bad since the (first) In conclusion, the minimally-disruptive solution appears to be to keep the The more obvious solution would be to not use |
|
That's a very impressive explanation @maresb — thank you!
We would like to support them better, so I we should try and move towards
So it sounds like this problem is intractable — that we can't use
"retain their concrete type through their methods" is important — IIUC, if we use |
In my current understanding this is correct. (I'm tenacious but no expert in the internals of Python typing.)
Not with
Not with
I don't understand exactly what you mean with your def sel_x(data: T_DataArrayOrSet) -> T_DataArrayOrSet:
return data.sel(x=2)It seems that there is some level of type "evaluation" in mypy since Ugliness of |
The disadvantage of That example is a good one — though if we typed ds = ds.sel(x=2)...because the variable would be a
Yes, I think you're correct! i.e. in place of making things generic, we could just write out all the concrete types, almost like manual C++ templating.. |
But semantically def identity(data: T_DataArrayOrSet) -> T_DataArrayOrSet:
return data
class DatasetSubclass(Dataset):
pass
ds = DatasetSubclass({"x": 0})
ds2 = identity(ds)
reveal_type(ds2) # DatasetSubclassIt's just that for whatever reason mypy is choking on the combination of As far as I can tell, the only fundamental reason to avoid |
|
OK, that's v helpful — I was wrong in my assumptions about these two constructions — they're actually different: It's only the
So I think that's correct. And not exactly what the comments above the types in our code state (!)
Yes, I think that's also correct (which makes the somewhat misguided comments to use
I'd be very open to this if it worked out overall. OTOH I would not favor having Thanks a lot for the feedback @maresb ! Lmk if there are some code or doc / comment changes we could make from this... |

Given the following excerpt from the
xr.concat()source codexarray/xarray/core/concat.py
Lines 475 to 480 in bb489fa
it seems like it's explicitly intended that the
dim=argument can be of typexr.Variable. However, it's not indicated as such in the type hints or the documentation. This leads to a type error in pyright when aVariabletype is used fordim.I'm submitting this PR to fix this apparent shortcoming. Or have I overlooked some reason why this should not be the case? Thanks for your consideration!