fix: Filter out StringDType even when the backing array is not NumpyExtensionArray#10559
fix: Filter out StringDType even when the backing array is not NumpyExtensionArray#10559dcherian merged 24 commits intopydata:mainfrom
StringDType even when the backing array is not NumpyExtensionArray#10559Conversation
|
I've added the |
object for pd.StringDtype in safe_cast_to_indexStringDType even when the backing array is not NumpyExtensionArray
| for k, v in dataframe.items(): | ||
| if not is_extension_array_dtype(v) or isinstance( | ||
| if not is_allowed_extension_array(v) or isinstance( | ||
| v.array, UNSUPPORTED_EXTENSION_ARRAY_TYPES |
There was a problem hiding this comment.
I didn't add UNSUPPORTED_EXTENSION_ARRAY_TYPES to the new is_allowed_extension_array function because we do allow them as backing arrays to Index object, I think. Maybe should get a test?
There was a problem hiding this comment.
I think we should merge these checks.
There was a problem hiding this comment.
I think we need to account for internal duck array support i.e., that which allows preserving the dtype of extension array indices that are in this UNSUPPORTED_EXTENSION_ARRAY_TYPES whitelist. See the note here:
xarray/xarray/core/extension_array.py
Lines 101 to 103 in 5ce69b2
There was a problem hiding this comment.
Was that for DatetimeIndex? If so, clarifying that comment would be helpful.
There was a problem hiding this comment.
It would be for any index whose underlying array is one of the ones in UNSUPPORTED_EXTENSION_ARRAY_TYPES, so possible DatetimeIndex but also likely other ones as well
|
Sadly these upstream failures seem related. @kmuehlbauer may be able to help reason through them |
|
@dcherian I think they're only related in so far as they come from xarray/xarray/core/variable.py Lines 215 to 216 in 5ce69b2 It appears (I have no experience with this, so correct me if I'm wrong), that the "real" type is encoded in dtype("O").metadata for certain netcdf4 variables. But this dtype.metadata is not preserved after a pd.Series call. I added a fix for it, but made it a separate PR as I'd like to ensure this PR does what it advertises first
|
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
…o ig/fix_string_dtype
|
@dcherian do you have any idea what to do about this mypy failure coming from https://github.com/pydata/xarray/pull/10559/files#diff-382db5ef1ceae5d2d63b503937f667dd7d61a3b1e579352db328c12379516151L161-L162 If I ignore it, it says the ignore is unused. If I try to ignore the specific warning raised, it says that the ignore is unnecessary. And now, if I use |
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
…o ig/fix_string_dtype
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
for more information, see https://pre-commit.ci
The linked issue is resolved by this PR but more broadly, the issue underlying it (letting in string dtypes that are not
NumpyExtensionArray) was bound to come up at some point so this PR more comprehensively fixes that issue and tries to centralize our "whitelist"whats-new.rstapi.rst