FIX: assert_identical now considers xindexes + improve RangeIndex equals#11035
FIX: assert_identical now considers xindexes + improve RangeIndex equals#11035max-sixty merged 16 commits intopydata:mainfrom
assert_identical now considers xindexes + improve RangeIndex equals#11035Conversation
|
Thinking about @keewis comment: #11033 (comment) here I have it attempt to compare via |
assert_identical now considers xindexes
|
Oh no. I should have run the full test suite locally instead of just my new ones Not super stoked to go in and modify many tests 🤔 |
|
draft until tests are something resembling passing |
|
Ok so a number of the test failures are basically floating point accumulation errors. e.g.: import xarray as xr
from xarray.indexes import RangeIndex
# Create a RangeIndex-backed dataset
index = RangeIndex.arange(0.0, 1.0, 0.1, dim='x')
ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
# Slice it
sliced = ds.isel(x=slice(1, 3))
# Create a fresh RangeIndex with the 'same' values
fresh_index = RangeIndex.arange(0.1, 0.3, 0.1, dim='x')
fresh = xr.Dataset(coords=xr.Coordinates.from_xindex(fresh_index))
# Compare the indexes
sliced_idx = sliced.xindexes['x']
fresh_idx = fresh.xindexes['x']
print('Both have the same coordinate values:')
print(f' sliced.x.values: {sliced.x.values}') # [0.1 0.2]
print(f' fresh.x.values: {fresh.x.values}') # [0.1 0.2]
print('But the internal RangeIndex state differs due to floating point:')
print(f' sliced: stop={sliced_idx.stop}, step={sliced_idx.step}')
# sliced: stop=0.30000000000000004, step=0.10000000000000002
print(f' fresh: stop={fresh_idx.stop}, step={fresh_idx.step}')
# fresh: stop=0.3, step=0.09999999999999999
print(f'sliced_idx.equals(fresh_idx): {sliced_idx.equals(fresh_idx)}') # False
print(f'sliced.identical(fresh): {sliced.identical(fresh)}') # Falsegives: so I have added a backwards compat that uses |
assert_identical now considers xindexesassert_identical now considers xindexes + improve RangeIndex equals
Without this you get: AssertionError: Left and right Dataset objects are not identical Differing indexes: L x IntervalIndex([(-1, 0], (0, 1]], dtype='interval[int64, right]', name='x') R x Index([(-1, 0], (0, 1]], dtype='object', name='x')
just matching what it is set to above. this was not caught before by assert_identical
|
I would support making incremental changes if that lets us make changes — e.g. make the change to the function, fix a few of the tests, but then have an LLM set some flag and then future contributions can work through the 50 places... |
I got sucked into a rhythm. I've fixed most of the issues and left a commit by commit breakdown in the first comment. The remaining ones I ran out of steam to fully fix are the changes in test_units and test_dataset which i use an escape hatch with a TODO: 27b4275 |
| return compat | ||
|
|
||
|
|
||
| def diff_indexes_repr(a_indexes, b_indexes, col_width: int = 20) -> str: |
There was a problem hiding this comment.
I've implemented something very similar: https://github.com/xarray-contrib/xdggs/blob/52d8b1dd23bf809757c7e3f5c04945129f6905af/xdggs/tests/__init__.py#L64-L105
There was a problem hiding this comment.
oh neat! I'll take a close look tomorrow. Is there anything different we should do here that would have made your xdggs use cases easier?
There was a problem hiding this comment.
there's not much that's different (the diff formatting is slightly different). However, compared to indexes_equal it may be worth grouping indexes with indexes.group_by_index() (which would mean we don't have to worry about caching)
| try: | ||
| a_repr = inline_index_repr( | ||
| a_indexes.to_pandas_indexes()[key], max_width=70 | ||
| ) | ||
| b_repr = inline_index_repr( | ||
| b_indexes.to_pandas_indexes()[key], max_width=70 | ||
| ) | ||
| except TypeError: | ||
| # Custom indexes may not support to_pandas_index() | ||
| a_repr = repr(a_idx) | ||
| b_repr = repr(b_idx) |
There was a problem hiding this comment.
might be worth calling index._repr_inline_(max_width=70) with a fallback to repr(index)
There was a problem hiding this comment.
Is this well defined API for a custom index to support? Def happy to add it, just also wondering if the knoweldge of that being helpful is (or should be) written down somewhere
There was a problem hiding this comment.
we already use those in inline_index_repr, so yes, this should be well defined.
This should definitely be part of the custom index development page, and worth adding if it is not already part of that.
|
see: 27b4275 I'm happy to open an issue about this instead of fixing here. If I understand correctly that there is a bug here? |
|
that would be great, thanks. In the long run I'd like to replace those with the tests in xarray-array-testing but didn't have time to make progress on that. |
|
thanks @ianhi ! |
identicaland friends now also compare xindexes. Checking that they are__equals__Note for Reviewers
This PR contains the initial fix for
assert_identical. However this uncovered several would have been failures in tests. The rest of this PR is correcting those issues. I tried to keep them in separate commits as much as possible.RangeIndex - dbb649a
RangeIndex was failing assert_identical due to floating point error accumulation after slicing. I updated the RangeIndex equals method to use
np.iscloseby default, but with the ability to fall back to exact comparisontest_concat - dcfc46e
It seems like this test was not checking the correct behavior. I think this changed at some point after #6385 and it wasn't caught by the identical check.
groupby intervalindex - 1a1219a
similar to concat. the current behavior is that an intervalIndex gets constructed.
timedelta dtype - 6214d4f
I think this was a typo that wasn't caught. the data is encoded with
sso I would expect it to come back withsnotnsThe thing I feel least confident in is how nice the formatting of the assertion error diff looks like. Some examples (from the tests):
🤖 Ideas and directions mine, typing by claude. I went through a few rounds of local review and revision before opening.
assert_identicalalso compare indexes? #11033whats-new.rstapi.rst