API: offsets.Day is always calendar-day#61985
Conversation
|
is stumping me. Who is our go-to person for debugging these? |
|
Still seeing Tried building the docs locally and jinja2 is complaining |
|
Looks like there was even more whitespace I messed up. It's happy now. |
pandas/core/resample.py
Outdated
| offset = offset.as_unit(unit) | ||
|
|
||
| freq_value = Timedelta(freq).as_unit(unit)._value | ||
| if isinstance(freq, Day): |
There was a problem hiding this comment.
In #41943 I was hoping that if D means calendar day, we wouldn't need to force to be a fixed frequency in resample
There was a problem hiding this comment.
yah i was hoping for this too but wasn't able to get it working. will give it another try after addressing other comments
There was a problem hiding this comment.
just double-checked. im pretty sure the place to change the treatment would be in _get_timestamp_range_edges a bit up the call stack. Changing that to have Day cases no longer go down the Tick route breaks test_resample_origin_with_day_freq_on_dst and test_resample_tz_localized2. it wasn't obvious that the "new" behavior was correct in those tests with that change. can you confirm?
There was a problem hiding this comment.
im pretty sure the place to change the treatment would be in _get_timestamp_range_edges
That makes sense to me, as I would expect us not to have to special case Day anymore
test_resample_origin_with_day_freq_on_dst and test_resample_tz_localized2.
Mind showing an example of the new behavior based on those tests? I would expect breakage for these tests since D changing to calendar day impacts the results around DST.
There was a problem hiding this comment.
Based on test_resample_tz_localized2:
idx = date_range("2001-09-20 15:59", "2001-09-20 16:00", freq="min", tz="Australia/Sydney")
s = Series([1, 2], index=idx)
result = s.resample("D", closed="right", label="right").mean()
ex_index = date_range("2001-09-21", periods=1, freq="D", tz="Australia/Sydney")
expected = Series([1.5], index=ex_index)
tm.assert_series_equal(result, expected) # Fails with actual result being...
ex_index = pd.date_range("2001-09-20", "2001-09-21", tz="Australia/Sydney", freq="D")
expected = Series([np.nan, 1.5], index=ex_index)
There was a problem hiding this comment.
For both examples, I would expect these to be equivalent to resampling by "B" since that's similar to resampling by a calendar day, and it matches your new expecteds
In [3]: idx = date_range("2001-09-20 15:59", "2001-09-20 16:00", freq="min", tz="Australia/Sydney")
...: s = Series([1, 2], index=idx)
In [5]: s.resample("B", closed="right", label="right").mean()
Out[5]:
2001-09-20 00:00:00+10:00 NaN
2001-09-21 00:00:00+10:00 1.5
Freq: B, dtype: float64
In [9]: tz = "America/Chicago"
...: unit = "ns"
...: start = Timestamp("2013-11-03", tz=tz)
...: end = Timestamp("2013-11-03 23:59", tz=tz)
...: rng = date_range(start, end, freq="1h").as_unit(unit)
...: ts = Series(np.ones(len(rng)), index=rng)
In [10]: ts.resample("B", origin="start", offset="-2h").sum()
Out[10]:
2013-11-01 00:00:00-05:00 25.0
Freq: B, dtype: float64There was a problem hiding this comment.
sounds good. ill update the expecteds.
There was a problem hiding this comment.
Wait, so the offset parameter is just ignored in these cases?
There was a problem hiding this comment.
Possibly it only applies to fixed frequencies? I am not that familiar with how parameter is supposed to work in resample
There was a problem hiding this comment.
Looking at the NDFrame.resample docstring, offset is added to origin, and origin only takes effect for Tick-frequencies. I'll do a follow up to warn if those params are passed and silently ignored.
|
Thanks @jbrockmendel. Good to finally have this change! |
|
Looks like we missed a usage of freq.nanos in the window code. I don't know that code too well. Does that need updating too? |
|
Hmm I think we always treated I guess technically with this change |
looks like we have one rolling test (test_rolling_datetime) with Day and tzaware self._on but i don't think it passes over a DST transition |
|
Just confirming that this is a deliberate change: assert isinstance(pd.offsets.Day(), pd.offsets.Tick)previously that passed (it was |
|
That was deliberate. |
In pandas 3.0, `Day` will no longer be considered a `Tick`-like frequency (pandas-dev/pandas#61985). This PR ports this change to the cftime version of this offset. The main implication in the cftime / xarray context is that it means the `offset` and `origin` options in `resample` will no longer have an effect when resampling to a daily frequency. As in pandas-dev/pandas#62101, warnings will be emitted if non-default values are passed.
I'm OK with ignoring this for now as I don't expect anyone to have any issues with it. But if hypothetically someone opens an issue about it, i think fixing it should be considered a bugfix rather than a separate breaking change. |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.Alternative to #55502 discussed at last week's dev meeting. This allows TimedeltaIndex.freq to be a
Dayeven though it is not a Tick.