Restrict clipping of DataFrame.corr only when cov=False #61214
@mroeschke here is my pull request for fixing the
Thanks! Would you be able to add a new unit test that covers this? It seems we didn't previously have one that hit this edge case. I think no release note is necessary, since the original one made clear it's only for
val1 = df.cov()
val2 = df.dropna().cov()
- Could you call this result and expected?
- For expected, could you construct the result without using cov, i.e. DataFrame({"A": ..., "B": ...})
def test_cov_with_missing_values(self):
    df = DataFrame({"A": [1, 2, None, 4], "B": [2, 4, None, 9]})
    expected = DataFrame({"A": [1.0, 1.0], "B": [1.0, 1.0]})
Looks like the expected DataFrame needs index=["A", "B"]: https://github.com/pandas-dev/pandas/actions/runs/14250783260/job/39942795933?pr=61214#step:5:45
And can you confirm that 1 is the expected value? If the 2.2.3 behavior is correct, then the values in #61154 (comment) were different:
Out[68]:
A B
A 2.333333 5.5
B 5.500000 13.0
I don't know whether it matters, but it might be worth testing both df.cov() and df.dropna().cov().
Fixing now. Thanks!
Thanks @j-hendricks
Closes #61154
DataFrame.corr was clipped between -1 and 1 to handle numerical precision errors. However, this was done regardless of whether cov equals True or False, and should instead only be done when cov=False.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
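The idea behind the fix can be sketched in plain Python. This is not the pandas implementation: pairwise_stat is a hypothetical helper written only for illustration, and it simply shows a shared routine that clips to [-1, 1] on the correlation path and leaves covariances untouched.

```python
import math


def pairwise_stat(xs, ys, cov=False):
    """Hypothetical helper: covariance (cov=True) or Pearson correlation
    (cov=False) over the positions where both inputs are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan

    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    if cov:
        # Covariance is unbounded, so it must not be clipped.
        return sxy / (n - 1)

    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    denom = math.sqrt(sxx * syy)
    if denom == 0:
        return math.nan

    # Only a correlation is clipped, to absorb floating-point overshoot.
    return max(-1.0, min(1.0, sxy / denom))


a = [1, 2, None, 4]
b = [2, 4, None, 9]
cov_ab = pairwise_stat(a, b, cov=True)  # may legitimately exceed 1
corr_ab = pairwise_stat(a, b)           # always within [-1, 1]
```

With clipping applied on both paths, cov_ab would be capped at 1, which is the symptom reported in #61154.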