updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.10 → v0.12.2](astral-sh/ruff-pre-commit@v0.11.10...v0.12.2)
- [github.com/python-jsonschema/check-jsonschema: 0.33.0 → 0.33.2](python-jsonschema/check-jsonschema@0.33.0...0.33.2)
- [github.com/pre-commit/mirrors-mypy: v1.15.0 → v1.16.1](pre-commit/mirrors-mypy@v1.15.0...v1.16.1)
Two things to do before merging this PR:
- Ignore PLC0415 (`import` should be at the top-level of a file)
- Fix the whitespace issue with `pyarrow` for strings
____________________________ test_split_whitespace _____________________________
def test_split_whitespace():
assert ak.str.split_whitespace(string_padded, max_splits=1).tolist() == [
[["", "αβγ "], ["", " "]],
[],
[["", "→δε← "], ["", "ζz zζ "], ["", "abc "]],
]
assert (
ak.str.split_whitespace(string_padded, max_splits=1).layout.form
== ak.str.split_whitespace(
ak.to_backend(string_padded, "typetracer"), max_splits=1
).layout.form
)
assert ak.str.split_whitespace(
string_padded, max_splits=1, reverse=True
).tolist() == [
[[" αβγ", ""], [" ", ""]],
[],
[[" →δε←", ""], [" ζz zζ", ""], [" abc", ""]],
]
assert (
ak.str.split_whitespace(string_padded, max_splits=1, reverse=True).layout.form
== ak.str.split_whitespace(
ak.to_backend(string_padded, "typetracer"), max_splits=1, reverse=True
).layout.form
)
> assert ak.str.split_whitespace(string_padded, max_splits=None).tolist() == [
[["", "αβγ", "", ""], ["", "", ""]],
[],
[["", "→δε←", "", ""], ["", "ζz", "zζ", "", ""], ["", "abc", "", ""]],
]
E AssertionError: assert [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', ' ']]] == [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', '', '']]]
E
E At index 2 diff: [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', ' ']] != [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', '', '']]
E
E Full diff:
E [
E [
E [
E '',
E 'αβγ',
E '',
E '',
E ],
E [
E '',
E '',
E '',
E ],
E ],
E [],
E [
E [
E '',
E '→δε←',
E '',
E '',
E ],
E [
E '',
E 'ζz',
E 'zζ',
E '',
E '',
E ],
E [
E '',
E 'abc',
E - '',
E + ' ',
E ? +
E - '',
E ],
E ],
E ]
tests/test_2616_use_pyarrow_for_strings.py:932: AssertionError
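
On the first item above, here is a minimal sketch of the pattern PLC0415 (ruff's `import-outside-top-level` rule) reports, and how one deliberate lazy import could be exempted inline. The function name `dumps_lazily` is hypothetical, and `json` only stands in for an optional dependency such as pyarrow. Whether this PR ignores the rule per line or for the whole project is not stated here.

```python
def dumps_lazily(values):
    """Serialize values, importing the dependency only when it is needed."""
    # PLC0415 reports any import statement that is not at module top level.
    # The trailing noqa comment exempts just this one line from the rule.
    import json  # noqa: PLC0415

    return json.dumps(values)


print(dumps_lazily([1, 2, 3]))
```

A per-line `noqa` keeps the rule active everywhere else, which may or may not be preferable to disabling it globally in a codebase with many intentional lazy imports.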

Seems like

At least this time it's consistently failing. My guess is that it's because it's using pyarrow 7.

Oh, after merging main into this one I get more failures in the pyarrow tests:
tests/test_2616_use_pyarrow_for_strings.py::test_trim_whitespace PASSED [ 63%]
Error: test_slice
pyarrow.lib.ArrowInvalid: Negative buffer resize: -40
This error occurred while calling
ak.str.slice(
<Array [['αβγ', ''], ..., ['→δε←', ..., 'abc']] type='3 * var * string'>
1
)
tests/test_2616_use_pyarrow_for_strings.py::test_slice FAILED [ 63%]
Error: test_split_whitespace
ValueError: buffer size must be a multiple of element size
This error occurred while calling
ak.str.split_whitespace(
<Array-typetracer [...] type='3 * var * string'>
max_splits = 1
)
tests/test_2616_use_pyarrow_for_strings.py::test_split_whitespace FAILED [ 63%]
Error: test_split_pattern
ValueError: buffer size must be a multiple of element size
This error occurred while calling
ak.str.split_pattern(
<Array-typetracer [...] type='3 * var * string'>
'123'
max_splits = 1
)
tests/test_2616_use_pyarrow_for_strings.py::test_split_pattern FAILED [ 63%]
Error: test_split_pattern_regex
ValueError: buffer size must be a multiple of element size
This error occurred while calling
ak.str.split_pattern_regex(
<Array-typetracer [...] type='3 * var * string'>
'\\d{3}'
max_splits = 1
)
tests/test_2616_use_pyarrow_for_strings.py::test_split_pattern_regex FAILED [ 63%]

Force-pushed from 5dc63e4 to d338564.

Yesterday @henryiii was experimenting with increasing the version of pyarrow that we use for the tests. I don't know what the result was, but it is a reasonable thing to try, since pyarrow 7 is pretty old and even pyarrow 20 is built for Python 3.9.

I stopped at 12; we could try newer ones. It's not actually a hard-coded minimum, just a pin in a requirements file, so I think it's fine to make it as high as we need for this test.

It got force-pushed away, but at https://github.com/scikit-hep/awkward/compare/5dc63e441d7cc11835198e8cbb859010b4fcf792..d33856406364f22c5012121985ec526a6fbdede0 you can see where I was up to 12.

I guess it still failed with 12: https://github.com/scikit-hep/awkward/actions/runs/16306190398/job/46052715392

I think it's worth continuing what Henry was doing and increasing the pyarrow version until it works.

@ianna I meant adding back the test that was failing on 3.9 and then increasing the version of pyarrow until it works.

That could be a follow-up.
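
One way such a follow-up could gate the re-added test on the installed pyarrow version is sketched below. This is an assumption about a possible approach, not something the PR does: `version_tuple` and `pyarrow_at_least` are hypothetical helpers, and the minimum passed in would be whichever version turns out to make the test pass (12 reportedly still failed).

```python
import re
from importlib import metadata


def version_tuple(version):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return (0, 0, 0)
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad to three components so that "12" and "12.0.0" compare equal.
    return parts + (0,) * max(0, 3 - len(parts))


def pyarrow_at_least(minimum):
    """True if an installed pyarrow is at least `minimum`, False if absent."""
    try:
        installed = metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

In a test module, the result could feed `pytest.mark.skipif(not pyarrow_at_least(...), reason=...)`, so older environments skip the test instead of failing while the requirements pin is being raised.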