chore: update pre-commit hooks#3569

Merged
ianna merged 10 commits into main from pre-commit-ci-update-config
Jul 16, 2025

pre-commit-ci bot added 2 commits July 7, 2025 19:36
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.10 → v0.12.2](astral-sh/ruff-pre-commit@v0.11.10...v0.12.2)
- [github.com/python-jsonschema/check-jsonschema: 0.33.0 → 0.33.2](python-jsonschema/check-jsonschema@0.33.0...0.33.2)
- [github.com/pre-commit/mirrors-mypy: v1.15.0 → v1.16.1](pre-commit/mirrors-mypy@v1.15.0...v1.16.1)

@ianna ianna left a comment

Two things to do before merging this PR:

  • Ignore PLC0415 (import should be at the top level of a file)

  • Fix the whitespace issue with pyarrow for strings
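
For context on the first item, here is a minimal sketch of the pattern that Ruff's PLC0415 (`import-outside-top-level`) rule reports. The function name `dumps_compact` is hypothetical and not taken from this repository; the conversation does not show whether the rule was ignored project-wide or per line, so the inline `noqa` is only one possible way to silence it.

```python
def dumps_compact(obj):
    """Serialize obj to JSON without extra whitespace (hypothetical helper)."""
    # Deferred import inside a function body: this is what PLC0415 flags.
    # The inline noqa suppresses the rule for this one line only.
    import json  # noqa: PLC0415

    return json.dumps(obj, separators=(",", ":"))


if __name__ == "__main__":
    print(dumps_compact({"a": [1, 2]}))
```

Function-level imports like this are often intentional (for example, to defer loading an optional dependency), which is one reason a project might choose to ignore the rule instead of moving every import to the top of the file.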

____________________________ test_split_whitespace _____________________________

    def test_split_whitespace():
        assert ak.str.split_whitespace(string_padded, max_splits=1).tolist() == [
            [["", "αβγ      "], ["", " "]],
            [],
            [["", "→δε←      "], ["", "ζz zζ     "], ["", "abc      "]],
        ]
        assert (
            ak.str.split_whitespace(string_padded, max_splits=1).layout.form
            == ak.str.split_whitespace(
                ak.to_backend(string_padded, "typetracer"), max_splits=1
            ).layout.form
        )
    
        assert ak.str.split_whitespace(
            string_padded, max_splits=1, reverse=True
        ).tolist() == [
            [["      αβγ", ""], [" ", ""]],
            [],
            [["     →δε←", ""], ["     ζz zζ", ""], ["      abc", ""]],
        ]
        assert (
            ak.str.split_whitespace(string_padded, max_splits=1, reverse=True).layout.form
            == ak.str.split_whitespace(
                ak.to_backend(string_padded, "typetracer"), max_splits=1, reverse=True
            ).layout.form
        )
    
>       assert ak.str.split_whitespace(string_padded, max_splits=None).tolist() == [
            [["", "αβγ", "", ""], ["", "", ""]],
            [],
            [["", "→δε←", "", ""], ["", "ζz", "zζ", "", ""], ["", "abc", "", ""]],
        ]
E       AssertionError: assert [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', ' ']]] == [[['', 'αβγ', '', ''], ['', '', '']], [], [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', '', '']]]
E         
E         At index 2 diff: [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', ' ']] != [['', '→δε←', '', ''], ['', 'ζz', 'zζ', '', ''], ['', 'abc', '', '']]
E         
E         Full diff:
E           [
E               [
E                   [
E                       '',
E                       'αβγ',
E                       '',
E                       '',
E                   ],
E                   [
E                       '',
E                       '',
E                       '',
E                   ],
E               ],
E               [],
E               [
E                   [
E                       '',
E                       '→δε←',
E                       '',
E                       '',
E                   ],
E                   [
E                       '',
E                       'ζz',
E                       'zζ',
E                       '',
E                       '',
E                   ],
E                   [
E                       '',
E                       'abc',
E         -             '',
E         +             ' ',
E         ?              +
E         -             '',
E                   ],
E               ],
E           ]


tests/test_2616_use_pyarrow_for_strings.py:932: AssertionError

@ariostas

It seems that split_whitespace has been buggy for a while: apache/arrow#37757

@ariostas

At least this time it's failing consistently. My guess is that it's because the tests are using pyarrow 7.
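
One way to check a guess like this is to report which major version of pyarrow the test environment actually installed. The sketch below uses only the standard library; the helper names `major_version` and `installed_major` are hypothetical, and the parsing assumes a plain dotted version string such as `7.0.0`.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a dotted version such as '7.0.0'."""
    return int(version_string.split(".")[0])


def installed_major(dist_name):
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return major_version(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when pyarrow is not installed in the current environment.
    print("pyarrow major version:", installed_major("pyarrow"))
```

A value like this could feed a conditional skip for the affected tests, though the conversation does not establish which pyarrow version, if any, makes them pass.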

@ianna

ianna commented Jul 16, 2025

Oh, after merging main into this branch, I get more failures in the pyarrow tests:

tests/test_2616_use_pyarrow_for_strings.py::test_trim_whitespace PASSED  [ 63%]
Error: test_slice

pyarrow.lib.ArrowInvalid: Negative buffer resize: -40

This error occurred while calling

    ak.str.slice(
        <Array [['αβγ', ''], ..., ['→δε←', ..., 'abc']] type='3 * var * string'>
        1
    )
tests/test_2616_use_pyarrow_for_strings.py::test_slice FAILED            [ 63%]
Error: test_split_whitespace

ValueError: buffer size must be a multiple of element size

This error occurred while calling

    ak.str.split_whitespace(
        <Array-typetracer [...] type='3 * var * string'>
        max_splits = 1
    )
tests/test_2616_use_pyarrow_for_strings.py::test_split_whitespace FAILED [ 63%]
Error: test_split_pattern

ValueError: buffer size must be a multiple of element size

This error occurred while calling

    ak.str.split_pattern(
        <Array-typetracer [...] type='3 * var * string'>
        '123'
        max_splits = 1
    )
tests/test_2616_use_pyarrow_for_strings.py::test_split_pattern FAILED    [ 63%]
Error: test_split_pattern_regex

ValueError: buffer size must be a multiple of element size

This error occurred while calling

    ak.str.split_pattern_regex(
        <Array-typetracer [...] type='3 * var * string'>
        '\\d{3}'
        max_splits = 1
    )
tests/test_2616_use_pyarrow_for_strings.py::test_split_pattern_regex FAILED [ 63%]

@ianna ianna force-pushed the pre-commit-ci-update-config branch from 5dc63e4 to d338564 Compare July 16, 2025 00:34
@ianna ianna marked this pull request as draft July 16, 2025 00:36
@ianna ianna self-assigned this Jul 16, 2025
@ariostas

Yesterday @henryiii was experimenting with increasing the version of pyarrow that we use for the tests. I don't know what the result was, but it seems reasonable, since pyarrow 7 is pretty old and even pyarrow 20 is built for Python 3.9.

@henryiii

I stopped at 12; we could try newer ones. It's not actually a hard-coded minimum, just a pin in a requirements file, so I think it's fine to make it as high as we need for this test.

@henryiii

It got force-pushed away, but at https://github.com/scikit-hep/awkward/compare/5dc63e441d7cc11835198e8cbb859010b4fcf792..d33856406364f22c5012121985ec526a6fbdede0 you can see where I had gotten up to 12.

@ariostas

@ianna ianna marked this pull request as ready for review July 16, 2025 19:03
@ianna ianna requested review from ariostas and ianna July 16, 2025 19:04
@ariostas

I think it's worth continuing what Henry was doing and increasing the pyarrow version until it works.

@ianna ianna left a comment

all tests pass!

@ariostas

@ianna I meant adding back the test that was failing on Python 3.9 and then increasing the version of pyarrow until it works.

@henryiii

That could be a follow-up.

@ianna ianna merged commit a2aaf12 into main Jul 16, 2025
43 checks passed
@ianna ianna deleted the pre-commit-ci-update-config branch July 16, 2025 21:31