@henryiii I think you mentioned you have a script that parses every version on PyPI? I expect this branch to be about 2x faster; it will be interesting to see the real-world trade-off between simple and non-simple versions.
|
I realized that for the simple-version fast path we can drop the regex and directly do:
tuple(map(int, version.strip().split(".")))

My quick benchmarking on Python 3.14 shows that doing this, and catching the exception if it fails, speeds up simple version parsing by 4.2x [faster] but slows down non-simple parsing to 0.73x [slower], compared to main.

So I came up with a fairly fast pre-check using a frozenset, which limits the overhead for non-simple versions. With it, simple version parsing is sped up by 3.15x [faster], while non-simple parsing only slows to 0.95x [slower], compared to main!
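A minimal sketch of the fast path described above, assuming a frozenset of ASCII digits plus "." as the pre-check. The names SIMPLE_CHARS and fast_release are hypothetical (not packaging's API), and returning None stands in for "fall back to the full regex".

```python
from __future__ import annotations

# Hypothetical pre-check set: ASCII digits and "." only.
SIMPLE_CHARS = frozenset("0123456789.")


def fast_release(version: str) -> tuple[int, ...] | None:
    """Return the release tuple for a plain dotted-integer version, else None.

    None means "not simple": a real parser would fall back to the full regex.
    """
    version = version.strip()
    # Cheap character check: anything other than an ASCII digit or "." fails it.
    if not SIMPLE_CHARS.issuperset(version):
        return None
    try:
        return tuple(map(int, version.split(".")))
    except ValueError:
        # "", "1..2", ".1" and "1." pass the character check but have an empty segment.
        return None


print(fast_release("1.2.3"))    # (1, 2, 3)
print(fast_release("1.0.0a1"))  # None
```

The try/except is still needed because the character check alone accepts strings with empty segments.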
|
FYI, I tried pretty much exactly this (probably still in a stash), and did see it was faster, but decided to work on the overall performance of the regex and didn't come back to it, largely due to worries about the non-fast case getting slower. But it looks like you have kept that impact pretty small (for a nice fast-path improvement). I can check on the full version list soon(ish). (Edit: "this" being the 4.2x, 0.73x one)
$ uv run tasks/benchmark_versions.py
Counter({'release': 7086247, 'pre': 532193, 'dev': 451268, 'post': 97508, 'invalid': 5422, 'epoch': 1155, 'local': 6})
Loaded 8,168,377 versions
Time: 9.9128 seconds
Per version: 0.242712358 µs
$ gh co 1082
$ uv run tasks/benchmark_versions.py
Counter({'release': 7086247, 'pre': 532193, 'dev': 451268, 'post': 97508, 'invalid': 5422, 'epoch': 1155, 'local': 6})
Loaded 8,168,377 versions
Time: 4.3251 seconds
Per version: 0.105899591 µs
$ git stash apply
$ uv run tasks/benchmark_versions.py
Counter({'release': 7086247, 'pre': 532193, 'dev': 451268, 'post': 97508, 'invalid': 5422, 'epoch': 1155, 'local': 6})
Loaded 8,168,377 versions
Time: 4.3296 seconds
Per version: 0.106007648 µs

Final version is with the diff above, using
|
>>> a = '٠١٢.٣٤٥.٦٧٨٩'
>>> a.replace(".", "").isdecimal()
True
>>> tuple(map(int, a.split(".")))
(12, 345, 6789)
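The REPL session shows why str.isdecimal() is not a safe pre-check on its own: it accepts any Unicode decimal digit, and int() converts those too, so "٠١٢.٣٤٥.٦٧٨٩" would become release (12, 345, 6789), even though a regex written with [0-9] would reject it. A small comparison of candidate checks; the helper names are hypothetical, and the isascii() variant is just one possible alternative, not something proposed in the thread.

```python
# Hypothetical candidate pre-checks for the simple-version fast path.
ASCII_SIMPLE = frozenset("0123456789.")


def check_isdecimal(s: str) -> bool:
    # str.isdecimal() accepts every Unicode decimal digit, not just 0-9.
    return s.replace(".", "").isdecimal()


def check_frozenset(s: str) -> bool:
    # Only ASCII digits and "." pass.
    return ASCII_SIMPLE.issuperset(s)


def check_ascii_then_decimal(s: str) -> bool:
    # Alternative: reject non-ASCII input first, then require decimal digits.
    return s.isascii() and s.replace(".", "").isdecimal()


arabic_indic = "٠١٢.٣٤٥.٦٧٨٩"
print(check_isdecimal(arabic_indic))             # True
print(check_frozenset(arabic_indic))             # False
print(check_ascii_then_decimal(arabic_indic))    # False
print(tuple(map(int, arabic_indic.split("."))))  # (12, 345, 6789)
```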
|
@henryiii is that testing distinct versions, or all versions for every project regardless of duplication? Either way, I think this approach strikes a nice balance: all your work on speeding up the general case isn't badly impacted, because whichever of these checks we choose, the overhead is low enough. I want to benchmark the overhead across a number of Python versions to make sure there's no unexpected regression.
|
It's every version from PyPI, not deduplicated. |
|
I think this is ready. I measured the performance impact on Python versions 3.8 to 3.15. There was some variation, but generally this was a significant speedup, and Python is trending faster on both main and this PR.
$ uv run asv continuous main pull/1082/head --sort default
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
|
FYI, I tried always stripping the version string, and dropping the extra whitespace matching from the regex:
diff --git a/src/packaging/version.py b/src/packaging/version.py
index 95c8e8d..f159323 100644
--- a/src/packaging/version.py
+++ b/src/packaging/version.py
@@ -343,7 +343,7 @@ class Version(_BaseVersion):
__slots__ = ("_dev", "_epoch", "_key_cache", "_local", "_post", "_pre", "_release")
__match_args__ = ("_str",)
- _regex = re.compile(r"\s*" + VERSION_PATTERN + r"\s*", re.VERBOSE | re.IGNORECASE)
+ _regex = re.compile(VERSION_PATTERN, re.VERBOSE | re.IGNORECASE)
_epoch: int
_release: tuple[int, ...]
@@ -364,6 +364,7 @@ class Version(_BaseVersion):
If the ``version`` does not conform to PEP 440 in any way then this
exception will be raised.
"""
+ version = version.strip()
if _SIMPLE_VERSION_INDICATORS.issuperset(version):
try:
self._release = tuple(map(int, version.split(".")))
$ uv run asv continuous add-fast-path-for-parsing-simple-versions^ add-fast-path-for-parsing-simple-versions --sort default --no-only-changed
...
BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Looking at the results, it seems to me it is a hair (1-2%) slower on average. By the way, I had to do that with #1059; the separate repo would have required me to push it somewhere with an open PR.
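The diff moves whitespace handling out of the regex and into a single version.strip() call. As a sanity check that this keeps behavior the same, here is a toy comparison. TOY_PATTERN is a simplified stand-in, not packaging's VERSION_PATTERN, and the sketch assumes the compiled regex is applied with fullmatch.

```python
import re

# Simplified stand-in for VERSION_PATTERN (verbose syntax, like the real one).
TOY_PATTERN = r"""
    v?
    (?P<release>[0-9]+(?:\.[0-9]+)*)          # dotted release segment
    (?:(?P<pre_l>a|b|rc)(?P<pre_n>[0-9]+))?   # optional pre-release
"""

# Before: the regex itself absorbs leading/trailing whitespace.
REGEX_WITH_WS = re.compile(r"\s*" + TOY_PATTERN + r"\s*", re.VERBOSE | re.IGNORECASE)
# After: whitespace is stripped once up front, so the regex drops the \s* wrappers.
REGEX_BARE = re.compile(TOY_PATTERN, re.VERBOSE | re.IGNORECASE)


def release_with_ws(version: str):
    match = REGEX_WITH_WS.fullmatch(version)
    return match.group("release") if match else None


def release_strip_first(version: str):
    match = REGEX_BARE.fullmatch(version.strip())
    return match.group("release") if match else None


for sample in ["1.2.3", "  1.2.3\n", "\tv1.0rc1 ", "1.0 a1", "", "abc"]:
    # Both variants should accept and reject exactly the same inputs.
    assert release_with_ws(sample) == release_strip_first(sample), sample
```

This only demonstrates equivalence on a toy pattern; it says nothing about the 1-2% timing difference measured above.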
|
Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com>
This is a performance trade-off: it fast-paths versions containing only digits and dots (e.g. 1.2.3), at a slight cost to more complex versions (those that include any dev/pre/post/local components).
Testing on Python 3.14 on my machine, using these versions:
Simple: 1.0.0, 2.1.3, 0.0.1, 1.2.3, 2021.1.1, 1.2.3.4.5.6.7.8, 3.8.0, 1.0, 2.0, 10.5.3
Non-simple: 1.0.0a1, 2.0.0b2, 3.0.0rc1, 1.0.0.post1, 1.0.0.dev1, 1.0.0a1.dev1
I see simple versions go from a median of 1.16 µs to 0.49 µs (2.37x [faster]), and non-simple versions go from a median of 1.16 µs to 1.38 µs (0.84x [slower]).
This could probably use a wider range of performance testing before being accepted.
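A minimal sketch of how per-version medians like these could be measured with timeit, using the version lists above. The harness (median_us) and the toy_parse function are hypothetical and are not the script behind the numbers in this PR; toy_parse uses a deliberately loose fallback regex (not the PEP 440 pattern) so the block runs without packaging installed.

```python
import re
import statistics
import timeit
from functools import partial
from typing import Callable, Sequence

# Version lists from the PR description above.
SIMPLE = ["1.0.0", "2.1.3", "0.0.1", "1.2.3", "2021.1.1",
          "1.2.3.4.5.6.7.8", "3.8.0", "1.0", "2.0", "10.5.3"]
NON_SIMPLE = ["1.0.0a1", "2.0.0b2", "3.0.0rc1", "1.0.0.post1", "1.0.0.dev1", "1.0.0a1.dev1"]

SIMPLE_CHARS = frozenset("0123456789.")
# Toy fallback: a release segment followed by any suffix (NOT the PEP 440 regex).
_TOY_FALLBACK = re.compile(r"(?P<release>[0-9]+(?:\.[0-9]+)*)[a-z0-9.]*", re.IGNORECASE)


def toy_parse(version: str) -> tuple:
    """Fast path for dotted integers, regex fallback otherwise (illustrative only)."""
    if SIMPLE_CHARS.issuperset(version):
        try:
            return tuple(map(int, version.split(".")))
        except ValueError:
            pass  # empty segment: let the fallback decide
    match = _TOY_FALLBACK.fullmatch(version)
    if match is None:
        raise ValueError(f"invalid version: {version!r}")
    return tuple(map(int, match.group("release").split(".")))


def median_us(parse: Callable[[str], object], versions: Sequence[str],
              number: int = 5_000, repeat: int = 5) -> float:
    """Median per-call time in microseconds over every version and repeat."""
    samples = []
    for version in versions:
        # timeit.repeat returns total seconds for `number` calls, once per repeat.
        for total in timeit.repeat(partial(parse, version), number=number, repeat=repeat):
            samples.append(total / number * 1e6)
    return statistics.median(samples)


print(f"simple:     {median_us(toy_parse, SIMPLE):.2f} µs")
print(f"non-simple: {median_us(toy_parse, NON_SIMPLE):.2f} µs")
```

Taking the median over all samples damps outliers from scheduling noise; passing packaging.version.Version instead of toy_parse would time the real parser.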