Conversation
I'm fine either way.
I've also set this up as a separate repo. It works really well separately; the one issue is that testing a new version requires it to be pushed somewhere (a PR is fine), while the in-source version here lets you test with just a local commit, no push needed. For #1082, I was curious about always stripping first, but didn't want to push that. But other than that, it's fine. This is the config file for the repo:

```json
{
    "version": 1,
    "project": "packaging",
    "project_url": "https://github.com/pypa/packaging",
    "show_commit_url": "https://github.com/pypa/packaging/commit/",
    "repo": "https://github.com/pypa/packaging.git",
    "environment_type": "virtualenv",
    "build_command": ["python -m pip wheel -w {build_cache_dir} {build_dir}"],
    "default_benchmark_timeout": 180,
    "regressions_thresholds": {
        ".*": 0.01
    },
    "pythons": ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
}
```

You also need a .gitignore; otherwise it's pretty much the same content. Oh, and this pyproject.toml, mostly generated by uv:

```toml
[project]
name = "packaging-benchmark"
version = "0.1.0"
description = "Benchmark suite for packaging"
readme = "README.md"
requires-python = ">=3.8"
dependencies = [
    "asv",
    "pip",
]
```
benchmarks/version.py
| @add_attributes(pretty_name="Version hash") | ||
| def time_hash(self) -> None: | ||
| for v in self.valid_versions: | ||
| hash(Version(v)) | ||
|
|
||
| @add_attributes(pretty_name="Version __str__") | ||
| def time_str(self) -> None: | ||
| for version in self.valid_versions: | ||
| str(Version(version)) | ||
|
|
||
| @add_attributes(pretty_name="Version sorting") | ||
| def time_sort(self) -> None: | ||
| sorted(Version(v) for v in self.valid_versions) |
I think there are two sorts of useful benchmarks:
- Micro benchmarks: test one very specific thing. You can run them millions of times and look at the minimum (the time it ran with the fewest interruptions) or the median to get a good idea of whether a specific code path sped up or slowed down, independent of the others.
- Scenario benchmarks: test a real-world scenario. They can only run a few times, but because each run takes so long, it "averages" itself out.
These are micro benchmarks, so we should avoid mixing Version construction, which is already benchmarked on its own, into the measurement of other functionality.
For example, a PR that significantly speeds up Version construction but slows down Version stringification would not show that trade-off here: time_str would still speed up if construction got fast enough, making it look like both improved.
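To make the point concrete, here is a minimal sketch (not code from this PR; the class name and sample list are made up) of how stringification could be isolated from construction by building the Version objects in setup:

```python
from packaging.version import Version

# Stand-in for the shared version sample file used by the real suite.
VALID_VERSION_STRINGS = ["1.0", "2.0.dev1", "3.1.4.post2", "1!1.0rc1"]


class TimeVersionStr:
    def setup(self) -> None:
        # Construct the objects up front so the benchmark body times only __str__.
        self.version_objects = [Version(v) for v in VALID_VERSION_STRINGS]

    def time_str(self) -> None:
        for v in self.version_objects:
            str(v)
```

With this shape, a change in construction speed no longer moves the time_str number at all.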
The problem I was facing is that it's hard to handle the key caching. `setup` runs once in ASV, so any caching happens once: not only across repeat runs, but also across benchmarks if they share the same list. To work around it, I had to put a loop inside the benchmark that clears the cache first. The other option would be to fill the cache in `setup` and not include it in the benchmark at all. That's probably a bit better for a micro-benchmark, but it also isn't quite correct if you are only sorting a version list without having triggered the cache for something else.
Ah yes, I faced the same problem when writing ad-hoc performance benchmarks with timeit, and eventually just gave up on using timeit.
Ideally we'd want to micro-benchmark the cached and uncached scenarios separately. I'll do some research to see if there's an approach that works.
Here's my suggestion:

```python
class TimeVersionSuite:
    def setup(self) -> None:
        with (DIR / "version_sample.txt").open() as f:
            self.versions = [v.strip() for v in f.readlines()]
        self.valid_versions = [v for v in self.versions if valid_version(v)]
        self.version_objects_cold = [Version(v) for v in self.valid_versions]
        self.version_objects_warm = [Version(v) for v in self.valid_versions]
        for v in self.version_objects_warm:
            _ = v._key

    ...

    @add_attributes(pretty_name="Version sorting (cold cache)")
    def time_sort_cold(self) -> None:
        """Sorting when _key needs to be calculated during comparison."""
        for v in self.version_objects_cold:
            v._key_cache = None
        sorted(self.version_objects_cold)

    @add_attributes(pretty_name="Version sorting (warm cache)")
    def time_sort_warm(self) -> None:
        """Sorting when _key is already cached."""
        sorted(self.version_objects_warm)
```

The overhead added to time_sort_cold is tiny, and isn't affected by the cost of version construction.
| @add_attributes(pretty_name="SpecifierSet filter") | ||
| def time_filter(self) -> None: | ||
| list(SpecifierSet(">5.0").filter(self.sample_versions)) |
The specifier used in the SpecifierSet can affect performance in a lot of different ways: which operator is chosen, whether the version includes a pre-release, etc.
I suggest we at least start with one simple specifier like this one, and one complex specifier like `>=1,~=1.1,!=1.2.1,==1.*,<1.9`; we can expect complex specifiers like that when multiple requirements are combined while resolving requirements.
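For illustration, here is a sketch of what those two cases might look like (hypothetical names; the sample list stands in for the suite's shared version sample, and the sets are built in setup so specifier parsing isn't part of the measurement):

```python
from packaging.specifiers import SpecifierSet

# Stand-in sample; the real suite would reuse its shared version list.
SAMPLE_VERSIONS = [f"{major}.{minor}" for major in range(1, 10) for minor in range(10)]


class TimeSpecifierFilter:
    def setup(self) -> None:
        self.simple = SpecifierSet(">5.0")
        self.complex = SpecifierSet(">=1,~=1.1,!=1.2.1,==1.*,<1.9")

    def time_filter_simple(self) -> None:
        list(self.simple.filter(SAMPLE_VERSIONS))

    def time_filter_complex(self) -> None:
        list(self.complex.filter(SAMPLE_VERSIONS))
```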
Do we also need to fill in or clear the cache?
Yeah, in fact there are now multiple layers. The version layer:
- Filtering non-Version objects like strings
- Filtering Version objects without key computed
- Filtering Version objects with key computed
And the spec layer:
- Filtering with _spec_version empty
- Filtering with _spec_version not empty
For testing the performance of filtering I think it makes sense to test both with and without _spec_version populated.
To populate _spec_version:

```python
for spec in self.specs_contains_warm:
    _ = spec.contains(Version("0"))
```

And to clear it out:

```python
for spec in self.specs_contains_cold:
    for s in spec._specs:
        s._spec_version = None
```

I'm less sure about the different types of versions; there's an argument for testing all three types against the cold/warm specifiers. But if we're only testing one version type, it should be warm versions, as that most clearly tests specifier filtering performance rather than version performance.
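Putting those two snippets together, a warm/cold pair of filter benchmarks might look roughly like this (a sketch only: _key and _spec_version are the internal caches discussed above, and the class and sample names are placeholders):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Stand-in sample; the real suite would reuse its shared list of valid versions.
VALID_VERSIONS = ["1.0", "2.0", "5.1", "6.0rc1", "7.2.post1"]


class TimeSpecifierFilterCache:
    def setup(self) -> None:
        self.warm_versions = [Version(v) for v in VALID_VERSIONS]
        for v in self.warm_versions:
            _ = v._key  # pre-compute the version keys so only filtering is measured
        self.specs_warm = [SpecifierSet(">5.0")]
        self.specs_cold = [SpecifierSet(">5.0")]
        for spec in self.specs_warm:
            _ = spec.contains(Version("0"))  # populate _spec_version

    def time_filter_warm_spec(self) -> None:
        for spec in self.specs_warm:
            list(spec.filter(self.warm_versions))

    def time_filter_cold_spec(self) -> None:
        for spec in self.specs_cold:
            for s in spec._specs:
                s._spec_version = None  # reset the cached parsed specifier version
            list(spec.filter(self.warm_versions))
```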
This is the benchmarking suite I've been using, such as for https://iscinumpy.dev/post/packaging-faster. It could be a separate repository; there are good arguments for both options.
asv supports running from a separate repository and from a branch. Since I wrote it in a branch, I'm opening it up here first to see what others think. I haven't used asv before, so I'm open to suggestions on best practices, like whether it is usually run from the top level or from a subdirectory, and where to put things.