Conversation
I'm fine either way.
I've also set this up as a separate repo. It works really well separately; the one issue is that testing a new version requires it to be pushed somewhere (a PR is fine), while the in-source version here lets you test with just a local commit, no push needed. For #1082, I was curious about always stripping first, but didn't want to push that. But other than that, it's fine. This is the config file for the repo:

```json
{
    "version": 1,
    "project": "packaging",
    "project_url": "https://github.com/pypa/packaging",
    "show_commit_url": "https://github.com/pypa/packaging/commit/",
    "repo": "https://github.com/pypa/packaging.git",
    "environment_type": "virtualenv",
    "build_command": ["python -m pip wheel -w {build_cache_dir} {build_dir}"],
    "default_benchmark_timeout": 180,
    "regressions_thresholds": {
        ".*": 0.01
    },
    "pythons": ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"]
}
```

You also need a .gitignore; otherwise it's pretty much the same content. Oh, and this pyproject.toml, mostly generated by uv:

```toml
[project]
name = "packaging-benchmark"
version = "0.1.0"
description = "Benchmark suite for packaging"
readme = "README.md"
requires-python = ">=3.8"
dependencies = [
    "asv",
    "pip",
]
```
benchmarks/version.py
| @add_attributes(pretty_name="Version hash") | ||
| def time_hash(self) -> None: | ||
| for v in self.valid_versions: | ||
| hash(Version(v)) | ||
|
|
||
| @add_attributes(pretty_name="Version __str__") | ||
| def time_str(self) -> None: | ||
| for version in self.valid_versions: | ||
| str(Version(version)) | ||
|
|
||
| @add_attributes(pretty_name="Version sorting") | ||
| def time_sort(self) -> None: | ||
| sorted(Version(v) for v in self.valid_versions) |
I think there are two sorts of useful benchmarks:
- Micro benchmarks: test one very specific thing. You can run them millions of times and look at the minimum (the time it ran with the fewest interruptions) or the median to get a good idea of whether a specific code path sped up or slowed down, independent of the others.
- Scenario benchmarks: test a real-world scenario. They can only run a few times, but because each run takes so long, it "averages" itself out.
These are micro benchmarks, so we should avoid mixing Version construction, which is already benchmarked on its own, into the measurement of other functionality.
For example, a PR that significantly speeds up Version construction but slows down Version stringification would not show that trade-off here: time_str would still speed up if construction got fast enough, making it look like both improved.
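To make the point concrete, here is a minimal sketch (not code from this PR; the class name and sample list are made up) of how stringification could be isolated from construction by building the Version objects in setup:

```python
from packaging.version import Version

# Stand-in for the shared version sample file used by the real suite.
VALID_VERSION_STRINGS = ["1.0", "2.0.dev1", "3.1.4.post2", "1!1.0rc1"]


class TimeVersionStr:
    def setup(self) -> None:
        # Construct the objects up front so the benchmark body times only __str__.
        self.version_objects = [Version(v) for v in VALID_VERSION_STRINGS]

    def time_str(self) -> None:
        for v in self.version_objects:
            str(v)
```

With this shape, a change in construction speed no longer moves the time_str number at all.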
The problem I was facing is that it's hard to handle the key caching. `setup` runs once in ASV, so any caching happens once: not only across repeat runs, but also across benchmarks if they share the same list. To work around it, I had to put a loop inside the benchmark that clears the cache first. The other option would be to fill the cache in `setup` and not include it in the benchmark at all. That's probably a bit better for a micro-benchmark, but it also isn't quite correct if you are only sorting a version list without having triggered the cache for something else.
Ah yes, I faced the same problem when writing ad-hoc performance benchmarks with timeit, and eventually just gave up on using timeit.
Ideally we'd want to micro-benchmark the cached and uncached scenarios separately. I'll do some research to see if there's an approach that works.
Here's my suggestion:

```python
class TimeVersionSuite:
    def setup(self) -> None:
        with (DIR / "version_sample.txt").open() as f:
            self.versions = [v.strip() for v in f.readlines()]
        self.valid_versions = [v for v in self.versions if valid_version(v)]
        self.version_objects_cold = [Version(v) for v in self.valid_versions]
        self.version_objects_warm = [Version(v) for v in self.valid_versions]
        for v in self.version_objects_warm:
            _ = v._key

    ...

    @add_attributes(pretty_name="Version sorting (cold cache)")
    def time_sort_cold(self) -> None:
        """Sorting when _key needs to be calculated during comparison."""
        for v in self.version_objects_cold:
            v._key_cache = None
        sorted(self.version_objects_cold)

    @add_attributes(pretty_name="Version sorting (warm cache)")
    def time_sort_warm(self) -> None:
        """Sorting when _key is already cached."""
        sorted(self.version_objects_warm)
```

The overhead added to time_sort_cold is tiny, and isn't affected by the cost of version construction.
| @add_attributes(pretty_name="SpecifierSet filter") | ||
| def time_filter(self) -> None: | ||
| list(SpecifierSet(">5.0").filter(self.sample_versions)) |
The specifier used in the SpecifierSet can affect performance in a lot of different ways: which operator is chosen, whether the version includes a pre-release, etc.
I suggest we at least start with one simple specifier like this one, and one complex specifier like `>=1,~=1.1,!=1.2.1,==1.*,<1.9`; we can expect complex specifiers like that when multiple requirements are combined while resolving requirements.
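For illustration, here is a sketch of what those two cases might look like (hypothetical names; the sample list stands in for the suite's shared version sample, and the sets are built in setup so specifier parsing isn't part of the measurement):

```python
from packaging.specifiers import SpecifierSet

# Stand-in sample; the real suite would reuse its shared version list.
SAMPLE_VERSIONS = [f"{major}.{minor}" for major in range(1, 10) for minor in range(10)]


class TimeSpecifierFilter:
    def setup(self) -> None:
        self.simple = SpecifierSet(">5.0")
        self.complex = SpecifierSet(">=1,~=1.1,!=1.2.1,==1.*,<1.9")

    def time_filter_simple(self) -> None:
        list(self.simple.filter(SAMPLE_VERSIONS))

    def time_filter_complex(self) -> None:
        list(self.complex.filter(SAMPLE_VERSIONS))
```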
Do we also need to fill in or clear the cache?
Yeah, in fact there are now multiple layers. The version layer:
- Filtering non-Version objects like strings
- Filtering Version objects without key computed
- Filtering Version objects with key computed
And the spec layer:
- Filtering with _spec_version empty
- Filtering with _spec_version not empty
For testing the performance of filtering I think it makes sense to test both with and without _spec_version populated.
To populate _spec_version:

```python
for spec in self.specs_contains_warm:
    _ = spec.contains(Version("0"))
```

And to clear it out:

```python
for spec in self.specs_contains_cold:
    for s in spec._specs:
        s._spec_version = None
```

I'm less sure about the different types of versions; there's an argument for testing all three types against the cold/warm specifiers. But if we're only testing one version type, it should be warm versions, as that most clearly tests specifier filtering performance rather than version performance.
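Putting those two snippets together, a warm/cold pair of filter benchmarks might look roughly like this (a sketch only: _key and _spec_version are the internal caches discussed above, and the class and sample names are placeholders):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Stand-in sample; the real suite would reuse its shared list of valid versions.
VALID_VERSIONS = ["1.0", "2.0", "5.1", "6.0rc1", "7.2.post1"]


class TimeSpecifierFilterCache:
    def setup(self) -> None:
        self.warm_versions = [Version(v) for v in VALID_VERSIONS]
        for v in self.warm_versions:
            _ = v._key  # pre-compute the version keys so only filtering is measured
        self.specs_warm = [SpecifierSet(">5.0")]
        self.specs_cold = [SpecifierSet(">5.0")]
        for spec in self.specs_warm:
            _ = spec.contains(Version("0"))  # populate _spec_version

    def time_filter_warm_spec(self) -> None:
        for spec in self.specs_warm:
            list(spec.filter(self.warm_versions))

    def time_filter_cold_spec(self) -> None:
        for spec in self.specs_cold:
            for s in spec._specs:
                s._spec_version = None  # reset the cached parsed specifier version
            list(spec.filter(self.warm_versions))
```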
This is the benchmarking suite I've been using, such as for https://iscinumpy.dev/post/packaging-faster. It could be a separate repository; there are good arguments for both options.
asv supports running from a separate repository and from a branch. Since I wrote it in a branch, I'm opening it up here first to see what others think. I haven't used asv before, so I'm open to suggestions on best practices, like whether it is usually run from the top level or from a subdirectory, and where to put things.