PERF: get_requirement() raise cache from 2048 to 8192 elements #12873
notatallshaw merged 3 commits into pypa:main
Conversation
Force-pushed from b9eef58 to 04b2630
FWIW, I think this was clear when it was changed to 2048: #12663 (comment). Originally, when packaging 22.0+ was going to be vendored, the plan was to drop the cache altogether. But as I showed in that PR, there are real-world, highly esoteric backtracking scenarios where it makes a noticeable difference. I don't have an opinion on whether this is a good change or not, but I do think it would be worth running a memory profiler like memray to see if it changes the peak memory usage (though due to #12834 it will be a small fraction of the total).
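For context, get_requirement() is a thin, cached wrapper around packaging's requirement parser. A minimal sketch of its shape, paraphrased from pip/_internal/utils/packaging.py (the exact code may differ between pip versions):

```python
import functools

from packaging.requirements import Requirement


@functools.lru_cache(maxsize=8192)  # previously 2048; raised by this PR
def get_requirement(req_string: str) -> Requirement:
    # Parsing a requirement string is relatively expensive, and the
    # resolver can re-parse the same string many times while
    # backtracking, so the parsed objects are memoized.
    return Requirement(req_string)
```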
Ping on this. Any interest in getting this merged? I'd like to get the few percent of performance :D
Do you have a reproducible example that shows this performance improvement? |
> small performance improvements when installing hundreds of packages.
Alright, I ran it on the largest env I could find: 697 dependencies.
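For anyone who wants to reproduce this kind of measurement, a minimal timing harness could look like the sketch below; the requirements.txt path and run count are placeholders, not part of this PR:

```python
import statistics
import subprocess
import sys
import time

REQUIREMENTS = "requirements.txt"  # placeholder: a large pinned lockfile

timings = []
for _ in range(5):
    start = time.perf_counter()
    # --dry-run resolves everything but installs nothing, which isolates
    # resolution (and requirement parsing) from build/install time.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--dry-run", "-r", REQUIREMENTS],
        check=True,
        capture_output=True,
    )
    timings.append(time.perf_counter() - start)

print(f"median: {statistics.median(timings):.2f}s over {len(timings)} runs")
```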
Force-pushed from a9be84a to 3634e87
|
I've re-pushed the PR: adjusted the max cache to 10000 to fit large venvs, updated the news item, and rebased onto today's main branch.
The question of how this affects memory usage has still not been addressed. Also, how does this affect the performance/memory use of small installs? For instance, does the large cache slow down installation of a single requirement (something like …)?

I'm not against this change, and I know micro-optimisations like this can add up, but I'd caution against spending too much time on small improvements like this without having a better sense of where the real bottlenecks are.
Hello @pfmoore, this has no effect on memory usage. Measuring memory usage across a few runs, the memory varies around 80-90 MB regardless of the cache size; the cache size makes no meaningful difference.

The bottleneck is parsing the requirement strings. It's pretty clear.
I agree that the memory difference is small enough that it's not really measurable within the margin of error. Here's a snapshot on the main branch for a large resolution just before it completes:

[memory snapshot omitted]

And here's a snapshot on this branch:

[memory snapshot omitted]

This is sample profiling, so don't read too deeply into some of the differences. But in broad strokes, when connecting to a JSON Simple API the memory is dominated by …
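For reference, one way to capture snapshots like these is memray, mentioned earlier in the thread (Linux/macOS only). A minimal sketch of its Python API, with the output path and workload as placeholders:

```python
from memray import Tracker
from packaging.requirements import Requirement

# Record every allocation made inside the block into a capture file,
# then inspect it with `memray flamegraph pip-resolve.bin`.
with Tracker("pip-resolve.bin"):
    reqs = [Requirement(f"package-{i}>=1.0,<2.0") for i in range(10_000)]
```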
If there is little or no appreciable increase in memory usage due to this change, I'm fine with it. It seems a bit silly to need such a big cache simply for parsing requirement strings, but I'd imagine that reworking pip/packaging to avoid the parsing would be hard or impossible.
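To make the parsing cost concrete, here is a quick micro-benchmark of raw Requirement parsing versus a cached lookup (numbers vary by machine; this assumes the packaging library is installed):

```python
import functools
import timeit

from packaging.requirements import Requirement

# lru_cache accepts any callable, including a class constructor.
cached = functools.lru_cache(maxsize=None)(Requirement)
req = "somepackage[extra]>=1.2,<2.0; python_version >= '3.8'"

uncached_t = timeit.timeit(lambda: Requirement(req), number=10_000)
cached_t = timeit.timeit(lambda: cached(req), number=10_000)
print(f"uncached: {uncached_t:.3f}s  cached: {cached_t:.3f}s")
```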
I agree with @ichard26 here - @notatallshaw you added this to the 25.1 milestone, do you plan on merging it? If not, can you remove the milestone and it can go in when it's ready - there's no real need to tie it to a particular release. |
Thanks for the ping, I had been planning to merge. |
Hello,
To summarize, the cache was too small :D
Should we have a NEWS item for this? It's very internal with no relevance to end users.
EDIT: I added the news item since the CI was red.
About 2% faster pip install on a dry run, less in an actual run.
You can debug the cache with:

```python
print(f"parsing {req_string} {get_requirement.cache_info()}")
```

before: [cache_info screenshot omitted]

after: [cache_info screenshot omitted]
Regards.
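P.S. cache_info() returns functools' standard CacheInfo tuple, so the debug line above prints values of the shape shown in this toy example (the stand-in function below is illustrative, not pip's real helper):

```python
import functools


@functools.lru_cache(maxsize=8192)
def get_requirement(req_string: str) -> str:  # toy stand-in for pip's helper
    return req_string


get_requirement("requests>=2.0")
get_requirement("requests>=2.0")  # second call is a cache hit
print(get_requirement.cache_info())
# CacheInfo(hits=1, misses=1, maxsize=8192, currsize=1)
```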