Limit default threads #1412

Merged
tmccombs merged 3 commits into sharkdp:master from tmccombs:limit-default-threads
Oct 28, 2023

@tmccombs (Collaborator) commented Oct 26, 2023

Set maximum default threads

Set a limit on how many threads fd will use by default. On hosts that
have a large number of cores, using additional threads has diminishing
returns, and having large numbers of threads increases the setup cost.
Thus we don't necessarily want to use the same number of threads as we
have cores.

Fixes: #1203

I think this might also help startup time a little, since it looks like,
on Linux at least, the std implementation uses syscalls to determine the
available parallelism, whereas num_cpus was processing the /proc/cpuinfo file.

It also removes another dependency.
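
As a rough illustration of the approach described above, the sketch below caps the default thread count using `std::thread::available_parallelism`. It is not the code from this PR: the helper names `capped_threads` and `default_num_threads` are hypothetical, and the value assigned to `MAX_DEFAULT_THREADS` is a placeholder (the discussion further down mentions both 16 and 20).

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Placeholder cap on the default thread count (assumed value, not confirmed by the PR).
const MAX_DEFAULT_THREADS: usize = 20;

/// Limit a detected core count to at most `max`, but never return fewer than 1.
/// Hypothetical helper, split out so the capping logic can be tested on its own.
fn capped_threads(available: usize, max: usize) -> usize {
    available.min(max).max(1)
}

/// Default number of threads: the parallelism reported by std, capped.
fn default_num_threads() -> usize {
    // available_parallelism() returns an io::Result; fall back to a single
    // thread if the platform cannot report a value.
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    capped_threads(available, MAX_DEFAULT_THREADS)
}
```

With this shape, a 128-core host would get the capped value by default, while a smaller machine keeps one thread per reported core. An explicit `-j` value would bypass the default entirely.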

@tmccombs (Collaborator, Author) commented:

This builds on #1410.

@sharkdp (Owner) left a comment:

Thank you. Can you please add a ChangeLog entry with a reference to the bug ticket?

@tmccombs force-pushed the limit-default-threads branch from 114bc76 to 5ee6365 on October 26, 2023 06:48
@tmccombs merged commit 95b4dff into sharkdp:master on Oct 28, 2023
@tmccombs deleted the limit-default-threads branch on October 28, 2023 05:26

## Changes

- The default number of threads is now constrained to be at most 16. This should improve startup time on
systems with many CPU cores. (#1203)

Contributor commented:

Maybe I'm missing something, but isn't MAX_DEFAULT_THREADS actually 20?

I think it would also be good to note that maximum in the CLI help.
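
One way to keep the documented maximum from drifting away from the code is to build the help text from the constant itself. The sketch below is only an illustration of that idea: `threads_help` is a hypothetical function, the wording of the help string is invented, and the value 20 is taken from this comment rather than from the final code.

```rust
/// Assumed value, as quoted in the review comment above.
const MAX_DEFAULT_THREADS: usize = 20;

/// Build the help text for the thread-count option from the constant,
/// so that changing the cap automatically updates the documentation.
fn threads_help() -> String {
    format!(
        "Set number of threads to use (default: number of available CPU cores, but at most {})",
        MAX_DEFAULT_THREADS
    )
}
```

A changelog entry cannot be generated this way, so it would still need a manual check against the constant.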

@tmccombs (Collaborator, Author) commented:

Ugh. I had a local change to set it to 16 instead of 20, but that didn't get included in the commit.

Collaborator commented:

We may want to tweak the limit again post #1422 anyway. I get better perf all the way up to 48 threads on my 24-core CPU, and the startup overhead is minimal now.

Collaborator commented:

Actually, I just checked more carefully and the ideal thread count for me is 24 on my 24-core/48-thread CPU. I'm curious how this scales on different machines.

  ./fd-batch -u -j24 --search-path ~/code/bfs/bench/corpus/chromium ran
    1.02 ± 0.04 times faster than ./fd-batch -u -j28 --search-path ~/code/bfs/bench/corpus/chromium
    1.02 ± 0.03 times faster than ./fd-batch -u -j32 --search-path ~/code/bfs/bench/corpus/chromium
    1.07 ± 0.03 times faster than ./fd-batch -u -j20 --search-path ~/code/bfs/bench/corpus/chromium
    1.07 ± 0.03 times faster than ./fd-batch -u -j36 --search-path ~/code/bfs/bench/corpus/chromium
    1.08 ± 0.03 times faster than ./fd-batch -u -j40 --search-path ~/code/bfs/bench/corpus/chromium
    1.09 ± 0.02 times faster than ./fd-batch -u -j48 --search-path ~/code/bfs/bench/corpus/chromium
    1.09 ± 0.02 times faster than ./fd-batch -u -j44 --search-path ~/code/bfs/bench/corpus/chromium
    1.21 ± 0.02 times faster than ./fd-batch -u -j16 --search-path ~/code/bfs/bench/corpus/chromium
    1.52 ± 0.03 times faster than ./fd-batch -u -j12 --search-path ~/code/bfs/bench/corpus/chromium
    2.15 ± 0.04 times faster than ./fd-batch -u -j8 --search-path ~/code/bfs/bench/corpus/chromium

Successfully merging this pull request may close these issues.

fd starts slowly on systems with very many cores