Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,14 @@
- Breaking: `.git/` is now ignored by default when using `--hidden` / `-H`, use `--no-ignore` / `-I` or
`--no-ignore-vcs` to override, see #1387 and #1396 (@skoriop)


## Bugfixes

## Changes

- The default number of threads is now constrained to be at most 16. This should improve startup time on
systems with many CPU cores. (#1203)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but isn't MAX_DEFAULT_THREADS actually 20?

I think it would also be good to note that maximum in the CLI help.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. I had a local change to set it to 16 instead of 20, but that didn't get included in the commit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to tweak the limit again post #1422 anyway. I get better perf all the way up to 48 threads on my 24 core CPU and the startup overhead is minimal now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I just checked more carefully and the ideal thread count for me is 24 on my 24 core/48 thread CPU. I'm curious how this scales on different machines.

  ./fd-batch -u -j24 --search-path ~/code/bfs/bench/corpus/chromium ran
    1.02 ± 0.04 times faster than ./fd-batch -u -j28 --search-path ~/code/bfs/bench/corpus/chromium
    1.02 ± 0.03 times faster than ./fd-batch -u -j32 --search-path ~/code/bfs/bench/corpus/chromium
    1.07 ± 0.03 times faster than ./fd-batch -u -j20 --search-path ~/code/bfs/bench/corpus/chromium
    1.07 ± 0.03 times faster than ./fd-batch -u -j36 --search-path ~/code/bfs/bench/corpus/chromium
    1.08 ± 0.03 times faster than ./fd-batch -u -j40 --search-path ~/code/bfs/bench/corpus/chromium
    1.09 ± 0.02 times faster than ./fd-batch -u -j48 --search-path ~/code/bfs/bench/corpus/chromium
    1.09 ± 0.02 times faster than ./fd-batch -u -j44 --search-path ~/code/bfs/bench/corpus/chromium
    1.21 ± 0.02 times faster than ./fd-batch -u -j16 --search-path ~/code/bfs/bench/corpus/chromium
    1.52 ± 0.03 times faster than ./fd-batch -u -j12 --search-path ~/code/bfs/bench/corpus/chromium
    2.15 ± 0.04 times faster than ./fd-batch -u -j8 --search-path ~/code/bfs/bench/corpus/chromium


## Other

# v8.7.1
Expand Down
17 changes: 0 additions & 17 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 0 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ aho-corasick = "1.0"
nu-ansi-term = "0.49"
argmax = "0.3.1"
ignore = "0.4.20"
num_cpus = "1.16"
regex = "1.9.6"
regex-syntax = "0.7"
ctrlc = "3.2"
Expand Down
42 changes: 29 additions & 13 deletions src/cli.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
use std::num::NonZeroUsize;
use std::path::{Path, PathBuf};
use std::time::Duration;

Expand Down Expand Up @@ -497,8 +498,8 @@ pub struct Opts {

/// Set number of threads to use for searching & executing (default: number
/// of available CPU cores)
#[arg(long, short = 'j', value_name = "num", hide_short_help = true, value_parser = clap::value_parser!(u32).range(1..))]
pub threads: Option<u32>,
#[arg(long, short = 'j', value_name = "num", hide_short_help = true, value_parser = str::parse::<NonZeroUsize>)]
pub threads: Option<NonZeroUsize>,

/// Milliseconds to buffer before streaming search results to console
///
Expand Down Expand Up @@ -687,17 +688,8 @@ impl Opts {
self.min_depth.or(self.exact_depth)
}

pub fn threads(&self) -> usize {
// This will panic if the number of threads passed in is more than usize::MAX in an environment
// where usize is less than 32 bits (for example 16-bit architectures). It's pretty
// unlikely fd will be running in such an environment, and even more unlikely someone would
// be trying to use that many threads on such an environment, so I think panicing is an
// appropriate way to handle that.
std::cmp::max(
self.threads
.map_or_else(num_cpus::get, |n| n.try_into().expect("too many threads")),
1,
)
pub fn threads(&self) -> NonZeroUsize {
self.threads.unwrap_or_else(default_num_threads)
}

pub fn max_results(&self) -> Option<usize> {
Expand All @@ -719,6 +711,30 @@ impl Opts {
}
}

/// Get the default number of threads to use, if not explicitly specified.
fn default_num_threads() -> NonZeroUsize {
// If we can't get the amount of parallelism for some reason, then
// default to a single thread, because that is safe.
// Note that the minimum value for a NonZeroUsize is 1.
// Unfortunately, we can't do `NonZeroUsize::new(1).unwrap()`
// in a const context.
const FALLBACK_PARALLELISM: NonZeroUsize = NonZeroUsize::MIN;
// As the number of threads increases, the startup time suffers from
// initializing the threads, and we get diminishing returns from additional
// parallelism. So set a maximum number of threads to use by default.
//
// This value is based on some empirical observations, but the ideal value
// probably depends on the exact hardware in use.
//
// Safety: The literal "20" is known not to be zero.
const MAX_DEFAULT_THREADS: NonZeroUsize = unsafe { NonZeroUsize::new_unchecked(20) };

std::cmp::min(
std::thread::available_parallelism().unwrap_or(FALLBACK_PARALLELISM),
MAX_DEFAULT_THREADS,
)
}

#[derive(Copy, Clone, PartialEq, Eq, ValueEnum)]
pub enum FileType {
#[value(alias = "f")]
Expand Down
2 changes: 1 addition & 1 deletion src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -252,7 +252,7 @@ fn construct_config(mut opts: Opts, pattern_regexps: &[String]) -> Result<Config
max_depth: opts.max_depth(),
min_depth: opts.min_depth(),
prune: opts.prune,
threads: opts.threads(),
threads: opts.threads().get(),
max_buffer_time: opts.max_buffer_time,
ls_colors,
interactive_terminal,
Expand Down