Improve startup time #1408

@sharkdp

fd's startup time is quite slow. On my 12-core system, it takes ~20 ms to "search" an empty folder. This is fast enough not to be noticeable by humans, but it looks bad in benchmarks that compare fd with other tools on small folders [1]. It is also an actual problem for use cases where fd is called repeatedly from a script.
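To reproduce a number like this without extra tooling, a minimal sketch could time repeated invocations in an empty scratch directory. It assumes an `fd` binary is on `PATH`; the helper names `time_command` and `mean` are made up for this example, and the measured time includes the cost of spawning the process from the harness itself.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Run `program args...` `runs` times inside `dir` and return the
/// wall-clock time of each run (process spawn overhead included).
fn time_command(program: &str, args: &[&str], dir: &Path, runs: usize) -> io::Result<Vec<Duration>> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        Command::new(program)
            .args(args)
            .current_dir(dir)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()?;
        samples.push(start.elapsed());
    }
    Ok(samples)
}

/// Arithmetic mean of the samples, or zero for an empty slice.
fn mean(samples: &[Duration]) -> Duration {
    if samples.is_empty() {
        return Duration::ZERO;
    }
    samples.iter().sum::<Duration>() / samples.len() as u32
}

fn main() -> io::Result<()> {
    // Empty scratch directory to "search".
    let dir = std::env::temp_dir().join(format!("fd-startup-{}", std::process::id()));
    std::fs::create_dir_all(&dir)?;

    // Assumption: `fd` is installed; otherwise the error is reported and we move on.
    match time_command("fd", &[], &dir, 50) {
        Ok(samples) => println!("mean over {} runs: {:?}", samples.len(), mean(&samples)),
        Err(err) => eprintln!("could not run fd: {err}"),
    }

    std::fs::remove_dir(&dir)?;
    Ok(())
}
```

A dedicated benchmarking tool will give more reliable statistics (warmup, outlier detection); this only shows the shape of the measurement.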

Some of that overhead is caused by the spawning of threads, and that problem is already tracked in #1203. But I think there is more that can be done. Instead of using my usual go-to performance tool (perf), let's look at the magic-trace output of an fd call in an empty folder [2]. If someone is interested, I've attached the trace to this post. Go to https://magic-trace.org/ to load it in their viewer.

The full trace looks like this:

image

The first 2.2 ms are typical process startup things (before main). I don't think there is any room for optimization here (?)

image

The next ~2 ms are more interesting:

image

Some notable steps (even if insignificant in time):

  • parsing command-line arguments (724 µs)
  • a "is_existing_directory" check on the search path (28 µs)
  • the parsing of the (empty) search pattern regex (32 µs)
  • isatty check (5 µs)
  • LsColors::from_env (579 µs)
  • num_cpus::linux::get_num_cpus (441 µs)
  • RegexBuilder::build (171 µs)

Some things were surprising to me. I didn't expect the get_num_cpus call to take this long. There might be some room for improvement here by doing things in parallel (e.g. LsColors::from_env)? But only if the thread overhead is not too high.
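The idea of overlapping two independent setup steps can be sketched as follows. This is not fd's code: `parse_colors_from_env` is a hypothetical stand-in for `LsColors::from_env`, and `std::thread::available_parallelism` is used in place of the `num_cpus` crate so that the example needs only the standard library. Whether the overlapped version wins depends on exactly the thread overhead mentioned above.

```rust
use std::env;
use std::thread;
use std::time::{Duration, Instant};

/// Run `f` once and return its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Hypothetical stand-in for an environment-based setup step
/// such as LsColors::from_env: split LS_COLORS into its entries.
fn parse_colors_from_env() -> Vec<String> {
    env::var("LS_COLORS")
        .unwrap_or_default()
        .split(':')
        .filter(|entry| !entry.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Standard-library CPU count query (assumption: stands in for num_cpus).
fn cpu_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    // Both steps one after the other on the main thread.
    let ((colors, cpus), sequential) = timed(|| (parse_colors_from_env(), cpu_count()));

    // Color parsing on a helper thread while the main thread queries the CPU count.
    let ((colors_par, cpus_par), overlapped) = timed(|| {
        let worker = thread::spawn(parse_colors_from_env);
        let cpus = cpu_count();
        let colors = worker.join().expect("color parsing thread panicked");
        (colors, cpus)
    });

    assert_eq!((colors.len(), cpus), (colors_par.len(), cpus_par));
    println!("sequential: {sequential:?}, overlapped: {overlapped:?}");
}
```

If the spawn and join cost is comparable to the few hundred microseconds being hidden, the overlapped variant will not be faster, so this is something to measure rather than assume.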

Then we start the actual scan, which takes the majority of the time:

image

Here, I'm not so sure how to interpret the trace, as things are actually happening on multiple threads. But we can (presumably) see some of the thread spawning/joining time here (~ 5 ms):

image

and some gitignore matcher logic going on here (370 µs total):

image

Most of the time is actually unaccounted for in the trace, because I can only see:

image

We can see a bit more when switching off LTO:

image

Apparently, 11 ms are spent in crossbeam_channel::channel::bounded's from_iter method (probably the receive call?), even though we don't have any work to do. On a -j1 run, this part only takes 1 ms.
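To get a feeling for how much of this could be plain spawn/receive/join overhead, here is a small standalone sketch of workers blocking on a bounded channel with no work to do. It is an approximation under stated assumptions: it uses `std::sync::mpsc::sync_channel` behind a mutex instead of crossbeam_channel, and `run_pool` is a made-up helper, not anything from fd or the ignore crate.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Spawn `threads` workers that drain a bounded channel, send `jobs` items,
/// then join all workers. Returns (items processed, total elapsed time).
fn run_pool(threads: usize, jobs: usize) -> (usize, Duration) {
    let threads = threads.max(1); // at least one worker, so sends cannot block forever
    let start = Instant::now();

    // Bounded channel; std's receiver is single-consumer, so it is shared via a mutex.
    let (tx, rx) = sync_channel::<usize>(16);
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut processed = 0usize;
                loop {
                    // The lock guard is released at the end of this statement.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(_) => processed += 1,
                        Err(_) => break, // sender dropped: no more work
                    }
                }
                processed
            })
        })
        .collect();

    for job in 0..jobs {
        tx.send(job).unwrap();
    }
    drop(tx); // signal the workers to shut down

    let processed = handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>();
    (processed, start.elapsed())
}

fn main() {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let (_, single) = run_pool(1, 0);
    let (_, many) = run_pool(cpus, 0);
    println!("no work, 1 thread: {single:?}");
    println!("no work, {cpus} threads: {many:?}");
}
```

Comparing the two timings gives a rough lower bound for what an idle worker pool costs on a given machine; it says nothing about where the time goes inside the real channel implementation.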

Footnotes

  1. Those "small" folders can be pretty large, actually. It takes hundreds of thousands of files before we can make up for the startup "penalty".

  2. I recently discovered this and used it successfully to benchmark (and then optimize) the startup time of other programs.
