EDIT: Ah, happy new year!
Describe the bug you encountered:
When comparing the speed of fd built against musl vs. glibc, the two are very comparable at `-j1`, but once the thread count rises above a certain threshold, musl gets diminishing returns, to the point that with `-j12` on a 6-core system (with hyperthreading) it's actually slower than with `-j4`.
With fd + musl:
```
~ ➜ time fd -j1 freebsd ~/ >/dev/null

________________________________________________________
Executed in    3.17 secs    fish           external
   usr time    1.81 secs    0.00 micros    1.81 secs
   sys time    1.36 secs  599.00 micros    1.36 secs

~ took 3s ➜ time fd -j4 freebsd ~/ >/dev/null

________________________________________________________
Executed in    2.11 secs    fish           external
   usr time    5.40 secs  621.00 micros    5.40 secs
   sys time    2.47 secs   46.00 micros    2.47 secs

~ took 2s ➜ time fd -j8 freebsd ~/ >/dev/null

________________________________________________________
Executed in    2.62 secs    fish           external
   usr time   11.58 secs  476.00 micros   11.58 secs
   sys time    7.39 secs   35.00 micros    7.39 secs

~ took 2s ➜ time fd -j12 freebsd ~/ >/dev/null

________________________________________________________
Executed in    2.98 secs    fish           external
   usr time   16.22 secs    0.00 micros   16.22 secs
   sys time   15.78 secs  559.00 micros   15.78 secs
```
With fd + musl + pre-loaded jemalloc:
```
~ took 2s ➜ time LD_PRELOAD=/usr/lib/libjemalloc.so.2 fd -j1 freebsd ~/ >/dev/null

________________________________________________________
Executed in    2.75 secs    fish           external
   usr time    1.43 secs    0.00 micros    1.43 secs
   sys time    1.32 secs  913.00 micros    1.32 secs

~ ➜ time LD_PRELOAD=/usr/lib/libjemalloc.so.2 fd -j4 freebsd ~/ >/dev/null

________________________________________________________
Executed in  790.87 millis  fish           external
   usr time    1.75 secs  930.00 micros    1.75 secs
   sys time    1.37 secs    0.00 micros    1.37 secs

~ ➜ time LD_PRELOAD=/usr/lib/libjemalloc.so.2 fd -j8 freebsd ~/ >/dev/null

________________________________________________________
Executed in  552.14 millis  fish           external
   usr time    2.40 secs    0.00 micros    2.40 secs
   sys time    1.89 secs  882.00 micros    1.89 secs

~ ➜ time LD_PRELOAD=/usr/lib/libjemalloc.so.2 fd -j12 freebsd ~/ >/dev/null

________________________________________________________
Executed in  539.76 millis  fish           external
   usr time    3.42 secs  828.00 micros    3.42 secs
   sys time    2.76 secs    0.00 micros    2.76 secs
```
With fd + glibc:
```
[chroot] /home/ericonr > time fd -j1 freebsd ~/ >/dev/null

________________________________________________________
Executed in    2,73 secs    fish           external
   usr time    1,28 secs  488,00 micros    1,28 secs
   sys time    1,35 secs  141,00 micros    1,35 secs

[chroot] /home/ericonr > time fd -j4 freebsd ~/ >/dev/null

________________________________________________________
Executed in  854,58 millis  fish           external
   usr time    1,75 secs    0,00 micros    1,75 secs
   sys time    1,61 secs  585,00 micros    1,61 secs

[chroot] /home/ericonr > time fd -j8 freebsd ~/ >/dev/null

________________________________________________________
Executed in  601,37 millis  fish           external
   usr time    2,25 secs  439,00 micros    2,25 secs
   sys time    2,35 secs  144,00 micros    2,35 secs

[chroot] /home/ericonr > time fd -j12 freebsd ~/ >/dev/null

________________________________________________________
Executed in  558,01 millis  fish           external
   usr time    3,10 secs  612,00 micros    3,10 secs
   sys time    3,15 secs    0,00 micros    3,15 secs
```
With perf, I can (apparently) track the issue down to `__lock`:

```
  Children      Self  Command  Shared Object  Symbol
+   29.83%    29.58%  fd       libc.so        [.] __lock
+   27.29%     0.00%  fd       libc.so        [.] a_cas (inlined)
```
This matches what was observed: with more threads you get more lock contention in the allocator, and can end up spending far too much time waiting on the lock. I'd expect a similar effect with hardened_malloc from GrapheneOS and possibly OpenBSD's malloc. Theoretically speaking, Rust might not require a hardened malloc, but you usually don't want to bundle your own malloc with every application either, so it would be nice if this performed better "out of the box" by simply making fewer allocations (which helps by removing most of the dependency on malloc performance). That said, I don't know how feasible fixing this is.
Describe what you expected to happen:
I expected all versions of the test to scale similarly with thread count. Note that this was run on my ~/ directory, because otherwise execution is too short to spot the issue (the numbers with an unpacked Linux kernel tarball were too close together).
What version of fd are you using?
fd 8.2.1
Which operating system / distribution are you on?
Linux (kernel 5.10.2), with musl 1.2.1
glibc used for testing was 2.30
jemalloc used for LD_PRELOAD was 5.2.1