Cranelift: ISLE mid-end performance regression (up to -66.08%)

Hi,

I performance-tested a no-opt version of Cranelift without ISLE mid-end optimizations. (You can confirm this at https://github.com/prosyslab/wasmtime/commit/c0df585025897098e83703cca343e139ed30a119)

Surprisingly, compared to the latest version of upstream Cranelift, the no-opt version produced significantly faster x86_64 code for blake3, keccak, and xchacha20. This experiment was conducted using sightglass-cli.

Given that only the mid-end rules were removed, some codegen backend might be interacting badly with the mid-end. I want to investigate this problem, but I'm completely lost as to which part to look at first. Any comments would be appreciated.


Here is the demonstration:
```
> cargo run --release -- benchmark benchmarks/blake3-scalar/benchmark.wasm --engine engines/wasmtime/v-main/libengine.so engines/wasmtime/v-no-opts/libengine.so --pin
    Finished `release` profile [optimized] target(s) in 0.08s
     Running `target/release/sightglass-cli benchmark benchmarks/blake3-scalar/benchmark.wasm --engine engines/wasmtime/v-main/libengine.so engines/wasmtime/v-no-opts/libengine.so --pin`

execution :: cycles :: benchmarks/blake3-scalar/benchmark.wasm

  Δ = 502562.18 ± 4271.59 (confidence = 99%)

  no-opts/libengine.so is 2.55x to 2.58x faster than main/libengine.so!

  [816778 823363.48 880808] main/libengine.so
  [315506 320801.30 460046] no-opts/libengine.so

compilation :: cycles :: benchmarks/blake3-scalar/benchmark.wasm

  Δ = 51372480.98 ± 1109163.62 (confidence = 99%)

  no-opts/libengine.so is 1.19x to 1.20x faster than main/libengine.so!

  [310860540 313559414.86 330384818] main/libengine.so
  [260245364 262186933.88 278160088] no-opts/libengine.so

instantiation :: cycles :: benchmarks/blake3-scalar/benchmark.wasm

  Δ = 17444.42 ± 10445.70 (confidence = 99%)

  no-opts/libengine.so is 1.06x to 1.23x faster than main/libengine.so!

  [90902 137381.98 258206] main/libengine.so
  [87766 119937.56 198140] no-opts/libengine.so
```
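
For anyone double-checking the output above, here is a minimal Python sketch of how the printed figures appear to relate to each other. The formula is an assumption that I inferred from the numbers (it reproduces all three phases), not something taken from the sightglass source, and `speedup_range` is just an illustrative name.

```python
# Sketch: relate the "Δ = d ± h" line to the "A.AAx to B.BBx faster" line.
# Assumption: the range is 1 + (d -/+ h) / fast_mean, where d is the
# difference of the two means and h is the confidence half-width.

def speedup_range(slow_mean, fast_mean, half_width):
    """Return (delta, low_ratio, high_ratio) for two mean cycle counts."""
    delta = slow_mean - fast_mean
    low = 1 + (delta - half_width) / fast_mean
    high = 1 + (delta + half_width) / fast_mean
    return delta, low, high

# Means and half-widths copied from the blake3-scalar output above.
phases = {
    "execution": (823363.48, 320801.30, 4271.59),
    "compilation": (313559414.86, 262186933.88, 1109163.62),
    "instantiation": (137381.98, 119937.56, 10445.70),
}

for phase, (main_mean, no_opts_mean, half_width) in phases.items():
    delta, low, high = speedup_range(main_mean, no_opts_mean, half_width)
    print(f"{phase}: delta = {delta:.2f}, {low:.2f}x to {high:.2f}x faster")
```

The middle value in each bracketed triple is read here as the mean, since the difference of the two middle values equals the reported Δ.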


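In addition, here is some data for other benchmarks, collected with `--iterations-per-process 10 --benchmark-phase execution --pin`.
Values are execution cycles; a negative percentage means v-no-opts took fewer cycles than base.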

bench | v-no-opts (cycles) | base (cycles) | Δ cycles vs. base
:- | -: | -: | -:
blake3-scalar | 320,225 | 868,750 | -63.14%
blake3-simd | 320,689 | 945,427 | -66.08%
bz2 | 88,887,466 | 86,904,121 | 2.28%
pulldown-cmark | 6,630,447 | 6,705,562 | -1.12%
regex | 209,902,394 | 211,477,705 | -0.74%
shootout-base64 | 383,700,851 | 352,817,318 | 8.75%
shootout-keccak | 25,589,899 | 49,540,506 | -48.35%
shootout-xchacha20 | 4,489,570 | 4,816,315 | -6.78%
spidermonkey | 644,434,235 | 627,374,660 | 2.72%
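
The last column of the table can be reproduced from the two cycle columns as (v-no-opts - base) / base. A minimal Python sketch, with a few rows copied from the table (`relative_change` is an illustrative helper name, not part of sightglass):

```python
# Sketch: recompute the percentage column from the raw cycle counts.
# A negative result means v-no-opts needed fewer cycles than base.

def relative_change(no_opts_cycles, base_cycles):
    """Percentage change in cycles of v-no-opts relative to base."""
    return (no_opts_cycles - base_cycles) / base_cycles * 100.0

# (v-no-opts, base) cycle counts taken from the table above.
rows = {
    "blake3-scalar": (320_225, 868_750),
    "blake3-simd": (320_689, 945_427),
    "bz2": (88_887_466, 86_904_121),
    "shootout-keccak": (25_589_899, 49_540_506),
}

for bench, (no_opts, base) in rows.items():
    print(f"{bench}: {relative_change(no_opts, base):+.2f}%")
```

Read this way, -66.08% for blake3-simd means the no-opt build used about a third of the cycles of base, while the positive values for bz2, shootout-base64, and spidermonkey mean the no-opt build was slower there.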


Cranelift: ISLE mid-end performance regression (up to -66.08%) #12106
