Cranelift: GVN spectre guards and run redundant load elimination twice#5517
Conversation
While these instructions have a side effect that is otherwise invisible to the optimizer, the side effect in question is idempotent, so it can be de-duplicated by GVN.
This allows us to actually replace redundant Wasm loads with dynamic memories. While this improves our hand-crafted test sequences, it doesn't seem to have any improvement on sightglass benchmarks run with dynamic memories, however it also isn't a hit to compilation times, so seems generally good to land anyways: ``` $ cargo run --release -- benchmark -e ~/scratch/once.so -e ~/scratch/twice.so -m insts-retired --processes 20 --iterations-per-process 3 --engine-flags="--static-memory-maximum-size 0" -- benchmarks/default.suite compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [683595240 683768610.53 684097577] once.so [683597068 700115966.83 1664907164] twice.so instantiation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [44107 60411.07 92785] once.so [44138 59552.32 92097] twice.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [17369916 17404839.78 17471458] once.so [17369935 17625713.87 30700150] twice.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [126523640 126566170.80 126648265] once.so [126523076 127174580.30 163145149] twice.so instantiation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [34569 35686.25 36513] once.so [34651 35749.97 36953] twice.so instantiation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [35146 36639.10 37707] once.so [34472 36580.82 38431] twice.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [7055720115 7055841324.82 7056180024] once.so [7055717681 7055877095.85 7056225217] twice.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [46436881 46437081.28 46437691] once.so [46436883 46437127.68 46437766] twice.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [653010530 653010533.27 653010539] once.so [653010531 653010532.95 653010538] twice.so ```
| ;; @006e return v13, v23 | ||
| ;; } | ||
| ;; @006e return v13, v13 | ||
| ;; } No newline at end of file |
There was a problem hiding this comment.
nit: missing trailing newline
There was a problem hiding this comment.
This is the CRANELIFT_TEST_BLESS=1 update code, can fix it in a follow up PR.
| ;; @005c v20 = select_spectre_guard v10, v9, v8 ; v9 = 0 | ||
| ;; @005c v21 = load.i32 little heap v20 | ||
| ;; v21 -> v12 |
| .other_side_effects(true) | ||
| // We can de-duplicate spectre selects since the side effect is | ||
| // idempotent. | ||
| .side_effects_okay_for_gvn(true), |
There was a problem hiding this comment.
Do we actually need select_spectre_guard to have "side effects"? The important part is that it lowers to a conditional move whose output can't be speculated, right? Moving, duplicating, merging or removing it (as prevented by other_side_effects(true) shouldn't matter, right?
There was a problem hiding this comment.
I had this discussion with @cfallin (out of office right now) and the conclusion was that we probably didn't technically need it, but leaving it as-is was a nice way to be conservative here, since we want to be really careful with it.
I do plan to use side_effects_okay_for_gvn for some more stuff (eg uadd_overflow_trap) that is not movable (ie invalid to LICM hoist) but is GVN-able, so it isn't like this infrastructure is completely unnecessary.
There was a problem hiding this comment.
Maybe call it side_effects_idempotent and add a comment that it allows merging instructions into an identical instruction earlier in the runtime program order but doesn't allow moving?
These two changes are collectively required to deduplicate two identical Wasm loads with dynamic memories.