Cranelift sinking loop-invariant code into a loop

Given this input CLIF:

```clif
function %foo(i64, i64) {
block0(v0: i64, v1: i64):
    ;; Create a loop-invariant value `v10` which is some operation which
    ;; includes a constant somewhere.
    v8 = load.f64 v0+100
    v9 = f64const 0x1.0000000000000p1
    v10 = fdiv v8, v9

    ;; jump to the loop
    v3 = iconst.i64 0
    jump block2(v3)  ; v3 = 0

block2(v11: i64):
    ;; store the loop-invariant `v10` to memory "somewhere"
    v15 = iadd v0, v11
    store.f64 v10, v15

    ;; loop breakout condition
    v17 = iadd_imm v11, 1
    v19 = icmp_imm ne v17, 100
    brif v19, block2(v17), block1

block1:
    return
}
```

this will currently optimize as:

```
$ cargo run compile -p ./foo.clif --target aarch64 --set opt_level=speed
function %foo(i64, i64) fast {
block0(v0: i64, v1: i64):
    v8 = load.f64 v0+100
    v3 = iconst.i64 0
    jump block2(v3)  ; v3 = 0

block2(v11: i64):
    v9 = f64const 0x1.0000000000000p1
    v10 = fdiv.f64 v8, v9  ; v9 = 0x1.0000000000000p1
    v15 = iadd.i64 v0, v11
    store v10, v15
    v20 = iconst.i64 1
    v17 = iadd v11, v20  ; v20 = 1
    v21 = iconst.i64 100
    v19 = icmp ne v17, v21  ; v21 = 100
    brif v19, block2(v17), block1

block1:
    return
}
```

This notably sinks the `fdiv` operation, which is loop invariant and originally outside of the loop, into the loop.

After debugging with @elliottt the reason this seems to be happening is that during elaboration to elaborate the `fdiv` instruction it first elaborates the inputs to the `fdiv` instruction. When doing so one of the inputs is an `f64const` which is rematerializable, meaning that the constant value is materialized inside of the loop. Next when elaborating `fdiv` it sees that one of its inputs is inside of the loop, so it concludes that the `fdiv` must also itself be inside of the loop. Put another way the rematerialization of the constant argument additionally forces the `fdiv` to go inside of the loop. This hypothesis was tested by [commenting out these lines](https://github.com/bytecodealliance/wasmtime/blob/5c7ed43058fd8bc3122b20debe712c9cbfecf9f2/cranelift/codegen/src/opts/remat.isle#L28-L31) which avoided putting the `fdiv` into the loop. This isn't a complete fix, however, because rematerialized integer constants would still cause integer operations to be sunk into loops erroneously.

I remember first noticing this behavior in an [older issue](https://github.com/bytecodealliance/wasmtime/issues/7085) about false dependencies where a division operation was sunk into a loop, although the performance regression in that issue was due to something else. @elliottt's and my investigation of https://github.com/bytecodealliance/wasmtime/issues/7246 however turned up that this sinking is the cause of the slowdown there. The `vdivsd` instruction is present in the loop when in the original wasm the `f64.div` was not present in the loop. The above commenting of `remat.isle` rules improves the runtime performance of that test case to be as expected.

cc @elliottt and @cfallin as y'all are likely interested in this.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cranelift sinking loop-invariant code into a loop #7283

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Cranelift sinking loop-invariant code into a loop #7283

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions