Skip to content

Cranelift sinking loop-invariant code into a loop #7283

@alexcrichton

Description

@alexcrichton

Given this input CLIF:

function %foo(i64, i64) {
block0(v0: i64, v1: i64):
    ;; Create a loop-invariant value `v10` which is some operation which
    ;; includes a constant somewhere.
    v8 = load.f64 v0+100
    v9 = f64const 0x1.0000000000000p1
    v10 = fdiv v8, v9

    ;; jump to the loop
    v3 = iconst.i64 0
    jump block2(v3)  ; v3 = 0

block2(v11: i64):
    ;; store the loop-invariant `v10` to memory "somewhere"
    v15 = iadd v0, v11
    store.f64 v10, v15

    ;; loop breakout condition
    v17 = iadd_imm v11, 1
    v19 = icmp_imm ne v17, 100
    brif v19, block2(v17), block1

block1:
    return
}

this will currently optimize as:

$ cargo run compile -p ./foo.clif --target aarch64 --set opt_level=speed
function %foo(i64, i64) fast {
block0(v0: i64, v1: i64):
    v8 = load.f64 v0+100
    v3 = iconst.i64 0
    jump block2(v3)  ; v3 = 0

block2(v11: i64):
    v9 = f64const 0x1.0000000000000p1
    v10 = fdiv.f64 v8, v9  ; v9 = 0x1.0000000000000p1
    v15 = iadd.i64 v0, v11
    store v10, v15
    v20 = iconst.i64 1
    v17 = iadd v11, v20  ; v20 = 1
    v21 = iconst.i64 100
    v19 = icmp ne v17, v21  ; v21 = 100
    brif v19, block2(v17), block1

block1:
    return
}

This notably sinks the fdiv operation, which is loop invariant and originally outside of the loop, into the loop.

After debugging with @elliottt the reason this seems to be happening is that during elaboration to elaborate the fdiv instruction it first elaborates the inputs to the fdiv instruction. When doing so one of the inputs is an f64const which is rematerializable, meaning that the constant value is materialized inside of the loop. Next when elaborating fdiv it sees that one of its inputs is inside of the loop, so it concludes that the fdiv must also itself be inside of the loop. Put another way the rematerialization of the constant argument additionally forces the fdiv to go inside of the loop. This hypothesis was tested by commenting out these lines which avoided putting the fdiv into the loop. This isn't a complete fix, however, because rematerialized integer constants would still cause integer operations to be sunk into loops erroneously.

I remember first noticing this behavior in an older issue about false dependencies where a division operation was sunk into a loop, although the performance regression in that issue was due to something else. @elliottt's and my investigation of #7246 however turned up that this sinking is the cause of the slowdown there. The vdivsd instruction is present in the loop when in the original wasm the f64.div was not present in the loop. The above commenting of remat.isle rules improves the runtime performance of that test case to be as expected.

cc @elliottt and @cfallin as y'all are likely interested in this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions