x64: Migrate {s,u}{div,rem} to ISLE by alexcrichton · Pull Request #6008 · bytecodealliance/wasmtime

alexcrichton · 2023-03-13T20:46:43Z

This commit migrates most of the logic of the div/rem CLIF instructions to ISLE for the x64 backend. Previously most of the logic was encapsulated between the emit_div_rem_seq method and the CheckedDivOrRemSeq pseudo-instruction. Nick and I found last week, however, that this poorly interacts with division-by-constants because despite being a constant checks were still emitted to ensure it wasn't a divide-by-zero or similar. To improve this constant case I figured it'd be a good opportunity to move things to ISLE.

Unfortunately though these instructions couldn't be moved 100% to ISLE. Signed division, for example, needs to access labels/jumps to check for INT_MIN / -1. Signed remainder needs to not actually execute idiv when the divisor is -1 and instead return 0 as well. This means that while the bulk of the logic now lives in ISLE but a chunk still lives inside the pseudo-instructions in emit.rs.

Specifically the changes in this PR are:

The CheckedDivOrRemSeq pseudo-instruction and its helpers were removed.
The Div instruction was split into Div and Div8 since 8-bit division has different regalloc behavior.
New ValidateSdivDivisor{,64} instructions were added. The 64-bit one is different because it needs a temporary register as input.
New CheckedSRem{,8} instructions were added. These assume a check-for-zero has already happened but internally do branching to handle a -1 divisor. The 8-bit version is different due to different regalloc behavior.
The ISLE for x64 now special-cases div/rem by constants to elide checks if it can.

Overall despite replacing one pseudo-op with 4 I feel that this is a simplification relative to prior. The majority of the logic now resides in ISLE and the isel optimizations around constants are more obvious and easier to understand. Additionally the fiddly-bits about how idiv/div work on x64 are encoded in ISLE rules too.

I still think there's room to improve here in the long run. For example Wasmtime currently sets avoid_div_traps=true which isn't actually necessary. Most checks can be elided here to defer to signal handling to catch issues for x64. I hope to do this in a follow-up PR later on.

This adds a suite of `*.clif` files which are intended to test the `avoid_div_traps=true` compilation of the `{s,u}{div,rem}` instructions.

Move the 8-bit `Div` logic into a dedicated `Div8` instruction to avoid having conditionally-used registers with respect to regalloc.

cfallin

This looks right to me; of course the main source of truth here is that the Wasm testsuite remains happy, because it hits the corners (div-by-0, INT_MIN / -1) whose trapping behavior we carry over into CLIF semantics.

I've added some thoughts on comment clarifications but nothing major. Otherwise this is very nice and much better-factored than our existing code -- the semantics of "safe divisor" checks that trap or pass through are very clear. Thanks!

cfallin · 2023-03-13T20:55:38Z

-                sink.bind_label(do_op);
-            }
+            // Here, divisor == -1.
+            // x % -1 = 0; put the result into the destination, $rax.


The rax here is outdated (left over from when we used pinned vregs rather then SSA temps); just "into the destination" is correct I think.

Oh I forgot to update this, but I'll leave it as rax/rdx since this is still using fixed registers for the idiv instruction it embeds. For 8-bit srem this'll be %rax and for 16-to-64-bits this is %rdx. It's a bit subtle where this opcode always produces the remainder (due to this branch) but only conditionally produces the quotient due to the branch here not defining the quotient. That works for the lowerings today (since only srem uses this which uses the remainder)

Is there a better way to model this with regalloc constraints though? Should I remove the dst_quotient parameter to the instruction and inform regalloc that otherwise rax/rdx are both defined during the instruction?

Ah, no, the constraints are the right way; the "separate reservation of rax/rdx" was the pre-SSA methodology that carries other issues and that we've moved away from.

I guess it's just a matter of semantics; the shift in mindset has us referring to sources/dests symbolically everywhere except at the constraints, even if here at emit time it is always rax. Happy to keep the comment here though if you'd prefer it to be clearer/more explicit.

Ok makes sense. I've reworked the comments in this area to hopefully be a bit more precise about what's going on on both halves of this branching and why dst_quotient isn't initialized on one branch.

github-actions · 2023-03-13T21:45:22Z

Subscribe to Label Action

cc @cfallin, @fitzgen, @saulecabrera

Details

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle", "winch"

Thus the following users have been cc'd because of the following labels:

cfallin: isle
fitzgen: isle
saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during bytecodealliance#6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing.

* x64: Take SIGFPE signals for divide traps Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during #6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing. * Remove the `avoid_div_traps` cranelift setting With no known users currently removing this should be possible and helps simplify the x64 backend. * x64: GC more support for avoid_div_traps Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up some of the ISLE rules now that `div` is allowed to itself trap unconditionally. * x64: Store div trap code in instruction itself * Keep divisors in registers, not in memory Don't accidentally fold multiple traps together * Handle EXC_ARITHMETIC on macos * Update emit tests * Update winch and tests

alexcrichton added 10 commits March 13, 2023 13:36

x64: Add precise-output tests for div traps

dd4c4fc

This adds a suite of `*.clif` files which are intended to test the `avoid_div_traps=true` compilation of the `{s,u}{div,rem}` instructions.

x64: Remove conditional regalloc in Div instruction

1301c21

Move the 8-bit `Div` logic into a dedicated `Div8` instruction to avoid having conditionally-used registers with respect to regalloc.

x64: Migrate non-trapping, udiv/urem to ISLE

80b2cb7

x64: Port checked udiv to ISLE

2f1edf6

x64: Migrate urem entirely to ISLE

9615f4b

x64: Use test instead of cmp to compare-to-zero

b839040

x64: Port sdiv lowering to ISLE

755d574

x64: Port srem lowering to ISLE

ec493ee

Tidy up regalloc behavior and fix tests

f4f6516

Update docs and winch

d00a24a

fitzgen approved these changes Mar 13, 2023

View reviewed changes

Comment thread cranelift/codegen/src/isa/x64/inst/mod.rs Outdated

Comment thread cranelift/codegen/src/isa/x64/inst.isle Outdated

cfallin approved these changes Mar 13, 2023

View reviewed changes

alexcrichton added 4 commits March 13, 2023 15:58

Review comments

4653a51

Reword again

024ca48

More refactoring test fixes

d50e621

More test fixes

f747a23

alexcrichton enabled auto-merge March 14, 2023 01:14

alexcrichton added this pull request to the merge queue Mar 14, 2023

Merged via the queue into bytecodealliance:main with commit 5c1b468 Mar 14, 2023

alexcrichton deleted the x64-div-in-isle branch March 14, 2023 02:27

alexcrichton mentioned this pull request Mar 15, 2023

x64: Take SIGFPE signals for divide traps #6026

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x64: Migrate {s,u}{div,rem} to ISLE#6008

x64: Migrate {s,u}{div,rem} to ISLE#6008
alexcrichton merged 14 commits into
bytecodealliance:mainfrom
alexcrichton:x64-div-in-isle

alexcrichton commented Mar 13, 2023

Uh oh!

Uh oh!

Uh oh!

cfallin left a comment

Uh oh!

Uh oh!

Uh oh!

cfallin Mar 13, 2023

Uh oh!

alexcrichton Mar 13, 2023

Uh oh!

cfallin Mar 13, 2023

Uh oh!

alexcrichton Mar 14, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

alexcrichton commented Mar 13, 2023

Uh oh!

Uh oh!

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cfallin Mar 13, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Mar 13, 2023

Choose a reason for hiding this comment

Uh oh!

cfallin Mar 13, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Mar 14, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 13, 2023

Subscribe to Label Action

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants