x64: Migrate {s,u}{div,rem} to ISLE#6008
Conversation
This adds a suite of `*.clif` files which are intended to test the
`avoid_div_traps=true` compilation of the `{s,u}{div,rem}` instructions.
Move the 8-bit `Div` logic into a dedicated `Div8` instruction to avoid having conditionally-used registers with respect to regalloc.
cfallin
left a comment
There was a problem hiding this comment.
This looks right to me; of course the main source of truth here is that the Wasm testsuite remains happy, because it hits the corners (div-by-0, INT_MIN / -1) whose trapping behavior we carry over into CLIF semantics.
I've added some thoughts on comment clarifications but nothing major. Otherwise this is very nice and much better-factored than our existing code -- the semantics of "safe divisor" checks that trap or pass through are very clear. Thanks!
| sink.bind_label(do_op); | ||
| } | ||
| // Here, divisor == -1. | ||
| // x % -1 = 0; put the result into the destination, $rax. |
There was a problem hiding this comment.
The rax here is outdated (left over from when we used pinned vregs rather then SSA temps); just "into the destination" is correct I think.
There was a problem hiding this comment.
Oh I forgot to update this, but I'll leave it as rax/rdx since this is still using fixed registers for the idiv instruction it embeds. For 8-bit srem this'll be %rax and for 16-to-64-bits this is %rdx. It's a bit subtle where this opcode always produces the remainder (due to this branch) but only conditionally produces the quotient due to the branch here not defining the quotient. That works for the lowerings today (since only srem uses this which uses the remainder)
Is there a better way to model this with regalloc constraints though? Should I remove the dst_quotient parameter to the instruction and inform regalloc that otherwise rax/rdx are both defined during the instruction?
There was a problem hiding this comment.
Ah, no, the constraints are the right way; the "separate reservation of rax/rdx" was the pre-SSA methodology that carries other issues and that we've moved away from.
I guess it's just a matter of semantics; the shift in mindset has us referring to sources/dests symbolically everywhere except at the constraints, even if here at emit time it is always rax. Happy to keep the comment here though if you'd prefer it to be clearer/more explicit.
There was a problem hiding this comment.
Ok makes sense. I've reworked the comments in this area to hopefully be a bit more precise about what's going on on both halves of this branching and why dst_quotient isn't initialized on one branch.
Subscribe to Label Actioncc @cfallin, @fitzgen, @saulecabrera DetailsThis issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle", "winch"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during bytecodealliance#6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing.
* x64: Take SIGFPE signals for divide traps Prior to this commit Wasmtime would configure `avoid_div_traps=true` unconditionally for Cranelift. This, for the division-based instructions, would change emitted code to explicitly trap on trap conditions instead of letting the `div` x86 instruction trap. There's no specific reason for Wasmtime, however, to specifically avoid traps in the `div` instruction. This means that the extra generated branches on x86 aren't necessary since the `div` and `idiv` instructions already trap for similar conditions as wasm requires. This commit instead disables the `avoid_div_traps` setting for Wasmtime's usage of Cranelift. Subsequently the codegen rules were updated slightly: * When `avoid_div_traps=true`, traps are no longer emitted for `div` instructions. * The `udiv`/`urem` instructions now list their trap as divide-by-zero instead of integer overflow. * The lowering for `sdiv` was updated to still explicitly check for zero but the integer overflow case is deferred to the instruction itself. * The lowering of `srem` no longer checks for zero and the listed trap for the `div` instruction is a divide-by-zero. This means that the codegen for `udiv` and `urem` no longer have any branches. The codegen for `sdiv` removes one branch but keeps the zero-check to differentiate the two kinds of traps. The codegen for `srem` removes one branch but keeps the -1 check since the semantics of `srem` mismatch with the semantics of `idiv` with a -1 divisor (specifically for INT_MIN). This is unlikely to have really all that much of a speedup but was something I noticed during #6008 which seemed like it'd be good to clean up. Plus Wasmtime's signal handling was already set up to catch `SIGFPE`, it was just never firing. * Remove the `avoid_div_traps` cranelift setting With no known users currently removing this should be possible and helps simplify the x64 backend. * x64: GC more support for avoid_div_traps Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up some of the ISLE rules now that `div` is allowed to itself trap unconditionally. * x64: Store div trap code in instruction itself * Keep divisors in registers, not in memory Don't accidentally fold multiple traps together * Handle EXC_ARITHMETIC on macos * Update emit tests * Update winch and tests
This commit migrates most of the logic of the div/rem CLIF instructions to ISLE for the x64 backend. Previously most of the logic was encapsulated between the
emit_div_rem_seqmethod and theCheckedDivOrRemSeqpseudo-instruction. Nick and I found last week, however, that this poorly interacts with division-by-constants because despite being a constant checks were still emitted to ensure it wasn't a divide-by-zero or similar. To improve this constant case I figured it'd be a good opportunity to move things to ISLE.Unfortunately though these instructions couldn't be moved 100% to ISLE. Signed division, for example, needs to access labels/jumps to check for
INT_MIN / -1. Signed remainder needs to not actually executeidivwhen the divisor is -1 and instead return 0 as well. This means that while the bulk of the logic now lives in ISLE but a chunk still lives inside the pseudo-instructions inemit.rs.Specifically the changes in this PR are:
CheckedDivOrRemSeqpseudo-instruction and its helpers were removed.Divinstruction was split intoDivandDiv8since 8-bit division has different regalloc behavior.ValidateSdivDivisor{,64}instructions were added. The 64-bit one is different because it needs a temporary register as input.CheckedSRem{,8}instructions were added. These assume a check-for-zero has already happened but internally do branching to handle a -1 divisor. The 8-bit version is different due to different regalloc behavior.Overall despite replacing one pseudo-op with 4 I feel that this is a simplification relative to prior. The majority of the logic now resides in ISLE and the isel optimizations around constants are more obvious and easier to understand. Additionally the fiddly-bits about how idiv/div work on x64 are encoded in ISLE rules too.
I still think there's room to improve here in the long run. For example Wasmtime currently sets
avoid_div_traps=truewhich isn't actually necessary. Most checks can be elided here to defer to signal handling to catch issues for x64. I hope to do this in a follow-up PR later on.