Skip to content

x64: Migrate {s,u}{div,rem} to ISLE#6008

Merged
alexcrichton merged 14 commits into
bytecodealliance:mainfrom
alexcrichton:x64-div-in-isle
Mar 14, 2023
Merged

x64: Migrate {s,u}{div,rem} to ISLE#6008
alexcrichton merged 14 commits into
bytecodealliance:mainfrom
alexcrichton:x64-div-in-isle

Conversation

@alexcrichton
Copy link
Copy Markdown
Member

This commit migrates most of the logic of the div/rem CLIF instructions to ISLE for the x64 backend. Previously most of the logic was encapsulated between the emit_div_rem_seq method and the CheckedDivOrRemSeq pseudo-instruction. Nick and I found last week, however, that this poorly interacts with division-by-constants because despite being a constant checks were still emitted to ensure it wasn't a divide-by-zero or similar. To improve this constant case I figured it'd be a good opportunity to move things to ISLE.

Unfortunately though these instructions couldn't be moved 100% to ISLE. Signed division, for example, needs to access labels/jumps to check for INT_MIN / -1. Signed remainder needs to not actually execute idiv when the divisor is -1 and instead return 0 as well. This means that while the bulk of the logic now lives in ISLE but a chunk still lives inside the pseudo-instructions in emit.rs.

Specifically the changes in this PR are:

  • The CheckedDivOrRemSeq pseudo-instruction and its helpers were removed.
  • The Div instruction was split into Div and Div8 since 8-bit division has different regalloc behavior.
  • New ValidateSdivDivisor{,64} instructions were added. The 64-bit one is different because it needs a temporary register as input.
  • New CheckedSRem{,8} instructions were added. These assume a check-for-zero has already happened but internally do branching to handle a -1 divisor. The 8-bit version is different due to different regalloc behavior.
  • The ISLE for x64 now special-cases div/rem by constants to elide checks if it can.

Overall despite replacing one pseudo-op with 4 I feel that this is a simplification relative to prior. The majority of the logic now resides in ISLE and the isel optimizations around constants are more obvious and easier to understand. Additionally the fiddly-bits about how idiv/div work on x64 are encoded in ISLE rules too.

I still think there's room to improve here in the long run. For example Wasmtime currently sets avoid_div_traps=true which isn't actually necessary. Most checks can be elided here to defer to signal handling to catch issues for x64. I hope to do this in a follow-up PR later on.

Comment thread cranelift/codegen/src/isa/x64/inst/mod.rs Outdated
Comment thread cranelift/codegen/src/isa/x64/inst.isle Outdated
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right to me; of course the main source of truth here is that the Wasm testsuite remains happy, because it hits the corners (div-by-0, INT_MIN / -1) whose trapping behavior we carry over into CLIF semantics.

I've added some thoughts on comment clarifications but nothing major. Otherwise this is very nice and much better-factored than our existing code -- the semantics of "safe divisor" checks that trap or pass through are very clear. Thanks!

Comment thread cranelift/codegen/src/isa/x64/inst.isle Outdated
Comment thread cranelift/codegen/src/isa/x64/inst.isle Outdated
sink.bind_label(do_op);
}
// Here, divisor == -1.
// x % -1 = 0; put the result into the destination, $rax.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rax here is outdated (left over from when we used pinned vregs rather then SSA temps); just "into the destination" is correct I think.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I forgot to update this, but I'll leave it as rax/rdx since this is still using fixed registers for the idiv instruction it embeds. For 8-bit srem this'll be %rax and for 16-to-64-bits this is %rdx. It's a bit subtle where this opcode always produces the remainder (due to this branch) but only conditionally produces the quotient due to the branch here not defining the quotient. That works for the lowerings today (since only srem uses this which uses the remainder)

Is there a better way to model this with regalloc constraints though? Should I remove the dst_quotient parameter to the instruction and inform regalloc that otherwise rax/rdx are both defined during the instruction?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, the constraints are the right way; the "separate reservation of rax/rdx" was the pre-SSA methodology that carries other issues and that we've moved away from.

I guess it's just a matter of semantics; the shift in mindset has us referring to sources/dests symbolically everywhere except at the constraints, even if here at emit time it is always rax. Happy to keep the comment here though if you'd prefer it to be clearer/more explicit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok makes sense. I've reworked the comments in this area to hopefully be a bit more precise about what's going on on both halves of this branching and why dst_quotient isn't initialized on one branch.

Comment thread cranelift/codegen/src/isa/x64/inst/emit_tests.rs
Comment thread cranelift/codegen/src/isa/x64/lower.isle Outdated
Comment thread cranelift/codegen/src/isa/x64/lower.isle
Comment thread cranelift/codegen/src/isa/x64/inst.isle Outdated
@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen isle Related to the ISLE domain-specific language winch Winch issues or pull requests labels Mar 13, 2023
@github-actions
Copy link
Copy Markdown

Subscribe to Label Action

cc @cfallin, @fitzgen, @saulecabrera

Details This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle", "winch"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle
  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton alexcrichton enabled auto-merge March 14, 2023 01:14
@alexcrichton alexcrichton added this pull request to the merge queue Mar 14, 2023
Merged via the queue into bytecodealliance:main with commit 5c1b468 Mar 14, 2023
@alexcrichton alexcrichton deleted the x64-div-in-isle branch March 14, 2023 02:27
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 15, 2023
Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.

There's no specific reason for Wasmtime, however, to specifically avoid
traps in the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap for similar conditions as wasm requires.

This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:

* When `avoid_div_traps=true`, traps are no longer emitted for `div`
  instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
  instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
  but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
  for the `div` instruction is a divide-by-zero.

This means that the codegen for `udiv` and `urem` no longer have any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).

This is unlikely to have really all that much of a speedup but was
something I noticed during bytecodealliance#6008 which seemed like it'd be good to clean
up. Plus Wasmtime's signal handling was already set up to catch
`SIGFPE`, it was just never firing.
alexcrichton added a commit that referenced this pull request Mar 16, 2023
* x64: Take SIGFPE signals for divide traps

Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.

There's no specific reason for Wasmtime, however, to specifically avoid
traps in the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap for similar conditions as wasm requires.

This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:

* When `avoid_div_traps=true`, traps are no longer emitted for `div`
  instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
  instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
  but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
  for the `div` instruction is a divide-by-zero.

This means that the codegen for `udiv` and `urem` no longer have any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).

This is unlikely to have really all that much of a speedup but was
something I noticed during #6008 which seemed like it'd be good to clean
up. Plus Wasmtime's signal handling was already set up to catch
`SIGFPE`, it was just never firing.

* Remove the `avoid_div_traps` cranelift setting

With no known users currently removing this should be possible and helps
simplify the x64 backend.

* x64: GC more support for avoid_div_traps

Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up
some of the ISLE rules now that `div` is allowed to itself trap
unconditionally.

* x64: Store div trap code in instruction itself

* Keep divisors in registers, not in memory

Don't accidentally fold multiple traps together

* Handle EXC_ARITHMETIC on macos

* Update emit tests

* Update winch and tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language winch Winch issues or pull requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants