Add opt patterns for 3-way comparison (Ord::cmp or <=>)#7636
Conversation
| ;; x == y ? 0 : x < y ? -1 : +1 | ||
| (rule (simplify (select ty (eq rty x y) | ||
| (iconst ty (u64_from_imm64 0)) | ||
| (select ty (ult rty x y) | ||
| (iconst ty neg_1) | ||
| (iconst ty (u64_from_imm64 1))))) | ||
| (if-let -1 (i64_sextend_imm64 ty neg_1)) | ||
| (sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y)))) |
There was a problem hiding this comment.
For example, this is the select pair that Clang 17 appears to use today: https://cpp.godbolt.org/z/nx16oGrPj
%cmp.lt = icmp ult i32 %a, %b
%sel.lt = select i1 %cmp.lt, i8 -1, i8 1
%cmp.eq = icmp eq i32 %a, %b
%sel.eq = select i1 %cmp.eq, i8 0, i8 %sel.ltfor which it emits the surprisingly-long assembly
xor ecx, ecx
cmp edi, esi
mov eax, 0
sbb eax, eax
or al, 1
cmp edi, esi
movzx eax, al
cmove eax, ecxwhereas with this PR, cranelift can do just
function %spaceship_u32(i32, i32) -> i8 {
block0(v0: i32, v1: i32):
v2 = icmp ugt v0, v1
v3 = icmp ult v0, v1
v4 = isub v2, v3
return v4
}
; VCode:
; pushq %rbp
; movq %rsp, %rbp
; block0:
; cmpl %esi, %edi
; setnbe %al
; cmpl %esi, %edi
; setb %r10b
; subl %eax, %r10d, %eax
; movq %rbp, %rsp
; popq %rbp
; ret
(That does have an extra cmp, but there's enough ISLE in this already that I'll leave fixing that for a future PR.)
| ;; Clang 17 use different sequences -- so we just match all of them. | ||
|
|
||
| ;; x < y ? -1 : x == y ? 0 : +1 | ||
| ;; x < y ? -1 : x != y ? +1 : 0 |
There was a problem hiding this comment.
This is (after LLVM optimizations) the one that Rust started using in rust-lang/rust#63758 : https://rust.godbolt.org/z/rrrvx8xce
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
alexcrichton
left a comment
There was a problem hiding this comment.
This looks great to me, thanks for adding these!
I've left a few comments, although they're mostly just cosmetic. The one about (extractor) might be interesting to explore if you're curious but don't take it as a requirement for this PR.
| (rule (simplify (eq _ (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0)))) | ||
| (eq ty x (iconst ty (imm64 0)))) | ||
| (rule (simplify (ne _ (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0)))) | ||
| (ne ty x (iconst ty (imm64 0)))) | ||
| (rule (simplify (icmp _ cc (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0)))) | ||
| (if (signed_cond_code cc)) | ||
| (icmp ty cc x (iconst ty (imm64 0)))) |
There was a problem hiding this comment.
Doesn't need to be included in this PR specifically, but to make sure I understand, these can all be duplicated for uextend with the exception of changing signed_cond_code, right?
There was a problem hiding this comment.
There's definitely a more interesting pattern in here trying to get out.
I started down a path of "I think this actually works for any fits-in-i8 constant", but then I think it's more like the "known bits" pattern that uextend has above, since i8 is just the simple version of that, and I started adding new things to the prelude and such to try to match them well, then thought better of it, reverted all that, and just did the basic "compare with zero" stuff like the "zero-extended integers aren't zero" pattern just above it.
So yeah, I think there's far more possible here, but I didn't want to go there this PR.
| (rule (simplify (select ty (ult rty x y) | ||
| (iconst ty neg_1) | ||
| (ne rty x y))) | ||
| (if-let -1 (i64_sextend_imm64 ty neg_1)) | ||
| (sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y)))) | ||
| (rule (simplify (select ty (ult rty x y) | ||
| (iconst ty neg_1) | ||
| (uextend ty (ne rty x y)))) | ||
| (if-let -1 (i64_sextend_imm64 ty neg_1)) | ||
| (sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y)))) |
There was a problem hiding this comment.
In the x64 backend there's a maybe_uextend extractor which you might be able to use here to deduplicate these rules.
Additionally the comment above includes:
;; x < y ? -1 : x == y ? 0 : +1
but these rules don't seem to match that, is that a mistake?
There was a problem hiding this comment.
It does and it doesn't depending on your perspective.
That case is picked up by that pattern, as you can see in the cmp_u1a test https://github.com/bytecodealliance/wasmtime/pull/7636/files#diff-f7401bd506e4a5000af4eed4a8d0b9a4138c423291933d35ec683ff6a88f63f8R208
But it happens via this select (icmp ...) 0 1 simplification pattern: https://github.com/bytecodealliance/wasmtime/pull/7615/files#diff-ec130b69b5b51fce365916a438a4e1178165f56a6d4f0a3ec89d6bbafec0b454R12-R16
And thus the pattern here sees it as x < y ? -1 : x != y.
| ;; also matching things like `(a <=> b) < 1` or `(a <=> b) <= -1`. | ||
|
|
||
| ;; (a <=> b) == 0 --> a == b | ||
| (rule (simplify (eq ty (isub ty (sgt rty x y) (slt rty x y)) (iconst ty (u64_from_imm64 0)))) |
There was a problem hiding this comment.
One possible way you could simplify this is something like:
(decl spaceship_s (Value Value) Value)
(extractor (spaceship_s v) (isub _ (sgt rty x y) (slt rty x y)))
I forget the precise syntax for (extractor ...) but I think it can be used along the lines of a macro-of-sorts to reduce the repetition here. That being said I've found them historically a bit cumbersome to use and I'm not sure you can pattern-match out the rty still as well (and maybe ty too?) so this may not amount to much after all.
There was a problem hiding this comment.
I'll give this (and the maybe_uextend from the other thread) a try. There's so much copy-pasting here -- and corresponding opportunities to forget to update the ugts to sgts and vice versa -- that even if it's a bit cumbersome it may well be worth it.
|
Adding the extractors and constructors worked great! Huge improvement, thanks.
|
alexcrichton
left a comment
There was a problem hiding this comment.
Nice I agree it worked out well! Thanks again!
|
Failure looks unrelated to this PR, I'll debug that separately. |
This commit is an attempt to address the CI failure popping up in bytecodealliance#7636. I can reproduce the failure locally and it appears to be due to the fact that macOS is giving spawned threads via `pthread_create` a 512K stack by default. Cranelift when compiled in debug mode is taking more stack than this (but working in release mode). I was talking with Trevor and Jamey about possible ways to reduce the stack size here but for now I'm going with Jamey's suggestion of increasing the stack size as the best course of action for now.
|
Ok once #7651 lands I'll retry landing this |
This commit is an attempt to address the CI failure popping up in #7636. I can reproduce the failure locally and it appears to be due to the fact that macOS is giving spawned threads via `pthread_create` a 512K stack by default. Cranelift when compiled in debug mode is taking more stack than this (but working in release mode). I was talking with Trevor and Jamey about possible ways to reduce the stack size here but for now I'm going with Jamey's suggestion of increasing the stack size as the best course of action for now.
Previous zulip conversation: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/3-way.20compare.20.28Rust.20.60cmp.60.2C.20C.2B.2B20.20.60.3C.3D.3E.60.29/near/404519375
This PR adds a bunch of ISLE opt patterns for three-way comparison on primitives, as exposed by
Ord::cmpin Rust or as<=>"spaceship" in C++20. It's split in two parts:selects that are actually doing this, of which there are sadly many possible forms and no well-known-best way. By doing this cranelift can emit a much simpler MachInst pattern for it than if it needs to use cmovs and constants..sort_by(|a, b| b.cmp(a))in Rust to reverse-sort something, it immediately just checks forOrdering::Less, and if you have#[derive(PartialOrd)] struct Foo(u32);then the generated<operator is doing the equivalent of(a <=> b) < 0, which this will improve toa < b.