x64: Peephole optimization for x < 0#4625
Conversation
cfallin
left a comment
There was a problem hiding this comment.
Looks good! Two followup thoughts:
- I don't think the Wasm frontend uses
B1at all, even for temporaries, though I could be wrong. If not, could we do the same for whatever ops it generates for the Wasm compare opcodes that produce ani320/1 flag? - I think we can get the opposite case (
x >= 0) as well with a negation; that might be worth doing?
|
(If you want to merge as-is that's fine too; just writing down some ideas that came to mind!) |
| ; movq %rsp, %rbp | ||
| ; block0: | ||
| ; shrl $31, %edi, %edi | ||
| ; sarl $31, %edi, %edi |
There was a problem hiding this comment.
Hmm, why it uses signed right shift now? It should be an unsigned shift. Or -1 / 0 is equivalent to true / false. Sorry for the possibly simple questions. I'm not familiar with the generally accepted rules of codegen specifically for cranelift
There was a problem hiding this comment.
This was with an eye towards better supporting our boolean representations, as mentioned in #3205, as sar will do sign extension.
There was a problem hiding this comment.
Given that it's not clear to me that we can use the wider boolean types with icmp, I've switched back to shr with a type guard on the result. Sorry about the back and forth here!
| ; sarl $31, %edi, %edi | ||
| ; notl %edi, %edi |
There was a problem hiding this comment.
Is it correct? I guess it should be:
notl %edi, %edi
shrl $31, %edi, %edihttps://godbolt.org/z/s1fccc5Ex
However if codegen uses -1 / 0 instead 1 / 0 for true / false. It should be ok.
Also I'm wondering how ordering not / shr may affect on micro/macro/instruction fusion on x64
There was a problem hiding this comment.
I've inverted it and restricted this optimization to boolean-producing instructions. Also this shouldn't affect the test/jmp fusion, as this optimization will only fire for icmp instructions whose results are used directly, not as the input to a branch decision. Was there another fusion case you were thinking this might invalidate?
| ;; Peephole optimization for `0 > x`, when x is a signed 64 bit value | ||
| (rule (lower (icmp (IntCC.SignedGreaterThan) (u64_from_iconst 0) x @ (value_type $I64))) | ||
| (x64_sar $I64 x (Imm8Reg.Imm8 63))) |
There was a problem hiding this comment.
Doesn't ISLE and cranelift ir always canonicalize constants on the right?
There was a problem hiding this comment.
Not currently, hence the symmetric cases for this optimization. This is something we might look into after the mid-end optimization work has landed.
Fixes #4607
Add a peephole optimization for
icmp slt x, (iconst 0)for 32 and 64 bit values, turning them into a right shift by 31 or 63 bits respectively. This optimization only applies to the case where the result of theicmpis used directly, cases likebrz (icmp ..)will be left alone.