Skip to content

x64: Peephole optimization for x < 0#4625

Merged
elliottt merged 4 commits into
bytecodealliance:mainfrom
elliottt:trevor/x64-icmp-bool-opt
Aug 9, 2022
Merged

x64: Peephole optimization for x < 0#4625
elliottt merged 4 commits into
bytecodealliance:mainfrom
elliottt:trevor/x64-icmp-bool-opt

Conversation

@elliottt
Copy link
Copy Markdown
Member

@elliottt elliottt commented Aug 5, 2022

Fixes #4607

Add a peephole optimization for icmp slt x, (iconst 0) for 32 and 64 bit values, turning them into a right shift by 31 or 63 bits respectively. This optimization only applies to the case where the result of the icmp is used directly, cases like brz (icmp ..) will be left alone.

Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Two followup thoughts:

  1. I don't think the Wasm frontend uses B1 at all, even for temporaries, though I could be wrong. If not, could we do the same for whatever ops it generates for the Wasm compare opcodes that produce an i32 0/1 flag?
  2. I think we can get the opposite case (x >= 0) as well with a negation; that might be worth doing?

@cfallin
Copy link
Copy Markdown
Member

cfallin commented Aug 5, 2022

(If you want to merge as-is that's fine too; just writing down some ideas that came to mind!)

@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Aug 5, 2022
; movq %rsp, %rbp
; block0:
; shrl $31, %edi, %edi
; sarl $31, %edi, %edi
Copy link
Copy Markdown
Contributor

@MaxGraey MaxGraey Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why it uses signed right shift now? It should be an unsigned shift. Or -1 / 0 is equivalent to true / false. Sorry for the possibly simple questions. I'm not familiar with the generally accepted rules of codegen specifically for cranelift

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was with an eye towards better supporting our boolean representations, as mentioned in #3205, as sar will do sign extension.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's not clear to me that we can use the wider boolean types with icmp, I've switched back to shr with a type guard on the result. Sorry about the back and forth here!

Comment on lines +167 to +168
; sarl $31, %edi, %edi
; notl %edi, %edi
Copy link
Copy Markdown
Contributor

@MaxGraey MaxGraey Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct? I guess it should be:

notl    %edi, %edi
shrl    $31, %edi, %edi

https://godbolt.org/z/s1fccc5Ex

However if codegen uses -1 / 0 instead 1 / 0 for true / false. It should be ok.

Also I'm wondering how ordering not / shr may affect on micro/macro/instruction fusion on x64

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inverted it and restricted this optimization to boolean-producing instructions. Also this shouldn't affect the test/jmp fusion, as this optimization will only fire for icmp instructions whose results are used directly, not as the input to a branch decision. Was there another fusion case you were thinking this might invalidate?

Comment on lines +1508 to +1510
;; Peephole optimization for `0 > x`, when x is a signed 64 bit value
(rule (lower (icmp (IntCC.SignedGreaterThan) (u64_from_iconst 0) x @ (value_type $I64)))
(x64_sar $I64 x (Imm8Reg.Imm8 63)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't ISLE and cranelift ir always canonicalize constants on the right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently, hence the symmetric cases for this optimization. This is something we might look into after the mid-end optimization work has landed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

@elliottt elliottt marked this pull request as ready for review August 8, 2022 19:48
@elliottt elliottt requested a review from cfallin August 8, 2022 19:49
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@elliottt elliottt merged commit ed7dfd3 into bytecodealliance:main Aug 9, 2022
@elliottt elliottt deleted the trevor/x64-icmp-bool-opt branch August 9, 2022 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cranelift: Add x < 0 -> x >>> bit_size(ty) - 1 lowering rule

3 participants