Cranelift/s390x: do not use one-way conditional branches.#10087
Conversation
This is a followup to bytecodealliance#10086, this time removing the one-armed branch variant for s390x. This branch was only used as the default-target branch in the `br_table` lowering. This PR incorporates the branch into the `JTSequence` pseudo-instruction. Some care is needed to keep the `ProducesBool` abstraction; it is unwrapped into its `ProducesFlags` and the `JTSequence` becomes a `ConsumesFlags`, so the compare for the jump-table bound (for default target) is not part of the pseudoinst. (This is OK because regalloc-inserted moves never alter flags, by explicit contract; the same reason allows cmp/branch terminators.) Along the way I noticed that the comments on `JTSequence` claimed that `targets` included the default, but this is (no longer?) the case, as the targets are unwrapped by `jump_table_targets` which peels off the first (default) separately. Aside from comments, this only affected pretty-printing; codegen was correct. With this, we have no more one-armed branches; hence, this fixes bytecodealliance#9980.
6225ef6 to
c214eea
Compare
|
cc @uweigand if you are available to review? Otherwise we're probably OK with the usual reviewer rotation... |
uweigand
left a comment
There was a problem hiding this comment.
Looks generally good to me, but see inline for a couple of comments.
As a longer-term improvement, I'm wondering if the whole default case handling couldn't just be done by common code using regular basic blocks and branch instructions, enabling all the normal optimization machinery? The back-end would then only have to do the actual tablejump (simply assuming that no overflow can happen).
| ; sllg %r4, %r3, 2 | ||
| ; clgfi %r3, 3 | ||
| ; jghe label4 | ||
| ; sllg %r3, %r3, 2 |
There was a problem hiding this comment.
The generated code is slightly worse now as the shifting is done before the branch. This probably doesn't have a very significant impact on the pipeline, but it does increase register pressure. One alternative would be to compare the shifted value instead (assuming there are no overflow considerations).
But I guess this isn't serious enough that it should prevent merging this.
There was a problem hiding this comment.
Hmm, that's true -- I did the code motion without looking at the instruction semantics in detail to optimize. We can optimize in a followup if needed I suppose.
9a932fe to
bf8a9b6
Compare
|
Could I bother someone for a rubber-stamp here (@uweigand's not an official approver) -- @alexcrichton or @fitzgen maybe? (Or @abrown as original reviewer?) Thanks! |
This is a followup to #10086, this time removing the one-armed branch variant for s390x. This branch was only used as the default-target branch in the
br_tablelowering. This PR incorporates the branch into theJTSequencepseudo-instruction. Some care is needed to keep theProducesBoolabstraction; it is unwrapped into itsProducesFlagsand theJTSequencebecomes aConsumesFlags, so the compare for the jump-table bound (for default target) is not part of the pseudoinst. (This is OK because regalloc-inserted moves never alter flags, by explicit contract; the same reason allows cmp/branch terminators.)Along the way I noticed that the comments on
JTSequenceclaimed thattargetsincluded the default, but this is (no longer?) the case, as the targets are unwrapped byjump_table_targetswhich peels off the first (default) separately. Aside from comments, this only affected pretty-printing; codegen was correct.With this, we have no more one-armed branches; hence, this fixes #9980.