x64: Lower fcvt_to_{u,s}int{,_sat} in ISLE#4704
Conversation
c12866f to
1f60df4
Compare
1f60df4 to
e462c1d
Compare
|
|
||
| ;; Converting to unsigned int so if float src is negative or NaN | ||
| ;; will first set to zero. | ||
| (tmp2 Xmm (x64_pxor src src)) ;; TODO: unnecessary dependency on src |
There was a problem hiding this comment.
I was having trouble figuring out how to make a zero in a register, and would like to avoid the unnecessary dependency on the input. I tried allocating a temp and using it as input to pxor, but the following snippet caused a panic in regalloc2:
(tmp2 WritableXmm (temp_writable_xmm))
(tmp2 Xmm (x64_pxor tmp2, tmp2))
It seems that using a temp register without first having assigned to it is not expected. Is there an easy way to make a zero that I'm missing?
There was a problem hiding this comment.
The pxor instruction takes both args as sources, so expects them to be defined; so regalloc2 is correctly panic'ing here that there is a use of an undefined value.
The issue is that we haven't special-cased the "xor of any value with itself gives zero" semantics of the instruction: RA2 doesn't know (can't know) that this particular instruction is invariant to its input, so it's fine to use an undefined value.
I think we do special-case this for at least one other xor variant (XmmUninitializedConst enum arm, IIRC?) -- we'd want to do the same for pxor here.
There was a problem hiding this comment.
I rebased on #4709 to pull in the changes to produces_const, and it removed some unnecessary movs as a side-benefit :D
There was a problem hiding this comment.
I've also rebased this on #4714, as it turns out that treating the cmpps with the same source registers as a constant is not correct. I couldn't reliably add a movdqa instruction in the translation to ISLE, which meant that we couldn't control the argument to the cmpps instruction. As a result, we would get spurious errors, as it would compare random values against themselves, and sometimes those random values would be NaN.
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
e462c1d to
d22f51c
Compare
| (dst Xmm (x64_cvttps2dq $F32X4 dst)) | ||
|
|
||
| ;; TODO: regalloc2 introduces a useless move here, is that | ||
| ;; acceptable? |
There was a problem hiding this comment.
Is this TODO still active?
In general it's good to understand why spurious moves occur, but in some cases rearranging the ops causes things to shift somewhat and it happens; especially when the old handwritten code was making multiple defs on one temp which forced values into the same location (a sort of accidental constraint). If we get one extra move in a long conversion sequence it's not the end of the world, IMHO.
There was a problem hiding this comment.
If it's not bad to include that move there then I'll remove the TODO 👍
98c359c to
b7c1225
Compare
b7c1225 to
f451260
Compare
Migrate the lowering for the four instructions fcvt_to_{u,s}int{,_sat} to ISLE.
I realized while porting this lowering that we currently don't support the unsaturating versions of both instructions for vector types, and that we currently only support vector conversions from
F32X4toI32X4. I didn't want to tackle implementing those here, and have preserved the current behavior instead. That bug is tracked in #4693.