x64: Improve memory support in {insert,extract}lane #5982
Conversation
This commit adds support to Cranelift for emitting `pextr{b,w,d,q}`
with a memory destination, merging a store-of-extract operation into one
instruction. Additionally, AVX support is added for the `pextr*`
instructions.
I've also tried to ensure that codegen tests and runtests exist
for all forms of these instructions.
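As a rough illustration (hypothetical CLIF, not copied from this PR's test suite), the pattern being merged is a store of an `extractlane` result:

```clif
function %store_extract_i8(i64, i8x16) {
block0(v0: i64, v1: i8x16):
    v2 = extractlane v1, 9
    store v2, v0
    return
}
```

With this change the store-of-extract can lower to a single `pextrb` with a memory destination (roughly `pextrb $9, %xmm0, (%rdi)`, or `vpextrb` when AVX is enabled) instead of extracting into a general-purpose register and storing it separately.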
This is a generalization of #5924 for more vector types and more lanes.
abrown left a comment:
LGTM but see a couple comments below.
```
(side_effect
  (x64_movsd_store (to_amode flags address offset) value)))
(rule 2 (lower (store flags
                      (has_type $I8 (extractlane value (u8_from_uimm8 n)))
```
I checked and `u8_from_uimm8` does not guarantee that `n` is in the right range (e.g., 0 to 15 here). Did you see any validation that would prevent a wrong `n` here at the CLIF level, or do we rely exclusively on the Wasm-level construction? If there is nothing and it ends up being too tricky to do here in ISLE, maybe emit.rs could gain some `assert!`s or something like that.
I think this is the block in the verifier which validates that lanes are always in bounds, so I think we should be good on that front. I can add some extra asserts to the backend too, though.
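For illustration only, the kind of backend-side check being discussed could be a small helper along these lines (a hypothetical sketch, not the actual emit.rs code):

```rust
/// Hypothetical sketch of a defensive check before encoding pextr{b,w,d,q}:
/// the lane immediate must be smaller than the vector's lane count.
fn assert_lane_in_bounds(lane: u8, lane_count: u8) {
    assert!(
        lane < lane_count,
        "extractlane index {lane} is out of bounds for {lane_count} lanes"
    );
}
```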
```
; movq %rsp, %rbp
; block0:
; vmovsd 0(%rdi), %xmm3
; vmovsd %xmm0, %xmm3, %xmm0
```
Looking at the other cases, I started to worry that one of these instructions actually zeroes bits we do not want to zero. I think it is this case; from the MOVSD documentation:
> Legacy version: When the source and destination operands are XMM registers, bits MAXVL:64 of the destination operand remains unchanged. When the source operand is a memory location and destination operand is an XMM register, the quadword at bits 127:64 of the destination operand is cleared to all 0s, bits MAXVL:128 of the destination operand remains unchanged.
>
> VEX and EVEX encoded register-register syntax: Moves a scalar double precision floating-point value from the second source operand (the third operand) to the low quadword element of the destination operand (the first operand). Bits 127:64 of the destination operand are copied from the first source operand (the second operand). Bits (MAXVL-1:128) of the corresponding destination register are zeroed.
So it is the legacy SSE case where a `movsd 0(%rdi), ...` would zero too many bits, but in the AVX code we could actually merge these two `vmovsd` into one. Is there a way to do that?
Going by this (which I realize isn't official, but it's what I've been using so far), my read is that `VEX.128.F2.0F 11 /r: VMOVSD xmm1, xmm2, xmm3` has different semantics than `VEX.128.F2.0F 10 /r: VMOVSD xmm1, m64`. So I think AVX and SSE match here: the register-to-register form preserves the upper bits, but the memory-to-register form always zeros them, so I don't think we can fuse?
I'll admit, though, that this is all really subtle, and if the official docs say something different I wouldn't be too surprised.
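As a quick illustration of the distinction (AT&T syntax, not taken from this PR's test output), this is why the load can't be folded into the merge:

```
movsd 0(%rdi), %xmm3   ; memory form: xmm3[63:0] = mem64, xmm3[127:64] zeroed
movsd %xmm3, %xmm0     ; reg-reg form: xmm0[63:0] = xmm3[63:0], xmm0[127:64] unchanged
```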
```
; vmovsd 0(%rdi), %xmm3
; vmovsd %xmm0, %xmm3, %xmm0
; movsd 0(%rdi), %xmm3
; movsd %xmm0, %xmm3, %xmm0
```
I should also note that