Winch: Add splat instructions to x64 using AVX2#10028
Conversation
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "winch"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
|
||
| impl SplatKind { | ||
| /// Get the operand size for `vpbroadcast`. | ||
| pub(super) fn vpbroadcast_operand_size(&self) -> OperandSize { |
There was a problem hiding this comment.
Instead of making this specific to x64, can we instead call this lane_size or something along those lines and move the implementation to masm.rs? I don't think we'd only need this for x64.
| // Results in the first 4 bytes and second 4 bytes being | ||
| // swapped and then the swapped bytes being copied. | ||
| // [d0, d1, d2, d3, d4, d5, d6, d7, ...] yields | ||
| // [d4, d5, d6, d7, d0, d1, d2, d3, d4, d5, d6, d7, d0, d1, d2, d3]. | ||
| let mask = 0b0100_0100; |
There was a problem hiding this comment.
If I'm not wrong, this is the exact same comment/mask value used in load_splat. Could we extract this as a helper?
| /// Takes the int in a `src` operand and replicates it across lanes of | ||
| /// `size` in a destination result. | ||
| fn splat_int(&mut self, context: &mut CodeGenContext<Emission>, size: SplatKind) -> Result<()>; | ||
|
|
||
| /// Takes the `src` operand and replicates it across lanes of `size` in | ||
| /// `dst`. | ||
| fn splat(&mut self, dst: WritableReg, src: RegImm, size: SplatKind) -> Result<()>; | ||
|
|
There was a problem hiding this comment.
Taking a look at the callsites, and trying to think about future backends, I'm tempted to suggest adding a single entry here, splat, defined as
fn splat(&mut self, context: &mut CodeGenContext, kind: SplatKind) {
...
}That's potentially the most flexible approach, and that would potentially make the definition more uniform across backends: as far as I can tell, the only reason why we have a special splat_int is because of the requirements in x64.
For that suggestion to work though, we'd probably need to extend SplatKind to encode the source type, (e.g., SplatKind::I8x16) -- I don't think this changes anything fundamental, since we'd still be able to derive the lane_size.
There was a problem hiding this comment.
The one thing with changing SplatKind to encode an I or F is the v128.load*_splat instructions aren't necessarily operating on integers or floats. It's operating on whatever state was stored in the memory. That said, it should still be fine to have the visitor specify a SplatKind::I32x4 for v128.load32_splat even if it's operating on 4 floats, it'll still have the same result.
There was a problem hiding this comment.
To be explicit, we could rename the current SplatKind to SplatLoadKind and introduce the type-aware-version in SplatKind. That would tighten the implementation and keep the semantics as close as possible to the Wasm semantics.
| match self { | ||
| Imm::I32(n) => n.to_le_bytes().to_vec(), | ||
| Imm::I64(n) => n.to_le_bytes().to_vec(), | ||
| Imm::F32(n) => n.to_le_bytes().to_vec(), | ||
| Imm::F64(n) => n.to_le_bytes().to_vec(), | ||
| Imm::V128(n) => n.to_le_bytes().to_vec(), |
There was a problem hiding this comment.
Could you return a slice here instead? The constant pool will copy the bytes into a Vec<u8>, but returning a slice will at least prevent heap allocations in case this gets used for anything else that's not constant pool related.
There was a problem hiding this comment.
To be more specific, using as_slice() instead of to_vec(), similar to the other uses of add_constant.
There was a problem hiding this comment.
If I use as_slice() and return a &[u8], I get this message from the compiler: cannot return value referencing temporary value. Likely because the array that's created by to_le_bytes() gets dropped when the method returns. The arrays are also of different lengths between the variants. We could maybe use some unsafe code to create a reference to the underlying bytes. Otherwise I'm not sure how to avoid the heap allocation.
There was a problem hiding this comment.
Given that this is not used for anything that the constant pool, I don't think we need to introduce any unsafe code. Perhaps adding a comment stating that this method heap allocates and it's intended to be used mostly with the constant pool?
Part of #8093. Implements the following instructions on x64 with AVX2 support:
i8x16.splati16x8.splati32x4.splati64x2.splatf32x4.splatf64x2.splat