Winch: v128 logical ops for x64#10109
Conversation
3b7585c to
d11724d
Compare
|
Moving my review over to @saulecabrera who knows this better than I |
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "winch"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
|
||
| pub fn emit_load_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> { | ||
| let dst = self.context.pop_to_reg(self.masm, None)?; | ||
| if let Some(addr) = self.emit_compute_heap_address(&arg, size)? { | ||
| let src = self.masm.address_at_reg(addr, 0)?; | ||
| self.masm.load_lane(writable!(dst.reg), src, lane, size)?; | ||
| self.context.stack.push(dst.into()); | ||
| self.context.free_reg(addr); | ||
| } | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| pub fn emit_store_lane(&mut self, arg: &MemArg, lane: u8, size: OperandSize) -> Result<()> { | ||
| let src = self.context.pop_to_reg(self.masm, None)?; | ||
| if let Some(addr) = self.emit_compute_heap_address(&arg, size)? { | ||
| let dst = self.masm.address_at_reg(addr, 0)?; | ||
| self.masm.store_lane(src.reg, dst, lane, size)?; | ||
| self.context.free_reg(addr); | ||
| self.context.free_reg(src); | ||
| } | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
Instead of introducing new load/stored methods, I'd suggest calling wasm_load and wasm_store. This approach would probably require modifying the existing LoadKind definition in case the current definition can't represent the semantics of lane loading/storing. That's the approach we've followed for other vector loads (e.g., load_splat)
1 main reason:
- Wasm loads/stores are critical, from the sandboxing perspective, and we're trying to keep the implementation as tight as possible, reusing the existing methods makes auditing them and maintaining them much easier.
| fn not128v(&mut self, dst: WritableReg) -> Result<()>; | ||
|
|
||
| /// Perform a logical `and` operation on `src1` and `src1`, both 128bits vector values, writing | ||
| /// the result to `dst`. | ||
| fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
|
||
| /// Perform a logical `and_not` operation on `src1` and `src1`, both 128bits vector values, writing | ||
| /// the result to `dst`. | ||
| /// | ||
| /// `and_not` is not commutative: dst = !src1 & src2. | ||
| fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
|
||
| /// Perform a logical `or` operation on `src1` and `src1`, both 128bits vector values, writing | ||
| /// the result to `dst`. | ||
| fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
|
||
| /// Perform a logical `xor` operation on `src1` and `src1`, both 128bits vector values, writing | ||
| /// the result to `dst`. | ||
| fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()>; | ||
|
|
||
| /// Given two 128bits vectors `src1` and `src2`, and a 128bits bitmask `mask`, selects bits | ||
| /// from `src1` when mask is 1, and from `src2` when mask is 0. | ||
| /// | ||
| /// This is equivalent to: `v128.or(v128.and(src1, mask), v128.and(src2, v128.not(mask)))`. | ||
| fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()>; | ||
|
|
||
| /// If any bit in `src` is 1, set `dst` to 1, or 0 otherwise. | ||
| fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()>; |
There was a problem hiding this comment.
A small remark on naming: to be consistent with the rest of the naming in the MacroAssembler for type-specific instructions, perhaps we could consider prefixing all these methods with v128_? e.g., v128_or, similar to float operations, e.g., float_abs
| fn not128v(&mut self, dst: WritableReg) -> Result<()> { | ||
| let tmp = regs::scratch_xmm(); | ||
| // First, we initialize `tmp` with all ones, by comparing it with itself. | ||
| self.asm | ||
| .xmm_rmi_rvex(AvxOpcode::Vpcmpeqd, tmp, tmp, writable!(tmp)); | ||
| // then we `xor` tmp and `dst` together, yielding `!dst`. | ||
| self.asm | ||
| .xmm_rmi_rvex(AvxOpcode::Vpxor, tmp, dst.to_reg(), dst); | ||
| Ok(()) | ||
| } | ||
|
|
||
| fn and128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
| self.asm.xmm_rmi_rvex(AvxOpcode::Vpand, src1, src2, dst); | ||
| Ok(()) | ||
| } | ||
|
|
||
| fn and_not128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
| self.asm.xmm_rmi_rvex(AvxOpcode::Vpandn, src1, src2, dst); | ||
| Ok(()) | ||
| } | ||
|
|
||
| fn or128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
| self.asm.xmm_rmi_rvex(AvxOpcode::Vpor, src1, src2, dst); | ||
| Ok(()) | ||
| } | ||
|
|
||
| fn xor128v(&mut self, src1: Reg, src2: Reg, dst: WritableReg) -> Result<()> { | ||
| self.asm.xmm_rmi_rvex(AvxOpcode::Vpxor, src1, src2, dst); | ||
| Ok(()) | ||
| } | ||
|
|
||
| fn bitselect128v(&mut self, src1: Reg, src2: Reg, mask: Reg, dst: WritableReg) -> Result<()> { | ||
| let tmp = regs::scratch_xmm(); | ||
| self.and128v(src1, mask, writable!(tmp))?; | ||
| self.and_not128v(mask, src2, dst)?; | ||
| self.or128v(dst.to_reg(), tmp, dst)?; | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| fn any_true128v(&mut self, src: Reg, dst: WritableReg) -> Result<()> { | ||
| self.asm.xmm_vptest(src, src); | ||
| self.asm.setcc(IntCmpKind::Ne, dst); | ||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
We need to check that the host/target supports AVX for all of these methods:
if !self.flags.has_avx() {
bail!(CodeGenError::UnimplementedForNoAvx);
}db0aedd to
8264ff8
Compare
|
@saulecabrera it should be good now. I have taken the opportunity to tighten store/loads even more regarding atomics operation. Having the |
|
There's a conflict, once resolved, we can land this one. |
8264ff8 to
4629f6a
Compare
|
done |
|
looks like a spurious failure? @saulecabrera |
|
it's a bit hidden (alas) but I think that failure was non-spurious: https://github.com/bytecodealliance/wasmtime/actions/runs/13016537423/job/36307106911 |
|
Yeah, doesn't seem to be spurious. You need to update the spec tests here https://github.com/bytecodealliance/wasmtime/blob/main/crates/wast-util/src/lib.rs#L492 to include the ones that you enabled, so that it's marked as expected that these tests should fail on architectures that don't support AVX+. I believe this is mostly for for darwin x86_64, which runs via Rosetta and therefore there's no AVX support. |
|
I think I have fixed it, but I don't have a mac to try it right now. Thanks! |
| } | ||
| } | ||
| LoadKind::VectorLane(LaneSelector { lane, size }) => { | ||
| let byte_tmp = regs::scratch(); |
There was a problem hiding this comment.
The failures in CI are related to the vpinsr*instructions, which are emitted here. We need to add self.ensure_has_avx()? to get CI green.
There was a problem hiding this comment.
I setup the action on my fork, all tests pass now. worry for the back and forth 😓
implement
v128simd logical operations:v128.notv128.andv128.andnotv128.orv128.xorv128.bitselectv128.any_truev128.load8_lanev128.load16_lanev128.load32_lanev128.load64_lanev128.store8_lanev128.store16_lanev128.store32_lanev128.store64_lane#8093