Winch: i8x16.shuffle for x64 with AVX #9959

Merged
saulecabrera merged 10 commits into bytecodealliance:main from jeffcharles:simd-shuffle on Jan 15, 2025

@jeffcharles (Contributor) commented Jan 9, 2025

Part of #8093. Implements i8x16.shuffle on x64 with AVX extensions.
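For context, here is a scalar reference model of what `i8x16.shuffle` computes, written from the WebAssembly SIMD semantics rather than taken from this PR. The instruction takes two 16-byte vectors and 16 immediate lane indices; validation requires every index to be below 32, where indices 0..16 pick a byte from the first operand and 16..32 pick a byte from the second. The function name `i8x16_shuffle` is a hypothetical helper used only for illustration.

```rust
/// Scalar reference model of Wasm `i8x16.shuffle` (illustrative sketch,
/// not Winch code). `lanes` holds the 16 immediate lane indices.
fn i8x16_shuffle(a: [u8; 16], b: [u8; 16], lanes: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, &lane) in lanes.iter().enumerate() {
        // Wasm validation guarantees every lane index is < 32.
        assert!(lane < 32, "shuffle lane index out of range");
        // Indices 0..16 select from `a`; indices 16..32 select from `b`.
        out[i] = if lane < 16 {
            a[lane as usize]
        } else {
            b[(lane - 16) as usize]
        };
    }
    out
}

fn main() {
    let a: [u8; 16] = std::array::from_fn(|i| i as u8); // 0..=15
    let b: [u8; 16] = std::array::from_fn(|i| 100 + i as u8); // 100..=115
    // Interleave the low eight bytes of `a` and `b`.
    let lanes = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23];
    println!("{:?}", i8x16_shuffle(a, b, lanes));
}
```

Any fixed permutation (byte reversal, interleaving, broadcasting one lane) is just a different `lanes` immediate, which is why a backend can precompute masks from `lanes` at compile time.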

@jeffcharles requested review from a team as code owners January 9, 2025 16:39
@jeffcharles requested review from fitzgen and removed request for a team January 9, 2025 16:39
@jeffcharles changed the title "i8x16.shuffle for x64 with AVX512" to "Winch: i8x16.shuffle for x64 with AVX512" Jan 9, 2025
@saulecabrera (Member)

I can take this review as well.

@saulecabrera requested review from saulecabrera and removed request for a team and fitzgen January 9, 2025 17:23
@github-actions (bot) added the "winch" label (Winch issues or pull requests) Jan 9, 2025
@github-actions (bot) commented Jan 9, 2025

Subscribe to Label Action

cc @saulecabrera

This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

  • saulecabrera: winch

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

@jeffcharles changed the title "Winch: i8x16.shuffle for x64 with AVX512" to "Winch: i8x16.shuffle for x64 with AVX" Jan 13, 2025
Review comment on winch/codegen/src/isa/x64/masm.rs (outdated)
Comment on lines +1263 to +1290
    if self.flags.has_avx() {
        // Use `vpshufb` with masks derived from `lanes` to shuffle `lhs` and
        // `rhs` separately: each destination lane receives either the byte
        // selected from that operand or 0.
        // Then use `vpor` to combine the two results into `dst`.
        // Setting the most significant bit of a mask lane to 1 sets the
        // corresponding lane in the destination register to 0; 0x80 is the
        // byte with only the most significant bit set.
        let mut mask_lhs: [u8; 16] = [0x80; 16];
        let mut mask_rhs: [u8; 16] = [0x80; 16];
        for i in 0..lanes.len() {
            if lanes[i] < 16 {
                mask_lhs[i] = lanes[i];
            } else {
                mask_rhs[i] = lanes[i] - 16;
            }
        }
        let mask_lhs = self.asm.add_constant(&mask_lhs);
        let mask_rhs = self.asm.add_constant(&mask_rhs);

        self.asm.xmm_vpshufb_rrm(dst, lhs, &mask_lhs);
        let scratch = writable!(regs::scratch_xmm());
        self.asm.xmm_vpshufb_rrm(scratch, rhs, &mask_rhs);
        self.asm.vpor(dst, dst.to_reg(), scratch.to_reg());
    } else {
        bail!(CodeGenError::UnimplementedForNoAvx)
    }
    Ok(())
}
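The lowering in the snippet above can be checked with a scalar model. The sketch below assumes the documented x86 behavior of a 128-bit `vpshufb`: a mask byte with its top bit set produces 0, otherwise its low four bits index into the source. It builds the two masks the same way the reviewed code does, ORs the two shuffled results as `vpor` would, and compares against a direct definition of the shuffle. All function names here (`pshufb`, `build_masks`, `lowered_shuffle`, `reference_shuffle`) are hypothetical and not Winch APIs.

```rust
/// Scalar model of a 128-bit `vpshufb`: a mask byte with its most
/// significant bit set yields 0; otherwise its low 4 bits select from `src`.
fn pshufb(src: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| {
        if mask[i] & 0x80 != 0 {
            0
        } else {
            src[(mask[i] & 0x0F) as usize]
        }
    })
}

/// Mirrors the reviewed mask construction: indices 0..16 go into
/// `mask_lhs`, indices 16..32 go (minus 16) into `mask_rhs`, and every
/// other position stays 0x80 so `pshufb` zeroes it.
fn build_masks(lanes: [u8; 16]) -> ([u8; 16], [u8; 16]) {
    let mut mask_lhs = [0x80u8; 16];
    let mut mask_rhs = [0x80u8; 16];
    for (i, &lane) in lanes.iter().enumerate() {
        if lane < 16 {
            mask_lhs[i] = lane;
        } else {
            mask_rhs[i] = lane - 16;
        }
    }
    (mask_lhs, mask_rhs)
}

/// vpshufb(lhs), vpshufb(rhs), then vpor. In each lane exactly one mask
/// is not 0x80, so the OR never mixes bytes from both operands.
fn lowered_shuffle(lhs: [u8; 16], rhs: [u8; 16], lanes: [u8; 16]) -> [u8; 16] {
    let (mask_lhs, mask_rhs) = build_masks(lanes);
    let from_lhs = pshufb(lhs, mask_lhs);
    let from_rhs = pshufb(rhs, mask_rhs);
    std::array::from_fn(|i| from_lhs[i] | from_rhs[i])
}

/// Direct definition of `i8x16.shuffle`, used as the oracle.
fn reference_shuffle(lhs: [u8; 16], rhs: [u8; 16], lanes: [u8; 16]) -> [u8; 16] {
    std::array::from_fn(|i| {
        let lane = lanes[i] as usize;
        if lane < 16 { lhs[lane] } else { rhs[lane - 16] }
    })
}

fn main() {
    let lhs: [u8; 16] = std::array::from_fn(|i| 1 + i as u8); // 1..=16
    let rhs: [u8; 16] = std::array::from_fn(|i| 0xA0 + i as u8); // 0xA0..=0xAF
    // A simple linear congruential generator produces lane patterns in 0..32.
    let mut state: u32 = 0x1234_5678;
    for _ in 0..1000 {
        let lanes: [u8; 16] = std::array::from_fn(|_| {
            state = state.wrapping_mul(1_103_515_245).wrapping_add(12_345);
            ((state >> 16) % 32) as u8
        });
        assert_eq!(lowered_shuffle(lhs, rhs, lanes), reference_shuffle(lhs, rhs, lanes));
    }
    println!("vpshufb + vpor model matches i8x16.shuffle on 1000 patterns");
}
```

The key property is that the two masks are complementary: every lane that one `vpshufb` fills with a real byte, the other fills with 0, so a plain bitwise OR merges them without needing a blend instruction.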
Member


One small suggestion to improve readability: since the `then` branch is fairly long, could we invert the check and return early when there's no AVX support?

if !self.flags.has_avx() {
  bail!(...);
}

// ....
Ok(())

@saulecabrera added this pull request to the merge queue Jan 15, 2025
Merged via the queue into bytecodealliance:main with commit ba950f2 Jan 15, 2025

Labels

winch (Winch issues or pull requests)

2 participants