Add simd_extmul_* support for x64#3084
Conversation
ba3d91f to
71574cd
Compare
71574cd to
6ccd1d8
Compare
abrown
left a comment
There was a problem hiding this comment.
This looks good to me with the minor comments about inlining and reordering the lowerings. @akirilov-arm, the code_translator.rs part should look OK for the aarch64 backend, right?
| insn: swiden1_high, | ||
| input: 0, | ||
| }, | ||
| ]; |
There was a problem hiding this comment.
Not sure why swiden_input is needed: couldn't the InsnInputs be created inline?
| dst, | ||
| )); | ||
| } | ||
| _ => panic!("Unsupported extmul_low_signed type"), |
There was a problem hiding this comment.
Initially I thought that this matching was too greedy--that it would match sequences that it shouldn't be and then crash because the types are wrong. I think now that it is OK because swiden_high only really allows the following types: I8X16, I16X8, I32X4. I think it might be worthwhile to mention this assumption in some comments at the top so that future readers aren't confused.
v0 = swiden.i64
| let ty = ty.unwrap(); | ||
| if ty == types::I64X2 { | ||
|
|
||
| // First check for ext_mul_* instructions. Where possible ext_mul_* lowerings |
There was a problem hiding this comment.
I think the order of the lowerings should be reversed: scalar lowerings first, then vector lowerings, then pattern matching lowerings. As it stands in this PR, every imul.i64 is going to require a bunch of matching calls and type comparisons before it ever gets lowered. I think scalar multiplication is going to be the most common, then plain vector multiplication, so I would think we would want to make those common paths shorter. (Not sure exactly how much this affects compile time but hopefully you see what I mean).
There was a problem hiding this comment.
Sure .. I agree. That was my original thought but I never reverted the initial implementation. Upon looking at this again I see why I naturally put the extmul first. It is because when we share all these instructions with imul the current checks such as types::I64x2, and ty.lane_count() > 1 are entered before we have a chance to check for an opcode source. I think the best we can do is have a matches_input_any at the top and have a branch from there. That is what I will do before merging. I supposed if there is another/better solution we can refactor later.
There was a problem hiding this comment.
Also note, the code must remain at the top as the fall through target since I am using "if let Some() .." to check for op sources and there is no if not let Some() ..". Still there is just one branch to get to the other lowerings.
|
@abrown I will let @sparker-arm comment because he has done the same changes in PR #3070. |
384b69b to
02abbe8
Compare
No description provided.