riscv64: Implement iadd_pairwise #6568
    if op.forbids_src_dst_overlaps() {
        collector.reg_late_use(vd_src);
    } else {
        collector.reg_use(vd_src);
    }

    collector.reg_reuse_def(vd, 1); // `vd` == `vd_src`.
The vslideup instruction both modifies the destination register and requires that none of the input registers be the same as the destination register.
I've used reg_late_use and reg_reuse_def to model this (and it seems to work), but this always emits a move before the instruction, even when the register is otherwise unused. There's probably a better way of expressing this, but I'm not sure what it is.
I'm not sure I understand the intended semantics here: src and dst (vd_src and vd, rather) must not overlap, but vd is also a reuse of vd_src? Or is the non-overlap constraint between vs2 and vd?
Oops, I could have written that better. vd must be the same as vd_src (since they are encoded in the same field), and none of the other inputs (vs2 and vm) may be the same register as vd.
vm (the mask register) is slightly hidden here since it's not always applicable, but when it is, it's a fixed_use(v0).
Ah, OK. The right way to encode that is probably to use a late-use on vs2 (and on vm when present), rather than on vd_src, I think.
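To make the suggestion above concrete, here is a toy model of why a late use on vs2 (and vm) keeps those inputs out of vd's register. This is only a sketch: the `Pos` and `Use` types and the `overlap_violations` function are hypothetical and are not regalloc2's API. The assumption being modeled is that a late use is still live while the result is written, so it may not share the def's register, whereas an early use may.

```rust
// Toy model only: these types are hypothetical and are not regalloc2's API.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Pos {
    Early, // read before the instruction writes its results
    Late,  // still live while the results are written
}

#[derive(Clone, Copy, Debug)]
struct Use {
    name: &'static str,
    pos: Pos,
    preg: u8, // physical register chosen for this input
}

/// Returns the names of inputs that illegally share `vd`'s register.
/// Only late uses are checked: an early use is dead by the time `vd`
/// is written, so an allocator is free to put it in the same register.
/// `vd_src` is not listed in `inputs` because `vd` is a reuse-def of it.
fn overlap_violations(vd_preg: u8, inputs: &[Use]) -> Vec<&'static str> {
    inputs
        .iter()
        .filter(|u| u.pos == Pos::Late && u.preg == vd_preg)
        .map(|u| u.name)
        .collect()
}
```

In this model, marking vs2 as an early use lets it land in vd's register without complaint, which is exactly the overlap vslideup forbids; marking it as a late use makes the same assignment show up as a violation.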
👍 I'm not entirely sure I did the right thing. I also couldn't find anything along the lines of reg_fixed_late_use for vm. Does regalloc support encoding multiple constraints by calling reg_fixed_use and reg_late_use on the same VReg?
Yep, it should be possible to do that -- if it's not on the OperandCollector API we can add it. The early/late ("position"), use/def ("kind"), and fixed/any/reg/stack ("constraint") are all orthogonal fields of the Operand.
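As a rough illustration of the orthogonality described above, the following simplified model keeps position, kind, and constraint as three independent fields, so a "fixed late use" is just one combination of them. All names here (`Pos`, `Kind`, `Constraint`, `Operand`, `fixed_late_use`) are assumptions made for the sketch and do not reproduce regalloc2's actual definitions or the OperandCollector API.

```rust
// Simplified, hypothetical model of an operand; not regalloc2's real types.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Pos {
    Early,
    Late,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Use,
    Def,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Constraint {
    Any,
    Reg,
    Stack,
    Fixed(u8), // a specific physical register, e.g. v0
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Operand {
    vreg: u32,
    constraint: Constraint,
    kind: Kind,
    pos: Pos,
}

/// Hypothetical helper: a use pinned to one physical register that also
/// stays live through the late point. Because the three fields are
/// independent, no special-case operand type is needed.
fn fixed_late_use(vreg: u32, preg: u8) -> Operand {
    Operand {
        vreg,
        constraint: Constraint::Fixed(preg),
        kind: Kind::Use,
        pos: Pos::Late,
    }
}
```

Under these assumptions, the mask operand would be expressed as something like `fixed_late_use(vm, 0)`, combining the v0 constraint with the late position in a single operand.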
This looks good to me; I'd just like @cfallin to review the regalloc bits and then approve this if that part looks okay.
239e348 to e1aeee9
👋 Hey,
This PR implements both the move instruction for vector registers and iadd_pairwise.
We can't really implement iadd_pairwise in the best way possible, since that requires supporting LMUL > 1, which dynamically changes how many registers are available. (At least as far as I know; if regalloc supports this, it would be nice to start using it.)
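For context on why LMUL > 1 changes the number of available registers: with an integer LMUL, the 32 vector registers are grouped, and a group's base register number must be a multiple of LMUL. The sketch below only illustrates that arithmetic; the function names are made up for this example, and fractional LMUL settings are deliberately left out.

```rust
// Illustrative arithmetic only; function names are hypothetical.

const NUM_VREGS: u32 = 32;

/// Number of usable register groups for an integer LMUL (1, 2, 4, or 8).
/// Returns None for any other value (fractional LMUL is not modeled here).
fn usable_groups(lmul: u32) -> Option<u32> {
    match lmul {
        1 | 2 | 4 | 8 => Some(NUM_VREGS / lmul),
        _ => None,
    }
}

/// Whether `reg` may serve as the base register of a group at this LMUL:
/// the register number has to be a multiple of LMUL.
fn is_valid_group_base(reg: u32, lmul: u32) -> bool {
    usable_groups(lmul).is_some() && reg < NUM_VREGS && reg % lmul == 0
}
```

So at LMUL = 2 only the 16 even-numbered registers can name an operand, and at LMUL = 8 only v0, v8, v16, and v24 can, which is the kind of dynamically changing register set the allocator would need to understand.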