riscv64: Add vconst lowerings#6324
Conversation
jameysharp
left a comment
There was a problem hiding this comment.
I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.
| sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelHi20); | ||
| let inst = Inst::Auipc { | ||
| rd, | ||
| imm: Imm20::from_bits(0), | ||
| }; | ||
| inst.emit(&[], sink, emit_info, state); | ||
|
|
||
| // Emit an add to the address with a relocation. | ||
| // This later gets patched up with the correct offset. | ||
| sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelLo12I); | ||
| Inst::AluRRImm12 { | ||
| alu_op: AluOPRRI::Addi, | ||
| rd, | ||
| rs: rd.to_reg(), | ||
| imm12: Imm12::zero(), | ||
| } | ||
| .emit(&[], sink, emit_info, state); |
There was a problem hiding this comment.
Is this correct if the constant ends up being roughly 2048 bytes away, modulo 4096? I'm wondering if the high 20 bits might be off by one sometimes since the two instructions are extracting their 20/12 bits from offsets which differ by 4 bytes.
There was a problem hiding this comment.
This comment reminded me that we discussed something almost exactly like this a while ago. And I found it:
#5951 (comment)
The Call Relocation in the JIT performs exactly the same arithmetic (It's the same pair of relocations as this). And it is different from this one, we only offset the Lo12 by 4 not the Hi20, which makes sense.
I'm also going to try to reproduce that off-by-one with a test, I get really confused by this relocation math.
There was a problem hiding this comment.
I think I got this right, but I'm never confident when this sort of relocation math is involved.
We had 2 issues here. The first one you pointed out, only the Lo12 relocation should have the 4byte offset. The second one was the Hi20 relocation, that needs to get a 0x800 offset so that it skips to the next page as soon as the offset goes out of range for Lo12 since it is signed.
eb0cdee to
fabce61
Compare
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
jameysharp
left a comment
There was a problem hiding this comment.
I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?
| // We add 4 here since this relocation usually follows a PCRelHi20 relocation, at the previous | ||
| // instruction. So we need to account for the 4 byte difference in offsets there. | ||
| let lo12 = (offset + 4) as u32 & 0xFFF; | ||
| let insn = (insn & 0xFFFFF) | (lo12 << 20); |
There was a problem hiding this comment.
This feels backwards to me, but I think it's probably correct.
The instruction using Hi20 comes first, followed by the Lo12. So if the Hi20 label is at address x, then Lo12 is at address x+4.
Based on that, if we want to compute the same address for both labels, I expected that either Hi20 should use x+4, or Lo12 should use x-4.
Of the two instructions, the one which actually examines the program counter is auipc, so I think we need to compute the offset relative to that instruction. So I think you are correct that it's Lo12 that needs adjustment.
But I guess here we aren't given the address of the instruction, right? offset is the distance from this instruction to the address we want, or in other words, target_address - insn_address. So if insn_address is x-4, then offset is target_address - (x - 4), which is equivalent to (target_address - x) + 4.
And that's what you've implemented. If there's a way to make the comment more clear that'd be fantastic, but I'm at least reasonably convinced that this is correct.
| ; .byte 0x57, 0x70, 0x04, 0xcc | ||
| ; auipc t6, 0 | ||
| ; addi t6, t6, 0x14 | ||
| ; .byte 0x07, 0x85, 0x0f, 0x02 | ||
| ; ret |
There was a problem hiding this comment.
It looks like we're discovering some gaps in capstone :(
There was a problem hiding this comment.
Yeah, unfortunately capstone doesn't recognize any V extension instructions 😞
elliottt
left a comment
There was a problem hiding this comment.
This looks good to me, just want to confirm that these offsets are ignored on purpose though :)
elliottt
left a comment
There was a problem hiding this comment.
Looks good to me, thank you @afonso360!
This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.
* riscv64: Use Vector Regclass * riscv64: Add assert to `Inst::Mov` It isn't ready yet * riscv64: Add SIMD vconst large test This was meant to exercise the changes in #6324 but was failing in RISC-V due to some missing regalloc bits. * riscv64: Restrict spill slot size * riscv64: Mark v0 as preferred * riscv64: Const compute clobbers
👋 Hey,
This PR does a number of things but the main goal is to enable
vconstloading from the constant pool on the RISC-V backend.Store/EmitinstructionsLoadAddrwhen we can't directly support theAModeencoding.rsfileAMode'sVecLoad/VecStoreto avoid adding a 0 offset where possiblevconstloweringThis is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.