riscv64: Add `vconst` lowerings by afonso360 · Pull Request #6324 · bytecodealliance/wasmtime

afonso360 · 2023-05-02T09:57:32Z

👋 Hey,

This PR does a number of things but the main goal is to enable vconst loading from the constant pool on the RISC-V backend.

- Cleanups to Store/Emit instructions
  - We now fall back to LoadAddr when we can't directly support the AMode
  - Moved the instruction encoding to the encoding.rs file
- Add Constant Pool and MachLabel based AModes
- Clean up the existing VecLoad/VecStore to avoid adding a 0 offset where possible
- Add the vconst lowering

This is quite big, but it made sense to me to have it all together in a PR. Let me know if you'd like me to split this up a bit more.

jameysharp

I have a few questions and suggestions, but if you feel like merging this now I think that would be fine.

jameysharp · 2023-05-02T17:26:02Z

+                        sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelHi20);
+                        let inst = Inst::Auipc {
+                            rd,
+                            imm: Imm20::from_bits(0),
+                        };
+                        inst.emit(&[], sink, emit_info, state);
+
+                        // Emit an add to the address with a relocation.
+                        // This later gets patched up with the correct offset.
+                        sink.use_label_at_offset(sink.cur_offset(), label, LabelUse::PCRelLo12I);
+                        Inst::AluRRImm12 {
+                            alu_op: AluOPRRI::Addi,
+                            rd,
+                            rs: rd.to_reg(),
+                            imm12: Imm12::zero(),
+                        }
+                        .emit(&[], sink, emit_info, state);


Is this correct if the constant ends up being roughly 2048 bytes away, modulo 4096? I'm wondering if the high 20 bits might be off by one sometimes since the two instructions are extracting their 20/12 bits from offsets which differ by 4 bytes.

This comment reminded me that we discussed something almost exactly like this a while ago. And I found it:
#5951 (comment)

The Call relocation in the JIT performs exactly the same arithmetic (it's the same pair of relocations as this one). It differs from what I have here: there, only the Lo12 is offset by 4, not the Hi20, which makes sense.

I'm also going to try to reproduce that off-by-one with a test, I get really confused by this relocation math.

I think I got this right, but I'm never confident when this sort of relocation math is involved.

We had two issues here. The first is the one you pointed out: only the Lo12 relocation should have the 4-byte offset. The second was in the Hi20 relocation, which needs a 0x800 bias so that it rounds up to the next 4 KiB page as soon as the offset goes out of range for Lo12, since the Lo12 immediate is signed.
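For anyone else who gets confused by this math, here is a minimal sketch of the Hi20/Lo12 split described above. The helpers `split_pcrel` and `rebuild` are hypothetical names made up for illustration; they are not the Cranelift `LabelUse` patching code, and the sketch ignores the 4-byte instruction distance entirely. It only shows why the 0x800 bias is needed when the low 12 bits are consumed by a sign-extending `addi`.

```rust
/// Split a PC-relative byte offset into the 20-bit immediate for `auipc`
/// and the 12-bit immediate for the following `addi`.
/// Hypothetical helper for illustration; not the Cranelift API.
fn split_pcrel(offset: i32) -> (u32, u32) {
    let offset = offset as u32;
    // `addi` sign-extends its 12-bit immediate, so bias by 0x800 before
    // taking the upper 20 bits. This rounds to the nearest 4 KiB page.
    let hi20 = (offset.wrapping_add(0x800) >> 12) & 0xF_FFFF;
    let lo12 = offset & 0xFFF;
    (hi20, lo12)
}

/// Recompute the offset the hardware would produce: `auipc` contributes
/// `hi20 << 12`, then `addi` adds the sign-extended `lo12`.
fn rebuild(hi20: u32, lo12: u32) -> i32 {
    let hi = (hi20 << 12) as i32;
    let lo = ((lo12 << 20) as i32) >> 20; // sign-extend 12 bits
    hi.wrapping_add(lo)
}

fn main() {
    for offset in [0x14, 0x7FF, 0x800, 0x801, 0xFFF, 0x1000, -4, -0x800, -0x801] {
        let (hi20, lo12) = split_pcrel(offset);
        assert_eq!(rebuild(hi20, lo12), offset);
    }
    // Without the 0x800 bias, an offset of 0x800 would come back as -0x800.
    assert_eq!(rebuild(0x800u32 >> 12, 0x800 & 0xFFF), -0x800);
}
```

With this split, an offset of 0x14 gives `hi20 = 0` and `lo12 = 0x14`, which has the same shape as the `auipc t6, 0` / `addi t6, t6, 0x14` pair in the test output quoted further down the thread.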

afonso360 · 2023-05-05T14:16:41Z

I rebased this on main, to get #6325 which had some cleanups that were useful for this PR (i.e. the unsigned_field_width function suggested above).

I've left the previous commits intact and only added from 1048da3 onwards.

This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.

jameysharp

I think this is ready to go, but I'm taking this week off and don't want to think quite hard enough to decide that. @elliottt, could you give this a quick pass and see if it makes sense to you too?

jameysharp · 2023-05-08T17:59:01Z

+                // We add 4 here since this relocation usually follows a PCRelHi20 relocation, at the previous
+                // instruction. So we need to account for the 4 byte difference in offsets there.
+                let lo12 = (offset + 4) as u32 & 0xFFF;
+                let insn = (insn & 0xFFFFF) | (lo12 << 20);


This feels backwards to me, but I think it's probably correct.

The instruction using Hi20 comes first, followed by the Lo12. So if the Hi20 label is at address x, then Lo12 is at address x+4.

Based on that, if we want to compute the same address for both labels, I expected that either Hi20 should use x+4, or Lo12 should use x-4.

Of the two instructions, the one which actually examines the program counter is auipc, so I think we need to compute the offset relative to that instruction. So I think you are correct that it's Lo12 that needs adjustment.

But I guess here we aren't given the address of the instruction, right? offset is the distance from this instruction to the address we want, or in other words, target_address - insn_address. So if insn_address is x-4, then offset is target_address - (x - 4), which is equivalent to (target_address - x) + 4.

And that's what you've implemented. If there's a way to make the comment more clear that'd be fantastic, but I'm at least reasonably convinced that this is correct.
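The argument above can be checked with a small model. This is a sketch under stated assumptions: `materialize` is a hypothetical function, not anything in Cranelift, and it assumes each relocation is handed `target - address_of_its_own_instruction` as its offset, with the `addi` sitting 4 bytes after the `auipc`. The `lo12_adjust` parameter stands in for the `+ 4` in the diff.

```rust
/// Sign-extend the low 12 bits of `x`.
fn sext12(x: u32) -> i64 {
    (((x << 20) as i32) >> 20) as i64
}

/// Address produced by an `auipc`/`addi` pair located at `auipc_pc`, when
/// each relocation is patched from the offset between its own instruction
/// and `target`. Hypothetical model for illustration only.
fn materialize(auipc_pc: i64, target: i64, lo12_adjust: i64) -> i64 {
    let addi_pc = auipc_pc + 4;
    let hi_offset = target - auipc_pc; // offset seen by the Hi20 relocation
    let lo_offset = target - addi_pc; // offset seen by the Lo12 relocation
    let hi20 = ((hi_offset + 0x800) >> 12) as u32 & 0xF_FFFF;
    let lo12 = (lo_offset + lo12_adjust) as u32 & 0xFFF;
    // auipc: rd = pc + sext(hi20 << 12); addi: rd = rd + sext(lo12)
    let hi = ((hi20 << 12) as i32) as i64;
    auipc_pc + hi + sext12(lo12)
}

fn main() {
    let pc: i64 = 0x1000;
    for off in [0x14, 0x7FC, 0x800, 0x2000, -0x10] {
        let target = pc + off;
        // With the +4 adjustment both halves describe the same address.
        assert_eq!(materialize(pc, target, 4), target);
        // Without it the result misses the target.
        assert_ne!(materialize(pc, target, 0), target);
    }
    // Near a 2048-byte boundary the unadjusted result is not just 4 bytes
    // short: the two halves disagree about which page they are in.
    assert_eq!(materialize(pc, pc + 0x800, 0), pc + 0x17FC);
}
```

The last assertion is the "high 20 bits off by one" case raised earlier in the thread: without the adjustment the two immediates are derived from offsets that differ by 4 bytes, and near the boundary they land on different sides of the page split.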

Thank you for the additional comment, @afonso360!

elliottt · 2023-05-09T21:44:58Z

+;   .byte 0x57, 0x70, 0x04, 0xcc
+;   auipc t6, 0
+;   addi t6, t6, 0x14
+;   .byte 0x07, 0x85, 0x0f, 0x02
+;   ret


It looks like we're discovering some gaps in capstone :(

Yeah, unfortunately capstone doesn't recognize any V extension instructions 😞
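As a rough sanity check on the raw `.byte` lines, the major opcode field can be pulled out by hand. This is only a sketch: `major_opcode` and `rs1` are hypothetical helpers, and it decodes just two fields rather than the full instruction.

```rust
/// Major opcode field (bits 6:0) of a 32-bit RISC-V instruction.
fn major_opcode(word: u32) -> u32 {
    word & 0x7F
}

/// `rs1` field (bits 19:15) of a 32-bit RISC-V instruction.
fn rs1(word: u32) -> u32 {
    (word >> 15) & 0x1F
}

fn main() {
    // RISC-V instructions are stored little-endian.
    let first = u32::from_le_bytes([0x57, 0x70, 0x04, 0xcc]);
    let second = u32::from_le_bytes([0x07, 0x85, 0x0f, 0x02]);

    // 0x57 is the OP-V major opcode used by the vector extension.
    assert_eq!(major_opcode(first), 0x57);
    // 0x07 is LOAD-FP, which vector loads share with scalar FP loads.
    assert_eq!(major_opcode(second), 0x07);
    // The base register of the second instruction is x31, i.e. t6,
    // the register written by the preceding auipc/addi pair.
    assert_eq!(rs1(second), 31);
}
```

So both undecoded words sit in opcode space used by the V extension, and the second one reads its address from `t6`, which is consistent with a vector load of the constant.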

elliottt

This looks good to me, just want to confirm that these offsets are ignored on purpose though :)

elliottt

Looks good to me, thank you @afonso360!

* riscv64: Use Vector Regclass
* riscv64: Add assert to `Inst::Mov`
  It isn't ready yet
* riscv64: Add SIMD vconst large test
  This was meant to exercise the changes in #6324 but was failing in RISC-V due to some missing regalloc bits.
* riscv64: Restrict spill slot size
* riscv64: Mark v0 as preferred
* riscv64: Const compute clobbers

afonso360 requested a review from a team as a code owner May 2, 2023 09:57

afonso360 requested review from jameysharp and removed request for a team May 2, 2023 09:57

github-actions Bot added the cranelift Issues related to the Cranelift code generator label May 2, 2023

jameysharp approved these changes May 2, 2023

alexcrichton mentioned this pull request May 3, 2023

riscv64: Add VecALUImm instruction format #6325

Merged

afonso360 added 13 commits May 5, 2023 11:08

riscv64: Use LoadAddr on Load/Store

867551a

riscv64: Add I Type encoding

9b9268d

riscv64: Add S Type encoding

e420731

riscv64: Use LoadAddr on VecLoad/VecStore

ac5fbd3

riscv64: Add Const/Lable AModes

16687fa

riscv64: Add Label Address Generation

0105eb8

riscv64: Add vconst support

df1bed7

riscv64: Use unsigned_field_width in encode

1048da3

riscv64: Use WritableReg in encode

5d1f8fe

riscv64: Deduplicate AMode formatting

59b0983

riscv64: Refcator VectorLoad/Store AMode Pattern matching

f3f2fd1

riscv64: Avoid passing fp and sp through the register allocator

c2a89d5

riscv64: Fix PCRel{Hi20,Lo12I} relocation

fabce61

afonso360 force-pushed the riscv-vec-vconst branch from eb0cdee to fabce61 Compare May 5, 2023 14:03

afonso360 added a commit to afonso360/wasmtime that referenced this pull request May 5, 2023

riscv64: Add SIMD vconst large test

220f1ca

This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.

afonso360 added a commit to afonso360/wasmtime that referenced this pull request May 5, 2023

riscv64: Add SIMD vconst large test

09a9210

This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.

jameysharp reviewed May 8, 2023

riscv64: Update PCRelLo12I Comment

d8ae950

elliottt reviewed May 9, 2023

Comment thread cranelift/codegen/src/isa/riscv64/inst/emit.rs

Comment thread cranelift/codegen/src/isa/riscv64/inst/emit.rs

elliottt approved these changes May 9, 2023

afonso360 added this pull request to the merge queue May 9, 2023

Merged via the queue into bytecodealliance:main with commit b9e4474 May 9, 2023

afonso360 deleted the riscv-vec-vconst branch May 9, 2023 23:29

afonso360 added a commit to afonso360/wasmtime that referenced this pull request May 10, 2023

riscv64: Add SIMD vconst large test

546045a

This was meant to exercise the changes in bytecodealliance#6324 but was failing in RISC-V due to some missing regalloc bits.

afonso360 mentioned this pull request May 10, 2023

riscv64: Use Vector RegClass for Vectors #6366

Merged


4 participants
