
Cranelift: Utilize AMode::RegScaled on aarch64 #6743

Closed
maekawatoshiki wants to merge 1 commit into bytecodealliance:main from maekawatoshiki:uint/dev

Conversation

@maekawatoshiki
Contributor

Summary

@maekawatoshiki maekawatoshiki requested a review from a team as a code owner July 18, 2023 15:51
@maekawatoshiki maekawatoshiki requested review from elliottt and removed request for a team July 18, 2023 15:51
@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. labels Jul 18, 2023
@elliottt
Member

@alexcrichton or @cfallin, could you take a look at this?

@alexcrichton
Member

Sure, I can help review. Looking over this, I can't help but feel that a bit of a refactor is in order (a preexisting issue, not a new one), as I find the code a bit hard to follow. This feels similar to the addressing computation that historically happened in x64 but that I changed a while back, where it was recursive and walked through an entire iadd chain to compute the final amode. To me the problem here is that all add operands are collected into a vector, but if the vector holds more than one or two elements, everything falls back to the normal operands and amodes instead.

Instead, I liked the approach in the x64 backend, where you start with an AMode and then add items to it, possibly extending the amode. This is the to_amode_add function in the x64 lowering, where the amode is iteratively modified as far as possible through successive "peeling" of add operands. This avoided the x64-specific problem of recursing and blowing the stack on large iadd chains, but I think it would be appropriate for aarch64 as well, since it avoids collecting large 32/64-bit operand lists and collects only precisely what's needed (in theory).
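To make the "peeling" idea above concrete, here is a hypothetical, self-contained Rust sketch of iteratively folding add operands into an amode with an explicit worklist. The types and names (AMode, Value, to_amode) are illustrative stand-ins, not Cranelift's actual backend API.

```rust
// Illustrative stand-in for an aarch64-style addressing mode with at most
// two register slots. Not the real Cranelift AMode.
#[derive(Debug, PartialEq)]
enum AMode {
    UnsignedOffset { base: u32, off: u32 }, // base register + constant offset
    RegReg { base: u32, index: u32 },       // base register + index register
}

// A simplified value tree: just registers and iadds.
enum Value {
    Reg(u32),
    Iadd(Box<Value>, Box<Value>),
}

// Fold operands into the amode one at a time, using an explicit worklist
// instead of recursion so large iadd chains cannot blow the stack.
fn to_amode(root: Value) -> (AMode, Vec<u32>) {
    let mut leftover = Vec::new(); // operands that did not fit the amode
    let mut amode: Option<AMode> = None;
    let mut worklist = vec![root];
    while let Some(v) = worklist.pop() {
        match v {
            Value::Iadd(a, b) => {
                // Peel the add: its operands go back on the worklist.
                worklist.push(*a);
                worklist.push(*b);
            }
            Value::Reg(r) => {
                amode = Some(match amode.take() {
                    None => AMode::UnsignedOffset { base: r, off: 0 },
                    Some(AMode::UnsignedOffset { base, off: 0 }) => {
                        AMode::RegReg { base, index: r }
                    }
                    // No free slot left: the caller must materialize an
                    // explicit add for this operand.
                    Some(full) => {
                        leftover.push(r);
                        full
                    }
                });
            }
        }
    }
    (amode.expect("at least one operand"), leftover)
}
```

The key property is that only registers that actually fit the amode are kept; everything else is handed back to the caller, so no intermediate operand vector grows with the chain length.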

Now, I realize the refactoring may be a bit bigger than desired for this PR. I bring it up because this PR has a few bugs, which I'll list below. I think the bugs stem from the fact that currently the only pattern matched is "a bunch of adds", whereas the pattern matching here should only match a single shift in a very specific location, as opposed to any location.

Multiple shifted operands

function %a(i64, i64, i64) -> i32 {
block0(v0: i64, v1: i64, v2: i64):
  v3 = iconst.i64 3
  v4 = ishl v0, v3
  v5 = ishl v1, v3
  v6 = iadd v4, v5
  v7 = iadd v6, v2
  v8 = load.i32 v7
  return v8
}

yields:

$ cargo run -q compile foo.clif --target aarch64
thread 'main' panicked at 'assertion failed: shifted.is_none()', cranelift/codegen/src/isa/aarch64/lower.rs:303:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

A single shifted operand

function %a(i64) -> i32 {
block0(v0: i64):
  v1 = iconst.i64 3
  v2 = ishl v0, v1
  v3 = load.i32 v2
  return v3
}

yields:

$ cargo run -q compile foo.clif --target aarch64
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `32`,
 right: `64`', cranelift/codegen/src/isa/aarch64/inst/emit.rs:1030:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Shift not actually taken into account

function %a(i64, i32, i64) -> i32 {
block0(v0: i64, v1: i32, v2: i64):
  v3 = uextend.i64 v1
  v4 = iadd v0, v3
  v5 = ishl_imm v2, 2
  v6 = iadd v4, v5
  v7 = load.i32 v6
  return v7
}

yields:

$ cargo run -q compile foo.clif --target aarch64 -D
.byte 0, 72, 97, 184, 192, 3, 95, 214

Disassembly of 8 bytes:
   0:   00 48 61 b8             ldr     w0, [x0, w1, uxtw]
   4:   c0 03 5f d6             ret

(note that processing of x2 is missing here)

There are also a number of other places where shifted is taken into account in only one case rather than all of them, so the fall-through paths ignore shifted in more cases than just this one.


Those are some examples that are currently incorrect with this PR, and personally I don't see an "easy fix" with a small piece of handling here or there. That's why I think it may be best to refactor this amode computation to be simpler (probably in ISLE rather than Rust), after which I think it would be less brittle to add a special case. For example, a to_amode_add function for AArch64 could pattern-match an existing amode of UnsignedOffset with offset 0 together with a shift operation to produce RegScaled. Once RegScaled is produced, however, no further amode massaging would be done.

I could very well be wrong though and there may be an easy fix to these issues.
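The special case described above could be sketched roughly as follows. This is a hypothetical illustration; the type and function names (AMode, Value, to_amode_add) are stand-ins, not the real ISLE/Rust helpers in the backend.

```rust
// Illustrative stand-in for the aarch64 addressing modes involved.
#[derive(Debug, PartialEq)]
enum AMode {
    UnsignedOffset { base: u32, off: u32 },
    RegReg { base: u32, index: u32 },
    RegScaled { base: u32, index: u32, shift: u8 },
}

#[derive(Debug)]
enum Value {
    Reg(u32),
    IshlImm(u32, u8), // a register shifted left by an immediate
}

// Try to extend `amode` with one more operand; on failure, hand both back
// so the caller can materialize an explicit add instead.
fn to_amode_add(amode: AMode, v: Value) -> Result<AMode, (AMode, Value)> {
    match (amode, v) {
        // The only place a shift may be folded in: a bare base, offset 0.
        (AMode::UnsignedOffset { base, off: 0 }, Value::IshlImm(r, s)) => {
            Ok(AMode::RegScaled { base, index: r, shift: s })
        }
        (AMode::UnsignedOffset { base, off: 0 }, Value::Reg(r)) => {
            Ok(AMode::RegReg { base, index: r })
        }
        // Once RegScaled (or any fuller form) exists, no further massaging.
        (a, v) => Err((a, v)),
    }
}
```

A real implementation would also need to check that the shift amount matches the loaded type's size (a scaled ldr of an i32 requires a shift of 2), which is exactly what the second panic above is guarding against.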

@alexcrichton alexcrichton requested review from alexcrichton and removed request for elliottt July 20, 2023 12:14
@maekawatoshiki
Contributor Author

I appreciate your detailed comment.

I totally agree that we need to refactor this amode computation; I thought so when I first read the code. The bugs you mentioned are also really helpful (and I should have tested such cases).

I plan on adding ISLE rules that replace the existing amode computation.

@maekawatoshiki
Contributor Author

I tried adding ISLE rules that replace the existing lower_address, referring to #5986.
But I found this is hard for me to finish, so perhaps we should open an issue so that someone else can pick it up :)

@alexcrichton
Member

I've made a first attempt in #6805 to move lower_address into ISLE




Development

Successfully merging this pull request may close these issues.

Cranelift: Generate load/store using AMode::RegScaled on aarch64

3 participants