Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in aarch64.
aarch64 has the following rules that check if the negated value of a constant fits in an immediate (Imm12), doing a corresponding switch between add and subtract:
;; Same as the previous special case, except we can switch the subtraction to an
;; addition if the negated immediate fits in 12 bits.
(rule 2 (lower (has_type (fits_in_64 ty) (isub x (imm12_from_negated_value y))))
(add_imm ty x y))
;; Same as the previous special cases, except we can switch the addition to a
;; subtraction if the negated immediate fits in 12 bits.
(rule 2 (lower (has_type (fits_in_64 ty) (iadd x (imm12_from_negated_value y))))
(sub_imm ty x y))
(rule 3 (lower (has_type (fits_in_64 ty) (iadd (imm12_from_negated_value x) y)))
(sub_imm ty y x))
These rules should work for all fits_in_64 types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit in Imm12).
In particular, our prototype verifier found that for i16 and i32, these rules only apply if the operand extracted with imm12_from_negated_value is exactly 0, rather than values in the range Cranelift lowers to Imm12: constants with negated values val: (val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0)).
For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses sub for i8 and i64 in this case and otherwise keeps a mov and add.
What I think is the bug
As discussed in #5700, Clif semantics are that narrow (i8, i64, i32) iconst are stored in an underlying u64 that is zero-extended, not sign-extended, from the narrow bits.
imm12_from_negated_value negates this zero-extended value as an i64 then checks whether that fits into an immediate:
fn imm12_from_negated_u64(&mut self, n: u64) -> Option<Imm12> {
Imm12::maybe_from_u64((n as i64).wrapping_neg() as u64)
}
Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned < 0xfff or < 0xfff_000.
When handling narrow iconst values, imm12_from_negated_value should negate the narrow value then zero extend in this case. In general, ISLE terms operating on the u64 may need to see the u32, u16, etc, instead. I'll work on a PR for at least fixing imm12_from_negated_value.
.clif Test Case
imm12-64.clif
test interpret
test run
target aarch64
function %a(i64) -> i64 {
block0(v0: i64):
v1 = iconst.i64 -1
v3 = iadd v0, v1
return v3
}
; run: %a(0) == -1
Because the .clif parser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:
imm12-32.clif
test interpret
test run
target aarch64
function %a(i32) -> i32 {
block0(v0: i32):
v1 = iconst.i32 4294967295
v3 = iadd v0, v1
return v3
}
; run: %a(0) == -1
Steps to Reproduce
cargo run -- compile --target aarch64 -D imm12-64.clif
cargo run -- compile --target aarch64 -D imm12-32.clif
cargo run -- compile --target aarch64 -D imm12-16.clif
cargo run -- compile --target aarch64 -D imm12-8.clif
Expected Results
This function gets rewritten to:
in all cases.
Actual Results
cargo run -- compile --target aarch64 -D imm12-64.clif
.byte 0, 4, 0, 209, 192, 3, 95, 214
Disassembly of 8 bytes:
0: 00 04 00 d1 sub x0, x0, #1 <-- good use of immediate
4: c0 03 5f d6 ret
cargo run -- compile --target aarch64 -D imm12-32.clif
.byte 2, 0, 128, 18, 0, 0, 2, 11, 192, 3, 95, 214
Disassembly of 12 bytes:
0: 02 00 80 12 mov w2, #-1
4: 00 00 02 0b add w0, w0, w2 <-- no immediate
8: c0 03 5f d6 ret
cargo run -- compile --target aarch64 -D imm12-16.clif
.byte 226, 255, 159, 82, 0, 0, 2, 11, 192, 3, 95, 214
Disassembly of 12 bytes:
0: e2 ff 9f 52 mov w2, #0xffff
4: 00 00 02 0b add w0, w0, w2 <-- no immediate
8: c0 03 5f d6 ret
cargo run -- compile --target aarch64 -D imm12-8.clif
.byte 0, 252, 3, 17, 192, 3, 95, 214
Disassembly of 8 bytes:
0: 00 fc 03 11 add w0, w0, #0xff <-- good use of immediate because #0xff is small
4: c0 03 5f d6 ret
Versions and Environment
Cranelift version or commit: c4a2c1e
Operating system: macOS
Architecture: aarch64
Our prototype verifier seems to have found a completeness/performance (not correctness) bug in the handling of some narrow immediate values in
aarch64.aarch64has the following rules that check if the negated value of a constant fits in an immediate (Imm12), doing a corresponding switch between add and subtract:These rules should work for all
fits_in_64types (i8, i16, i32, i64), but they seem to only work as intended for i64 (they aren't stricly needed for i8, since the versions of these rules without negation trigger there since i8 values always fit inImm12).In particular, our prototype verifier found that for i16 and i32, these rules only apply if the operand extracted with
imm12_from_negated_valueis exactly 0, rather than values in the range Cranelift lowers toImm12: constants with negated valuesval:(val < 0xfff) || (val < 0xfff_000 && (val & 0xfff == 0)).For example, adding a constant -1 to a value should always be able to be lowered to subtracting an immediate 1 from the value, but as shown in the test cases below, clif only uses
subfor i8 and i64 in this case and otherwise keeps amovandadd.What I think is the bug
As discussed in #5700, Clif semantics are that narrow (i8, i64, i32)
iconstare stored in an underlyingu64that is zero-extended, not sign-extended, from the narrow bits.imm12_from_negated_valuenegates this zero-extended value as an i64 then checks whether that fits into an immediate:Any narrow negative constant, after being zero extended, has left-filled zeros, so negating the value no longer produces a constant that is unsigned
< 0xfffor< 0xfff_000.When handling narrow iconst values,
imm12_from_negated_valueshould negate the narrow value then zero extend in this case. In general, ISLE terms operating on theu64may need to see theu32,u16, etc, instead. I'll work on a PR for at least fixingimm12_from_negated_value..clifTest Caseimm12-64.clifBecause the
.clifparser currently sign-extends instead of zero-extends values, for narrow constants I use the decimal -1 value as an unsigned u64 to construct the iconst:imm12-32.clifSteps to Reproduce
Expected Results
This function gets rewritten to:
in all cases.
Actual Results
Versions and Environment
Cranelift version or commit: c4a2c1e
Operating system: macOS
Architecture: aarch64