
aarch64: Add support for the fmls instruction #5895

Merged
alexcrichton merged 1 commit into bytecodealliance:main from alexcrichton:fmls on Mar 2, 2023


@alexcrichton (Member)

This commit adds lowerings to the AArch64 backend for the `fmls` instruction, which is intended to be leveraged in the relaxed-simd proposal for WebAssembly. This should hopefully allow for teeny-bit-more efficient codegen for this operator than using the `fmla` instruction plus a negation instruction.
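For reference, a minimal sketch of the per-lane semantics involved, written as plain Rust scalar models rather than Cranelift code. The function names `fmla`, `fneg`, and `fmls` are illustrative stand-ins for the AArch64 instructions on a 4 x f32 vector, and `f32::mul_add` stands in for the hardware's fused multiply-add (one rounding). The point it shows: `fmls` computes `acc - n * m` in a single instruction and yields the same result as negating `n` first and then issuing `fmla`.

```rust
// Scalar models of three AArch64 vector instructions on 4 x f32 lanes.
// These illustrate the architectural semantics; they are not Cranelift code.

/// `fmla`-like: acc + n * m in every lane, fused (rounded once).
fn fmla(acc: [f32; 4], n: [f32; 4], m: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| n[i].mul_add(m[i], acc[i]))
}

/// `fneg`-like: flips the sign of every lane (exact, no rounding).
fn fneg(v: [f32; 4]) -> [f32; 4] {
    v.map(|x| -x)
}

/// `fmls`-like: acc - n * m in every lane, fused. The negation of `n`
/// happens inside the one instruction, so no separate `fneg` is needed.
fn fmls(acc: [f32; 4], n: [f32; 4], m: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| (-n[i]).mul_add(m[i], acc[i]))
}
```

With `acc = [10.0; 4]`, `n = [1.0, 2.0, 3.0, 4.0]`, and `m = [2.0; 4]`, both `fmls(acc, n, m)` and `fmla(acc, fneg(n), m)` give `[8.0, 6.0, 4.0, 2.0]`; the saving is the extra `fneg` instruction, not a different result.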

@github-actions (bot) added the cranelift and cranelift:area:aarch64 labels on Feb 28, 2023
@jameysharp (Contributor) left a comment


Nice!

```
      (vec_rrr_mod (VecALUModOp.Fmls) z x y (vector_size ty)))

;; fma(x, fneg(y), z) = z - x * y: a single fmls with z as the accumulator
(rule 2 (lower (has_type ty @ (multi_lane _ _) (fma x (fneg y) z)))
      (vec_rrr_mod (VecALUModOp.Fmls) z x y (vector_size ty)))
```
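The two rules can be read as one pattern match: an `fneg` on either multiplicand of `fma` is absorbed into `fmls`, and the accumulator `z` becomes the register that `fmls` modifies. The Rust sketch below models that decision with hypothetical `Expr` and `Lowered` types; it is not Cranelift's ISLE machinery, and it resolves overlapping cases simply by the order of the `match` arms.

```rust
/// Hypothetical, simplified operand tree; not Cranelift's IR.
#[derive(Debug, PartialEq)]
enum Expr {
    Var(&'static str),
    Fneg(Box<Expr>),
}

/// Hypothetical choice of AArch64 instruction for `fma(x, y, z) = x * y + z`.
#[derive(Debug, PartialEq)]
enum Lowered {
    /// fmla: acc + n * m
    Fmla { acc: Expr, n: Expr, m: Expr },
    /// fmls: acc - n * m
    Fmls { acc: Expr, n: Expr, m: Expr },
}

/// Mirrors the two ISLE rules: an `fneg` on either multiplicand is folded
/// into `fmls`, and `z` becomes the accumulator that the instruction modifies.
fn lower_fma(x: Expr, y: Expr, z: Expr) -> Lowered {
    match (x, y) {
        // fma(fneg x, y, z) = z - x * y
        (Expr::Fneg(x), y) => Lowered::Fmls { acc: z, n: *x, m: y },
        // fma(x, fneg y, z) = z - x * y
        (x, Expr::Fneg(y)) => Lowered::Fmls { acc: z, n: x, m: *y },
        // no negation to absorb: plain fused multiply-add
        (x, y) => Lowered::Fmla { acc: z, n: x, m: y },
    }
}
```

In this sketch, when both multiplicands are negated the first arm still fires and leaves the inner `fneg` on `y` to be computed separately, which is exactly the `fneg` + `fmls` sequence the review comment points out.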
Contributor


I suppose if both `x` and `y` are `fneg`, then this can emit `fmla` instead of `fneg` + `fmls`, right? But I guess that's a rewrite we ought to do in the egraph optimizations instead.
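Why the double negation can become a plain `fmla`: under IEEE 754 the sign of a product is the XOR of the operand signs, so `(-x) * (-y)` has exactly the same value as `x * y`, and one fused rounding of `+ z` then gives bit-identical results. A small Rust check of that claim follows, using `f32::mul_add` as a stand-in for a single vector lane; the sample values are arbitrary, and NaN inputs are deliberately left out.

```rust
/// fma(fneg x, fneg y, z) evaluated literally: negate both multiplicands,
/// then one fused multiply-add (a single rounding).
fn fma_both_negated(x: f32, y: f32, z: f32) -> f32 {
    (-x).mul_add(-y, z)
}

/// The suggested replacement: a plain fused multiply-add, x * y + z,
/// which is what a single `fmla` lane computes.
fn fma_plain(x: f32, y: f32, z: f32) -> f32 {
    x.mul_add(y, z)
}
```

Comparing `to_bits()` rather than `==` also checks that signed zeros and infinities from overflow come out the same in both forms.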

Member Author


Indeed! The x64 rules actually end up implementing that (their structure lets them switch back and forth between the forms), but it wasn't as obvious to do here: x64 uses a helper that also manages sinking a load, which adds a fair number of permutations.

I'll send a follow-up which implements the egraph optimization.

@alexcrichton alexcrichton added this pull request to the merge queue Mar 2, 2023
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 2, 2023
This implements comments from bytecodealliance#5895 to cancel out `fneg` operations in
`fma` instructions. Additional support for `fmul` is added as well.
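A rough sketch of the cancellation this follow-up describes, written as a hypothetical bottom-up rewrite over a toy expression tree. In Cranelift the mid-end rewrites are written in ISLE and operate on its e-graph, which records both forms as equivalent rather than destructively replacing one; this Rust version only illustrates the two rewrites `fma(fneg a, fneg b, c) => fma(a, b, c)` and `fmul(fneg a, fneg b) => fmul(a, b)`.

```rust
/// Hypothetical toy expression tree; not Cranelift's IR or e-graph.
#[derive(Debug, PartialEq)]
enum Expr {
    Var(&'static str),
    Fneg(Box<Expr>),
    Fmul(Box<Expr>, Box<Expr>),
    Fma(Box<Expr>, Box<Expr>, Box<Expr>), // x * y + z
}

/// Bottom-up rewrite that cancels a pair of negated multiplicands:
///   fma(fneg a, fneg b, c) => fma(a, b, c)
///   fmul(fneg a, fneg b)   => fmul(a, b)
/// Unlike an e-graph, this destructively replaces the original form.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Var(name) => Expr::Var(name),
        Expr::Fneg(a) => Expr::Fneg(Box::new(simplify(*a))),
        Expr::Fmul(a, b) => match (simplify(*a), simplify(*b)) {
            // (-a) * (-b) == a * b exactly, so both negations can go.
            (Expr::Fneg(a), Expr::Fneg(b)) => Expr::Fmul(a, b),
            (a, b) => Expr::Fmul(Box::new(a), Box::new(b)),
        },
        Expr::Fma(a, b, c) => {
            let c = Box::new(simplify(*c));
            match (simplify(*a), simplify(*b)) {
                (Expr::Fneg(a), Expr::Fneg(b)) => Expr::Fma(a, b, c),
                (a, b) => Expr::Fma(Box::new(a), Box::new(b), c),
            }
        }
    }
}
```

A single `fneg` on one multiplicand is left in place here, since the backend already lowers that case to `fmls`.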
Merged via the queue into bytecodealliance:main with commit 9984e95 Mar 2, 2023
@alexcrichton alexcrichton deleted the fmls branch March 2, 2023 06:51
alexcrichton added a commit that referenced this pull request Mar 2, 2023