Skip to content

Enable simd_extmul_* for AArch64#3070

Merged
cfallin merged 3 commits into
bytecodealliance:mainfrom
sparker-arm:simd-extmul-aarch64
Jul 28, 2021
Merged

Enable simd_extmul_* for AArch64#3070
cfallin merged 3 commits into
bytecodealliance:mainfrom
sparker-arm:simd-extmul-aarch64

Conversation

@sparker-arm
Copy link
Copy Markdown
Contributor

Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to
an imul node.

This also reorganises the aarch64 'long mul' operations and their assembly.

Copyright (c) 2021, Arm Limited.

@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:wasm labels Jul 8, 2021
@sparker-arm sparker-arm force-pushed the simd-extmul-aarch64 branch from 00b264c to ff2cd0f Compare July 9, 2021 09:28
@akirilov-arm akirilov-arm requested a review from cfallin July 14, 2021 17:44
@cfallin
Copy link
Copy Markdown
Member

cfallin commented Jul 14, 2021

@sparker-arm @akirilov-arm I'm currently out on vacation until Mon Jul 26; I can review this when I'm back if you prefer, or perhaps one of @abrown or @alexcrichton could review?

@akirilov-arm
Copy link
Copy Markdown
Contributor

@cfallin I pinged you because historically you had done most of the AArch64-related reviews (and Sam was actually unable to request a review from anybody), but another reviewer would be absolutely fine.

Copy link
Copy Markdown
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty reasonable to me!

Umull32,
/// Unsigned multiply add long
Umlal8,
Umlal16,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this and Umlal8 aren't actually generated in any lowerings (unless I'm missing something), but do you want to leave these in for completeness with possible future lowerings?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly - thought it would be weird to just port the 32-bit version.

ra: zero_reg(),
});
} else if ty.is_vector() {
for ext_op in &[
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible way to structure this is to perhaps check for is_vector and i64x2 early on at the top of this block with early-returns if they match, and if that fails all the remaining cases share the logic of

                    let rn = put_input_in_reg(ctx, inputs[0], NarrowValueMode::None);
                    let rm = put_input_in_reg(ctx, inputs[1], NarrowValueMode::None);
                    let rd = get_output_reg(ctx, outputs[0]).only_reg().unwrap();

as a prefix.

Just a thought though, happy to defer to you who work on this much more than I!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even though I don't like the duplication, or how much conditional stuff is going on here, we would only be able to factor out one collection of these calls as the I128 case uses, ever so slightly, different calls to handle multiple registers. So I think I prefer the clarity.

Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Just one request for a doc-comment below.

None
}

pub(crate) fn match_vec_long_mul<C: LowerCtx<I = Inst>>(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a doc-comment here describing the return tuple? E.g. it's not clear to me reading at this point what the bool denotes, until I notice the low/high pattern below in the implementation.

Lower simd_extmul_[low/high][signed/unsigned] to [s|u]widen inputs to
an imul node.

Copyright (c) 2021, Arm Limited.
Copyright (c) 2021, Arm Limited.
And removed an accidental code move.

Copyright (c) 2021, Arm Limited.
@sparker-arm sparker-arm force-pushed the simd-extmul-aarch64 branch from ff2cd0f to 5eb2dca Compare July 28, 2021 12:18
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cfallin cfallin merged commit 323197e into bytecodealliance:main Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:wasm cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants