Change the default FMA codegen to be of 231 form instead of 213 form by hanblee · Pull Request #25387 · dotnet/coreclr

hanblee · 2019-06-25T18:59:08Z

The 231 form is likely to be the most prevalent, e.g., x = Fma.MultiplyAdd(a, b, x), and using it removes the need for a redundant register-to-register move.

Current codegen for the following code:

static unsafe float fmaTest2(float *b, float *c, int d)
{
    Vector256<float> tmp = Vector256<float>.Zero;
    for (int k = 0; k + 8 <= d; k += 8)
    {
        Vector256<float> v1 = Avx.LoadVector256(b + k);
        Vector256<float> v2 = Avx.LoadVector256(c + k);
        tmp = Fma.MultiplyAdd(Avx.Multiply(v2, v2), Avx.Multiply(v1, v1), tmp);
    }

    return tmp.ToScalar();
}

is

G_M8770_IG03:
       4C63C8               movsxd   r9, eax
       C4A17C100C89         vmovups  ymm1, ymmword ptr[rcx+4*r9]
       C4A17C10148A         vmovups  ymm2, ymmword ptr[rdx+4*r9]
       C5EC59D2             vmulps   ymm2, ymm2, ymm2
       C5F459C9             vmulps   ymm1, ymm1, ymm1
       C4E26DA8C8           vfmadd213ps ymm1, ymm2, ymm0
       C5FC28C1             vmovaps  ymm0, ymm1
       83C008               add      eax, 8
       448D4808             lea      r9d, [rax+8]
       453BC8               cmp      r9d, r8d
       7ED4                 jle      SHORT G_M8770_IG03

With this PR:

G_M8770_IG03:
       4C63C8               movsxd   r9, eax
       C4A17C100C89         vmovups  ymm1, ymmword ptr[rcx+4*r9]
       C4A17C10148A         vmovups  ymm2, ymmword ptr[rdx+4*r9]
       C5EC59D2             vmulps   ymm2, ymm2, ymm2
       C5F459C9             vmulps   ymm1, ymm1, ymm1
       C4E275B8C2           vfmadd231ps ymm0, ymm1, ymm2
       83C008               add      eax, 8
       448D4808             lea      r9d, [rax+8]
       453BC8               cmp      r9d, r8d
       7ED8                 jle      SHORT G_M8770_IG03
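
To make the difference between the two listings concrete, here is a minimal C++ sketch of the data flow. It is not JIT code. It assumes the usual FMA3 operand semantics (213: dst = src2 * dst + src3; 231: dst = src2 * src3 + dst) and models each register as a single float. `RegFile`, `fmadd213`, `fmadd231`, `accumulateWith213`, `accumulateWith231`, and `checkFormsAgree` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Toy register file: index 0, 1, 2 stand in for ymm0, ymm1, ymm2.
// One float per register is enough to show the data flow.
using RegFile = std::array<float, 3>;

// Assumed 213 semantics: dst = src2 * dst + src3 (dst is overwritten).
inline void fmadd213(RegFile& r, int dst, int src2, int src3)
{
    r[dst] = std::fma(r[src2], r[dst], r[src3]);
}

// Assumed 231 semantics: dst = src2 * src3 + dst (dst is the addend).
inline void fmadd231(RegFile& r, int dst, int src2, int src3)
{
    r[dst] = std::fma(r[src2], r[src3], r[dst]);
}

// tmp = a * b + tmp with tmp in reg 0, a in reg 1, b in reg 2.
// Returns the number of modeled instructions.
inline int accumulateWith213(RegFile& r)
{
    fmadd213(r, 1, 2, 0); // reg1 = reg2 * reg1 + reg0
    r[0] = r[1];          // extra copy back into the accumulator
    return 2;
}

inline int accumulateWith231(RegFile& r)
{
    fmadd231(r, 0, 1, 2); // reg0 = reg1 * reg2 + reg0
    return 1;
}

// Sanity check: both sequences leave the same value in the accumulator.
inline void checkFormsAgree()
{
    RegFile viaForm213{1.0f, 2.0f, 3.0f};
    RegFile viaForm231 = viaForm213;
    accumulateWith213(viaForm213);
    accumulateWith231(viaForm231);
    assert(viaForm213[0] == viaForm231[0]);
}
```

Both helpers leave the same value in register 0, but the 213 path needs the extra copy that appears as vmovaps in the first listing.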


hanblee · 2019-06-25T19:00:32Z

@tannergooding PTAL

tannergooding · 2019-06-25T19:52:55Z

@hanblee, do you have any numbers/metrics showing that 231 is more common than 213 across most workloads?

I had done some initial analysis of various C/C++ algorithms and raw assembly using vfmadd (for example, many of the CRT Math functions), and 213 looked to be more common by far (namely because the addend tended to be some well-defined constant). It also requires the fewest transformations to the tree.

I would much rather see this fixed, longer term, by having the register allocator support preferencing better and also support marking more than one operand as reg-optional.

Also CC. @CarolEidt

tannergooding · 2019-06-25T19:54:28Z

For this case in particular, if none of the operands can be contained, it would be ideal if we could choose the encoding based on whether the target register matches one of the input registers.
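
The idea of picking the encoding from the target register can be sketched as follows. This is only an illustration under stated assumptions: registers are plain integers, no operand is contained, and the operation is a packed multiply-add, so swapping the two multiplicands is safe. `FmaForm`, `FmaEncoding`, and `chooseFmaEncoding` are hypothetical names, not the JIT's actual types or functions.

```cpp
#include <cassert>

// Hypothetical model; not the JIT's data structures.
enum class FmaForm { Form213, Form231 };

struct FmaEncoding
{
    FmaForm form;
    int     dst;       // first operand, also the destination register
    int     src2;      // second operand
    int     src3;      // third operand
    bool    needsMove; // true if an input must first be copied into dst
};

// Encodes target = a * b + c, where all arguments are register numbers.
// Assumed semantics: 213 is dst = src2 * dst + src3,
//                    231 is dst = src2 * src3 + dst.
inline FmaEncoding chooseFmaEncoding(int target, int a, int b, int c)
{
    assert(target >= 0 && a >= 0 && b >= 0 && c >= 0);

    if (target == c)
    {
        // Target is the addend: 231 accumulates in place.
        return {FmaForm::Form231, c, a, b, false};
    }
    if (target == a)
    {
        // Target is the first multiplicand: 213 overwrites it.
        return {FmaForm::Form213, a, b, c, false};
    }
    if (target == b)
    {
        // Target is the second multiplicand: swap the multiplicands
        // (packed forms only) and use 213.
        return {FmaForm::Form213, b, a, c, false};
    }
    // No input matches: copy a into target, then use 213 on target.
    return {FmaForm::Form213, target, b, c, true};
}
```

For the loop in the PR description (target and addend both in register 0), this selection yields the 231 form with no move, while a call whose target matches a multiplicand keeps the 213 form.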

hanblee · 2019-06-25T21:05:50Z

@tannergooding

Here is one view, based on GitHub search result counts for the four instructions in the column headers below. The 231 form is slightly (~10%) more common than the 213 form.

| Languages | vfmadd213pd | vfmadd213ps | vfmadd231pd | vfmadd231ps |
| --- | ---: | ---: | ---: | ---: |
| Unix Assembly | 7,960 | 7,650 | 7,050 | 6,042 |
| LLVM | 7,163 | 7,955 | 3,982 | 4,981 |
| Text | 5,646 | 29,298 | 5,763 | 29,413 |
| C | 2,950 | 2,979 | 4,553 | 4,376 |
| D | 1,787 | 1,799 | 1,799 | 1,799 |
| Makefile | 1,771 | 1,783 | 1,784 | 1,783 |
| C++ | 1,459 | 10,274 | 10,052 | 15,054 |
| PHP | 1,293 | 1,301 | 1,300 | 1,301 |
| HTML | 557 | 542 | 532 | 553 |
| Assembly/Python | 526 | 430 | 543 | 733 |
| Total | 31,112 | 64,011 | 37,358 | 66,035 |
| Total (excluding text/html) | 24,909 | 34,171 | 31,063 | 36,069 |
| Total (pd + ps) | | 95,123 | | 103,393 |
| Total (pd + ps; excluding text/html) | | 59,080 | | 67,132 |
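
As a quick check of the "~10%" figure, the relative difference can be computed from the totals in the table. The counts below are copied from the table; `percentMore` and the constant names are hypothetical, introduced only for this sketch.

```cpp
#include <cassert>

// Totals copied from the table (pd + ps for each form).
constexpr double kAll213  = 31112 + 64011; // 95,123
constexpr double kAll231  = 37358 + 66035; // 103,393
constexpr double kCode213 = 24909 + 34171; // 59,080 (excluding text/html)
constexpr double kCode231 = 31063 + 36069; // 67,132 (excluding text/html)

// Percentage by which count231 exceeds count213.
inline double percentMore(double count231, double count213)
{
    assert(count213 > 0.0);
    return (count231 / count213 - 1.0) * 100.0;
}
```

`percentMore(kAll231, kAll213)` comes out at about 8.7%, and `percentMore(kCode231, kCode213)` at about 13.6%, so "~10%" is a fair summary of both views.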

tannergooding · 2019-06-25T22:10:52Z

-        op1Reg = op1->gtRegNum;
-        op2Reg = op2->gtRegNum;
-
-        isCommutative = !copiesUpperBits;


I'm pretty sure we need to keep the isCommutative handling for the cases where the multiplicand and multiplier can be swapped.

hanblee · 2019-06-26

I missed that. Thanks.

tannergooding · 2019-06-25T22:11:59Z

-                {
-                    // 213 form: op1 = (op2 * op1) + op3
-
-                    if (copiesUpperBits)


Similar comment, pretty sure we need to preserve this logic since op1 impacts determinism.
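
The two review comments above can be illustrated with a small model. This is a sketch under the assumption that copiesUpperBits refers to the scalar FMA forms, where only the lowest element is computed and the remaining elements are copied from the first operand. `Vec4`, `multiplyAddPacked`, `multiplyAddScalar`, and `checkOperandOrder` are hypothetical names, not JIT or .NET APIs.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Packed multiply-add: every lane computes a * b + c, so swapping
// the multiplicand and multiplier cannot change the result.
inline Vec4 multiplyAddPacked(const Vec4& a, const Vec4& b, const Vec4& c)
{
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = std::fma(a[i], b[i], c[i]);
    }
    return r;
}

// Scalar multiply-add (assumed semantics): lane 0 computes a * b + c,
// lanes 1..3 are copied from the first operand a.
inline Vec4 multiplyAddScalar(const Vec4& a, const Vec4& b, const Vec4& c)
{
    Vec4 r = a;
    r[0] = std::fma(a[0], b[0], c[0]);
    return r;
}

// Sanity check: swapping a and b is harmless for the packed form but
// changes the upper lanes of the scalar form.
inline void checkOperandOrder()
{
    const Vec4 a{2.0f, 10.0f, 20.0f, 30.0f};
    const Vec4 b{3.0f, 40.0f, 50.0f, 60.0f};
    const Vec4 c{1.0f, 1.0f, 1.0f, 1.0f};
    assert(multiplyAddPacked(a, b, c) == multiplyAddPacked(b, a, c));
    assert(multiplyAddScalar(a, b, c) != multiplyAddScalar(b, a, c));
}
```

In this model the packed form is commutative in its first two operands, which is what lets codegen swap them, while the scalar form is not, because the first operand also supplies the upper elements of the result.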


CarolEidt · 2019-06-26T01:10:30Z

-                {
-                    // 213 form: op1 = (op2 * op1) + op3
-
-                    if (copiesUpperBits)


For clarity as well, I think it makes sense to continue to keep the reg-only case separate, even if it's using the same form as the case above.

CarolEidt · 2019-06-26T01:18:05Z

I'm not sure this is adequately motivated. In any event, it should not be too hard to at least:

- improve the preferencing somewhat
- modify codegen to change the form based on the target register if possible

Supporting multiple reg-optional operands is a rather complex change (though it would have broad benefit), so it is not likely to happen soon.

hanblee · 2019-06-26T16:35:12Z

Thanks for your comments. I agree on the need for a more comprehensive solution, so I am closing this.

CarolEidt · 2019-06-26T16:55:18Z

Thanks @hanblee for raising this issue. I've filed https://github.com/dotnet/coreclr/issues/25434 to make the improvements that have been discussed.

Change the default FMA codegen to be of 231 form

338d4cf

since it is likely to be most prevalent. E.g.,x = Fma.MultiplyAdd(a, b, x);

jkotas added the area-CodeGen label Jun 25, 2019

tannergooding reviewed Jun 25, 2019


CarolEidt suggested changes Jun 26, 2019


hanblee closed this Jun 26, 2019

hanblee deleted the fmadefault branch June 28, 2019 16:02

CarolEidt mentioned this pull request Jan 31, 2020

Improve preferencing and code generation for FMA dotnet/runtime#12984

Closed

Jflaurendeau mentioned this pull request Sep 9, 2024

FusedMultiplyAdd (FMA) default to vfmadd213, even in situation where 231 (I think) should be preferred dotnet/runtime#107538

Open

hanblee wants to merge 1 commit into dotnet:master from hanblee:fmadefault
