Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Change the default FMA codegen to be of 231 form instead of 213 form#25387

Closed
hanblee wants to merge 1 commit intodotnet:masterfrom
hanblee:fmadefault
Closed

Change the default FMA codegen to be of 231 form instead of 213 form#25387
hanblee wants to merge 1 commit intodotnet:masterfrom
hanblee:fmadefault

Conversation

@hanblee
Copy link
Copy Markdown

@hanblee hanblee commented Jun 25, 2019

as it is likely to be most prevalent. E.g., x = Fma.MultiplyAdd(a, b, x), and it removes the need for the redundant register to register move.

Current codegen for the following code:

static unsafe float fmaTest2(float *b, float *c, int d)
{
    Vector256<float> tmp = Vector256<float>.Zero;
    for (int k = 0; k + 8 <= d; k += 8)
    {
        Vector256<float> v1 = Avx.LoadVector256(b + k);
        Vector256<float> v2 = Avx.LoadVector256(c + k);
        tmp = Fma.MultiplyAdd(Avx.Multiply(v2, v2), Avx.Multiply(v1, v1), tmp);
    }

    return tmp.ToScalar();
}

is

G_M8770_IG03:
       4C63C8               movsxd   r9, eax
       C4A17C100C89         vmovups  ymm1, ymmword ptr[rcx+4*r9]
       C4A17C10148A         vmovups  ymm2, ymmword ptr[rdx+4*r9]
       C5EC59D2             vmulps   ymm2, ymm2, ymm2
       C5F459C9             vmulps   ymm1, ymm1, ymm1
       C4E26DA8C8           vfmadd213ps ymm1, ymm2, ymm0
       C5FC28C1             vmovaps  ymm0, ymm1
       83C008               add      eax, 8
       448D4808             lea      r9d, [rax+8]
       453BC8               cmp      r9d, r8d
       7ED4                 jle      SHORT G_M8770_IG03

With this PR:

G_M8770_IG03:
       4C63C8               movsxd   r9, eax
       C4A17C100C89         vmovups  ymm1, ymmword ptr[rcx+4*r9]
       C4A17C10148A         vmovups  ymm2, ymmword ptr[rdx+4*r9]
       C5EC59D2             vmulps   ymm2, ymm2, ymm2
       C5F459C9             vmulps   ymm1, ymm1, ymm1
       C4E275B8C2           vfmadd231ps ymm0, ymm1, ymm2
       83C008               add      eax, 8
       448D4808             lea      r9d, [rax+8]
       453BC8               cmp      r9d, r8d
       7ED8                 jle      SHORT G_M8770_IG03

since it is likely to be most prevalent. E.g.,x = Fma.MultiplyAdd(a, b, x);
@hanblee
Copy link
Copy Markdown
Author

hanblee commented Jun 25, 2019

@tannergooding PTAL

@tannergooding
Copy link
Copy Markdown
Member

@hanblee, do you have any numbers/metrics showing that 231 is more common than 213 across most workloads?

I had done some initial analysis of various C/C++ algorithms and raw assembly using vfmadd (for example, many of the CRT Math functions) and it looked to be that 213 was more common by far (namely because the addend tended to be some well-defined constant). It also requires the fewest transformations to the tree.

I would much rather see this fixed, longer term, by having the register allocator support preferencing better and for it to also support indicating more than one operand can be reg optional.

Also CC. @CarolEidt

@tannergooding
Copy link
Copy Markdown
Member

For this case in particular, if none of the operands can be contained, it would be ideal if we could choose the encoding based on whether the target register matches one of the input registers.

@hanblee
Copy link
Copy Markdown
Author

hanblee commented Jun 25, 2019

@tannergooding

Here is one view based on GitHub search results for the following four columns. 231 form is slightly (~10%) more common than 213 form.

Languages vfmadd213pd vfmadd213ps vfmadd231pd vfmadd231ps
Unix Assembly 7,960 7,650 7,050 6,042
LLVM 7,163 7,955 3,982 4,981
Text 5,646 29,298 5,763 29,413
C 2,950 2,979 4,553 4,376
D 1,787 1,799 1,799 1,799
Makefile 1,771 1,783 1,784 1,783
C++ 1,459 10,274 10,052 15,054
PHP 1,293 1,301 1,300 1,301
HTML 557 542 532 553
Assembly/Python 526 430 543 733
Total 31,112 64,011 37,358 66,035
Total (excluding text/html) 24,909 34,171 31,063 36,069
Total (pd + ps)   95,123   103,393
Total (pd + ps; excluding text/html)   59,080   67,132

op1Reg = op1->gtRegNum;
op2Reg = op2->gtRegNum;

isCommutative = !copiesUpperBits;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we need to keep the isCommutative handling for the cases where the multiplicand and multiplier can be swapped.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that. Thanks.

Comment thread src/jit/lsraxarch.cpp
{
// 213 form: op1 = (op2 * op1) + op3

if (copiesUpperBits)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment, pretty sure we need to preserve this logic since op1 impacts determinism.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity as well, I think it makes sense to continue to keep the reg-only case separate, even if it's using the same form as the case above.

Comment thread src/jit/lsraxarch.cpp
{
// 213 form: op1 = (op2 * op1) + op3

if (copiesUpperBits)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity as well, I think it makes sense to continue to keep the reg-only case separate, even if it's using the same form as the case above.

@CarolEidt
Copy link
Copy Markdown

I'm not sure this is adequately motivated.
In any event, it should not be too hard to at least

  • improve the preferencing somewhat
  • modify codegen to change the form based on the target register if possible

Supporting multiple reg-optional operands is a rather complex change (though it would have broad benefit), so it is not likely to happen soon.

@hanblee
Copy link
Copy Markdown
Author

hanblee commented Jun 26, 2019

Thanks for your comments, and I agree on the need for a more comprehensive solutions. Closing this as a result.

@hanblee hanblee closed this Jun 26, 2019
@CarolEidt
Copy link
Copy Markdown

Thanks @hanblee for raising this issue. I've filed https://github.com/dotnet/coreclr/issues/25434 to make the improvements that have been discussed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants