[TRITON] fix: MXFP4 mantissa rounding #975
Conversation
cc @vgokhale
Hi @hann-wang, I couldn't replicate what you said about denormal numbers in #974. From my testing, the current AITER kernel rounds one way [comparison omitted in the original thread] while your implementation rounds another [comparison omitted]. They differ only for |x| = 0.25, where your kernel follows the roundTiesToEven rule (the same rule your fix applies to normal values). So, can you please elaborate on this problem and maybe provide an input where it happens? Thanks!

Also, I couldn't get your unit test to run; it fails with:

> e2m1_value = torch.where(denormal_mask, denormal_x, e2m1_value)
> E RuntimeError: The size of tensor a (32) must match the size of tensor b (128) at non-singleton dimension 2
Here is a minimal reproducible example for the issue I mentioned.
You are right, and I made a mistake describing the issue; I have just updated the description in #974. The current AITER kernel does not follow the round-to-even rule at |x| = 0.25.
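To make the disagreement concrete, here is a toy illustration (not the AITER kernel itself) of why |x| = 0.25 is exactly the tie case for FP4 E2M1: it sits halfway between the two smallest non-negative codes, 0.0 and the denormal 0.5, so the two rounding rules diverge there and nowhere else in the denormal range.

```python
# The eight non-negative FP4 E2M1 values, indexed by their 3-bit code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_half_away(x: float) -> float:
    """Round to the nearest E2M1 value, breaking ties away from zero."""
    return min(E2M1_VALUES, key=lambda v: (abs(v - x), -v))

def quantize_ties_to_even(x: float) -> float:
    """Round to the nearest E2M1 value, breaking ties toward the even code
    (IEEE 754 roundTiesToEven)."""
    near, second = sorted(E2M1_VALUES, key=lambda v: abs(v - x))[:2]
    if abs(near - x) == abs(second - x):        # exact tie between neighbors
        lo, hi = sorted((near, second))
        return lo if E2M1_VALUES.index(lo) % 2 == 0 else hi
    return near

print(quantize_half_away(0.25))      # -> 0.5
print(quantize_ties_to_even(0.25))   # -> 0.0 (code 0 is even)
```

The same tie-breaking difference shows up between any two adjacent codes (e.g. at x = 2.5, halfway between 2.0 and 3.0), which is the behavior the PR aligns for normal values as well.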
Hi @hann-wang, can you please do a merge/rebase from main? Looks like there are some conflicts that need to be resolved before merging your PR.
Pull Request Overview
This PR fixes the MXFP4 mantissa rounding implementation to address issue #974. The changes update both the PyTorch reference implementation and the Triton kernel to use a more correct rounding approach.
Key Changes:
- Implements proper round-to-nearest-even (banker's rounding) for mantissa values
- Separates handling of denormal, normal, and saturated values with explicit masking
- Adds constants for FP32 and FP4 format specifications (exponent bias, mantissa/exponent bits)
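The round-to-nearest-even step on the mantissa can be sketched at the bit level (a generic RNE sketch under the constants the review mentions, not the PR's exact code; the names here are illustrative):

```python
FP32_MANTISSA_BITS = 23
FP4_MANTISSA_BITS = 1                              # E2M1 keeps one mantissa bit
SHIFT = FP32_MANTISSA_BITS - FP4_MANTISSA_BITS     # 22 low bits are dropped

def round_mantissa_rne(m23: int) -> int:
    """Round a 23-bit FP32 mantissa field down to 1 bit with
    round-to-nearest-even. A result of 2 means the mantissa carried,
    which the caller must fold into the exponent."""
    kept = m23 >> SHIFT                             # the surviving mantissa bit
    round_bit = (m23 >> (SHIFT - 1)) & 1            # first dropped bit
    sticky = (m23 & ((1 << (SHIFT - 1)) - 1)) != 0  # OR of all lower dropped bits
    if round_bit and (sticky or (kept & 1)):        # > half, or exact tie with odd kept bit
        kept += 1
    return kept
```

On an exact tie (round bit set, sticky clear) the kept bit is bumped only when it is odd, which is precisely the roundTiesToEven behavior discussed in the thread.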
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| op_tests/triton_tests/test_quant_mxfp4.py | Updates PyTorch reference implementation with corrected MXFP4 quantization logic including proper rounding |
| aiter/ops/triton/_triton_kernels/quant.py | Updates Triton kernel implementation to match the corrected quantization algorithm |
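The "explicit masking" of denormal, normal, and saturated values noted above can be illustrated with a minimal PyTorch sketch (thresholds assume E2M1 after the block scale is applied; function and variable names are hypothetical, not taken from the PR):

```python
import torch

def e2m1_range_masks(x_abs: torch.Tensor):
    """Split |x| (already divided by the block scale) into the three E2M1
    ranges: denormal (below 1.0, the smallest normal), normal, and
    saturated (at or above 6.0, the largest representable magnitude)."""
    denormal_mask = x_abs < 1.0
    saturate_mask = x_abs >= 6.0
    normal_mask = ~denormal_mask & ~saturate_mask
    return denormal_mask, normal_mask, saturate_mask
```

Keeping the three ranges disjoint like this lets each branch apply its own rounding (denormal step of 0.5, normal RNE, clamp to 6.0) via `torch.where` without the branches interfering.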
Hi @lucas-santos-amd, I just merged the changes from main.
I'll merge the PR now, thanks for your contribution!
* fix: MXFP4 mantissa rounding
* fix: mantissa rounding in test_quant_mxfp4
* refactor dynamic_mxfp4_quant
* chore: format
* fix: mxfp4 quantization tests
* chore: format
* fix: mxfp4 quantization test with correct bitwidth and sign
* chore: restore DEBUG_MODE
* chore: align test_quant_mxfp4 with triton kernel

Co-authored-by: lucas-santos-amd <Lucas.Santos@amd.com>
fixes #974