[TRITON] fix: MXFP4 mantissa rounding #975

Merged
lucas-santos-amd merged 16 commits into ROCm:main from hann-wang:han/fix_mxfp4_mantissa
Nov 26, 2025
@hann-wang
Contributor

fixes #974

@hann-wang hann-wang marked this pull request as draft September 10, 2025 08:29
@carlushuang
Collaborator

cc @vgokhale

@lucas-santos-amd
Contributor

lucas-santos-amd commented Oct 6, 2025

Hi @hann-wang, I couldn't replicate what you described about denormal numbers in #974. In my testing, the current AITER kernel rounds this way:
|x| < 0.25 to 0
0.25 <= |x| < 0.75 to 0.5
0.75 <= |x| to 1

While your implementation does it like this:
|x| <= 0.25 to 0
0.25 < |x| < 0.75 to 0.5
0.75 <= |x| to 1

They differ only for |x| = 0.25, where your kernel follows the roundTiesToEven rule (the same issue your kernel fixes for normal values). So, can you please elaborate on this problem and maybe provide an input where this happens? Thanks!
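
To make the difference concrete, the two threshold rules above can be written as a short Python sketch. The function names are hypothetical and are not taken from the AITER code. Both functions assume the input has already been divided by the block scale, return only the magnitude, and model only the range around the E2M1 subnormal step (larger inputs simply map to 1 here).

```python
def round_subnormal_tie_up(x: float) -> float:
    """Rule observed for the current kernel: the tie at 0.25 goes up to 0.5."""
    a = abs(x)
    if a < 0.25:
        return 0.0
    if a < 0.75:
        return 0.5
    return 1.0


def round_subnormal_ties_to_even(x: float) -> float:
    """Rule observed for the PR: the tie at 0.25 goes to 0 (even mantissa)."""
    a = abs(x)
    if a <= 0.25:
        return 0.0
    if a < 0.75:
        return 0.5
    return 1.0


# Compare the two rules on a few sample magnitudes.
for sample in (0.1, 0.25, 0.3, 0.5, 0.75, 1.0):
    old = round_subnormal_tie_up(sample)
    new = round_subnormal_ties_to_even(sample)
    marker = "  <- differs" if old != new else ""
    print(f"{sample:5.2f}: tie-up={old:.1f} ties-to-even={new:.1f}{marker}")
```

Both rules send 0.75 to 1, which is also what ties-to-even requires there, because 1.0 has an even mantissa bit and 0.5 does not.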

Also, I couldn't get your unit test to run; it gives me this error:

>       e2m1_value = torch.where(denormal_mask, denormal_x, e2m1_value)
E       RuntimeError: The size of tensor a (32) must match the size of tensor b (128) at non-singleton dimension 2

@hann-wang
Contributor Author

Here is a minimal reproducible example for the issue I mentioned.

mxfp4_mantissa_rounding.py

@hann-wang
Contributor Author

You are right; I made a mistake when describing the issue. I just updated the description of #974.

The current AITER kernel does not follow the round-to-nearest-even rule at 0.25.

@lucas-santos-amd lucas-santos-amd marked this pull request as ready for review October 30, 2025 20:41
@lucas-santos-amd lucas-santos-amd self-requested a review October 30, 2025 20:41
@lucas-santos-amd
Contributor

Hi @hann-wang, can you please do a merge/rebase from main? Looks like there are some conflicts that need to be solved before merging your PR.

Copilot AI review requested due to automatic review settings November 20, 2025 23:57
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes the MXFP4 mantissa rounding implementation to address issue #974. The changes update both the PyTorch reference implementation and the Triton kernel so that mantissa values are rounded to nearest, with ties going to even.

Key Changes:

  • Implements proper round-to-nearest-even (banker's rounding) for mantissa values
  • Separates handling of denormal, normal, and saturated values with explicit masking
  • Adds constants for FP32 and FP4 format specifications (exponent bias, mantissa/exponent bits)
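
The behaviour listed above can be sketched in plain Python on the E2M1 value grid. This is an illustrative reference under stated assumptions, not the code from this PR: the names `E2M1_VALUES`, `quantize_e2m1`, and `dequantize_e2m1` are hypothetical, the input is assumed to be already divided by the MX block scale and not NaN, and a lookup list stands in for the exponent and mantissa bit arithmetic that the summary says the kernel uses.

```python
import math

# Magnitudes representable by E2M1, indexed by their 3-bit code
# (2 exponent bits, 1 mantissa bit). Codes 0 and 1 are the subnormal range.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_e2m1(x: float) -> int:
    """Return the 4-bit code (sign bit + 3-bit magnitude) nearest to x.

    Ties go to the neighbour whose mantissa bit is 0 (an even code).
    Magnitudes at or above 6.0 saturate to the largest code.
    """
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    a = abs(x)
    if a >= E2M1_VALUES[-1]:
        code = len(E2M1_VALUES) - 1  # saturated
    else:
        # Largest grid point that is <= a, and the grid point just above it.
        lo = max(i for i, v in enumerate(E2M1_VALUES) if v <= a)
        hi = lo + 1
        d_lo = a - E2M1_VALUES[lo]
        d_hi = E2M1_VALUES[hi] - a
        if d_lo < d_hi:
            code = lo
        elif d_hi < d_lo:
            code = hi
        else:
            code = lo if lo % 2 == 0 else hi  # tie: pick the even code
    return (sign << 3) | code


def dequantize_e2m1(code: int) -> float:
    """Map a 4-bit code back to its floating-point value."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_VALUES[code & 0x7]


for sample in (0.25, 0.75, 1.25, 2.5, 5.0, -0.75, 100.0):
    print(sample, "->", dequantize_e2m1(quantize_e2m1(sample)))
```

Choosing the even code on a tie is what sends 0.25 to 0 rather than 0.5, and the same rule covers the normal range, for example the tie at 2.5 between 2 and 3.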

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| op_tests/triton_tests/test_quant_mxfp4.py | Updates PyTorch reference implementation with corrected MXFP4 quantization logic including proper rounding |
| aiter/ops/triton/_triton_kernels/quant.py | Updates Triton kernel implementation to match the corrected quantization algorithm |

@hann-wang
Contributor Author

Hi @lucas-santos-amd, I just merged the changes from main.

@lucas-santos-amd lucas-santos-amd self-requested a review November 26, 2025 17:23
@lucas-santos-amd
Contributor

I'll merge the PR now, thanks for your contribution!

@lucas-santos-amd lucas-santos-amd changed the title fix: MXFP4 mantissa rounding [TRITON] fix: MXFP4 mantissa rounding Nov 26, 2025
@lucas-santos-amd lucas-santos-amd merged commit f4e4188 into ROCm:main Nov 26, 2025
20 of 23 checks passed
farlukas pushed a commit that referenced this pull request Dec 4, 2025
* fix: MXFP4 mantissa rounding

* fix: mantissa rounding in test_quant_mxfp4

* refactor dynamic_mxfp4_quant

* chore: format

* fix: mxfp4 quantization tests

* chore: format

* fix: mxfp4 quantization test with correct bitwidth and sign

* chore: restore DEBUG_MODE

* chore: align test_quant_mxfp4 with triton kernel

---------

Co-authored-by: lucas-santos-amd <Lucas.Santos@amd.com>
nsusanto pushed a commit that referenced this pull request Dec 4, 2025
zhuyuhua-v pushed a commit that referenced this pull request Dec 17, 2025
Knarf04 added a commit to Knarf04/aiter that referenced this pull request Mar 11, 2026
Development

Successfully merging this pull request may close these issues.

[Issue]: Incorrect MXFP4 mantissa rounding

4 participants