
[DO NOT LAND] Test run for MLP bias accuracy issue #609

Open
xuzhao9 wants to merge 6 commits into main from xz9/chenning-main


xuzhao9 commented Oct 31, 2025

torch.compile has a numeric issue when used with torch.amp.autocast and bias.

The problem is gone when:

  1. Not using amp, or
  2. Not using bias

#607

Reproduce:

LD_LIBRARY_PATH="$HOME/.conda/envs/py312/lib" python run.py --op mlp --metrics accuracy --use_bias --num-inputs 1

The issue is not actually related to amp. It comes only from the post_grad pass of PT2, which decomposes addmm into mm followed by a Triton add. It should be easy to reproduce with a simpler test.

@meta-cla meta-cla Bot added the cla signed label Oct 31, 2025
@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload December 19, 2025 15:48 — with GitHub Actions Inactive
@xuzhao9 xuzhao9 temporarily deployed to docker-s3-upload December 19, 2025 18:30 — with GitHub Actions Inactive

xuzhao9 commented Jan 28, 2026

Closing as this is an NVIDIA kernel issue.


xuzhao9 commented Feb 4, 2026

A minimal reproduction:

import torch

# Inputs: (960, 512) @ (512, 2048) plus a (2048,) bias, all bfloat16 on the GPU.
input1 = torch.randn((960, 512), dtype=torch.bfloat16).cuda()
input2 = torch.randn((512, 2048), dtype=torch.bfloat16).cuda()
input3 = torch.randn((2048,), dtype=torch.bfloat16).cuda()

# Path 1: fused addmm, i.e. input3 + input1 @ input2 in a single call.
output1_int = torch.addmm(input3, input1, input2)
output1 = torch.randn(960, 2048, dtype=torch.bfloat16).cuda()
# torch.clamp_min(output1_int, 0, out=output1)
output1 = output1_int

# Path 2: the decomposed form, mm followed by a separate add.
output2 = torch.randn(960, 2048, dtype=torch.bfloat16).cuda()
output2_int = torch.mm(input1, input2)
output2_int2 = torch.add(output2_int, input3)
# torch.clamp_min(output2_int2, 0, out=output2)
output2 = output2_int2

# The two paths should agree, but this assertion fails (see the output below).
torch.testing.assert_close(output1, output2)
## CLI output
## Traceback (most recent call last):
##   File "/data/users/xzhao9/tmp/test.py", line 19, in <module>
##     torch.testing.assert_close(output1, output2)
##   File "/data/users/xzhao9/uv_venvs/py312/lib/python3.12/site-packages/torch/testing/_comparison.py", line 1600, in assert_close
##     raise error_metas[0].to_error(msg)
## AssertionError: Tensor-likes are not close!
## Mismatched elements: 4924 / 1966080 (0.3%)

After discussing with PyTorch developers: the mismatch arises because the post_grad pass decomposes addmm into mm and add by default. To make the result consistent with torch.addmm(), the decomposed matmul needs to be torch.mm(..., out_dtype=torch.float32).
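
To illustrate why the intermediate dtype can matter, here is a minimal pure-Python sketch that emulates bfloat16 rounding. It does not call PyTorch. It assumes (the thread does not spell this out) that the fused addmm path adds the bias to a higher-precision accumulator and rounds once, while the decomposed path rounds the mm result to bfloat16 before the add. The helper name to_bf16 and the values acc and bias are hypothetical and chosen only to hit a rounding tie.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite value to the nearest bfloat16 (ties to even).

    Hypothetical helper: bfloat16 is treated as the top 16 bits of a float32.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add half of the dropped range; the extra low bit makes ties go to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Illustrative values: a matmul accumulator that sits exactly halfway between
# two bfloat16 values, and a small bias that is itself exact in bfloat16.
acc = 1.0 + 2**-8
bias = 2**-8

# Decomposed path (assumed): round the mm result to bf16, add, round again.
split = to_bf16(to_bf16(acc) + bias)

# Fused path (assumed): add in higher precision, round to bf16 once.
fused = to_bf16(acc + bias)

print("mm then add:", split)
print("fused      :", fused)
```

In this toy case the two paths land one bfloat16 step apart, which is the kind of element-wise mismatch that a float32 mm output would avoid by deferring the rounding until after the bias add.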


xuzhao9 commented Feb 5, 2026

Fix: pytorch/pytorch#174403
