
[Draft] [Preview] Support gfx1201#1681

Draft
tjtanaa wants to merge 19 commits into ROCm:main from EmbeddedLLM:support_gfx1201

Conversation


@tjtanaa tjtanaa commented Dec 18, 2025

Motivation

There has been strong interest in the vLLM community in using AITER Triton kernels on Radeon, and testing has shown significant performance benefits on Radeon GPUs as well. vllm-project/vllm#28649 (comment)

We are working with @hongxiayang on enabling and upstreaming all the tuning configs and tuning scripts so that Radeon is properly supported in AITER.

The work will then be broken down into multiple PRs to upstream to AITER.

Technical Details

Phase 1 (Done)

Tasks

Identify which Triton ops fail on gfx1201.
Run the unit tests using the community patch vllm-project/vllm#28649.

Results are based on commit f4e4188.

All of the important Triton kernels can run on RDNA 4:

  1. test_gemm_a8w8.log — all passed.
  2. test_gemm_a8w8_per_token_scale.log — all passed.
  3. test_gemm_a8w8_block_scale.log — all passed.
  4. test_batched_gemm_a8w8.log — all passed, aside from some OOM cases.
  5. test_batched_gemm_bf16.log — all passed, aside from some OOM cases.
  6. test_moe.log — all passed with only 4/850 failures.
  7. test_unified_attention.log — all passing aside from 41/823 failures (a hardware config issue, fixed in EmbeddedLLM@1574097 on our branch).
  8. test_rmsnorm.log — all passed, with only a very small mismatch and some OOM cases.
  9. test_mha.log — forward pass all working, with some OOM cases.

Phase 2 (Enable GPU Arch on gfx1201)

Tasks:

  1. Add gfx1201 to the supported GPU arch list.
  2. Run all unit tests and make sure the kernels that are important for actual deployments are passing. Fix any issues related to failures.
  3. Add tuning scripts for Radeon GPUs with a proper search space.
  4. Evaluate the performance gain in vLLM.
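Task 3 above calls for tuning scripts with a proper search space for Radeon. As a minimal sketch of what such a search space might look like, the snippet below enumerates candidate kernel configs as plain dicts. All parameter names and value ranges here are illustrative assumptions, not the actual AITER tuning scripts:

```python
import itertools

# Hypothetical search space for tuning a GEMM kernel on RDNA 4 (gfx1201).
# The tile sizes and warp counts are placeholder values for illustration only.
BLOCK_M = [16, 32, 64, 128]
BLOCK_N = [16, 32, 64, 128]
BLOCK_K = [32, 64, 128]
NUM_WARPS = [2, 4, 8]

def build_search_space():
    """Enumerate every candidate kernel config as a plain dict."""
    return [
        {"BLOCK_M": bm, "BLOCK_N": bn, "BLOCK_K": bk, "num_warps": w}
        for bm, bn, bk, w in itertools.product(BLOCK_M, BLOCK_N, BLOCK_K, NUM_WARPS)
    ]

configs = build_search_space()
print(len(configs))  # 4 * 4 * 3 * 3 = 144 candidates
```

A real tuning script would benchmark each candidate against representative problem shapes and persist the best config per shape, which is roughly what the tuning configs mentioned in the Motivation section capture.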

The checklist for this phase is the list of ops to enable.

Current progress:

  • gemm_a16w16 ✅
  • gemm_a8w8_block_scale ✅
  • gmm
  • gemm_a8w8 ✅
  • gemm_a8w8_per_token_scale ✅
  • batched_gemm_a8w8
  • batched_gemm_bf16
  • unified_attention ✅
  • moe
  • rmsnorm
  • mha (forward)
  • gemm_a16w16_gated

Test Plan

Ensure the unit tests can be run.
Ensure the kernels are tuned.

Test Result

Phase 1 (DONE): Test Results of unit tests using the community patch

  1. Run all unit tests and make sure the kernels that are important for actual deployments are passing. Fix any issues related to the failure.

All of the important Triton kernels can run on RDNA 4:

  1. test_gemm_a8w8.log — all passed.
  2. test_gemm_a8w8_per_token_scale.log — all passed.
  3. test_gemm_a8w8_block_scale.log — all passed.
  4. test_batched_gemm_a8w8.log — all passed, aside from some OOM cases.
  5. test_batched_gemm_bf16.log — all passed, aside from some OOM cases.
  6. test_moe.log — all passed with only 4/850 failures.
  7. test_unified_attention.log — all passing aside from 41/823 failures (a hardware config issue, fixed in EmbeddedLLM@1574097 on our branch).
  8. test_rmsnorm.log — all passed, with only a very small mismatch and some OOM cases.
  9. test_mha.log — forward pass all working, with some OOM cases.

Submission Checklist


tjtanaa commented Dec 18, 2025

CC @hongxiayang @mgehre-amd


tjtanaa commented Dec 18, 2025

@valarLip could we get some preliminary thoughts on this? Would there be any concerns with us upstreaming the configs and tuning scripts, and are fixes like this unified attention fix (EmbeddedLLM@1574097) acceptable?
We will do our best to keep the changes as small as possible, since the main AITER repo is designed for Instinct GPUs.

Signed-off-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: Jeff Aw <jeffaw99@hotmail.com>

@androiddrew

@tjtanaa if I built vllm/docker/Dockerfile.rocm_base with your branch, what VLLM_ env vars would I need to set to test these kernels?


tjtanaa commented Jan 5, 2026

@androiddrew Currently there are two things that have to happen on the vLLM side to enable the use of these Triton kernels:

  1. All aiter functions in vLLM upstream have safeguard conditions, so non-gfx9 archs cannot trigger AITER kernels.
  2. vLLM has only integrated the HIP, ASM, and CK kernels for most of the ops. The Triton AITER ops have to be integrated separately.
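The gfx9-only safeguard described in point 1 amounts to a check on the GPU arch name. As a minimal sketch (on ROCm builds of PyTorch, `torch.cuda.get_device_properties(0).gcnArchName` reports strings like `"gfx942:sramecc+:xnack-"`), the function names and the `allow_gfx12` opt-in below are hypothetical illustrations, not the exact vLLM code:

```python
def base_arch(gcn_arch_name: str) -> str:
    # Strip feature flags: "gfx1201:sramecc+:xnack-" -> "gfx1201"
    return gcn_arch_name.split(":")[0]

def aiter_enabled(gcn_arch_name: str, allow_gfx12: bool = False) -> bool:
    """Sketch of an arch safeguard; allow_gfx12 is a hypothetical opt-in."""
    arch = base_arch(gcn_arch_name)
    if arch.startswith("gfx9"):
        # Mirrors current upstream behaviour: Instinct (gfx9) archs only.
        return True
    # Hypothetical escape hatch for RDNA 4 once gfx1201 kernels land.
    return allow_gfx12 and arch == "gfx1201"

print(aiter_enabled("gfx942:sramecc+:xnack-"))    # Instinct: enabled
print(aiter_enabled("gfx1201"))                   # Radeon: blocked today
print(aiter_enabled("gfx1201", allow_gfx12=True)) # enabled with the opt-in
```

This is why, as noted above, building the branch alone is not enough: the guards on the vLLM side would also need to admit gfx1201 before the Triton AITER ops can fire.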

