Skip to content

Utilize AITER HSA embeddings#477

Open
Micky774 wants to merge 13 commits intozain/aiter-staticfrom
zain/aiter-hsa-embed
Open

Utilize AITER HSA embeddings#477
Micky774 wants to merge 13 commits intozain/aiter-staticfrom
zain/aiter-hsa-embed

Conversation

@Micky774
Copy link
Contributor

@Micky774 Micky774 commented Mar 9, 2026

Description

Currently AITER relies on reading AITER_ASM_DIR which is not in general thread-safe when we (and potentially others) modify the variable at runtime. AITER supports embedding the HSA table into a header which will be used directly instead of relying on runtime loading of the kernels via AITER_ASM_DIR. This results in thread-safe operation, without significant cost.

There is an AITER PR open for compatibility on their side, however that would still require us to use newer versions of AITER which is pending this PR.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Uses PyTorch's generate_aiter_embedded_hsa.py to generate header file for the HSA embedding
  • Updates build system to use HSA generating script, and properly pass to compile.py
  • Requires the use flag passthrough in AITER's compile.py which is pending a PR.
  • AITER diff for reference.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@Micky774 Micky774 changed the title Zain/aiter hsa embed Utilize AITER HSA embeddings Mar 9, 2026
Copy link
Collaborator

@wangye805 wangye805 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, after seeing ROCm/aiter@a64fa18...1b616b0#diff-fbc2a753d0cd5161aad3c68beb6eab5a577bae39fe4cb7b1af1c3b6c8ae963beR72, I sort of understand that aiter compile_ops provide hooks to modify the source generation, compilation flags, and so on in their json file.

In this case, how about move the archive (static lib) part into their compile.py? We can provide another option or env to their compile.py to go the static lib path. Therefore TE side is just loading their lib.a.

AITER_EMBEDDED_HSA_HEADER=aiter_embedded_hsa.h
AITER_EMBEDDED_HSA_INCLUDE_DIR=${AITER_EMBEDDED_HSA_HEADER_DIR})
else()
message(STATUS "[AITER-PREBUILT] No supported V3 ASM arch selected; skipping embedded HSA generation.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[AITER-PREBUILT]? [AITER-BUILD]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@Micky774 Micky774 force-pushed the zain/aiter-hsa-embed branch from 1c08539 to 38eb092 Compare March 10, 2026 19:03
@Micky774
Copy link
Contributor Author

Okay, after seeing ROCm/aiter@a64fa18...1b616b0#diff-fbc2a753d0cd5161aad3c68beb6eab5a577bae39fe4cb7b1af1c3b6c8ae963beR72, I sort of understand that aiter compile_ops provide hooks to modify the source generation, compilation flags, and so on in their json file.

In this case, how about move the archive (static lib) part into their compile.py? We can provide another option or env to their compile.py to go the static lib path. Therefore TE side is just loading their lib.a.

To do that, we'd need to essentially implement this PR on the AITER side anyways. We can't use their existing hooks exposed in compile.py to actually build static archives. The "hack" on our end is still the simplest way of accomplishing it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't need path splitting there either, it should work with DAITER_EMBEDDED_HSA_HEADER=\"{header_path_var}\"'
Also, I wonder if it can be overridden w/o AITERT scripts change with using HIPCC_COMPILE_FLAGS_APPEND or whatever

--output ${AITER_EMBEDDED_HSA_HEADER}
--output ${AITER_EMBEDDED_HSA_HEADER_PATH}
--subdirs ${AITER_EMBEDDED_HSA_SUBDIRS}
RESULT_VARIABLE AITER_MAKE_HSA_RET
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can use COMMAND_ERROR_IS_FATAL ANY instead so CMake will fail itself on command failure

@wangye805
Copy link
Collaborator

Okay, after seeing ROCm/aiter@a64fa18...1b616b0#diff-fbc2a753d0cd5161aad3c68beb6eab5a577bae39fe4cb7b1af1c3b6c8ae963beR72, I sort of understand that aiter compile_ops provide hooks to modify the source generation, compilation flags, and so on in their json file.
In this case, how about move the archive (static lib) part into their compile.py? We can provide another option or env to their compile.py to go the static lib path. Therefore TE side is just loading their lib.a.

To do that, we'd need to essentially implement this PR on the AITER side anyways. We can't use their existing hooks exposed in compile.py to actually build static archives. The "hack" on our end is still the simplest way of accomplishing it.

I'm not saying just building the static library in aiter optest compile.py. I'm referring to adding the archive part after lib.so's are finished building. Perhaps after https://github.com/ROCm/aiter/blob/9c2fef901dfbc1cc0e212038173d9d46160e3e12/op_tests/cpp/mha/compile.py#L98, then run the archive to get the lib.a

@@ -0,0 +1,151 @@
#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
# Copyright (c) 2026, Advanced Micro Devices, Inc. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "this file was modified for rocm Transformer Engine", like

# This file was modified for portability to AMDGPU

Ilya had more experience on the copyright area. Can you comment on this @ipanfilo ? Thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimally it should say where is it copied from and refer to Pytorch for original file license

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants