[CUDA] Remove htanh from unsupported math ops for CUDA 12.8 #17639

Merged
tqchen merged 1 commit into apache:main from MasterJH5574:tvm-dev/2025-02-11-cuda-htanh
Feb 11, 2025
MasterJH5574 commented Feb 11, 2025

This PR removes htanh from the list of unsupported CUDA half operators, because it is supported as of CUDA 12.8 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-math-release-12-8).

Specifically, we add a CUDA version check to the generated CUDA code. When the CUDA version is older than 12.8, htanh is still treated as an unsupported operator and falls back to the packed operation. For CUDA 12.8 and newer, we directly use the function defined in `cuda_fp16.h`.
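
For illustration, the kind of compile-time guard described above might look like the sketch below. This is not the code emitted by TVM: the macro `SKETCH_HAS_NATIVE_HTANH`, the wrapper `sketch_htanh`, and the kernel are hypothetical names, and the float-based fallback is an assumption standing in for whatever fallback the code generator actually emits. The version test relies on the `__CUDACC_VER_MAJOR__` and `__CUDACC_VER_MINOR__` macros that the CUDA compiler predefines.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical macro: 1 when compiling with CUDA 12.8 or newer, else 0.
#if (__CUDACC_VER_MAJOR__ > 12) || \
    (__CUDACC_VER_MAJOR__ == 12 && __CUDACC_VER_MINOR__ >= 8)
#define SKETCH_HAS_NATIVE_HTANH 1
#else
#define SKETCH_HAS_NATIVE_HTANH 0
#endif

// Hypothetical wrapper: uses htanh from cuda_fp16.h when it is available,
// otherwise (assumed fallback) computes tanh in float and converts back.
__device__ __forceinline__ __half sketch_htanh(__half x) {
#if SKETCH_HAS_NATIVE_HTANH
  return htanh(x);
#else
  return __float2half(tanhf(__half2float(x)));
#endif
}

__global__ void tanh_kernel(const __half* in, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = sketch_htanh(in[i]);
}

int main() {
  const int n = 4;
  const float inputs[n] = {-1.0f, 0.0f, 0.5f, 2.0f};
  __half h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = __float2half(inputs[i]);

  __half *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(__half));
  cudaMalloc(&d_out, n * sizeof(__half));
  cudaMemcpy(d_in, h_in, n * sizeof(__half), cudaMemcpyHostToDevice);

  tanh_kernel<<<1, 32>>>(d_in, d_out, n);
  // This blocking copy also waits for the kernel to finish.
  cudaMemcpy(h_out, d_out, n * sizeof(__half), cudaMemcpyDeviceToHost);

  for (int i = 0; i < n; ++i)
    printf("tanh(%g) ~ %g\n", inputs[i], __half2float(h_out[i]));

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}
```

Because the branch is selected by the preprocessor, the same source compiles on both older and newer toolkits, and only toolkits that declare htanh ever see the call to it. The sketch should be built for a GPU architecture that supports half-precision arithmetic.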

MasterJH5574 force-pushed the tvm-dev/2025-02-11-cuda-htanh branch from 3a52974 to 8226c81 on February 11, 2025 15:25
tqchen merged commit 4ac03b3 into apache:main on Feb 11, 2025
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
…7639)