Skip to content

Conversation

@abhilash1910
Copy link
Contributor

Description

Issue Link - #981

Changes to be addressed in this WIP PR:

  • LTO IR testing
  • Is there a way to add multiple modules?
    {If / when it is possible to add multiple modules, a test with code that uses something from libdevice is probably a good idea.
    It's also useful to be able to lazily add a module}
  • apply bitcode pattern input for libnvvm

cc @leofang

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Nov 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@abhilash1910 abhilash1910 marked this pull request as draft November 5, 2025 02:17
@leofang leofang added this to the cuda.core beta 9 milestone Nov 10, 2025
@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module labels Nov 10, 2025
@leofang
Copy link
Member

leofang commented Nov 17, 2025

Thanks, @abhilash1910! Any ETA to wrap this up?

@abhilash1910
Copy link
Contributor Author

pre-commit.ci autofix

@abhilash1910
Copy link
Contributor Author

pre-commit.ci autofix

@leofang leofang linked an issue Nov 25, 2025 that may be closed by this pull request
Copy link
Member

@leofang leofang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Abhilash! Leaving a few early feedbacks.

Comment on lines 538 to 546
bitcode_path = os.environ.get("BITCODE_NVVM_PATH")
if not bitcode_path:
pytest.skip("BITCODE_NVVM_PATH environment variable is not set.Disabling the test.")
bitcode_file = Path(bitcode_path)
if not bitcode_file.exists():
pytest.skip(f"Bitcode file not found: {bitcode_path}")

if bitcode_file.suffix != ".bc":
pytest.skip(f"Expected .bc file, got: {bitcode_file.suffix}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible for us to avoid having a file locally? We have bitcode in this repo already:

MINIMAL_NVVMIR_TXT_TEMPLATE = b"""\
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
define void @kernel() {
entry:
ret void
}
!nvvm.annotations = !{!0}
!0 = !{void ()* @kernel, !"kernel", i32 1}
!nvvmir.version = !{!1}
!1 = !{i32 %d, i32 0, i32 %d, i32 0}
""" # noqa: E501
MINIMAL_NVVMIR_BITCODE_STATIC = {
(1, 3): # (major, debug_major)
"4243c0de3514000005000000620c30244a59be669dfbb4bf0b51804c01000000210c00007f010000"
"0b02210002000000160000000781239141c80449061032399201840c250508191e048b62800c4502"
"42920b42641032143808184b0a3232884870c421234412878c1041920264c808b1142043468820c9"
"01323284182a282a90317cb05c9120c3c8000000892000000b0000003222c80820624600212b2498"
"0c212524980c19270c85a4906032645c20246382a01801300128030173046000132677b00778a007"
"7cb0033a680377b0877420877408873618877a208770d8e012e5d006f0a0077640077a600774a007"
"7640076d900e71a00778a00778d006e980077a80077a80076d900e7160077a100776a0077160076d"
"900e7320077a300772a0077320076d900e7640077a600774a0077640076d900e71200778a0077120"
"0778a00771200778d006e6300772a0077320077a300772d006e6600774a0077640077a600774d006"
"f6100776a0077160077a100776d006f6300772a0077320077a300772d006f6600774a0077640077a"
"600774d006f610077280077a10077280077a10077280076de00e7160077a300772a0077640071a21"
"4c0e11de9c2e4fbbcfbe211560040000000000000000000000000620b141a0e86000004016080000"
"06000000321e980c19114c908c092647c6044362098c009401000000b1180000ac0000003308801c"
"c4e11c6614013d88433884c38c4280077978077398710ce6000fed100ef4800e330c421ec2c11dce"
"a11c6630053d88433884831bcc033dc8433d8c033dcc788c7470077b08077948877070077a700376"
"788770208719cc110eec900ee1300f6e300fe3f00ef0500e3310c41dde211cd8211dc2611e663089"
"3bbc833bd04339b4033cbc833c84033bccf0147660077b6807376887726807378087709087706007"
"76280776f8057678877780875f08877118877298877998812ceef00eeee00ef5c00eec300362c8a1"
"1ce4a11ccca11ce4a11cdc611cca211cc4811dca6106d6904339c84339984339c84339b8c3389443"
"3888033b94c32fbc833cfc823bd4033bb0c30cc7698770588772708374680778608774188774a087"
"19ce530fee000ff2500ee4900ee3400fe1200eec500e3320281ddcc11ec2411ed2211cdc811edce0"
"1ce4e11dea011e66185138b0433a9c833bcc50247660077b68073760877778077898514cf4900ff0"
"500e331e6a1eca611ce8211ddec11d7e011ee4a11ccc211df0610654858338ccc33bb0433dd04339"
"fcc23ce4433b88c33bb0c38cc50a877998877718877408077a28077298815ce3100eecc00ee5500e"
"f33023c1d2411ee4e117d8e11dde011e6648193bb0833db4831b84c3388c4339ccc33cb8c139c8c3"
"3bd4033ccc48b471080776600771088771588719dbc60eec600fede006f0200fe5300fe5200ff650"
"0e6e100ee3300ee5300ff3e006e9e00ee4500ef83023e2ec611cc2811dd8e117ec211de6211dc421"
"1dd8211de8211f66209d3bbc433db80339948339cc58bc7070077778077a08077a488777708719cb"
"e70eef300fe1e00ee9400fe9a00fe530c3010373a8077718875f988770708774a08774d087729881"
"844139e0c338b0433d904339cc40c4a01dcaa11de0411edec11c662463300ee1c00eec300fe9400f"
"e5000000792000001d000000721e482043880c19097232482023818c9191d144a01028643c313242"
"8e9021a318100a00060000006b65726e656c0000230802308240042308843082400c330c4230cc40"
"0c4441c84860821272b3b36b730973737ba30ba34b7b739b1b2528d271b3b36b4b9373b12b939b4b"
"7b731b2530000000a9180000250000000b0a7228877780077a587098433db8c338b04339d0c382e6"
"1cc6a10de8411ec2c11de6211de8211ddec11d1634e3600ee7500fe1200fe4400fe1200fe7500ef4"
"b08081077928877060077678877108077a28077258709cc338b4013ba4833d94c3026b1cd8211cdc"
"e11cdc201ce4611cdc201ce8811ec2611cd0a11cc8611cc2811dd861c1010ff4200fe1500ff4800e"
"00000000d11000000600000007cc3ca4833b9c033b94033da0833c94433890c30100000061200000"
"06000000130481860301000002000000075010cd14610000000000007120000003000000320e1022"
"8400fb020000000000000000650c00001f000000120394f000000000030000000600000006000000"
"4c000000010000005800000000000000580000000100000070000000000000000c00000013000000"
"1f000000080000000600000000000000700000000000000000000000010000000000000000000000"
"060000000000000006000000ffffffff00240000000000005d0c00000d0000001203946700000000"
"6b65726e656c31352e302e376e7670747836342d6e76696469612d637564613c737472696e673e00"
"00000000",
(2, 3): # (major, debug_major)
"4243c0de3514000005000000620c30244a59be669dfbb4bf0b51804c01000000210c000080010000"
"0b02210002000000160000000781239141c80449061032399201840c250508191e048b62800c4502"
"42920b42641032143808184b0a3232884870c421234412878c1041920264c808b1142043468820c9"
"01323284182a282a90317cb05c9120c3c8000000892000000b0000003222c80820624600212b2498"
"0c212524980c19270c85a4906032645c20246382a01801300128030173046000132677b00778a007"
"7cb0033a680377b0877420877408873618877a208770d8e012e5d006f0a0077640077a600774a007"
"7640076d900e71a00778a00778d006e980077a80077a80076d900e7160077a100776a0077160076d"
"900e7320077a300772a0077320076d900e7640077a600774a0077640076d900e71200778a0077120"
"0778a00771200778d006e6300772a0077320077a300772d006e6600774a0077640077a600774d006"
"f6100776a0077160077a100776d006f6300772a0077320077a300772d006f6600774a0077640077a"
"600774d006f610077280077a10077280077a10077280076de00e7160077a300772a0077640071a21"
"4c0e11de9c2e4fbbcfbe211560040000000000000000000000000620b141a0286100004016080000"
"06000000321e980c19114c908c092647c60443620914c10840190000b1180000ac0000003308801c"
"c4e11c6614013d88433884c38c4280077978077398710ce6000fed100ef4800e330c421ec2c11dce"
"a11c6630053d88433884831bcc033dc8433d8c033dcc788c7470077b08077948877070077a700376"
"788770208719cc110eec900ee1300f6e300fe3f00ef0500e3310c41dde211cd8211dc2611e663089"
"3bbc833bd04339b4033cbc833c84033bccf0147660077b6807376887726807378087709087706007"
"76280776f8057678877780875f08877118877298877998812ceef00eeee00ef5c00eec300362c8a1"
"1ce4a11ccca11ce4a11cdc611cca211cc4811dca6106d6904339c84339984339c84339b8c3389443"
"3888033b94c32fbc833cfc823bd4033bb0c30cc7698770588772708374680778608774188774a087"
"19ce530fee000ff2500ee4900ee3400fe1200eec500e3320281ddcc11ec2411ed2211cdc811edce0"
"1ce4e11dea011e66185138b0433a9c833bcc50247660077b68073760877778077898514cf4900ff0"
"500e331e6a1eca611ce8211ddec11d7e011ee4a11ccc211df0610654858338ccc33bb0433dd04339"
"fcc23ce4433b88c33bb0c38cc50a877998877718877408077a28077298815ce3100eecc00ee5500e"
"f33023c1d2411ee4e117d8e11dde011e6648193bb0833db4831b84c3388c4339ccc33cb8c139c8c3"
"3bd4033ccc48b471080776600771088771588719dbc60eec600fede006f0200fe5300fe5200ff650"
"0e6e100ee3300ee5300ff3e006e9e00ee4500ef83023e2ec611cc2811dd8e117ec211de6211dc421"
"1dd8211de8211f66209d3bbc433db80339948339cc58bc7070077778077a08077a488777708719cb"
"e70eef300fe1e00ee9400fe9a00fe530c3010373a8077718875f988770708774a08774d087729881"
"844139e0c338b0433d904339cc40c4a01dcaa11de0411edec11c662463300ee1c00eec300fe9400f"
"e5000000792000001e000000721e482043880c19097232482023818c9191d144a01028643c313242"
"8e9021a318100a00060000006b65726e656c0000230802308240042308843082400c23080431c320"
"04c30c045118858c04262821373bbb36973037b737ba30bab437b7b95102231d373bbbb6343917bb"
"32b9b9b437b7518203000000a9180000250000000b0a7228877780077a587098433db8c338b04339"
"d0c382e61cc6a10de8411ec2c11de6211de8211ddec11d1634e3600ee7500fe1200fe4400fe1200f"
"e7500ef4b08081077928877060077678877108077a28077258709cc338b4013ba4833d94c3026b1c"
"d8211cdce11cdc201ce4611cdc201ce8811ec2611cd0a11cc8611cc2811dd861c1010ff4200fe150"
"0ff4800e00000000d11000000600000007cc3ca4833b9c033b94033da0833c94433890c301000000"
"6120000006000000130481860301000002000000075010cd14610000000000007120000003000000"
"320e10228400fc020000000000000000650c00001f000000120394f0000000000300000006000000"
"060000004c000000010000005800000000000000580000000100000070000000000000000c000000"
"130000001f0000000800000006000000000000007000000000000000000000000100000000000000"
"00000000060000000000000006000000ffffffff00240000000000005d0c00000d00000012039467"
"000000006b65726e656c31352e302e376e7670747836342d6e76696469612d637564613c73747269"
"6e673e0000000000",
}
@pytest.fixture(params=("txt", "bitcode_static"))
def minimal_nvvmir(request):
major, minor, debug_major, debug_minor = nvvm.ir_version()
if request.param == "txt":
return MINIMAL_NVVMIR_TXT_TEMPLATE % (major, debug_major)
bitcode_static_binascii = MINIMAL_NVVMIR_BITCODE_STATIC.get((major, debug_major))
if bitcode_static_binascii:
return binascii.unhexlify(bitcode_static_binascii)
raise RuntimeError(
"Static bitcode for NVVM IR version "
f"{major}.{debug_major} is not available in this test.\n"
"Maintainers: Please run the helper script to generate it and add the "
"output to the MINIMAL_NVVMIR_BITCODE_STATIC dict:\n"
" ../../toolshed/build_static_bitcode_input.py"
)

so I suggest that we move it to the common place, say a new file under cuda_python_test_helpers:
https://github.com/NVIDIA/cuda-python/tree/main/cuda_python_test_helpers/cuda_python_test_helpers
and have it imported in both cuda.bindings/core tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be partially addressed now as I have yet to remove the existing invocations from cuda_bindings test

Comment on lines 532 to 533
extra_name = f"{options.name}_extra_{i}"
nvvm.add_module_to_program(self._mnff.handle, extra_source, len(extra_source), extra_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two notes:

  1. I am torn if we should allow users to specify the module names, instead of us making one up for them. Could you try my "sequence of 2-tuples" suggestion from our offline chat, and see how bad the code looks like? We should have something similar already in one of the options.
  2. As discussed let's check if all modules can be lazily loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is done
  2. if we use lazy mode always then we have to remove textual IR as input as lazy mode always supports bitcode format.

@leofang what do you suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the lazy loading does not support text IR, only bitcode IR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think so , looking at the error

@abhilash1910
Copy link
Contributor Author

pre-commit.ci autofix

@leofang
Copy link
Member

leofang commented Dec 16, 2025

FYI, there are some merge conflicts due to #1351 that need to be resolved 😅

@leofang leofang requested a review from rwgk January 9, 2026 05:18
@leofang
Copy link
Member

leofang commented Jan 14, 2026

@rwgk @rparolin Could you share feedbacks with @abhilash1910 with regard to the changes on the pathfinder? Let's make sure this PR can be merged for cuda-core v0.6.0.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ideal to add find_libdevice under _dynamic_libs. What do people think about introducing _static_libs? This would be for .a and .bc files.

Just noting: the found_via feature is missing, but we can do that in a follow-on PR.

We'll also need pathfinder tests, but I can help with that in a follow-on PR.

)


class LibdeviceNotFoundError(Exception):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with DynamicLibNotFoundError (in load_dl_common.py):

class LibdeviceNotFoundError(RuntimeError):

pass


def _get_cuda_home_or_path() -> str | None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use cuda.pathfinder._utils.get_cuda_home_or_path?



class _FindLibdevice:
FILENAME = "libdevice.10.bc"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this up to the top of the file, to make it more obvious that we have this hard-wired?

Comment on lines +85 to +88
if IS_WINDOWS:
bases = [r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA", r"C:\CUDA"]
else:
bases = ["/usr/local/cuda", "/opt/cuda"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Could you please also move this up to the top?

More fundamental:

Thus far we've tried hard to avoid hard-coding these paths. It's of course very easy for us, but can also easily backfire if there are multiple CUDA installations. We should probably make sure @kkraus14 is on board with this. (If he is I'm fine with it, too.)

if os.path.isfile(file_path):
return file_path
# Versioned paths (e.g., /usr/local/cuda-13.0)
for versioned in sorted(glob.glob(base + "*"), reverse=True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not doing this for dynamic libs or headers. We could wind up with a weird mix of versions between dynamic libs, headers, and libdevice if we don't have a general mechanism for deciding "What is the system installation?" I fear if that happens, the consequences will be subtle but potentially extremely confusing and time consuming to troubleshoot. @kkraus14 again for guidance.

def find_libdevice() -> str:
"""Find the path to libdevice.10.bc.
Raises:
LibdeviceNotFoundError: If libdevice.10.bc cannot be found
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        LibdeviceNotFoundError: If the libdevice*.bc file cannot be found

(To avoid hard-wiring the name multiple times.)

return abs_path


def get_libdevice_path() -> str | None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely make this DRY and find naming that's intuitive in the call sites; i.e. people shouldn't need to come back here to figure out the difference between find/get.

My preferred solution would be

def find_libdevice(optional: bool = False) -> str | None:

Simple, just a tiny little bit weaker for type checkers.

is_culink_backend = _linker._decide_nvjitlink_or_driver()

try:
from cuda_python_test_helpers.nvvm_bitcode import minimal_nvvmir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this import unconditional. At least that's how I would start, then add except ModuleNotFoundError (instead of ImportError, which can mask all sort of bugs) if we really need it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NVVM support - follow-up

3 participants