-
Notifications
You must be signed in to change notification settings - Fork 240
[NVVM] Support - Followup enhancements #1218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks, @abhilash1910! Any ETA to wrap this up? |
|
pre-commit.ci autofix |
|
pre-commit.ci autofix |
leofang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Abhilash! Leaving a few early feedbacks.
cuda_core/tests/test_program.py
Outdated
| bitcode_path = os.environ.get("BITCODE_NVVM_PATH") | ||
| if not bitcode_path: | ||
| pytest.skip("BITCODE_NVVM_PATH environment variable is not set.Disabling the test.") | ||
| bitcode_file = Path(bitcode_path) | ||
| if not bitcode_file.exists(): | ||
| pytest.skip(f"Bitcode file not found: {bitcode_path}") | ||
|
|
||
| if bitcode_file.suffix != ".bc": | ||
| pytest.skip(f"Expected .bc file, got: {bitcode_file.suffix}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible for us to avoid having a file locally? We have bitcode in this repo already:
cuda-python/cuda_bindings/tests/test_nvvm.py
Lines 12 to 141 in b9c76b3
| MINIMAL_NVVMIR_TXT_TEMPLATE = b"""\ | |
| target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64" | |
| target triple = "nvptx64-nvidia-cuda" | |
| define void @kernel() { | |
| entry: | |
| ret void | |
| } | |
| !nvvm.annotations = !{!0} | |
| !0 = !{void ()* @kernel, !"kernel", i32 1} | |
| !nvvmir.version = !{!1} | |
| !1 = !{i32 %d, i32 0, i32 %d, i32 0} | |
| """ # noqa: E501 | |
| MINIMAL_NVVMIR_BITCODE_STATIC = { | |
| (1, 3): # (major, debug_major) | |
| "4243c0de3514000005000000620c30244a59be669dfbb4bf0b51804c01000000210c00007f010000" | |
| "0b02210002000000160000000781239141c80449061032399201840c250508191e048b62800c4502" | |
| "42920b42641032143808184b0a3232884870c421234412878c1041920264c808b1142043468820c9" | |
| "01323284182a282a90317cb05c9120c3c8000000892000000b0000003222c80820624600212b2498" | |
| "0c212524980c19270c85a4906032645c20246382a01801300128030173046000132677b00778a007" | |
| "7cb0033a680377b0877420877408873618877a208770d8e012e5d006f0a0077640077a600774a007" | |
| "7640076d900e71a00778a00778d006e980077a80077a80076d900e7160077a100776a0077160076d" | |
| "900e7320077a300772a0077320076d900e7640077a600774a0077640076d900e71200778a0077120" | |
| "0778a00771200778d006e6300772a0077320077a300772d006e6600774a0077640077a600774d006" | |
| "f6100776a0077160077a100776d006f6300772a0077320077a300772d006f6600774a0077640077a" | |
| "600774d006f610077280077a10077280077a10077280076de00e7160077a300772a0077640071a21" | |
| "4c0e11de9c2e4fbbcfbe211560040000000000000000000000000620b141a0e86000004016080000" | |
| "06000000321e980c19114c908c092647c6044362098c009401000000b1180000ac0000003308801c" | |
| "c4e11c6614013d88433884c38c4280077978077398710ce6000fed100ef4800e330c421ec2c11dce" | |
| "a11c6630053d88433884831bcc033dc8433d8c033dcc788c7470077b08077948877070077a700376" | |
| "788770208719cc110eec900ee1300f6e300fe3f00ef0500e3310c41dde211cd8211dc2611e663089" | |
| "3bbc833bd04339b4033cbc833c84033bccf0147660077b6807376887726807378087709087706007" | |
| "76280776f8057678877780875f08877118877298877998812ceef00eeee00ef5c00eec300362c8a1" | |
| "1ce4a11ccca11ce4a11cdc611cca211cc4811dca6106d6904339c84339984339c84339b8c3389443" | |
| "3888033b94c32fbc833cfc823bd4033bb0c30cc7698770588772708374680778608774188774a087" | |
| "19ce530fee000ff2500ee4900ee3400fe1200eec500e3320281ddcc11ec2411ed2211cdc811edce0" | |
| "1ce4e11dea011e66185138b0433a9c833bcc50247660077b68073760877778077898514cf4900ff0" | |
| "500e331e6a1eca611ce8211ddec11d7e011ee4a11ccc211df0610654858338ccc33bb0433dd04339" | |
| "fcc23ce4433b88c33bb0c38cc50a877998877718877408077a28077298815ce3100eecc00ee5500e" | |
| "f33023c1d2411ee4e117d8e11dde011e6648193bb0833db4831b84c3388c4339ccc33cb8c139c8c3" | |
| "3bd4033ccc48b471080776600771088771588719dbc60eec600fede006f0200fe5300fe5200ff650" | |
| "0e6e100ee3300ee5300ff3e006e9e00ee4500ef83023e2ec611cc2811dd8e117ec211de6211dc421" | |
| "1dd8211de8211f66209d3bbc433db80339948339cc58bc7070077778077a08077a488777708719cb" | |
| "e70eef300fe1e00ee9400fe9a00fe530c3010373a8077718875f988770708774a08774d087729881" | |
| "844139e0c338b0433d904339cc40c4a01dcaa11de0411edec11c662463300ee1c00eec300fe9400f" | |
| "e5000000792000001d000000721e482043880c19097232482023818c9191d144a01028643c313242" | |
| "8e9021a318100a00060000006b65726e656c0000230802308240042308843082400c330c4230cc40" | |
| "0c4441c84860821272b3b36b730973737ba30ba34b7b739b1b2528d271b3b36b4b9373b12b939b4b" | |
| "7b731b2530000000a9180000250000000b0a7228877780077a587098433db8c338b04339d0c382e6" | |
| "1cc6a10de8411ec2c11de6211de8211ddec11d1634e3600ee7500fe1200fe4400fe1200fe7500ef4" | |
| "b08081077928877060077678877108077a28077258709cc338b4013ba4833d94c3026b1cd8211cdc" | |
| "e11cdc201ce4611cdc201ce8811ec2611cd0a11cc8611cc2811dd861c1010ff4200fe1500ff4800e" | |
| "00000000d11000000600000007cc3ca4833b9c033b94033da0833c94433890c30100000061200000" | |
| "06000000130481860301000002000000075010cd14610000000000007120000003000000320e1022" | |
| "8400fb020000000000000000650c00001f000000120394f000000000030000000600000006000000" | |
| "4c000000010000005800000000000000580000000100000070000000000000000c00000013000000" | |
| "1f000000080000000600000000000000700000000000000000000000010000000000000000000000" | |
| "060000000000000006000000ffffffff00240000000000005d0c00000d0000001203946700000000" | |
| "6b65726e656c31352e302e376e7670747836342d6e76696469612d637564613c737472696e673e00" | |
| "00000000", | |
| (2, 3): # (major, debug_major) | |
| "4243c0de3514000005000000620c30244a59be669dfbb4bf0b51804c01000000210c000080010000" | |
| "0b02210002000000160000000781239141c80449061032399201840c250508191e048b62800c4502" | |
| "42920b42641032143808184b0a3232884870c421234412878c1041920264c808b1142043468820c9" | |
| "01323284182a282a90317cb05c9120c3c8000000892000000b0000003222c80820624600212b2498" | |
| "0c212524980c19270c85a4906032645c20246382a01801300128030173046000132677b00778a007" | |
| "7cb0033a680377b0877420877408873618877a208770d8e012e5d006f0a0077640077a600774a007" | |
| "7640076d900e71a00778a00778d006e980077a80077a80076d900e7160077a100776a0077160076d" | |
| "900e7320077a300772a0077320076d900e7640077a600774a0077640076d900e71200778a0077120" | |
| "0778a00771200778d006e6300772a0077320077a300772d006e6600774a0077640077a600774d006" | |
| "f6100776a0077160077a100776d006f6300772a0077320077a300772d006f6600774a0077640077a" | |
| "600774d006f610077280077a10077280077a10077280076de00e7160077a300772a0077640071a21" | |
| "4c0e11de9c2e4fbbcfbe211560040000000000000000000000000620b141a0286100004016080000" | |
| "06000000321e980c19114c908c092647c60443620914c10840190000b1180000ac0000003308801c" | |
| "c4e11c6614013d88433884c38c4280077978077398710ce6000fed100ef4800e330c421ec2c11dce" | |
| "a11c6630053d88433884831bcc033dc8433d8c033dcc788c7470077b08077948877070077a700376" | |
| "788770208719cc110eec900ee1300f6e300fe3f00ef0500e3310c41dde211cd8211dc2611e663089" | |
| "3bbc833bd04339b4033cbc833c84033bccf0147660077b6807376887726807378087709087706007" | |
| "76280776f8057678877780875f08877118877298877998812ceef00eeee00ef5c00eec300362c8a1" | |
| "1ce4a11ccca11ce4a11cdc611cca211cc4811dca6106d6904339c84339984339c84339b8c3389443" | |
| "3888033b94c32fbc833cfc823bd4033bb0c30cc7698770588772708374680778608774188774a087" | |
| "19ce530fee000ff2500ee4900ee3400fe1200eec500e3320281ddcc11ec2411ed2211cdc811edce0" | |
| "1ce4e11dea011e66185138b0433a9c833bcc50247660077b68073760877778077898514cf4900ff0" | |
| "500e331e6a1eca611ce8211ddec11d7e011ee4a11ccc211df0610654858338ccc33bb0433dd04339" | |
| "fcc23ce4433b88c33bb0c38cc50a877998877718877408077a28077298815ce3100eecc00ee5500e" | |
| "f33023c1d2411ee4e117d8e11dde011e6648193bb0833db4831b84c3388c4339ccc33cb8c139c8c3" | |
| "3bd4033ccc48b471080776600771088771588719dbc60eec600fede006f0200fe5300fe5200ff650" | |
| "0e6e100ee3300ee5300ff3e006e9e00ee4500ef83023e2ec611cc2811dd8e117ec211de6211dc421" | |
| "1dd8211de8211f66209d3bbc433db80339948339cc58bc7070077778077a08077a488777708719cb" | |
| "e70eef300fe1e00ee9400fe9a00fe530c3010373a8077718875f988770708774a08774d087729881" | |
| "844139e0c338b0433d904339cc40c4a01dcaa11de0411edec11c662463300ee1c00eec300fe9400f" | |
| "e5000000792000001e000000721e482043880c19097232482023818c9191d144a01028643c313242" | |
| "8e9021a318100a00060000006b65726e656c0000230802308240042308843082400c23080431c320" | |
| "04c30c045118858c04262821373bbb36973037b737ba30bab437b7b95102231d373bbbb6343917bb" | |
| "32b9b9b437b7518203000000a9180000250000000b0a7228877780077a587098433db8c338b04339" | |
| "d0c382e61cc6a10de8411ec2c11de6211de8211ddec11d1634e3600ee7500fe1200fe4400fe1200f" | |
| "e7500ef4b08081077928877060077678877108077a28077258709cc338b4013ba4833d94c3026b1c" | |
| "d8211cdce11cdc201ce4611cdc201ce8811ec2611cd0a11cc8611cc2811dd861c1010ff4200fe150" | |
| "0ff4800e00000000d11000000600000007cc3ca4833b9c033b94033da0833c94433890c301000000" | |
| "6120000006000000130481860301000002000000075010cd14610000000000007120000003000000" | |
| "320e10228400fc020000000000000000650c00001f000000120394f0000000000300000006000000" | |
| "060000004c000000010000005800000000000000580000000100000070000000000000000c000000" | |
| "130000001f0000000800000006000000000000007000000000000000000000000100000000000000" | |
| "00000000060000000000000006000000ffffffff00240000000000005d0c00000d00000012039467" | |
| "000000006b65726e656c31352e302e376e7670747836342d6e76696469612d637564613c73747269" | |
| "6e673e0000000000", | |
| } | |
| @pytest.fixture(params=("txt", "bitcode_static")) | |
| def minimal_nvvmir(request): | |
| major, minor, debug_major, debug_minor = nvvm.ir_version() | |
| if request.param == "txt": | |
| return MINIMAL_NVVMIR_TXT_TEMPLATE % (major, debug_major) | |
| bitcode_static_binascii = MINIMAL_NVVMIR_BITCODE_STATIC.get((major, debug_major)) | |
| if bitcode_static_binascii: | |
| return binascii.unhexlify(bitcode_static_binascii) | |
| raise RuntimeError( | |
| "Static bitcode for NVVM IR version " | |
| f"{major}.{debug_major} is not available in this test.\n" | |
| "Maintainers: Please run the helper script to generate it and add the " | |
| "output to the MINIMAL_NVVMIR_BITCODE_STATIC dict:\n" | |
| " ../../toolshed/build_static_bitcode_input.py" | |
| ) |
so I suggest that we move it to the common place, say a new file under
cuda_python_test_helpers:https://github.com/NVIDIA/cuda-python/tree/main/cuda_python_test_helpers/cuda_python_test_helpers
and have it imported in both cuda.bindings/core tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be partially addressed now as I have yet to remove the existing invocations from cuda_bindings test
| extra_name = f"{options.name}_extra_{i}" | ||
| nvvm.add_module_to_program(self._mnff.handle, extra_source, len(extra_source), extra_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two notes:
- I am torn if we should allow users to specify the module names, instead of us making one up for them. Could you try my "sequence of 2-tuples" suggestion from our offline chat, and see how bad the code looks like? We should have something similar already in one of the options.
- As discussed let's check if all modules can be lazily loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is done
- if we use lazy mode always then we have to remove textual IR as input as lazy mode always supports bitcode format.
@leofang what do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that the lazy loading does not support text IR, only bitcode IR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I think so , looking at the error
|
pre-commit.ci autofix |
|
FYI, there are some merge conflicts due to #1351 that need to be resolved 😅 |
|
@rwgk @rparolin Could you share feedbacks with @abhilash1910 with regard to the changes on the pathfinder? Let's make sure this PR can be merged for cuda-core v0.6.0. |
rwgk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not ideal to add find_libdevice under _dynamic_libs. What do people think about introducing _static_libs? This would be for .a and .bc files.
Just noting: the found_via feature is missing, but we can do that in a follow-on PR.
We'll also need pathfinder tests, but I can help with that in a follow-on PR.
| ) | ||
|
|
||
|
|
||
| class LibdeviceNotFoundError(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with DynamicLibNotFoundError (in load_dl_common.py):
class LibdeviceNotFoundError(RuntimeError):
| pass | ||
|
|
||
|
|
||
| def _get_cuda_home_or_path() -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use cuda.pathfinder._utils.get_cuda_home_or_path?
|
|
||
|
|
||
| class _FindLibdevice: | ||
| FILENAME = "libdevice.10.bc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move this up to the top of the file, to make it more obvious that we have this hard-wired?
| if IS_WINDOWS: | ||
| bases = [r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA", r"C:\CUDA"] | ||
| else: | ||
| bases = ["/usr/local/cuda", "/opt/cuda"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Could you please also move this up to the top?
More fundamental:
Thus far we've tried hard to avoid hard-coding these paths. It's of course very easy for us, but can also easily backfire if there are multiple CUDA installations. We should probably make sure @kkraus14 is on board with this. (If he is I'm fine with it, too.)
| if os.path.isfile(file_path): | ||
| return file_path | ||
| # Versioned paths (e.g., /usr/local/cuda-13.0) | ||
| for versioned in sorted(glob.glob(base + "*"), reverse=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not doing this for dynamic libs or headers. We could wind up with a weird mix of versions between dynamic libs, headers, and libdevice if we don't have a general mechanism for deciding "What is the system installation?" I fear if that happens, the consequences will be subtle but potentially extremely confusing and time consuming to troubleshoot. @kkraus14 again for guidance.
| def find_libdevice() -> str: | ||
| """Find the path to libdevice.10.bc. | ||
| Raises: | ||
| LibdeviceNotFoundError: If libdevice.10.bc cannot be found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LibdeviceNotFoundError: If the libdevice*.bc file cannot be found
(To avoid hard-wiring the name multiple times.)
| return abs_path | ||
|
|
||
|
|
||
| def get_libdevice_path() -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely make this DRY and find naming that's intuitive in the call sites; i.e. people shouldn't need to come back here to figure out the difference between find/get.
My preferred solution would be
def find_libdevice(optional: bool = False) -> str | None:
Simple, just a tiny little bit weaker for type checkers.
| is_culink_backend = _linker._decide_nvjitlink_or_driver() | ||
|
|
||
| try: | ||
| from cuda_python_test_helpers.nvvm_bitcode import minimal_nvvmir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make this import unconditional. At least that's how I would start, then add except ModuleNotFoundError (instead of ImportError, which can mask all sort of bugs) if we really need it.
Description
Issue Link - #981
Changes to be addressed in this WIP PR:
{If / when it is possible to add multiple modules, a test with code that uses something from libdevice is probably a good idea.
It's also useful to be able to lazily add a module}
cc @leofang