Pass the -numba-debug flag to libnvvm #681
Conversation
When using the CUDA Toolkit release 13.1 or later for debug builds, we need to pass the -numba-debug flag to libnvvm in order to enable enhanced debug information. Closes NVIDIA#679
Greptile Summary

Confidence Score: 5/5
…bnvvm. As suggested by leofang, the original approach of determining whether the -numba-debug flag should be used is unreliable. Instead, use the pattern from one of the tests: compile a test program and examine its PTX output version to deduce the CUDA Toolkit version. This result is cached in the NVVM singleton.
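As a rough illustration of the probe described above, deducing a toolkit version from PTX output comes down to reading the `.version` directive that NVVM emits at the top of a module. The helper name below is hypothetical and the sample PTX is fabricated for illustration; the real logic lives in `nvvm.py`:

```python
import re

def parse_ptx_version(ptx):
    """Extract the PTX ISA version from a module's `.version` directive.

    Returns a (major, minor) tuple, or None if no directive is found.
    """
    match = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Fabricated sample of the PTX header a compile would produce.
sample_ptx = """
//
// Generated by NVIDIA NVVM Compiler
//
.version 8.5
.target sm_90
.address_size 64
"""
version = parse_ptx_version(sample_ptx)
```

The PTX ISA version then maps to a minimal required CUDA Toolkit version via a lookup table, which is what the cached result in the NVVM singleton is derived from.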
Additional Comments (3)
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, lines 254-255 (link). syntax: Incorrect type for the `options` parameter. `nvvmVerifyProgram` expects `POINTER(c_char_p)` but receives a Python list.
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, line 257 (link). syntax: Same issue: incorrect type for the `options` parameter. Must use `option_ptrs` instead.
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, lines 274-275 (link). logic: Unconditional destruction of a potentially uninitialized program handle. If `nvvmCreateProgram` fails at line 239, `program` remains an empty `c_void_p()` and destroying it could cause issues.
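The fix these comments point at is the standard ctypes pattern for passing a list of strings where a C API expects `const char **`: encode each option to bytes and build a `c_char_p` array. A generic sketch (the helper name is hypothetical, not the actual nvvm.py code):

```python
from ctypes import c_char_p

def make_option_ptrs(options):
    """Convert a list of option strings into a ctypes array that can be
    passed where a C API expects POINTER(c_char_p) / const char **."""
    encoded = [opt.encode("utf-8") for opt in options]  # ctypes needs bytes
    arr_type = c_char_p * len(encoded)
    return arr_type(*encoded)

opts = make_option_ptrs(["-opt=3", "-g"])
```

Passing a plain Python list instead of such an array is exactly the type mismatch the first two comments describe.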
1 file reviewed, 3 comments
Additional Comments (3)
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, line 262 (link). syntax: use `c_size_t()` instead of `c_int()` to match the function signature.
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, lines 255-256 (link). syntax: options must be encoded to bytes for ctypes.
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, lines 277-279 (link). logic: calling `check_error` in a finally block can raise an exception during error handling, masking the original error. Wrap it in try-except or check the error without raising.
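The last point is a general cleanup-masking pitfall: if the finally block raises, it replaces whatever exception the main path was already propagating. A minimal sketch of the safe pattern, with hypothetical names standing in for the nvvm.py code:

```python
def run_with_cleanup(work, cleanup):
    """Run `work`, then always attempt `cleanup`, without letting a
    cleanup failure mask an exception raised by the primary work."""
    try:
        return work()
    finally:
        try:
            cleanup()
        except Exception:
            # Swallow cleanup errors; any exception from `work` stays
            # the one that propagates to the caller.
            pass

def failing_cleanup():
    raise RuntimeError("Failed to destroy test program.")

result = run_with_cleanup(lambda: 42, failing_cleanup)
```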
1 file reviewed, 3 comments
Additional Comments (1)
- `numba_cuda/numba/cuda/cudadrv/nvvm.py`, lines 283-285 (link). logic: re-raising the exception defeats the graceful fallback logic. If test program compilation fails, `_libnvvm_cuda_version` stays `None` but the exception propagates, preventing the caller from using the fallback behavior. Consider removing the `raise` to allow silent failure and return `None`.
1 file reviewed, 1 comment
/ok to test 6764681
```python
try:
    self.check_error(err, "Failed to destroy test program.")
except Exception:
    pass
```
I think there's no point checking the error if we're going to swallow the exception the check will raise anyway.
Suggested change:

```diff
-try:
-    self.check_error(err, "Failed to destroy test program.")
-except Exception:
-    pass
```
```python
    err = self.nvvmGetCompiledResult(program, ptx_data)
    self.check_error(err, "Failed to get test program compiled result.")
except Exception as exception:
    print(f"Exception compiling test program: {exception}")
```
This should probably be a warning rather than a print:
Suggested change:

```diff
-print(f"Exception compiling test program: {exception}")
+warnings.warn(
+    f"Exception compiling test program: {exception}",
+    category=NvvmWarning
+)
```
```python
    self.check_error(err, "Failed to get test program compiled result.")
except Exception as exception:
    print(f"Exception compiling test program: {exception}")
    raise exception
```
I don't think we should re-raise the exception, just let it pass - otherwise I'd expect it to propagate all the way back to the user, which we may not want.
Suggested change:

```diff
-    raise exception
```
```python
    self._libnvvm_cuda_version = (
        get_minimal_required_cuda_ver_from_ptx_ver(ptx_version)
    )
except Exception:
```
It seems that the only likely exception we'd expect from the PTX version functions is a ValueError - anything else is a bit more surprising so we should let it manifest to expose the underlying bug instead:
Suggested change:

```diff
-except Exception:
+except ValueError:
```
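Narrowing the except clause as suggested keeps genuinely unexpected failures visible. A minimal sketch of the distinction, with a generic lookup standing in for the PTX version functions:

```python
def safe_lookup(fn, arg):
    """Catch only the anticipated ValueError; any other exception
    signals a real bug and is allowed to propagate."""
    try:
        return fn(arg)
    except ValueError:
        return None
```

With this shape, a malformed input degrades gracefully to `None`, while a programming error (say, a `TypeError` from calling the function wrongly) still surfaces in tests and CI.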
```python
# pass in the -numba-debug flag.
if "g" in options:
    ctk_version = self.driver.get_cuda_version()
    if ctk_version is None or ctk_version >= (13, 1):
```
If we couldn't determine the version of the CTK, that could be because the version is 12.x because the necessary PTX version functions weren't present in the CUDA bindings for 12.x. So I think it'd be safer to assume we don't pass the -numba-debug flag if we can't determine the version:
Suggested change:

```diff
-if ctk_version is None or ctk_version >= (13, 1):
+if ctk_version is not None and ctk_version >= (13, 1):
```
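The suggested guard treats an unknown toolkit version as "too old", which is the conservative choice given that the probe functions may simply be absent from 12.x CUDA bindings. The predicate in isolation (hypothetical function name; the PR inlines this check):

```python
def needs_numba_debug_flag(ctk_version):
    """Return True only when the toolkit version is known and >= 13.1.

    An unknown version (None) may mean the PTX version functions were
    missing from older 12.x CUDA bindings, so default to NOT passing
    the -numba-debug flag in that case.
    """
    return ctk_version is not None and ctk_version >= (13, 1)
```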
gmarkall left a comment
I made some comments on the diff, but I also saw that the CI is failing.
If you set up commit signing (so that your commits show as "Verified" rather than "Unverified") you should be able to trigger the CI yourself by commenting `/ok to test` as well.
```python
except Exception as exception:
    print(f"Exception compiling test program: {exception}")
    raise exception
```
logic: catching the exception, printing it, and re-raising defeats the purpose of graceful fallback. The code at lines 299-305 expects exceptions to be silently caught, allowing `_libnvvm_cuda_version` to remain `None`. This re-raise will prevent the function from returning `None` on error.
Suggested change:

```diff
-except Exception as exception:
-    print(f"Exception compiling test program: {exception}")
-    raise exception
+except Exception:
+    pass
```
… compiler options. Add a check_options() method to the NVVM class to determine if a given combination of compiler options is supported. This is done by compiling a short test program with the provided options, and returning True/False depending on the result. Add a check to the CompilationUnit class for the -numba-debug flag when compiling with debugging enabled.
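The `check_options()` probe described in the commit message could look roughly like this. The `try_compile` callable is a hypothetical stand-in (injected here for testability) for the real NVVM calls that compile a short test program, and the caching mirrors the singleton behavior mentioned earlier in the thread:

```python
class OptionChecker:
    """Caches whether a given combination of compiler options is
    accepted, by attempting to compile a trivial test program once
    per unique option set."""

    def __init__(self, try_compile):
        # try_compile(options) compiles a test program with the given
        # options and raises on failure.
        self._try_compile = try_compile
        self._cache = {}

    def check_options(self, options):
        key = tuple(sorted(options))
        if key not in self._cache:
            try:
                self._try_compile(options)
                self._cache[key] = True
            except Exception:
                self._cache[key] = False
        return self._cache[key]
```

A caller compiling with debugging enabled would then consult `check_options(["-numba-debug", ...])` before adding the flag, paying the test-compile cost only on the first query.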
/ok to test f32c2a5

/ok to test f32c2a5

/ok to test 56b553b
gmarkall left a comment
This looks good to me. I just pushed a fix so that the pre-commit checks should pass.
- Add Python 3.14 to the wheel publishing matrix (NVIDIA#750)
- feat: swap out internal device array usage with `StridedMemoryView` (NVIDIA#703)
- Fix max block size computation in `forall` (NVIDIA#744)
- Fix prologue debug line info pointing to decorator instead of def line (NVIDIA#746)
- Fix kernel return type in DISubroutineType debug metadata (NVIDIA#745)
- Fix missing line info in Jupyter notebooks (NVIDIA#742)
- Fix: Pass correct flags to linker when debugging in the presence of LTOIR code (NVIDIA#698)
- chore(deps): add cuda-pathfinder to pixi deps (NVIDIA#741)
- fix: enable flake8-bugbear lints and fix found problems (NVIDIA#708)
- fix: Fix race condition in CUDA Simulator (NVIDIA#690)
- ci: run tests in parallel (NVIDIA#740)
- feat: users can pass `shared_memory_carveout` to @cuda.jit (NVIDIA#642)
- Fix compatibility with NumPy 2.4: np.trapz and np.in1d removed (NVIDIA#739)
- Pass the -numba-debug flag to libnvvm (NVIDIA#681)
- ci: remove rapids containers from conda ci (NVIDIA#737)
- Use `pathfinder` for dynamic libraries (NVIDIA#308)
- CI: Add CUDA 13.1 testing support (NVIDIA#705)
- Adding `pixi run test` and `pixi run test-par` support (NVIDIA#724)
- Disable per-PR nvmath tests + follow same test practice (NVIDIA#723)
- chore(deps): regenerate pixi lockfile (NVIDIA#722)
- Fix DISubprogram line number to point to function definition line (NVIDIA#695)
- revert: chore(dev): build pixi using rattler (NVIDIA#713) (NVIDIA#719)
- [feat] Initial version of the Numba CUDA GDB pretty-printer (NVIDIA#692)
- chore(dev): build pixi using rattler (NVIDIA#713)
- build(deps): bump the actions-monthly group across 1 directory with 8 updates (NVIDIA#704)