[Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes #478

atmnp · 2025-09-17T18:55:19Z

This PR vendors in more of the cpython registries from upstream Numba: charseq, iterators, listobj, slicing, and unicode. Some cpython components remain and will be included in a future PR. Some dangling imports are also fixed.

copy-pr-bot · 2025-09-17T18:55:23Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

atmnp · 2025-09-17T18:55:32Z

/ok to test

atmnp · 2025-09-17T19:58:09Z

/ok to test

brandon-b-miller · 2025-09-23T19:02:26Z

numba_cuda/numba/cuda/core/inline_closurecall.py

 )
 from numba.core.imputils import impl_ret_untracked
-from numba.core.extending import intrinsic
+from numba.cuda.extending import intrinsic


Just to confirm my understanding: this is left over from previous efforts and should be cleaned up, but isn't directly related to the other changes in this PR, correct?

Yep, our policy so far has been to fix dangling imports as we see them.
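
The import fix in the diff above follows a mechanical pattern, so a small script could flag remaining candidates. The following is a minimal sketch and not part of this PR: the names `VENDORED_MODULES` and `find_dangling_imports` are hypothetical, and only the `numba.core.extending` to `numba.cuda.extending` pair is taken from the diff; any other mapping would have to be added by hand.

```python
import re

# Only this pair comes from the diff above; extend the mapping as needed.
VENDORED_MODULES = {
    "numba.core.extending": "numba.cuda.extending",
}

# Matches "from <module> import <names>" and captures the three parts.
_IMPORT_RE = re.compile(r"^(\s*from\s+)([\w.]+)(\s+import\s+.+)$")


def find_dangling_imports(source):
    """Return (line_number, original_line, suggested_line) for each match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _IMPORT_RE.match(line)
        if match and match.group(2) in VENDORED_MODULES:
            prefix, module, rest = match.groups()
            suggestion = prefix + VENDORED_MODULES[module] + rest
            findings.append((lineno, line, suggestion))
    return findings


sample = (
    "from numba.core.imputils import impl_ret_untracked\n"
    "from numba.core.extending import intrinsic\n"
)
for lineno, old, new in find_dangling_imports(sample):
    print(f"line {lineno}: {old!r} -> {new!r}")
```

The sketch only reports suggestions and leaves the rewrite to the author, which matches the "fix them as we see them" policy described in the reply.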

brandon-b-miller · 2025-09-23T19:05:00Z

numba_cuda/numba/cuda/cpython/charseq.py

+@overload_method(types.UnicodeCharSeq, "find")
+@overload_method(types.CharSeq, "find")
+@overload_method(types.Bytes, "find")
+def unicode_charseq_find(a, b):


A lot of these lowerings seem to provide operations on some kind of string dtype. I'm not aware of support for strings in general for numba-cuda. Indeed, removing a bunch of these locally on my clone of this branch still seems to result in a passing state. Are all of these needed for the CUDA target?
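
For readers unfamiliar with the pattern in the snippet above: stacking `@overload_method` registers one implementation for several types. The sketch below imitates that with a toy, pure-Python registry. It does not use Numba, and `REGISTRY`, `toy_overload_method`, `charseq_find`, and the string type names are hypothetical stand-ins, not Numba's API or the vendored implementation.

```python
# Toy registry keyed by (type name, method name); not Numba's real machinery.
REGISTRY = {}


def toy_overload_method(type_name, method_name):
    """Register the decorated function under (type_name, method_name)."""
    def decorator(func):
        REGISTRY[(type_name, method_name)] = func
        return func  # returning func unchanged lets the decorators stack
    return decorator


@toy_overload_method("UnicodeCharSeq", "find")
@toy_overload_method("CharSeq", "find")
@toy_overload_method("Bytes", "find")
def charseq_find(a, b):
    # Stand-in behavior: delegate to Python's own find.
    return a.find(b)


for key in sorted(REGISTRY):
    print(key, REGISTRY[key] is charseq_find)
```

In a registry like this, an entry that nothing looks up can be removed with no visible effect. That may be why removing registrations locally left the tests passing, but this is a guess that the thread does not confirm.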

brandon-b-miller · 2025-09-23T19:09:22Z

numba_cuda/numba/cuda/cpython/unicode.py

+            ("hash", _Py_hash_t),
+            ("meminfo", types.MemInfoPointer(types.voidptr)),
+            # A pointer to the owner python str/unicode object
+            ("parent", types.pyobject),


This sticks out to me as well as potentially not being needed for the GPU. Without bona fide Python objects on the GPU, I'm guessing there are no currently supported Python expressions that lead to a valid variable backed by this type.

brandon-b-miller

I don't see anything inherently breaking about this, though I think we could maybe use a separate discussion about whether it's worth vendoring any code at all that interacts with the Python host runtime. I don't see how that code could be used without substantial reimplementation for the GPU.

brandon-b-miller · 2025-09-23T21:26:59Z

/ok to test

atmnp · 2025-09-24T17:04:19Z

/ok to test

brandon-b-miller · 2025-09-25T15:21:28Z

Connected with @atmnp offline around the unicode/string generation pieces of things, leaving them in for now.

- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)

atmnp added 5 commits September 17, 2025 11:23

fix dangling import to intrinsic from numba.core.extending

406db17

fix dangling unbox import

c3a4799

other dangling intrinsic import

29d9fc3

vendor in some cpython registries - charseq, iterators, listobj, slic…

ec4ccb4

…ing, and unicode

python 3.9 support

d779527

atmnp self-assigned this Sep 17, 2025

atmnp added the 2 - In Progress Currently a work in progress label Sep 17, 2025

Merge branch 'main' into atmn/vendor-in-cpython-2

117fbd0

remove ref to legacy type system

960a6b6

atmnp changed the title ~~[WIP][Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes~~ [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes Sep 17, 2025

atmnp added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 17, 2025

atmnp requested a review from brandon-b-miller September 23, 2025 16:15

Merge branch 'main' into atmn/vendor-in-cpython-2

ad17085

brandon-b-miller reviewed Sep 23, 2025

brandon-b-miller approved these changes Sep 23, 2025

brandon-b-miller and others added 2 commits September 24, 2025 09:43

Merge branch 'main' into atmn/vendor-in-cpython-2

3354c12

Merge branch 'main' into atmn/vendor-in-cpython-2

8c1ab95

brandon-b-miller added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Sep 25, 2025

brandon-b-miller merged commit fc8b050 into NVIDIA:main Sep 25, 2025
56 checks passed

gmarkall mentioned this pull request Nov 20, 2025

Bump version to 0.21.0 #602

Merged
