Add multi-GPU CI runners (t4 and h100 with 2 GPUs) #1505

Andy-Jost · 2026-01-15T22:17:51Z

Summary

Add support for multi-GPU CI testing by introducing a GPU_COUNT field to the test matrix and adding 2-GPU t4 and h100 configurations.

Changes

ci/test-matrix.yml:
- Added GPU_COUNT field to all entries for consistency
- Added two new multi-GPU entries: t4 and h100 with GPU_COUNT: '2'
- Removed the special_runners section; its entries are now integrated directly into the pull-request matrix
- Aligned columns for readability (can be reverted if needed)
.github/workflows/test-wheel-linux.yml:
- Updated runs-on to use ${{ matrix.GPU_COUNT }} instead of the hardcoded -1 suffix
- Updated job name to show (x2) suffix for multi-GPU tests (e.g., t4(x2))
- Removed special_runners handling logic (no longer needed)
.github/workflows/test-wheel-windows.yml:
- Updated runs-on to use ${{ matrix.GPU_COUNT }} for consistency
cuda_core/examples/simple_multi_gpu_example.py:
- Switched from old CuPy RNG (cp.random.random()) to new RNG (cp.random.default_rng()) to avoid requiring libcurand.so
cuda_core/tests/test_launcher.py:
- Switched to new CuPy RNG to avoid libcurand dependency
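Because every matrix entry now carries GPU_COUNT, the runner label can read it unconditionally instead of ending in a fixed -1. The Python sketch below illustrates that idea. The label layout and the helper name `runner_label` are hypothetical and not copied from the real workflow, and the ARCH and DRIVER values in the example entry are assumed.

```python
# Minimal sketch. The label layout below is an assumption for illustration;
# it is not copied from the real workflow.
from typing import Mapping


def runner_label(entry: Mapping[str, str]) -> str:
    """Return a hypothetical runner label whose last field is the GPU count."""
    # Every matrix entry now carries GPU_COUNT, so it can be read unconditionally;
    # previously this trailing field was a hardcoded "-1".
    return (
        f"linux-{entry['ARCH']}-gpu-{entry['GPU'].lower()}"
        f"-{entry['DRIVER']}-{entry['GPU_COUNT']}"
    )


# Illustrative entry; ARCH and DRIVER values are assumed.
t4_x2 = {
    "ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "13.1.0",
    "LOCAL_CTK": "1", "GPU": "t4", "GPU_COUNT": "2", "DRIVER": "latest",
}
print(runner_label(t4_x2))  # linux-amd64-gpu-t4-latest-2
```

An entry missing GPU_COUNT raises a KeyError here, which is one reason to add the field to every entry rather than only to the multi-GPU ones.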

Test Coverage

Multi-GPU runners are now included in the standard PR test matrix
Job names indicate the GPU count, e.g. py3.13, 13.1.0, local, t4(x2)

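The job-name suffix rule can be sketched in Python as follows. `gpu_field` is a hypothetical helper, not the workflow's actual expression, and it assumes single-GPU jobs keep the bare GPU name with no suffix.

```python
def gpu_field(gpu: str, gpu_count: str) -> str:
    """Format the GPU part of a job name, e.g. "t4" or "t4(x2)"."""
    count = int(gpu_count)
    # Single-GPU jobs keep their existing names; only multi-GPU jobs get a suffix.
    return f"{gpu}(x{count})" if count > 1 else gpu


# Rebuild the example job name from the PR description.
name = ", ".join(["py3.13", "13.1.0", "local", gpu_field("t4", "2")])
print(name)  # py3.13, 13.1.0, local, t4(x2)
```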
Closes #1501

copy-pr-bot · 2026-01-15T22:17:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Andy-Jost · 2026-01-15T22:18:02Z

/ok to test f97ef26

Andy-Jost · 2026-01-15T22:47:33Z

/ok to test 9163804


Andy-Jost · 2026-01-15T23:32:29Z

/ok to test 074a6ca

Andy-Jost · 2026-01-15T23:41:35Z

Notes on test-matrix.yml changes:

Added GPU_COUNT field to all entries for consistency and to support multi-GPU runners (t4 and h100 with 2 GPUs).
Aligned columns for readability. This is optional and can be reverted if anyone objects.

Andy-Jost · 2026-01-15T23:41:42Z

/ok to test 3184316


Andy-Jost · 2026-01-15T23:49:36Z

/ok to test 4214c2c

kkraus14 · 2026-01-16T01:57:28Z

Do we want to run all of our CI jobs on these multi-GPU runners or can we only run multi-GPU specific tests / examples / etc. as needed?

leofang

> Do we want to run all of our CI jobs on these multi-GPU runners or can we only run multi-GPU specific tests / examples / etc. as needed?

I suggested offline that we add 2 dual-GPU jobs on a per-PR basis for now and monitor usage over the next few days. We don't have the infra for as-needed tests yet (#299 is a good start).

leofang · 2026-01-16T03:10:01Z

ci/test-matrix.yml

-  special_runners:
-    amd64:
-      - { ARCH: 'amd64', PY_VER: '3.13', CUDA_VER: '13.0.2', LOCAL_CTK: '1', GPU: 'H100', DRIVER: 'latest' }
-      - { ARCH: 'amd64', PY_VER: '3.13', CUDA_VER: '13.1.0', LOCAL_CTK: '1', GPU: 'H100', DRIVER: 'latest' }


My only nitpick is that it'd be nice to have a comment or code block that separates out the "special runners" (including the 2-GPU ones introduced in this PR) from the regular matrix. That would make it easier to eyeball and update.
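One way to read this suggestion is to keep the regular entries and the special runners in separate, clearly labeled groups and only concatenate them into the single pull-request matrix. The Python sketch below models that with plain lists of dicts; `build_pr_matrix` and the `SPECIAL_RUNNERS` name are hypothetical, and setting GPU_COUNT to '1' on the former H100 special-runner entries is an assumption.

```python
from typing import Dict, List

Entry = Dict[str, str]

# Former special_runners entries (from the diff above), with an assumed GPU_COUNT of '1'.
SPECIAL_RUNNERS: List[Entry] = [
    {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "13.0.2", "LOCAL_CTK": "1",
     "GPU": "H100", "GPU_COUNT": "1", "DRIVER": "latest"},
    {"ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "13.1.0", "LOCAL_CTK": "1",
     "GPU": "H100", "GPU_COUNT": "1", "DRIVER": "latest"},
]


def build_pr_matrix(regular: List[Entry], special: List[Entry]) -> List[Entry]:
    """Concatenate the regular and special-runner groups into one pull-request matrix."""
    combined = regular + special
    # Every entry should carry the same fields (GPU_COUNT included) so the
    # workflow can reference them without per-entry special cases.
    expected = set(combined[0]) if combined else set()
    for entry in combined:
        mismatched = expected ^ set(entry)
        if mismatched:
            raise ValueError(f"inconsistent fields {sorted(mismatched)} in {entry}")
    return combined


matrix = build_pr_matrix([], SPECIAL_RUNNERS)
print(len(matrix))  # 2
```

Keeping the groups separate preserves the readability benefit of the old special_runners section while the workflow still sees one flat list.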

leofang · 2026-01-16T03:12:36Z

cuda_core/examples/simple_multi_gpu_example.py

-a = cp.random.random(size, dtype=dtype)
-b = cp.random.random(size, dtype=dtype)
+rng = cp.random.default_rng()
+a = rng.random(size, dtype=dtype)
+b = rng.random(size, dtype=dtype)


Keeping a note on this change for posterity:

- We want to encourage end users to use the new NumPy/CuPy RNG interface.

- But the real reason we must do it in this PR is that the new RNG does not require libcurand to be installed. CuPy is self-contained: for all of our use cases we need only NVRTC and the driver. Our CI relies on this assumption (to save resources).
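The Generator-based pattern from the diff above can be written against either array module, since CuPy's `cupy.random.default_rng()` mirrors NumPy's interface. The sketch below uses NumPy as a CPU stand-in so it runs without a GPU; the helper name `make_inputs` and the seed parameter are hypothetical additions for illustration.

```python
import numpy as np


def make_inputs(xp, size, dtype, seed=None):
    """Create two random arrays with the Generator-based RNG interface.

    `xp` is the array module: cupy on a GPU machine, or numpy as a CPU stand-in.
    """
    rng = xp.random.default_rng(seed)
    a = rng.random(size, dtype=dtype)
    b = rng.random(size, dtype=dtype)
    return a, b


# CPU check with NumPy; on a GPU runner one would pass `cupy` instead.
a, b = make_inputs(np, 1000, np.float32, seed=0)
print(a.dtype, a.shape)  # float32 (1000,)
```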

mdboom · 2026-01-16T14:42:22Z

🎉 There are a few APIs about to land in cuda.core.system related to multiple GPUs that could benefit from this testing.

github-actions · 2026-01-16T15:10:03Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

Andy-Jost self-assigned this Jan 15, 2026


Andy-Jost force-pushed the multi-gpu-ci branch from f97ef26 to 9163804 January 15, 2026 22:47

leofang reviewed Jan 15, 2026


.github/workflows/test-wheel-linux.yml (outdated, resolved)

Andy-Jost added 3 commits January 15, 2026 15:14

Add multi-GPU CI runners (t4 and h100 with 2 GPUs)

f19d167

Add GPU_COUNT field to test matrix to support multi-GPU configurations. This enables rigorous testing of peer access, device switching, and other multi-GPU functionality in CI. Closes NVIDIA#1501

Remove special_runners, integrate entries into pull-request matrix

cc13ecf

Simplifies CI configuration by moving special runner entries directly into the pull-request matrix rather than handling them separately.

Fix test to avoid cuRAND dependency

074a6ca

Replace cp.random.default_rng() with cp.arange() to avoid requiring cuRAND library which may not match the installed CUDA version.

Andy-Jost force-pushed the multi-gpu-ci branch from 9163804 to 074a6ca January 15, 2026 23:32

Add notes about GPU_COUNT and column alignment

3184316

Andy-Jost force-pushed the multi-gpu-ci branch from 8c0c7f4 to 3184316 January 15, 2026 23:41

Use CuPy's new RNG to avoid libcurand dependency

4214c2c

Switch from cp.random.random() (old RNG requiring libcurand.so) to cp.random.default_rng().random() (new RNG with pre-compiled curand device libs bundled in CuPy).

Andy-Jost changed the title from "[WIP] Test multi-GPU CI runners" to "Add multi-GPU CI runners (t4 and h100 with 2 GPUs)" Jan 15, 2026

Andy-Jost marked this pull request as ready for review January 15, 2026 23:50

leofang approved these changes Jan 16, 2026


leofang added this to the cuda.core beta 12 milestone Jan 16, 2026

leofang added the labels P0 (High priority - Must do!), CI/CD (CI/CD infrastructure), and enhancement (Any code-related improvements) Jan 16, 2026

leofang merged commit 53c8d4a into NVIDIA:main Jan 16, 2026
88 checks passed

4 participants
