Skip to content

Conversation

@Andy-Jost
Copy link
Contributor

@Andy-Jost Andy-Jost commented Jan 16, 2026

Summary

This PR reorganizes the graph tests into a new tests/graph/ subdirectory, adds a public Graph.handle property, and introduces tests for device-side graph launch.

Closes #1331

Changes

Test Reorganization

  • Create new tests/graph/ subdirectory following the pattern of tests/memory_ipc/
  • Break apart test_graph.py into logical groupings:
    • test_basic.py: basic graph construction and topology tests
    • test_conditional.py: conditional node tests (if, if-else, switch, while)
    • test_advanced.py: child graphs, update, stream lifetime
    • test_options.py: debug print, complete options, build mode
  • Incorporate test_graph_mem.py (renamed to test_capture_alloc.py)
  • Add shared kernel compilation helpers in tests/helpers/graph_kernels.py

API Addition

  • Add Graph.handle property to expose the underlying CUgraphExec handle
  • Follows the existing pattern of Stream.handle and Event.handle
  • Forward-compatible with future RAII handle refactoring

Device-Side Graph Launch Tests

  • Add test_device_launch.py with tests for device-side graph launch
  • Uses cudaGraphLaunch() from device code via cudaStreamGraphTailLaunch
  • Requires Hopper (sm_90+) architecture
  • Uses cuda.pathfinder.find_nvidia_header_directory() to locate libcudadevrt.a

Test Coverage

  • 58 tests total in tests/graph/
  • All tests pass on Hopper (H200)
  • Device launch tests skip on pre-Hopper architectures

@Andy-Jost Andy-Jost added the cuda.core Everything related to the cuda.core module label Jan 16, 2026
@Andy-Jost Andy-Jost self-assigned this Jan 16, 2026
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Jan 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Copy link
Contributor Author

/ok to test 6e30c53

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from test_graph.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from test_graph.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from test_graph.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from test_graph.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from test_graph.py

@Andy-Jost
Copy link
Contributor Author

/ok to test 5c63733

- Reorganize graph tests into tests/graph/ subdirectory:
  - test_basic.py: basic graph construction tests
  - test_conditional.py: conditional node tests (if, if-else, switch, while)
  - test_advanced.py: child graphs, update, stream lifetime
  - test_options.py: debug print, complete options, build mode
  - test_capture_alloc.py: graph memory resource tests (moved from test_graph_mem.py)
  - test_device_launch.py: new device-side graph launch tests

- Add Graph.handle property to expose CUgraphExec handle, consistent with
  Stream.handle and Event.handle patterns

- Add tests/helpers/graph_kernels.py with shared kernel compilation helpers

- Device-side graph launch tests require Hopper (sm_90+) architecture
@Andy-Jost
Copy link
Contributor Author

/ok to test da6205c

Comment on lines +33 to +61
def _find_cudadevrt_library():
"""Find the CUDA device runtime static library using pathfinder.
Uses load_nvidia_dynamic_lib("cudart") to locate the CUDA runtime,
then derives the static library path from that location.
Returns:
Path to libcudadevrt.a (Linux) or cudadevrt.lib (Windows), or None if not found.
"""
try:
loaded = load_nvidia_dynamic_lib("cudart")
except Exception:
return None

lib_dir = os.path.dirname(loaded.abs_path)

if IS_WINDOWS:
# Windows: .dll is in bin/, .lib is in lib/x64/
base = os.path.dirname(lib_dir)
path = os.path.join(base, "lib", "x64", "cudadevrt.lib")
if os.path.isfile(path):
return path
else:
# Linux: .so and .a are both in lib/
path = os.path.join(lib_dir, "libcudadevrt.a")
if os.path.isfile(path):
return path

return None
Copy link
Contributor Author

@Andy-Jost Andy-Jost Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the best way to locate the cudadevrt library (needed for device launch). I'm open to suggestions here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the existing requirement that CUDA_HOME needs to be set when running the cuda-core tests: did you consider finding the .lib/.a files relative to CUDA_HOME? I'm guessing that's probably pretty simple.

In any case, could you please reference #716 from the helper function, and maybe add a comment under 716 to point back to the helper function, after this PR is merged?

If you want to keep the current implementation:

    try:
        loaded = load_nvidia_dynamic_lib("cudart")
    except DynamicLibNotFoundError:
        return None

That's far less likely to mask bugs.

@github-actions
Copy link

self._mnff.close()

@property
def handle(self) -> driver.CUgraphExec:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we decide if we want to expose CUgraph or CUgraphExec? Both can find their use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a great question that I tried to sidestep in this change. (I'm working on another change to add resource handles to the graph module.)

My understanding is that class GraphBuilder corresponds to CUgraph and class Graph corresponds to CUgraphExec.

Following the existing pattern, after moving to resource handles, we would update the internal names to clarify:

  • GraphBuilder._mnff.graphGraphBuilder._mnff._h_graph (CUgraph)
  • Graph._mnff.graphGraph._mnff._h_graph_exec (CUgraphExec)

And the property types would be:

  • GraphBuilder.handle -> driver.CUgraph
  • Graph.handle -> driver.CUgraphExec

FYI, I added this property to expose CUgraphExec for the following code in test_device_launch.py:

inner_graph_handle = int(inner_graph.handle)
...
launch(gb_outer, LaunchConfig(grid=1, block=1), launcher_kernel, inner_graph_handle)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CUDA graph phase N - Device-side graph launch

3 participants