
CUDA PoC #145

Draft
torsteingrindvik wants to merge 1 commit into nagisa:main from torsteingrindvik:cuda-test

@torsteingrindvik

In a similar spirit to #144, I'm putting this up as a draft in case anyone is interested.

It does only the minimal, quick-and-dirty work needed to see how CUDA activity could be added to Tracy.

Here is an example of a complex app that uses async together with this PR:

image

Before this change, the CPU work (bottom left, top right) looked as if there were just empty gaps in between; now we can see that memory copies are happening there.

Another example:

image

The above shows CPU work sections connected by lots of kernel launches.

The GPU-related activity is quite noisy.

The PR lacks a few things:

  • A proper bindgen command with an allowlist whose output is close to the hand-picked "generated_cuda" file in this PR. That file records the command used here; run as-is, it produces thousands of lines of Rust bindings that aren't useful and won't even compile.
  • A layer on top of the sys crate that makes CUDA opt-in, so that tracing-tracy won't need a dependency on the sys crate. The opt-in could perhaps be an option on TracyLayer, or something similar.
  • A CUDA feature flag
  • Cleanup? Calling tracy_CUDACtx_Destroy at some point. Not sure it's actually worthwhile.
  • Checking whether tracy_CUDACtx_Name does anything useful
  • Figuring out why there are so many unknown spans with little info
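
To make the opt-in and cleanup items above more concrete, here is a rough sketch of one possible shape, not what this PR implements. Everything in it is hypothetical: `LayerOptions`, `Layer`, `with_cuda` and `CudaCtxGuard` are made-up names, and `mock_ctx_create` / `mock_ctx_destroy` are stand-ins for the real sys-crate bindings (such as `tracy_CUDACtx_Destroy`), whose actual signatures are not assumed here. The mocks only count live contexts so the idea can be tried without a GPU.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of live mock contexts; lets the sketch be checked without a GPU.
static LIVE_CONTEXTS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for the sys crate's context-creation binding.
fn mock_ctx_create() -> usize {
    LIVE_CONTEXTS.fetch_add(1, Ordering::SeqCst) + 1
}

// Hypothetical stand-in for tracy_CUDACtx_Destroy (real signature not assumed).
fn mock_ctx_destroy(_handle: usize) {
    LIVE_CONTEXTS.fetch_sub(1, Ordering::SeqCst);
}

/// Owns one (mock) CUDA profiling context and destroys it when dropped.
pub struct CudaCtxGuard {
    handle: usize,
}

impl CudaCtxGuard {
    pub fn new() -> Self {
        Self { handle: mock_ctx_create() }
    }
}

impl Drop for CudaCtxGuard {
    fn drop(&mut self) {
        mock_ctx_destroy(self.handle);
    }
}

/// Hypothetical layer options: CUDA stays off unless explicitly requested.
#[derive(Default)]
pub struct LayerOptions {
    cuda: bool,
}

impl LayerOptions {
    pub fn with_cuda(mut self, enabled: bool) -> Self {
        self.cuda = enabled;
        self
    }

    pub fn build(self) -> Layer {
        // A context is only created when the user opted in.
        let cuda = if self.cuda { Some(CudaCtxGuard::new()) } else { None };
        Layer { cuda }
    }
}

/// Hypothetical layer that keeps the context alive for as long as it exists.
pub struct Layer {
    cuda: Option<CudaCtxGuard>,
}

impl Layer {
    pub fn cuda_enabled(&self) -> bool {
        self.cuda.is_some()
    }
}

/// Reports how many mock contexts are currently alive.
pub fn live_contexts() -> usize {
    LIVE_CONTEXTS.load(Ordering::SeqCst)
}
```

With something like this, `LayerOptions::default().build()` would never touch CUDA, while `LayerOptions::default().with_cuda(true).build()` would create a context that gets destroyed when the layer is dropped. Whether tying the destroy call to `Drop` is actually worthwhile is still an open question, as noted in the list.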

Signed-off-by: Torstein Grindvik <torstein.grindvik@muybridge.com>