GPU Shared AtPoints Bases #1711
Yay, it works locally but fails on Noether. Those are always fun to debug.
CUDA is working better now. I still need to check the thread block size setup for HIP in 3D.
Ok, now every element other than the first in 3D is wrong for HIP. Progress, but it is bizarre that it doesn't behave the way I expect. The logic is almost the same as CUDA, so I must be missing some subtle difference between HIP shared and CUDA shared, since those two have slightly diverged.
Ok, the kernels work now. We can optimize them in the future.
Local testing for Ratel passes.
include/ceed/jit-source/cuda/cuda-shared-basis-tensor-at-points.h
zatkins-dev left a comment
Overall looks good! I have a few clarifying questions, but I think this is good to merge.
include/ceed/jit-source/cuda/cuda-shared-basis-tensor-at-points-templates.h
co-authored-by: zatkins-dev <zach.atkins@colorado.edu>
This PR adds AtPoints to /gpu/[cuda,hip]/shared, which is a blocker for a /gpu/[cuda,hip]/gen AtPoints capability.
Work in progress. The t35* series tests pass, but the t59* series don't all pass. There seems to be an issue with the 2D/3D transpose interp and grad with multiple elements.
Passing for libCEED t* tests. petsc-bpsswarm isn't passing yet, though. There is also an issue with a few Ratel tests (it seems to be the MMS tests) where they get stuck, probably on a __syncthreads().
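One way to chase a transpose bug that only appears with multiple elements is to compare the GPU output against a plain CPU reference. The sketch below is a hypothetical 1D version: the function names, the element-major layout of the point arrays, and the use of a Lagrange basis on the nodes are all assumptions, not libCEED's API or its actual AtPoints basis evaluation. Each element scatters its point values back onto its own nodes, summing contributions because many points touch the same node.

```c
/* Hypothetical CPU reference for 1D transpose interpolation at points.
 * Assumption: a Lagrange basis on P nodes; libCEED's AtPoints kernels
 * may evaluate the basis differently. */

/* Evaluate the i-th Lagrange basis polynomial on nodes[0..P-1] at x. */
static double lagrange(const double *nodes, int P, int i, double x) {
  double value = 1.0;
  for (int j = 0; j < P; j++) {
    if (j != i) value *= (x - nodes[j]) / (nodes[i] - nodes[j]);
  }
  return value;
}

/* Transpose interpolation: for each element e, accumulate the point values
 * u[e * num_points + p] onto that element's P nodal values v[e * P + i].
 * Several points contribute to the same node, so the result is a sum,
 * never an overwrite. */
static void interp_transpose_at_points(int num_elem, int P, int num_points,
                                       const double *nodes, const double *x,
                                       const double *u, double *v) {
  for (int e = 0; e < num_elem; e++) {
    /* Per-element offsets: an error here only shows up when num_elem > 1. */
    const double *x_e = x + e * num_points;
    const double *u_e = u + e * num_points;
    double *v_e = v + e * P;

    for (int i = 0; i < P; i++) v_e[i] = 0.0;
    for (int p = 0; p < num_points; p++) {
      for (int i = 0; i < P; i++) {
        v_e[i] += lagrange(nodes, P, i, x_e[p]) * u_e[p];
      }
    }
  }
}
```

Because the sum over points is a reduction onto shared nodes, a GPU kernel that assigns points to threads has to combine partial results (for example with a shared-memory reduction or atomics) and synchronize before writing them out; running the reference with two or more elements exercises the per-element offsets that a single-element test never touches.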