Open
Description
RAPIDS libraries can often install successfully via pip or conda but fail at runtime because the underlying CUDA toolkit is not set up properly. Some scenarios I have personally come across:
- Shared libraries (like `libcudart.so` or `libnvrtc.so`) are not findable at runtime. Pip wheels previously did not provide these (although with cupy 14, things look much better), and any setup with a preinstalled CUDA toolkit could have an incorrect configuration.
- The CUDA toolkit version (either from pip/conda installations or from `CUDA_HOME` and `/usr/local/cuda` symlink resolution) does not match the GPU driver's CUDA version.
- The scenario above is further exacerbated by libraries hardcoding `/usr/local/cuda` as a fallback search path, so a stale symlink loads the wrong libraries.
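To make the symlink and environment-variable scenarios concrete, here is a minimal sketch of how one might see where each toolkit hint actually points. The function names `toolkit_candidates` and `toolkit_version` are hypothetical and not part of rapids-cli, and reading `version.json` assumes the toolkit root ships that file with a `cuda.version` field, which may not hold for every installation.

```python
import json
import os
from pathlib import Path


def toolkit_candidates(env=None, default_symlink="/usr/local/cuda"):
    """Map each toolkit hint to the real directory it resolves to.

    A value of None means the hint is set but points at nothing usable
    (for example, a stale symlink).
    """
    env = os.environ if env is None else env
    hints = {name: env.get(name) for name in ("CUDA_HOME", "CUDA_PATH")}
    hints[default_symlink] = default_symlink
    resolved = {}
    for name, raw in hints.items():
        if not raw:
            continue  # environment variable not set, nothing to report
        real = Path(os.path.realpath(raw))  # follows symlinks to the final target
        resolved[name] = real if real.is_dir() else None
    return resolved


def toolkit_version(root):
    """Read the toolkit version from version.json (assumed layout), else None."""
    try:
        data = json.loads((Path(root) / "version.json").read_text())
        return data["cuda"]["version"]
    except (OSError, KeyError, TypeError, ValueError):
        return None
```

Calling `toolkit_candidates()` with no arguments inspects the current process environment; two hints resolving to different directories, or any hint resolving to `None`, would be worth surfacing to the user.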
A check can be added to rapids-cli which verifies:
- discoverability of shared libraries via `cuda-pathfinder`
- version consistency between the libraries found by `cuda-pathfinder`, the GPU driver, the `/usr/local/cuda` symlink and the `CUDA_HOME`/`CUDA_PATH` environment variables (if present).
A mismatch on major versions is an automatic error, but I am curious what the recommended approach is for a minor-version mismatch. Should that be a warning or an error?
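As a rough illustration of the version-consistency half of the check, the sketch below compares each source against the driver and leaves the minor-mismatch severity as a parameter, since that policy is the open question here. All function names are hypothetical; the only external fact relied on is that CUDA encodes versions as integers of the form `1000 * major + 10 * minor` (for example, as returned by `cuDriverGetVersion`). How the versions are collected from `cuda-pathfinder` or the driver is deliberately left out.

```python
def decode_cuda_version(encoded):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def parse_version(text):
    """Take the first two numeric components of a dotted version string."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def check_consistency(versions, minor_policy="warning"):
    """Compare every source against the driver; return (severity, message) pairs.

    `versions` maps a source name to (major, minor) and must contain "driver".
    A major mismatch is always an error; a minor mismatch gets `minor_policy`.
    """
    driver = versions["driver"]
    findings = []
    for source, found in versions.items():
        if source == "driver" or found == driver:
            continue
        severity = "error" if found[0] != driver[0] else minor_policy
        findings.append(
            (
                severity,
                f"{source} is CUDA {found[0]}.{found[1]}, "
                f"driver reports {driver[0]}.{driver[1]}",
            )
        )
    return findings
```

Passing `minor_policy="error"` would make the check strict. A more nuanced policy might treat a toolkit that is older than the driver's supported version differently from one that is newer, which this sketch does not attempt.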
I think this check fills a very important gap in the existing `rapids doctor` checks and builds upon information that can be gathered by `rapids debug`.