Conversation
Hi @ookaay, thanks for the PR. In general this looks like a good addition, but I'm not sure that it actually works (did you test this on a machine with a CUDA GPU?). Your code (and your tests) assumes a format for the multiprocessor line that deviceQuery does not actually produce: these lines start with the number of multiprocessors and end with "CUDA cores" (the product of the number of multiprocessors and the cores per multiprocessor), which is what the output on my laptop shows. Also, the parsing code assumes that the compute capability is <10. Another question: what is the source for the translation of compute capability into number of cores/SM? Finally, is using deviceQuery the right approach here at all? It seems the same information could be obtained more reliably via nvidia-smi.
Thanks @mstimberg, this makes sense. You're right: the parsing I added is too tied to one deviceQuery output format, and the assumptions I made about the multiprocessor line and the compute capability parsing are too narrow. I also take your point about deviceQuery itself. If the same information can be obtained more reliably from nvidia-smi, that's probably the better direction. I'll rework the implementation with that in mind before pushing this further.
My question was also: did you run this on a machine where it worked? If I am not mistaken, the multiprocessor line has always looked like it does in my example. |
No, I didn’t run this on a real machine. I only made sure the implementation compiled correctly, but I didn’t validate the parsing against actual hardware output. |
I see, but does that mean that the assumed parsing output is LLM-generated? And same for the mapping from compute capability to cores/SM? As we've seen for the deviceQuery output, these are not necessarily correct, and it would be helpful to know for the review where I can verify these things. |
Yes, I did some brainstorming with AI while working on this, and I was wrong not to check the deviceQuery output format against the actual source first. For the compute capability to cores/SM mapping, that’s the mapping used in _ConvertSMVer2Cores in NVIDIA_CUDA-11.2_Samples/common/inc/helper_cuda.h. I also should have tested it on a real CUDA GPU. I only checked that the code path compiled. Using deviceQuery was the wrong direction. I should have stayed with nvidia-smi. I’ll rework it around that, verify it on a GPU runtime in Colab, and open a new PR after implementing it. |
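For review purposes, the `_ConvertSMVer2Cores` table can be mirrored in Python roughly as below. This is a sketch written from memory of the CUDA samples table, not a copy of the PR's code; the entries (and any capabilities newer than Ampere) should be verified against `helper_cuda.h` itself.

```python
# Cores per streaming multiprocessor, keyed by (major, minor) compute
# capability. Intended to mirror _ConvertSMVer2Cores from the CUDA 11.2
# samples' helper_cuda.h; verify each entry against that header.
CORES_PER_SM = {
    (3, 0): 192, (3, 5): 192, (3, 7): 192,   # Kepler
    (5, 0): 128, (5, 2): 128, (5, 3): 128,   # Maxwell
    (6, 0): 64,  (6, 1): 128, (6, 2): 128,   # Pascal
    (7, 0): 64,  (7, 2): 64,  (7, 5): 64,    # Volta / Turing
    (8, 0): 64,  (8, 6): 128,                # Ampere
}


def cores_per_sm(major, minor, default=64):
    """Return CUDA cores per SM for a compute capability.

    Unknown capabilities fall back to `default`, analogous to the
    warn-and-use-default behaviour of _ConvertSMVer2Cores.
    """
    return CORES_PER_SM.get((major, minor), default)
```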
This PR addresses parts of #255 and #295.
For #255, automatic GPU selection previously relied on the legacy compute-capability-based choice. I changed it so automatic selection now uses a CUDA-style performance estimate:
```
performance = multiprocessors * cuda_cores_per_multiprocessor * clock_rate
```
If performance detection is unavailable, it falls back to the old compute-capability-based logic.
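The selection-with-fallback logic described above could look roughly like this. This is a sketch, not the PR's actual code: the device dictionary keys and the way failure is detected are hypothetical names chosen for illustration.

```python
def select_gpu(devices):
    """Pick the device id with the highest estimated performance.

    `devices` is a list of dicts with keys 'id', 'multiprocessors',
    'cores_per_sm', 'clock_rate_khz', and 'compute_capability'
    (hypothetical field names, for illustration only).
    """
    def performance(dev):
        # CUDA-style estimate: SMs * cores/SM * clock rate.
        return (dev['multiprocessors']
                * dev['cores_per_sm']
                * dev['clock_rate_khz'])

    try:
        return max(devices, key=performance)['id']
    except (KeyError, TypeError):
        # Performance detection unavailable: fall back to the legacy
        # compute-capability-based choice.
        return max(devices, key=lambda d: d['compute_capability'])['id']
```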
For #295, I updated brian2cuda/tools/test_suite/run_test_suite.py so it no longer runs the Brian test flow directly itself. Instead, it now delegates to the shared brian2cuda.tests.run entrypoint.
I verified the new GPU-selection path, the fallback to the legacy selection logic, and that run_test_suite.py now routes through the shared Brian2CUDA test runner.
I also fixed the build configuration by changing the setuptools_scm fallback version from an invalid value to a valid PEP 440 version, so the package can build correctly in CI.
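For reference, the relevant setting looks roughly like this (illustrative; the actual version string used in the PR may differ):

```toml
# pyproject.toml
[tool.setuptools_scm]
# Used when no SCM metadata is available; must be a valid PEP 440 version.
fallback_version = "0.0.0.dev0"
```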
@mstimberg @denisalevi