Replace rocprofilerv1 with rocprofiler-sdk (v3) for rocmon#716
Merged
The new rocmon will use rocprofiler-sdk.
... in rocprofiler-v3
For reasons that are not yet understood, rocprofiler_start_context fails on the first call with ROCPROFILER_STATUS_ERROR_HSA_NOT_LOADED, but returns ROCPROFILER_STATUS_SUCCESS when it is simply called a second time. Let's see whether this workaround causes any issues.
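A minimal sketch of the retry idea, for illustration only. It does not call rocprofiler-sdk: `status_t`, `fake_start_context` and `start_with_retry` are hypothetical stand-ins, and the stub merely mimics the "fails once, then succeeds" behavior described above.

```c
#include <stdio.h>

/* Hypothetical status codes standing in for rocprofiler_status_t. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_ERROR_HSA_NOT_LOADED = 1
} status_t;

/* Hypothetical stand-in for rocprofiler_start_context(): fails on the
 * first call and succeeds afterwards, mimicking the observed behavior. */
static int start_calls = 0;

static status_t fake_start_context(void)
{
    start_calls++;
    return (start_calls == 1) ? STATUS_ERROR_HSA_NOT_LOADED : STATUS_SUCCESS;
}

/* Call start() up to max_attempts times. Only the "HSA not loaded"
 * status triggers another attempt; any other status is returned as is. */
static status_t start_with_retry(status_t (*start)(void), int max_attempts)
{
    status_t st = STATUS_ERROR_HSA_NOT_LOADED;
    for (int i = 0; i < max_attempts; i++) {
        st = start();
        if (st != STATUS_ERROR_HSA_NOT_LOADED)
            return st;
        fprintf(stderr, "start attempt %d failed: HSA not loaded\n", i + 1);
    }
    return st;
}
```

Bounding the number of attempts keeps a genuinely missing HSA runtime from turning into an endless loop.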
Turns out this is completely unnecessary. We can just use PTHREAD_MUTEX_INITIALIZER instead.
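For reference, a small sketch of static mutex initialization. `counter_lock`, `counter` and `counter_increment` are hypothetical names chosen for illustration, not identifiers from this PR.

```c
#include <pthread.h>

/* Statically initialized mutex: no pthread_mutex_init() call and no
 * dedicated init function are required before first use. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Hypothetical helper: increment a shared counter under the lock and
 * return the new value. */
static int counter_increment(void)
{
    pthread_mutex_lock(&counter_lock);
    int value = ++counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```

This form only works for mutexes with default attributes; a mutex that needs custom attributes still requires pthread_mutex_init.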
Using HIP before that prevents rocprofiler-sdk from initializing properly, and it will complain that HSA is not initialized. Unfortunately this makes the code uglier, since we have to filter out the "unwanted" GPUs (those excluded by HIP_VISIBLE_DEVICES) at a later time. Accordingly, we keep track of all devices, but mark each one as used or unused and record its HIP ID (if applicable).
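A rough sketch of this bookkeeping, under stated assumptions: `device_entry` and `mark_visible_devices` are hypothetical names, only the plain numeric, comma-separated form of HIP_VISIBLE_DEVICES is handled, and parsing stops at the first invalid entry. The actual rocmon code may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-device record: every device is kept in the table,
 * and HIP visibility is stored as a flag plus an optional HIP ID. */
typedef struct {
    bool used;   /* true if the device is visible to HIP */
    int hip_id;  /* HIP device index, or -1 if not applicable */
} device_entry;

/* Mark devices according to a comma-separated list of device indices.
 * The position within the list becomes the HIP ID. A NULL list means
 * that all devices are visible. Returns the number of used devices. */
static int mark_visible_devices(device_entry *devs, int ndevs, const char *visible)
{
    for (int i = 0; i < ndevs; i++) {
        devs[i].used = (visible == NULL);
        devs[i].hip_id = (visible == NULL) ? i : -1;
    }
    if (visible == NULL)
        return ndevs;

    int used = 0;
    const char *p = visible;
    while (*p != '\0') {
        char *end;
        long idx = strtol(p, &end, 10);
        /* Assumption: stop at the first malformed, out-of-range or
         * duplicate entry instead of skipping it. */
        if (end == p || idx < 0 || idx >= ndevs || devs[idx].used)
            break;
        devs[idx].used = true;
        devs[idx].hip_id = used++;
        if (*end != ',')
            break;
        p = end + 1;
    }
    return used;
}
```

With four devices and a list of "2,0", device 2 gets HIP ID 0, device 0 gets HIP ID 1, and devices 1 and 3 stay in the table marked as unused.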
We use the rocmon marker API, because it requires less code and it's no longer necessary to have duplicate logic for writing and reading result files.
This never deleted a "file". Instead it used to destroy in memory data from the parsed marker file, which is now automatically destroyed via Lua's garbage collector.
This may very well be an expected case and shouldn't terminate execution.
Currently still fails on MI300X (gfx942):
That's looking better now. Not sure how the MI300A worked before ("how could multi GPU work at all before?"): MI300X (gfx942): Details
While the groups probably still need fixing, I'll merge this for now. It'll probably be best to open separate issues for that.
This PR introduces support for device counting with rocprofiler-sdk. Device counting allows out-of-band counting of GPUs. While the actual counting is still performed within the target process, this is not strictly necessary.
The first few tests on MI210 look good so far, although they still need verification against AMD's own rocprofilerv3. That however is a bit difficult, since I don't think the rocprofilerv3 frontend application actually supports device counting. I did some rudimentary scripting to sum up the per-kernel values and compare them against the device counted ones. There were differences of about 5%, which means we're at least in the same ballpark.
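The comparison described above boils down to a relative difference between the summed per-kernel values and the device-counted value. A sketch in C (the actual check was done with ad-hoc scripting; `sum_values` and `relative_difference` are hypothetical helpers):

```c
#include <stddef.h>

/* Sum the per-kernel values of one counter. */
static double sum_values(const double *values, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* Relative difference of the device-counted value with respect to the
 * per-kernel sum. Returns -1.0 if the reference sum is zero, since the
 * ratio is undefined in that case. */
static double relative_difference(double device_value, double kernel_sum)
{
    if (kernel_sum == 0.0)
        return -1.0;
    double diff = device_value - kernel_sum;
    if (diff < 0.0)
        diff = -diff;
    double ref = (kernel_sum < 0.0) ? -kernel_sum : kernel_sum;
    return diff / ref;
}
```

As a made-up example, per-kernel values of 100, 200 and 700 sum to 1000; a device-counted value of 1050 would then differ by about 5%.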
MI210 (gfx90a) example:
Details
While the "utilization" in percent doesn't make much sense (being >100%), the event values appear to match what rocprofilerv3 measured. So perhaps LIKWID's MEM metric formula is wrong?
4x MI300A (gfx942) example:
Details
The RX 6900 XT (gfx1030) appears to read zeroes only. But I vaguely remember that GPU didn't function properly with AMD tools either. Still needs testing:
Details
MI300X unfortunately wasn't available for testing yet.