-
Notifications
You must be signed in to change notification settings - Fork 258
Add support for direct store in epilogue and padding support for wave transfer without transpose #3465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
32a7c74 to
461a40d
Compare
kabrahamAMD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good work
| index_t, | ||
| index_t) | ||
| { | ||
| // Notes: padding is currently not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment (or remove, as it provides no additional information to the error message)
Need to add template parameter. It will be removed during refactoring
461a40d to
9134cbb
Compare
|
|
||
| // Limitations of the current implementation: | ||
| // - no multiAB | ||
| // - GemmSpecialization Default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like GemmSpec default is no longer a hard requirement for using WaveTransfer, but only when A B layouts are not row column respectively? If so update comment?
|
|
||
| // We need to investigate if it makes sense to remove cshuffle for smaller types | ||
| // Currently we use direct store for NRepeat equal to 4 or 8. For 16 bit type we use at | ||
| // lease buffer store 64 bit for 16 contiguous threads -> 128 bytes in toral (full cache line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typos: lease, toral
|
LGTM, had some small notes and I have a few questions:
|
Discussed offline. Basically benchmarks have been performed and wavetransfer did seem to be a decent bit better for 2D. No significant improvement for 3D so no instances added. We want to look into more automated benchmarks later. |
Proposed changes
Summary:
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed filesDiscussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered