Many L2 & L1 routines that use multiple kernels to compute a result leaked internal cl_events. Removed the '-g' clBuildProgram option from library debug builds. This flag causes an internal compiler error in xTRMM debug builds; it is not very useful and has caused too many problems.
Adding netlib as a dependency for the client on Linux-based systems.
Fixing internal library cl_event leaks
Removing ACML as a build dependency for tests
Extensions should never be enabled unconditionally, and not all platforms support cl_amd_printf (for example, on ARM devices that support printf one might want cl_arm_printf instead). Moreover, the extension is not needed at all on devices supporting OpenCL 1.2. Since printf calls are only needed for debugging anyway, we can safely comment out all lines enabling this extension. If needed, a more sophisticated common blurb (checking the OpenCL version and extension availability for printf) can be designed, with an easy on/off switch.
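The "more sophisticated common blurb" could look roughly like the host-side sketch below. This is only an illustration, not code from the library: `printf_preamble` is a hypothetical helper, and it assumes the caller has already obtained the device's OpenCL version and its `CL_DEVICE_EXTENSIONS` string. The result would be prepended to the kernel source before building.

```c
#include <string.h>

/* Hypothetical helper (not part of the library): pick the kernel-source
 * preamble that makes printf usable on a given device.
 * `extensions` is assumed to be the device's CL_DEVICE_EXTENSIONS string. */
static const char *printf_preamble(int cl_major, int cl_minor,
                                   const char *extensions)
{
    /* OpenCL 1.2 and later provide printf as a built-in: nothing to enable. */
    if (cl_major > 1 || (cl_major == 1 && cl_minor >= 2))
        return "";

    /* Older devices: enable whichever vendor extension is advertised. */
    if (strstr(extensions, "cl_amd_printf") != NULL)
        return "#pragma OPENCL EXTENSION cl_amd_printf : enable\n";
    if (strstr(extensions, "cl_arm_printf") != NULL)
        return "#pragma OPENCL EXTENSION cl_arm_printf : enable\n";

    /* No printf support found: the caller should compile debug prints out. */
    return NULL;
}
```

A `NULL` return doubles as the on/off switch: the caller can skip defining whatever debug macro guards the printf calls.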
This is similar to the already merged changes from PR#115, and it fixes the same kind of error (unaligned memory access on NVIDIA GPUs).
If we have to build from source, do not bail out early only showing the error number: print the whole log and _then_ run CL_CHECK on the original error to exit.
Emitting a double-precision constant in a single-precision kernel can make compilation impossible on very strict platforms if the device has no double-precision capability. (For example, it triggers an assert in Beignet on Haswell.) Replace all 'naked' 0.0, 1.0 and 2.0 occurrences with the corresponding integer and rely on automatic promotion. For xgemm and its Python generator, replace the switch with a unified definition using the DATA_TYPE_STRING.
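The promotion rule this relies on can be checked in plain host C, which is used here only as an analogy for the kernel language: multiplying a float by the literal 2.0 yields a double expression, while multiplying it by the integer 2 stays in float. The function names are made up for this sketch.

```c
#include <stddef.h>

/* sizeof does not evaluate its operand; it only reports the expression type. */

/* `2.0` is a double literal, so the whole product has type double. */
static size_t size_with_double_literal(float x) { return sizeof(2.0 * x); }

/* `2` is an int, which is converted to float: the product stays float. */
static size_t size_with_int_literal(float x) { return sizeof(2 * x); }

/* The integer form computes the same value without ever touching double. */
static float twice(float x) { return 2 * x; }
```

With the integer literal, the same source line works unchanged for both single- and double-precision variants of a kernel, since the literal adopts the type of the other operand.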
Portability fixes
👍 Thank you for the fixes @Oblomov
setDeviceType only takes one argument: params. This was causing build errors.
Fix call to setDeviceType in test-performance.cpp
Looks good to me - 👍 🐛
.s is more portable; .x does not work with -std=c99, which does not have anonymous structs. Fix #307.
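A minimal sketch of the portable access pattern follows. `float2_like` is a hypothetical stand-in for a host-side CL vector type such as cl_float2, reduced to the array member only; the assumption is that the real headers expose the .x/.y names through an anonymous struct that is only available when the compiler supports it, whereas the `s` array is always present.

```c
/* Hypothetical stand-in for a host-side CL vector type such as cl_float2.
 * Only the array member is declared, which is what strict C99 can rely on. */
typedef union {
    float s[2];
} float2_like;

/* Build a two-component value (e.g. a complex number) via the array member. */
static float2_like make_float2(float re, float im)
{
    float2_like v;
    v.s[0] = re;  /* portable replacement for v.x */
    v.s[1] = im;  /* portable replacement for v.y */
    return v;
}
```

Code written this way should compile the same with or without -std=c99, since it never depends on the anonymous-struct member names.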
This increases the chance that they will just work for most users. Previously there was a mixture of CPU and GPU, which only works if users have both CPU and GPU OpenCL devices.
.s[0] + CL_DEVICE_TYPE_ALL
Remove usage of CMAKE_SOURCE_DIR
fixes compilation issues with autogemm
Fixes #318. Change link to build wiki page.
DGEMM: Use kernels from Tensile project on VEGA20