update to main dev #2

Merged
tingxingdong merged 28 commits into tingxingdong:master from clMathLibraries:develop
Apr 28, 2021
@tingxingdong
Owner

No description provided.

Kent Knox and others added 28 commits January 19, 2017 09:13
Many L2 & L1 routines that used multiple kernels to compute a result
leaked internal cl_events.

Removed '-g' clBuildProgram option from library debug builds.  This
flag causes an internal compiler error for debug builds in xTRMM.  The
flag is not very useful and has caused too many problems.
Adding netlib as a dependency for the client on Linux-based
systems.
Fixing internal library cl_event leaks
Removing ACML as a build dependency for tests
Extensions should never be enabled unconditionally, and not all
platforms support cl_amd_printf (for example, on ARM devices that
support it one might want cl_arm_printf instead). Moreover, the
extension is not needed at all on devices supporting OpenCL 1.2.

Since printf calls are only needed for debugging anyway, we can safely
comment out all lines enabling this extension. If needed, a more sophisticated
common blurb (checking for OpenCL version and extension availability for
printf) can be designed, with some easy on/off switch.
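
One way the "common blurb" mentioned above might look is sketched below. It is only an illustration: DEBUG_KERNEL_PRINTF and HAVE_KERNEL_PRINTF are hypothetical names, not existing clBLAS macros, and the sketch assumes the platform defines __OPENCL_VERSION__ and a macro named after each supported extension, as the OpenCL C specification describes.

```c
/* Hypothetical on/off switch: pass -DDEBUG_KERNEL_PRINTF when building a
   kernel to request printf support. Not an existing clBLAS option. */
#if defined(DEBUG_KERNEL_PRINTF)
#  if defined(__OPENCL_VERSION__) && __OPENCL_VERSION__ >= 120
     /* printf is a core built-in from OpenCL 1.2 on; no pragma is needed. */
#    define HAVE_KERNEL_PRINTF 1
#  elif defined(cl_amd_printf)
#    pragma OPENCL EXTENSION cl_amd_printf : enable
#    define HAVE_KERNEL_PRINTF 1
#  elif defined(cl_arm_printf)
#    pragma OPENCL EXTENSION cl_arm_printf : enable
#    define HAVE_KERNEL_PRINTF 1
#  endif
#endif

/* Switch off, or no printf mechanism found: debug output stays disabled. */
#ifndef HAVE_KERNEL_PRINTF
#  define HAVE_KERNEL_PRINTF 0
#endif
```

Kernel code would then wrap each debug printf in `#if HAVE_KERNEL_PRINTF ... #endif`, so a build without the switch never touches an extension.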
This is similar to the already merged changes from PR#115, and it fixes
the same kind of error (unaligned memory access on NVIDIA GPUs).
If we have to build from source, do not bail out early only showing the
error number: print the whole log and _then_ run CL_CHECK on the
original error to exit.
Outputting a double-precision constant in a single-precision kernel
can make compilation impossible on very strict platforms if the device
has no double-precision capability. (For example, it causes an assert in
Beignet on Haswell).

Replace all 'naked' 0.0, 1.0 and 2.0 occurrences with the corresponding
integer literals and rely on automatic promotion.

For xgemm and its Python generator, replace the switch with a unified
definition using the DATA_TYPE_STRING.
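
The promotion rule this fix relies on can be checked on the host with plain C, which shares the usual arithmetic conversions with OpenCL C. The function names below are hypothetical and exist only for this sketch; it is not code from the library.

```c
#include <stddef.h>

/* An unsuffixed 2.0 is a double constant, so the float operand is converted
   to double and the whole expression has type double. */
size_t size_with_double_literal(float x)
{
    return sizeof(x * 2.0);   /* operand of sizeof is not evaluated */
}

/* The integer 2 is converted to float instead, so the expression stays
   float and no double-precision value is involved. */
size_t size_with_integer_literal(float x)
{
    return sizeof(x * 2);
}
```

On a typical host the first function returns sizeof(double) and the second sizeof(float), which is why the integer form remains acceptable to a device without double-precision support.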
Portability fixes
👍  Thank you for the fixes @Oblomov
setDeviceType only takes one argument: params. This was causing build errors.
Fix call to setDeviceType in test-performance.cpp

Looks good to me - 👍 🐛
.s is more portable, and .x does not work with -std=c99 which
does not have anonymous structs.

Fix #307.
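
A minimal sketch of the portable access pattern follows. demo_float2 is a hypothetical stand-in for a host-side OpenCL vector type such as cl_float2, written under the assumption that the real headers always provide the array member .s and add the named .x/.y members only when the compiler supports anonymous structs.

```c
/* Hypothetical stand-in for a host-side vector type like cl_float2.
   Only the array member is declared, which is valid in strict C99. */
typedef union {
    float s[2];
} demo_float2;

/* Sums both components through .s, so no anonymous struct is required. */
float demo_sum(demo_float2 v)
{
    return v.s[0] + v.s[1];
}
```

Code written against .s[0] and .s[1] compiles the same way with or without -std=c99, whereas code written against .x depends on a language extension.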
This increases the chance that they will just work for most users.

Previously there was a mixture of CPU and GPU, which only works
if users have both CPU and GPU OpenCL devices.
Remove usage of CMAKE_SOURCE_DIR
fixes compilation issues with autogemm
Fixes #318

Change link to build wiki page
DGEMM: Use kernels from Tensile project on VEGA20
@tingxingdong tingxingdong merged commit fbe95c0 into tingxingdong:master Apr 28, 2021

8 participants