update to main dev by tingxingdong · Pull Request #2 · tingxingdong/clBLAS-private

tingxingdong · 2021-04-28T16:38:29Z

No description provided.

Many L2 and L1 routines that used multiple kernels to compute a result leaked internal cl_events. Removed the '-g' clBuildProgram option from library debug builds: this flag causes an internal compiler error in xTRMM for debug builds. The flag is not very useful and has caused too many problems.
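
The leak pattern behind the cl_event fix usually looks like this: a routine chains several kernels, each enqueue produces a new event, and only the last one is handed back, so the intermediate events are never released. The sketch below is illustrative only. It uses a hypothetical reference-counted `mock_event` (and hypothetical `mock_*` helpers) in place of `cl_event` so it runs without an OpenCL device. Each intermediate event is released as soon as the next kernel has been chained on it. In real host code, `mock_release_event` corresponds to `clReleaseEvent`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for cl_event: a heap object with a reference count. */
typedef struct {
    int refcount;
} mock_event;

static int live_events = 0; /* number of mock events not yet freed */

/* Simulates enqueuing a kernel that waits on `wait_on` and yields a new event. */
static mock_event *mock_enqueue_kernel(mock_event *wait_on) {
    (void)wait_on;
    mock_event *ev = malloc(sizeof *ev);
    if (ev == NULL) return NULL;
    ev->refcount = 1;
    live_events++;
    return ev;
}

/* Mirrors clReleaseEvent: free the object when the last reference is dropped. */
static void mock_release_event(mock_event *ev) {
    if (ev != NULL && --ev->refcount == 0) {
        free(ev);
        live_events--;
    }
}

/* Runs `n_kernels` chained kernels and returns only the final event. */
static mock_event *run_multi_kernel_routine(int n_kernels) {
    mock_event *prev = NULL;
    for (int i = 0; i < n_kernels; i++) {
        mock_event *next = mock_enqueue_kernel(prev);
        mock_release_event(prev); /* intermediate event: the host no longer needs it */
        prev = next;
    }
    return prev; /* the caller owns this one and must release it */
}
```

After the routine returns, exactly one event is still alive, and the caller is responsible for releasing it.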

Adding netlib as a dependency for the client on linux based systems.

Fixing internal library cl_event leaks

Removing ACML as a build dependency for tests

Extensions should never be enabled unconditionally, and not all platforms support cl_amd_printf (for example, on ARM devices that support it, one might want cl_arm_printf instead). Moreover, the extension is not needed at all on devices supporting OpenCL 1.2. Since printf calls are only needed for debugging anyway, we can safely comment out all lines enabling this extension. If needed, a more sophisticated common blurb (checking the OpenCL version and the availability of a printf extension) can be designed, with an easy on/off switch.
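
The "more sophisticated common blurb" could look roughly like the kernel-source preamble below. This is a hypothetical sketch and not code from clBLAS. `CLBLAS_DEBUG_PRINTF` is an invented switch that the host would define with the `-D` build option only when debugging. The preamble assumes that the OpenCL C compiler defines `__OPENCL_C_VERSION__` (OpenCL C 1.2 and later) and defines a macro named after each supported extension. The version check only succeeds when the kernel is actually compiled as OpenCL C 1.2 or later.

```c
/* Kernel-source preamble (prepended to kernel source before clCreateProgramWithSource).
 * CLBLAS_DEBUG_PRINTF is a hypothetical on/off switch, not an existing clBLAS option. */
static const char *printf_preamble =
    "#ifdef CLBLAS_DEBUG_PRINTF\n"
    "#if defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= 120\n"
    "/* printf is core in OpenCL C 1.2: nothing to enable */\n"
    "#elif defined(cl_amd_printf)\n"
    "#pragma OPENCL EXTENSION cl_amd_printf : enable\n"
    "#elif defined(cl_arm_printf)\n"
    "#pragma OPENCL EXTENSION cl_arm_printf : enable\n"
    "#else\n"
    "#undef CLBLAS_DEBUG_PRINTF /* no printf available: disable debug output */\n"
    "#endif\n"
    "#endif\n";

/* Host-side switch: build options that turn the debug printf path on or off. */
static const char *printf_build_options(int enable_debug_printf) {
    return enable_debug_printf ? "-D CLBLAS_DEBUG_PRINTF" : "";
}
```

Kernels would then wrap each debugging `printf` in `#ifdef CLBLAS_DEBUG_PRINTF`. As a result, release builds never touch any printf extension.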

This is similar to the already merged changes from PR#115, and it fixes the same kind of error (unaligned memory access on NVIDIA GPUs).

If we have to build from source, do not bail out early showing only the error number: print the whole build log first and _then_ run CL_CHECK on the original error to exit.
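
The ordering described above (log first, then check) can be sketched as follows. `SKETCH_CL_CHECK` is a hypothetical stand-in for the library's CL_CHECK macro, and `show_build_log` is likewise hypothetical. In real host code, the log string would come from `clGetProgramBuildInfo` with `CL_PROGRAM_BUILD_LOG`, and a failed build typically reports `CL_BUILD_PROGRAM_FAILURE` (-11).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for CL_CHECK: report and exit on a non-zero status.
 * The argument is stored in a local so it is evaluated exactly once. */
#define SKETCH_CL_CHECK(status)                                   \
    do {                                                          \
        int sketch_status_ = (status);                            \
        if (sketch_status_ != 0) {                                \
            fprintf(stderr, "OpenCL error %d\n", sketch_status_); \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

/* Prints the whole build log when the build failed, then returns the original
 * status untouched so the caller can run the check on it afterwards. */
static int show_build_log(int build_status, const char *build_log) {
    if (build_status != 0) {
        fprintf(stderr, "Build log:\n%s\n", build_log ? build_log : "(no log)");
    }
    return build_status;
}
```

A call site would read `SKETCH_CL_CHECK(show_build_log(err, log));`, so the user sees the compiler diagnostics before the process exits.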

Emitting a double-precision constant in a single-precision kernel can make compilation impossible on very strict platforms if the device has no double-precision capability (for example, it causes an assert in Beignet on Haswell). Replace all 'naked' 0.0, 1.0 and 2.0 occurrences with the corresponding integer and rely on automatic promotion. For xgemm and its Python generator, replace the switch with a unified definition using DATA_TYPE_STRING.
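
The reason integer literals help is the usual arithmetic conversions, which plain C and OpenCL C share. A floating literal such as `1.0` has type `double`, so it drags a `float` expression up to double precision. An integer literal is converted to the other operand's type instead. The small sketch below makes the difference visible through `sizeof`. The function names are illustrative only. An integer literal is also type-neutral, which suits kernel source generated for several precisions from one template.

```c
#include <stddef.h>

/* float * double -> double: the literal 1.0 promotes the whole expression. */
static size_t size_with_double_literal(float alpha) { return sizeof(alpha * 1.0); }

/* float * int -> float: the integer literal is converted to float instead. */
static size_t size_with_int_literal(float alpha) { return sizeof(alpha * 1); }
```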

@Oblomov

Portability fixes 👍 Thank you for the fixes @Oblomov

setDeviceType only takes one argument: params. This was causing build errors.

Fix call to setDeviceType in test-performance.cpp Looks good to me - 👍 🐛

.s is more portable, and .x does not work with -std=c99, which does not have anonymous structs. Fix #307.
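
The portability issue comes from how the host vector types are laid out. The sketch below uses a hypothetical `sketch_float2` union that roughly mirrors `cl_float2`, not the real header definition. The array member `s` is valid C99. The named `x`/`y` members need an anonymous struct, which is only standard from C11, so they are compiled in conditionally. Code that sticks to `.s[0]` and `.s[1]` builds in both modes.

```c
/* Hypothetical two-component vector, roughly mirroring cl_float2's layout. */
typedef union {
    float s[2];
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    struct { float x, y; }; /* anonymous struct: only available from C11 */
#endif
} sketch_float2;

/* Portable accessor: .s[0] / .s[1] also compile under -std=c99. */
static float sketch_sum(sketch_float2 v) {
    return v.s[0] + v.s[1];
}
```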

Using CL_DEVICE_TYPE_ALL in every sample increases the chance that the samples will just work for most users. Previously the samples used a mixture of CPU and GPU device types, which only works if users have both CPU and GPU OpenCL devices.
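
A simplified model shows why a fixed CPU or GPU request can come back empty. The sketch below copies the OpenCL device-type bit values locally under hypothetical `SK_*` names so it runs without OpenCL headers. It treats the type filter as a bitmask match, which approximates how `clGetDeviceIDs` selects devices. On a GPU-only machine, a CPU request matches nothing (the real call reports `CL_DEVICE_NOT_FOUND`), while the ALL mask matches every device.

```c
/* Local copies of the OpenCL device-type bits (CL_DEVICE_TYPE_CPU/GPU/ALL). */
#define SK_DEVICE_TYPE_CPU ((unsigned long)1 << 1)
#define SK_DEVICE_TYPE_GPU ((unsigned long)1 << 2)
#define SK_DEVICE_TYPE_ALL 0xFFFFFFFFUL

/* Counts the devices whose type bit is set in the requested mask. */
static int count_matching(const unsigned long *device_types, int n,
                          unsigned long requested) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (device_types[i] & requested) count++;
    }
    return count;
}
```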

.s[0] + CL_DEVICE_TYPE_ALL

Remove usage of CMAKE_SOURCE_DIR

fixes compilation issues with autogemm

Fixes #318 Change link to build wiki page

DGEMM: Use kernels from Tensile project on VEGA20

Kent Knox and others added 28 commits January 19, 2017 09:13

Version bump to 2.13

d7a4a3d

Removing ACML as a build dependency for tests

7431066

Adding netlib as a dependency for the client on linux based systems.

Merge pull request #300 from kknox/fix-event-leak

05c432a

Fixing internal library cl_event leaks

Merge pull request #301 from kknox/client-netlib

b4ffb61

Removing ACML as a build dependency for tests

Fix unaligned memory access

e75a74d

This is similar to the already merged changes from PR#115, and it fixes the same kind of error (unaligned memory access on NVIDIA GPUs).

xgemm: should build log on error

bd83bf4

If we have to build from source, do not bail out early only showing the error number: print the whole log and _then_ run CL_CHECK on the original error to exit.

Merge pull request #302 from Oblomov/portability

c6a215b

Portability fixes 👍 Thank you for the fixes @Oblomov

Fix call to setDeviceType in test-performance.cpp

2bbe985

setDeviceType only takes one argument: params. This was causing build errors.

Merge pull request #303 from robertlong13/patch-1

35fbf16

Fix call to setDeviceType in test-performance.cpp Looks good to me - 👍 🐛

Remove usage of CMAKE_SOURCE_DIR

a382d33

Use .s[0] instead of .x in example_ctrsm.c

63cc8e8

.s is more portable, and .x does not work with -std=c99 which does not have anonymous structs. Fix #307.

Use CL_DEVICE_TYPE_ALL for all samples.

0d300eb

This increases the chance that they will just work for most users. Previously there was a mixture of CPU and GPU, which only works if users have both CPU and GPU OpenCL devices.

Merge pull request #313 from cirosantilli/device-all

a5c35fc

.s[0] + CL_DEVICE_TYPE_ALL

Merge pull request #305 from pfultz2/cmake

fd40f3b

Remove usage of CMAKE_SOURCE_DIR

fix os x compatibility

d051fee

fix 'initializer must be an initializer list' error

cb0e33d

fix 'initializer must be an initializer list' error

a439b23

Merge pull request #317 from krocki/master

daaccfd

fixes compilation issues with autogemm

Update README.md

2fba4c3

Fixes #318 Change link to build wiki page

Tensile dgemm kernels for vega20 and sgemm kernels for vega10

2d63b70

cleanup

a0e2fbc

cleanup

4dd9a6f

cleanup

3049899

cleanup

b556d73

Merge pull request #336 from stmuxa/tensile-dgemm-kernels

25ff135

DGEMM: Use kernels from Tensile project on VEGA20

tingxingdong merged commit fbe95c0 into tingxingdong:master Apr 28, 2021

tingxingdong merged 28 commits into tingxingdong:master from clMathLibraries:develop

tingxingdong commented Apr 28, 2021


8 participants
