…ting
Adds AOCL-BLAS 5.2 (BLIS/BLAS/CBLAS) as an alternative to OpenBLAS for rocBLAS client testing. AOCL-BLAS provides ILP64 support needed for large-scale stress tests and serves as a complementary CPU BLAS provider.
- Add host-aocl-blas artifact (Linux-only, enabled with THEROCK_BUILD_TESTING)
- Build AOCL 5.2 BLAS component from source with ILP64 and multithreading
- Install to lib/host-math/ alongside OpenBLAS
- Enable LINK_BLIS in rocBLAS when testing is enabled
- Add therock_test_validate_static_lib() for static library validation
AOCL's build system ignores CMAKE_INSTALL_INCLUDEDIR for headers, causing them to install to dist/include/ instead of the expected dist/lib/host-math/include/ location. This commit adds a custom CMake command to copy headers to the correct location after staging.
Also fixes dependency ordering by moving therock-aocl-blas from RUNTIME_DEPS to BUILD_DEPS for rocBLAS, since it's a static library that must be built before rocBLAS links against it.
Changes:
- Add custom command to copy AOCL headers after staging
- Set CMAKE_INSTALL_LIBDIR/BINDIR/INCLUDEDIR for AOCL build
- Move therock-aocl-blas to BUILD_DEPS for proper build ordering
- Update library validation path to match new location
During rocBLAS builds, pip installs Tensile from the source directory, creating egg-info and build artifacts that can cause permission errors on subsequent builds. Add automatic cleanup of these artifacts during rocBLAS+expunge to prevent build failures.
Cleans:
- Tensile.egg-info/ - Package metadata
- build/ - In-tree build directory
- dist/ - Distribution directory
- .eggs/ - Egg cache directory
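The cleanup described above can be sketched in Python. This is only an illustration of the idea, not the CMake logic the commit adds: the function name `clean_tensile_artifacts` and the assumption that all four directories sit directly under the Tensile source directory are hypothetical.

```python
import shutil
from pathlib import Path

# Directory names taken from the commit message above.
TENSILE_ARTIFACTS = ("Tensile.egg-info", "build", "dist", ".eggs")


def clean_tensile_artifacts(tensile_source_dir):
    """Delete pip-generated artifact directories and return the names removed.

    Hypothetical helper: assumes the artifacts live directly under
    `tensile_source_dir`. Anything else in the tree is left untouched.
    """
    root = Path(tensile_source_dir)
    removed = []
    for name in TENSILE_ARTIFACTS:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Calling the helper a second time on the same directory returns an empty list, so it is safe to run on every expunge.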
AOCL headers now install to lib/host-math/include/aocl/ subdirectory, matching the OpenBLAS pattern (lib/host-math/include/openblas/).
Changes:
- Replace broken custom_command/custom_target with install(CODE) script
- Headers copied during install phase, not as separate build step
- Updated CMake package config to point to include/aocl/ subdirectory
- Preserves directory structure for Au/, Capi/, alci/ subdirectories
Verified:
- rocBLAS configure finds headers successfully
- rocblas-test and rocblas-bench link against AOCL
- Binaries show "Using reference library .../libaocl.a"
Pre-commit hook fixes:
- Remove trailing whitespace from aocl-config.cmake.in
- Apply black formatting to validate_static_library.py
Fixes all 8 issues raised by Copilot PR review:
1. validate_static_library.py: Filter empty strings when counting object files
2. validate_static_library.py: Fail validation if archive is empty (0 objects) or 0 MB
3. therock_testing.cmake: Skip static lib validation in sanitizer builds (matches shared lib)
4. aocl-config.cmake.in: Clarify OpenMP is needed for multithreaded AOCL-BLAS
5. aocl/CMakeLists.txt: Replace local path with public GitHub URL reference
6. BLAS/CMakeLists.txt: Add Tensile cleanup to clean target (not just expunge)
7. aocl/CMakeLists.txt: Clarify OpenMP comment - AOCL manages discovery internally
8. aocl/CMakeLists.txt: Add verification for install-time GLOB, fail if no headers found
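Items 1 and 2 above can be illustrated with a minimal Python sketch. It is not the contents of validate_static_library.py: the function names are hypothetical, and the archive listing (for example the output of `ar t`) is passed in as a string so the logic can be shown without invoking any tool.

```python
def count_archive_members(ar_listing):
    """Count member names in an archive listing, ignoring blank lines.

    Splitting an empty listing on newlines yields [""], which would be
    miscounted as one object file unless empty strings are filtered out.
    """
    return len([line for line in ar_listing.split("\n") if line.strip()])


def validate_static_library(ar_listing, size_bytes):
    """Return (ok, message) for a static library (hypothetical helper).

    Fails when the archive lists no object files or has zero size.
    """
    members = count_archive_members(ar_listing)
    if members == 0:
        return False, "archive contains no object files"
    if size_bytes <= 0:
        return False, "archive file is empty"
    size_mb = size_bytes / (1024 * 1024)
    return True, f"{members} object files, {size_mb:.1f} MB"
```

The filtering step matters because a trailing newline or an empty listing would otherwise make an empty archive look like it contains one object.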
The +clean target isn't available at this point in CMakeLists.txt. Only +expunge is explicitly created and can be used as a dependency. Fixes CMake configuration error: Cannot add target-level dependencies to non-existent target "rocBLAS+clean"
- Add validation for critical AOCL headers (blis.h, cblas.h, blis64.h) in install script
- Remove unused @PACKAGE_INIT@ from aocl-config.cmake.in to match OpenBLAS pattern
- Note: rocBLAS+expunge-tensile already only depends on rocBLAS+expunge (not +clean)
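The header check in the first bullet lives in a CMake install script; the same idea can be sketched in Python. The function name `missing_aocl_headers` is hypothetical, and so is the assumption that the three headers sit at the top level of the include directory being checked.

```python
from pathlib import Path

# Header names taken from the commit message above.
CRITICAL_AOCL_HEADERS = ("blis.h", "cblas.h", "blis64.h")


def missing_aocl_headers(include_dir):
    """Return the critical headers that are absent from `include_dir`.

    Hypothetical helper: an empty list means the install looks complete;
    a non-empty list should be treated as a failed install.
    """
    root = Path(include_dir)
    return [name for name in CRITICAL_AOCL_HEADERS if not (root / name).is_file()]
```

Reporting the missing names, rather than a bare pass or fail, makes the install error actionable.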
Create a symlink from libaocl.a to libcblas.a so that rocBLAS's find_library(NAMES cblas ...) can discover TheRock's AOCL through the hinted library paths. This allows the develop branch of rocm-libraries to work without modification. This is a temporary workaround until rocBLAS is updated to use find_package(aocl) for CMake package discovery.
AOCL-BLAS is disabled on Windows (disable_platforms = ["windows"]) so we should not add it as a dependency for rocBLAS on Windows. Windows builds will continue to use only OpenBLAS as the CPU reference BLAS for rocBLAS clients/tests.
Instead of symlinks, copy AOCL library and headers to the location
that rocBLAS's develop branch searches:
${BUILD_DIR}/deps/aocl/install_package/
This allows rocBLAS to find TheRock's AOCL 5.2 without any changes
to rocm-libraries, matching the existing search logic in
rocblas/clients/CMakeLists.txt.
Benefits:
- No symlinks (better Windows compatibility)
- No changes needed to rocm-libraries develop branch
- Works with existing rocBLAS AOCL discovery logic
Removes the previous libcblas.a symlink approach which didn't
handle headers.
Move the copy logic from rocBLAS CMakeLists (configure-time) to AOCL CMakeLists (install-time). This ensures the copy happens after AOCL builds but before rocBLAS configures, making it work in a single build pass.
Timeline now:
1. Top-level configure
2. Build phase:
   - AOCL builds and installs (copies to rocBLAS deps/ location)
   - rocBLAS configures (finds AOCL in deps/ location) ✓
   - rocBLAS builds
This matches rocBLAS's existing search logic in rocblas/clients/CMakeLists.txt without requiring any changes to rocm-libraries.
Automatically set THEROCK_ENABLE_HOST_MATH=ON when THEROCK_BUILD_TESTING=ON to ensure host math libraries (OpenBLAS, AOCL-BLAS, SuiteSparse) are built for rocBLAS clients and tests. Without this, AOCL-BLAS was declared but never built on CI because its feature group (HOST_MATH) was disabled by default, causing rocBLAS configure to fail with "Could not find any BLAS library". Fixes CI failure where rocBLAS clients couldn't find BLAS library.
Pass -DBUILD_DIR to rocBLAS CMake configuration, pointing to its build
directory. This custom variable is used by rocBLAS clients/CMakeLists.txt
to locate bundled dependencies in ${BUILD_DIR}/deps/.
Without this, rocBLAS couldn't find TheRock's AOCL at
${BUILD_DIR}/deps/aocl/install_package/ and would fall back to system
AOCL installations, even though the files were correctly copied there
during AOCL's install phase.
Now rocBLAS will find: build/math-libs/BLAS/rocBLAS/build/deps/aocl/...
AOCL's CMakeLists.txt defaults CMAKE_CONFIGURATION_TYPES to Debug if not explicitly set. Even though we pass CMAKE_BUILD_TYPE=Release, the Debug configuration type was causing BLIS to build without optimizations, resulting in 100-450x slowdowns on triangular operations (trsm, trmm) and test timeouts. Add -DCMAKE_CONFIGURATION_TYPES=Release to explicitly force Release mode for all AOCL components. This fixes CI test timeouts where rocBLAS smoke tests failed to complete in 15 minutes due to Debug-mode AOCL performance.
CI diagnostics revealed we cannot detect actual CPU allocation:
- multiprocessing.cpu_count() returns 256 (all system cores)
- SLURM variables not available
- cgroup limits not detectable via standard paths
Without this fix, rocBLAS sets 254 OpenMP threads on containers with only ~48-64 allocated cores, causing 60-100x AOCL performance degradation due to thread oversubscription.
Conservative value of 48 threads assumes typical CI allocation of 50-64 cores, leaving headroom for system threads as recommended by AOCL team (use allocated_cores - 4).
This should reduce rocBLAS test time from ~8.9 min to ~2.5 min.
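A minimal sketch of the thread cap, assuming it is applied through the OMP_NUM_THREADS environment variable of the test process. The helper names are hypothetical, and taking the minimum with the detected core count is an assumption added here so that smaller machines are not oversubscribed; the commit itself describes a fixed value of 48.

```python
import multiprocessing

# Cap from the commit message above.
AOCL_THREAD_CAP = 48


def capped_omp_threads(detected_cpus, cap=AOCL_THREAD_CAP):
    """Return a thread count no larger than `cap` and at least 1."""
    return max(1, min(detected_cpus, cap))


def env_with_thread_cap(base_env, detected_cpus=None):
    """Return a copy of `base_env` with OMP_NUM_THREADS set to the capped value.

    Hypothetical helper: `detected_cpus` defaults to
    multiprocessing.cpu_count(), which may report every core on the host
    rather than the cores actually allocated to the container.
    """
    if detected_cpus is None:
        detected_cpus = multiprocessing.cpu_count()
    env = dict(base_env)
    env["OMP_NUM_THREADS"] = str(capped_omp_threads(detected_cpus))
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the test binary, leaving the parent environment unchanged.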
…vention
Follow CMake best practices for package naming and target namespaces:
- Change package name from 'aocl' to 'AOCL' (matches BLAS/LAPACK pattern)
- Change target from 'AOCL::aocl' to 'AOCL::AOCL' (namespace matches package)
- Rename config file to 'AOCLConfig.cmake' (standard for uppercase packages)
- Update install path to 'cmake/AOCL'
This follows Kitware's recommendation that package names and target namespaces should match exactly, which will be enforced in CMake 3.31+ via Common Package Specification (CPS).
References:
- https://www.kitware.com/psa-your-package-name-and-target-namespace-should-match/
- Standard CMake modules: BLAS::BLAS, LAPACK::LAPACK, ZLIB::ZLIB
This commit enables AOCL-BLAS to build on Windows alongside Linux, supporting the AOCL team's deliverable improvements.
Key changes:
- BUILD_TOPOLOGY: Remove Windows platform restriction for AOCL
- therock_subproject: Preserve Windows SDK environment for nested CMake
- BLAS/CMakeLists: Enable AOCL-BLAS for rocBLAS testing on Windows
- aocl/CMakeLists: Major refactoring for cross-platform support
  - Use tony-davis fork with Windows/CMake fixes (temporary)
  - Add VS Clang-CL toolchain support for Windows
  - Cross-platform OpenMP configuration (MSVC vs GCC flags)
  - Disable AOCL_UTILS on Windows (MSVC incompatibility)
  - Modern CMake paths with GNUInstallDirs
  - Platform-specific library naming (aocl.lib vs libaocl.a)
  - Remove custom package config (AOCL now provides its own)
Result: AOCL-BLAS builds successfully on both Windows and Linux, enabling rocBLAS testing with CPU reference BLAS on Windows.
Co-authored-by: Cursor <cursoragent@cursor.com>
[RIPE FOR OWN PR]
After amdsmi was moved from base/ to core/ (d3bb45a), it is only built when THEROCK_ENABLE_CORE_AMDSMI=ON. math-libs/BLAS/CMakeLists.txt still added the amdsmi target to optional deps on non-Windows unconditionally, so configure failed with "non-existent target amdsmi" when building rocBLAS/hipBLAS/hipBLASLt without CORE_AMDSMI.
rocBLAS and hipBLASLt treat amdsmi as optional (GPU monitoring); they can build and run without it. Only add amdsmi to the optional deps when THEROCK_ENABLE_CORE_AMDSMI is ON. rocm_smi_lib remains in the list when not Windows (that target is always present).
Co-authored-by: Cursor <cursoragent@cursor.com>
…LAS dep
- third-party/aocl: Use SOURCE_DIR for fetch layout; install to lib/host-math/ with ENABLE_AOCL_UTILS=OFF; provide_package at lib/host-math/lib/cmake/AOCL. Remove legacy install(CODE) workaround; downstream uses find_package(AOCL CONFIG).
- therock_subproject: Resolve build deps from THEROCK_STAGE_DIR (not DIST_DIR) so subprojects find package configs from each dep's stage install tree (fixes AOCL and other stage-installed packages).
- math-libs/BLAS: Treat AOCL-BLAS as runtime-only optional dep for rocBLAS (mirror OpenBLAS); remove rocBLAS build dep on therock-aocl-blas.
Co-authored-by: Cursor <cursoragent@cursor.com>
Reduce test_rocblas.py to minimal AOCL change (cap OMP_NUM_THREADS=48) and single log line; remove CPU allocation diagnostics block.
skip-checks: true
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep Windows env preservation block only; resolve build deps from dist again to avoid global behavior change.
skip-checks: true
Co-authored-by: Cursor <cursoragent@cursor.com>
Drop rocBLAS+expunge-tensile; unrelated to AOCL integration.
skip-checks: true
Co-authored-by: Cursor <cursoragent@cursor.com>