Implement Kokkos support for MICM chemical kinetics and optimizations#967
Implement Kokkos support for MICM chemical kinetics and optimizations#967bbakernoaa wants to merge 12 commits into
Conversation
This commit implements a Kokkos backend for MICM, following the design of the existing CUDA driver. It includes: - KokkosDenseMatrix and KokkosSparseMatrix using Kokkos Views - KokkosProcessSet with vectorized kernels for forcing and Jacobian - Integration into the CMake build system (MICM_ENABLE_KOKKOS) - Unit tests for the new Kokkos components
…0980220 Add Kokkos support for MICM chemical kinetics
…/Jacobian indexing - Add SetAlgebraicVariableIds method to KokkosProcessSet for setting algebraic variable flags on device - Optimize forcing term indexing to use number_of_forcing_species instead of number_of_species - Optimize Jacobian term indexing by pre-computing jacobian_group_size and jac_group_offset to reduce redundant calculations - Refactor Jacobian index computation to use pre-computed offset for better performance - Update test file with copyright header and consolidate test utilities using shared test policy - Improve memory access patterns in AddForcingTerms and SubtractJacobianTerms kernels
|
Well, this is cool. This is also vaguely related to #966. I've been wanting to consider replacing our pure CUDA backend with Kokkos anyway, and this is a natural first step. @sjsprecious, do you have any thoughts about this? |
|
@K20shores I'm trying to move some of our stuff @noaa to using kokkos as a standard way of doing it. At least where I have a semblance of control :) Besides the kokkos templating idea is pretty intriguing |
- Break long lines in loop assignments to improve code readability - Split long Kokkos::View declarations across multiple lines - Reformat assignment statements in SetJacobianFlatIds method - Improve line length consistency throughout kokkos_process_set.hpp - Update corresponding test file formatting to match header changes
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #967 +/- ##
==========================================
- Coverage 94.55% 94.21% -0.34%
==========================================
Files 65 66 +1
Lines 4150 4461 +311
==========================================
+ Hits 3924 4203 +279
- Misses 226 258 +32 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
K20shores
left a comment
There was a problem hiding this comment.
No concerns, mostly just rambling comments. The one change I'm asking for is an update to our ci action that runs on a gpu.
The comments are about our matrix types. We have many matrix types, and we could probably drop support for our non-vectorized types, which would reduce the amount of duplicated code we have anyway. Adding kokkos gives me more motivation to consider that.
- Add new kokkos_cuda_tests job to runner workflow - Configure NVIDIA GPU container with CUDA 13.0 support - Install required dependencies (cmake, g++) - Build with Kokkos CUDA enabled on Ampere architecture - Run full test suite on GPU hardware with parallel execution - Enable GPU resource allocation via Cirrus group configuration
|
@bbakernoaa did you happen to use AI in any of this? No worries if you did (#966, which I "authored" was done entirely with Claude). If you did, would you mind adding a statement to the PR description about how AI was used, if it was? |
- Bump Kokkos version to 4.7.04 for improved performance and bug fixes - Ensures compatibility with latest Kokkos features and optimizations
I did actually :). I was using the Amazon Q Developer (which uses Claude as the LLM) |
|
@bbakernoaa oh guess I missed it |
|
No you didn't! I just added the disclaimer |
sjsprecious
left a comment
There was a problem hiding this comment.
Thanks @bbakernoaa for working on this Kokkos implementation. It looks really cool and I think it can be a good comparison to our existing C++/CUDA implementation.
A few quick comments as the first-round review and we iterate on them later as needed.
| DenseMatrixPolicy forcing{ n_cells, n_species, 1000.0 }; | ||
| state.rate_constants_ = rate_constants; | ||
|
|
||
| CheckCopyToDevice<DenseMatrixPolicy>(state.variables_); |
There was a problem hiding this comment.
@K20shores this looks like a bug to me. How could our previous GPU test passes without this copy?
There was a problem hiding this comment.
Oh, haha. This test has zero assertions. I guess we should add some
There was a problem hiding this comment.
Because it is a random system and we do not know what the deterministic correct answer is?
There was a problem hiding this comment.
I guess so. We could just recreate the expected forcings by writing the algorithm by hand, but we already have tests to make sure our different policies get the correct answer from known systems.
I think the intent of this test is that we can solve many systems without failing, which this test is already doing.
…rocess set - Replace `auto` with explicit `StatePolicy` template parameter in `AddForcingTerms` method - Replace `auto` with explicit `StatePolicy` template parameter in `SubtractJacobianTerms` method - Improve code clarity and type safety by making state parameter types explicit - Enhance IDE support and type checking for Kokkos process operations
- Update Kokkos dependency from 4.7.04 to 5.1.0 in cmake/dependencies.cmake - Replace HostMirror type alias with explicit Kokkos::View<T*, Kokkos::HostSpace> in dense and sparse matrix headers - Remove RandomAccess memory trait from ProcessSetParam view declarations for improved compatibility - Add blank lines between include groups for better code organization - Reorder includes alphabetically (type_traits before vector) - Add braces to single-statement for loops for consistency and readability - Format GroupVectorSize() methods across multiple lines for improved readability
- Add Kokkos serial backend CI jobs to macOS, Ubuntu, and Windows workflows - Remove kokkos_util.hpp header and consolidate utility includes in main Kokkos.hpp - Refactor KokkosProcessSet to remove redundant Initialize() calls and simplify device struct updates - Clean up trailing whitespace in workflow files - Reorganize include order in Kokkos.hpp for better dependency management - Update test CMakeLists.txt to remove kokkos_util.hpp references - Improve code formatting and consistency across Kokkos utility headers
|
@sjsprecious @K20shores I think I addressed most of the comments. If you could take another look and let me know any other suggestions, I'll take care of them. |
| add_executable(test_kokkos_process_set test_kokkos_process_set.cpp) | ||
| target_link_libraries(test_kokkos_process_set PUBLIC musica::micm musica::micm_kokkos GTest::gtest) | ||
| add_test(NAME kokkos_process_set COMMAND test_kokkos_process_set WORKING_DIRECTORY ${CMAKE_BINARY_DIR}) |
There was a problem hiding this comment.
We have a utility for this in cmake/test_util.cmake You should be able to do something like this
| add_executable(test_kokkos_process_set test_kokkos_process_set.cpp) | |
| target_link_libraries(test_kokkos_process_set PUBLIC musica::micm musica::micm_kokkos GTest::gtest) | |
| add_test(NAME kokkos_process_set COMMAND test_kokkos_process_set WORKING_DIRECTORY ${CMAKE_BINARY_DIR}) | |
| create_standard_test(NAME kokkos_process_set SOURCES test_kokkos_process_set.cpp LIBRARIES musica::micm_kokkos) |
There was a problem hiding this comment.
same change can be done here as suggested above
| DenseMatrixPolicy forcing{ n_cells, n_species, 1000.0 }; | ||
| state.rate_constants_ = rate_constants; | ||
|
|
||
| CheckCopyToDevice<DenseMatrixPolicy>(state.variables_); |
There was a problem hiding this comment.
Oh, haha. This test has zero assertions. I guess we should add some
sjsprecious
left a comment
There was a problem hiding this comment.
Thanks @bbakernoaa for addressing the comments. I probably should ask this question at the very beginning: what is your motivation of doing this work? If you have a Kokkos host model and want to call MICM with a Kokkos wrapper, I think what you are doing here is fine. But if you want to show Kokkos is a better way to move on compared to C++/CUDA approach, I think we need a deeper refactor here and probably need an offline discussion first before going too far.
@sjsprecious sure. We are currently integrating MICM/MUSICA within our new chemical component within the UFS where we are exploring the adoption of Kokkos across our ecosystem for different components. For instance, https://github.com/ufs-community/CECE already supports Kokkos. |
@bbakernoaa Thanks for your detailed explanations. That sounds great. Here you only wrap the ProcessSet class with Kokkos. Do you plan to issue another PR to include the LU solver with Kokkos later or you will use your own solver in UFS? Also out of curiosity: it was my first time to hear that NOAA has efforts with investing in C++/Kokkos. Is there a trend or proposal inside NOAA to shift from Fortran? |
|
There is a little movement on that. WaveWatch4 is currently in early development and is being rewritten in C++/Kokkos as well. Yes I'll add the LU solver with a later PR hopefully. This is just the initial piece hopefully. |
|
@bbakernoaa would it be possible to make the kokkos matrix apis match our internal implemenations? Things like If you do that, you won't actually need to implement the process set and jacobian/forcing. We could rewrite our solvers to use these APIs entirely and then we could run our solvers entirely just based off of implementations of the matrix APIs. |
… views - Remove persistent h_view_ member from KokkosDenseMatrix and KokkosSparseMatrix - Replace manual element-by-element loops with Kokkos::deep_copy for better performance - Use unmanaged host views in CopyToDevice() and CopyToHost() to avoid redundant allocations - Simplify constructor initialization by removing h_view_ creation - Add comprehensive documentation explaining device/host synchronization pattern - Improve Fill() method by removing unnecessary h_view_ management - Reduces memory overhead and aligns with CUDA matrix implementation pattern
The latest commit should address this some. I tried to use the CopyToDevice/CopyToHost to wrap MICM's data_.data() pointers. This should now line up with the VectorMaxtrix within MICM where Kokkos View is just the device-side mirror. |
- Add test_kokkos_sparse_matrix.cpp with full test coverage for KokkosSparseMatrix - Test zero matrix initialization and const zero matrix behavior - Test scalar assignment and diagonal operations - Test single and multi-block matrix operations with various ordering policies - Test print and print non-zero functionality across all vector orderings - Add using declaration for base class assignment operator in KokkosSparseMatrix - Register new test executable in CMakeLists.txt with proper linking and test configuration - Ensures KokkosSparseMatrix implementation is fully validated across different matrix configurations
|
@K20shores I added the inheritance from the SparceMatrix base class in the KokkosSparseMatrix class. |
|
@bbakernoaa I guess that's okay. We don't really need inheritance. We use policy-based-design, which means as long as a class implements a function, it is usable with our policies. Sorry, I could have been more clear here. |
I think the purpose of inheriting here is that we don't need to replicate the SparseMatrix in Kokkos. The Kokkos piece is just the device-side mirror and this allows for the policy-based design in MICM to be leveraged within Kokkos without replicating it. If we are going to move the entire MICM SparseMatrix to Kokkos I think thats one thing otherwise I'm not sure I see the benefit here. Just to note that this is exactly the same pattern as the CudaDenseMatrix and CudaSparseMatrix. |
… coverage - Rename type aliases from KokkosVectorOrdering to KokkosOrdering for clarity - Add BlockFunction infrastructure tests (ArrayFunction, MultiMatrixArrayFunction) - Add temporary variable and view reuse tests (MultipleTemporaries, BlockViewReuse) - Add function reusability and multi-matrix structure tests - Add sparse + dense/vector matrix interaction tests - Add ordering compatibility and error condition tests - Organize tests into logical sections with comments for maintainability - Expand test coverage across all four vector orderings (1-4) where applicable
|
@bbakernoaa fair. I would still like to see the changes in the cmake files. Other than that I think I'm fine with these changes |
sjsprecious
left a comment
There was a problem hiding this comment.
Given the user's intent and no impact on the existing C++/CUDA code, the Kokkos implementation looks fine to me.
It seems two Clang CI tests are failing. @K20shores any idea what could be wrong here?
|
@sjsprecious yes, it's trying to pull this branch, but the branch lives in a fork so it can't run. We should update these actions so that they don't run on forks. Not an issue that @bbakernoaa should address and should not hold up the merge |
boulderdaze
left a comment
There was a problem hiding this comment.
Looks good! I have a few minor suggestions that should be straightforward to fix.
| if(MICM_ENABLE_KOKKOS) | ||
| add_subdirectory(util) | ||
| add_subdirectory(process) | ||
| endif() |
There was a problem hiding this comment.
The test/unit/CMakeLists.txt already includes a check for adding the kokos subdirectory. Could you remove the redundant check?
| if(MICM_ENABLE_KOKKOS) | |
| add_subdirectory(util) | |
| add_subdirectory(process) | |
| endif() | |
| add_subdirectory(util) | |
| add_subdirectory(process) |
| struct ProcessInfoParam | ||
| { | ||
| std::size_t process_id_; | ||
| std::size_t independent_id_; | ||
| std::size_t number_of_dependent_reactants_; | ||
| std::size_t number_of_products_; | ||
| }; | ||
|
|
||
| /// This struct holds device-side copies of ProcessSet data for use in Kokkos kernels | ||
| struct ProcessSetParam | ||
| { | ||
| Kokkos::View<std::size_t*> number_of_reactants_; | ||
| Kokkos::View<std::size_t*> reactant_ids_; | ||
| Kokkos::View<std::size_t*> number_of_products_; | ||
| Kokkos::View<std::size_t*> product_ids_; | ||
| Kokkos::View<double*> yields_; | ||
| Kokkos::View<ProcessInfoParam*> jacobian_process_info_; | ||
| Kokkos::View<std::size_t*> jacobian_reactant_ids_; | ||
| Kokkos::View<std::size_t*> jacobian_product_ids_; | ||
| Kokkos::View<double*> jacobian_yields_; | ||
| Kokkos::View<std::size_t*> jacobian_flat_ids_; | ||
| Kokkos::View<uint8_t*> is_algebraic_variable_; | ||
| }; |
There was a problem hiding this comment.
ProcessInfoParam and ProcessSetParam are specific to ProcessSet which are built during ProcessSet construction.
Could you move them to kokkos/process/ directory? That would better mirror the micm structure.
micm/process/process_set.hpp -> micm/kokkos/process/kokkos_process_set.hppAlso since this file would still contain CopyVectorToView, it might make sense to rename it to something more general, like as kokos_utils.hpp.
This pull request introduces Kokkos support to the codebase, enabling performance portability for matrix and process set operations. The changes add CMake options and build logic for Kokkos, implement Kokkos-based dense and sparse matrix classes, and provide a Kokkos-enabled
ProcessSetfor parallel computations. Additionally, new unit tests are set up for the Kokkos functionality.Build system: Kokkos integration
MICM_ENABLE_KOKKOSCMake option and logic to fetch and build Kokkos as a dependency. [1] [2]micm_kokkosinterface library and link with Kokkos when enabled.New Kokkos-based core classes
KokkosDenseMatrixandKokkosSparseMatrixclasses for device-host memory management and parallel operations, inheriting from existing matrix types. [1] [2]Kokkos-enabled process set
KokkosProcessSetclass, a Kokkos-enabled extension ofProcessSet, supporting parallel computation of forcing and Jacobian terms using Kokkos kernels.Kokkos.hppfor Kokkos-based types and process sets.AI Assistance Disclosure: The Kokkos implementation and templating in this pull request were developed with the assistance of Amazon Q Developer (powered by the Claude LLM). All code has been reviewed and tested.