Skip to content

Implement Kokkos support for MICM chemical kinetics and optimizations#967

Open
bbakernoaa wants to merge 12 commits into
NCAR:mainfrom
bbakernoaa:feature/kokkos
Open

Implement Kokkos support for MICM chemical kinetics and optimizations#967
bbakernoaa wants to merge 12 commits into
NCAR:mainfrom
bbakernoaa:feature/kokkos

Conversation

@bbakernoaa
Copy link
Copy Markdown

@bbakernoaa bbakernoaa commented Apr 17, 2026

This pull request introduces Kokkos support to the codebase, enabling performance portability for matrix and process set operations. The changes add CMake options and build logic for Kokkos, implement Kokkos-based dense and sparse matrix classes, and provide a Kokkos-enabled ProcessSet for parallel computations. Additionally, new unit tests are set up for the Kokkos functionality.

Build system: Kokkos integration

  • Added MICM_ENABLE_KOKKOS CMake option and logic to fetch and build Kokkos as a dependency. [1] [2]
  • Updated build targets to create micm_kokkos interface library and link with Kokkos when enabled.
  • Added test directory structure for Kokkos-enabled unit tests. [1] [2] [3]

New Kokkos-based core classes

  • Implemented KokkosDenseMatrix and KokkosSparseMatrix classes for device-host memory management and parallel operations, inheriting from existing matrix types. [1] [2]
  • Added utility for Kokkos initialization/finalization and parameter structs for managing device data. [1] [2]

Kokkos-enabled process set

  • Introduced KokkosProcessSet class, a Kokkos-enabled extension of ProcessSet, supporting parallel computation of forcing and Jacobian terms using Kokkos kernels.
  • Provided a convenience header Kokkos.hpp for Kokkos-based types and process sets.

AI Assistance Disclosure: The Kokkos implementation and templating in this pull request were developed with the assistance of Amazon Q Developer (powered by the Claude LLM). All code has been reviewed and tested.

bbakernoaa and others added 3 commits April 9, 2026 01:09
This commit implements a Kokkos backend for MICM, following the design of
the existing CUDA driver. It includes:
- KokkosDenseMatrix and KokkosSparseMatrix using Kokkos Views
- KokkosProcessSet with vectorized kernels for forcing and Jacobian
- Integration into the CMake build system (MICM_ENABLE_KOKKOS)
- Unit tests for the new Kokkos components
…0980220

Add Kokkos support for MICM chemical kinetics
…/Jacobian indexing

- Add SetAlgebraicVariableIds method to KokkosProcessSet for setting algebraic variable flags on device
- Optimize forcing term indexing to use number_of_forcing_species instead of number_of_species
- Optimize Jacobian term indexing by pre-computing jacobian_group_size and jac_group_offset to reduce redundant calculations
- Refactor Jacobian index computation to use pre-computed offset for better performance
- Update test file with copyright header and consolidate test utilities using shared test policy
- Improve memory access patterns in AddForcingTerms and SubtractJacobianTerms kernels
@K20shores
Copy link
Copy Markdown
Collaborator

K20shores commented Apr 17, 2026

Well, this is cool. This is also vaguely related to #966.

I've been wanting to consider replacing our pure CUDA backend with Kokkos anyway, and this is a natural first step. @sjsprecious, do you have any thoughts about this?

@bbakernoaa
Copy link
Copy Markdown
Author

@K20shores I'm trying to move some of our stuff @noaa to using kokkos as a standard way of doing it. At least where I have a semblance of control :)

Besides the kokkos templating idea is pretty intriguing

- Break long lines in loop assignments to improve code readability
- Split long Kokkos::View declarations across multiple lines
- Reformat assignment statements in SetJacobianFlatIds method
- Improve line length consistency throughout kokkos_process_set.hpp
- Update corresponding test file formatting to match header changes
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.21%. Comparing base (67f4370) to head (ac84d8c).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #967      +/-   ##
==========================================
- Coverage   94.55%   94.21%   -0.34%     
==========================================
  Files          65       66       +1     
  Lines        4150     4461     +311     
==========================================
+ Hits         3924     4203     +279     
- Misses        226      258      +32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@K20shores K20shores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concerns, mostly just rambling comments. The one change I'm asking for is an update to our ci action that runs on a gpu.

The comments are about our matrix types. We have many matrix types, and we could probably drop support for our non-vectorized types, which would reduce the amount of duplicated code we have anyway. Adding kokkos gives me more motivation to consider that.

Comment thread include/micm/kokkos/process/kokkos_process_set.hpp Outdated
Comment thread include/micm/kokkos/process/kokkos_process_set.hpp Outdated
Comment thread test/unit/kokkos/util/CMakeLists.txt Outdated
- Add new kokkos_cuda_tests job to runner workflow
- Configure NVIDIA GPU container with CUDA 13.0 support
- Install required dependencies (cmake, g++)
- Build with Kokkos CUDA enabled on Ampere architecture
- Run full test suite on GPU hardware with parallel execution
- Enable GPU resource allocation via Cirrus group configuration
@K20shores
Copy link
Copy Markdown
Collaborator

@bbakernoaa did you happen to use AI in any of this? No worries if you did (#966, which I "authored" was done entirely with Claude). If you did, would you mind adding a statement to the PR description about how AI was used, if it was?

- Bump Kokkos version to 4.7.04 for improved performance and bug fixes
- Ensures compatibility with latest Kokkos features and optimizations
@bbakernoaa
Copy link
Copy Markdown
Author

@bbakernoaa did you happen to use AI in any of this? No worries if you did (#966, which I "authored" was done entirely with Claude). If you did, would you mind adding a statement to the PR description about how AI was used, if it was?

I did actually :). I was using the Amazon Q Developer (which uses Claude as the LLM)

@K20shores
Copy link
Copy Markdown
Collaborator

@bbakernoaa oh guess I missed it

@bbakernoaa
Copy link
Copy Markdown
Author

No you didn't! I just added the disclaimer

Copy link
Copy Markdown
Collaborator

@sjsprecious sjsprecious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bbakernoaa for working on this Kokkos implementation. It looks really cool and I think it can be a good comparison to our existing C++/CUDA implementation.

A few quick comments as the first-round review and we iterate on them later as needed.

Comment thread cmake/dependencies.cmake Outdated
DenseMatrixPolicy forcing{ n_cells, n_species, 1000.0 };
state.rate_constants_ = rate_constants;

CheckCopyToDevice<DenseMatrixPolicy>(state.variables_);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@K20shores this looks like a bug to me. How could our previous GPU test passes without this copy?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, haha. This test has zero assertions. I guess we should add some

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is a random system and we do not know what the deterministic correct answer is?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. We could just recreate the expected forcings by writing the algorithm by hand, but we already have tests to make sure our different policies get the correct answer from known systems.

I think the intent of this test is that we can solve many systems without failing, which this test is already doing.

Comment thread include/micm/kokkos/process/kokkos_process_set.hpp Outdated
Comment thread test/unit/kokkos/util/CMakeLists.txt Outdated
Comment thread include/micm/kokkos/process/kokkos_process_set.hpp
Comment thread include/micm/kokkos/process/kokkos_process_set.hpp Outdated
…rocess set

- Replace `auto` with explicit `StatePolicy` template parameter in `AddForcingTerms` method
- Replace `auto` with explicit `StatePolicy` template parameter in `SubtractJacobianTerms` method
- Improve code clarity and type safety by making state parameter types explicit
- Enhance IDE support and type checking for Kokkos process operations
- Update Kokkos dependency from 4.7.04 to 5.1.0 in cmake/dependencies.cmake
- Replace HostMirror type alias with explicit Kokkos::View<T*, Kokkos::HostSpace> in dense and sparse matrix headers
- Remove RandomAccess memory trait from ProcessSetParam view declarations for improved compatibility
- Add blank lines between include groups for better code organization
- Reorder includes alphabetically (type_traits before vector)
- Add braces to single-statement for loops for consistency and readability
- Format GroupVectorSize() methods across multiple lines for improved readability
Comment thread include/micm/kokkos/util/kokkos_dense_matrix.hpp Outdated
Comment thread include/micm/kokkos/util/kokkos_sparse_matrix.hpp Outdated
Comment thread include/micm/kokkos/util/kokkos_sparse_matrix.hpp Outdated
Comment thread include/micm/kokkos/util/kokkos_util.hpp Outdated
Comment thread include/micm/kokkos/process/kokkos_process_set.hpp Outdated
Comment thread include/micm/kokkos/process/kokkos_process_set.hpp Outdated
- Add Kokkos serial backend CI jobs to macOS, Ubuntu, and Windows workflows
- Remove kokkos_util.hpp header and consolidate utility includes in main Kokkos.hpp
- Refactor KokkosProcessSet to remove redundant Initialize() calls and simplify device struct updates
- Clean up trailing whitespace in workflow files
- Reorganize include order in Kokkos.hpp for better dependency management
- Update test CMakeLists.txt to remove kokkos_util.hpp references
- Improve code formatting and consistency across Kokkos utility headers
@bbakernoaa
Copy link
Copy Markdown
Author

@sjsprecious @K20shores I think I addressed most of the comments. If you could take another look and let me know any other suggestions, I'll take care of them.

Comment on lines +1 to +3
add_executable(test_kokkos_process_set test_kokkos_process_set.cpp)
target_link_libraries(test_kokkos_process_set PUBLIC musica::micm musica::micm_kokkos GTest::gtest)
add_test(NAME kokkos_process_set COMMAND test_kokkos_process_set WORKING_DIRECTORY ${CMAKE_BINARY_DIR})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a utility for this in cmake/test_util.cmake You should be able to do something like this

Suggested change
add_executable(test_kokkos_process_set test_kokkos_process_set.cpp)
target_link_libraries(test_kokkos_process_set PUBLIC musica::micm musica::micm_kokkos GTest::gtest)
add_test(NAME kokkos_process_set COMMAND test_kokkos_process_set WORKING_DIRECTORY ${CMAKE_BINARY_DIR})
create_standard_test(NAME kokkos_process_set SOURCES test_kokkos_process_set.cpp LIBRARIES musica::micm_kokkos)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same change can be done here as suggested above

DenseMatrixPolicy forcing{ n_cells, n_species, 1000.0 };
state.rate_constants_ = rate_constants;

CheckCopyToDevice<DenseMatrixPolicy>(state.variables_);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, haha. This test has zero assertions. I guess we should add some

Copy link
Copy Markdown
Collaborator

@sjsprecious sjsprecious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bbakernoaa for addressing the comments. I probably should ask this question at the very beginning: what is your motivation of doing this work? If you have a Kokkos host model and want to call MICM with a Kokkos wrapper, I think what you are doing here is fine. But if you want to show Kokkos is a better way to move on compared to C++/CUDA approach, I think we need a deeper refactor here and probably need an offline discussion first before going too far.

@bbakernoaa
Copy link
Copy Markdown
Author

Thanks @bbakernoaa for addressing the comments. I probably should ask this question at the very beginning: what is your motivation of doing this work? If you have a Kokkos host model and want to call MICM with a Kokkos wrapper, I think what you are doing here is fine. But if you want to show Kokkos is a better way to move on compared to C++/CUDA approach, I think we need a deeper refactor here and probably need an offline discussion first before going too far.

@sjsprecious sure. We are currently integrating MICM/MUSICA within our new chemical component within the UFS where we are exploring the adoption of Kokkos across our ecosystem for different components. For instance, https://github.com/ufs-community/CECE already supports Kokkos.

@sjsprecious
Copy link
Copy Markdown
Collaborator

@sjsprecious sure. We are currently integrating MICM/MUSICA within our new chemical component within the UFS where we are exploring the adoption of Kokkos across our ecosystem for different components. For instance, https://github.com/ufs-community/CECE already supports Kokkos.

@bbakernoaa Thanks for your detailed explanations. That sounds great. Here you only wrap the ProcessSet class with Kokkos. Do you plan to issue another PR to include the LU solver with Kokkos later or you will use your own solver in UFS?

Also out of curiosity: it was my first time to hear that NOAA has efforts with investing in C++/Kokkos. Is there a trend or proposal inside NOAA to shift from Fortran?

@bbakernoaa
Copy link
Copy Markdown
Author

There is a little movement on that. WaveWatch4 is currently in early development and is being rewritten in C++/Kokkos as well.

Yes I'll add the LU solver with a later PR hopefully. This is just the initial piece hopefully.

@K20shores
Copy link
Copy Markdown
Collaborator

K20shores commented Apr 20, 2026

@bbakernoaa would it be possible to make the kokkos matrix apis match our internal implemenations? Things like Function and ForEachBlock. It would be nice if the kokkos matrix could pass all of our policy tests. And I think Jian mentioned it, but if it's possible I'd like to see all of the memory owned by kokkos so that we could compare the implementations between micm data structures and kokkos more directly?

If you do that, you won't actually need to implement the process set and jacobian/forcing. We could rewrite our solvers to use these APIs entirely and then we could run our solvers entirely just based off of implementations of the matrix APIs.

… views

- Remove persistent h_view_ member from KokkosDenseMatrix and KokkosSparseMatrix
- Replace manual element-by-element loops with Kokkos::deep_copy for better performance
- Use unmanaged host views in CopyToDevice() and CopyToHost() to avoid redundant allocations
- Simplify constructor initialization by removing h_view_ creation
- Add comprehensive documentation explaining device/host synchronization pattern
- Improve Fill() method by removing unnecessary h_view_ management
- Reduces memory overhead and aligns with CUDA matrix implementation pattern
@bbakernoaa
Copy link
Copy Markdown
Author

@bbakernoaa would it be possible to make the kokkos matrix apis match our internal implemenations? Things like Function and ForEachBlock. It would be nice if the kokkos matrix could pass all of our policy tests. And I think Jian mentioned it, but if it's possible I'd like to see all of the memory owned by kokkos so that we could compare the implementations between micm data structures and kokkos more directly?

If you do that, you won't actually need to implement the process set and jacobian/forcing. We could rewrite our solvers to use these APIs entirely and then we could run our solvers entirely just based off of implementations of the matrix APIs.

The latest commit should address this some. I tried to use the CopyToDevice/CopyToHost to wrap MICM's data_.data() pointers. This should now line up with the VectorMaxtrix within MICM where Kokkos View is just the device-side mirror.

- Add test_kokkos_sparse_matrix.cpp with full test coverage for KokkosSparseMatrix
- Test zero matrix initialization and const zero matrix behavior
- Test scalar assignment and diagonal operations
- Test single and multi-block matrix operations with various ordering policies
- Test print and print non-zero functionality across all vector orderings
- Add using declaration for base class assignment operator in KokkosSparseMatrix
- Register new test executable in CMakeLists.txt with proper linking and test configuration
- Ensures KokkosSparseMatrix implementation is fully validated across different matrix configurations
@bbakernoaa
Copy link
Copy Markdown
Author

@K20shores I added the inheritance from the SparceMatrix base class in the KokkosSparseMatrix class.

@K20shores
Copy link
Copy Markdown
Collaborator

@bbakernoaa I guess that's okay. We don't really need inheritance. We use policy-based-design, which means as long as a class implements a function, it is usable with our policies. Sorry, I could have been more clear here.

@bbakernoaa
Copy link
Copy Markdown
Author

@bbakernoaa I guess that's okay. We don't really need inheritance. We use policy-based-design, which means as long as a class implements a function, it is usable with our policies. Sorry, I could have been more clear here.

I think the purpose of inheriting here is that we don't need to replicate the SparseMatrix in Kokkos. The Kokkos piece is just the device-side mirror and this allows for the policy-based design in MICM to be leveraged within Kokkos without replicating it. If we are going to move the entire MICM SparseMatrix to Kokkos I think thats one thing otherwise I'm not sure I see the benefit here.

Just to note that this is exactly the same pattern as the CudaDenseMatrix and CudaSparseMatrix.

… coverage

- Rename type aliases from KokkosVectorOrdering to KokkosOrdering for clarity
- Add BlockFunction infrastructure tests (ArrayFunction, MultiMatrixArrayFunction)
- Add temporary variable and view reuse tests (MultipleTemporaries, BlockViewReuse)
- Add function reusability and multi-matrix structure tests
- Add sparse + dense/vector matrix interaction tests
- Add ordering compatibility and error condition tests
- Organize tests into logical sections with comments for maintainability
- Expand test coverage across all four vector orderings (1-4) where applicable
@K20shores
Copy link
Copy Markdown
Collaborator

@bbakernoaa fair. I would still like to see the changes in the cmake files. Other than that I think I'm fine with these changes

Copy link
Copy Markdown
Collaborator

@sjsprecious sjsprecious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the user's intent and no impact on the existing C++/CUDA code, the Kokkos implementation looks fine to me.

It seems two Clang CI tests are failing. @K20shores any idea what could be wrong here?

@K20shores
Copy link
Copy Markdown
Collaborator

@sjsprecious yes, it's trying to pull this branch, but the branch lives in a fork so it can't run. We should update these actions so that they don't run on forks. Not an issue that @bbakernoaa should address and should not hold up the merge

Copy link
Copy Markdown
Collaborator

@boulderdaze boulderdaze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have a few minor suggestions that should be straightforward to fix.

Comment on lines +1 to +4
if(MICM_ENABLE_KOKKOS)
add_subdirectory(util)
add_subdirectory(process)
endif()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test/unit/CMakeLists.txt already includes a check for adding the kokos subdirectory. Could you remove the redundant check?

Suggested change
if(MICM_ENABLE_KOKKOS)
add_subdirectory(util)
add_subdirectory(process)
endif()
add_subdirectory(util)
add_subdirectory(process)

Comment on lines +30 to +52
struct ProcessInfoParam
{
std::size_t process_id_;
std::size_t independent_id_;
std::size_t number_of_dependent_reactants_;
std::size_t number_of_products_;
};

/// This struct holds device-side copies of ProcessSet data for use in Kokkos kernels
struct ProcessSetParam
{
Kokkos::View<std::size_t*> number_of_reactants_;
Kokkos::View<std::size_t*> reactant_ids_;
Kokkos::View<std::size_t*> number_of_products_;
Kokkos::View<std::size_t*> product_ids_;
Kokkos::View<double*> yields_;
Kokkos::View<ProcessInfoParam*> jacobian_process_info_;
Kokkos::View<std::size_t*> jacobian_reactant_ids_;
Kokkos::View<std::size_t*> jacobian_product_ids_;
Kokkos::View<double*> jacobian_yields_;
Kokkos::View<std::size_t*> jacobian_flat_ids_;
Kokkos::View<uint8_t*> is_algebraic_variable_;
};
Copy link
Copy Markdown
Collaborator

@boulderdaze boulderdaze Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProcessInfoParam and ProcessSetParam are specific to ProcessSet which are built during ProcessSet construction.
Could you move them to kokkos/process/ directory? That would better mirror the micm structure.

micm/process/process_set.hpp -> micm/kokkos/process/kokkos_process_set.hpp

Also since this file would still contain CopyVectorToView, it might make sense to rename it to something more general, like as kokos_utils.hpp.

@dwfncar dwfncar removed their request for review May 3, 2026 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants