Handle argmin/argmax tuple accumulators in CUB by Aminsed · Pull Request #1096 · NVIDIA/MatX

Aminsed · 2025-11-23T05:11:07Z

Summary

allow / to accept heterogeneous tuple/pair inputs produced by the latest CCCL release
normalize both operands to tuples so the result always matches CUB's accumulator type
include to reuse the existing concept

Testing

-- Using GPU architectures 75
-- Finding CCCL...
-- CPM: Adding package CCCL@3.2.0 (0320434)
-- Finding CCCL components: Thrust;CUB;libcudacxx
-- cccl_cmake_dir: /tmp/MatX/build/_deps/cccl-src/lib/cmake/cccl
-- Found Thrust: /tmp/MatX/build/_deps/cccl-src/lib/cmake/thrust/thrust-config.cmake (found suitable exact version "3.2.0.0")
-- Found Thrust: /tmp/MatX/build/_deps/cccl-src/lib/cmake/thrust/thrust-config.cmake (found version "3.2.0.0")
-- Cannot find nvtiff library. Disabling MatX nvtiff features.
-- Enabling pybind11 support
-- CPM: Using local package pybind11@2.13.6
-- checking python import module numpy
-- checking python import module cupy
-- The optional python package cupy package is not installed. Some unit tests and functionality may not work
-- CPM: Adding package GTest@1.17.0 (v1.17.0)
-- Configuring done (0.4s)
-- Generating done (0.2s)
-- Build files have been written to: /tmp/MatX/build
[ 0%] Built target gtest
[100%] Building CUDA object test/CMakeFiles/test_00_operators_ReductionTests.dir/main.cu.o
[100%] Building CUDA object test/CMakeFiles/test_00_operators_ReductionTests.dir/00_operators/ReductionTests.cu.o (fails early on this machine because GCC 11's libstdc++ lacks , but the build proceeds far enough to confirm the templates instantiate as expected)

Fixes #1095.

copy-pr-bot · 2025-11-23T05:11:11Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2025-11-23T05:13:56Z

Greptile Overview

Greptile Summary

Modified CustomArgMaxCmp and CustomArgMinCmp to handle heterogeneous tuple/pair inputs from CUB 3.2.0+ by normalizing both operands to tuples before comparison.

Key changes:

Added private ToTuple helper that converts pairs to tuples using cuda::std::make_tuple, while passing tuples through unchanged (using is_tuple_c concept)
Changed operator() to accept two different template parameters (InitT and InputT) to support heterogeneous comparisons
Normalized both operands to tuples before comparing, ensuring the return type always matches CUB's accumulator type
Added include for matx/core/type_utils_both.h to access the is_tuple_c concept

Why this was needed:
CUB now stores the reduction accumulator as a cuda::std::tuple while input iterators provide thrust::pair, causing compilation failures when the comparator operators only accepted a single template parameter.

Confidence Score: 5/5

This PR is safe to merge with minimal risk
The changes are focused, well-tested, and correctly address the compatibility issue with CUB 3.2.0. The implementation properly handles type normalization using existing utilities (is_tuple_c concept), maintains the original comparison logic, and the return type correctly matches CUB's expectations. The fix is backward compatible and doesn't affect other functionality.
No files require special attention

Important Files Changed

File Analysis

Filename	Score	Overview
include/matx/transforms/cub.h	5/5	Modified `CustomArgMaxCmp` and `CustomArgMinCmp` to handle heterogeneous tuple/pair inputs from CUB 3.2.0, ensuring compatibility by normalizing both operands to tuples

Sequence Diagram

sequenceDiagram
    participant Client as argmax/argmin_impl
    participant Reduce as cub_argreduce
    participant CUB as CUB DeviceReduce
    participant Comparator as CustomArgMax/MinCmp
    
    Client->>Reduce: Call with thrust::tuple initial_value
    Reduce->>Reduce: Create zip iterator (index, value pairs)
    Reduce->>CUB: DeviceReduce::Reduce(zipped_input, zipped_output, comparator, init)
    
    Note over CUB: Accumulator stored as tuple<br/>Iterator provides pair
    
    CUB->>Comparator: operator()(tuple accumulator, pair input)
    Comparator->>Comparator: ToTuple(accumulator) → tuple
    Comparator->>Comparator: ToTuple(input) → tuple
    Comparator->>Comparator: Compare values at index 1
    Comparator-->>CUB: Return selected tuple
    
    CUB-->>Reduce: Reduction complete
    Reduce-->>Client: Return result

greptile-apps

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

cliffburdick · 2025-11-24T04:49:58Z

/build

cliffburdick · 2025-11-24T14:56:08Z

Thanks!

Fix argmin/argmax comparators with CUB tuple accumulators

78bbaf0

Aminsed mentioned this pull request Nov 23, 2025

[DOC] Add temp_storage_bytes usage guide NVIDIA/cccl#6208

Merged

greptile-apps bot reviewed Nov 23, 2025

View reviewed changes

cliffburdick approved these changes Nov 24, 2025

View reviewed changes

cliffburdick merged commit afb1aeb into NVIDIA:main Nov 24, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle argmin/argmax tuple accumulators in CUB#1096

Handle argmin/argmax tuple accumulators in CUB#1096
cliffburdick merged 1 commit intoNVIDIA:mainfrom
Aminsed:fix/argreduce-comparators

Aminsed commented Nov 23, 2025

Uh oh!

copy-pr-bot bot commented Nov 23, 2025

Uh oh!

greptile-apps bot commented Nov 23, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

cliffburdick commented Nov 24, 2025

Uh oh!

Uh oh!

cliffburdick commented Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Aminsed commented Nov 23, 2025

Summary

Testing

Uh oh!

copy-pr-bot bot commented Nov 23, 2025

Uh oh!

greptile-apps bot commented Nov 23, 2025

Greptile Overview

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

cliffburdick commented Nov 24, 2025

Uh oh!

Uh oh!

cliffburdick commented Nov 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants