Fixes for 32-bit builds. Tested w/ gcc 11.4 and CTK 12.9 by tbensonatl · Pull Request #1120 · NVIDIA/MatX

tbensonatl · 2026-01-15T20:03:20Z

Adjust the make_tensor constraints to not match allocators to a ShapeType. Use MATX_INDEX_T_FMT in place of lld and explicitly convert indices in filter.cuh to index_t.

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

copy-pr-bot · 2026-01-15T20:03:23Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

greptile-apps · 2026-01-15T20:05:53Z

Greptile Summary

Fixed compilation issues for 32-bit builds by addressing type mismatches and format string incompatibilities.

- Added an `is_tuple_c` constraint to the `make_tensor` and `make_tensor_p` template functions to prevent allocator types from incorrectly matching the `ShapeType` parameter.
- Changed loop iteration variables from `uint32_t` to `index_t` in filter.cuh and added explicit casts for CUDA built-in variables (`blockIdx.y`) to avoid implicit conversion warnings when `index_t` is 32-bit.
- Replaced the hardcoded `%lld` format specifier with the `MATX_INDEX_T_FMT` macro in a printf statement to correctly handle both 32-bit (`int32_t`) and 64-bit (`long long int`) index types.

All changes are well-targeted fixes for 32-bit compatibility without affecting 64-bit builds or runtime behavior.
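The constraint described in the first item can be illustrated with a small standalone sketch. Everything here is a toy stand-in rather than MatX code: `is_tuple_like`, `ToyAllocator`, `toy_make_tensor`, and `shape_overload_accepts` are hypothetical names, and the real `is_tuple_c` definition is not shown in this PR, so the trait below only assumes that it accepts tuple-like shapes and rejects everything else.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical trait standing in for MatX's is_tuple_c (assumption: it is
// true for tuple-like shape types and false for anything else).
template <typename T> struct is_tuple_like : std::false_type {};
template <typename... Ts>
struct is_tuple_like<std::tuple<Ts...>> : std::true_type {};
template <typename T, std::size_t N>
struct is_tuple_like<std::array<T, N>> : std::true_type {};

// Hypothetical allocator type; it must never be treated as a shape.
struct ToyAllocator {};

// Toy (pointer, shape) overload. The enable_if removes it from overload
// resolution unless ShapeType is tuple-like. Returns the rank of the shape.
template <typename T, typename ShapeType,
          std::enable_if_t<is_tuple_like<std::decay_t<ShapeType>>::value, int> = 0>
std::size_t toy_make_tensor(T* /*data*/, ShapeType&& /*shape*/) {
  return std::tuple_size<std::decay_t<ShapeType>>::value;
}

// Detection helper: is toy_make_tensor(int*, Arg) a valid call?
template <typename Arg, typename = void>
struct shape_overload_accepts : std::false_type {};
template <typename Arg>
struct shape_overload_accepts<
    Arg, std::void_t<decltype(toy_make_tensor(std::declval<int*>(),
                                              std::declval<Arg>()))>>
    : std::true_type {};

// Small usage check: a 2-element shape yields rank 2.
inline void toy_usage() {
  int buf[6] = {};
  assert(toy_make_tensor(buf, std::array<int, 2>{{2, 3}}) == 2);
  (void)buf;
}
```

With the constraint in place, `shape_overload_accepts<ToyAllocator>::value` is false, so an allocator argument can only be picked up by an overload written for allocators. Without the `enable_if`, the forwarding reference `ShapeType&&` would bind to any argument type.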

Confidence Score: 5/5

This PR is safe to merge with minimal risk
All changes are minimal, well-scoped fixes for 32-bit build compatibility. The type constraint addition prevents incorrect template matching, explicit casts ensure proper type conversions, and the format string macro usage follows existing patterns. Changes have been tested with gcc 11.4 and CTK 12.9.
No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| include/matx/core/make_tensor.h | Added `is_tuple_c` constraint to prevent allocators from matching the ShapeType parameter in `make_tensor` and `make_tensor_p` overloads |
| include/matx/kernels/filter.cuh | Changed loop counters from `uint32_t` to `index_t` and added explicit casts for `blockIdx.y` and index calculations to `index_t` for 32-bit build compatibility |
| examples/black_scholes.cu | Replaced hardcoded format string `%lld` with `MATX_INDEX_T_FMT` macro to handle both 32-bit and 64-bit index types correctly in printf statement |
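The filter.cuh change can be approximated on the host with a minimal sketch. This is not the kernel code from the PR: `toy_index_t`, `ToyDim3`, `row_offset`, and `sum_row` are hypothetical names, `ToyDim3` only mimics the unsigned fields of CUDA's `blockIdx`, and the index type is assumed to be a signed 32-bit integer as in a 32-bit index build.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a signed 32-bit index type, as in a 32-bit index build.
using toy_index_t = std::int32_t;

// Host-side stand-in for CUDA's blockIdx, whose fields are unsigned int.
struct ToyDim3 {
  unsigned int x, y, z;
};

// Convert the unsigned block index to the signed index type once, explicitly,
// so the arithmetic below does not mix signed and unsigned operands.
inline toy_index_t row_offset(ToyDim3 block_idx, toy_index_t row_len,
                              toy_index_t col) {
  const toy_index_t batch = static_cast<toy_index_t>(block_idx.y);
  return batch * row_len + col;
}

// Sum one row of a row-major buffer. The loop counter uses the index type
// (not uint32_t), so the comparison i < row_len is signed on both sides.
inline std::int64_t sum_row(const std::int32_t* data, ToyDim3 block_idx,
                            toy_index_t row_len) {
  assert(row_len >= 0);
  std::int64_t acc = 0;
  for (toy_index_t i = 0; i < row_len; ++i) {
    acc += data[row_offset(block_idx, row_len, i)];
  }
  return acc;
}
```

For a 2x3 buffer holding 1 through 6, `sum_row` with `block_idx.y == 1` and `row_len == 3` adds the second row. Doing the cast in one place keeps the conversion visible, which is the property the explicit `static_cast<index_t>(blockIdx.y)` in the PR is described as providing.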

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant MT as make_tensor/make_tensor_p
    participant Filter as RecursiveFilter Kernel
    participant Print as printf

    Note over User,Print: 32-bit Build Compilation Flow

    User->>MT: Call make_tensor(data, shape)
    Note over MT: Before: ShapeType could match allocators
    Note over MT: After: is_tuple_c constraint ensures<br/>ShapeType is a tuple
    MT-->>User: Return tensor_t

    User->>Filter: Launch kernel with tensor
    Note over Filter: Loop counters changed from<br/>uint32_t to index_t
    Filter->>Filter: Access d_in/d_out with<br/>static_cast<index_t>(blockIdx.y)
    Note over Filter: Explicit casts prevent<br/>implicit conversion warnings
    Filter-->>User: Kernel complete

    User->>Print: printf with index value
    Note over Print: Format string uses<br/>MATX_INDEX_T_FMT macro
    Note over Print: 32-bit: "d", 64-bit: "lld"
    Print-->>User: Correct output
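The format-specifier note at the end of the diagram can be sketched as a compile-time switch. This is an illustration under assumptions, not the MatX definition: `TOY_INDEX_32_BIT`, `toy_index_t`, `TOY_INDEX_FMT`, and `format_index` are hypothetical names, and the macro is assumed to expand to the conversion letters without the leading `%`, matching the `"d"` and `"lld"` values mentioned in the diagram.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical build switch: define TOY_INDEX_32_BIT to select 32-bit indices.
#ifdef TOY_INDEX_32_BIT
using toy_index_t = std::int32_t;
#define TOY_INDEX_FMT "d"    // assumes int32_t is int on the target platform
#else
using toy_index_t = long long int;
#define TOY_INDEX_FMT "lld"
#endif

// Format an index using the macro instead of a hardcoded "%lld". Adjacent
// string literals are concatenated, so "%" TOY_INDEX_FMT becomes "%d" or
// "%lld" depending on the build.
inline std::string format_index(toy_index_t n) {
  char buf[32];
  const int written = std::snprintf(buf, sizeof(buf), "%" TOY_INDEX_FMT, n);
  assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buf));
  (void)written;
  return std::string(buf);
}
```

A hardcoded `%lld` paired with a 32-bit argument is a format/argument mismatch, which is undefined behavior in printf-style functions and is typically flagged by `-Wformat`. Routing the specifier through a single macro keeps it in sync with the index type.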

cliffburdick · 2026-01-15T20:09:23Z

/build

cliffburdick · 2026-01-15T20:09:37Z

Fixes #1117

Fixes for 32-bit builds. Tested w/ gcc 11.4 and CTK 12.9

bdf7611


tbensonatl requested a review from cliffburdick January 15, 2026 20:03

tbensonatl self-assigned this Jan 15, 2026

cliffburdick approved these changes Jan 15, 2026


cliffburdick mentioned this pull request Jan 15, 2026

Fix 32-bit build issues #1118

Closed

tbensonatl merged commit b00d894 into main Jan 16, 2026
2 checks passed

tbensonatl deleted the bugfix/alternate-32b-build-fixes branch January 23, 2026 14:09
