Optimize tensor indexing for ranks 1-4 with explicit stride calculations#1057

Merged
cliffburdick merged 1 commit into main from explicit_stride
Sep 26, 2025

@cliffburdick
Collaborator

Replace the recursive GetValC template instantiation with GetOffsetOptimized, which provides explicit constexpr branches for tensor ranks 1-4. Writing out every index-stride multiplication in a single expression, rather than hiding it behind recursive template calls, lets the compiler optimize the stride calculations more effectively.

Changes:

  • Add GetOffsetOptimized() with unrolled stride calculations for ranks 1-4
  • Use cuda::std::array for index storage instead of tuple operations
  • Update operator() and GetPointer() to use the optimized function
  • Fall back to recursive GetValC only for tensors with rank > 4

This addresses cases where GetValC's recursive template instantiation kept the compiler from effectively optimizing common-case tensor operations.
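
The shape of the change can be sketched as follows. This is a minimal host-side illustration, not the code from the PR: the names `offset_explicit` and `offset_recursive` are hypothetical stand-ins for GetOffsetOptimized and GetValC, whose real signatures are not shown here, and `std::array` is substituted for `cuda::std::array` so the sketch compiles without the CUDA toolkit. It assumes C++17 for `if constexpr`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the recursive fallback: one template
// instantiation per dimension, each adding idx[I] * stride[I].
template <std::size_t RANK, std::size_t I = 0>
constexpr std::size_t offset_recursive(const std::array<std::size_t, RANK>& idx,
                                       const std::array<std::size_t, RANK>& stride) {
  if constexpr (I == RANK) {
    return 0;
  } else {
    return idx[I] * stride[I] + offset_recursive<RANK, I + 1>(idx, stride);
  }
}

// Hypothetical stand-in for the optimized path: ranks 1-4 are written
// out as flat expressions, so every multiply-add is visible in one place.
template <std::size_t RANK>
constexpr std::size_t offset_explicit(const std::array<std::size_t, RANK>& idx,
                                      const std::array<std::size_t, RANK>& stride) {
  if constexpr (RANK == 1) {
    return idx[0] * stride[0];
  } else if constexpr (RANK == 2) {
    return idx[0] * stride[0] + idx[1] * stride[1];
  } else if constexpr (RANK == 3) {
    return idx[0] * stride[0] + idx[1] * stride[1] + idx[2] * stride[2];
  } else if constexpr (RANK == 4) {
    return idx[0] * stride[0] + idx[1] * stride[1] +
           idx[2] * stride[2] + idx[3] * stride[3];
  } else {
    // Rank 0 or rank > 4: defer to the recursive version.
    return offset_recursive<RANK>(idx, stride);
  }
}
```

For a row-major 2x4 tensor with strides {4, 1}, `offset_explicit<2>({1, 2}, {4, 1})` evaluates to 1*4 + 2*1 = 6. Both paths compute the same offset; the difference is only in how much of the arithmetic the compiler sees in a single function body.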

@copy-pr-bot

copy-pr-bot bot commented Sep 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick
Collaborator Author

/build

@cliffburdick cliffburdick merged commit 6dae1ff into main Sep 26, 2025
1 check failed
@cliffburdick cliffburdick deleted the explicit_stride branch September 26, 2025 16:07