Fixed issue where op=transform was double-calling transform#1037

Merged
cliffburdick merged 2 commits into main from lhs_op_fix
Aug 15, 2025

@cliffburdick
Collaborator

@cliffburdick cliffburdick commented Aug 13, 2025

When a statement of the form op = transform was evaluated and op was not a tensor, the transform was incorrectly called twice, which hurt performance. This change simplifies the code so that the transform is called only once.
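
To illustrate the kind of problem described above, here is a minimal sketch in plain C++. It is not the MatX code touched by this PR, and the diff itself is not shown in this conversation. CountingTransform, ReversedView, AssignTwice, and AssignOnce are hypothetical names invented for the example. The transform counts its own executions, so the difference between running it twice and running it once for a non-tensor left-hand side is easy to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive transform (not a MatX type).
// It increments *calls every time it is executed.
struct CountingTransform {
  int* calls;

  std::vector<float> Exec(const std::vector<float>& in) const {
    ++(*calls);
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
      out[i] = 2.0f * in[i];
    }
    return out;
  }
};

// Hypothetical non-tensor left-hand side: a view that stores values
// into a destination buffer in reverse order.
struct ReversedView {
  std::vector<float>* dst;

  void Store(const std::vector<float>& src) const {
    assert(src.size() == dst->size());
    const std::size_t n = src.size();
    for (std::size_t i = 0; i < n; ++i) {
      (*dst)[n - 1 - i] = src[i];
    }
  }
};

// Assumed shape of the problem: the transform runs once to stage a
// result that is never used, then runs again when the view is written.
void AssignTwice(const ReversedView& lhs, const CountingTransform& t,
                 const std::vector<float>& in) {
  const std::vector<float> staged = t.Exec(in);  // first call
  (void)staged;                                  // result is discarded
  lhs.Store(t.Exec(in));                         // second, redundant call
}

// Simplified version: run the transform once and reuse the staged result.
void AssignOnce(const ReversedView& lhs, const CountingTransform& t,
                const std::vector<float>& in) {
  const std::vector<float> staged = t.Exec(in);  // single call
  lhs.Store(staged);
}
```

Both functions leave the same values in the destination buffer; only the number of transform executions differs, which is why a bug of this kind shows up as lost performance rather than wrong output.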

@copy-pr-bot

copy-pr-bot bot commented Aug 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick
Collaborator Author

/build

tbensonatl added a commit that referenced this pull request Aug 14, 2025
Signed-off-by: Thomas Benson <tbenson@nvidia.com>
Collaborator

@tbensonatl tbensonatl left a comment

reshape.h needs to be added, but otherwise this change looks good to me.

@cliffburdick
Collaborator Author

/build

@cliffburdick cliffburdick merged commit a5b571e into main Aug 15, 2025
1 check passed
@cliffburdick cliffburdick deleted the lhs_op_fix branch August 15, 2025 14:05
cliffburdick pushed a commit that referenced this pull request Aug 25, 2025
* Add new zipvec operator

Add a new zipvec operator that zips multiple input operators into
a vectorized operation (e.g., an operator with type float3 when
zipping x, y, and z coordinates).

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

* Update zipvec documentation

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

* Address part of review comments

Use cuda::std namespace for template helpers, update copyright
date and static assertion comment

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

* Remove is_narrowing_conversion helpers and add get_impl
helper for zipvec operator() methods.

* Remove support for half types

* Use scalar loads for vectorized types

* Handle the sizeof(T) != alignment_by_type<T>() only in load()

* Special-case alignment checks for sizeof(T) != alignment_by_type<T>()

* Address remaining review feedback

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

* Remove use of mtie in zipvec (see PR #1037)

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

* Add back assignment operator with self_type

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

* Update ZipVecOp class documentation block

Signed-off-by: Thomas Benson <tbenson@nvidia.com>

---------

Signed-off-by: Thomas Benson <tbenson@nvidia.com>