Add FMA function for the fltflt data type #1123
Merged
cliffburdick merged 1 commit into main on Jan 26, 2026
`fltflt_fma()` computes `a * b + c` for `fltflt` values more efficiently than calling `fltflt_mul()` followed by `fltflt_add()`, because the fused function performs one fewer normalization than the two separate calls.

This PR also replaces type-suffixed names such as `fltflt_add_float(fltflt, float)` with overloads of `fltflt_add()`. The suffixed names were intended to make the functions easier to use from C, but the file now relies on many other C++ features (constructors, conversion operators, comparison operators, etc.).

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
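The overload change can be pictured with a minimal host-side sketch. It assumes a plain `fltflt` aggregate with `float hi, lo` members, borrows the helper names `fltflt_two_sum` and `fltflt_fast_two_sum` from the sequence diagram but fills them with the textbook TwoSum and Fast2Sum bodies, and uses a standard double-word addition; `fltflt_add_overload_demo` is hypothetical. None of this is the PR's actual code.

```cpp
#include <cassert>

// Hypothetical double-float layout for this sketch: the value is hi + lo.
struct fltflt {
  float hi;
  float lo;
};

// Knuth's TwoSum: s + e == a + b exactly, for any magnitudes.
static inline fltflt fltflt_two_sum(float a, float b) {
  float s = a + b;
  float bb = s - a;
  float e = (a - (s - bb)) + (b - bb);
  return {s, e};
}

// Dekker's Fast2Sum: exact when |a| >= |b|; used to renormalize hi/lo.
static inline fltflt fltflt_fast_two_sum(float a, float b) {
  float s = a + b;
  float e = b - (s - a);
  return {s, e};
}

// fltflt + fltflt: accurate double-word addition built from two TwoSums.
static inline fltflt fltflt_add(fltflt a, fltflt b) {
  fltflt s = fltflt_two_sum(a.hi, b.hi);
  fltflt t = fltflt_two_sum(a.lo, b.lo);
  fltflt v = fltflt_fast_two_sum(s.hi, s.lo + t.hi);
  return fltflt_fast_two_sum(v.hi, t.lo + v.lo);
}

// fltflt + float: the case the old C-style name fltflt_add_float covered.
static inline fltflt fltflt_add(fltflt a, float b) {
  fltflt s = fltflt_two_sum(a.hi, b);
  return fltflt_fast_two_sum(s.hi, a.lo + s.lo);
}

// Usage: the second argument's type picks the overload at compile time.
static inline void fltflt_add_overload_demo() {
  fltflt one{1.0f, 0.0f};
  fltflt r = fltflt_add(one, 1e-8f);      // float overload
  assert(r.hi == 1.0f && r.lo == 1e-8f);  // 1e-8 survives in the lo word
  fltflt d = fltflt_add(r, r);            // fltflt overload
  assert(d.hi == 2.0f && d.lo == 2e-8f);
}
```

With overloads, a C++ caller writes `fltflt_add(x, y)` whether `y` is a `fltflt` or a `float`, and the cheaper mixed-precision path is still chosen statically. Only a C caller would need the distinct suffixed names.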
Greptile Summary

Added fused multiply-add (…)

Key improvements: …
Confidence Score: 5/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant ComputeRangeToPixel
    participant fltflt_fma
    participant fltflt_two_prod_fma
    participant fltflt_two_sum
    participant fltflt_fast_two_sum
    participant fltflt_sqrt
    User->>ComputeRangeToPixel: Compute distance
    ComputeRangeToPixel->>fltflt_fma: fltflt_fma(dx, dx, dy * dy)
    Note over fltflt_fma: Compute dx² + dy²
    fltflt_fma->>fltflt_two_prod_fma: Multiply a.hi * b.hi
    fltflt_two_prod_fma-->>fltflt_fma: Return product with error term
    fltflt_fma->>fltflt_fma: Add cross terms with fmaf_rn()
    fltflt_fma->>fltflt_two_sum: Add product to c (skip intermediate normalization)
    fltflt_two_sum-->>fltflt_fma: Return sum with error term
    fltflt_fma->>fltflt_fma: Add p.lo component
    fltflt_fma->>fltflt_fast_two_sum: Normalize once
    fltflt_fast_two_sum-->>fltflt_fma: Return normalized result
    fltflt_fma->>fltflt_fma: Add c.lo component
    fltflt_fma->>fltflt_fast_two_sum: Final normalization
    fltflt_fast_two_sum-->>fltflt_fma: Return final result
    fltflt_fma-->>ComputeRangeToPixel: dx² + dy²
    ComputeRangeToPixel->>fltflt_fma: fltflt_fma(dz, dz, dx2dy2)
    Note over fltflt_fma: Compute dz² + (dx² + dy²)
    fltflt_fma-->>ComputeRangeToPixel: dx² + dy² + dz²
    ComputeRangeToPixel->>fltflt_sqrt: sqrt(dx² + dy² + dz²)
    fltflt_sqrt-->>ComputeRangeToPixel: Final distance
    ComputeRangeToPixel-->>User: Return range to pixel
```
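The fused path in the diagram can be sketched on the host as follows. This is an illustrative reconstruction from the diagram's step order, not the PR's code. It assumes a `hi`/`lo` float layout, uses `std::fma` where the diagram mentions `fmaf_rn()`, and the bodies of `fltflt_two_prod_fma`, `fltflt_two_sum`, `fltflt_fast_two_sum` and `fltflt_sqrt` are textbook versions (the square root is a float estimate plus one Newton correction). `fltflt_from_double`, `fltflt_to_double` and `range_sketch` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical double-float layout: the value is hi + lo with |lo| <= ulp(hi)/2.
struct fltflt {
  float hi;
  float lo;
};

// Exact product split: p + e == a * b (barring overflow/underflow).
static inline fltflt fltflt_two_prod_fma(float a, float b) {
  float p = a * b;
  float e = std::fma(a, b, -p);  // rounding error of a * b, computed exactly
  return {p, e};
}

// Knuth's TwoSum: s + e == a + b exactly, for any magnitudes.
static inline fltflt fltflt_two_sum(float a, float b) {
  float s = a + b;
  float bb = s - a;
  float e = (a - (s - bb)) + (b - bb);
  return {s, e};
}

// Dekker's Fast2Sum: exact when |a| >= |b|; used here to renormalize.
static inline fltflt fltflt_fast_two_sum(float a, float b) {
  float s = a + b;
  float e = b - (s - a);
  return {s, e};
}

// a * b + c, following the diagram's steps.
static inline fltflt fltflt_fma(fltflt a, fltflt b, fltflt c) {
  fltflt p = fltflt_two_prod_fma(a.hi, b.hi);  // a.hi * b.hi plus error term
  p.lo = std::fma(a.hi, b.lo, p.lo);           // cross terms; a.lo * b.lo is
  p.lo = std::fma(a.lo, b.hi, p.lo);           // dropped as negligible
  fltflt s = fltflt_two_sum(p.hi, c.hi);       // p is NOT normalized first
  s.lo += p.lo;                                // add p.lo component
  s = fltflt_fast_two_sum(s.hi, s.lo);         // normalize once
  s.lo += c.lo;                                // add c.lo component
  return fltflt_fast_two_sum(s.hi, s.lo);      // final normalization
}

// Square root: float estimate refined by one Newton step carried in fltflt.
static inline fltflt fltflt_sqrt(fltflt a) {
  assert(a.hi >= 0.0f && "fltflt_sqrt: negative input");
  if (a.hi == 0.0f) return {0.0f, 0.0f};
  float s = std::sqrt(a.hi);
  fltflt sq = fltflt_two_prod_fma(s, s);       // s * s exactly
  float r = ((a.hi - sq.hi) - sq.lo + a.lo) / (2.0f * s);
  return fltflt_fast_two_sum(s, r);
}

// Conversions used only to set up inputs and check results.
static inline fltflt fltflt_from_double(double d) {
  float hi = static_cast<float>(d);
  return {hi, static_cast<float>(d - static_cast<double>(hi))};
}
static inline double fltflt_to_double(fltflt x) {
  return static_cast<double>(x.hi) + static_cast<double>(x.lo);
}

// Hypothetical analogue of ComputeRangeToPixel: sqrt(dx^2 + dy^2 + dz^2).
static inline fltflt range_sketch(fltflt dx, fltflt dy, fltflt dz) {
  const fltflt zero{0.0f, 0.0f};
  fltflt dy2 = fltflt_fma(dy, dy, zero);       // stands in for dy * dy
  fltflt dx2dy2 = fltflt_fma(dx, dx, dy2);     // dx^2 + dy^2
  fltflt sum = fltflt_fma(dz, dz, dx2dy2);     // dz^2 + (dx^2 + dy^2)
  return fltflt_sqrt(sum);
}
```

This sketch calls `fltflt_fast_two_sum` twice per FMA. A typical double-word multiply ends with one renormalization and an accurate double-word add with two, so the separate calls would perform three. That is consistent with the PR's "one fewer normalization". Letting the unnormalized product feed directly into the TwoSum with `c.hi` is what makes the saving possible.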
cliffburdick approved these changes on Jan 26, 2026

/build