
Add SAR backprojection transform#1108

Merged
cliffburdick merged 17 commits into main from
add-sar-backprojection-transform
Jan 8, 2026

Conversation

@tbensonatl
Collaborator

Add initial version of a synthetic aperture radar (SAR) backprojection operator. The operator is currently in the matx::experimental namespace as its API is subject to change. A large focus of this implementation is offering reasonable performance on platforms with reduced fp64 throughput. A ComputeType parameter indicates the overall computational mode. The ComputeType drives part of the accuracy-performance trade-off, with the types of the input/output tensors driving the remainder. The options are as follows:

  • Double: uses predominantly fp64 operations.
  • Mixed: uses fp32 for the less sensitive intermediate calculations and fp64 for the more sensitive calculations. This is an improvement even on systems with full-throughput fp64.
  • Float: uses predominantly fp32 calculations.
  • FloatFloat: uses a float-float representation to achieve close-to-fp64 precision by representing values as a pair of single-precision floats. Manipulation of float-float values uses only fp32 operations, other than fp64 conversion instructions when converting between double and FloatFloat. This mode offers improved performance on systems with lower fp64 throughput but is significantly slower on those with full-throughput fp64 due to the use of many fp32 operations to manipulate float-float values (e.g., adding two float-float values uses 20 fp32 instructions).

This initial SAR backprojector supports single and double-precision backprojection,
but does not yet implement further optimizations or mixed precision. Validation is
still underway.

This implementation has the following assumptions/limitations:

1. It largely assumes that the data has been motion-compensated to some mocomp point
(potentially per pulse). The operator accepts an operator with the range to the mocomp
point and uses differential range calculations in the backprojector.
2. It accepts an operator with per-voxel coordinates. For regular grids, it would be
more efficient to accept, e.g., min_x, min_y, dx, dy, etc. and construct the
regular grid in the kernel. The more general implementation supports reconstructions
onto polar or other irregular grids.
3. It does not yet include any optimizations.

Signed-off-by: Thomas Benson <tbenson@nvidia.com>
@tbensonatl tbensonatl self-assigned this Jan 5, 2026
@copy-pr-bot

copy-pr-bot bot commented Jan 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick
Collaborator

/build

@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

Added initial experimental SAR backprojection operator with multi-precision compute modes (Double, Mixed, Float, FloatFloat) designed to optimize performance on platforms with reduced fp64 throughput.

Key changes:

  • Implemented matx::experimental::sar_bp operator following MatX operator patterns with proper PreRun/PostRun lifecycle
  • Created float-float arithmetic library (fltflt.h) implementing extended precision using pairs of single-precision floats, based on algorithms from Thall, Knuth, and Dekker
  • CUDA kernel supports four compute modes with optional phase LUT optimization for improved performance
  • FloatFloat mode achieves near-fp64 precision using only fp32 operations (except for conversion instructions), beneficial for GPUs with reduced fp64 throughput
  • Comprehensive test suite validates all compute modes including point target scenario
  • Properly integrated into build system and documentation structure

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Well-architected implementation following MatX conventions with comprehensive validation. Clean separation of concerns across transform/operator/kernel layers. Thorough test coverage validates correctness across all compute modes. Previous review comments addressed.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|----------|----------|
| include/matx/kernels/fltflt.h | Implements a float-float arithmetic library for extended precision using pairs of floats. Clean implementation following established algorithms from Thall, Knuth, and Dekker. |
| include/matx/kernels/sar_bp.cuh | Implements the SAR backprojection CUDA kernel with multiple compute precision modes (Double, Mixed, Float, FloatFloat). Well-structured with proper synchronization. |
| include/matx/operators/sar_bp.h | Defines the SAR BP operator interface with params, compute types, and feature flags. Proper operator pattern implementation following MatX conventions. |
| include/matx/transforms/sar_bp.h | Transform implementation layer dispatching to kernels based on compute type and features. Proper validation and workspace allocation. |
| test/00_transform/SarBp.cu | Comprehensive test suite covering all compute modes with non-mixed types, mixed precision, and point target validation scenarios. |

Sequence Diagram

sequenceDiagram
    participant User
    participant SarBpOp as sar_bp Operator
    participant Transform as Transform Layer
    participant Kernel as CUDA Kernel
    participant FltFlt as Float-Float Lib

    User->>SarBpOp: Call sar_bp(initial_image, range_profiles, platform_positions, voxel_locations, range_to_mcp, params)
    SarBpOp->>SarBpOp: Validate inputs (rank, types)
    SarBpOp->>Transform: Exec() with parameters
    Transform->>Transform: Check FloatFloat requires PhaseLUT
    alt PhaseLUTOptimization enabled
        Transform->>Transform: Allocate workspace for phase LUT
        Transform->>Kernel: Launch SarBpFillPhaseLUT kernel
        Kernel-->>Transform: Phase LUT populated
    end
    alt ComputeType == FloatFloat
        Transform->>Kernel: Launch SarBp kernel (FloatFloat mode)
        loop Each pixel
            loop Each pulse block
                Kernel->>FltFlt: Convert platform positions to fltflt
                Kernel->>FltFlt: ComputeRangeToPixelFloatFloat()
                FltFlt-->>Kernel: High-precision range
                Kernel->>Kernel: Interpolate range profiles
                Kernel->>Kernel: Apply phase correction from LUT
                Kernel->>Kernel: Accumulate contribution
            end
        end
    else ComputeType == Double/Mixed/Float
        Transform->>Kernel: Launch SarBp kernel
        loop Each pixel
            loop Each pulse
                Kernel->>Kernel: ComputeRangeToPixel()
                Kernel->>Kernel: Interpolate range profiles
                alt PhaseLUT enabled
                    Kernel->>Kernel: Get phase from LUT + incremental
                else
                    Kernel->>Kernel: Compute phase with sincos
                end
                Kernel->>Kernel: Accumulate contribution
            end
        end
    end
    Kernel-->>Transform: Output image populated
    Transform-->>SarBpOp: Execution complete
    SarBpOp-->>User: Return result operator

Contributor

@greptile-apps greptile-apps bot left a comment


8 files reviewed, 2 comments


Comment thread include/matx/operators/sar_bp.h Outdated
Comment thread include/matx/kernels/sar_bp.cuh
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/fltflt.h Outdated
Comment thread include/matx/kernels/sar_bp.cuh Outdated
Comment thread include/matx/transforms/sar_bp.h Outdated
Comment thread include/matx/transforms/sar_bp.h
Comment thread test/00_transform/SarBp.cu
@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@cliffburdick
Collaborator

/build

3 similar comments

@cliffburdick cliffburdick merged commit 8bd9edf into main Jan 8, 2026
2 checks passed
@cliffburdick cliffburdick deleted the add-sar-backprojection-transform branch January 8, 2026 19:30

2 participants