
ONNX models are non-portable and break with dynamic inputs #691

@andreszs

Description


Faster-RCNN ONNX models are non-portable and break with dynamic inputs

Summary

The current Faster-RCNN ONNX exports (TorchVision / ONNX Model Zoo variants) are not portable and cannot be reliably used in generic inference pipelines.
They fail deterministically when used with dynamic image sizes or standard preprocessing, even when the ONNX graph itself loads successfully.

This is not a runtime, adapter, or inference-framework bug — the issue is caused by how the model is exported.


Observed failures

When running inference with valid inputs, ONNX Runtime crashes with errors such as:

Non-zero status code returned while running Add node
left operand cannot broadcast on dim 3
LeftShape: {1,256,50,75}
RightShape: {1,256,50,76}

Additional common failures include:

  • Invalid rank for input: image (Got 4, Expected 3)
  • Unexpected input data type: uint8 (expected float)
  • Silent zero-detection outputs depending on preprocessing
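The `Add` failure above is ordinary broadcasting arithmetic; a NumPy sketch using the exact shapes from the error message shows why the merge cannot succeed:

```python
# Sketch of the failure mode: the Add node's shape mismatch follows plain
# broadcasting rules, reproduced here with NumPy using the shapes reported
# in the error message above.
import numpy as np

left = np.zeros((1, 256, 50, 75), dtype=np.float32)   # one FPN branch
right = np.zeros((1, 256, 50, 76), dtype=np.float32)  # its merge partner

try:
    left + right  # the same element-wise Add the ONNX graph performs
except ValueError as err:
    print("broadcast failure:", err)
```

Because dim 3 differs by one element and neither side is 1, no broadcasting rule applies and the operation fails outright, exactly as in the runtime error.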

Root cause

The exported ONNX graph:

  • Assumes exact input resolutions matching the export image size
  • Relies on implicit PyTorch padding behavior that does not translate to ONNX
  • Uses feature map merges (Add) that require identical spatial dimensions
  • Is not safe for dynamic shapes, despite appearing to support them

In PyTorch this is hidden by runtime padding logic; in ONNX it results in hard failures.
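The runtime padding that PyTorch performs (and the exported graph omits) can be sketched as follows, assuming a stride of 32, the `size_divisible` default in TorchVision's `GeneralizedRCNNTransform`:

```python
# Sketch (assumption: stride 32, matching TorchVision's size_divisible
# default) of the spatial alignment PyTorch applies at runtime before the
# backbone, which the ONNX export does not reproduce.
import math
import numpy as np

def pad_to_stride(img: np.ndarray, stride: int = 32) -> np.ndarray:
    """Zero-pad a CHW image so both spatial dims are multiples of stride."""
    c, h, w = img.shape
    new_h = math.ceil(h / stride) * stride
    new_w = math.ceil(w / stride) * stride
    return np.pad(img, ((0, 0), (0, new_h - h), (0, new_w - w)))

image = np.zeros((3, 800, 1201), dtype=np.float32)
print(pad_to_stride(image).shape)  # (3, 800, 1216)
```

With this alignment every downsampled feature map has consistent dimensions; without it, odd widths like 1201 produce the off-by-one FPN shapes seen in the error above.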


Why this matters

These models:

  • Cannot be safely integrated into generic inference frameworks
  • Break when image sizes are not perfectly aligned to internal strides
  • Waste significant debugging time for downstream users
  • Give the false impression of ONNX compatibility

In practice, they are non-portable artifacts, not production-ready ONNX models.


Suggested actions to fix this properly

To make Faster-RCNN ONNX exports usable, one or more of the following should be implemented:

  1. Enforce fixed input resolution
    Export with static input shapes and explicitly document the required resolution.

  2. Add explicit padding or shape alignment
    Insert Pad or Resize nodes so all FPN branches always match spatial dimensions.

  3. Guarantee dynamic-shape safety
    Ensure all spatial merges are shape-safe and validated at export time.

  4. Provide a deployment-grade ONNX variant
    Separate training-oriented graphs from inference-safe deployment exports.
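As an illustration of action 1, a hypothetical deployment-side guard that enforces a documented static resolution and rejects the rank and dtype mistakes listed earlier (the expected shape and function name are assumptions, not part of any existing export):

```python
# Hypothetical input guard for a fixed-resolution export: validates dtype
# and shape up front instead of letting the graph fail mid-inference.
import numpy as np

EXPECTED_SHAPE = (3, 800, 1216)  # assumed documented export resolution

def validate_input(image: np.ndarray) -> np.ndarray:
    """Check a CHW float32 image against the static export contract and
    return it with a batch dimension added (rank 4)."""
    if image.dtype != np.float32:
        raise TypeError(f"expected float32, got {image.dtype}")
    if image.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected shape {EXPECTED_SHAPE}, got {image.shape}")
    return image[None]  # (1, 3, H, W)
```

A check like this turns the silent or cryptic failures above into immediate, actionable errors at the pipeline boundary.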


Recommendation

Until these issues are addressed, Faster-RCNN ONNX models should be considered experimental and clearly documented as unsafe for dynamic input sizes.


Closing note

Other detection architectures (YOLOv8, YOLO-NAS, RT-DETR) demonstrate that robust, portable ONNX exports are achievable.
Faster-RCNN can reach the same level, but only with explicit export-time discipline.
