Faster-RCNN ONNX models are non-portable and break with dynamic inputs
Summary
The current Faster-RCNN ONNX exports (TorchVision / ONNX Model Zoo variants) are not portable and cannot be reliably used in generic inference pipelines.
They fail deterministically when used with dynamic image sizes or standard preprocessing, even when the ONNX graph itself loads successfully.
This is not a runtime, adapter, or inference-framework bug — the issue is caused by how the model is exported.
Observed failures
When running inference with valid inputs, ONNX Runtime crashes with errors such as:
```
Non-zero status code returned while running Add node
left operand cannot broadcast on dim 3
LeftShape: {1,256,50,75}
RightShape: {1,256,50,76}
```
Additional common failures include:
- `Invalid rank for input: image (Got 4, Expected 3)`
- `Unexpected input data type: uint8 (expected float)`
- Silent zero-detection outputs depending on preprocessing
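The rank and dtype failures above come from feeding batched or uint8 tensors to a graph that expects a single CHW float image. A minimal preprocessing sketch, assuming the TorchVision-style export takes a rank-3 float32 input scaled to [0, 1] (whether normalization is baked into the graph varies by export and should be verified against the specific model):

```python
import numpy as np

def preprocess(img_u8):
    """Convert an HWC uint8 image to the rank-3 CHW float32 tensor a
    TorchVision-style Faster-RCNN export expects (no batch dimension,
    values scaled to [0, 1])."""
    if img_u8.ndim != 3 or img_u8.shape[2] != 3:
        raise ValueError("expected an HWC uint8 image with 3 channels")
    # HWC uint8 -> CHW float32 in [0, 1]
    return img_u8.transpose(2, 0, 1).astype(np.float32) / 255.0

out = preprocess(np.random.randint(0, 256, (4, 5, 3), dtype=np.uint8))
print(out.shape, out.dtype)  # (3, 4, 5) float32
```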
Root cause
The exported ONNX graph:
- Assumes exact input resolutions matching the export image size
- Relies on implicit PyTorch padding behavior that does not translate to ONNX
- Uses feature map merges (Add) that require identical spatial dimensions
- Is not safe for dynamic shapes, despite appearing to support them
In PyTorch this is hidden by runtime padding logic; in ONNX it results in hard failures.
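The off-by-one in the error above falls out of simple stride arithmetic. A sketch, assuming ceil-rounded stride-2 downsampling in the backbone and a 2x nearest upsample baked into the exported graph (the exact rounding depends on the backbone; the 600-pixel width is a hypothetical input):

```python
import math

def fpn_widths(w, stride):
    """Width of the lateral feature map at `stride`, and the width produced
    by 2x-upsampling the next-coarser pyramid level, as a traced graph with
    a fixed scale_factor=2 upsample would compute them."""
    lateral = w
    for _ in range(int(math.log2(stride))):
        lateral = math.ceil(lateral / 2)   # each stage halves with ceil rounding
    coarser = math.ceil(lateral / 2)       # next (coarser) FPN level
    upsampled = coarser * 2                # fixed 2x nearest upsample
    return lateral, upsampled

print(fpn_widths(600, 8))  # (75, 76) -> the Add node cannot broadcast
print(fpn_widths(640, 8))  # (80, 80) -> stride-aligned widths merge cleanly
```

In eager PyTorch, `F.interpolate(..., size=lateral.shape[-2:])` resolves this at runtime; a traced graph with a fixed scale factor cannot.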
Why this matters
These models:
- Cannot be safely integrated into generic inference frameworks
- Break when image sizes are not perfectly aligned to internal strides
- Waste significant debugging time for downstream users
- Give the false impression of ONNX compatibility
In practice, they are non-portable artifacts, not production-ready ONNX models.
Suggested actions to fix this properly
To make Faster-RCNN ONNX exports usable, one or more of the following should be implemented:
- Enforce fixed input resolution: export with static input shapes and explicitly document the required resolution.
- Add explicit padding or shape alignment: insert `Pad` or `Resize` nodes so all FPN branches always match spatial dimensions.
- Guarantee dynamic-shape safety: ensure all spatial merges are shape-safe and validated at export time.
- Provide a deployment-grade ONNX variant: separate training-oriented graphs from inference-safe deployment exports.
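Until an export-side fix lands, users can sidestep the mismatch by padding inputs to a stride multiple before inference. A workaround sketch, not part of any official export, assuming 32 is the coarsest backbone stride (true for a standard ResNet-FPN):

```python
import numpy as np

def pad_to_multiple(chw, multiple=32):
    """Zero-pad a CHW image on the bottom and right so both spatial
    dimensions are multiples of `multiple`, keeping every FPN level's
    2x upsample exactly aligned with its lateral branch."""
    _, h, w = chw.shape
    ph = (-h) % multiple   # rows to add
    pw = (-w) % multiple   # columns to add
    return np.pad(chw, ((0, 0), (0, ph), (0, pw)))

padded = pad_to_multiple(np.zeros((3, 400, 601), np.float32))
print(padded.shape)  # (3, 416, 608)
```

Detections must then be clipped back to the original height and width, since the padded region can produce boxes outside the real image.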
Recommendation
Until these issues are addressed, Faster-RCNN ONNX models should be considered experimental and clearly documented as unsafe for dynamic input sizes.
Closing note
Other detection architectures (YOLOv8, YOLO-NAS, RT-DETR) demonstrate that robust, portable ONNX exports are achievable.
Faster-RCNN can reach the same level, but only with explicit export-time discipline.