Faster-RCNN ONNX models are non-portable and break with dynamic inputs
Summary
The current Faster-RCNN ONNX exports (TorchVision / ONNX Model Zoo variants) are not portable and cannot be reliably used in generic inference pipelines.
They fail deterministically when used with dynamic image sizes or standard preprocessing, even when the ONNX graph itself loads successfully.
This is not a runtime, adapter, or inference-framework bug — the issue is caused by how the model is exported.
Observed failures
When running inference with valid inputs, ONNX Runtime crashes with errors such as:
```
Non-zero status code returned while running Add node
left operand cannot broadcast on dim 3
LeftShape: {1,256,50,75}
RightShape: {1,256,50,76}
```
Additional common failures include:
- `Invalid rank for input: image (Got 4, Expected 3)`
- `Unexpected input data type: uint8 (expected float)`
- Silent zero-detection outputs depending on preprocessing
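The rank and dtype failures above come from feeding batched or uint8 tensors to a graph that expects a single CHW float image. A minimal preprocessing sketch, assuming the TorchVision-style export takes a rank-3 float32 input scaled to [0, 1] (whether normalization is baked into the graph varies by export and should be verified against the specific model):

```python
import numpy as np

def preprocess(img_u8):
    """Convert an HWC uint8 image to the rank-3 CHW float32 tensor a
    TorchVision-style Faster-RCNN export expects (no batch dimension,
    values scaled to [0, 1])."""
    if img_u8.ndim != 3 or img_u8.shape[2] != 3:
        raise ValueError("expected an HWC uint8 image with 3 channels")
    # HWC uint8 -> CHW float32 in [0, 1]
    return img_u8.transpose(2, 0, 1).astype(np.float32) / 255.0

out = preprocess(np.random.randint(0, 256, (4, 5, 3), dtype=np.uint8))
print(out.shape, out.dtype)  # (3, 4, 5) float32
```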
Root cause
The exported ONNX graph:
- Assumes exact input resolutions matching the export image size
- Relies on implicit PyTorch padding behavior that does not translate to ONNX
- Uses feature map merges (Add) that require identical spatial dimensions
- Is not safe for dynamic shapes, despite appearing to support them
In PyTorch this is hidden by runtime padding logic; in ONNX it results in hard failures.
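The off-by-one in the error above falls out of simple stride arithmetic. A sketch, assuming ceil-rounded stride-2 downsampling in the backbone and a 2x nearest upsample baked into the exported graph (the exact rounding depends on the backbone; the 600-pixel width is a hypothetical input):

```python
import math

def fpn_widths(w, stride):
    """Width of the lateral feature map at `stride`, and the width produced
    by 2x-upsampling the next-coarser pyramid level, as a traced graph with
    a fixed scale_factor=2 upsample would compute them."""
    lateral = w
    for _ in range(int(math.log2(stride))):
        lateral = math.ceil(lateral / 2)   # each stage halves with ceil rounding
    coarser = math.ceil(lateral / 2)       # next (coarser) FPN level
    upsampled = coarser * 2                # fixed 2x nearest upsample
    return lateral, upsampled

print(fpn_widths(600, 8))  # (75, 76) -> the Add node cannot broadcast
print(fpn_widths(640, 8))  # (80, 80) -> stride-aligned widths merge cleanly
```

In eager PyTorch, `F.interpolate(..., size=lateral.shape[-2:])` resolves this at runtime; a traced graph with a fixed scale factor cannot.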
Why this matters
These models:
- Cannot be safely integrated into generic inference frameworks
- Break when image sizes are not perfectly aligned to internal strides
- Waste significant debugging time for downstream users
- Give the false impression of ONNX compatibility
In practice, they are non-portable artifacts, not production-ready ONNX models.
Suggested actions to fix this properly
To make Faster-RCNN ONNX exports usable, one or more of the following should be implemented:
- Enforce fixed input resolution: export with static input shapes and explicitly document the required resolution.
- Add explicit padding or shape alignment: insert `Pad` or `Resize` nodes so all FPN branches always match spatial dimensions.
- Guarantee dynamic-shape safety: ensure all spatial merges are shape-safe and validated at export time.
- Provide a deployment-grade ONNX variant: separate training-oriented graphs from inference-safe deployment exports.
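Until an export-side fix lands, users can sidestep the mismatch by padding inputs to a stride multiple before inference. A workaround sketch, not part of any official export, assuming 32 is the coarsest backbone stride (true for a standard ResNet-FPN):

```python
import numpy as np

def pad_to_multiple(chw, multiple=32):
    """Zero-pad a CHW image on the bottom and right so both spatial
    dimensions are multiples of `multiple`, keeping every FPN level's
    2x upsample exactly aligned with its lateral branch."""
    _, h, w = chw.shape
    ph = (-h) % multiple   # rows to add
    pw = (-w) % multiple   # columns to add
    return np.pad(chw, ((0, 0), (0, ph), (0, pw)))

padded = pad_to_multiple(np.zeros((3, 400, 601), np.float32))
print(padded.shape)  # (3, 416, 608)
```

Detections must then be clipped back to the original height and width, since the padded region can produce boxes outside the real image.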
Recommendation
Until these issues are addressed, Faster-RCNN ONNX models should be considered experimental and clearly documented as unsafe for dynamic input sizes.
Closing note
Other detection architectures (YOLOv8, YOLO-NAS, RT-DETR) demonstrate that robust, portable ONNX exports are achievable.
Faster-RCNN can reach the same level, but only with explicit export-time discipline.