
Commit 74a8694

[OMNIML-1525] Create a folder for the plugin example (#1114)
### What does this PR do?

**Type of change:** New example

- Created a folder for the plugin example.
- Removed the reference from the documentation.
- Updated instructions.
- Original example: https://github.com/leimao/TensorRT-Custom-Plugin-Example

### Testing

Able to run the example end to end.

### Before your PR is "*Ready for review*"

Make sure you read and follow the [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`). Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ✅ / ❌ / N/A
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ / ❌ / N/A
- Did you write any new necessary tests?: ✅ / ❌ / N/A
- Did you update the [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅ / ❌ / N/A

## Summary by CodeRabbit

* **Documentation**
  * Clarified Docker/prerequisite notes and updated the quantization workflow with self-contained example instructions.
* **New Features**
  * Added a self-contained example for quantizing ONNX models with a custom operator (no external repo required).
  * Provided a ready-to-build custom operator plugin and a script to generate an ONNX identity-style test model for PTQ and TensorRT deployment.

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
1 parent 5888979 commit 74a8694

10 files changed

Lines changed: 781 additions & 28 deletions

examples/onnx_ptq/README.md

Lines changed: 14 additions & 28 deletions
````diff
@@ -172,53 +172,39 @@ python -m modelopt.onnx.quantization \
 
 This feature requires `TensorRT 10+` and `ORT>=1.20`. For proper usage, please make sure that the paths to `libcudnn*.so` and TensorRT `lib/` are in the `LD_LIBRARY_PATH` env variable and that the `tensorrt` python package is installed.
 
-Please see the sample example below.
+A self-contained example is provided in the [`custom_op_plugin/`](./custom_op_plugin/) subfolder, based on [leimao/TensorRT-Custom-Plugin-Example](https://github.com/leimao/TensorRT-Custom-Plugin-Example). Please see the steps below.
 
-**Step 1**: Obtain the sample ONNX model and TensorRT plugin from [TensorRT-Custom-Plugin-Example](https://github.com/leimao/TensorRT-Custom-Plugin-Example).
+**Step 1**: Build the TensorRT plugin and create the sample ONNX model.
 
-&#160; **1.1.** Change directory to `TensorRT-Custom-Plugin-Example`:
+&#160; **1.1.** Compile the TensorRT plugin:
 
 ```bash
-cd /path/to/TensorRT-Custom-Plugin-Example
+cmake -S custom_op_plugin/plugin -B /tmp/plugin_build
+cmake --build /tmp/plugin_build --config Release --parallel
 ```
 
-&#160; **1.2.** Compile the TensorRT plugin:
+This generates `/tmp/plugin_build/libidentity_conv_plugin.so`.
 
-```bash
-cmake -B build \
-    -DNVINFER_LIB=$TRT_LIBPATH/libnvinfer.so.10 \
-    -DNVINFER_PLUGIN_LIB=$TRT_LIBPATH/libnvinfer_plugin.so.10 \
-    -DNVONNXPARSER_LIB=$TRT_LIBPATH/libnvonnxparser.so.10 \
-    -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=/usr/include/x86_64-linux-gnu
-```
-
-```bash
-cmake --build build --config Release --parallel
-```
-
-This generates a plugin in `TensorRT-Custom-Plugin-Example/build/src/plugins/IdentityConvIPluginV2IOExt/libidentity_conv_iplugin_v2_io_ext.so`.
-
-&#160; **1.3.** Create the ONNX file.
+&#160; **1.2.** Create the ONNX model with a custom `IdentityConv` operator:
 
 ```bash
-python scripts/create_identity_neural_network.py
+python custom_op_plugin/create_identity_neural_network.py \
+    --output_path=/tmp/identity_neural_network.onnx
 ```
 
-This generates the identity_neural_network.onnx model in `TensorRT-Custom-Plugin-Example/data/identity_neural_network.onnx`.
-
-**Step 2**: Quantize the ONNX model. We will be using the `libidentity_conv_iplugin_v2_io_ext.so` plugin for this example.
+**Step 2**: Quantize the ONNX model using the compiled plugin.
 
 ```bash
 python -m modelopt.onnx.quantization \
-    --onnx_path=/path/to/identity_neural_network.onnx \
-    --trt_plugins=/path/to/libidentity_conv_iplugin_v2_io_ext.so
+    --onnx_path=/tmp/identity_neural_network.onnx \
+    --trt_plugins=/tmp/plugin_build/libidentity_conv_plugin.so
 ```
 
 **Step 3**: Deploy the quantized model with TensorRT.
 
 ```bash
-trtexec --onnx=/path/to/identity_neural_network.quant.onnx \
-    --staticPlugins=/path/to/libidentity_conv_iplugin_v2_io_ext.so
+trtexec --onnx=/tmp/identity_neural_network.quant.onnx \
+    --staticPlugins=/tmp/plugin_build/libidentity_conv_plugin.so
 ```
````
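The three-step flow above is easy to verify end to end because the example model computes the identity function: the `IdentityConv` node behaves like a depthwise 1x1 convolution with all-ones weights, which passes every channel through unchanged. A minimal pure-Python sketch of that property (illustrative only, with small stand-in shapes; not part of the example itself):

```python
import random


def depthwise_1x1_conv(x, weights):
    """Depthwise 1x1 conv over an NCHW tensor stored as nested lists.

    With group == C, output channel c only sees input channel c, scaled by
    that channel's single 1x1 weight.
    """
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [
        [
            [[x[b][ch][i][j] * weights[ch] for j in range(w)] for i in range(h)]
            for ch in range(c)
        ]
        for b in range(n)
    ]


random.seed(0)
shape = (1, 3, 4, 4)  # small stand-in for the example's (1, 3, 480, 960)
x = [
    [
        [[random.random() for _ in range(shape[3])] for _ in range(shape[2])]
        for _ in range(shape[1])
    ]
    for _ in range(shape[0])
]

ones = [1.0] * shape[1]  # mirrors the example's all-ones per-channel weights
y = depthwise_1x1_conv(x, ones)
assert y == x  # all-ones weights leave the input unchanged
print("depthwise 1x1 all-ones conv is an identity")
```

Because the whole network is a chain of such layers, the output of the quantized TensorRT engine can be compared directly against its input.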
### Optimize Q/DQ node placement with Autotune
examples/onnx_ptq/custom_op_plugin/create_identity_neural_network.py

Lines changed: 105 additions & 0 deletions
```python
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Create a simple identity neural network with a custom IdentityConv operator.

This script generates an ONNX model consisting of three convolutional layers where the
second Conv node is replaced with a custom ``IdentityConv`` operator. The custom operator
is not defined in the standard ONNX operator set and requires a TensorRT plugin to parse.

Based on https://github.com/leimao/TensorRT-Custom-Plugin-Example.
"""

import argparse
import os

import numpy as np
import onnx
import onnx_graphsurgeon as gs


def create_identity_neural_network(output_path: str) -> None:
    """Create and save an ONNX model with a custom IdentityConv operator."""
    opset_version = 15

    input_shape = (1, 3, 480, 960)
    input_channels = input_shape[1]

    # Configure identity convolution weights (depthwise, 1x1 kernel with all ones).
    weights_shape = (input_channels, 1, 1, 1)
    num_groups = input_channels
    weights_data = np.ones(weights_shape, dtype=np.float32)

    # Build the ONNX graph using onnx-graphsurgeon.
    x0 = gs.Variable(name="X0", dtype=np.float32, shape=input_shape)
    w0 = gs.Constant(name="W0", values=weights_data)
    x1 = gs.Variable(name="X1", dtype=np.float32, shape=input_shape)
    w1 = gs.Constant(name="W1", values=weights_data)
    x2 = gs.Variable(name="X2", dtype=np.float32, shape=input_shape)
    w2 = gs.Constant(name="W2", values=weights_data)
    x3 = gs.Variable(name="X3", dtype=np.float32, shape=input_shape)

    conv_attrs = {
        "kernel_shape": [1, 1],
        "strides": [1, 1],
        "pads": [0, 0, 0, 0],
        "group": num_groups,
    }

    node_1 = gs.Node(name="Conv-1", op="Conv", inputs=[x0, w0], outputs=[x1], attrs=conv_attrs)

    # The second node uses the custom IdentityConv operator instead of standard Conv.
    # This operator requires a TensorRT plugin to be loaded at runtime.
    node_2 = gs.Node(
        name="Conv-2",
        op="IdentityConv",
        inputs=[x1, w1],
        outputs=[x2],
        attrs={
            **conv_attrs,
            "plugin_version": "1",
            "plugin_namespace": "",
        },
    )

    node_3 = gs.Node(name="Conv-3", op="Conv", inputs=[x2, w2], outputs=[x3], attrs=conv_attrs)

    graph = gs.Graph(
        nodes=[node_1, node_2, node_3],
        inputs=[x0],
        outputs=[x3],
        opset=opset_version,
    )
    model = gs.export_onnx(graph)
    # Shape inference does not work with the custom operator.
    dirname = os.path.dirname(output_path)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    onnx.save(model, output_path)
    print(f"Saved ONNX model to {output_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Create an ONNX model with a custom IdentityConv operator."
    )
    parser.add_argument(
        "--output_path",
        type=str,
        default="identity_neural_network.onnx",
        help="Path to save the generated ONNX model.",
    )
    args = parser.parse_args()
    create_identity_neural_network(args.output_path)
```
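The wiring the script builds can be mirrored in plain Python as a quick structural sanity check: three nodes chained `X0 → X1 → X2 → X3`, with only the middle op swapped to `IdentityConv` and carrying the plugin attributes. A hypothetical stdlib-only sketch (the dict layout is illustrative, not the onnx-graphsurgeon API):

```python
# Plain dicts stand in for gs.Node to illustrate the graph topology built by
# create_identity_neural_network().
conv_attrs = {"kernel_shape": [1, 1], "strides": [1, 1], "pads": [0, 0, 0, 0], "group": 3}

nodes = [
    {"name": "Conv-1", "op": "Conv", "inputs": ["X0", "W0"], "outputs": ["X1"], "attrs": conv_attrs},
    {
        "name": "Conv-2",
        "op": "IdentityConv",
        "inputs": ["X1", "W1"],
        "outputs": ["X2"],
        "attrs": {**conv_attrs, "plugin_version": "1", "plugin_namespace": ""},
    },
    {"name": "Conv-3", "op": "Conv", "inputs": ["X2", "W2"], "outputs": ["X3"], "attrs": conv_attrs},
]

# Each node's activation input is the previous node's output, so the graph is
# a single chain from graph input X0 to graph output X3.
for prev, nxt in zip(nodes, nodes[1:]):
    assert nxt["inputs"][0] == prev["outputs"][0]

# Only the middle node needs the TensorRT plugin, identified by its op name
# plus the plugin_version / plugin_namespace attributes.
custom = [n for n in nodes if n["op"] != "Conv"]
assert [n["name"] for n in custom] == ["Conv-2"]
assert custom[0]["attrs"]["plugin_version"] == "1"
print("graph wiring OK")
```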
examples/onnx_ptq/custom_op_plugin/plugin/CMakeLists.txt

Lines changed: 41 additions & 0 deletions
```cmake
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Based on https://github.com/leimao/TensorRT-Custom-Plugin-Example.

cmake_minimum_required(VERSION 3.18)

project(IDENTITY-CONV-PLUGIN VERSION 0.0.1 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(CUDAToolkit REQUIRED)

# TensorRT libraries
find_library(NVINFER_LIB nvinfer HINTS /usr/lib/x86_64-linux-gnu/ PATH_SUFFIXES lib lib64 REQUIRED)
find_library(NVINFER_PLUGIN_LIB nvinfer_plugin HINTS /usr/lib/x86_64-linux-gnu/ PATH_SUFFIXES lib lib64 REQUIRED)

add_library(
    identity_conv_plugin
    SHARED
    PluginUtils.cpp
    IdentityConvPlugin.cpp
    IdentityConvPluginCreator.cpp
    PluginRegistration.cpp
)

target_include_directories(identity_conv_plugin PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(identity_conv_plugin PRIVATE ${NVINFER_LIB} ${NVINFER_PLUGIN_LIB} CUDA::cudart)
```