[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part by SigureMo · Pull Request #5223 · PaddlePaddle/FastDeploy

SigureMo · 2025-11-25T11:22:04Z

Motivation

Use CINN to accelerate the ViT part of PaddleOCR-VL by decorating its _run_encoder_layer with to_static.

Modifications

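Split the ViT encoder into two parts and decorate the loop part with to_static.
Add warmup logic for the ViT encoder. For now it only targets the PaddleOCR-VL model; other models will be considered later, at which point this logic will be abstracted.
Extend the existing tests/e2e/test_paddleocr_vl_serving.py so that it tests both the dynamic graph and CINN.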
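A minimal, framework-free sketch of the split described above. It assumes the encoder consists of a shape-dependent prologue followed by a uniform loop over layers. EncoderSketch, _prepare and to_static_stub are hypothetical names: to_static_stub only records which function it wraps and stands in for paddle.jit.to_static, so the sketch runs without Paddle installed. This is not the actual SiglipEncoder code.

```python
from typing import Callable, List

# Hypothetical stand-in for paddle.jit.to_static: it records the name of the
# wrapped function and returns it unchanged, so no framework is required.
converted: List[str] = []


def to_static_stub(fn: Callable) -> Callable:
    converted.append(fn.__name__)
    return fn


class EncoderSketch:
    """Illustrative encoder split into an eager prologue and a wrapped loop."""

    def __init__(self, layers: List[Callable[[float], float]], compile_loop: bool):
        self.layers = layers
        # Only the per-layer loop is wrapped; the prologue always stays eager.
        if compile_loop:
            self.run_layers = to_static_stub(self._run_encoder_layer)
        else:
            self.run_layers = self._run_encoder_layer

    def _prepare(self, x: float) -> float:
        # Part 1: shape-dependent setup (e.g. position or window indices).
        return x + 1.0

    def _run_encoder_layer(self, x: float) -> float:
        # Part 2: the loop over encoder layers, the region handed to the compiler.
        for layer in self.layers:
            x = layer(x)
        return x

    def forward(self, x: float) -> float:
        return self.run_layers(self._prepare(x))
```

One plausible reason for this design is that wrapping only the loop gives the compiler a single, regular region to capture, while the eager and wrapped variants compute the same result.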

Usage or Command

Compilation is currently enabled the same way as for the LLM part:

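graph_opt_level=0: pure dynamic graph, uses fused ops
graph_opt_level=1: SOT dynamic-to-static conversion, uses native ops; performance drops somewhat
graph_opt_level=2: CINN, uses native ops + CINN fusion; performance matches fused ops and slightly exceeds them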
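The three levels listed above differ mainly in which op implementation runs. The dispatcher below is a hypothetical sketch (GraphOptLevel and select_op_path are not FastDeploy APIs) that encodes the list as code: hand-written fused kernels only in pure dynamic mode, native ops on the compiled paths.

```python
from enum import IntEnum


class GraphOptLevel(IntEnum):
    DYNAMIC = 0  # pure dynamic graph
    SOT = 1      # SOT dynamic-to-static conversion
    CINN = 2     # static graph compiled by CINN


def select_op_path(level: int) -> str:
    """Hypothetical dispatcher: which op implementation a level runs."""
    level = GraphOptLevel(level)  # raises ValueError for unknown levels
    if level is GraphOptLevel.DYNAMIC:
        return "fused"               # hand-written fused kernels
    if level is GraphOptLevel.SOT:
        return "native"              # plain ops, no extra fusion
    return "native+cinn_fusion"      # plain ops that CINN fuses at compile time
```

One plausible motivation, which is an interpretation rather than a statement from the PR, is that hand-written fused kernels are opaque to the compiler, whereas native ops leave CINN room to perform its own fusion.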

python -m fastdeploy.entrypoints.openai.api_server \
        --model /root/paddlejob/tmpspace/MODELS/PaddlePaddle/PaddleOCR-VL \
        --port 8295 \
        --metrics-port 8296 \
        --engine-worker-queue-port 8297 \
        --cache-queue-port 55660 \
        --max-model-len 16384 \
        --max-num-batched-tokens 16384 \
        --gpu-memory-utilization 0.7 \
        --max-num-seqs 256 \
        --workers 2 \
        --graph-optimization-config '{"graph_opt_level":2, "use_cudagraph":true}'
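When scripting launches, for example once per level as the e2e test does for levels 0 and 2, the JSON value of --graph-optimization-config can be built programmatically. graph_opt_flag is a hypothetical helper; only the flag name and the graph_opt_level and use_cudagraph keys come from the command above.

```python
import json
import shlex


def graph_opt_flag(level: int, use_cudagraph: bool = True) -> str:
    """Hypothetical helper that renders the graph optimization CLI flag."""
    config = {"graph_opt_level": level, "use_cudagraph": use_cudagraph}
    # json.dumps writes lowercase true/false as JSON requires;
    # shlex.quote keeps the braces and double quotes intact for the shell.
    return "--graph-optimization-config " + shlex.quote(json.dumps(config))
```

For instance, looping over levels 0 and 2 yields one flag per server launch, and shlex.split recovers the exact JSON string the server would receive.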

Device	Dynamic graph, no fused ops (tokens/s)	Dynamic graph, hand-written fused ops (tokens/s)	CINN (tokens/s)
HZZ1	15394.50	16944.64 (+10%)	17211.10 (+12%)
A10	4743.69	6359.45 (+34%)	6509.28 (+37%)
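The percentages in the table above are relative throughput gains over the no-fused-ops dynamic graph. The snippet recomputes them from the raw tokens/s figures and also compares CINN directly against the hand-written fused ops. The dictionary layout and gain_pct are illustrative choices, not part of the PR.

```python
# Throughput figures copied from the table above (tokens/s).
results = {
    "HZZ1": {"baseline": 15394.50, "fused": 16944.64, "cinn": 17211.10},
    "A10": {"baseline": 4743.69, "fused": 6359.45, "cinn": 6509.28},
}


def gain_pct(new: float, old: float) -> float:
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for device, r in results.items():
    print(
        f"{device}: fused {gain_pct(r['fused'], r['baseline']):+.1f}%, "
        f"CINN {gain_pct(r['cinn'], r['baseline']):+.1f}%, "
        f"CINN vs fused {gain_pct(r['cinn'], r['fused']):+.1f}%"
    )
```

The recomputed gains round to the +10%/+12% and +34%/+37% shown in the table. CINN ends up roughly 1.6% (HZZ1) and 2.4% (A10) ahead of the hand-written fused ops, which is the "slightly exceeds" claim in numbers.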

Accuracy Tests

None

Checklist

Add at least one tag in the PR title.
- Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-25T11:22:11Z

Thanks for your contribution!

codecov-commenter · 2025-11-25T14:04:57Z

Codecov Report

❌ Patch coverage is 93.47826% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5d9b5e4).

Files with missing lines	Patch %	Lines
fastdeploy/worker/gpu_model_runner.py	93.10%	1 Missing and 1 partial ⚠️
fastdeploy/worker/model_runner_base.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5223   +/-   ##
==========================================
  Coverage           ?   60.04%           
==========================================
  Files              ?      329           
  Lines              ?    40980           
  Branches           ?     6210           
==========================================
  Hits               ?    24606           
  Misses             ?    14517           
  Partials           ?     1857
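The report's figures are internally consistent, assuming Codecov's usual ratio of hits over all tracked lines (hits + misses + partials). The patch and per-file denominators (46, 29 and 2 changed lines) are not reported anywhere; they are inferred from the percentages and labeled as such in the code.

```python
# Project totals from the coverage diff above.
hits, misses, partials = 24606, 14517, 1857
lines = hits + misses + partials  # 40980 tracked lines


def pct(covered: int, total: int) -> float:
    # Partially covered lines count against coverage in this ratio.
    return round(100 * covered / total, 2)


project_coverage = pct(hits, lines)  # matches the 60.04% above

# Patch: 3 uncovered changed lines (2 missing + 1 partial).
# Denominators below are INFERRED from the percentages, not reported.
patch_coverage = pct(46 - 3, 46)      # 43/46 = 93.47826...%
gpu_model_runner = pct(29 - 2, 29)    # 27/29, shown as 93.10%
model_runner_base = pct(2 - 1, 2)     # 1/2, shown as 50.00%
```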

Flag	Coverage Δ
GPU	`60.04% <93.47%> (?)`

Flags with carried forward coverage won't be shown.

Copilot

Pull request overview

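This PR adds CINN compilation support for the ViT encoder of the PaddleOCR-VL model, improving inference performance through graph optimization. It implements a three-level graph optimization strategy: level 0 (pure dynamic graph + fused ops), level 1 (SOT dynamic-to-static conversion + native ops), and level 2 (CINN compilation + native ops). In the benchmarks, CINN mode is 12-37% faster than the dynamic graph without fused ops.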

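Extracts the loop part of SiglipEncoder into a _run_encoder_layer method so that it can be decorated with to_static and compiled by CINN
Adds warmup logic and a compilation flow for the vision encoder; currently only the PaddleOCR-VL model is supported
Extends the end-to-end test to cover both the dynamic graph and CINN modes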

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.


File	Description
tests/e2e/test_paddleocr_vl_serving.py	Parameterizes the test to cover both graph_opt_level 0 and 2
fastdeploy/worker/model_runner_base.py	Adds the vision_encoder_compile base-class method
fastdeploy/worker/gpu_worker.py	Integrates vision encoder compilation into the warm-up flow
fastdeploy/worker/gpu_model_runner.py	Implements CINN compilation and warmup logic for the vision encoder
fastdeploy/model_executor/models/paddleocr_vl/siglip_ops.py	Adds dynamic-shape handling and unified marking to support static-graph compilation
fastdeploy/model_executor/models/paddleocr_vl/siglip.py	Refactors the encoder, extracting _run_encoder_layer for CINN compilation
fastdeploy/engine/engine.py	Adds the SOT_ENABLE_COMPILE_TIME_LIMIT environment variable
fastdeploy/engine/async_llm.py	Adds the SOT_ENABLE_COMPILE_TIME_LIMIT environment variable
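A rough sketch of how the worker-side pieces in the table above could fit together: a no-op vision_encoder_compile hook on the base runner, an override that warms up the vision encoder, and a worker that calls the hook during warm-up. Only the name vision_encoder_compile comes from the PR. The class names, the constructor, the warmup sizes, and the decision to skip level 0 are assumptions made for illustration.

```python
from typing import Callable, Iterable, List


class ModelRunnerBaseSketch:
    def vision_encoder_compile(self) -> None:
        # Default hook: models without a compiled vision encoder do nothing.
        pass


class OCRVLRunnerSketch(ModelRunnerBaseSketch):
    """Hypothetical runner for a model whose vision encoder gets compiled."""

    def __init__(self, encoder: Callable[[int], int], graph_opt_level: int,
                 warmup_patch_counts: Iterable[int]):
        self.encoder = encoder
        self.graph_opt_level = graph_opt_level
        self.warmup_patch_counts = list(warmup_patch_counts)
        self.warmed_up: List[int] = []

    def vision_encoder_compile(self) -> None:
        if self.graph_opt_level == 0:
            return  # assumption: pure dynamic graph needs no warmup
        for n in self.warmup_patch_counts:
            # Run once per representative input size so that any compilation
            # cost is paid here rather than on the first real request.
            self.encoder(n)
            self.warmed_up.append(n)


def worker_warm_up(runner: ModelRunnerBaseSketch) -> None:
    # The existing LLM warm-up would run first; then the vision encoder hook.
    runner.vision_encoder_compile()
```

Keeping a no-op default on the base class lets the worker call the hook unconditionally, which matches the PR's note that only PaddleOCR-VL is handled for now.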


gongshaotian

LGTM

ming1753

LGTM


[CINN] Use CINN in PaddleOCR-VL ViT part

a5fb601

SigureMo added 6 commits December 3, 2025 20:41

use fused op in dynamic mode

12451de

refine code

10e4574

refine code

aab7180

add vision_encoder_compile to model_runner_base

6ece3a2

fix import

6fc098e

fix fused op run

b3301e7

SigureMo changed the title from "[CINN] Use CINN in PaddleOCR-VL ViT part" to "[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part" on Dec 4, 2025

SigureMo added 5 commits December 4, 2025 22:15

fix Dh

73cd5eb

Merge branch 'develop' into use-cinn-in-paddle-ocr-vl

662d5ce

Merge branch 'develop' into use-cinn-in-paddle-ocr-vl

a332b88

test graph opt level=2

9000438

clean commented to_static

e8f9bb7

SigureMo marked this pull request as ready for review December 8, 2025 04:25

Copilot AI review requested due to automatic review settings December 8, 2025 04:25

Copilot started reviewing on behalf of SigureMo December 8, 2025 04:25

Copilot AI reviewed Dec 8, 2025


SigureMo and others added 3 commits December 8, 2025 12:37

Apply suggestions from code review

22c3b73

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

add type hints

4dc578e

run get_activation_fn in each forward

5b1054b

gongshaotian previously approved these changes Dec 8, 2025


SigureMo added 3 commits December 9, 2025 11:04

revert changes in fastdeploy/engine/async_llm.py

4a31886

Merge branch 'develop' into use-cinn-in-paddle-ocr-vl

d03c834

resolve confilct

f22a757

SigureMo dismissed gongshaotian’s stale review via f22a757 December 9, 2025 03:06

ming1753 approved these changes Dec 9, 2025


zyfncg approved these changes Dec 9, 2025


SigureMo merged commit e1c4a12 into PaddlePaddle:develop Dec 9, 2025
15 of 17 checks passed

SigureMo deleted the use-cinn-in-paddle-ocr-vl branch December 9, 2025 06:37

SigureMo mentioned this pull request on Dec 26, 2025 in [BugFix] Correct condition for reversed_window_indices in SiglipEncoder #5795 (merged)

chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026

[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part (PaddleP…

32ec60f

…addle#5223) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026

[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part (PaddleP…

81157b0

…addle#5223) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

SigureMo merged 18 commits into PaddlePaddle:develop from cattidea:use-cinn-in-paddle-ocr-vl
