[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part #5223

Merged
SigureMo merged 18 commits into PaddlePaddle:develop from cattidea:use-cinn-in-paddle-ocr-vl on Dec 9, 2025

Conversation

Member

@SigureMo SigureMo commented Nov 25, 2025

Motivation

Use CINN to accelerate the ViT part of PaddleOCR-VL by decorating its _run_encoder_layer method with to_static.

Modifications

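  • Split the ViT encoder into two parts and decorate the loop part with to_static.
  • Add warmup logic for the ViT encoder. It currently targets only the PaddleOCR-VL model; other models will be considered later, at which point the logic will likely be abstracted.
  • tests/e2e/test_paddleocr_vl_serving.py now tests both dynamic graph and CINN.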
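
The split described in the first bullet can be sketched in plain Python so that it runs without Paddle. This is only an illustration under stated assumptions: the class TinyEncoder, its compile_fn argument, the _prepare step, and the toy layers are hypothetical and not taken from the PR. In the real code the wrapper would be Paddle's to_static rather than the stand-in used here.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def identity_compile(fn):
    """Hypothetical stand-in for a graph compiler: returns fn unchanged."""
    return fn


class TinyEncoder:
    """Hypothetical encoder split into an eager prologue and a compilable loop."""

    def __init__(self, layers: Sequence[Layer], compile_fn=identity_compile):
        self.layers = list(layers)
        # Only the layer loop is handed to the compiler; the prologue stays eager.
        self._run_encoder_layer = compile_fn(self._run_layers_eager)

    def _prepare(self, pixels: List[float]) -> List[float]:
        # Part 1 (assumed): preprocessing that stays in dynamic-graph mode.
        return [p / 255.0 for p in pixels]

    def _run_layers_eager(self, hidden: List[float]) -> List[float]:
        # Part 2: the uniform loop over layers, the candidate for compilation.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden

    def forward(self, pixels: List[float]) -> List[float]:
        return self._run_encoder_layer(self._prepare(pixels))


calls = []


def counting_compile(fn):
    """Wrapper that records each call routed through the 'compiled' path."""
    def wrapped(hidden):
        calls.append("compiled")
        return fn(hidden)
    return wrapped


encoder = TinyEncoder(
    [lambda h: [v * 2 for v in h], lambda h: [v + 1 for v in h]],
    compile_fn=counting_compile,
)
out = encoder.forward([0.0, 255.0])
```

Keeping the loop in its own method means the compiler sees one small, regular function, while anything awkward to trace can remain outside it.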

Usage or Command

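Compilation is currently enabled the same way as for the LLM part:

  • graph_opt_level=0: pure dynamic graph, using fused ops
  • graph_opt_level=1: SOT dynamic-to-static conversion, using native ops; performance drops somewhat
  • graph_opt_level=2: CINN, using native ops + CINN fusion; performance matches fused ops and slightly exceeds it
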
python -m fastdeploy.entrypoints.openai.api_server \
        --model /root/paddlejob/tmpspace/MODELS/PaddlePaddle/PaddleOCR-VL \
        --port 8295 \
        --metrics-port 8296 \
        --engine-worker-queue-port 8297 \
        --cache-queue-port 55660 \
        --max-model-len 16384 \
        --max-num-batched-tokens 16384 \
        --gpu-memory-utilization 0.7 \
        --max-num-seqs 256 \
        --workers 2 \
        --graph-optimization-config '{"graph_opt_level":2, "use_cudagraph":true}'
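
The value passed to --graph-optimization-config is a JSON object, so it can be generated rather than hand-quoted. The sketch below is a hypothetical helper, not part of FastDeploy: graph_opt_flag and GRAPH_OPT_LEVELS are names invented here, and only the two keys shown in the command above are assumed to exist.

```python
import json
import shlex

# Level descriptions as given in the PR text; the dict itself is illustrative.
GRAPH_OPT_LEVELS = {
    0: "pure dynamic graph, fused ops",
    1: "SOT dynamic-to-static, native ops",
    2: "CINN, native ops + CINN fusion",
}


def graph_opt_flag(level: int, use_cudagraph: bool = True) -> str:
    """Build a shell-safe --graph-optimization-config argument (hypothetical helper)."""
    if level not in GRAPH_OPT_LEVELS:
        raise ValueError(f"unknown graph_opt_level: {level}")
    config = {"graph_opt_level": level, "use_cudagraph": use_cudagraph}
    # shlex.quote wraps the JSON in single quotes so the shell passes it as one argument.
    return "--graph-optimization-config " + shlex.quote(json.dumps(config))


flag = graph_opt_flag(2)
print(flag)
```

Generating the argument this way avoids quoting mistakes when switching between levels 0, 1, and 2 in scripts.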
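
| | Dynamic graph, no fused ops (tokens/s) | Dynamic graph, hand-written fused ops (tokens/s) | CINN (tokens/s) |
| --- | --- | --- | --- |
| HZZ1 | 15394.50 | 16944.64 (+10%) | 17211.10 (+12%) |
| A10 | 4743.69 | 6359.45 (+34%) | 6509.28 (+37%) |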
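
The percentages in the benchmark figures can be rechecked from the raw throughput numbers. The short script below recomputes each gain over the dynamic graph without fused ops, and also the gain of CINN over the hand-written fused ops; the dictionary keys are labels chosen here for illustration.

```python
# Throughput in tokens/s, copied from the benchmark figures in this PR.
throughput = {
    "HZZ1": {"no_fused": 15394.50, "manual_fused": 16944.64, "cinn": 17211.10},
    "A10": {"no_fused": 4743.69, "manual_fused": 6359.45, "cinn": 6509.28},
}


def relative_gain(baseline: float, value: float) -> float:
    """Percentage improvement of value over baseline."""
    return (value / baseline - 1.0) * 100.0


# Gain of each optimized mode over the dynamic graph without fused ops.
gains_vs_no_fused = {
    device: {
        mode: round(relative_gain(row["no_fused"], row[mode]))
        for mode in ("manual_fused", "cinn")
    }
    for device, row in throughput.items()
}

# Gain of CINN over the hand-written fused ops.
cinn_vs_fused = {
    device: round(relative_gain(row["manual_fused"], row["cinn"]), 1)
    for device, row in throughput.items()
}

print(gains_vs_no_fused)
print(cinn_vs_fused)
```

The second dictionary quantifies the "matches fused ops and slightly exceeds it" claim: CINN is ahead of the hand-written fused ops by roughly 1.6% on HZZ1 and 2.4% on A10.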

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented Nov 25, 2025

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 93.47826% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@5d9b5e4). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/worker/gpu_model_runner.py | 93.10% | 1 Missing and 1 partial ⚠️ |
| fastdeploy/worker/model_runner_base.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5223   +/-   ##
==========================================
  Coverage           ?   60.04%           
==========================================
  Files              ?      329           
  Lines              ?    40980           
  Branches           ?     6210           
==========================================
  Hits               ?    24606           
  Misses             ?    14517           
  Partials           ?     1857           
Flag Coverage Δ
GPU 60.04% <93.47%> (?)
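
The numbers in this report are internally consistent, which can be checked with a little arithmetic. The sketch assumes that a coverage percentage is covered lines divided by total lines and that partially covered lines count as uncovered in the patch figure; patch_lines is a helper name invented for this check.

```python
# Project totals from the coverage diff above.
hits, misses, partials, lines = 24606, 14517, 1857, 40980

# Hits, misses, and partials should account for every line.
totals_match = hits + misses + partials == lines
project_coverage = round(100.0 * hits / lines, 2)


def patch_lines(percent_covered: float, uncovered: int) -> int:
    """Infer the total number of patch lines from a coverage percentage
    and the count of uncovered (missing or partial) lines."""
    return round(uncovered / (1.0 - percent_covered / 100.0))


total_patch = patch_lines(93.47826, 3)            # whole patch
runner_patch = patch_lines(93.10, 2)              # gpu_model_runner.py
base_patch = patch_lines(50.00, 1)                # model_runner_base.py
patch_coverage = round(100.0 * (total_patch - 3) / total_patch, 5)

print(totals_match, project_coverage, total_patch, patch_coverage)
```

Under these assumptions the patch touches 46 measured lines, of which 43 are covered, and the two listed files account for 29 and 2 of those lines.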

@SigureMo SigureMo changed the title from "[CINN] Use CINN in PaddleOCR-VL ViT part" to "[Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part" on Dec 4, 2025
@SigureMo SigureMo marked this pull request as ready for review December 8, 2025 04:25
Copilot AI review requested due to automatic review settings December 8, 2025 04:25
Contributor

Copilot AI left a comment

Pull request overview

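This PR adds CINN compilation support for the ViT encoder of the PaddleOCR-VL model, improving inference performance through graph optimization. It implements a three-level graph optimization strategy: level 0 (pure dynamic graph + fused ops), level 1 (SOT dynamic-to-static + native ops), and level 2 (CINN compilation + native ops). In the performance tests, CINN mode is 12-37% faster than the dynamic graph without fused ops.

  • Extract the loop part of SiglipEncoder into a _run_encoder_layer method that supports to_static decoration and CINN compilation
  • Add warmup logic and a compilation flow for the vision encoder; only the PaddleOCR-VL model is supported for now
  • Extend the end-to-end tests to cover both dynamic graph and CINN modes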
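
Why a warmup step helps a compiled encoder can be shown with a toy model. The sketch below is not the PR's implementation: it assumes compilation is triggered the first time a given input shape is seen, which may not match how SOT and CINN handle dynamic shapes, and ShapeCachingCompiler and warmup are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Shape = Tuple[int, ...]


class ShapeCachingCompiler:
    """Toy JIT (hypothetical): 'compiles' once per input shape and caches the result."""

    def __init__(self, fn: Callable[[List[float]], List[float]]):
        self.fn = fn
        self.cache: Dict[Shape, Callable[[List[float]], List[float]]] = {}
        self.compile_count = 0

    def __call__(self, shape: Shape, data: List[float]) -> List[float]:
        if shape not in self.cache:
            # Simulated expensive compilation step, paid once per new shape.
            self.compile_count += 1
            self.cache[shape] = self.fn
        return self.cache[shape](data)


def warmup(compiled: ShapeCachingCompiler, shapes: Iterable[Shape]) -> None:
    """Run dummy inputs through the compiled function before serving requests."""
    for shape in shapes:
        size = 1
        for dim in shape:
            size *= dim
        compiled(shape, [0.0] * size)


encoder = ShapeCachingCompiler(lambda xs: [x * 2.0 for x in xs])
warmup(encoder, [(1, 4), (2, 4)])
compiles_after_warmup = encoder.compile_count

# A real request with an already-seen shape does not trigger compilation.
result = encoder((1, 4), [1.0, 2.0, 3.0, 4.0])
```

In this toy model the compilation cost is paid during startup instead of on the first user request.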

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| tests/e2e/test_paddleocr_vl_serving.py | Parametrize the test to cover graph_opt_level 0 and 2 |
| fastdeploy/worker/model_runner_base.py | Add the vision_encoder_compile base-class method |
| fastdeploy/worker/gpu_worker.py | Integrate vision encoder compilation into the warm-up flow |
| fastdeploy/worker/gpu_model_runner.py | Implement CINN compilation and warmup logic for the vision encoder |
| fastdeploy/model_executor/models/paddleocr_vl/siglip_ops.py | Add dynamic-shape handling and unified markers to support static graph compilation |
| fastdeploy/model_executor/models/paddleocr_vl/siglip.py | Refactor the encoder, extracting _run_encoder_layer for CINN compilation |
| fastdeploy/engine/engine.py | Add the SOT_ENABLE_COMPILE_TIME_LIMIT environment variable |
| fastdeploy/engine/async_llm.py | Add the SOT_ENABLE_COMPILE_TIME_LIMIT environment variable |

Comment thread fastdeploy/worker/model_runner_base.py Outdated
Comment thread tests/e2e/test_paddleocr_vl_serving.py Outdated
Comment thread fastdeploy/model_executor/models/paddleocr_vl/siglip_ops.py
Comment thread tests/e2e/test_paddleocr_vl_serving.py
Comment thread fastdeploy/worker/gpu_model_runner.py
Comment thread fastdeploy/model_executor/models/paddleocr_vl/siglip.py Outdated
Comment thread fastdeploy/model_executor/models/paddleocr_vl/siglip.py
Comment thread fastdeploy/worker/gpu_model_runner.py
Comment thread fastdeploy/worker/gpu_model_runner.py
Comment thread fastdeploy/worker/model_runner_base.py Outdated
SigureMo and others added 3 commits December 8, 2025 12:37
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
gongshaotian previously approved these changes Dec 8, 2025
Collaborator

@gongshaotian gongshaotian left a comment

LGTM

Collaborator

@ming1753 ming1753 left a comment

LGTM

@SigureMo SigureMo merged commit e1c4a12 into PaddlePaddle:develop Dec 9, 2025
15 of 17 checks passed
@SigureMo SigureMo deleted the use-cinn-in-paddle-ocr-vl branch December 9, 2025 06:37
chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026
…addle#5223)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request May 7, 2026
…addle#5223)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>