
[Ascend] support qwen3next #304

Draft
wanfengcxz wants to merge 18 commits into DeepLink-org:main from wanfengcxz:wq/qwen3next

Conversation

@wanfengcxz
Collaborator

No description provided.

@CLAassistant

CLAassistant commented Feb 26, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ tangzhiyi11
✅ yao-fengchen
❌ wanfengcxz
You have signed the CLA already but the status is still pending? Let us recheck it.

@@ -0,0 +1,1147 @@
# Copyright (c) OpenMMLab. All rights reserved.
Collaborator

add comments to explain the rationale behind this patch

wanfengcxz and others added 10 commits March 21, 2026 18:16
- Remove unused FLA kernels (chunk_delta_h, chunk_o, solve_tril, etc.)
- Use triton_ascend_kernels for core attention ops:
  - chunk_gated_delta_rule_fwd (prefill)
  - fused_recurrent_gated_delta_rule (decode)
- Simplify fla/chunk.py to a thin wrapper
- Add README.md documenting the triton ops structure
- Add Chinese installation guide for triton-ascend-kernels
- Move triton_utils.py from fla/ to triton_ops/

This reduces maintenance burden by relying on official
triton_ascend_kernels for heavy attention computations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
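The commit list above describes simplifying fla/chunk.py to a thin wrapper that routes to two triton_ascend_kernels ops: chunk_gated_delta_rule_fwd for prefill and fused_recurrent_gated_delta_rule for decode. A minimal sketch of that dispatch pattern is below; the kernel bodies are plain-Python placeholders (the real ops live in triton_ascend_kernels), and the `is_prefill` flag is a hypothetical interface choice, not necessarily how the actual wrapper decides.

```python
# Hedged sketch of the "thin wrapper" dispatch described in the commits.
# The stub bodies stand in for the real Triton kernels so the routing
# logic is runnable on its own.

def chunk_gated_delta_rule_fwd(q, k, v):
    # placeholder for the chunked prefill kernel from triton_ascend_kernels
    return ("prefill", len(q))

def fused_recurrent_gated_delta_rule(q, k, v):
    # placeholder for the fused recurrent decode kernel
    return ("decode", len(q))

def gated_delta_rule(q, k, v, *, is_prefill):
    """Route a gated delta-rule call to the prefill or decode kernel.

    `is_prefill` is an assumed flag; the actual wrapper in fla/chunk.py
    may instead key off sequence length or engine scheduling state.
    """
    if is_prefill:
        return chunk_gated_delta_rule_fwd(q, k, v)
    return fused_recurrent_gated_delta_rule(q, k, v)
```

Keeping only this dispatch layer in-tree, and delegating the heavy attention math to the official kernels, is what the commit message cites as the maintenance-burden reduction.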

5 participants