
[Ascend] support qwen3next #304

Draft
wanfengcxz wants to merge 18 commits into DeepLink-org:main from wanfengcxz:wq/qwen3next

Conversation

@wanfengcxz
Collaborator

No description provided.

@CLAassistant

CLAassistant commented Feb 26, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ tangzhiyi11
✅ yao-fengchen
❌ wanfengcxz
You have signed the CLA already but the status is still pending? Let us recheck it.

@@ -0,0 +1,1147 @@
# Copyright (c) OpenMMLab. All rights reserved.
Collaborator

add comments to explain the rationale behind this patch

wanfengcxz and others added 10 commits March 21, 2026 18:16
- Remove unused FLA kernels (chunk_delta_h, chunk_o, solve_tril, etc.)
- Use triton_ascend_kernels for core attention ops:
  - chunk_gated_delta_rule_fwd (prefill)
  - fused_recurrent_gated_delta_rule (decode)
- Simplify fla/chunk.py to a thin wrapper
- Add README.md documenting the triton ops structure
- Add Chinese installation guide for triton-ascend-kernels
- Move triton_utils.py from fla/ to triton_ops/

This reduces maintenance burden by relying on official
triton_ascend_kernels for heavy attention computations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
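The commit list above describes simplifying fla/chunk.py to a thin wrapper that routes to two triton_ascend_kernels ops: chunk_gated_delta_rule_fwd for prefill and fused_recurrent_gated_delta_rule for decode. A minimal sketch of that dispatch pattern is below; the kernel bodies are plain-Python placeholders (the real ops live in triton_ascend_kernels), and the `is_prefill` flag is a hypothetical interface choice, not necessarily how the actual wrapper decides.

```python
# Hedged sketch of the "thin wrapper" dispatch described in the commits.
# The stub bodies stand in for the real Triton kernels so the routing
# logic is runnable on its own.

def chunk_gated_delta_rule_fwd(q, k, v):
    # placeholder for the chunked prefill kernel from triton_ascend_kernels
    return ("prefill", len(q))

def fused_recurrent_gated_delta_rule(q, k, v):
    # placeholder for the fused recurrent decode kernel
    return ("decode", len(q))

def gated_delta_rule(q, k, v, *, is_prefill):
    """Route a gated delta-rule call to the prefill or decode kernel.

    `is_prefill` is an assumed flag; the actual wrapper in fla/chunk.py
    may instead key off sequence length or engine scheduling state.
    """
    if is_prefill:
        return chunk_gated_delta_rule_fwd(q, k, v)
    return fused_recurrent_gated_delta_rule(q, k, v)
```

Keeping only this dispatch layer in-tree, and delegating the heavy attention math to the official kernels, is what the commit message cites as the maintenance-burden reduction.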

5 participants