
Copilot/add patch for ascend in lmdeploy#291

Merged
jinminxi104 merged 11 commits into DeepLink-org:main from jinminxi104:copilot/add-patch-for-ascend-in-lmdeploy
Dec 29, 2025

Conversation

@jinminxi104 (Collaborator)

Workaround for the incomplete attention_with_kv_cache op.

Copilot AI and others added 7 commits December 27, 2025 04:46 (Co-authored-by: jinminxi104 <18713681+jinminxi104@users.noreply.github.com>)

  • Added a comment regarding a workaround for a specific issue.
  • Added prefill scheduling logic with kv-cache optimization, enforcing batch-size and token-count limits.

Copilot AI left a comment


Pull request overview

This PR adds a workaround patch for Ascend devices to address an incomplete prefill_attention_with_kvcache operation in lmdeploy. The patch monkey-patches the Scheduler._schedule_prefill method to add custom logic for handling prefill sequences with KV-cache optimization.

Key Changes

  • Introduces _schedule_prefill_ascend function that replaces the default scheduler's prefill method
  • Adds logic to control batching based on whether sequences have new tokens for prefill operations
  • Implements early-break conditions to handle prefill-with-kvcache sequences differently
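The monkey-patching pattern described above can be sketched as follows. This is a minimal illustration, not lmdeploy's actual implementation: the `Sequence` and `Scheduler` classes below are stand-ins, and the field names (`num_new_tokens`, `max_batches`, `max_tokens`) are assumptions chosen for the example, not lmdeploy's real attribute names.

```python
# Hypothetical sketch of replacing a scheduler's prefill method, in the
# spirit of this PR's _schedule_prefill_ascend workaround. All classes
# and attribute names here are stand-ins, not lmdeploy's real API.
from dataclasses import dataclass


@dataclass
class Sequence:
    seq_id: int
    num_new_tokens: int  # tokens still to be prefilled for this sequence


class Scheduler:
    def __init__(self, max_batches: int = 4, max_tokens: int = 32):
        self.max_batches = max_batches  # batch-size limit
        self.max_tokens = max_tokens    # token-count limit per step
        self.waiting: list[Sequence] = []

    def _schedule_prefill(self):
        # Default behavior: schedule everything that is waiting.
        batch, self.waiting = self.waiting, []
        return batch


def _schedule_prefill_ascend(self):
    """Replacement prefill scheduler: batches sequences while
    respecting batch-size and token-count limits, and breaks early
    when it hits a prefill-with-kvcache sequence (no new tokens),
    so that case is handled in its own scheduling step."""
    batch, tokens = [], 0
    while self.waiting:
        seq = self.waiting[0]
        if len(batch) >= self.max_batches:
            break  # batch-size limit reached
        if tokens + seq.num_new_tokens > self.max_tokens:
            break  # token-count limit would be exceeded
        if seq.num_new_tokens == 0 and batch:
            break  # prefill-with-kvcache sequence: defer to its own step
        batch.append(self.waiting.pop(0))
        tokens += seq.num_new_tokens
    return batch


# Monkey-patch: replace the default method on the class, analogous to
# how the PR swaps in _schedule_prefill_ascend for Ascend devices.
Scheduler._schedule_prefill = _schedule_prefill_ascend
```

Because the function is assigned to the class (not an instance), Python binds it as a regular method, so every existing and future `Scheduler` instance picks up the Ascend-specific behavior without any change to call sites.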


@jinminxi104 jinminxi104 merged commit 059e42c into DeepLink-org:main Dec 29, 2025
4 checks passed
jinminxi104 added a commit to jinminxi104/dlinfer that referenced this pull request Apr 2, 2026


4 participants