72 changes: 72 additions & 0 deletions .github/workflows/auto-rebase-prs.yml
@@ -0,0 +1,72 @@
name: Auto-Rebase PRs

on:
Comment on lines +1 to +3 (Copilot AI, Mar 9, 2026):
This workflow force-pushes branches and also comments/labels PRs via gh pr comment / gh pr edit, but it does not declare required permissions. Without explicit permissions, pushes and PR edits commonly fail with the default GITHUB_TOKEN. Add workflow/job permissions such as contents: write (push), pull-requests: write (comment), and issues: write (labels).
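The fix the comment describes would look something like the fragment below. This is a sketch only; the exact scopes are the reviewer's suggestion, not part of the diff, and should be trimmed to the repo's policy:

```yaml
# Hypothetical top-level permissions block for auto-rebase-prs.yml:
permissions:
  contents: write        # git push --force-with-lease of rebased branches
  pull-requests: write   # gh pr comment / gh pr edit
  issues: write          # adding the needs-rebase label
```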
  workflow_run:
    workflows: ["Sync Fork", "Sync Upstream"]
    types: [completed]
  workflow_dispatch:

jobs:
  rebase:
    if: ${{ github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Configure git
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"

      - name: Rebase open PRs
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          DEFAULT_BRANCH=$(gh api "repos/${{ github.repository }}" --jq '.default_branch')
          echo "Default branch: $DEFAULT_BRANCH"

          # Get all open PRs
          PRS=$(gh pr list --state open --json number,headRefName,mergeable --jq '.[] | "\(.number) \(.headRefName) \(.mergeable)"')

          if [ -z "$PRS" ]; then
            echo "No open PRs found"
            exit 0
          fi

          echo "$PRS" | while read -r pr_number branch mergeable; do
            echo ""
            echo "=== PR #${pr_number} (${branch}) mergeable=${mergeable} ==="

            # Fetch and check out the PR branch
            if ! git fetch origin "$branch" 2>/dev/null; then
              echo "  SKIP: branch $branch not found on origin"
              continue
            fi
            git checkout "$branch"
            git reset --hard "origin/$branch"

            # Check whether a rebase is needed
            git fetch origin "$DEFAULT_BRANCH"
            if git merge-base --is-ancestor "origin/$DEFAULT_BRANCH" HEAD; then
              echo "  OK: already up to date with $DEFAULT_BRANCH"
              continue
            fi

            # Attempt the rebase
            echo "  Rebasing onto origin/$DEFAULT_BRANCH..."
            if git rebase "origin/$DEFAULT_BRANCH" 2>/dev/null; then
              echo "  Pushing rebased branch..."
              git push --force-with-lease origin "$branch"
              echo "  REBASED: PR #${pr_number} successfully rebased"
              gh pr comment "$pr_number" --body "Auto-rebased onto \`${DEFAULT_BRANCH}\` after nightly upstream sync." 2>/dev/null || true
            else
              git rebase --abort 2>/dev/null || true
              echo "  CONFLICT: PR #${pr_number} has merge conflicts"
              # Label the PR for manual attention
              gh pr edit "$pr_number" --add-label "needs-rebase" 2>/dev/null || true
              gh pr comment "$pr_number" --body "Auto-rebase failed due to merge conflicts with \`${DEFAULT_BRANCH}\`. Manual rebase needed." 2>/dev/null || true
            fi
          done
24 changes: 24 additions & 0 deletions .github/workflows/sync-fork.yml
@@ -0,0 +1,24 @@
name: Sync Fork

on:
Comment on lines +1 to +3 (Copilot AI, Mar 9, 2026):
This workflow relies on GITHUB_TOKEN to sync and potentially update the default branch, but it does not declare permissions. On many repos/orgs the default token permissions are read-only, causing gh repo sync to fail. Add an explicit permissions: block (at least contents: write) to ensure the sync can push updates.
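A minimal sketch of the explicit grant the comment asks for (assumed scope, not part of the diff):

```yaml
# Hypothetical permissions block for sync-fork.yml:
permissions:
  contents: write   # let gh repo sync push the updated default branch
```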
  schedule:
    - cron: '0 6 * * *'  # 6am UTC daily (before dashboard collection at 8am)
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - name: Sync fork default branch with upstream
        run: |
          # Try fast-forward sync first; fall back to force sync if diverged.
          # Safe because feature work lives on branches, not the default branch.
          if gh repo sync "${{ github.repository }}" 2>/dev/null; then
            echo "Synced successfully (fast-forward)"
          else
            echo "Diverging commits detected; force syncing to match upstream"
            gh repo sync "${{ github.repository }}" --force
            echo "Force synced successfully"
          fi
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
1 change: 1 addition & 0 deletions aiter/__init__.py
@@ -88,6 +88,7 @@ def getLogger():
from .ops.sample import * # noqa: F403,E402
from .ops.fused_qk_norm_mrope_cache_quant import * # noqa: F403,E402
from .ops.fused_qk_norm_rope_cache_quant import * # noqa: F403,E402
from .rotary_embedding import fused_rope_rms # noqa: F401,E402
from .ops.groupnorm import * # noqa: F403,E402
from .ops.mhc import * # noqa: F403,E402
from .ops.causal_conv1d import * # noqa: F403,E402
90 changes: 75 additions & 15 deletions aiter/rotary_embedding.py
@@ -1293,21 +1293,20 @@ def forward(
             else:
                 return q_out, None, None
         else:
-            raise NotImplementedError("fused_rope_rms not supported yet")
-            # fused_rope_rms(
-            #     qkv,
-            #     q_weight,
-            #     k_weight,
-            #     self.cos_sin_cache,
-            #     positions,
-            #     num_tokens,
-            #     num_heads_q,
-            #     num_heads_k,
-            #     num_heads_v,
-            #     self.head_size,
-            #     self.is_neox_style,
-            #     eps,
-            # )
+            fused_rope_rms(
+                qkv,
+                q_weight,
+                k_weight,
+                self.cos_sin_cache,
+                positions,
+                num_tokens,
+                num_heads_q,
+                num_heads_k,
+                num_heads_v,
+                self.head_size,
+                self.is_neox_style,
+                eps,
+            )
             q_size = num_heads_q * self.head_size
             k_size = num_heads_k * self.head_size
             v_size = num_heads_v * self.head_size
@@ -1318,6 +1317,67 @@
        return q, k, v


Comment on def fused_rope_rms and lines +1334 to +1336 (Copilot AI, Mar 9, 2026):
This function is exported publicly (via aiter/__init__.py) but its API contract is currently ambiguous: it mutates qkv in-place and returns None, and callers may expect outputs similar to other embedding helpers. Please document the in-place behavior and expected tensor shapes/dtypes in the docstring (and/or consider returning (q, k) or (q, k, v) views explicitly) so external consumers don't misuse it.

def fused_rope_rms(
    qkv,
    q_weight,
    k_weight,
    cos_sin_cache,
    positions,
    num_tokens,
    num_heads_q,
    num_heads_k,
    num_heads_v,
    head_size,
    is_neox_style,
    eps,
):
    """Fused QK-RMSNorm + RoPE on packed QKV tensor (in-place).
    Triton fallback for the HIP fused kernel.
    """
    from aiter.ops.triton.normalization.rmsnorm import rmsnorm_forward_inference
    from aiter.ops.triton.rope.rope import (
        rope_cached_thd_positions_2c_fwd_inplace,
    )

    q_size = num_heads_q * head_size
    k_size = num_heads_k * head_size
    v_size = num_heads_v * head_size

    qkv_2d = qkv.view(num_tokens, q_size + k_size + v_size)
    q, k, _v = qkv_2d.split([q_size, k_size, v_size], dim=-1)

Comment on the qkv.view line (Copilot AI, Mar 9, 2026):
Tensor.view() will throw at runtime if qkv is non-contiguous (which can happen after slicing/transpose or some fused ops). Since this is a fallback path meant to be robust, prefer reshape(...) here (or make qkv contiguous before viewing) to avoid hard failures on valid inputs.

    # Per-head RMSNorm: [T, H*D] -> [T*H, D] so rmsnorm operates per-head
    q_normed = rmsnorm_forward_inference(
        q.reshape(num_tokens * num_heads_q, head_size), q_weight, eps
    )
    q.copy_(q_normed.view(num_tokens, q_size))

    k_normed = rmsnorm_forward_inference(
        k.reshape(num_tokens * num_heads_k, head_size), k_weight, eps
    )
    k.copy_(k_normed.view(num_tokens, k_size))

    # RoPE in-place
    q_rope = q.view(num_tokens, num_heads_q, head_size)
    k_rope = k.view(num_tokens, num_heads_k, head_size)

    half = cos_sin_cache.shape[-1] // 2
    cos = cos_sin_cache[:, :half]
    sin = cos_sin_cache[:, half:]
    rotate_style = 0 if is_neox_style else 1

    rope_cached_thd_positions_2c_fwd_inplace(
        q_rope,
        k_rope,
        cos,
        sin,
        positions,
        rotate_style,
        reuse_freqs_front_part=True,
        nope_first=False,
    )
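For intuition, the two primitives this fallback composes can be sketched in plain NumPy. This is a hand-rolled sketch of the math, not the Triton kernels; the shapes, the unit weight, and the 10000 frequency base are illustrative assumptions:

```python
import numpy as np

def rmsnorm(x, w, eps):
    # RMSNorm over the last dim: x / sqrt(mean(x^2) + eps) * w
    return x / np.sqrt((x * x).mean(-1, keepdims=True) + eps) * w

def neox_rope(x, cos, sin):
    # NeoX-style rotation: the head dim splits into two halves that
    # rotate together, mirroring rotate_style = 0 above.
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)

T, H, D = 4, 2, 8                       # tokens, heads, head_size (illustrative)
rng = np.random.default_rng(0)
q = rng.standard_normal((T, H * D))     # the packed Q slice
w = np.ones(D)                          # unit q_weight for the demo
positions = np.arange(T)

# Per-head RMSNorm: [T, H*D] -> [T*H, D] so statistics are per head,
# not per packed token row
qn = rmsnorm(q.reshape(T * H, D), w, 1e-6).reshape(T, H, D)

# cos/sin indexed by position, broadcast over heads ([T, 1, D/2])
inv_freq = 1.0 / 10000 ** (np.arange(D // 2) / (D // 2))
angles = positions[:, None] * inv_freq[None, :]
cos = np.cos(angles)[:, None, :]
sin = np.sin(angles)[:, None, :]

q_out = neox_rope(qn, cos, sin)
```

The reshape to [T*H, D] is the same trick the fallback uses before calling rmsnorm_forward_inference; the rotation is length-preserving per head, which makes it easy to sanity-check.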


class MRotaryEmbeddingQKNormFused(RotaryEmbeddingFusedQKNorm):
    """Rotary Embedding with Multimodal Sections fused with QKNorm"""
