
Test/benchmarking #59

Merged
acgetchell merged 6 commits into main from test/benchmarking
Sep 2, 2025

Conversation

@acgetchell (Owner)

This pull request updates the performance benchmarking workflow to improve how baseline artifacts are found and handled, and clarifies the messaging and artifact retention policy. The workflow now more reliably locates baseline artifacts, provides clearer instructions when no baseline is found, and aligns artifact retention with repository limits.

Improvements to baseline artifact discovery and handling:

  • The baseline search logic in .github/workflows/benchmarks.yml was refactored to first look for release-tagged baseline artifacts, and if none are found, to search for any recent baseline artifacts from manual or other workflow runs. The search is now more robust and logs the source type of the found artifact (release or manual).
  • The fallback step was changed: if no baseline is found, the workflow now sets status variables and logs a message, rather than attempting to generate a new baseline on the fly.

Messaging and instructions:

  • The messaging when no baseline is found has been improved to clarify the steps required to enable performance regression testing, including creating a release tag or manually triggering the baseline workflow.
  • The regression test step was simplified to always use the full comparison mode, removing the special case for generated baselines.

Artifact retention policy:

  • The baseline artifact retention period in .github/workflows/generate-baseline.yml was reduced from 365 days to 90 days to match the repository maximum.

Refactors the benchmark workflow to rely solely on release baselines for performance comparison.

This removes the generation of fallback baselines during pull request workflows, streamlining the process and ensuring comparisons are made against accurate, full-setting benchmarks.

Updates the reporting to provide clearer guidance on how to enable performance regression testing via release tags.

Reduces baseline artifact retention from 1 year (365 days) to 90 days to match the repository maximum.

Modifies the workflow to improve the way performance baseline artifacts are located.

The workflow now first attempts to find baseline artifacts associated with release tags by searching recent successful workflow runs of 'generate-baseline.yml'.

If no release-tagged baseline is found, it falls back to searching for any recent baseline artifact generated by the same workflow, regardless of its origin (e.g., manual trigger).

This ensures that a baseline is found even if a tagged release is not available, or the baseline was manually generated.
It also changes the failure message if no baseline is found, and provides improved instructions on how to generate a baseline.
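The release-first search order described above can be sketched as a pure selection function. This is an illustrative assumption, not the workflow's actual code: names like `selectBaseline` and the `performance-baseline-` prefix are hypothetical, and the real step runs inside actions/github-script against the GitHub API.

```javascript
// Hypothetical sketch of the selection order: release match first,
// then any recent baseline, then an explicit "none found" result.
function sanitizeTag(tag) {
  // Allow-list sanitization, mirroring the kind of cleanup a baseline
  // generator might apply to release tags used in artifact names.
  return tag.replace(/[^a-zA-Z0-9._-]/g, '_');
}

function selectBaseline(artifacts, releaseTag) {
  // artifacts: [{ name, run_id, expired }], assumed newest first.
  const live = artifacts.filter((a) => a.expired !== true);

  // 1. Prefer the artifact matching the sanitized release tag.
  const expected = `performance-baseline-${sanitizeTag(releaseTag)}`;
  const release = live.find((a) => a.name === expected);
  if (release) {
    return { found: true, source_type: 'release', artifact_name: release.name, run_id: release.run_id };
  }

  // 2. Fall back to the most recent baseline of any origin (e.g. manual trigger).
  const fallback = live.find((a) => a.name.startsWith('performance-baseline-'));
  if (fallback) {
    return { found: true, source_type: 'manual', artifact_name: fallback.name, run_id: fallback.run_id };
  }

  // 3. Nothing usable: regression testing is unavailable for this run.
  return { found: false, source_type: 'none' };
}
```

The key property is the ordering: an exact release match always wins, and the manual fallback only applies when no release-tagged artifact survives the expiry filter.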
Copilot AI review requested due to automatic review settings September 2, 2025 16:32

coderabbitai bot commented Sep 2, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Overhauled baseline discovery to build a paginated artifact cache (searching up to 150 runs), prefer a release-matched artifact, fall back to the most recent manual artifact, expose `source_type`, remove automatic baseline generation, and shorten uploaded baseline retention to ~90 days; the run phase always uses full comparison mode.

Changes

Cohort / File(s) Change Summary
Benchmark workflow overhaul
.github/workflows/benchmarks.yml
Replaced per-release iterative search with a paginated artifact cache (fetch up to 150 successful generate-baseline runs). Derives expected artifact name from a sanitized release tag; if exact release artifact found emits outputs {found, release_tag, artifact_name, run_id, source_type="release"}. If none, selects the most-recent manual artifact and emits {found, release_tag="manual-baseline", artifact_name, run_id, source_type="manual"}. Removed per-release nested try/catch, removed automatic fallback baseline generation (replaced with setting BASELINE_EXISTS=false, BASELINE_SOURCE=none), clarified step names and logs, and standardized run phase to full comparison mode. Added source_type step output for find_baseline.
Baseline generator retention tweak
.github/workflows/generate-baseline.yml
Shortened uploaded baseline artifact retention from 365 days to 90 days and updated the comment; no other workflow logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Developer
  participant BW as Benchmarks Workflow
  participant GH as GitHub API
  participant AC as Artifact Cache
  participant RT as Regression Test

  Dev->>BW: Trigger benchmarks
  BW->>GH: List successful runs of generate-baseline (paginate, up to 150)
  GH-->>BW: Runs + artifacts
  BW->>AC: Build artifactCache of `performance-baseline-*` artifacts

  note right of BW: Derive expectedName from sanitized release tag

  alt Exact release baseline found
    BW->>BW: emit outputs {found=true, source_type="release", release_tag, artifact_name, run_id}
  else Manual fallback found
    BW->>BW: emit outputs {found=true, source_type="manual", release_tag="manual-baseline", artifact_name, run_id}
  else None found
    BW->>BW: set BASELINE_EXISTS=false, BASELINE_SOURCE=none
  end

  opt If baseline found
    BW->>RT: Run full comparison against discovered baseline
    RT-->>BW: Results
  end

  BW-->>Dev: Summary with Baseline Origin and outcome

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

I hop through artifact lanes at night,
sniffing tags till one feels right.
Release first — then manual cheer,
else I set the baseline clear.
Ninety days to keep the prize — then off I bound to test and rise 🐇✨




Copilot AI left a comment


Pull Request Overview

Updates the performance benchmarking workflow to improve baseline artifact discovery and handling, clarify messaging when baselines are missing, and align artifact retention with repository limits.

  • Refactored baseline search to first look for release-tagged artifacts, then fall back to any recent baseline artifacts
  • Replaced on-the-fly baseline generation with clearer messaging about enabling performance regression testing
  • Reduced artifact retention from 365 to 90 days to match repository maximum

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
.github/workflows/generate-baseline.yml Updated artifact retention period from 365 to 90 days
.github/workflows/benchmarks.yml Refactored baseline discovery logic, improved fallback handling, and simplified regression testing workflow


Comment on lines +125 to +126
core.setOutput('source_type', 'release');
return;

Copilot AI Sep 2, 2025


The return statement exits the entire script block abruptly. Consider using a proper control flow mechanism or setting a flag to break out of nested loops more cleanly, as this makes the code harder to follow and maintain.
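One way to address this comment (a hypothetical sketch, not the workflow's actual code) is a labeled break, which exits both loops at once while keeping any shared post-processing after the loops on a single exit path:

```javascript
// Labeled break as an alternative to an early `return` inside nested loops.
// `findArtifact` and the run/artifact shapes are illustrative assumptions.
function findArtifact(runs, expectedName) {
  let hit = null;
  search:
  for (const run of runs) {
    for (const artifact of run.artifacts) {
      if (artifact.name === expectedName) {
        hit = { run_id: run.id, name: artifact.name };
        break search; // exits BOTH loops, unlike a bare `break`
      }
    }
  }
  // Shared post-processing (logging, core.setOutput calls) can live here,
  // running exactly once whether or not a match was found.
  return hit;
}
```

Whether this reads better than an early `return` is a style call; the label makes the multi-level exit explicit without restructuring the loops.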

Comment on lines +167 to +168
core.setOutput('source_type', 'manual');
return;

Copilot AI Sep 2, 2025


Same issue as above - the return statement exits the script abruptly from within nested loops. This pattern makes the control flow difficult to understand and maintain.


codacy-production bot commented Sep 2, 2025

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation: +0.36% (target: -1.00%)

Coverage variation details:
Commit                        Coverable lines  Covered lines  Coverage
Common ancestor (ea87521)     2488             1734           69.69%
Head (13b1b51)                2488 (+0)        1743 (+9)      70.06% (+0.36%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:
Pull request (#59): 0 coverable lines, 0 covered, ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%



codecov bot commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.11%. Comparing base (ea87521) to head (13b1b51).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
+ Coverage   69.75%   70.11%   +0.36%     
==========================================
  Files          17       17              
  Lines        2486     2486              
==========================================
+ Hits         1734     1743       +9     
+ Misses        752      743       -9     
Flag Coverage Δ
unittests 70.11% <ø> (+0.36%) ⬆️

Flags with carried forward coverage won't be shown.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
.github/workflows/generate-baseline.yml (1)

144-146: Avoid asserting “repository maximum” unless enforced; consider making retention configurable.

90 days is fine, but “repository maximum” may be plan/settings-dependent. Recommend either softening the comment or wiring retention via an input/env so it can be adjusted without code changes.

Apply:

-          retention-days: 90   # Keep baselines for 90 days (repository maximum)
+          retention-days: 90   # Keep baselines ~90 days (align with repo settings; adjust if needed)
.github/workflows/benchmarks.yml (5)

93-96: Keep tag sanitization in lockstep with generate-baseline.

Regex mirrors sed in generate-baseline; add a short comment referencing that file to prevent drift when one changes.

Apply:

-                const cleanTag = release.tag_name.replace(/[^a-zA-Z0-9._-]/g, '_');
+                // Must match sanitize step in .github/workflows/generate-baseline.yml
+                const cleanTag = release.tag_name.replace(/[^a-zA-Z0-9._-]/g, '_');

190-211: Propagate origin (“release”/“manual”) to the env for later logging.

Currently BASELINE_SOURCE is always set to artifact, losing the new source_type. Export an additional BASELINE_ORIGIN.

Apply:

       - name: Extract baseline from artifact
         if: steps.find_baseline.outputs.found == 'true'
         shell: bash
         env:
           RELEASE_TAG: ${{ steps.find_baseline.outputs.release_tag }}
+          SOURCE_TYPE: ${{ steps.find_baseline.outputs.source_type }}
         run: |
           set -euo pipefail
           if [[ -f "baseline-artifact/baseline_results.txt" ]]; then
             echo "📦 Extracted baseline from artifact for release ${RELEASE_TAG:-unknown}"
             mkdir -p "benches"
             cp "baseline-artifact/baseline_results.txt" "benches/baseline_results.txt"
             echo "BASELINE_EXISTS=true" >> "$GITHUB_ENV"
             echo "BASELINE_SOURCE=artifact" >> "$GITHUB_ENV"
+            echo "BASELINE_ORIGIN=${SOURCE_TYPE:-unknown}" >> "$GITHUB_ENV"

213-221: Also set BASELINE_ORIGIN when none found.

Keeps summary consistent.

Apply:

       - name: Set baseline status if none found
         if: steps.find_baseline.outputs.found != 'true'
         shell: bash
         run: |
           set -euo pipefail
           echo "📈 No baseline artifact found for performance comparison"
           echo "BASELINE_EXISTS=false" >> "$GITHUB_ENV"
           echo "BASELINE_SOURCE=none" >> "$GITHUB_ENV"
+          echo "BASELINE_ORIGIN=none" >> "$GITHUB_ENV"

292-299: Fix message: it may be a manual baseline; use origin in logs.

Current log says “release baseline” unconditionally.

Apply:

           echo "🚀 Running performance regression test..."
-          echo "   Baseline source: ${BASELINE_SOURCE:-unknown}"
+          echo "   Baseline source: ${BASELINE_SOURCE:-unknown}"
+          echo "   Baseline origin: ${BASELINE_ORIGIN:-unknown}"
 
           # This will exit with code 1 if significant regressions are found
-          echo "   Using full comparison mode against release baseline"
+          echo "   Using full comparison mode against ${BASELINE_ORIGIN:-unknown} baseline"
           uv run benchmark-utils compare --baseline benches/baseline_results.txt \
             || uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt

326-333: Include origin in the summary.

Surfaces the new source_type signal that the PR introduces.

Apply:

           echo "Baseline source: ${BASELINE_SOURCE:-none}"
+          echo "Baseline origin: ${BASELINE_ORIGIN:-unknown}"
           echo "Baseline exists: ${BASELINE_EXISTS:-false}"
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ea87521 and 2ae746c.

📒 Files selected for processing (2)
  • .github/workflows/benchmarks.yml (5 hunks)
  • .github/workflows/generate-baseline.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Code Coverage
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: Codacy Security Scan

Ensures that expired artifacts are ignored when searching for performance baselines. This prevents the benchmark workflow from erroneously using outdated data.

Adds more detailed logging about the baseline source, differentiating between baselines from artifacts and releases. The origin of the baseline (release or artifact) is now included in the comparison output and summary.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
.github/workflows/benchmarks.yml (2)

124-129: Early returns inside github-script.

Previous reviews flagged early returns. Here they’re acceptable for control flow in this isolated script; no action needed unless you prefer a single-exit style.


169-171: Same note about early return.

Same rationale as above.

🧹 Nitpick comments (1)
.github/workflows/benchmarks.yml (1)

97-131: Reduce API calls by caching run artifacts once.

Current approach lists artifacts for up to 3×50 runs per release, multiplying API calls by number of releases. Fetch runs and their artifacts once, build a Set/map of artifact names → run_id, then match expectedName per release. This cuts calls from O(releases × runs) to O(runs).

Also applies to: 136-173
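The build-once cache suggested in this nitpick might look like the following sketch. `buildArtifactCache` and the data shapes are illustrative assumptions; the real step would populate the cache from paginated GitHub API responses inside actions/github-script.

```javascript
// Sketch of the O(runs) caching idea: enumerate each run's artifacts once,
// build a name → {run_id} map, then match per-release expected names against
// the map with no further API calls.
function buildArtifactCache(runs) {
  const cache = new Map();
  for (const run of runs) {               // runs assumed newest first
    for (const artifact of run.artifacts) {
      if (artifact.expired === true) continue;      // skip expired uploads
      if (!cache.has(artifact.name)) {
        // Set-if-absent: because runs are newest first, the first sighting
        // of a name is the newest artifact with that name.
        cache.set(artifact.name, { run_id: run.id });
      }
    }
  }
  return cache;
}
```

With the cache in hand, checking each release's expected artifact name is a `Map.get` lookup, so total API cost is proportional to the number of runs rather than releases × runs.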


📥 Commits

Reviewing files that changed from the base of the PR and between 2ae746c and e35f04c.

📒 Files selected for processing (2)
  • .github/workflows/benchmarks.yml (8 hunks)
  • .github/workflows/generate-baseline.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/generate-baseline.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (windows-latest)
  • GitHub Check: Code Coverage
  • GitHub Check: Codacy Security Scan
🔇 Additional comments (5)
.github/workflows/benchmarks.yml (5)

115-117: Good fix: exclude expired artifacts.

Guarding on a.expired !== true prevents 404s when downloading expired artifacts. This addresses prior feedback.


155-158: Good fix in fallback: also exclude expired artifacts.

Consistent with the primary search; avoids dead downloads.


192-207: Nice: propagate artifact origin.

BASELINE_SOURCE=artifact and BASELINE_ORIGIN={release|manual} improve observability.


217-226: Clear “no baseline” path.

Explicitly setting BASELINE_EXISTS=false/SOURCE=none/ORIGIN=none is clean and predictable.


299-300: Consistent origin reporting.

Surfacing Baseline origin in run and summary output is helpful for triage.

Also applies to: 335-335

Comment on lines +302 to 305
echo " Using full comparison mode against ${BASELINE_ORIGIN:-unknown} baseline"
uv run benchmark-utils compare --baseline benches/baseline_results.txt \
|| uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt


⚠️ Potential issue

Bug: ‘||’ fallback masks real regressions (exit code 1).

If benchmark-utils correctly exits 1 on regression, the right-hand command runs and can overwrite the outcome. Gate the fallback on command availability, not on exit status.

Apply this diff:

-          echo "   Using full comparison mode against ${BASELINE_ORIGIN:-unknown} baseline"
-          uv run benchmark-utils compare --baseline benches/baseline_results.txt \
-            || uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt
+          echo "   Using full comparison mode against ${BASELINE_ORIGIN:-unknown} baseline"
+          if uv run benchmark-utils --help >/dev/null 2>&1; then
+            uv run benchmark-utils compare --baseline benches/baseline_results.txt
+          elif uv run python - <<'PY' >/dev/null 2>&1
+import importlib; importlib.import_module("scripts.benchmark_utils")
+PY
+          then
+            uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt
+          else
+            echo "❌ benchmark-utils entrypoint and module not found" >&2
+            exit 2
+          fi

Optimizes the benchmark workflow by caching baseline artifacts from recent successful workflow runs. This significantly reduces the number of API calls required to locate the correct baseline, particularly for release baselines. It also falls back to the most recent baseline artifact if no release baseline is found.

Also, the `compare` command is executed using an `if` statement that determines whether the benchmark utils is installed as a module or an entrypoint, to avoid errors.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2


📥 Commits

Reviewing files that changed from the base of the PR and between e35f04c and 7b64982.

📒 Files selected for processing (1)
  • .github/workflows/benchmarks.yml (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: Code Coverage
  • GitHub Check: Codacy Security Scan
🔇 Additional comments (3)
.github/workflows/benchmarks.yml (3)

142-145: Confirm tag sanitization matches generate-baseline.

The regex must be identical to the producer to avoid misses. Please verify .github/workflows/generate-baseline.yml uses the same /[^a-zA-Z0-9._-]/g replacement.


152-158: Early returns are appropriate here.

Exiting after setting outputs keeps control flow simple in actions/github-script. No change needed.

Also applies to: 182-188


319-327: Good fix: don’t mask regressions with “||” fallback.

Tool detection before invocation preserves non-zero exit codes from real regressions. Solid.

Limits the number of workflow runs fetched to avoid excessive API calls, and ensures that only unique, non-expired baseline artifacts are cached, enhancing the reliability and efficiency of the benchmark workflow.

Fixes a bug where the oldest baseline was used: GitHub lists runs newest first, so the first match should be preferred.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/benchmarks.yml (1)

320-329: Fix preserves true regression exit codes (no ‘||’ masking).

Availability-gated fallback addresses the prior concern about masking exit code 1 on regressions.

🧹 Nitpick comments (3)
.github/workflows/benchmarks.yml (3)

91-96: Cap pagination to exactly 150 items (avoid overshooting).

Currently you may collect >150 runs from the last page. Slice the current page to the remaining budget and update the counter from the slice length.

-                (response, done) => {
-                  // Limit to 150 runs total across pages
-                  count += response.data.length;
-                  if (count >= 150) done();
-                  return response.data;
-                }
+                (response, done) => {
+                  // Limit to 150 runs total across pages (no overshoot)
+                  const remaining = Math.max(0, 150 - count);
+                  if (remaining === 0) { done(); return []; }
+                  const slice = response.data.slice(0, remaining);
+                  count += slice.length;
+                  if (count >= 150) done();
+                  return slice;
+                }

218-225: Surface the baseline tag in the Summary.

Persist the release tag into GITHUB_ENV when present and show it in the summary.

           cp "baseline-artifact/baseline_results.txt" "benches/baseline_results.txt"
           echo "BASELINE_EXISTS=true" >> "$GITHUB_ENV"
           echo "BASELINE_SOURCE=artifact" >> "$GITHUB_ENV"
           echo "BASELINE_ORIGIN=${SOURCE_TYPE:-unknown}" >> "$GITHUB_ENV"
+          if [[ -n "${RELEASE_TAG:-}" ]]; then
+            echo "BASELINE_TAG=${RELEASE_TAG}" >> "$GITHUB_ENV"
+          fi
-          echo "Baseline origin: ${BASELINE_ORIGIN:-unknown}"
+          echo "Baseline origin: ${BASELINE_ORIGIN:-unknown}"
+          echo "Baseline tag: ${BASELINE_TAG:-n/a}"

Also applies to: 359-361


76-99: Optional: reduce API calls by early-hit strategy.

Precompute a Set of expected baseline names from releases, then stop artifact enumeration as soon as you cache any exact match. This can short-circuit before walking all runs.


📥 Commits

Reviewing files that changed from the base of the PR and between 7b64982 and 174b8c4.

📒 Files selected for processing (1)
  • .github/workflows/benchmarks.yml (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Code Coverage
  • GitHub Check: Codacy Security Scan
  • GitHub Check: build (windows-latest)
🔇 Additional comments (3)
.github/workflows/benchmarks.yml (3)

110-118: Good: filters expired artifacts and preserves newest cache entry.

The expired !== true guard plus “set-if-absent” avoids 404s and prevents older runs from overwriting newer ones.


148-160: Release-first match logic looks solid.

Tag sanitization mirrors the generator, outputs are complete, and early exit is appropriate here.


165-189: Sane fallback: most-recent non-expired baseline.

Clear logging and outputs; good tie-break via run_created_at.

Optimizes the retrieval of baseline artifacts by pre-computing expected release baseline names and short-circuiting the search when a release baseline is found.

This significantly reduces the number of API calls and speeds up the process, especially when release baselines are available.

Also exports the `BASELINE_TAG` to the environment, making it available for subsequent steps.

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
.github/workflows/benchmarks.yml (2)

170-182: Early return from github-script is acceptable here.

Setting outputs then return; is the right way to short-circuit in actions/github-script despite prior comments.


344-354: Great fix: no ‘||’ fallback masking regressions.

Capability checks gate the fallback runner, preserving exit code 1 on real regressions as intended.

🧹 Nitpick comments (2)
.github/workflows/benchmarks.yml (2)

246-249: Only set BASELINE_TAG for release-sourced baselines.

Avoid writing a synthetic tag like “manual-baseline” into BASELINE_TAG; reserve it for actual release tags.

Apply:

-            echo "BASELINE_ORIGIN=${SOURCE_TYPE:-unknown}" >> "$GITHUB_ENV"
-            if [[ -n "${RELEASE_TAG:-}" ]]; then
-              echo "BASELINE_TAG=${RELEASE_TAG}" >> "$GITHUB_ENV"
-            fi
+            echo "BASELINE_ORIGIN=${SOURCE_TYPE:-unknown}" >> "$GITHUB_ENV"
+            if [[ "${SOURCE_TYPE:-}" == "release" && -n "${RELEASE_TAG:-}" ]]; then
+              echo "BASELINE_TAG=${RELEASE_TAG}" >> "$GITHUB_ENV"
+            fi

326-334: Clarify retention window in guidance.

Mention 90-day artifact retention to explain why older baselines might be missing.

Apply:

           echo "⚠️ No performance baseline available for comparison."
           echo "   - No baseline artifacts found in recent workflow runs"
           echo "   - Performance regression testing requires a baseline"
           echo ""
           echo "💡 To enable performance regression testing:"
           echo "   1. Create a release tag (e.g., v0.4.3), or"
           echo "   2. Manually trigger the 'Generate Performance Baseline' workflow"
           echo "   3. Future PRs and pushes will use that baseline for comparison"
-          echo "   4. Baselines use full benchmark settings for accurate comparisons"
+          echo "   4. Baselines use full benchmark settings for accurate comparisons"
+          echo "   5. Baseline artifacts are retained for ~90 days; older ones may expire"

📥 Commits

Reviewing files that changed from the base of the PR and between 174b8c4 and 13b1b51.

📒 Files selected for processing (1)
  • .github/workflows/benchmarks.yml (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: Codacy Security Scan
  • GitHub Check: Code Coverage
🔇 Additional comments (5)
.github/workflows/benchmarks.yml (5)

48-49: Confirm macOS 15 runner availability or use macos-latest.

macos-15 may not be available in all orgs/tiers. Consider macos-latest to avoid queued failures.

Apply if desired:

-    runs-on: macos-15
+    runs-on: macos-latest

94-114: Good pagination cap fix.

The aggregate counter with slice prevents over-fetching and keeps API calls bounded. Nicely done.


129-148: Artifact cache logic looks solid.

Filtering expired artifacts and only setting when absent (to preserve newest) prevents 404s and stale hits. Early-hit flag plus outer-loop break is clean.


201-212: Sane fallback to most-recent manual baseline.

Selecting by run_created_at gives deterministic behavior when no release-tagged baseline exists.


283-307: Skip logic reads correctly and is safe under set -euo pipefail.

Covers unknown baseline, identical commit, missing commit, and relevant-change detection without false exits in conditionals.

@acgetchell acgetchell merged commit fda43d7 into main Sep 2, 2025
11 checks passed
@acgetchell acgetchell deleted the test/benchmarking branch September 2, 2025 17:48
@coderabbitai coderabbitai bot mentioned this pull request Sep 13, 2025
