
Commit fda43d7

Test/benchmarking (#59)
* Simplifies benchmark workflow and reporting

  Refactors the benchmark workflow to rely solely on release baselines for performance comparison. This removes the generation of fallback baselines during pull request workflows, so comparisons are always made against baselines produced with full benchmark settings. Updates the reporting to give clearer guidance on enabling performance regression testing via release tags. Reduces baseline artifact retention from 1 year to 90 days to match the repository maximum.

* Improves performance baseline artifact retrieval

  Changes how performance baseline artifacts are located. The workflow first looks for baseline artifacts associated with release tags by searching recent successful runs of 'generate-baseline.yml'. If no release-tagged baseline is found, it falls back to any recent baseline artifact generated by the same workflow, regardless of origin (e.g., a manual trigger). A baseline is therefore found even when no tagged release is available or the baseline was generated manually. Also changes the failure message shown when no baseline is found and improves the instructions for generating one.

* Improves benchmark baseline handling

  Ignores expired artifacts when searching for performance baselines, which prevents the benchmark workflow from selecting baselines that can no longer be downloaded. Adds more detailed logging about the baseline source, differentiating between baselines from artifacts and releases. The origin of the baseline (release or manual) is now included in the comparison output and summary.

* Improves baseline artifact lookup efficiency

  Caches baseline artifacts from recent successful workflow runs, which reduces the number of API calls needed to locate the correct baseline, particularly for release baselines. Falls back to the most recent baseline artifact if no release baseline is found. The `compare` command is now run inside an `if` statement that checks whether benchmark-utils is available as an entrypoint or as a module, to avoid errors.

* Improves benchmark workflow reliability

  Limits the number of workflow runs fetched to avoid excessive API calls, and caches only unique, non-expired baseline artifacts. Fixes a bug where the oldest baseline was used: GitHub lists runs newest first, so the first match should be preferred.

* Improves benchmark baseline artifact retrieval

  Precomputes the expected release baseline names and short-circuits the search as soon as a release baseline is found. This reduces the number of API calls and speeds up the lookup, especially when release baselines are available. Also exports `BASELINE_TAG` to the environment so that subsequent steps can use it.
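The lookup order described above (release baseline first, then the most recent baseline of any origin) can be sketched as a small standalone function. This is a minimal illustration, not the workflow's code: `baselineNameForTag` and `selectBaseline` are hypothetical helper names, the sample tags, run ids, and dates are made up, and the release list is assumed to be ordered newest first.

```javascript
// Same sanitization rule as in the diff: keep [A-Za-z0-9._-], replace everything else with "_".
function baselineNameForTag(tagName) {
  const cleanTag = tagName.replace(/[^a-zA-Z0-9._-]/g, '_');
  return `performance-baseline-${cleanTag}`;
}

// Hypothetical helper. `artifactCache` maps artifact name -> { run_id, run_created_at }.
function selectBaseline(releases, artifactCache) {
  // Pass 1: prefer the first published release (no drafts, no prereleases) with a cached baseline.
  for (const release of releases) {
    if (release.draft || release.prerelease) continue;
    const name = baselineNameForTag(release.tag_name);
    const hit = artifactCache.get(name);
    if (hit) {
      return { artifactName: name, runId: hit.run_id, releaseTag: release.tag_name, sourceType: 'release' };
    }
  }
  // Pass 2: fall back to the artifact whose run was created most recently.
  let best = null;
  for (const [name, info] of artifactCache) {
    if (best === null || new Date(info.run_created_at) > new Date(best.info.run_created_at)) {
      best = { name, info };
    }
  }
  if (best === null) return null;
  return { artifactName: best.name, runId: best.info.run_id, releaseTag: 'manual-baseline', sourceType: 'manual' };
}

// Made-up sample data for illustration only.
const cache = new Map([
  ['performance-baseline-manual', { run_id: 11, run_created_at: '2024-05-02T00:00:00Z' }],
  ['performance-baseline-v0.4.2', { run_id: 10, run_created_at: '2024-05-01T00:00:00Z' }],
]);
const releases = [
  { tag_name: 'v0.4.3', draft: true, prerelease: false },
  { tag_name: 'v0.4.2', draft: false, prerelease: false },
];
console.log(selectBaseline(releases, cache).sourceType); // release
console.log(selectBaseline([], cache).sourceType);       // manual
```

Keeping the selection free of API calls means it only needs the two inputs fetched beforehand, which is what makes the per-release check a plain map lookup.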
1 parent ea87521 commit fda43d7

2 files changed

Lines changed: 152 additions & 84 deletions


.github/workflows/benchmarks.yml

Lines changed: 151 additions & 83 deletions
@@ -68,73 +68,151 @@ jobs:
         run: uv --version


-      - name: Find latest release with baseline artifact
+      - name: Find baseline artifact
         id: find_baseline
         uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
         with:
           script: |
             try {
-              // Get all releases with pagination to handle large repos
+              // Precompute expected baseline names for early-hit optimization
               const releases = await github.paginate(
                 github.rest.repos.listReleases,
-                { owner: context.repo.owner, repo: context.repo.repo, per_page: 100 }
+                { owner: context.repo.owner, repo: context.repo.repo, per_page: 50 }
               );

-              console.log(`Found ${releases.length} releases (paginated)`);
-
-              // Look for releases with baseline artifacts
+              const expectedReleaseBaselines = new Set();
               for (const release of releases) {
-                console.log(`Checking release ${release.tag_name}...`);
-                if (release.draft || release.prerelease) {
-                  console.log(`Skipping draft/prerelease ${release.tag_name}`);
-                  continue;
+                if (!release.draft && !release.prerelease) {
+                  const cleanTag = release.tag_name.replace(/[^a-zA-Z0-9._-]/g, '_');
+                  expectedReleaseBaselines.add(`performance-baseline-${cleanTag}`);
                 }
+              }
+              console.log(`Precomputed ${expectedReleaseBaselines.size} expected release baseline names`);
+
+              // Fetch all successful generate-baseline.yml runs once (O(runs) API calls)
+              console.log('Fetching recent generate-baseline.yml runs...');
+              let count = 0;
+              const runs = await github.paginate(
+                github.rest.actions.listWorkflowRuns,
+                {
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  workflow_id: 'generate-baseline.yml',
+                  status: 'completed',
+                  conclusion: 'success',
+                  per_page: 100
+                },
+                (response, done) => {
+                  // Limit to 150 runs total across pages (no overshoot)
+                  const remaining = Math.max(0, 150 - count);
+                  if (remaining === 0) { done(); return []; }
+                  const slice = response.data.slice(0, remaining);
+                  count += slice.length;
+                  if (count >= 150) done();
+                  return slice;
+                }
+              );

-              // List recent runs and look for an artifact matching the release tag
+              console.log(`Found ${runs.length} successful generate-baseline runs`);
+
+              // Build artifact cache: artifact name → {run_id, run_created_at}
+              const artifactCache = new Map();
+              let foundReleaseBaseline = false;
+              for (const run of runs) {
                 try {
-                  // Mirror sanitization in generate-baseline.yml (allow [A-Za-z0-9._-], replace others with _)
-                  const cleanTag = release.tag_name.replace(/[^a-zA-Z0-9._-]/g, '_');
-                  const expectedName = `performance-baseline-${cleanTag}`;
-                  let found = false;
-                  for (let page = 1; page <= 5 && !found; page++) {
-                    const runs = await github.rest.actions.listWorkflowRuns({
-                      owner: context.repo.owner,
-                      repo: context.repo.repo,
-                      workflow_id: 'generate-baseline.yml',
-                      status: 'completed',
-                      conclusion: 'success',
-                      per_page: 100,
-                      page
-                    });
-                    for (const run of runs.data.workflow_runs) {
-                      const artifacts = await github.rest.actions.listWorkflowRunArtifacts({
-                        owner: context.repo.owner,
-                        repo: context.repo.repo,
-                        run_id: run.id
-                      });
-                      const baselineArtifact = artifacts.data.artifacts.find(a => a.name === expectedName);
-                      if (baselineArtifact) {
-                        console.log(
-                          `Found baseline artifact ${expectedName} in run ${run.id} ` +
-                          `for release ${release.tag_name}`
-                        );
-                        core.setOutput('found', 'true');
-                        core.setOutput('release_tag', release.tag_name);
-                        core.setOutput('artifact_name', expectedName);
-                        core.setOutput('run_id', run.id.toString());
-                        found = true;
-                        break;
+                  const artifacts = await github.rest.actions.listWorkflowRunArtifacts({
+                    owner: context.repo.owner,
+                    repo: context.repo.repo,
+                    run_id: run.id
+                  });
+
+                  for (const artifact of artifacts.data.artifacts) {
+                    if (artifact.name.startsWith('performance-baseline-') && artifact.expired !== true) {
+                      if (!artifactCache.has(artifact.name)) {
+                        artifactCache.set(artifact.name, {
+                          run_id: run.id,
+                          run_created_at: run.created_at
+                        });
+
+                        // Early-hit optimization: stop if we found a release baseline
+                        if (expectedReleaseBaselines.has(artifact.name)) {
+                          console.log(`Early hit: found release baseline ${artifact.name}, stopping search`);
+                          foundReleaseBaseline = true;
+                        }
                       }
                     }
                   }
-                  if (found) return;
+
+                  // Short-circuit if we found a release baseline
+                  if (foundReleaseBaseline) break;
                 } catch (error) {
-                  console.log(`Error checking release ${release.tag_name}: ${error.message}`);
+                  console.log(`Warning: Could not fetch artifacts for run ${run.id}: ${error.message}`);
                   continue;
                 }
               }

-              console.log('No baseline artifacts found in any release');
+              console.log(`Built cache of ${artifactCache.size} baseline artifacts`);
+
+              console.log(`Found ${releases.length} releases (already fetched)`);
+
+              // Look for releases with baseline artifacts (now O(releases) lookups)
+              for (const release of releases) {
+                console.log(`Checking release ${release.tag_name}...`);
+                if (release.draft || release.prerelease) {
+                  console.log(`Skipping draft/prerelease ${release.tag_name}`);
+                  continue;
+                }
+
+                // Must match sanitize step in .github/workflows/generate-baseline.yml
+                const cleanTag = release.tag_name.replace(/[^a-zA-Z0-9._-]/g, '_');
+                const expectedName = `performance-baseline-${cleanTag}`;
+
+                if (artifactCache.has(expectedName)) {
+                  const artifactInfo = artifactCache.get(expectedName);
+                  console.log(
+                    `Found release baseline artifact ${expectedName} in run ${artifactInfo.run_id} ` +
+                    `for release ${release.tag_name}`
+                  );
+                  core.setOutput('found', 'true');
+                  core.setOutput('release_tag', release.tag_name);
+                  core.setOutput('artifact_name', expectedName);
+                  core.setOutput('run_id', artifactInfo.run_id.toString());
+                  core.setOutput('source_type', 'release');
+                  return;
+                }
+              }
+
+              console.log('No release baseline artifacts found, checking for any recent baselines...');
+
+              // Fallback: look for any recent baseline artifact (including manual runs)
+              if (artifactCache.size > 0) {
+                // Find the most recent artifact (by run creation time)
+                let mostRecentArtifact = null;
+                let mostRecentTime = null;
+
+                for (const [artifactName, artifactInfo] of artifactCache.entries()) {
+                  const runTime = new Date(artifactInfo.run_created_at);
+                  if (!mostRecentTime || runTime > mostRecentTime) {
+                    mostRecentTime = runTime;
+                    mostRecentArtifact = { name: artifactName, ...artifactInfo };
+                  }
+                }
+
+                if (mostRecentArtifact) {
+                  console.log(
+                    `Found baseline artifact ${mostRecentArtifact.name} in run ${mostRecentArtifact.run_id} ` +
+                    `(created: ${mostRecentArtifact.run_created_at})`
+                  );
+                  core.setOutput('found', 'true');
+                  core.setOutput('release_tag', 'manual-baseline');
+                  core.setOutput('artifact_name', mostRecentArtifact.name);
+                  core.setOutput('run_id', mostRecentArtifact.run_id.toString());
+                  core.setOutput('source_type', 'manual');
+                  return;
+                }
+              }
+
+              console.log('No baseline artifacts found in any recent runs');
               core.setOutput('found', 'false');
             } catch (error) {
               console.error(`Error searching for baseline artifacts: ${error.message}`);
@@ -156,6 +234,7 @@ jobs:
         shell: bash
         env:
           RELEASE_TAG: ${{ steps.find_baseline.outputs.release_tag }}
+          SOURCE_TYPE: ${{ steps.find_baseline.outputs.source_type }}
         run: |
           set -euo pipefail
           if [[ -f "baseline-artifact/baseline_results.txt" ]]; then
@@ -164,6 +243,10 @@ jobs:
             cp "baseline-artifact/baseline_results.txt" "benches/baseline_results.txt"
             echo "BASELINE_EXISTS=true" >> "$GITHUB_ENV"
             echo "BASELINE_SOURCE=artifact" >> "$GITHUB_ENV"
+            echo "BASELINE_ORIGIN=${SOURCE_TYPE:-unknown}" >> "$GITHUB_ENV"
+            if [[ -n "${RELEASE_TAG:-}" ]]; then
+              echo "BASELINE_TAG=${RELEASE_TAG}" >> "$GITHUB_ENV"
+            fi

             # Show baseline metadata
             echo "=== Baseline Information (from artifact) ==="
@@ -174,29 +257,15 @@ jobs:
             echo "BASELINE_SOURCE=missing" >> "$GITHUB_ENV"
           fi

-      - name: Generate fallback baseline if none found
+      - name: Set baseline status if none found
         if: steps.find_baseline.outputs.found != 'true'
         shell: bash
         run: |
           set -euo pipefail
-          echo "📈 No baseline artifact found, generating fallback baseline..."
-          echo " This baseline will be used for this comparison only."
-          echo ""
-
-          # Generate baseline using dev mode for faster execution
-          if uv run benchmark-utils generate-baseline --dev \
-            || uv run python -m scripts.benchmark_utils generate-baseline --dev; then
-            echo "BASELINE_EXISTS=true" >> "$GITHUB_ENV"
-            echo "BASELINE_SOURCE=generated" >> "$GITHUB_ENV"
-
-            echo "✅ Generated fallback baseline successfully"
-            echo "=== Generated Baseline Information ==="
-            head -n 3 benches/baseline_results.txt
-          else
-            echo "❌ Failed to generate fallback baseline"
-            echo "BASELINE_EXISTS=false" >> "$GITHUB_ENV"
-            echo "BASELINE_SOURCE=failed" >> "$GITHUB_ENV"
-          fi
+          echo "📈 No baseline artifact found for performance comparison"
+          echo "BASELINE_EXISTS=false" >> "$GITHUB_ENV"
+          echo "BASELINE_SOURCE=none" >> "$GITHUB_ENV"
+          echo "BASELINE_ORIGIN=none" >> "$GITHUB_ENV"

       - name: Extract baseline commit SHA
         if: env.BASELINE_EXISTS == 'true'
@@ -254,18 +323,14 @@ jobs:
         run: |
           set -euo pipefail
           echo "⚠️ No performance baseline available for comparison."
-          if [[ "${BASELINE_SOURCE:-}" == "failed" ]]; then
-            echo " - Failed to generate fallback baseline"
-            echo " - Check the baseline generation workflow logs for issues"
-          else
-            echo " - No baseline artifacts found in recent releases"
-            echo " - Create a new release tag to generate a baseline"
-          fi
+          echo " - No baseline artifacts found in recent workflow runs"
+          echo " - Performance regression testing requires a baseline"
           echo ""
-          echo "💡 To resolve:"
-          echo " 1. Create a new release tag (e.g., v0.4.2)"
-          echo " 2. The baseline generation workflow will run automatically"
+          echo "💡 To enable performance regression testing:"
+          echo " 1. Create a release tag (e.g., v0.4.3), or"
+          echo " 2. Manually trigger the 'Generate Performance Baseline' workflow"
           echo " 3. Future PRs and pushes will use that baseline for comparison"
+          echo " 4. Baselines use full benchmark settings for accurate comparisons"

       - name: Run performance regression test
         if: env.BASELINE_EXISTS == 'true' && env.SKIP_BENCHMARKS == 'false'
@@ -274,17 +339,18 @@ jobs:
           set -euo pipefail
           echo "🚀 Running performance regression test..."
           echo " Baseline source: ${BASELINE_SOURCE:-unknown}"
+          echo " Baseline origin: ${BASELINE_ORIGIN:-unknown}"

           # This will exit with code 1 if significant regressions are found
-          # Use --dev mode when comparing against a generated baseline to match settings
-          if [[ "${BASELINE_SOURCE:-}" == "generated" ]]; then
-            echo " Using --dev mode to match generated baseline settings"
-            uv run benchmark-utils compare --baseline benches/baseline_results.txt --dev \
-              || uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt --dev
+          echo " Using full comparison mode against ${BASELINE_ORIGIN:-unknown} baseline"
+          if uv run benchmark-utils --help >/dev/null 2>&1; then
+            uv run benchmark-utils compare --baseline benches/baseline_results.txt
+          elif uv run python -c "import importlib; importlib.import_module('scripts.benchmark_utils')" \
+            >/dev/null 2>&1; then
+            uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt
           else
-            echo " Using full comparison mode against release baseline"
-            uv run benchmark-utils compare --baseline benches/baseline_results.txt \
-              || uv run python -m scripts.benchmark_utils compare --baseline benches/baseline_results.txt
+            echo "❌ benchmark-utils entrypoint and module not found" >&2
+            exit 2
           fi

       - name: Display regression test results
@@ -316,6 +382,8 @@ jobs:
           echo "📊 Performance Regression Testing Summary"
           echo "==========================================="
           echo "Baseline source: ${BASELINE_SOURCE:-none}"
+          echo "Baseline origin: ${BASELINE_ORIGIN:-unknown}"
+          echo "Baseline tag: ${BASELINE_TAG:-n/a}"
           echo "Baseline exists: ${BASELINE_EXISTS:-false}"
           echo "Skip benchmarks: ${SKIP_BENCHMARKS:-unknown}"
           echo "Skip reason: ${SKIP_REASON:-n/a}"
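The 150-run cap in the "Find baseline artifact" step relies on the map-function form of Octokit's `paginate`, which passes a `done` callback to stop fetching further pages. The sketch below isolates that pattern so it can be run locally. `makeCappedCollector` is a hypothetical name, and `fakePaginate` is only a stand-in that feeds pre-built pages to the callback; it is not Octokit's implementation.

```javascript
// Returns a map function in the (response, done) shape used by the workflow's paginate call.
function makeCappedCollector(limit) {
  let count = 0;
  return (response, done) => {
    const remaining = Math.max(0, limit - count);
    if (remaining === 0) { done(); return []; }
    // Take only what is still needed, so the total never exceeds the limit.
    const slice = response.data.slice(0, remaining);
    count += slice.length;
    if (count >= limit) done();
    return slice;
  };
}

// Stand-in for paginate (assumption, for local testing only): walks pages until done() is called.
function fakePaginate(pages, mapFn) {
  const results = [];
  let stopped = false;
  let pagesFetched = 0;
  const done = () => { stopped = true; };
  for (const data of pages) {
    if (stopped) break;
    pagesFetched += 1;
    results.push(...mapFn({ data }, done));
  }
  return { results, pagesFetched };
}

// Three pages of 100 items each; with a cap of 150 only two pages are consumed.
const pages = [0, 1, 2].map(p => Array.from({ length: 100 }, (_, i) => p * 100 + i));
const { results, pagesFetched } = fakePaginate(pages, makeCappedCollector(150));
console.log(results.length, pagesFetched); // prints: 150 2
```

Slicing before counting is what gives the "no overshoot" property noted in the diff's comment: the second page contributes only 50 items instead of all 100.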

.github/workflows/generate-baseline.yml

Lines changed: 1 addition & 1 deletion
@@ -141,7 +141,7 @@ jobs:
         with:
           name: ${{ steps.safe_name.outputs.artifact_name }}
           path: baseline-artifact/
-          retention-days: 365 # Keep baselines for 1 year
+          retention-days: 90 # Keep baselines ~90 days (align with repo settings; adjust if needed)
           compression-level: 6 # Good balance of speed/compression

       - name: Display next steps
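With retention set to 90 days, older baseline artifacts can show up in run listings flagged as expired, which is why the lookup in benchmarks.yml skips them and keeps the first entry it sees per name. A minimal sketch of that cache-building rule follows. It assumes runs are supplied newest first with their artifacts already attached; `buildArtifactCache` and the sample data are hypothetical.

```javascript
// Hypothetical input shape: [{ id, created_at, artifacts: [{ name, expired }] }], newest run first.
function buildArtifactCache(runs) {
  const cache = new Map();
  for (const run of runs) {
    for (const artifact of run.artifacts) {
      const isBaseline = artifact.name.startsWith('performance-baseline-');
      // Skip unrelated artifacts and baselines that have passed their retention period.
      if (!isBaseline || artifact.expired === true) continue;
      // Newest-first input means the first entry seen for a name is the freshest one.
      if (!cache.has(artifact.name)) {
        cache.set(artifact.name, { run_id: run.id, run_created_at: run.created_at });
      }
    }
  }
  return cache;
}

// Made-up sample data for illustration only.
const sampleRuns = [
  { id: 3, created_at: '2024-06-01T00:00:00Z', artifacts: [{ name: 'performance-baseline-v0.4.3', expired: false }] },
  { id: 2, created_at: '2024-05-01T00:00:00Z', artifacts: [{ name: 'performance-baseline-v0.4.3', expired: false }, { name: 'logs', expired: false }] },
  { id: 1, created_at: '2024-01-01T00:00:00Z', artifacts: [{ name: 'performance-baseline-v0.4.2', expired: true }] },
];
const sampleCache = buildArtifactCache(sampleRuns);
console.log([...sampleCache.keys()]);
console.log(sampleCache.get('performance-baseline-v0.4.3').run_id); // 3
```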
