try to use different backends for binlog and build analysis by baronfel · Pull Request #502 · dotnet/skills

baronfel · 2026-04-07T15:05:27Z

No description provided.

github-actions · 2026-04-07T15:05:39Z

Note

This PR is from a fork and modifies infrastructure files (eng/ or .github/).

Changes to infrastructure typically need to be submitted from a branch in dotnet/skills (not a fork) so that CI workflows run with the correct permissions and secrets.

Please consider recreating this PR from an upstream branch. If you don't have push access to dotnet/skills, ask a maintainer to push your branch for you.

JanKrivanek · 2026-04-07T15:59:58Z

/evaluate

github-actions · 2026-04-07T16:09:43Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
check-bin-obj-clash	Diagnose bin/obj output path clashes	3.7/5 → 5.0/5 🟢	✅ check-bin-obj-clash; tools: skill / ✅ check-bin-obj-clash; binlog-generation; tools: skill	🟡 0.36	✅ [1]
incremental-build	Analyze incremental build issues	3.0/5 → 4.7/5 🟢	✅ incremental-build; tools: skill, bash	🟡 0.33	❌ [2]
build-perf-diagnostics	Diagnose slow build for a small project	4.3/5 → 4.7/5 🟢	✅ build-perf-diagnostics; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.47	❌ [3]
resolve-project-references	Explain misleading ResolveProjectReferences time	3.0/5 → 5.0/5 🟢	✅ resolve-project-references; tools: skill, glob / ✅ resolve-project-references; tools: skill	✅ 0.11	✅
eval-performance	Analyze MSBuild evaluation performance issues	3.0/5 → 4.0/5 🟢	✅ eval-performance; tools: skill, bash / ✅ eval-performance; tools: skill	🟡 0.32	✅ [4]
build-parallelism	Analyze build parallelism bottlenecks	3.0/5 → 3.3/5 ⏰ 🟢	✅ build-parallelism; tools: task, glob, skill, read_agent, bash, edit / ⚠️ NOT ACTIVATED	🟡 0.39	❌ [5]
binlog-failure-analysis	Diagnose build failures from binlog only (no source files)	4.3/5 → 5.0/5 🟢	✅ binlog-failure-analysis; tools: skill, view	🟡 0.37	✅ [6]

[1] ⚠️ High run-to-run variance (CV=25.78) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=1.92) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -24.5% due to: judgment, tokens (58411 → 163329), quality, tool calls (6 → 10), time (40.7s → 58.0s)
[3] ⚠️ High run-to-run variance (CV=1.43) — consider re-running with --runs 5
[4] ⚠️ High run-to-run variance (CV=0.64) — consider re-running with --runs 5
[5] (Plugin) Quality improved but weighted score is -0.4% due to: tokens (55678 → 104094), tool calls (7 → 11), time (46.3s → 72.2s)
[6] ⚠️ High run-to-run variance (CV=1.80) — consider re-running with --runs 5
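
For readers unfamiliar with the CV figures in the footnotes above: a coefficient of variation is conventionally the standard deviation of a set of samples divided by their mean. The sketch below shows that conventional definition only. How the skill-validator actually computes CV, which per-run quantity it measures, and what threshold triggers the warning are not stated in this report, so the function name, the sample scores, and the threshold here are all hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the absolute mean.

    This is the textbook definition; it is an assumption that the
    validator's CV follows the same formula.
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mu = mean(samples)
    if mu == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(samples) / abs(mu)


# Hypothetical per-run scores, NOT taken from this PR.
hypothetical_runs = [0.10, 0.60, 0.35]
cv = coefficient_of_variation(hypothetical_runs)

# Hypothetical threshold; the real warning cutoff is not given in the report.
needs_more_runs = cv > 0.5
```

Under this definition, a CV above 1 means the spread between runs exceeds the size of the mean itself, which is why adding runs (for example `--runs 5`, as the footnotes suggest) is the usual remedy.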

⏰ timeout — run(s) hit the (160s, 360s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

try to use different backends for binlog and build analysis

9f8cacd

github-actions bot added a commit that referenced this pull request Apr 7, 2026

Update PR token usage data (PR #502)

e4b302b

github-actions bot added a commit that referenced this pull request Apr 7, 2026

Update session data (PR #502)

86fdc1c

This was referenced Apr 13, 2026

Log exceptions in AgentRunner.StopAllClients catch block #349

Draft

Add --run-in-docker to skill-validator to run Copilot CLI in a docker container #273

Closed

baronfel wants to merge 1 commit into dotnet:main from baronfel:eval-binlog-methods
