Add Kimi K2.5 experiment, improve classification logging, update model configs#52

Closed
gaojude wants to merge 1 commit into main from jude/kimi-k2.5-and-improvements

gaojude commented Feb 6, 2026

Summary

  • Adds Kimi K2.5 (vercel/moonshotai/kimi-k2.5) as a new experiment via Vercel AI Gateway + OpenCode. Initial run scores 11/20 evals passed (33%).
  • qa-and-export.ts now pre-filters cached classification.json files so only uncached failures hit the AI classifier. Previously it reported "Classifying 80 failures..." even when most were cached.
  • Classification progress icons updated: model failures (the expected outcome) show green; infra/timeout failures (problematic) show red.
  • Sets explicit model IDs in the claude-opus-4.6 and claude-sonnet-4.5 experiment configs.
  • Refreshed agent-results.json export.
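
The pre-filtering described for qa-and-export.ts could look roughly like the sketch below. It is not the code from this PR: the `Failure` shape, the `partitionByCache` and `progressMessage` helpers, the assumption that classification.json sits in each result directory, and the exact message wording are all hypothetical, chosen only to illustrate counting uncached failures before reporting progress.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shape of a failed eval run; the real types in qa-and-export.ts may differ.
interface Failure {
  evalName: string;
  resultDir: string; // assumed location of a cached classification.json
}

interface Partition {
  cached: Failure[];
  uncached: Failure[];
}

// Split failures by whether a cached classification.json already exists.
// `hasFile` is injectable so the logic can be exercised without touching disk.
export function partitionByCache(
  failures: Failure[],
  hasFile: (path: string) => boolean = existsSync,
): Partition {
  const cached: Failure[] = [];
  const uncached: Failure[] = [];
  for (const failure of failures) {
    const cachePath = join(failure.resultDir, "classification.json");
    (hasFile(cachePath) ? cached : uncached).push(failure);
  }
  return { cached, uncached };
}

// Report only the failures that will actually be sent to the classifier.
// The wording here is an assumption, not the PR's actual log line.
export function progressMessage(p: Partition): string {
  return `Classifying ${p.uncached.length} failures (${p.cached.length} cached)...`;
}
```

With this split, only `uncached` would be passed to the AI classifier, so the progress line reflects the real amount of work rather than the total failure count.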

gaojude closed this Feb 10, 2026