hotfix(ai-diagnosis): correct misclassifications and surface provider error JSON by SiinXu · Pull Request #15241 · CherryHQ/cherry-studio

SiinXu · 2026-05-21T13:02:02Z

What this PR does

Before this PR:
The AI diagnosis feature gave misleading results in several common scenarios:

A 429 from high-frequency requests was diagnosed as "account quota exhausted, please recharge" (it's actually a rate limit, not a billing issue).
A 403 from a geo-restricted region (OpenAI's unsupported_country_region_territory, Anthropic geo-blocks) was diagnosed as "API key invalid, please reconfigure" — sending users to re-check a key that was fine.
MCP / OCR timeouts were stolen by the generic network rule and routed to /settings/general instead of the relevant feature settings.
The AI second layer ignored the provider's actual error JSON (responseBody, data) and Gemini/Anthropic's structured finishReason (SAFETY / RECITATION), so the AI had to guess from the wrapper message.

After this PR:

Rate limit is a separate category from quota; the order of keyword checks ensures 429 + insufficient_balance still routes to quota.
HTTP 402 Payment Required routes to quota and is mirrored in the AI prompt context, so providers that signal billing failure via 402 get billing-language guidance instead of falling through to unknown.
New region category for geo-blocks, with its own i18n text and navigation to general/proxy settings.
Specialized rules (MCP, OCR) run before the network rule so feature-specific timeouts are categorized correctly.
Narrowed stream, parse, and deprecated keyword matches to remove false positives (e.g. max_tokens must be a valid JSON number no longer flags as a parse error; "parameter is deprecated" no longer triggers "model deprecated").
Expanded content-filter detection across OpenAI / Anthropic / Gemini (incl. structured finishReason).
The AI prompt now includes responseBody, data, finishReason, and auto-fills provider / modelId from the error itself when the caller did not provide context.

Fixes #

Why we need it and why it was done in this way

The misclassifications were already shipping wrong advice to users (e.g. telling them to top up balance when they just need to slow down requests, or to check their API key when their region is blocked). This is a correctness issue, not an enhancement.

The following tradeoffs were made:

The rule classifier file grew (~150 → ~200 lines), but each new branch is justified by a real-world provider error pattern; alternatives like a single regex table would obscure the ordering logic that is core to the fix.
The AI prompt now includes 2 additional examples (rate_limit, quota) and slightly more rules. This raises per-call token usage modestly. Diagnosis runs on the CherryAI free model, so the cost impact is effectively zero, and the examples meaningfully steer the model away from the most common wrong answers.
Narrowed keywords trade some edge-case coverage for accuracy. The cases that fall through now go to unknown, which still triggers the AI second layer, so user experience does not degrade.

The following alternatives were considered:

Adding only the rate_limit split and stopping there — rejected because the same root pattern (overly broad rules and lost context) caused several other misdiagnoses surfaced during the audit.
Letting the AI handle all classification with no rule layer — rejected because the rule layer is the offline fallback when the diagnosis AI itself fails, and it is also what we display when AI is still loading.

Links to places where the discussion took place: N/A

Breaking changes

None. The change is purely a correction to error-diagnosis logic. No public APIs, data schemas, or persistence formats change. Persisted diagnosis results in block.metadata continue to render unchanged.

Special notes for your reviewer

The new region category navigates to /settings/general (proxy config) rather than /settings/provider, because geo-block fixes typically require a proxy, not a new API key.
The cause field on SerializedAiSdkError is intentionally read as a string (it is serialized by serializeError in src/renderer/src/utils/error.ts). The new responseBody / data / finishReason extraction relies on the top-level fields already flattened by the same serializer; no nested status-code extraction is needed.
15 new unit tests cover the new categories (incl. HTTP 402 → quota), narrowed rules, and the new AI-input fields; the existing test suite (9320 tests) continues to pass.

Checklist

PR: The PR description is expressive enough and will help future contributors
Code: Write code that humans can understand and Keep it simple
Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
Upgrade: Impact of this change on upgrade flows was considered and addressed if required
Documentation: A user-guide update was considered and is present (link) or not required. Check this only when the PR introduces or changes a user-facing feature or behavior.
Self-review: I have reviewed my own code (e.g., via /gh-pr-review, gh pr diff, or GitHub UI) before requesting review from others

Release note

Fix AI error diagnosis to correctly distinguish rate limits from quota exhaustion, geo-region blocks from auth failures, and feature-specific (MCP / OCR) timeouts from generic network errors. The diagnosis pipeline now forwards the provider's structured error fields (responseBody, data, finishReason) to the AI model, and falls back to a rule-based classification message when AI diagnosis itself fails.

Related issues

Refs [Feature]: Improve AI Error Diagnosis with More Context and Hidden Raw Stack #15025 (Improve AI Error Diagnosis with More Context and Hidden Raw Stack) — this PR fixes the misclassification axis (rate-limit vs quota, geo-block vs auth, MCP/OCR-specific timeouts) and surfaces provider error JSON to the diagnosis pipeline; the broader UX improvements in [Feature]: Improve AI Error Diagnosis with More Context and Hidden Raw Stack #15025 remain open.

…ror JSON The rule layer conflated rate-limit with quota, treated 403 geo-blocks as auth, and let generic network rules steal MCP/OCR timeouts; the AI layer ignored the actual provider responseBody/data/finishReason that carry the diagnostic signal, and the UI showed raw exception text when diagnosis itself failed. This rewrites the classification rules, pipes the missing fields into the AI prompt, and adds a rule-based fallback in the UI on diagnosis failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Siin Xu <31815270+SiinXu@users.noreply.github.com>

… fallback Tighten this hotfix to only the misclassifications confirmed in real provider responses. Defensive keywords without a known triggering case ('exceeds the maximum', 'tls', 'ssl handshake', 'geographic', 'geo block') are removed; the diagnosis-failed UI fallback (rule-based text + ai_failed i18n key) is reverted because it is a UX improvement orthogonal to this hotfix and belongs in a separate PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Siin Xu <31815270+SiinXu@users.noreply.github.com>

ousugo

Note

This issue/comment/review was translated by Claude.

This code review primarily found one inconsistency between the PR description and the final code implementation (see inline comments for details). Other than that, the refactoring of rule categorization for newly added geo-detection and rate limiting is very clear, and the unit tests are also very comprehensive.

ousugo · 2026-05-21T17:23:22Z

+    | 'rate_limit'
    | 'context_length'
    | 'payload'
    | 'network'


Note
This comment was translated by Claude.

[B7] The PR description mentions: "On diagnosis failure, the UI shows the rule-based classification text first, then the raw error below as a small detail, with the retry button kept." However, in the final commit of this branch, this logic and the i18n translation key "ai_failed" were removed in commit "2e1f28c8a". The current code does not include this fallback UI logic, which conflicts with the PR description. I suggest updating the PR description or implementing this feature.

Original Content

[B7] PR 描述中提及："On diagnosis failure, the UI shows the rule-based classification text first, then the raw error below as a small detail, with the retry button kept."。但在分支的最终提交中，该部分逻辑和 i18n 翻译键 "ai_failed" 在 commit "2e1f28c8a" 被移除了。目前代码中并未包含此兜底 UI 逻辑，与 PR 描述存在冲突，建议更新 PR 描述或补充该功能实现。

PhoenixCPH · 2026-05-21T17:36:55Z

Note

This comment was translated by Claude.

Why are all PRs submitted by AI now?

Original Content

怎么现在都是全AI提的PR了

zhibisora

Note

This review was translated by Claude.

Two review points: complete the quota categorization/diagnostic context/tests for 402 Payment Required, and complete non-core language translations for the newly added diagnostic text.

Original Content

留下两点 review：补齐 402 Payment Required 的 quota 分类/诊断上下文/测试，以及为新增诊断文案补全非核心语言翻译。

zhibisora · 2026-05-28T07:39:19Z

  }

-  // Quota / rate limit (429)
+  // Quota / balance exhausted — check first so "429 + insufficient_balance" routes here, not to rate_limit


[A1] This split still misses plain HTTP 402 Payment Required, which several providers use for billing/quota failures. If a provider returns status 402 with a generic message, it can fall through to unknown instead of quota. Please route numStatus === 402 to quota, mirror that in ErrorDiagnosisService so the AI prompt uses quota/billing context for 402, and add unit coverage for the classifier and diagnosis hint.

zhibisora · 2026-05-28T07:39:35Z

      "payload": "Nội dung yêu cầu quá lớn, vui lòng giảm kích thước tệp hoặc văn bản",
      "proxy": "Lỗi proxy hoặc chứng chỉ SSL, hãy kiểm tra cài đặt proxy và mạng",
      "quota": "Hạn mức tài khoản đã cạn kiệt, vui lòng nạp tiền hoặc chuyển đổi nhà cung cấp",
+      "rate_limit": "[to be translated]:Too many requests in a short time. Wait a moment and retry, or switch to a model with a higher rate limit",


[C1] The new rate_limit / region diagnosis strings are user-visible, but the translate files still contain [to be translated]:... English placeholders across the non-core locales. Please provide actual translations for all touched locale files, and include the new error.http.402 translation as well if 402 handling is added.

…ations Address review feedback from #15241: - Add `numStatus === 402` to the quota branch in `errorClassifier.ts`. HTTP 402 Payment Required is the canonical billing-failure status used by several providers and gateways. Previously these fell through to `unknown` and missed the quota navTarget / context. - Mirror the 402 check in `ErrorDiagnosisService` so the AI prompt receives billing/quota context for 402, not the rate-limit context (which would advise the user to "wait and retry" — wrong for a billing failure). - Add `error.http.402` text to all 12 locale files. - Translate the previously-placeholder `rate_limit` and `region` strings across the 9 non-core locales (de, el, es, fr, ja, pt, ro, ru, vi). - Add 3 unit tests covering the 402 routing path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Siin Xu <31815270+SiinXu@users.noreply.github.com>

SiinXu · 2026-05-28T07:56:11Z

Thanks for the review — addressed all three points in 1858b4be:

A1 (402 → quota) — done. numStatus === 402 now routes to quota in errorClassifier.ts, and ErrorDiagnosisService.buildContext() mirrors the same check so the AI prompt gets billing/quota language (not the rate-limit "wait and retry" prompt). Added 3 unit tests covering 402 with explicit Payment Required, 402 with a generic message, and verification that the AI prompt picks the quota context branch.

C1 (translations) — done. error.http.402 added to all 12 locales (3 core + 9 translate). The rate_limit and region placeholders across de-de / el-gr / es-es / fr-fr / ja-jp / pt-pt / ro-ro / ru-ru / vi-vn have been replaced with proper translations. pnpm i18n:check passes.

B7 (PR description / ai_failed fallback) — resolved by updating the PR description, not by restoring the code. The deletion in 2e1f28c8 was intentional: the diagnosis-failure rule-based fallback is a UX improvement orthogonal to this hotfix and belongs in a separate PR (the commit message explains this). I've removed the three lines from the PR description that promised behavior the current code doesn't implement (the "On diagnosis failure, the UI shows the rule-based classification text…" bullet, the upstream "no actionable fallback" line, and the IIFE refactor note). The diff now matches what reviewers see in the code.

Also added 402 to the "After this PR" section in the description so it's discoverable from the PR summary.

Ready for re-review when you have a moment. @ousugo @zhibisora

SiinXu added this to the v1.9.7 milestone May 21, 2026

SiinXu mentioned this pull request May 21, 2026

hotfix(ai-diagnosis): correct misclassifications and surface provider error JSON #15155

Closed

6 tasks

ousugo reviewed May 21, 2026

View reviewed changes

zhibisora reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hotfix(ai-diagnosis): correct misclassifications and surface provider error JSON#15241

hotfix(ai-diagnosis): correct misclassifications and surface provider error JSON#15241
SiinXu wants to merge 3 commits into
mainfrom
hotfix/ai-diagnosis-clean

SiinXu commented May 21, 2026 •

edited

Loading

Uh oh!

ousugo left a comment •

edited by kangfenmao

Loading

Uh oh!

ousugo May 21, 2026 •

edited by kangfenmao

Loading

Uh oh!

PhoenixCPH commented May 21, 2026 •

edited by kangfenmao

Loading

Uh oh!

zhibisora left a comment •

edited by kangfenmao

Loading

Uh oh!

zhibisora May 28, 2026

Uh oh!

zhibisora May 28, 2026

Uh oh!

SiinXu commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

SiinXu commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Why we need it and why it was done in this way

Breaking changes

Special notes for your reviewer

Checklist

Release note

Related issues

Uh oh!

ousugo left a comment • edited by kangfenmao Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ousugo May 21, 2026 • edited by kangfenmao Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

PhoenixCPH commented May 21, 2026 • edited by kangfenmao Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zhibisora left a comment • edited by kangfenmao Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zhibisora May 28, 2026

Choose a reason for hiding this comment

Uh oh!

zhibisora May 28, 2026

Choose a reason for hiding this comment

Uh oh!

SiinXu commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

SiinXu commented May 21, 2026 •

edited

Loading

ousugo left a comment •

edited by kangfenmao

Loading

ousugo May 21, 2026 •

edited by kangfenmao

Loading

PhoenixCPH commented May 21, 2026 •

edited by kangfenmao

Loading

zhibisora left a comment •

edited by kangfenmao

Loading