Summary
When the primary provider's output layer safety filter (e.g. MiniMax's output new_sensitive (1027)) terminates a streaming response mid-delivery, Hermes does not activate the configured fallback_providers chain. Instead it loops indefinitely retrying the same content against the same provider, which hits the same filter again.
This is a separate root cause from the stale-stream timeout case (#25689). That bug requires the stale detector to fire; this one is triggered while the stream still appears healthy.
Environment
- Hermes: latest main (cea87d9)
- Provider: minimax-portal/MiniMax-M2.7 via OpenRouter
- Configured fallback chain: yes (primary → fallback → fallback-2)
- Error code: output new_sensitive (1027) — provider-side content safety filter
Reproduction
- Configure minimax-portal/MiniMax-M2.7 as primary with a fallback_providers chain in config.yaml
- Send a tool-call that produces output content large enough to exceed the provider's output safety threshold (e.g. write_file with a ~17KB markdown file, patch with large context)
- Provider begins streaming; Hermes sees live chunks
- MiniMax output layer silently terminates the SSE stream
- Hermes receives a partial stream stub: finish_reason="length" + _dropped_tool_names
- Conversation loop sends a continuation prompt: "Do NOT retry the same large tool call. Break content into smaller tool calls."
- Model retries same content → hits same new_sensitive filter → loop
Expected: After N retries, the fallback provider should be activated
Actual: Infinite retry loop against the same primary; fallback chain never activates
Root cause
Two problems compound:
1. Partial stream stub is invisible to fallback logic
In agent/chat_completion_helpers.py, when a stream delivers some chunks then is killed by the provider, the response is returned as a partial stub with finish_reason="length". This is treated by conversation_loop.py as a truncation event requiring continuation, not as a failure requiring fallback.
2. Continuation prompt retries the same cause
The new continuation prompt (introduced in cea87d9) instructs the model to retry the same content with smaller tool calls. But when the filter is content-specific (always triggered by the same content), retrying locally against the same provider is futile — it hits the same new_sensitive filter again.
The stale-detector path (_maybe_activate_fallback_on_stale_stream from #25789) is never entered: new_sensitive fires on an active, chunk-delivering stream, so no stale timeout accumulates.
Suggested fix
Option A — Classify new_sensitive as a failover signal
In agent/error_classifier.py, treat new_sensitive as FailoverReason.content_filter (or similar). The stream-stall detection path already exists; it just needs a sentinel that fires when this specific error code is observed, not just on stale timeout.
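A minimal sketch of what Option A could look like. The enum members, the marker tuple, and the `classify_stream_error` function are all hypothetical names chosen for illustration; the real layout of agent/error_classifier.py may differ.

```python
from enum import Enum
from typing import Optional


class FailoverReason(Enum):
    # Hypothetical members; the real enum in error_classifier.py may differ.
    stale_stream = "stale_stream"
    content_filter = "content_filter"


# Assumption: the provider error text contains the literal code name.
CONTENT_FILTER_MARKERS = ("new_sensitive",)


def classify_stream_error(message: str) -> Optional[FailoverReason]:
    """Map a provider error message to a failover reason, if one applies."""
    lowered = message.lower()
    if any(marker in lowered for marker in CONTENT_FILTER_MARKERS):
        return FailoverReason.content_filter
    return None  # not a recognised failover signal
```

Matching on the error text keeps the check provider-agnostic, at the cost of depending on the provider's wording staying stable.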
Option B — Treat partial stream stub + _dropped_tool_names as a failover trigger
In conversation_loop.py, when a stub response has _dropped_tool_names and finish_reason="length", and the same tool was already retried N times, activate fallback instead of sending another continuation prompt.
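One possible shape for Option B, sketched under assumptions: `StubResponse`, `TruncationTracker`, and the threshold value are hypothetical and only mirror the fields named above (`finish_reason`, `_dropped_tool_names`).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

MAX_CONTINUATIONS_PER_TOOL = 2  # hypothetical value for "N"


@dataclass
class StubResponse:
    # Hypothetical stand-in for the partial stream stub.
    finish_reason: str
    dropped_tool_names: List[str] = field(default_factory=list)


class TruncationTracker:
    """Counts truncated stubs per dropped tool within one conversation turn."""

    def __init__(self, limit: int = MAX_CONTINUATIONS_PER_TOOL) -> None:
        self.limit = limit
        self._counts: Counter = Counter()

    def should_fail_over(self, response: StubResponse) -> bool:
        # Only truncated stubs that dropped a tool call are counted.
        if response.finish_reason != "length" or not response.dropped_tool_names:
            return False
        self._counts.update(response.dropped_tool_names)
        # Fail over once any dropped tool has exceeded the continuation limit.
        return any(self._counts[name] > self.limit
                   for name in response.dropped_tool_names)

    def reset(self) -> None:
        """Clear the counts, e.g. after a successful tool call."""
        self._counts.clear()
```

With a limit of 2, the first two truncated `write_file` stubs would still get a continuation prompt and the third would route to the fallback chain.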
Option C — Sentinel in chat completion client
In agent/chat_completion_helpers.py, when new_sensitive is detected in the stream error, surface it as a classified error so the caller can route it to the fallback logic directly.
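A rough sketch of Option C. The chunk shape (dicts with `content` and `error` keys), the `ContentFilterTermination` exception, and `iter_stream` are assumptions made for illustration, not the actual stream-handling code in agent/chat_completion_helpers.py.

```python
from typing import Dict, Iterable, Iterator, List


class ContentFilterTermination(Exception):
    """Hypothetical classified error raised when the provider's filter kills a stream."""

    def __init__(self, provider: str, detail: str, partial_text: str = "") -> None:
        super().__init__(f"{provider}: {detail}")
        self.provider = provider
        self.detail = detail
        self.partial_text = partial_text  # text delivered before termination


def iter_stream(chunks: Iterable[Dict[str, str]], provider: str) -> Iterator[str]:
    """Yield text chunks, raising a classified error instead of returning a stub."""
    parts: List[str] = []
    for chunk in chunks:
        error = chunk.get("error")
        # Assumption: the filter surfaces as an error payload naming new_sensitive.
        if error and "new_sensitive" in error:
            raise ContentFilterTermination(provider, error, "".join(parts))
        text = chunk.get("content", "")
        parts.append(text)
        yield text
```

Raising a dedicated exception, rather than returning a stub with finish_reason="length", would let the caller distinguish a filter kill from a genuine truncation and route it to the fallback logic.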
Related issues
Impact
Any user whose primary provider has provider-side output content filtering (common for Chinese model providers) is affected. With new_sensitive triggering on large tool-call output, the fallback chain is effectively dead for those providers — users must manually /reset and switch providers.