feat(wren-ai-service): add MiniMax as a first-class LLM provider with M2.7 default#2158
octo-patch wants to merge 3 commits into Canner:main
Conversation
Add a dedicated `minimax_llm` provider that talks directly to the MiniMax OpenAI-compatible Chat API (https://api.minimax.io/v1).

Supported models:
- MiniMax-M2.5 (default, 204K context window)
- MiniMax-M2.5-highspeed (faster variant, same context)

Key features:
- Automatic temperature clamping to MiniMax's valid range (0, 1.0]
- Automatic `response_format` removal (unsupported by MiniMax)
- Streaming and non-streaming support
- Configurable API endpoint (international / China mainland)

Files added/changed:
- src/providers/llm/minimax.py — MiniMax LLM provider implementation
- docs/config_examples/config.minimax.yaml — full configuration example
- docs/config_examples/README.md — MiniMax documentation section
- tests/pytest/providers/llm/test_minimax.py — 23 unit tests
- tests/pytest/providers/test_loader.py — updated provider count
Walkthrough
Adds MiniMax LLM support: a new MiniMaxLLMProvider implementation, configuration examples and README updates for MiniMax models, and comprehensive unit tests covering provider registration, temperature clamping, and streaming and non-streaming behaviors.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Client
    participant MiniMaxProvider as MiniMaxLLMProvider
    participant MiniMaxAPI as MiniMax (OpenAI-compatible)
    participant Callback as StreamingCallback
    Client->>MiniMaxProvider: request generation(system, history, generation_kwargs)
    MiniMaxProvider->>MiniMaxAPI: send messages (converted OpenAI format), stream?=true/false
    alt streaming
        MiniMaxAPI-->>MiniMaxProvider: stream chunks
        MiniMaxProvider->>Callback: emit StreamingChunk(s)
        MiniMaxProvider->>Client: final aggregated text + metadata
    else non-streaming
        MiniMaxAPI-->>MiniMaxProvider: final completion
        MiniMaxProvider->>Client: parsed text (extract code blocks) + metadata
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (5)
wren-ai-service/tests/pytest/providers/llm/test_minimax.py (2)
64-67: Test relies on the environment not having MINIMAX_API_KEY set.
This test passes `api_key_name=None`, which bypasses env lookup entirely, so test isolation is fine. However, the comment could be clearer that it's testing the `api_key_name=None` path specifically, not the "missing env var" path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/tests/pytest/providers/llm/test_minimax.py` around lines 64 - 67, The test comment is misleading about what it's verifying; update the comment in test_no_api_key to explicitly state that it tests the api_key_name=None code path (which supplies a "placeholder" API key to allow client creation) rather than asserting behavior when the MINIMAX_API_KEY environment variable is missing; reference the test function name test_no_api_key and the MiniMaxLLMProvider instantiation and provider._client.api_key assertion so reviewers can find and clarify the comment.
86-107: Consider adding a streaming callback test.
The generator tests cover non-streaming scenarios well, but the streaming path (when `streaming_callback` is provided) is not exercised. Consider adding a test that:
- Provides a mock `streaming_callback`
- Returns an async iterable from the mocked client
- Verifies the callback is invoked and the response is aggregated correctly
This would ensure the streaming code path (lines 155-187 in minimax.py) is tested.
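A self-contained sketch of such a test's shape, using stand-ins (`fake_stream` and `consume` below are illustrative, not the provider's real client or generator):

```python
import asyncio
from unittest.mock import MagicMock

async def fake_stream():
    # Stand-in for the chunks an OpenAI-compatible client would stream back.
    for delta in ["SELECT ", "1", ";"]:
        yield delta

async def consume(stream, callback, query_id):
    # Mirrors the aggregation pattern under test: invoke the callback once
    # per chunk, then return the concatenated text.
    parts = []
    async for delta in stream:
        callback(delta, query_id)
        parts.append(delta)
    return "".join(parts)

callback = MagicMock()
result = asyncio.run(consume(fake_stream(), callback, "query-1"))
```

In a real pytest test, `consume` would be replaced by running the generator returned from `provider.get_generator()` with the client method mocked to return the async iterable.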
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/tests/pytest/providers/llm/test_minimax.py` around lines 86 - 107, Add a test exercising the streaming path for MiniMaxLLMProvider.get_generator: create a test (e.g., test_streaming_invokes_callback_and_aggregates_response) that patches MINIMAX_API_KEY, constructs MiniMaxLLMProvider with a mock streaming_callback, and mocks the underlying client method used for streaming to return an async iterable of chunks; call provider.get_generator(), run the returned async generator to completion, assert the streaming_callback was called with expected chunk(s), and assert the final aggregated response matches the concatenated chunks (verifying the streaming path and aggregation logic in get_generator/streaming_callback handling).
wren-ai-service/src/providers/llm/minimax.py (2)
26-29: `_extract_braces_content` only extracts JSON objects, not arrays.
The regex `\{.*?\}` will not match JSON arrays (e.g., `[...]`). If MiniMax responses might contain JSON arrays in code blocks, this helper will return the original string with the markdown wrapper intact. If array responses are expected, consider extending the pattern:
♻️ Optional: Support JSON arrays
````diff
 def _extract_braces_content(resp: str) -> str:
     """Extract JSON from a markdown code block, or return the original string."""
-    match = re.search(r"```json\s*(\{.*?\})\s*```", resp, re.DOTALL)
+    match = re.search(r"```json\s*([\{\[].*?[\}\]])\s*```", resp, re.DOTALL)
     return match.group(1) if match else resp
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/src/providers/llm/minimax.py` around lines 26 - 29, The helper _extract_braces_content currently only captures JSON objects because its regex uses \{.*?\}; update the pattern used in re.search inside _extract_braces_content to accept either an object or an array (e.g., match opening brace or bracket and the corresponding closing char) so code blocks like ```json [...]``` are extracted; adjust the regex and keep re.DOTALL, then return match.group(1) if found as before.
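The extended pattern suggested above can be checked in isolation; `extract_json_block` is an illustrative standalone version of the fix (the fence delimiter is built programmatically here only to avoid nesting backticks in this example):

```python
import re

FENCE = "`" * 3  # the markdown code-fence delimiter

def extract_json_block(resp: str) -> str:
    # Illustrative version of the suggested fix: accept a JSON object OR
    # array inside a json-fenced block; otherwise return the input unchanged.
    pattern = FENCE + r"json\s*([\{\[].*?[\}\]])\s*" + FENCE
    match = re.search(pattern, resp, re.DOTALL)
    return match.group(1) if match else resp
```

Because the quantifier is lazy, regex backtracking still recovers nested objects: the match is forced to extend until the closing fence follows.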
87-90: Consider adding a comment to clarify the kwargs merge order design.
The pattern, where `self._model_kwargs` takes precedence over `get_generator()` generation_kwargs, is intentional and mirrors the LitellmLLMProvider. However, inside `_run()`, per-call generation_kwargs override all previous values. A brief comment explaining this layered precedence would help future maintainers understand the design.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/src/providers/llm/minimax.py` around lines 87 - 90, Add a short clarifying comment above the combined_generation_kwargs merge explaining the intended precedence: base defaults < (generation_kwargs from get_generator()) < self._model_kwargs, and note that per-call generation_kwargs passed into _run() override everything; reference the variables combined_generation_kwargs, generation_kwargs, self._model_kwargs, and the _run() and get_generator() behavior, and mention this mirrors LitellmLLMProvider so future maintainers understand the layered override design.
wren-ai-service/docs/config_examples/config.minimax.yaml (1)
19-33: Consider lowering the default temperature for SQL generation use cases.
Both model configurations use `temperature: 1.0`, which is the maximum allowed value. For SQL generation tasks that require deterministic and accurate outputs, a lower temperature (e.g., 0.3-0.7) is typically preferred to reduce variability. Since this is an example config, users will likely copy these values; consider using a more conservative default or adding a comment noting that lower temperatures are recommended for SQL generation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/docs/config_examples/config.minimax.yaml` around lines 19 - 33, Update the example configs for models "MiniMax-M2.5" and "MiniMax-M2.5-highspeed" to use a lower default "temperature" (e.g., 0.3–0.5) instead of 1.0 in their kwargs to make SQL generation more deterministic; also add a short inline comment near the "temperature" setting noting that lower values are recommended for SQL/structured-output use cases so users copying the example adopt safer defaults.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8cedf33f-9014-4d6c-bdab-96c99ac99c5a
📒 Files selected for processing (6)
- wren-ai-service/docs/config_examples/README.md
- wren-ai-service/docs/config_examples/config.minimax.yaml
- wren-ai-service/src/providers/llm/minimax.py
- wren-ai-service/tests/pytest/providers/llm/__init__.py
- wren-ai-service/tests/pytest/providers/llm/test_minimax.py
- wren-ai-service/tests/pytest/providers/test_loader.py
```yaml
# you should rename this file to config.yaml and put it in ~/.wrenai
# please pay attention to the comments starting with # and adjust the config accordingly, 3 steps basically:
# 1. you need to use your own llm and embedding models
# 2. fill in embedding model dimension in the document_store section
# 3. you need to use the correct pipe definitions based on https://raw.githubusercontent.com/canner/WrenAI/<WRENAI_VERSION_NUMBER>/docker/config.example.yaml
# 4. you need to fill in correct llm and embedding models in the pipe definitions
```
Minor: Comment lists 4 steps but says "3 steps basically".
The comment on line 2 mentions "3 steps basically" but then lists steps 1 through 4.
📝 Suggested fix
```diff
-# please pay attention to the comments starting with # and adjust the config accordingly, 3 steps basically:
+# please pay attention to the comments starting with # and adjust the config accordingly, 4 steps basically:
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
# you should rename this file to config.yaml and put it in ~/.wrenai
# please pay attention to the comments starting with # and adjust the config accordingly, 4 steps basically:
# 1. you need to use your own llm and embedding models
# 2. fill in embedding model dimension in the document_store section
# 3. you need to use the correct pipe definitions based on https://raw.githubusercontent.com/canner/WrenAI/<WRENAI_VERSION_NUMBER>/docker/config.example.yaml
# 4. you need to fill in correct llm and embedding models in the pipe definitions
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@wren-ai-service/docs/config_examples/config.minimax.yaml` around lines 1 - 6,
The header comment incorrectly says "3 steps basically" while enumerating four
steps; update the comment so the count matches the list—either change the phrase
"3 steps basically" to "4 steps basically" or remove/merge one of the numbered
items so there are three steps; locate the top-of-file comment containing the
phrase "3 steps basically" and edit it accordingly to keep the comment
consistent with the enumerated steps.
```python
if streaming_callback is not None:
    chunks: List[StreamingChunk] = []

    last_chunk = None
    async for chunk in completion:
        if chunk.choices and streaming_callback:
            chunk_delta: StreamingChunk = build_chunk(chunk)
            chunks.append(chunk_delta)
            streaming_callback(chunk_delta, query_id)
            last_chunk = chunk

    complete_response = ChatMessage.from_assistant(
        "".join([c.content for c in chunks])
```
Fix the streaming callback type signature mismatch.
The streaming_callback parameter is declared with type hint Callable[[StreamingChunk], None] (line 85) but is called with two arguments: streaming_callback(chunk_delta, query_id) (line 163). This type signature is incorrect and creates a contract violation. All pipeline implementations (sql_answer.py, user_guide_assistance.py, etc.) define callbacks accepting two arguments (chunk, query_id), confirming the actual signature should be Callable[[StreamingChunk, str], None]. Update the type hint on line 85 to match the actual usage.
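A minimal sketch of the corrected contract (the `StreamingChunk` dataclass and `emit` helper below are simplified stand-ins, not Haystack's or the provider's actual classes):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class StreamingChunk:
    # Simplified stand-in for the real StreamingChunk type.
    content: str

# The callback takes BOTH the chunk and the query_id, matching the call site
# streaming_callback(chunk_delta, query_id).
StreamingCallback = Callable[[StreamingChunk, Optional[str]], None]

def emit(callback: Optional[StreamingCallback],
         chunk: StreamingChunk,
         query_id: Optional[str]) -> None:
    # Guarded invocation mirroring the provider's streaming loop.
    if callback is not None:
        callback(chunk, query_id)
```

With this annotation, a type checker would flag any caller still passing a one-argument callback instead of failing at runtime on the first streamed chunk.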
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@wren-ai-service/src/providers/llm/minimax.py` around lines 155 - 167, The
streaming_callback type declaration is wrong: update the type hint for the
streaming_callback parameter in minimax.py from Callable[[StreamingChunk], None]
to Callable[[StreamingChunk, str], None] (or Optional[Callable[[StreamingChunk,
str], None]] if currently optional) so it matches actual usage where
streaming_callback(chunk_delta, query_id) is called; ensure the function
signature that declares streaming_callback and any related import/typing
(StreamingChunk, Callable/Optional) are adjusted accordingly.
- Add MiniMax-M2.7 and MiniMax-M2.7-highspeed to model list
- Set MiniMax-M2.7 as default model
- Keep all previous models (M2.5, M2.5-highspeed) as alternatives
- Update config example and documentation
- Update related unit tests
Actionable comments posted: 2
♻️ Duplicate comments (1)
wren-ai-service/src/providers/llm/minimax.py (1)
83-88: ⚠️ Potential issue | 🟠 Major: Fix the streaming callback contract.
Line 87 advertises a one-argument callback, but Line 165 always calls it with `(chunk_delta, query_id)`, and Line 112 allows that second argument to be `None`. A caller that follows the current annotation will crash on the first streamed chunk.
💡 Suggested fix
```diff
-        streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
+        streaming_callback: Optional[
+            Callable[[StreamingChunk, Optional[str]], None]
+        ] = None,
```
Also applies to: 112-112, 165-165
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wren-ai-service/src/providers/llm/minimax.py` around lines 83 - 88, The streaming callback type in get_generator is wrong: update the signature for streaming_callback to accept two parameters (StreamingChunk and Optional[str]) and ensure all call sites use that contract; specifically change the annotation on get_generator's streaming_callback to Callable[[StreamingChunk, Optional[str]], None] (or Callable[[StreamingChunk, str], None] if you decide query_id must never be None) and ensure the invocation that calls streaming_callback(chunk_delta, query_id) and any places that pass None (line 112) are adjusted to match the chosen contract so callers won't crash.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 010cafd4-cfeb-4afd-9880-8cea5d1ac339
📒 Files selected for processing (4)
- wren-ai-service/docs/config_examples/README.md
- wren-ai-service/docs/config_examples/config.minimax.yaml
- wren-ai-service/src/providers/llm/minimax.py
- wren-ai-service/tests/pytest/providers/llm/test_minimax.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/docs/config_examples/README.md
```python
api_key = os.getenv(api_key_name) if api_key_name else None
if not api_key:
    logger.warning(
        "MINIMAX_API_KEY is not set. MiniMax provider will not work "
        "until a valid API key is provided."
    )
```
Log the configured env var name.
When api_key_name is overridden, Lines 68-70 still tell operators to set MINIMAX_API_KEY, which points them at the wrong variable during troubleshooting.
💡 Suggested fix
```diff
-    if not api_key:
+    if not api_key:
+        missing_key_name = api_key_name or "MiniMax API key"
         logger.warning(
-            "MINIMAX_API_KEY is not set. MiniMax provider will not work "
-            "until a valid API key is provided."
+            "%s is not set. MiniMax provider will not work until a valid "
+            "API key is provided.",
+            missing_key_name,
         )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@wren-ai-service/src/providers/llm/minimax.py` around lines 66 - 71, Update
the warning in minimax.py to reference the actual environment variable name used
(api_key_name) instead of hardcoding "MINIMAX_API_KEY": when api_key is missing,
include api_key_name in the logger.warning message so operators see which env
var to set; locate the api_key/api_key_name logic and the logger.warning call in
the module and change the message to mention the resolved api_key_name (or its
default) for accurate troubleshooting.
```python
@backoff.on_exception(
    backoff.expo, openai.APIError, max_time=60.0, max_tries=3
)
```
Move backoff retry logic outside the streaming callback path.
Lines 103-105 wrap the entire coroutine in @backoff.on_exception(..., openai.APIError, ...). If the stream fails after line 165 emits chunks, retry replays the prefix and corrupts the caller's streamed output. This also unnecessarily retries permanent request/auth failures. Refactor to limit retries to transient SDK errors only—the API call (lines 149-154) should be wrapped separately from the streaming loop (lines 161-165). Add a streaming pytest test to prevent regression.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@wren-ai-service/src/providers/llm/minimax.py` around lines 103 - 105, The
backoff retry decorator currently wraps the whole streaming coroutine (using
`@backoff.on_exception`(backoff.expo, openai.APIError,...)), causing retries to
replay emitted chunks and to retry non-transient failures; change this by
removing the decorator from the streaming callback and instead wrap only the
actual API request invocation (the openai API call at lines ~149-154) with a
retry limited to transient SDK/network errors (e.g., openai.APIError or a
transient-only predicate), leaving the streaming loop that yields chunks (the
loop around lines ~161-165) outside of any retry logic so already-emitted chunks
are not replayed; also add a pytest that simulates a stream that fails
mid-stream to assert no duplicate/replayed chunks on retry to prevent
regression.
Summary
- New LLM provider (`minimax_llm`) with `@provider` decorator
- `response_format` removal for MiniMax API compatibility

Changes
- wren-ai-service/src/providers/llm/minimax.py — New MiniMax LLM provider
- wren-ai-service/tests/pytest/providers/llm/test_minimax.py — 24 unit tests
- wren-ai-service/docs/config_examples/config.minimax.yaml — Example config
- wren-ai-service/docs/config_examples/README.md — Documentation
- wren-ai-service/tests/pytest/providers/test_loader.py — Provider registration test

Models
- MiniMax-M2.7 (default)
- MiniMax-M2.7-highspeed
- MiniMax-M2.5
- MiniMax-M2.5-highspeed

Testing