
feat(wren-ai-service): add MiniMax as a first-class LLM provider with M2.7 default #2158

Open
octo-patch wants to merge 3 commits into Canner:main from octo-patch:feature/add-minimax-provider

Conversation

@octo-patch octo-patch commented Mar 17, 2026

Summary

  • Add MiniMax as a dedicated LLM provider (minimax_llm) with @provider decorator
  • Support MiniMax-M2.7 (default), M2.7-highspeed, M2.5, and M2.5-highspeed models
  • Built-in temperature clamping and response_format removal for MiniMax API compatibility
  • Streaming support with proper chunk handling
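The two compatibility shims above can be sketched roughly as follows (the function name and the 0.01 lower bound are hypothetical; the real logic lives in wren-ai-service/src/providers/llm/minimax.py):

```python
def sanitize_minimax_kwargs(kwargs: dict) -> dict:
    """Clamp temperature into MiniMax's valid range (0, 1.0] and drop
    parameters the MiniMax API does not accept."""
    cleaned = dict(kwargs)
    if "temperature" in cleaned:
        # MiniMax rejects temperature values outside (0, 1.0];
        # 0.01 is an assumed floor standing in for "just above zero".
        cleaned["temperature"] = min(max(cleaned["temperature"], 0.01), 1.0)
    # MiniMax's OpenAI-compatible endpoint does not support response_format
    cleaned.pop("response_format", None)
    return cleaned
```

Sanitizing a copy rather than mutating the caller's dict keeps the original generation kwargs reusable across providers.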

Changes

  • wren-ai-service/src/providers/llm/minimax.py — New MiniMax LLM provider
  • wren-ai-service/tests/pytest/providers/llm/test_minimax.py — 24 unit tests
  • wren-ai-service/docs/config_examples/config.minimax.yaml — Example config
  • wren-ai-service/docs/config_examples/README.md — Documentation
  • wren-ai-service/tests/pytest/providers/test_loader.py — Provider registration test

Models

  • MiniMax-M2.7 (default): latest flagship model with enhanced reasoning and coding
  • MiniMax-M2.7-highspeed: high-speed version of M2.7 for low-latency scenarios
  • MiniMax-M2.5: peak-performance model with a 204K context window
  • MiniMax-M2.5-highspeed: faster variant of M2.5, same 204K context window

Testing

  • 24 unit tests passing (temperature clamping, response_format removal, streaming, history messages, provider registration)
  • Integration tested with MiniMax API

Add a dedicated `minimax_llm` provider that talks directly to the MiniMax
OpenAI-compatible Chat API (https://api.minimax.io/v1).

Supported models:
- MiniMax-M2.5 (default, 204K context window)
- MiniMax-M2.5-highspeed (faster variant, same context)

Key features:
- Automatic temperature clamping to MiniMax's valid range (0, 1.0]
- Automatic response_format removal (unsupported by MiniMax)
- Streaming and non-streaming support
- Configurable API endpoint (international / China mainland)

Files added/changed:
- src/providers/llm/minimax.py — MiniMax LLM provider implementation
- docs/config_examples/config.minimax.yaml — full configuration example
- docs/config_examples/README.md — MiniMax documentation section
- tests/pytest/providers/llm/test_minimax.py — 23 unit tests
- tests/pytest/providers/test_loader.py — updated provider count

coderabbitai bot commented Mar 17, 2026

Walkthrough

Adds MiniMax LLM support: new MiniMaxLLMProvider implementation, configuration examples and README updates for MiniMax models, and comprehensive unit tests including provider registration, temperature clamping, streaming and non‑streaming behaviors.

Changes

  • Documentation & Config (wren-ai-service/docs/config_examples/README.md, wren-ai-service/docs/config_examples/config.minimax.yaml): Adds a MiniMax section to the README and a full example YAML config (LLM provider, embedder, qdrant store, engines, many pipelines, and settings) documenting models, endpoints, env vars, and usage notes.
  • LLM Provider (wren-ai-service/src/providers/llm/minimax.py): New MiniMaxLLMProvider class implementing OpenAI-compatible MiniMax API integration with streaming support, temperature clamping, response_format stripping, message conversion, retry/backoff, and text extraction from code blocks.
  • Tests & Loader (wren-ai-service/tests/pytest/providers/llm/test_minimax.py, wren-ai-service/tests/pytest/providers/test_loader.py): Adds comprehensive tests for the MiniMax provider (clamping, init, API key handling, generator behavior, streaming/non-streaming, history) and updates the loader tests to include the minimax_llm provider.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant MiniMaxProvider as MiniMaxLLMProvider
  participant MiniMaxAPI as MiniMax (OpenAI-compatible)
  participant Callback as StreamingCallback

  Client->>MiniMaxProvider: request generation(system, history, generation_kwargs)
  MiniMaxProvider->>MiniMaxAPI: send messages (converted OpenAI format), stream?=true/false
  alt streaming
    MiniMaxAPI-->>MiniMaxProvider: stream chunks
    MiniMaxProvider->>Callback: emit StreamingChunk(s)
    MiniMaxProvider->>Client: final aggregated text + metadata
  else non-streaming
    MiniMaxAPI-->>MiniMaxProvider: final completion
    MiniMaxProvider->>Client: parsed text (extract code blocks) + metadata
  end
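The streaming branch of the diagram can be modeled in a few lines (a simplified stand-in: real chunks are StreamingChunk objects carrying metadata, not plain strings):

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional

async def aggregate_stream(
    chunks: AsyncIterator[str],
    callback: Optional[Callable[[str], None]] = None,
) -> str:
    """Forward each delta to the callback, then return the aggregated text."""
    parts: List[str] = []
    async for delta in chunks:
        if callback:
            callback(delta)   # emit the StreamingChunk-style delta to the caller
        parts.append(delta)
    return "".join(parts)     # final aggregated response text

async def demo() -> str:
    async def fake_stream():
        for d in ("SELECT ", "* FROM ", "orders;"):
            yield d
    seen: List[str] = []
    text = await aggregate_stream(fake_stream(), seen.append)
    assert seen == ["SELECT ", "* FROM ", "orders;"]
    return text
```

Running `asyncio.run(demo())` yields the concatenated text, mirroring the "final aggregated text + metadata" step above.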

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested labels

wren-ai-service, ai-env-changed

Suggested reviewers

  • cyyeh
  • imAsterSun
  • yichieh-lu

Poem

🐰 I hopped in code to add a new friend,
MiniMax replies that stream and mend,
Docs and configs snug in a row,
Tests to prove the outputs flow,
A rabbit cheers — ready, go! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 6.45%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title directly and accurately summarizes the main change, adding MiniMax as a first-class LLM provider with M2.7 as the default model.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
wren-ai-service/tests/pytest/providers/llm/test_minimax.py (2)

64-67: Test relies on environment not having MINIMAX_API_KEY set.

This test passes api_key_name=None which bypasses env lookup entirely, so test isolation is fine. However, the comment could be clearer that it's testing the api_key_name=None path specifically, not the "missing env var" path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/tests/pytest/providers/llm/test_minimax.py` around lines 64 -
67, The test comment is misleading about what it's verifying; update the comment
in test_no_api_key to explicitly state that it tests the api_key_name=None code
path (which supplies a "placeholder" API key to allow client creation) rather
than asserting behavior when the MINIMAX_API_KEY environment variable is
missing; reference the test function name test_no_api_key and the
MiniMaxLLMProvider instantiation and provider._client.api_key assertion so
reviewers can find and clarify the comment.

86-107: Consider adding a streaming callback test.

The generator tests cover non-streaming scenarios well, but the streaming path (when streaming_callback is provided) is not exercised. Consider adding a test that:

  1. Provides a mock streaming_callback
  2. Returns an async iterable from the mocked client
  3. Verifies the callback is invoked and the response is aggregated correctly

This would ensure the streaming code path (lines 155-187 in minimax.py) is tested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/tests/pytest/providers/llm/test_minimax.py` around lines 86 -
107, Add a test exercising the streaming path for
MiniMaxLLMProvider.get_generator: create a test (e.g.,
test_streaming_invokes_callback_and_aggregates_response) that patches
MINIMAX_API_KEY, constructs MiniMaxLLMProvider with a mock streaming_callback,
and mocks the underlying client method used for streaming to return an async
iterable of chunks; call provider.get_generator(), run the returned async
generator to completion, assert the streaming_callback was called with expected
chunk(s), and assert the final aggregated response matches the concatenated
chunks (verifying the streaming path and aggregation logic in
get_generator/streaming_callback handling).
wren-ai-service/src/providers/llm/minimax.py (2)

26-29: _extract_braces_content only extracts JSON objects, not arrays.

The regex \{.*?\} will not match JSON arrays (e.g., [...]). If MiniMax responses might contain JSON arrays in code blocks, this helper will return the original string with the markdown wrapper intact.

If array responses are expected, consider extending the pattern:

♻️ Optional: Support JSON arrays
 def _extract_braces_content(resp: str) -> str:
     """Extract JSON from a markdown code block, or return the original string."""
-    match = re.search(r"```json\s*(\{.*?\})\s*```", resp, re.DOTALL)
+    match = re.search(r"```json\s*([\{\[].*?[\}\]])\s*```", resp, re.DOTALL)
     return match.group(1) if match else resp
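Assuming the suggested pattern, a standalone sketch shows it now catches both JSON objects and arrays while leaving non-matching input untouched:

```python
import re

def extract_json_block(resp: str) -> str:
    # Extended pattern from the review suggestion: accept either a JSON
    # object ({...}) or a JSON array ([...]) inside a fenced json block.
    match = re.search(r"```json\s*([\{\[].*?[\}\]])\s*```", resp, re.DOTALL)
    return match.group(1) if match else resp
```

The non-greedy `.*?` with `re.DOTALL` keeps the match confined to the first fenced block even when the response spans multiple lines.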
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/src/providers/llm/minimax.py` around lines 26 - 29, The
helper _extract_braces_content currently only captures JSON objects because its
regex uses \{.*?\}; update the pattern used in re.search inside
_extract_braces_content to accept either an object or an array (e.g., match
opening brace or bracket and the corresponding closing char) so code blocks like
```json [...]``` are extracted; adjust the regex and keep re.DOTALL, then return
match.group(1) if found as before.

87-90: Consider adding a comment to clarify the kwargs merge order design. The pattern—where self._model_kwargs take precedence over get_generator() generation_kwargs—is intentional and mirrors the LitellmLLMProvider. However, inside _run(), per-call generation_kwargs override all previous values. A brief comment explaining this layered precedence would help future maintainers understand the design.
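The layered precedence being described can be illustrated with plain dict merges (the variable names here are illustrative, not the provider's actual code):

```python
def merge_generation_kwargs(
    generator_kwargs: dict,   # passed to get_generator()
    model_kwargs: dict,       # self._model_kwargs from the model config
    per_call_kwargs: dict,    # passed into _run() at call time
) -> dict:
    """Later merges win: get_generator() kwargs < model config < per-call kwargs."""
    combined = {**generator_kwargs, **model_kwargs}   # model config overrides generator defaults
    return {**combined, **per_call_kwargs}            # per-call values override everything
```

A one-line comment like this docstring, placed above the actual merge, is all the original nitpick asks for.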

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/src/providers/llm/minimax.py` around lines 87 - 90, Add a
short clarifying comment above the combined_generation_kwargs merge explaining
the intended precedence: base defaults < (generation_kwargs from
get_generator()) < self._model_kwargs, and note that per-call generation_kwargs
passed into _run() override everything; reference the variables
combined_generation_kwargs, generation_kwargs, self._model_kwargs, and the
_run() and get_generator() behavior, and mention this mirrors LitellmLLMProvider
so future maintainers understand the layered override design.
wren-ai-service/docs/config_examples/config.minimax.yaml (1)

19-33: Consider lowering default temperature for SQL generation use cases.

Both model configurations use temperature: 1.0, which is at the maximum allowed value. For SQL generation tasks that require deterministic and accurate outputs, a lower temperature (e.g., 0.3-0.7) is typically preferred to reduce variability.

Since this is an example config, users will likely copy these values. Consider using a more conservative default or adding a comment noting that lower temperatures are recommended for SQL generation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/docs/config_examples/config.minimax.yaml` around lines 19 -
33, Update the example configs for models "MiniMax-M2.5" and
"MiniMax-M2.5-highspeed" to use a lower default "temperature" (e.g., 0.3–0.5)
instead of 1.0 in their kwargs to make SQL generation more deterministic; also
add a short inline comment near the "temperature" setting noting that lower
values are recommended for SQL/structured-output use cases so users copying the
example adopt safer defaults.


📥 Commits

Reviewing files that changed from the base of the PR and between eea642f and d7d12dd.

📒 Files selected for processing (6)
  • wren-ai-service/docs/config_examples/README.md
  • wren-ai-service/docs/config_examples/config.minimax.yaml
  • wren-ai-service/src/providers/llm/minimax.py
  • wren-ai-service/tests/pytest/providers/llm/__init__.py
  • wren-ai-service/tests/pytest/providers/llm/test_minimax.py
  • wren-ai-service/tests/pytest/providers/test_loader.py

Comment on lines +1 to +6
# you should rename this file to config.yaml and put it in ~/.wrenai
# please pay attention to the comments starting with # and adjust the config accordingly, 3 steps basically:
# 1. you need to use your own llm and embedding models
# 2. fill in embedding model dimension in the document_store section
# 3. you need to use the correct pipe definitions based on https://raw.githubusercontent.com/canner/WrenAI/<WRENAI_VERSION_NUMBER>/docker/config.example.yaml
# 4. you need to fill in correct llm and embedding models in the pipe definitions

⚠️ Potential issue | 🟡 Minor

Minor: Comment lists 4 steps but says "3 steps basically".

The comment on line 2 mentions "3 steps basically" but then lists steps 1 through 4.

📝 Suggested fix
-# please pay attention to the comments starting with # and adjust the config accordingly, 3 steps basically:
+# please pay attention to the comments starting with # and adjust the config accordingly, 4 steps basically:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/docs/config_examples/config.minimax.yaml` around lines 1 - 6,
The header comment incorrectly says "3 steps basically" while enumerating four
steps; update the comment so the count matches the list—either change the phrase
"3 steps basically" to "4 steps basically" or remove/merge one of the numbered
items so there are three steps; locate the top-of-file comment containing the
phrase "3 steps basically" and edit it accordingly to keep the comment
consistent with the enumerated steps.

Comment on lines +155 to +167
if streaming_callback is not None:
    chunks: List[StreamingChunk] = []

    last_chunk = None
    async for chunk in completion:
        if chunk.choices and streaming_callback:
            chunk_delta: StreamingChunk = build_chunk(chunk)
            chunks.append(chunk_delta)
            streaming_callback(chunk_delta, query_id)
        last_chunk = chunk

    complete_response = ChatMessage.from_assistant(
        "".join([c.content for c in chunks])

⚠️ Potential issue | 🔴 Critical


Fix the streaming callback type signature mismatch.

The streaming_callback parameter is declared with type hint Callable[[StreamingChunk], None] (line 85) but is called with two arguments: streaming_callback(chunk_delta, query_id) (line 163). This type signature is incorrect and creates a contract violation. All pipeline implementations (sql_answer.py, user_guide_assistance.py, etc.) define callbacks accepting two arguments (chunk, query_id), confirming the actual signature should be Callable[[StreamingChunk, str], None]. Update the type hint on line 85 to match the actual usage.
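The corrected contract can be expressed as a type alias (a sketch with a stand-in StreamingChunk dataclass, not Haystack's actual class):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class StreamingChunk:
    # Stand-in for Haystack's StreamingChunk; only the field used here.
    content: str

# The corrected contract: callbacks take the chunk AND the query id,
# matching how the provider actually invokes them.
StreamingCallbackT = Callable[[StreamingChunk, Optional[str]], None]

def emit(callback: StreamingCallbackT, delta: StreamingChunk, query_id: Optional[str]) -> None:
    # Mirrors the provider's call site: streaming_callback(chunk_delta, query_id)
    callback(delta, query_id)
```

With this alias, a type checker flags any pipeline callback that accepts only the chunk, catching the mismatch the review describes.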

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/src/providers/llm/minimax.py` around lines 155 - 167, The
streaming_callback type declaration is wrong: update the type hint for the
streaming_callback parameter in minimax.py from Callable[[StreamingChunk], None]
to Callable[[StreamingChunk, str], None] (or Optional[Callable[[StreamingChunk,
str], None]] if currently optional) so it matches actual usage where
streaming_callback(chunk_delta, query_id) is called; ensure the function
signature that declares streaming_callback and any related import/typing
(StreamingChunk, Callable/Optional) are adjusted accordingly.

- Add MiniMax-M2.7 and MiniMax-M2.7-highspeed to model list
- Set MiniMax-M2.7 as default model
- Keep all previous models (M2.5, M2.5-highspeed) as alternatives
- Update config example and documentation
- Update related unit tests
@octo-patch octo-patch changed the title feat(wren-ai-service): add MiniMax as a first-class LLM provider feat(wren-ai-service): add MiniMax as LLM provider with M2.7 default Mar 18, 2026
@octo-patch octo-patch changed the title feat(wren-ai-service): add MiniMax as LLM provider with M2.7 default feat(wren-ai-service): add MiniMax as a first-class LLM provider with M2.7 default Mar 18, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
wren-ai-service/src/providers/llm/minimax.py (1)

83-88: ⚠️ Potential issue | 🟠 Major

Fix the streaming callback contract.

Line 87 advertises a one-argument callback, but Line 165 always calls it with (chunk_delta, query_id), and Line 112 allows that second argument to be None. A caller that follows the current annotation will crash on the first streamed chunk.

💡 Suggested fix
-        streaming_callback: Optional[Callable[[StreamingChunk], None]] = None,
+        streaming_callback: Optional[
+            Callable[[StreamingChunk, Optional[str]], None]
+        ] = None,

Also applies to: 112-112, 165-165

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/src/providers/llm/minimax.py` around lines 83 - 88, The
streaming callback type in get_generator is wrong: update the signature for
streaming_callback to accept two parameters (StreamingChunk and Optional[str])
and ensure all call sites use that contract; specifically change the annotation
on get_generator's streaming_callback to Callable[[StreamingChunk,
Optional[str]], None] (or Callable[[StreamingChunk, str], None] if you decide
query_id must never be None) and ensure the invocation that calls
streaming_callback(chunk_delta, query_id) and any places that pass None (line
112) are adjusted to match the chosen contract so callers won't crash.


📥 Commits

Reviewing files that changed from the base of the PR and between d7d12dd and bdc0ae5.

📒 Files selected for processing (4)
  • wren-ai-service/docs/config_examples/README.md
  • wren-ai-service/docs/config_examples/config.minimax.yaml
  • wren-ai-service/src/providers/llm/minimax.py
  • wren-ai-service/tests/pytest/providers/llm/test_minimax.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • wren-ai-service/docs/config_examples/README.md

Comment on lines +66 to +71
api_key = os.getenv(api_key_name) if api_key_name else None
if not api_key:
    logger.warning(
        "MINIMAX_API_KEY is not set. MiniMax provider will not work "
        "until a valid API key is provided."
    )

⚠️ Potential issue | 🟡 Minor

Log the configured env var name.

When api_key_name is overridden, Lines 68-70 still tell operators to set MINIMAX_API_KEY, which points them at the wrong variable during troubleshooting.

💡 Suggested fix
-        if not api_key:
+        if not api_key:
+            missing_key_name = api_key_name or "MiniMax API key"
             logger.warning(
-                "MINIMAX_API_KEY is not set. MiniMax provider will not work "
-                "until a valid API key is provided."
+                "%s is not set. MiniMax provider will not work until a valid "
+                "API key is provided.",
+                missing_key_name,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/src/providers/llm/minimax.py` around lines 66 - 71, Update
the warning in minimax.py to reference the actual environment variable name used
(api_key_name) instead of hardcoding "MINIMAX_API_KEY": when api_key is missing,
include api_key_name in the logger.warning message so operators see which env
var to set; locate the api_key/api_key_name logic and the logger.warning call in
the module and change the message to mention the resolved api_key_name (or its
default) for accurate troubleshooting.

Comment on lines +103 to +105
@backoff.on_exception(
    backoff.expo, openai.APIError, max_time=60.0, max_tries=3
)

⚠️ Potential issue | 🔴 Critical


Move backoff retry logic outside the streaming callback path.

Lines 103-105 wrap the entire coroutine in @backoff.on_exception(..., openai.APIError, ...). If the stream fails after line 165 emits chunks, retry replays the prefix and corrupts the caller's streamed output. This also unnecessarily retries permanent request/auth failures. Refactor to limit retries to transient SDK errors only—the API call (lines 149-154) should be wrapped separately from the streaming loop (lines 161-165). Add a streaming pytest test to prevent regression.
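One way to realize this split, sketched with stdlib-only retry logic and hypothetical names (the PR itself uses the backoff library and the OpenAI SDK):

```python
import asyncio

class TransientAPIError(Exception):
    """Stand-in for a transient SDK error (e.g. a connection failure)."""

async def open_stream_with_retry(create_fn, max_tries=3, base_delay=0.01):
    # Retry only the request that opens the stream; nothing has been
    # emitted yet, so a retry here is always safe.
    for attempt in range(max_tries):
        try:
            return await create_fn()
        except TransientAPIError:
            if attempt == max_tries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))  # exponential backoff

async def run_streaming(create_fn, callback):
    stream = await open_stream_with_retry(create_fn)
    async for delta in stream:   # the streaming loop is NOT retried,
        callback(delta)          # so each delta reaches the caller exactly once
```

A mid-stream failure in `run_streaming` now propagates instead of silently replaying already-delivered chunks, which is exactly the behavior the suggested pytest would lock in.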

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@wren-ai-service/src/providers/llm/minimax.py` around lines 103 - 105, The
backoff retry decorator currently wraps the whole streaming coroutine (using
`@backoff.on_exception`(backoff.expo, openai.APIError,...)), causing retries to
replay emitted chunks and to retry non-transient failures; change this by
removing the decorator from the streaming callback and instead wrap only the
actual API request invocation (the openai API call at lines ~149-154) with a
retry limited to transient SDK/network errors (e.g., openai.APIError or a
transient-only predicate), leaving the streaming loop that yields chunks (the
loop around lines ~161-165) outside of any retry logic so already-emitted chunks
are not replayed; also add a pytest that simulates a stream that fails
mid-stream to assert no duplicate/replayed chunks on retry to prevent
regression.
