fix(core): handle non-dict elements in merge_lists index lookup#35265
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a runtime TypeError in merge_lists when merging streaming content lists that contain mixed element types (e.g., str text segments alongside dict citation/reference objects).
Changes:
- Guarded the
index-based lookup inmerge_liststo only inspect dict elements in the existing merged list. - Added parametrized unit tests covering mixed
str/dictlist merging with both matching and non-matchingindexvalues.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
libs/core/langchain_core/utils/_merge.py |
Prevents invalid "index" in e_left / e_left["index"] access when merged contains non-dict elements. |
libs/core/tests/unit_tests/utils/test_utils.py |
Adds regression tests for mixed str + dict content lists (inline citations scenario). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| [{"no_index": "b"}], | ||
| [{"no_index": "a"}, {"no_index": "b"}], | ||
| ), | ||
| # Mixed str and dict elements with index (Mistral inline citations) |
There was a problem hiding this comment.
These tests pass on master.
There was a problem hiding this comment.
Good catch! Updated the test strings to contain "index" as a substring (e.g., "see index 1 for details", "reindex the data") so they properly trigger the TypeError on master. The bug requires "index" in some_string to return True (substring match), which then causes some_string["index"] to fail.
Fixes langchain-ai#35259 When streaming Mistral responses with inline citations, the content list can contain a mix of str and dict elements. The merge_lists function assumed all elements would be dicts when checking for 'index', causing a TypeError when iterating over str elements ('string indices must be integers, not str'). Add isinstance(e_left, dict) check before accessing dict keys in the list comprehension that finds merge targets by index.
2a6d5dc to
0a05585
Compare
|
Good catch! Updated the test strings to contain "index" as a substring (e.g., |
Merging this PR will improve performance by 11.37%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | test_async_callbacks_in_sync |
23.5 ms | 21.1 ms | +11.37% |
Comparing nileshhadalgi016:nileshhadalgi016/fix-merge-lists-type-error (7575656) with master (b004103)
Footnotes
-
22 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Description
Fixes #35259
When streaming Mistral responses with inline citations, the content list can contain a mix of
stranddictelements. For example:[ "start answer...", {"type": "reference", "index": 0, "reference_ids": ["iKcb2CAQ7"]}, "other answer..." ]The
merge_listsfunction in_merge.pyassumed all elements in themergedlist would be dicts when searching for a matchingindex. When it encountered astrelement, the expression"index" in e_leftwould check for substring containment (valid but wrong), and thene_left["index"]would fail withTypeError: string indices must be integers, not 'str'.Fix
Added an
isinstance(e_left, dict)guard in the list comprehension (line 114) that finds merge targets by index. This ensures only dict elements are checked for matchingindexkeys, skipping strings and other non-dict types.Tests
Added 2 new parametrized test cases to
test_merge_lists:Issue
This affects any Mistral model returning inline citations (e.g.,
mistral-medium-2505) when used with streaming and tools like Tavily search.Dependencies
None