feat: add agentic RAG with iterative sub-agent search by MattiaMolon · Pull Request #180 · superlinear-ai/raglite

MattiaMolon · 2026-03-03T15:01:40Z

Summary

Introduces a search sub-agent that iteratively queries the knowledge base to gather sufficient context before answering, replacing the single-shot retrieval approach
Parallel tool calls are executed concurrently via ThreadPoolExecutor, with context proportionally allocated across results within the model's window
Adds async_rag to mirror the sync rag function with full async streaming support

Key changes

Sub-agent loop (_run_tool): when search_knowledge_base is called, a dedicated agent with SEARCH_AGENT_PROMPT iterates up to config.agentic_iterations, deduplicating chunk spans by chunk ID across iterations
Token budget fix: buffer now counts all messages (not just last per role) and reserves space for the LLM output, preventing context overflow
Robustness fixes: safe fallback in _clip always preserves at least the last user message; _get_tools handles None content; _limit_chunkspans guards against zero-token edge case
Prompt improvements: SEARCH_AGENT_PROMPT guides the sub-agent with concrete good/bad query examples; NO_TOOLS_FOLLOW_UP_PROMPT prevents tool calls in the final answer step

Tests

test_rag_manual — manual retrieval via retrieve_context + add_context, asserts no tool calls are made
test_rag_auto_with_retrieval — agentic RAG on a question that requires retrieval; checks tool messages appear and on_retrieval callback is populated
test_rag_auto_without_retrieval — agentic RAG on a trivial question; verifies no retrieval occurs
test_retrieve_context_self_query — retrieve_context with self_query=True; asserts metadata filters are applied correctly
test_agentic_search_threads_metadata_filter_to_nested_tool_calls — unit test verifying metadata_filter is forwarded from search_knowledge_base down into nested query_knowledge_base calls inside _run_tool
test_query_tool_call_passes_metadata_filter_to_retrieve_context — unit test verifying _run_tool passes metadata_filter to retrieve_context for direct query_knowledge_base calls
test_sub_agent_deduplicates_chunk_spans_by_chunk_id — unit test verifying that fully-redundant spans are dropped across sub-agent iterations while partially-novel spans are kept
test_rag_does_not_mutate_caller_messages_on_stream_error — ensures the caller's messages list is unchanged when an exception is raised mid-stream

(cherry picked from commit 2fc5d7d)

…bility (cherry picked from commit d46fcf6)

…alls

Copilot

Pull request overview

This PR upgrades RAGLite’s RAG pipeline from single-shot retrieval to an agentic, iterative retrieval flow, adds token-budget safeguards to reduce context overflows, and introduces an async streaming variant to mirror the sync API.

Changes:

Add an iterative “search sub-agent” path for search_knowledge_base that repeatedly calls nested retrieval tools and deduplicates chunk spans across iterations.
Fix context-window budgeting by counting all message tokens, reserving output space, and improving clipping fallbacks.
Add streaming helpers plus async_rag, and propagate metadata_filter through tool execution paths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`tests/test_rag.py`	Adds unit tests for agentic search iteration, metadata_filter forwarding, dedup behavior, and message mutation safety on stream errors.
`src/raglite/_rag.py`	Implements the agentic sub-agent loop, parallel tool execution, token budgeting / clipping updates, and new streaming + async RAG flow.
`src/raglite/_search.py`	Minor refactor in query-adapter conditional; passes `drop_params=True` for self-query extraction robustness.
`src/raglite/_litellm.py`	Removes global `litellm.drop_params` default.
`src/raglite/_config.py`	Adds `agentic_iterations` configuration.
`src/raglite/_chatml_function_calling.py`	Allows `tool_choice="required"` in streaming function-calling path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/raglite/_rag.py

tests/test_rag.py

emilradix

A few comments. Can you double check the github copilot comments as well.

src/raglite/_rag.py

- import MetadataFilter at runtime - remove redundant assert tool - added clip to subagent messages - fix bug that allowed empty tool call list to run LLM completion

emilradix

LGTM

MattiaMolon and others added 16 commits February 17, 2026 14:08

feat: allow agentic behavior with iterable tool calling

82dac31

feat: added system message injection to allow for smoother agentic rag

9eda9c8

feat: spawn subagent for search queries

7c1b08a

feat: first implementation of sub-agent search

a3edec0

feat: merge main

cddfaef

fix: make auto-retrieval assertion robust to parallel tool calls

313b675

(cherry picked from commit 2fc5d7d)

fix: drop unsupported params in self-query LLM call for GPT-5 compati…

025a285

…bility (cherry picked from commit d46fcf6)

fix: fixed token budget miscalculation

abb9e74

fix: pass metadata_filter to run_tools

f61747e

fix: deduplicate chunk spans based on chunk ids

ef3ebf5

fix: separate live messages from working messages object during sub c…

fa7c163

…alls

feat: updated async_rag to match rag

cff1501

feat: code clean up

120192a

fix: small robustness fixes for edge cases

96aa82f

fix: fix required tool call in litellm

9f40573

fix: adapted tool description to make it more robust to tests

76d5f84

MattiaMolon requested review from emilradix and r-dh March 5, 2026 10:14

emilradix assigned emilradix and unassigned emilradix Mar 9, 2026

emilradix requested a review from Copilot March 9, 2026 10:40

Copilot started reviewing on behalf of emilradix March 9, 2026 10:41 View session

Copilot AI reviewed Mar 9, 2026

View reviewed changes

src/raglite/_rag.py Outdated Show resolved Hide resolved

src/raglite/_rag.py Show resolved Hide resolved

tests/test_rag.py Show resolved Hide resolved

tests/test_rag.py Outdated Show resolved Hide resolved

emilradix reviewed Mar 9, 2026

View reviewed changes

src/raglite/_rag.py Outdated Show resolved Hide resolved

src/raglite/_rag.py Outdated Show resolved Hide resolved

fix: resolved pr comments

ac5a3a1

- import MetadataFilter at runtime - remove redundant assert tool - added clip to subagent messages - fix bug that allowed empty tool call list to run LLM completion

emilradix approved these changes Mar 11, 2026

View reviewed changes

emilradix merged commit 7c910df into main Mar 11, 2026
4 checks passed

emilradix deleted the mm-agentic_rag branch March 11, 2026 14:29

emilradix changed the title ~~Agentic RAG with iterative sub-agent search~~ feat: add agentic RAG with iterative sub-agent search Mar 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add agentic RAG with iterative sub-agent search#180

feat: add agentic RAG with iterative sub-agent search#180
emilradix merged 17 commits intomainfrom
mm-agentic_rag

MattiaMolon commented Mar 3, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

emilradix left a comment

Uh oh!

Uh oh!

Uh oh!

emilradix left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MattiaMolon commented Mar 3, 2026

Summary

Key changes

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

emilradix left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

emilradix left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants