Skip to content

feat: add agentic RAG with iterative sub-agent search#180

Merged
emilradix merged 17 commits intomainfrom
mm-agentic_rag
Mar 11, 2026
Merged

feat: add agentic RAG with iterative sub-agent search#180
emilradix merged 17 commits intomainfrom
mm-agentic_rag

Conversation

@MattiaMolon
Copy link
Copy Markdown
Contributor

Summary

  • Introduces a search sub-agent that iteratively queries the knowledge base to gather sufficient context before answering, replacing the single-shot retrieval approach
  • Parallel tool calls are executed concurrently via ThreadPoolExecutor, with context proportionally allocated across results within the model's window
  • Adds async_rag to mirror the sync rag function with full async streaming support

Key changes

  • Sub-agent loop (_run_tool): when search_knowledge_base is called, a dedicated agent with SEARCH_AGENT_PROMPT iterates up to config.agentic_iterations, deduplicating chunk spans by chunk ID across iterations
  • Token budget fix: buffer now counts all messages (not just last per role) and reserves space for the LLM output, preventing context overflow
  • Robustness fixes: safe fallback in _clip always preserves at least the last user message; _get_tools handles None content; _limit_chunkspans guards against zero-token edge case
  • Prompt improvements: SEARCH_AGENT_PROMPT guides the sub-agent with concrete good/bad query examples; NO_TOOLS_FOLLOW_UP_PROMPT prevents tool calls in the final answer step

Tests

  • test_rag_manual — manual retrieval via retrieve_context + add_context, asserts no tool calls are made
  • test_rag_auto_with_retrieval — agentic RAG on a question that requires retrieval; checks tool messages appear and on_retrieval callback is populated
  • test_rag_auto_without_retrieval — agentic RAG on a trivial question; verifies no retrieval occurs
  • test_retrieve_context_self_queryretrieve_context with self_query=True; asserts metadata filters are applied correctly
  • test_agentic_search_threads_metadata_filter_to_nested_tool_calls — unit test verifying metadata_filter is forwarded from search_knowledge_base down into nested query_knowledge_base calls inside _run_tool
  • test_query_tool_call_passes_metadata_filter_to_retrieve_context — unit test verifying _run_tool passes metadata_filter to retrieve_context for direct query_knowledge_base calls
  • test_sub_agent_deduplicates_chunk_spans_by_chunk_id — unit test verifying that fully-redundant spans are dropped across sub-agent iterations while partially-novel spans are kept
  • test_rag_does_not_mutate_caller_messages_on_stream_error — ensures the caller's messages list is unchanged when an exception is raised mid-stream

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades RAGLite’s RAG pipeline from single-shot retrieval to an agentic, iterative retrieval flow, adds token-budget safeguards to reduce context overflows, and introduces an async streaming variant to mirror the sync API.

Changes:

  • Add an iterative “search sub-agent” path for search_knowledge_base that repeatedly calls nested retrieval tools and deduplicates chunk spans across iterations.
  • Fix context-window budgeting by counting all message tokens, reserving output space, and improving clipping fallbacks.
  • Add streaming helpers plus async_rag, and propagate metadata_filter through tool execution paths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/test_rag.py Adds unit tests for agentic search iteration, metadata_filter forwarding, dedup behavior, and message mutation safety on stream errors.
src/raglite/_rag.py Implements the agentic sub-agent loop, parallel tool execution, token budgeting / clipping updates, and new streaming + async RAG flow.
src/raglite/_search.py Minor refactor in query-adapter conditional; passes drop_params=True for self-query extraction robustness.
src/raglite/_litellm.py Removes global litellm.drop_params default.
src/raglite/_config.py Adds agentic_iterations configuration.
src/raglite/_chatml_function_calling.py Allows tool_choice="required" in streaming function-calling path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Collaborator

@emilradix emilradix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Can you double check the github copilot comments as well.

- import MetadataFilter at runtime
- remove redundant assert tool
- added clip to subagent messages
- fix bug that allowed empty tool call list to run LLM completion
Copy link
Copy Markdown
Collaborator

@emilradix emilradix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emilradix emilradix merged commit 7c910df into main Mar 11, 2026
4 checks passed
@emilradix emilradix deleted the mm-agentic_rag branch March 11, 2026 14:29
@emilradix emilradix changed the title Agentic RAG with iterative sub-agent search feat: add agentic RAG with iterative sub-agent search Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants