feat: add agentic RAG with iterative sub-agent search #180
Merged
Pull request overview
This PR upgrades RAGLite’s RAG pipeline from single-shot retrieval to an agentic, iterative retrieval flow, adds token-budget safeguards to reduce context overflows, and introduces an async streaming variant to mirror the sync API.
Changes:
- Add an iterative “search sub-agent” path for `search_knowledge_base` that repeatedly calls nested retrieval tools and deduplicates chunk spans across iterations.
- Fix context-window budgeting by counting all message tokens, reserving output space, and improving clipping fallbacks.
- Add streaming helpers plus `async_rag`, and propagate `metadata_filter` through tool execution paths.
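The context-window budgeting described above can be illustrated with a minimal sketch. It is not the code from `src/raglite/_rag.py`: `clip_messages`, `count_tokens`, and the word-based token count are hypothetical stand-ins, and the fallback shown (return only the last user message when nothing else fits) is an assumption about what a clipping fallback could look like.

```python
from collections.abc import Callable
from typing import Optional

Message = dict[str, Optional[str]]


def count_tokens(message: Message) -> int:
    """Hypothetical counter: one token per whitespace-separated word."""
    # `or ""` tolerates messages whose content is None (e.g. tool-call turns).
    return len((message.get("content") or "").split())


def clip_messages(
    messages: list[Message],
    context_window: int,
    reserved_output_tokens: int,
    counter: Callable[[Message], int] = count_tokens,
) -> list[Message]:
    """Return the most recent messages that fit the input budget."""
    # Reserve room for the model's reply before spending tokens on input.
    budget = max(context_window - reserved_output_tokens, 0)
    kept: list[Message] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = counter(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Assumed fallback: never return a window that lacks a user turn.
    if not any(m.get("role") == "user" for m in kept):
        last_user = next(
            (m for m in reversed(messages) if m.get("role") == "user"), None
        )
        if last_user is not None:
            return [last_user]
    return kept
```

The function builds a new list and never modifies `messages`, so the caller's conversation history stays intact even when the window is clipped.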
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `tests/test_rag.py` | Adds unit tests for agentic search iteration, `metadata_filter` forwarding, dedup behavior, and message mutation safety on stream errors. |
| `src/raglite/_rag.py` | Implements the agentic sub-agent loop, parallel tool execution, token budgeting / clipping updates, and new streaming + async RAG flow. |
| `src/raglite/_search.py` | Minor refactor in query-adapter conditional; passes `drop_params=True` for self-query extraction robustness. |
| `src/raglite/_litellm.py` | Removes global `litellm.drop_params` default. |
| `src/raglite/_config.py` | Adds `agentic_iterations` configuration. |
| `src/raglite/_chatml_function_calling.py` | Allows `tool_choice="required"` in streaming function-calling path. |
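As a rough illustration of the parallel tool execution and token budgeting listed for `src/raglite/_rag.py`, the sketch below runs several queries concurrently and then splits a shared budget across the results in proportion to their size. All names here (`fake_search`, `run_tools_in_parallel`, `allocate_budget`) are hypothetical and word counts stand in for tokens; the PR's actual allocation rule may differ.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def fake_search(query: str) -> list[str]:
    """Hypothetical retrieval tool that returns a list of 'tokens'."""
    return f"result for {query}".split()


def run_tools_in_parallel(
    queries: list[str],
    tool: Callable[[str], list[str]],
    max_workers: int = 4,
) -> list[list[str]]:
    """Run one tool call per query on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `queries`.
        return list(pool.map(tool, queries))


def allocate_budget(sizes: list[int], budget: int) -> list[int]:
    """Give each result a share of `budget` proportional to its size."""
    total = sum(sizes)
    if total <= budget:
        return list(sizes)  # everything fits, no trimming needed
    # Floor division guarantees the shares never sum to more than `budget`.
    return [size * budget // total for size in sizes]


results = run_tools_in_parallel(["alpha", "beta"], fake_search)
shares = allocate_budget([len(tokens) for tokens in results], budget=4)
trimmed = [tokens[:share] for tokens, share in zip(results, shares)]
```

Threads suit this workload because retrieval calls are typically I/O-bound; flooring the shares can leave a few tokens of the budget unused, which errs on the side of not overflowing the window.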
emilradix (Collaborator) reviewed on Mar 9, 2026 and left a comment:
A few comments. Can you double-check the GitHub Copilot comments as well?
- import MetadataFilter at runtime
- remove redundant assert tool
- added clip to subagent messages
- fix bug that allowed empty tool call list to run LLM completion
Summary
- […] `ThreadPoolExecutor`, with context proportionally allocated across results within the model's window
- […] `async_rag` to mirror the sync `rag` function with full async streaming support

Key changes
- `_run_tool`: when `search_knowledge_base` is called, a dedicated agent with `SEARCH_AGENT_PROMPT` iterates up to `config.agentic_iterations`, deduplicating chunk spans by chunk ID across iterations
- `_clip` always preserves at least the last user message; `_get_tools` handles `None` content; `_limit_chunkspans` guards against a zero-token edge case
- `SEARCH_AGENT_PROMPT` guides the sub-agent with concrete good/bad query examples; `NO_TOOLS_FOLLOW_UP_PROMPT` prevents tool calls in the final answer step

Tests
- `test_rag_manual`: manual retrieval via `retrieve_context` + `add_context`; asserts no tool calls are made
- `test_rag_auto_with_retrieval`: agentic RAG on a question that requires retrieval; checks tool messages appear and the `on_retrieval` callback is populated
- `test_rag_auto_without_retrieval`: agentic RAG on a trivial question; verifies no retrieval occurs
- `test_retrieve_context_self_query`: `retrieve_context` with `self_query=True`; asserts metadata filters are applied correctly
- `test_agentic_search_threads_metadata_filter_to_nested_tool_calls`: unit test verifying `metadata_filter` is forwarded from `search_knowledge_base` down into nested `query_knowledge_base` calls inside `_run_tool`
- `test_query_tool_call_passes_metadata_filter_to_retrieve_context`: unit test verifying `_run_tool` passes `metadata_filter` to `retrieve_context` for direct `query_knowledge_base` calls
- `test_sub_agent_deduplicates_chunk_spans_by_chunk_id`: unit test verifying that fully redundant spans are dropped across sub-agent iterations while partially novel spans are kept
- `test_rag_does_not_mutate_caller_messages_on_stream_error`: ensures the caller's `messages` list is unchanged when an exception is raised mid-stream
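The deduplication rule exercised by `test_sub_agent_deduplicates_chunk_spans_by_chunk_id` can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `Span` is a hypothetical stand-in for a chunk span (modeled here as just a tuple of chunk IDs), and `deduplicate_spans` is an invented helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a chunk span: an ordered group of chunk IDs."""

    chunk_ids: tuple[str, ...]


def deduplicate_spans(iterations: list[list[Span]]) -> list[Span]:
    """Keep a span only if it contributes at least one unseen chunk ID."""
    seen: set[str] = set()
    kept: list[Span] = []
    for spans in iterations:  # one list of spans per sub-agent iteration
        for span in spans:
            if set(span.chunk_ids) <= seen:
                continue  # fully redundant: every chunk was already retrieved
            kept.append(span)  # partially novel spans are kept whole
            seen.update(span.chunk_ids)
    return kept


first = [Span(("a", "b"))]
second = [Span(("a", "b")), Span(("b", "c"))]
unique = deduplicate_spans([first, second])
```

Here the repeated `("a", "b")` span from the second iteration is dropped, while `("b", "c")` survives because chunk `c` is new, even though chunk `b` overlaps with an earlier span.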