Qurio Logo


The Open Source Knowledge Engine for AI Agents
Built for localhost. Grounded in truth.


πŸ“– About

Qurio is a self-hosted, open-source ingestion and retrieval engine that functions as a local Shared Library for AI coding assistants (like Gemini-CLI, Claude Code, Cursor, Windsurf, or custom scripts).

Unlike cloud-based RAG solutions that introduce latency and privacy risks, Qurio runs locally to ingest your handpicked heterogeneous documentation (web crawls, PDFs, Markdown) and serves it directly to your agents via the Model Context Protocol (MCP). This ensures your AI writes better code faster using only the context you trust.

Qurio features a custom structural chunker that respects code blocks, API definitions, and config files, keeping code blocks and their syntax intact.
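To illustrate the idea behind structure-aware chunking (this is a minimal sketch, not Qurio's actual implementation), a chunker can split Markdown at heading boundaries while refusing to cut through a fenced code block:

```python
def chunk_markdown(text):
    """Split Markdown into chunks at heading boundaries, keeping
    fenced code blocks intact (never split mid-fence)."""
    chunks, current, in_fence = [], [], False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        # Start a new chunk at a heading, but only outside code fences.
        if not in_fence and line.startswith("#") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks

doc = (
    "# Install\nRun it:\n"
    "```bash\n# not a heading\ndocker compose up\n```\n"
    "# Usage\nQuery away.\n"
)
for chunk in chunk_markdown(doc):
    print(repr(chunk))
```

Note that the `# not a heading` line inside the fence does not trigger a split, so the shell snippet stays in one chunk with its surrounding prose.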

Why Qurio?

  • Privacy First: Your data stays on your machine (localhost).
  • Precision: Retrieves grounded "truth" to prevent AI hallucinations.
  • Speed: Deploys in minutes with docker-compose.
  • Open Standards: Built on MCP, Weaviate, and PostgreSQL.

✨ Key Features

  • 🌐 Universal Ingestion: Crawl documentation sites or upload files (PDF, DOCX, MD).
  • 🧠 Hybrid Search: Configurable blend of BM25 keyword search and vector embeddings for high-recall retrieval.
  • 🎯 Configurable Reranking: Integrate Jina AI or Cohere for precision tuning.
  • πŸ”Œ Native MCP Support: Exposes a standard JSON-RPC 2.0 endpoint for seamless integration with AI coding assistants.
  • πŸ•ΈοΈ Smart Crawling: Recursive web crawling with depth control, regex exclusions, respect robot.txt, sitemap and llms.txt llms-full.txt support.
  • πŸ“„ OCR Pipeline: Automatically extracts text from scanned PDFs and images via Docling.
  • πŸ–₯️ Admin Dashboard: Manage sources, view ingestion status, and debug queries via a clean Vue.js interface.

πŸ—οΈ Architecture

Qurio is built as a set of microservices orchestrated by Docker Compose:

  • Backend (Go): Core orchestration, API, and MCP server.
  • Frontend (Vue.js): User interface for managing sources and settings.
  • Ingestion Worker (Python): Async ingestion engine handling crawling (crawl4ai) and parsing (docling).
  • Vector Store (Weaviate): Stores embeddings and handles hybrid search.
  • Database (PostgreSQL): Stores metadata, job status, and configuration.
  • Queue (NSQ): Manages asynchronous ingestion tasks.
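To make the asynchronous flow concrete, here is a minimal sketch of the ingestion pipeline using Python's stdlib queue as a stand-in for NSQ. The topic, payload fields, and status values are illustrative assumptions, not Qurio's actual schema:

```python
import json
import queue

# Stand-in for the NSQ topic the backend publishes ingestion jobs to.
ingest_topic = queue.Queue()

def backend_enqueue(source_url):
    """Go backend: record the job (in PostgreSQL) and publish it."""
    job = {"job_id": 1, "url": source_url, "status": "queued"}
    ingest_topic.put(json.dumps(job))
    return job

def worker_consume():
    """Python worker: crawl/parse the source, then store chunks."""
    job = json.loads(ingest_topic.get())
    chunks = [f"chunk of {job['url']}"]   # crawl4ai + docling would run here
    job["status"] = "completed"           # status written back to PostgreSQL
    return job, chunks                    # chunks embedded into Weaviate

backend_enqueue("https://docs.docker.com")
job, chunks = worker_consume()
print(job["status"])
```

The queue decouples the Go API from the Python worker, so slow crawls never block the dashboard or the MCP endpoint.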

πŸš€ Getting Started

Prerequisites

  • Docker and Docker Compose
  • A Google Gemini API key (for embeddings)

Installation

  1. Clone the repository:

    git clone https://github.com/irahardianto/qurio.git
    cd qurio
  2. Configure Environment: Copy the example environment file and add your API key.

    cp .env.example .env
  3. Start the System:

    docker-compose up -d

    Wait a minute for all services (Weaviate, Postgres) to initialize.

  4. Access the Dashboard: Open http://localhost:3000 in your browser.

  5. Add API Keys: Open http://localhost:3000/settings in the dashboard and add your Gemini API key, plus a Jina AI or Cohere key if you want reranking (optional).

Configuration

Configuration is managed via the Settings page in the UI or environment variables.

| Variable | Description | Default |
| --- | --- | --- |
| GEMINI_API_KEY | Key for Google Gemini (embeddings) | Required |
| RERANK_PROVIDER | none, jina, or cohere | none |
| RERANK_API_KEY | API key for the selected rerank provider | - |
| SEARCH_ALPHA | Hybrid search balance (0.0 = keyword, 1.0 = vector) | 0.5 |
| SEARCH_TOP_K | Max results to return | 5 |
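SEARCH_ALPHA follows Weaviate's hybrid-search convention: the final score is a linear blend of the keyword (BM25) and vector scores. A one-line sketch of the blend, assuming both scores are already normalized:

```python
def hybrid_score(bm25_score, vector_score, alpha=0.5):
    """Weaviate-style hybrid blend: alpha=0.0 is pure keyword
    search, alpha=1.0 is pure vector search."""
    return (1 - alpha) * bm25_score + alpha * vector_score

print(hybrid_score(0.9, 0.2, alpha=0.0))  # keyword only -> 0.9
print(hybrid_score(0.9, 0.2, alpha=1.0))  # vector only  -> 0.2
```

Lower SEARCH_ALPHA when exact identifiers (function names, flags) matter most; raise it when you want semantic matches over literal ones.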

πŸ’‘ Usage

Tip

Unlock the full potential of your Agent
Check out the Agent Prompting Guide for best practices, workflow examples, and system prompt templates (CLAUDE.md, GEMINI.md) to paste into your project.

1. Add Data Sources

Navigate to the Admin Dashboard (http://localhost:3000) and click "Add Source".

  • Web Crawl: Enter a documentation URL (e.g., https://docs.docker.com). Configure depth and exclusion patterns.
  • File Upload: Drag and drop PDFs or Markdown files.

2. Connect Your AI Agent (MCP)

Configure your MCP-enabled editor (like Cursor/Gemini CLI) to connect to Qurio.

Add the following to your MCP settings:

{
  "mcpServers": {
    "qurio": {
      "httpUrl": "http://localhost:8081/mcp"
    }
  }
}

Note: Qurio uses a stateless, streamable HTTP transport at http://localhost:8081/mcp. Use a client that supports native HTTP MCP connections.
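To sanity-check the endpoint outside an editor, you can send a raw JSON-RPC 2.0 request yourself. A sketch using only Python's stdlib; tools/list is the standard MCP method for discovering tools (run the final line only with Qurio up):

```python
import json
import urllib.request

# Standard MCP JSON-RPC 2.0 request to enumerate available tools.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}

req = urllib.request.Request(
    "http://localhost:8081/mcp",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.dumps(payload))
# With Qurio running: urllib.request.urlopen(req).read() returns the tool list.
```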

3. Query

Ask your AI agent a question. It will now have access to the documentation you indexed!

"How do I configure a healthcheck in Docker Compose?"

4. Available Tools

Once connected, your agent will have access to the following tools:

| Tool | Description |
| --- | --- |
| qurio_search | Search your knowledge base. Supports hybrid search (keywords + vectors). Use this to find relevant documentation or code examples. |
| qurio_list_sources | List all available data sources. Useful to see what documentation is currently indexed. |
| qurio_list_pages | List pages within a source. Helpful for exploring the structure of a documentation site. |
| qurio_read_page | Read a full page. Retrieves the complete content of a specific document or web page found via search or listing. |
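A typical retrieval loop an agent might run with these tools, sketched as the JSON-RPC tools/call envelopes it would send. The argument names ("query", "page") and values are illustrative assumptions; an agent discovers the real schemas via tools/list:

```python
def tool_call(name, arguments, call_id):
    """Build a standard MCP tools/call JSON-RPC 2.0 envelope."""
    return {
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# 1. Discover what documentation is indexed.
calls = [tool_call("qurio_list_sources", {}, 1)]
# 2. Search the knowledge base (argument name "query" is an assumption).
calls.append(tool_call("qurio_search", {"query": "docker compose healthcheck"}, 2))
# 3. Pull the full page behind the best hit (argument name assumed).
calls.append(tool_call("qurio_read_page", {"page": "docs.docker.com/compose"}, 3))

for call in calls:
    print(call["params"]["name"])
```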

5. Roadmap

  • Rework crawler & embedder parallelization
  • Migrate to Streamable HTTP
  • Support multiple embedding models beyond Gemini
  • Support more granular, section-by-section page retrieval

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with ❀️ for the Developer Community
