Releases: barkain/agentlib
v1.8.0 — Unified Navigation Architecture
What's New
Unified Navigation (3 files instead of 6)
- `library_index.json` — one file for the entire library: concepts, aliases, related concepts, and pattern fingerprints (merged from the separate `pattern_index.json`)
- `nav.json` (per-book) — structure + chunk metadata + concepts in one file (replaces `manifest.compact.json`, `concepts.json`, `chunk_index.json`)
- `manifest.json` (per-book) — full archive, unchanged
MCP Server
- Plugin now registers an MCP server with 6 tools:
  `browse_library`, `open_book`, `search_library`, `search_concepts`, `preview_chunks`, `read_chunks`
- Skill uses MCP tools directly — no sub-agent delegation, no file reads
- Typical query: 3 MCP calls → answer in ~1.5 minutes
Smarter Search
- `search_library` searches concepts, aliases, related concepts, AND pattern tags in one call
- `open_book` now returns chunk IDs per section — no format guessing
- Skill prompt: 1 search attempt, then browse chapters (no retry loops)
Pattern Discovery
- Patterns generated organically by LLM (removed hardcoded 43-item seed vocabulary)
- Cross-domain associative recall integrated into `search_library`
- Fuzzy pattern merging across books for consistency
Bug Fixes
- `ConceptEntry` now carries the `related` field (was silently dropped)
- Chunk token counts preserved on non-force re-runs
- `explore_patterns` result cap (merged into `search_library`)
- Compact manifest schema restored to integer chunk counts
- Corpus concept extraction now extracts `related` (parity with books)
Agent Improvements
- Library-researcher agent upgraded to Sonnet (was Haiku)
- `maxTurns` increased to 25 (was 15)
- Output capped at 2000 chars to prevent rejection by the main agent
- Mandatory preview-before-read enforcement in prompts
Quality of Life
- Portable MCP server config (`/usr/bin/env uv`)
- Updated README with MCP tool workflow and streamlined diagrams
- Plugin version bumped to 1.8.0
v1.7.0 — Proactive Library Usage & Refined Positioning
What's New
Proactive Library Usage
- The `agentlib-knowledge` skill now auto-triggers on domain-specific coding tasks — not just explicit research questions
- Expanded trigger patterns: writing code involving domain-specific parameters, protocols, standards, or configurations; encountering technical terms from the library; methodology and best-practices questions
- Narrowed exclusions: "general programming tasks" instead of blanket "code editing" exclusion, so domain-specific coding triggers the library while generic tasks don't
Refined Positioning & Documentation
- New README pitch: "Curate your knowledge library. Your agent works from sources you trust." — focused on curation, persistent knowledge, proactive integration, and citable answers
- Replaced the O(n²) token-efficiency opening with a user-centric value proposition
- Four key pillars: your sources/your curation, always available, proactive not reactive, citable answers
Demo Screenshots
- Added proactive query demo: agent automatically consults library when asked about BOM maturity dimensions
- Added collapsible library-researcher navigation view showing the full tool chain (NAVIGATION.md → concepts.json → chunks → synthesized answer)
Navigation Diagram
- Redesigned "How agents navigate the library" diagram from vertical ASCII to horizontal Mermaid flowchart — wider and more readable
Hero Animation
- README hero image replaced with looping GIF demo (800px, 15fps)
Session Summary
This release caps a productive session that also shipped v1.5.0 (content-aware chunking, PDF table/image extraction) and v1.6.0 (parallel ingestion, auto-recovery, large book support). Together these three releases transform AgentLib from a basic chunking pipeline into a production-ready knowledge layer for AI agents.
Across v1.5.0–v1.7.0:
- Parser: PDF tables extracted as clean markdown, images extracted with vision-based summarization
- Chunker: Tables and code fences kept atomic with smart splitting
- Ingestion: 10x faster via parallel summarization, auto-recovery on failure, batched concept extraction
- Agent integration: Proactive library usage without explicit user commands
- Testing: 98 tests passing (up from ~34 at session start)
- Issues: #8 implemented, #9-12 closed as not needed, #16 fixed
Full Changelog: v1.6.0...v1.7.0
v1.6.0 — Parallel Ingestion, Auto-Recovery & Large Book Support
What's New
Parallel Chapter Summarization
- Stage 4 (chapter summarization) now runs in parallel using asyncio with a configurable semaphore
- Default concurrency: 10 simultaneous LLM calls
- Configurable via the `AGENTLIB_CONCURRENCY` env var
- ~10x faster ingestion for large books (e.g., 1134 chapters: ~5 min instead of ~50 min)
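A minimal sketch of the semaphore-bounded fan-out described above. The function names (`read_concurrency`, `summarise_all`, `summarise_one`) are illustrative, not the library's actual API; only the `AGENTLIB_CONCURRENCY` variable and its clamping behaviour come from these notes:

```python
import asyncio
import os

def read_concurrency(default: int = 10) -> int:
    """Parse AGENTLIB_CONCURRENCY, clamping to at least 1 and
    falling back to the default on missing or invalid input."""
    raw = os.environ.get("AGENTLIB_CONCURRENCY", "")
    try:
        return max(1, int(raw))
    except ValueError:
        return default

async def summarise_all(chapters, summarise_one):
    """Run one LLM summarisation per chapter, with at most
    read_concurrency() calls in flight at once."""
    sem = asyncio.Semaphore(read_concurrency())

    async def bounded(chapter):
        async with sem:
            return await summarise_one(chapter)

    # gather() preserves input order, so summaries line up with chapters
    return await asyncio.gather(*(bounded(c) for c in chapters))
```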
Batched Concept Extraction
- `extract_concepts()` now processes chapters in batches of 50 instead of sending all chapter summaries in one LLM call
- Prevents context-window overflow on large books (previously crashed on books with 1000+ chapters)
- Concepts deduplicated across batches (normalized keys, merged locations and aliases)
- Global cap of 50 concepts per book, ranked by coverage breadth
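The batch-then-merge behaviour above can be sketched like this. The per-concept dict shape is hypothetical; the real `extract_concepts()` internals may differ:

```python
def batched(items, size=50):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def merge_concepts(batch_results, cap=50):
    """Deduplicate concepts across batches by normalized name,
    merging locations and aliases, then keep the `cap` concepts
    with the broadest coverage (most locations)."""
    merged = {}
    for concepts in batch_results:
        for c in concepts:
            key = c["name"].strip().lower()
            entry = merged.setdefault(
                key, {"name": c["name"], "locations": set(), "aliases": set()})
            entry["locations"].update(c.get("locations", []))
            entry["aliases"].update(c.get("aliases", []))
    ranked = sorted(merged.values(),
                    key=lambda e: len(e["locations"]), reverse=True)
    return ranked[:cap]
```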
Auto-Recovery
- Partial manifest saved after stage 4 (chapter summaries with empty concept index)
- On re-run after a stage 5 failure, stages 1-4 are skipped automatically — only concept extraction re-runs
- Stage 5 retries up to 3 times with 30-second delay on API failures (rate limits, overloaded errors)
- Retry scope includes `RuntimeError`, `TypeError`, `AttributeError`, and `ValueError` to handle malformed LLM output
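The retry policy can be sketched as a small wrapper. The `with_retries` name and the injectable `sleep` parameter are illustrative; the exception types and the 3-attempt/30-second figures come from the notes above:

```python
import time

# Includes TypeError/AttributeError/ValueError to catch malformed LLM output
RETRYABLE = (RuntimeError, TypeError, AttributeError, ValueError)

def with_retries(fn, attempts=3, delay=30.0, sleep=time.sleep):
    """Call fn, retrying on the listed exception types up to
    `attempts` total tries, sleeping `delay` seconds between them."""
    last = None
    for i in range(attempts):
        try:
            return fn()
        except RETRYABLE as exc:
            last = exc
            if i < attempts - 1:  # don't sleep after the final failure
                sleep(delay)
    raise last
```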
Hero Animation
- README now features a looping GIF demo of AgentLib in action
- Navigation diagram redesigned as horizontal Mermaid flowchart
Bug Fixes
- `AGENTLIB_CONCURRENCY` validated: clamped to a minimum of 1, with fallback to 10 on invalid input
- Stage 5 skip detection uses the catalog entry (not dict truthiness) to correctly handle empty concept indexes
- Non-dict LLM responses now raise `RuntimeError` with a clear message instead of failing silently
- Server tests no longer fail when the `fastmcp` dependency is missing
Documentation
- README updated with: content-aware chunking, PDF table/image extraction, parallel ingestion, auto-recovery, and the `AGENTLIB_CONCURRENCY` env var
Testing
- 98 tests passing (up from 92)
- New tests for: batched concept extraction, deduplication, concept cap, async summarization, semaphore concurrency, parallel timing
Full Changelog: v1.5.0...v1.6.0
v1.5.0 — Content-Aware Chunking & PDF Table/Image Extraction
What's New
Content-Aware Chunk Boundaries
- Tables and fenced code blocks are now detected as atomic structural units and kept together in chunks
- Soft cap 500 tokens (prose), hard cap 1000 tokens (structural blocks)
- Oversized tables split at row boundaries with header propagation into each sub-chunk
- Oversized code fences split at line boundaries with opening/closing markers preserved
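The header-propagation split for oversized tables can be sketched as follows. Note the real splitter caps by tokens, not row count; this simplified version splits by rows:

```python
def split_table(lines, max_rows):
    """Split a markdown pipe table at row boundaries, repeating the
    header row and separator at the top of each sub-chunk so every
    piece remains a valid, self-describing table."""
    header, sep, body = lines[0], lines[1], lines[2:]
    chunks = []
    for i in range(0, len(body), max_rows):
        chunks.append([header, sep] + body[i:i + max_rows])
    return chunks
```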
PDF Table Extraction
- PyMuPDF's `find_tables()` detects tables in PDFs and renders them as markdown pipe tables
- Replaces the previous whitespace-aligned garbled text with clean `| col | col |` format
- Phantom empty columns (from PyMuPDF layout misalignment) are automatically stripped
- Tables positioned correctly among surrounding text via y-coordinate ordering
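The rows-to-markdown step, including phantom-column stripping, might look like this — a sketch over the row lists that PyMuPDF's `Table.extract()` returns; the library's actual renderer may differ:

```python
def rows_to_markdown(rows):
    """Render extracted table rows (lists of cell strings, possibly
    None) as a markdown pipe table, dropping columns that are empty
    in every row ("phantom" columns from layout misalignment)."""
    cells = [[(c or "").strip() for c in row] for row in rows]
    # keep only columns with at least one non-empty cell
    keep = [j for j in range(len(cells[0])) if any(row[j] for row in cells)]
    cells = [[row[j] for j in keep] for row in cells]
    header, body = cells[0], cells[1:]
    lines = ["| " + " | ".join(header) + " |",
             "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```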
Image Extraction & Vision Summarization
- Images extracted from PDFs at parse time, filtered (skips icons < 50px and < 5KB)
- `[Figure: filename]` placeholders inserted in chunk text at the correct positions
- During chapter summarization, images are sent to the LLM as base64 content blocks
- LLM can now describe diagrams and figures in chapter summaries
- Capped at 5 images per chapter to control costs
- Automatic fallback to text-only if the configured model doesn't support vision
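The icon filter could be as simple as the predicate below. The exact combination of thresholds (whether the pixel and byte checks are ANDed or ORed) is an assumption read off the notes above:

```python
def keep_image(width_px, height_px, size_bytes,
               min_px=50, min_bytes=5 * 1024):
    """Filter out likely icons: drop images smaller than 50px on
    either side or under 5KB (assumed predicate, see lead-in)."""
    return min(width_px, height_px) >= min_px and size_bytes >= min_bytes
```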
Bug Fixes
- Fixed corpus pipeline crash from the `parse_pdf()` return type change
- Fixed `/knowledge` → `/agentlib-knowledge` references in ingestion commands
Testing
- 72 tests passing (up from ~34)
- Verified end-to-end on the CycloneDX Authoritative Guide to SBOM
Full Changelog: v1.4.0...v1.5.0
v1.4.0 — Concept Index Aliases
Concept Index Aliases for Semantic Search
Closes #7. The concept index now supports LLM-generated aliases — abbreviations, acronyms, and synonyms — so agents find concepts even when their query doesn't match the primary name.
What's new
- Ingestion-time aliases — the LLM generates 2-3 aliases per concept during ingestion (e.g., "CycloneDX" → `["CDX", "Cyclone DX"]`). Stored in manifests and `concepts.json`.
- Alias-aware search — `search_concepts` and the `library-researcher` agent check aliases alongside concept names. Zero runtime cost.
- Hit-rate metrics — new `concept_search_hits`/`concept_search_misses` fields in benchmark traces, with a post-hoc classifier wired into the scoring CLI.
- Navigation diagram — README now includes an ASCII diagram showing how agents navigate the library layers.
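Alias-aware matching can be sketched over the new `concepts.json` shape. `search_concepts` is the real name from this release, but the body and signature below are illustrative:

```python
def search_concepts(index, query):
    """Match a query against concept names and aliases,
    case-insensitively, allowing substring matches.
    `index` uses the new shape:
    {"concept": {"chunks": [...], "aliases": [...]}}."""
    q = query.strip().lower()
    hits = {}
    for name, entry in index.items():
        terms = [name] + entry.get("aliases", [])
        if any(q == t.lower() or q in t.lower() for t in terms):
            hits[name] = entry["chunks"]
    return hits
```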
Before → After
| Query | Before | After |
|---|---|---|
| "CDX" | ❌ miss → 5-6 calls, ~5k tokens | ✅ hit via alias → 2-3 calls, ~1k tokens |
| "SBOM" | ❌ miss on "Software Bill of Materials" | ✅ hit via alias |
| "Maturity Model" | ❌ miss on "SCVS BOM Maturity Model" | ✅ hit via alias |
Upgrade notes
- Existing books: re-ingest with `--force` to generate aliases. Books without aliases continue to work (empty aliases list).
- `concepts.json` format change: now `{"concept": {"chunks": [...], "aliases": [...]}}` instead of `{"concept": [...]}`. The library-researcher agent handles both.
- Backward compatible: old manifests deserialize cleanly.
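Handling both `concepts.json` shapes might look like this — a hypothetical normalisation helper, not the agent's actual code:

```python
def normalise_concepts(raw):
    """Accept both concepts.json shapes:
      old: {"concept": ["chunk-1", ...]}
      new: {"concept": {"chunks": [...], "aliases": [...]}}
    and return the new shape everywhere."""
    out = {}
    for name, value in raw.items():
        if isinstance(value, dict):
            out[name] = {"chunks": value.get("chunks", []),
                         "aliases": value.get("aliases", [])}
        else:  # old format: bare list of chunk IDs
            out[name] = {"chunks": list(value), "aliases": []}
    return out
```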
Files changed
- `lib/models.py` — `aliases` field on `ConceptEntry` and `CorpusConceptEntry`
- `lib/storage.py` — alias-aware `search_concepts`, updated `write_concept_index`
- `lib/summariser.py` — updated concept extraction prompt and parsing
- `preprocessing/books.py`, `preprocessing/corpus.py` — alias passthrough
- `agents/library-researcher.md` — updated prompt to check aliases
- `benchmark/` — hit-rate metrics, classifier, report integration
- `README.md` — navigation diagram
v1.3.0 — Corpus Ingestion & Research Agent
What's New
Corpus Ingestion Pipeline
- `/agentlib:agentlib-ingest-corpus` — ingest a folder of scientific PDFs as a paper collection
- 7-stage pipeline: discover → extract metadata → parse + chunk → summarise → cluster by topic → build cross-paper concept index → update navigation
- Two-level L0: topic clusters (L0a) → per-cluster paper lists with abstracts (L0b)
- Cross-paper concept index with 15-40 concepts mapped to papers and sections
Library Researcher Agent
- `library-researcher` custom agent runs in an isolated context window
- All navigation and chunk reading stays out of the main conversation
- Only the synthesized answer with citations returns — ~3k tokens of main context per query
- Uses the Haiku model for fast, cheap research with structured navigation
Skill Improvements
- Renamed to `/agentlib-knowledge` for consistent naming
- Skill now delegates to the research agent by default
- Navigation starts with `NAVIGATION.md` (covers both books and corpora)
- Fallback to direct reads if agent delegation fails
Benchmarks (real measurements)
- Agent delegation: 58% fewer total tokens vs raw PDF reading, 2x faster
- Multi-query session: 7k main context for 2 questions vs 30k+ without agent
- Corpus queries: 57% token reduction vs raw PDFs (33k vs 79k)
- Book queries: 47-82% token reduction with same answer quality
Bug Fixes
- Fixed metadata cache key mismatch in corpus pipeline (Qodo review)
- Fixed concept index degradation on cached re-runs (Qodo review)
- Fixed `~` path expansion in the research agent context
- Fixed author name in plugin metadata
- Removed duplicate corpus entry handling
Documentation
- New corpus walkthrough with Prof. Aharon Davidson physics papers example
- Updated SBOM walkthrough for agent delegation flow
- README updated with corpus support, benchmarks, agent architecture, and examples section
- Added MIT license file
v1.2.2 — New hero image
AgentLib v1.2.2
- Updated hero image
v1.2.1 — Documentation & Polish
AgentLib v1.2.1
Fixes
- Fixed skill loading — added the required `name` field to SKILL.md frontmatter
- Skill invocation is `/knowledge` (not `/agentlib:knowledge`)
- Fixed all command names to the plugin-qualified format (`/agentlib:agentlib-ingest-book`)
- Fixed the OWASP walkthrough to use the actual SBOM book with real ingested data
- Corrected PDF download link
- Bumped version to 1.2.x across plugin.json and pyproject.toml
Documentation
- Added SBOM walkthrough example (`examples/sbom-walkthrough.md`) with real data
- Added all 3 commands to the README (ingest, configure, library)
- Documented `/knowledge` explicit invocation for when you want the book's answer
- Removed stale MCP server references
- Descriptive result labels in README
v1.2.0 — Concept Index Fix + Documentation Overhaul
AgentLib v1.2.0
Fixes
- Concept index now populated correctly — chunk IDs are looked up programmatically after LLM extraction, fixing empty `concepts.json` for books where section-chunk mapping was incomplete
- Skill renamed from `agentlib:agentlib` to `agentlib:knowledge` — cleaner invocation, no more redundant naming
- All documentation aligned with the zero-server architecture — removed stale MCP tool/server references from commands and docs
Documentation
- Explicit invocation documented — use `/agentlib:knowledge <question>` when you want the book's answer over Claude's training data
- Auto-trigger + explicit invocation patterns explained in the README
- Skill trigger keywords improved — removed hardcoded domain terms (SBOM, CycloneDX), fixed contradictory read limits
- marketplace.json updated to "Up to 82% fewer tokens"
Code Quality
- Extracted `_section_id_from_chunk()` and `_title_from_path()` helpers
- Pre-built chapter→chunks lookup for O(1) concept fallback
- Removed `.scratch/` temp files from tracking
- Fixed stale `_data_root()` docstring
Usage
- Auto-trigger — just ask naturally: `What specific actor frameworks does the book mention?`
- Explicit — when you want the book's specific answer: `/agentlib:knowledge What defensive techniques protect against prompt injection?`
Real-World Results
- 87% token reduction (5.2k vs 38.6k) on fresh session test
- 82% reduction confirmed across multiple questions and books
- Correct answers with source citations every time
v1.1.0 — Zero-Server Mode
AgentLib v1.1.0 — Zero-Server Mode
AgentLib no longer requires an MCP server. The agent navigates preprocessed files directly using a skill that teaches the navigation pattern.
What changed
- No MCP server — removed `.mcp.json`, no process to manage
- File-based navigation — agent reads `catalog.json` → `concepts.json` → `chunks/*.md` directly
- Smart skill trigger — keyword-rich description activates automatically for research/knowledge questions
- Compact files on disk — `manifest.compact.json` (~500-2k tok) and `concepts.json` (~200 tok) written during ingestion
- Self-documenting library — `NAVIGATION.md` generated in the library root
Results
- 58% fewer content tokens vs raw PDF (6.1k vs 14.7k on SBOM maturity question)
- 4 file reads to answer — catalog → concepts → 2 chunks
- Automatic source citations in answers
- Zero infrastructure, zero server management
Architecture (simplified)
AgentLib = Ingestion Pipeline + Skill + File Convention
- `/agentlib-ingest-book ~/book.pdf` — preprocess once
- Ask questions — the skill teaches the agent to navigate metadata → chunks
- No server, no MCP, no tools — just files + instructions
Also in this release
- `/agentlib-library` command to browse ingested books
- Stronger skill trigger keywords (CandleKeep-inspired pattern)
- Token budget enforcement on all navigation layers
Install
- `claude --plugin-dir ~/path/to/agentlib`
- `/agentlib-ingest-book ~/path/to/book.pdf`

Then just ask questions naturally.