Releases: barkain/agentlib

v1.8.0 — Unified Navigation Architecture

05 Apr 15:23
11092a0

What's New

Unified Navigation (3 files instead of 6)

  • library_index.json — one file for entire library: concepts, aliases, related concepts, and pattern fingerprints (merged from separate pattern_index.json)
  • nav.json (per-book) — structure + chunk metadata + concepts in one file (replaces manifest.compact.json, concepts.json, chunk_index.json)
  • manifest.json (per-book) — full archive, unchanged
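
For illustration, a plausible shape for the merged library_index.json — field names here are hypothetical, not confirmed by the release notes, but they follow the merge described above (concepts, aliases, related concepts, pattern fingerprints in one file):

```json
{
  "concepts": {
    "CycloneDX": {
      "aliases": ["CDX"],
      "related": ["SBOM"],
      "books": ["sbom-guide"]
    }
  },
  "patterns": {
    "bill-of-materials": ["sbom-guide"]
  }
}
```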

MCP Server

  • Plugin now registers an MCP server with 6 tools: browse_library, open_book, search_library, search_concepts, preview_chunks, read_chunks
  • Skill uses MCP tools directly — no sub-agent delegation, no file reads
  • Typical query: 3 MCP calls → answer in ~1.5 minutes

Smarter Search

  • search_library searches concepts, aliases, related concepts, AND pattern tags in one call
  • open_book now returns chunk IDs per section — no format guessing
  • Skill prompt: 1 search attempt, then browse chapters (no retry loops)
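
The unified search described above can be sketched as a single pass over the library index. This is a minimal illustration, not the plugin's actual implementation; the index shape and field names are assumptions:

```python
# One call checks concept names, aliases, related concepts, AND pattern
# tags -- no separate explore_patterns round-trip.
def search_library(index: dict, query: str) -> list[str]:
    q = query.lower()
    hits = []
    for name, entry in index.get("concepts", {}).items():
        haystack = [name,
                    *entry.get("aliases", []),
                    *entry.get("related", []),
                    *entry.get("patterns", [])]
        if any(q in term.lower() for term in haystack):
            hits.append(name)
    return hits

# Hypothetical index entry for demonstration
index = {"concepts": {"CycloneDX": {"aliases": ["CDX"],
                                    "related": ["SBOM"],
                                    "patterns": ["bill-of-materials"]}}}
```

A query for "CDX" or "SBOM" now resolves to CycloneDX in one call, where a name-only search would miss.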

Pattern Discovery

  • Patterns generated organically by LLM (removed hardcoded 43-item seed vocabulary)
  • Cross-domain associative recall integrated into search_library
  • Fuzzy pattern merging across books for consistency

Bug Fixes

  • ConceptEntry now carries related field (was silently dropped)
  • Chunk token counts preserved on non-force re-runs
  • explore_patterns result cap (merged into search_library)
  • Compact manifest schema restored to integer chunk counts
  • Corpus concept extraction now extracts related (parity with books)

Agent Improvements

  • Library-researcher agent upgraded to Sonnet (was Haiku)
  • maxTurns increased to 25 (was 15)
  • Output capped at 2000 chars to prevent rejection by main agent
  • Mandatory preview-before-read enforcement in prompts

Quality of Life

  • Portable MCP server config (/usr/bin/env uv)
  • Updated README with MCP tool workflow and streamlined diagrams
  • Plugin version bumped to 1.8.0

v1.7.0 — Proactive Library Usage & Refined Positioning

28 Mar 16:55

What's New

Proactive Library Usage

  • The agentlib-knowledge skill now auto-triggers on domain-specific coding tasks — not just explicit research questions
  • Expanded trigger patterns: writing code involving domain-specific parameters, protocols, standards, or configurations; encountering technical terms from the library; methodology and best practices questions
  • Narrowed exclusions: "general programming tasks" instead of blanket "code editing" exclusion, so domain-specific coding triggers the library while generic tasks don't

Refined Positioning & Documentation

  • New README pitch: "Curate your knowledge library. Your agent works from sources you trust." — focused on curation, persistent knowledge, proactive integration, and citable answers
  • Replaced the O(n²) token-efficiency opening with a user-centric value proposition
  • Four key pillars: your sources/your curation, always available, proactive not reactive, citable answers

Demo Screenshots

  • Added proactive query demo: agent automatically consults library when asked about BOM maturity dimensions
  • Added collapsible library-researcher navigation view showing the full tool chain (NAVIGATION.md → concepts.json → chunks → synthesized answer)

Navigation Diagram

  • Redesigned "How agents navigate the library" diagram from vertical ASCII to horizontal Mermaid flowchart — wider and more readable

Hero Animation

  • README hero image replaced with looping GIF demo (800px, 15fps)

Session Summary

This release caps a productive session that also shipped v1.5.0 (content-aware chunking, PDF table/image extraction) and v1.6.0 (parallel ingestion, auto-recovery, large book support). Together these three releases transform AgentLib from a basic chunking pipeline into a production-ready knowledge layer for AI agents.

Across v1.5.0–v1.7.0:

  • Parser: PDF tables extracted as clean markdown, images extracted with vision-based summarization
  • Chunker: Tables and code fences kept atomic with smart splitting
  • Ingestion: 10x faster via parallel summarization, auto-recovery on failure, batched concept extraction
  • Agent integration: Proactive library usage without explicit user commands
  • Testing: 98 tests passing (up from ~34 at session start)
  • Issues: #8 implemented, #9-12 closed as not needed, #16 fixed

Full Changelog: v1.6.0...v1.7.0

v1.6.0 — Parallel Ingestion, Auto-Recovery & Large Book Support

28 Mar 15:22

What's New

Parallel Chapter Summarization

  • Stage 4 (chapter summarization) now runs in parallel using asyncio with a configurable semaphore
  • Default concurrency: 10 simultaneous LLM calls
  • Configurable via AGENTLIB_CONCURRENCY env var
  • ~10x faster ingestion for large books (e.g., 1134 chapters: ~5 min instead of ~50 min)
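
The fan-out pattern is a standard asyncio semaphore around the per-chapter call. A minimal sketch, assuming `summarise` is an async LLM call — names are illustrative, not the library's actual API:

```python
import asyncio

async def summarise_all(chapters, summarise, limit=10):
    sem = asyncio.Semaphore(limit)        # at most `limit` calls in flight

    async def bounded(chapter):
        async with sem:
            return await summarise(chapter)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(c) for c in chapters))
```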

Batched Concept Extraction

  • extract_concepts() now processes chapters in batches of 50 instead of sending all chapter summaries in one LLM call
  • Prevents context window overflow on large books (previously crashed on books with 1000+ chapters)
  • Concepts deduplicated across batches (normalized keys, merged locations and aliases)
  • Global cap of 50 concepts per book, ranked by coverage breadth
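
The batch-then-merge step above can be sketched as follows; field names (`name`, `locations`, `aliases`) are illustrative, not the library's actual schema:

```python
def batches(items, size=50):
    # Yield fixed-size slices so no single LLM call sees the whole book.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def merge_concepts(batch_results, cap=50):
    # Deduplicate across batches by normalized name, union the
    # locations/aliases, then rank by coverage breadth and cap at 50.
    merged = {}
    for concepts in batch_results:
        for c in concepts:
            key = c["name"].strip().lower()
            entry = merged.setdefault(
                key, {"name": c["name"], "locations": set(), "aliases": set()})
            entry["locations"].update(c.get("locations", []))
            entry["aliases"].update(c.get("aliases", []))
    ranked = sorted(merged.values(),
                    key=lambda e: len(e["locations"]), reverse=True)
    return ranked[:cap]
```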

Auto-Recovery

  • Partial manifest saved after stage 4 (chapter summaries with empty concept index)
  • On re-run after a stage 5 failure, stages 1-4 are skipped automatically — only concept extraction re-runs
  • Stage 5 retries up to 3 times with 30-second delay on API failures (rate limits, overloaded errors)
  • Retry scope includes RuntimeError, TypeError, AttributeError, and ValueError to handle malformed LLM output
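
The retry policy reduces to a small loop. A sketch under the parameters stated above (3 attempts, 30-second delay), with the sleep injectable so the delay doesn't apply in tests:

```python
import time

# Per the notes above: malformed LLM output often surfaces downstream as
# TypeError/AttributeError/ValueError, so those are retried too.
RETRYABLE = (RuntimeError, TypeError, AttributeError, ValueError)

def with_retries(fn, attempts=3, delay=30.0, sleep=time.sleep):
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts:
                raise                 # out of attempts: surface the error
            sleep(delay)
```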

Hero Animation

  • README now features a looping GIF demo of AgentLib in action
  • Navigation diagram redesigned as horizontal Mermaid flowchart

Bug Fixes

  • AGENTLIB_CONCURRENCY validated: clamped to min 1, fallback to 10 on invalid input
  • Stage 5 skip detection uses catalog entry (not dict truthiness) to correctly handle empty concept indexes
  • Non-dict LLM responses now raise RuntimeError with clear message instead of silent failures
  • Server tests no longer fail when fastmcp dependency is missing
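
The AGENTLIB_CONCURRENCY validation amounts to a clamp with a fallback; a minimal sketch (function name is illustrative):

```python
import os

def read_concurrency(default=10):
    raw = os.environ.get("AGENTLIB_CONCURRENCY", "")
    try:
        return max(1, int(raw))       # clamp to a minimum of 1
    except ValueError:
        return default                # unset or non-numeric: fall back to 10
```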

Documentation

  • README updated with: content-aware chunking, PDF table/image extraction, parallel ingestion, auto-recovery, AGENTLIB_CONCURRENCY env var

Testing

  • 98 tests passing (up from 92)
  • New tests for: batched concept extraction, deduplication, concept cap, async summarization, semaphore concurrency, parallel timing

Full Changelog: v1.5.0...v1.6.0

v1.5.0 — Content-Aware Chunking & PDF Table/Image Extraction

27 Mar 13:54

What's New

Content-Aware Chunk Boundaries

  • Tables and fenced code blocks are now detected as atomic structural units and kept together in chunks
  • Soft cap 500 tokens (prose), hard cap 1000 tokens (structural blocks)
  • Oversized tables split at row boundaries with header propagation into each sub-chunk
  • Oversized code fences split at line boundaries with opening/closing markers preserved
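
Row-boundary splitting with header propagation can be sketched like this. The real chunker budgets in tokens; this simplifies the cap to a row count for illustration:

```python
def split_table(lines, max_body_rows=20):
    # lines: a markdown pipe table; the first two lines are the header
    # row and the |---| separator, which are copied into every sub-chunk.
    header, body = lines[:2], lines[2:]
    if len(body) <= max_body_rows:
        return [lines]
    return [header + body[i:i + max_body_rows]
            for i in range(0, len(body), max_body_rows)]
```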

PDF Table Extraction

  • PyMuPDF's find_tables() detects tables in PDFs and renders them as markdown pipe tables
  • Replaces previous whitespace-aligned garbled text with clean | col | col | format
  • Phantom empty columns (from PyMuPDF layout misalignment) automatically stripped
  • Tables positioned correctly among surrounding text via y-coordinate ordering
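
The markdown rendering step can be sketched as a pure function over the extracted cell grid (PyMuPDF's `Table.extract()` returns rows as lists that may contain `None` cells). This is an illustration of the phantom-column stripping, not the pipeline's actual code:

```python
def rows_to_markdown(rows):
    # Keep only columns that are non-empty in at least one row; PyMuPDF
    # layout misalignment can produce columns that are empty everywhere.
    keep = [i for i in range(len(rows[0]))
            if any((row[i] or "").strip() for row in rows)]
    cells = [[(row[i] or "").strip() for i in keep] for row in rows]
    lines = ["| " + " | ".join(cells[0]) + " |",
             "|" + " --- |" * len(keep)]
    lines += ["| " + " | ".join(row) + " |" for row in cells[1:]]
    return "\n".join(lines)
```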

Image Extraction & Vision Summarization

  • Images extracted from PDFs at parse time, filtered (skips icons < 50px and < 5KB)
  • [Figure: filename] placeholders inserted in chunk text at correct positions
  • During chapter summarization, images sent to the LLM as base64 content blocks
  • LLM can now describe diagrams and figures in chapter summaries
  • Capped at 5 images per chapter to control costs
  • Automatic fallback to text-only if the configured model doesn't support vision
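
The filtering and cost cap reduce to two small checks; thresholds are the ones stated above, field names are illustrative:

```python
MIN_SIDE_PX = 50            # skip icons below 50px on either side
MIN_SIZE_BYTES = 5 * 1024   # skip decorations under 5KB
MAX_PER_CHAPTER = 5         # cost cap on vision calls

def keep_image(width, height, size_bytes):
    return min(width, height) >= MIN_SIDE_PX and size_bytes >= MIN_SIZE_BYTES

def select_chapter_images(images):
    kept = [im for im in images
            if keep_image(im["w"], im["h"], im["bytes"])]
    return kept[:MAX_PER_CHAPTER]
```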

Bug Fixes

  • Fixed corpus pipeline crash from parse_pdf() return type change
  • Fixed /knowledge/agentlib-knowledge references in ingestion commands

Testing

  • 72 tests passing (up from ~34)
  • Verified end-to-end on the CycloneDX Authoritative Guide to SBOM

Full Changelog: v1.4.0...v1.5.0

v1.4.0 — Concept Index Aliases

25 Mar 23:11
cf577da

Concept Index Aliases for Semantic Search

Closes #7. The concept index now supports LLM-generated aliases — abbreviations, acronyms, and synonyms — so agents find concepts even when their query doesn't match the primary name.

What's new

  • Ingestion-time aliases — the LLM generates 2-3 aliases per concept during ingestion (e.g., "CycloneDX" → ["CDX", "Cyclone DX"]). Stored in manifests and concepts.json.
  • Alias-aware search — search_concepts and the library-researcher agent check aliases alongside concept names. Zero runtime cost.
  • Hit-rate metrics — new concept_search_hits/concept_search_misses fields in benchmark traces, with a post-hoc classifier wired into the scoring CLI.
  • Navigation diagram — README now includes an ASCII diagram showing how agents navigate the library layers.

Before → After

| Query | Before | After |
|---|---|---|
| "CDX" | ❌ miss → 5-6 calls, ~5k tokens | ✅ hit via alias → 2-3 calls, ~1k tokens |
| "SBOM" | ❌ miss on "Software Bill of Materials" | ✅ hit via alias |
| "Maturity Model" | ❌ miss on "SCVS BOM Maturity Model" | ✅ hit via alias |

Upgrade notes

  • Existing books: re-ingest with --force to generate aliases. Books without aliases continue to work (empty aliases list).
  • concepts.json format change: now {"concept": {"chunks": [...], "aliases": [...]}} instead of {"concept": [...]}. The library-researcher agent handles both.
  • Backward compatible: old manifests deserialize cleanly.
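
A reader that accepts both concepts.json shapes is a one-branch check. A sketch based on the formats described above:

```python
def concept_entry(value):
    # v1.4.0 shape: {"chunks": [...], "aliases": [...]}
    # pre-1.4.0 shape: a plain chunk list (no aliases)
    if isinstance(value, dict):
        return value.get("chunks", []), value.get("aliases", [])
    return list(value), []
```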

Files changed

  • lib/models.py — aliases field on ConceptEntry and CorpusConceptEntry
  • lib/storage.py — alias-aware search_concepts, updated write_concept_index
  • lib/summariser.py — updated concept extraction prompt and parsing
  • preprocessing/books.py, preprocessing/corpus.py — alias passthrough
  • agents/library-researcher.md — updated prompt to check aliases
  • benchmark/ — hit-rate metrics, classifier, report integration
  • README.md — navigation diagram

v1.3.0 — Corpus Ingestion & Research Agent

23 Mar 21:02

What's New

Corpus Ingestion Pipeline

  • /agentlib:agentlib-ingest-corpus — ingest a folder of scientific PDFs as a paper collection
  • 7-stage pipeline: discover → extract metadata → parse + chunk → summarise → cluster by topic → build cross-paper concept index → update navigation
  • Two-level L0: topic clusters (L0a) → per-cluster paper lists with abstracts (L0b)
  • Cross-paper concept index with 15-40 concepts mapped to papers and sections

Library Researcher Agent

  • library-researcher custom agent runs in an isolated context window
  • All navigation and chunk reading stays out of the main conversation
  • Only the synthesized answer with citations returns to the main conversation — roughly 3k tokens of main context per query
  • Uses haiku model for fast, cheap research with structured navigation

Skill Improvements

  • Renamed to /agentlib-knowledge for consistent naming
  • Skill now delegates to the research agent by default
  • Navigation starts with NAVIGATION.md (covers both books and corpora)
  • Fallback to direct reads if agent delegation fails

Benchmarks (real measurements)

  • Agent delegation: 58% fewer total tokens vs raw PDF reading, 2x faster
  • Multi-query session: 7k main context for 2 questions vs 30k+ without agent
  • Corpus queries: 57% token reduction vs raw PDFs (33k vs 79k)
  • Book queries: 47-82% token reduction with same answer quality

Bug Fixes

  • Fixed metadata cache key mismatch in corpus pipeline (Qodo review)
  • Fixed concept index degradation on cached re-runs (Qodo review)
  • Fixed ~ path expansion in research agent context
  • Fixed author name in plugin metadata
  • Removed duplicate corpus entry handling

Documentation

  • New corpus walkthrough with Prof. Aharon Davidson physics papers example
  • Updated SBOM walkthrough for agent delegation flow
  • README updated with corpus support, benchmarks, agent architecture, and examples section
  • Added MIT license file

v1.2.2 — New hero image

23 Mar 08:39

AgentLib v1.2.2

  • Updated hero image

v1.2.1 — Documentation & Polish

22 Mar 14:12

AgentLib v1.2.1

Fixes

  • Fixed skill loading — added required name field to SKILL.md frontmatter
  • Skill invocation is /knowledge (not /agentlib:knowledge)
  • Fixed all command names to plugin-qualified format (/agentlib:agentlib-ingest-book)
  • Fixed OWASP walkthrough to use the actual SBOM book with real ingested data
  • Corrected PDF download link
  • Bumped version to 1.2.x across plugin.json and pyproject.toml

Documentation

  • Added SBOM walkthrough example (examples/sbom-walkthrough.md) with real data
  • Added all 3 commands to README (ingest, configure, library)
  • Documented /knowledge explicit invocation for when you want the book's answer
  • Removed stale MCP server references
  • Descriptive result labels in README

v1.2.0 — Concept Index Fix + Documentation Overhaul

22 Mar 06:43
3901294

AgentLib v1.2.0

Fixes

  • Concept index now populated correctly — chunk IDs are looked up programmatically after LLM extraction, fixing empty concepts.json for books where section-chunk mapping was incomplete
  • Skill renamed from agentlib:agentlib to agentlib:knowledge — cleaner invocation, no more redundant naming
  • All documentation aligned with zero-server architecture — removed stale MCP tool/server references from commands and docs

Documentation

  • Explicit invocation documented — use /agentlib:knowledge <question> when you want the book's answer over Claude's training data
  • Auto-trigger + explicit invocation patterns explained in README
  • Skill trigger keywords improved — removed hardcoded domain terms (SBOM, CycloneDX), fixed contradictory read limits
  • marketplace.json updated to "Up to 82% fewer tokens"

Code Quality

  • Extracted _section_id_from_chunk() and _title_from_path() helpers
  • Pre-built chapter→chunks lookup for O(1) concept fallback
  • Removed .scratch/ temp files from tracking
  • Fixed stale _data_root() docstring
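
Pre-building the chapter→chunks mapping once turns the per-concept fallback from a linear scan into a dict hit. A minimal sketch — the chunk field names here are assumptions, not the library's actual schema:

```python
from collections import defaultdict

def build_chapter_lookup(chunks):
    # One pass over all chunks; subsequent lookups are O(1) per chapter.
    lookup = defaultdict(list)
    for chunk in chunks:
        lookup[chunk["chapter"]].append(chunk["id"])
    return lookup
```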

Usage

# Auto-trigger — just ask naturally
What specific actor frameworks does the book mention?

# Explicit — when you want the book's specific answer
/agentlib:knowledge What defensive techniques protect against prompt injection?

Real-World Results

  • 87% token reduction (5.2k vs 38.6k) on fresh session test
  • 82% reduction confirmed across multiple questions and books
  • Correct answers with source citations every time

v1.1.0 — Zero-Server Mode

21 Mar 20:57
59bcdd5

AgentLib v1.1.0 — Zero-Server Mode

AgentLib no longer requires an MCP server. The agent navigates preprocessed files directly using a skill that teaches the navigation pattern.

What changed

  • No MCP server — removed .mcp.json, no process to manage
  • File-based navigation — agent reads catalog.json → concepts.json → chunks/*.md directly
  • Smart skill trigger — keyword-rich description activates automatically for research/knowledge questions
  • Compact files on disk — manifest.compact.json (~500-2k tok) and concepts.json (~200 tok) written during ingestion
  • Self-documenting library — NAVIGATION.md generated in library root

Results

  • 58% fewer content tokens vs raw PDF (6.1k vs 14.7k on SBOM maturity question)
  • 4 file reads to answer — catalog → concepts → 2 chunks
  • Automatic source citations in answers
  • Zero infrastructure, zero server management

Architecture (simplified)

AgentLib = Ingestion Pipeline + Skill + File Convention
  1. /agentlib-ingest-book ~/book.pdf — preprocess once
  2. Ask questions — skill teaches the agent to navigate metadata → chunks
  3. No server, no MCP, no tools — just files + instructions

Also in this release

  • /agentlib-library command to browse ingested books
  • Stronger skill trigger keywords (CandleKeep-inspired pattern)
  • Token budget enforcement on all navigation layers

Install

claude --plugin-dir ~/path/to/agentlib
/agentlib-ingest-book ~/path/to/book.pdf

Then just ask questions naturally.