Releases: barkain/agentlib
v1.8.0 — Unified Navigation Architecture
What's New
Unified Navigation (3 files instead of 6)
- `library_index.json` — one file for the entire library: concepts, aliases, related concepts, and pattern fingerprints (merged from the separate `pattern_index.json`)
- `nav.json` (per-book) — structure + chunk metadata + concepts in one file (replaces `manifest.compact.json`, `concepts.json`, `chunk_index.json`)
- `manifest.json` (per-book) — full archive, unchanged
MCP Server
- Plugin now registers an MCP server with 6 tools:
  `browse_library`, `open_book`, `search_library`, `search_concepts`, `preview_chunks`, `read_chunks`
- Skill uses MCP tools directly — no sub-agent delegation, no file reads
- Typical query: 3 MCP calls → answer in ~1.5 minutes
Smarter Search
- `search_library` searches concepts, aliases, related concepts, AND pattern tags in one call
- `open_book` now returns chunk IDs per section — no format guessing
- Skill prompt: 1 search attempt, then browse chapters (no retry loops)
Pattern Discovery
- Patterns generated organically by LLM (removed hardcoded 43-item seed vocabulary)
- Cross-domain associative recall integrated into `search_library`
- Fuzzy pattern merging across books for consistency
Bug Fixes
- `ConceptEntry` now carries the `related` field (was silently dropped)
- Chunk token counts preserved on non-force re-runs
- `explore_patterns` result cap (merged into `search_library`)
- Compact manifest schema restored to integer chunk counts
- Corpus concept extraction now extracts `related` (parity with books)
Agent Improvements
- Library-researcher agent upgraded to Sonnet (was Haiku)
- `maxTurns` increased to 25 (was 15)
- Output capped at 2000 chars to prevent rejection by the main agent
- Mandatory preview-before-read enforcement in prompts
Quality of Life
- Portable MCP server config (`/usr/bin/env uv`)
- Updated README with MCP tool workflow and streamlined diagrams
- Plugin version bumped to 1.8.0
v1.7.0 — Proactive Library Usage & Refined Positioning
What's New
Proactive Library Usage
- The `agentlib-knowledge` skill now auto-triggers on domain-specific coding tasks — not just explicit research questions
- Expanded trigger patterns: writing code involving domain-specific parameters, protocols, standards, or configurations; encountering technical terms from the library; methodology and best-practices questions
- Narrowed exclusions: "general programming tasks" instead of blanket "code editing" exclusion, so domain-specific coding triggers the library while generic tasks don't
Refined Positioning & Documentation
- New README pitch: "Curate your knowledge library. Your agent works from sources you trust." — focused on curation, persistent knowledge, proactive integration, and citable answers
- Replaced the O(n²) token-efficiency opening with a user-centric value proposition
- Four key pillars: your sources/your curation, always available, proactive not reactive, citable answers
Demo Screenshots
- Added proactive query demo: agent automatically consults library when asked about BOM maturity dimensions
- Added collapsible library-researcher navigation view showing the full tool chain (NAVIGATION.md → concepts.json → chunks → synthesized answer)
Navigation Diagram
- Redesigned "How agents navigate the library" diagram from vertical ASCII to horizontal Mermaid flowchart — wider and more readable
Hero Animation
- README hero image replaced with looping GIF demo (800px, 15fps)
Session Summary
This release caps a productive session that also shipped v1.5.0 (content-aware chunking, PDF table/image extraction) and v1.6.0 (parallel ingestion, auto-recovery, large book support). Together these three releases transform AgentLib from a basic chunking pipeline into a production-ready knowledge layer for AI agents.
Across v1.5.0–v1.7.0:
- Parser: PDF tables extracted as clean markdown, images extracted with vision-based summarization
- Chunker: Tables and code fences kept atomic with smart splitting
- Ingestion: 10x faster via parallel summarization, auto-recovery on failure, batched concept extraction
- Agent integration: Proactive library usage without explicit user commands
- Testing: 98 tests passing (up from ~34 at session start)
- Issues: #8 implemented, #9-12 closed as not needed, #16 fixed
Full Changelog: v1.6.0...v1.7.0
v1.6.0 — Parallel Ingestion, Auto-Recovery & Large Book Support
What's New
Parallel Chapter Summarization
- Stage 4 (chapter summarization) now runs in parallel using asyncio with a configurable semaphore
- Default concurrency: 10 simultaneous LLM calls
- Configurable via the `AGENTLIB_CONCURRENCY` env var
- ~10x faster ingestion for large books (e.g., 1134 chapters: ~5 min instead of ~50 min)
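A minimal sketch of the semaphore-bounded fan-out described above. The function names (`read_concurrency`, `summarise_all`, `summarise_one`) are illustrative, not the library's actual API; only the `AGENTLIB_CONCURRENCY` variable and its clamping behaviour come from these notes:

```python
import asyncio
import os

def read_concurrency(default: int = 10) -> int:
    """Parse AGENTLIB_CONCURRENCY, clamping to at least 1 and
    falling back to the default on missing or invalid input."""
    raw = os.environ.get("AGENTLIB_CONCURRENCY", "")
    try:
        return max(1, int(raw))
    except ValueError:
        return default

async def summarise_all(chapters, summarise_one):
    """Run one LLM summarisation per chapter, with at most
    read_concurrency() calls in flight at once."""
    sem = asyncio.Semaphore(read_concurrency())

    async def bounded(chapter):
        async with sem:
            return await summarise_one(chapter)

    # gather() preserves input order, so summaries line up with chapters
    return await asyncio.gather(*(bounded(c) for c in chapters))
```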
Batched Concept Extraction
- `extract_concepts()` now processes chapters in batches of 50 instead of sending all chapter summaries in one LLM call
- Prevents context-window overflow on large books (previously crashed on books with 1000+ chapters)
- Concepts deduplicated across batches (normalized keys, merged locations and aliases)
- Global cap of 50 concepts per book, ranked by coverage breadth
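The batch-then-merge behaviour above can be sketched like this. The per-concept dict shape is hypothetical; the real `extract_concepts()` internals may differ:

```python
def batched(items, size=50):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def merge_concepts(batch_results, cap=50):
    """Deduplicate concepts across batches by normalized name,
    merging locations and aliases, then keep the `cap` concepts
    with the broadest coverage (most locations)."""
    merged = {}
    for concepts in batch_results:
        for c in concepts:
            key = c["name"].strip().lower()
            entry = merged.setdefault(
                key, {"name": c["name"], "locations": set(), "aliases": set()})
            entry["locations"].update(c.get("locations", []))
            entry["aliases"].update(c.get("aliases", []))
    ranked = sorted(merged.values(),
                    key=lambda e: len(e["locations"]), reverse=True)
    return ranked[:cap]
```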
Auto-Recovery
- Partial manifest saved after stage 4 (chapter summaries with empty concept index)
- On re-run after a stage 5 failure, stages 1-4 are skipped automatically — only concept extraction re-runs
- Stage 5 retries up to 3 times with 30-second delay on API failures (rate limits, overloaded errors)
- Retry scope includes `RuntimeError`, `TypeError`, `AttributeError`, and `ValueError` to handle malformed LLM output
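The retry policy can be sketched as a small wrapper. The `with_retries` name and the injectable `sleep` parameter are illustrative; the exception types and the 3-attempt/30-second figures come from the notes above:

```python
import time

# Includes TypeError/AttributeError/ValueError to catch malformed LLM output
RETRYABLE = (RuntimeError, TypeError, AttributeError, ValueError)

def with_retries(fn, attempts=3, delay=30.0, sleep=time.sleep):
    """Call fn, retrying on the listed exception types up to
    `attempts` total tries, sleeping `delay` seconds between them."""
    last = None
    for i in range(attempts):
        try:
            return fn()
        except RETRYABLE as exc:
            last = exc
            if i < attempts - 1:  # don't sleep after the final failure
                sleep(delay)
    raise last
```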
Hero Animation
- README now features a looping GIF demo of AgentLib in action
- Navigation diagram redesigned as horizontal Mermaid flowchart
Bug Fixes
- `AGENTLIB_CONCURRENCY` validated: clamped to a minimum of 1, with fallback to 10 on invalid input
- Stage 5 skip detection uses the catalog entry (not dict truthiness) to correctly handle empty concept indexes
- Non-dict LLM responses now raise `RuntimeError` with a clear message instead of failing silently
- Server tests no longer fail when the `fastmcp` dependency is missing
Documentation
- README updated with: content-aware chunking, PDF table/image extraction, parallel ingestion, auto-recovery, and the `AGENTLIB_CONCURRENCY` env var
Testing
- 98 tests passing (up from 92)
- New tests for: batched concept extraction, deduplication, concept cap, async summarization, semaphore concurrency, parallel timing
Full Changelog: v1.5.0...v1.6.0
v1.5.0 — Content-Aware Chunking & PDF Table/Image Extraction
What's New
Content-Aware Chunk Boundaries
- Tables and fenced code blocks are now detected as atomic structural units and kept together in chunks
- Soft cap 500 tokens (prose), hard cap 1000 tokens (structural blocks)
- Oversized tables split at row boundaries with header propagation into each sub-chunk
- Oversized code fences split at line boundaries with opening/closing markers preserved
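The header-propagation split for oversized tables can be sketched as follows. Note the real splitter caps by tokens, not row count; this simplified version splits by rows:

```python
def split_table(lines, max_rows):
    """Split a markdown pipe table at row boundaries, repeating the
    header row and separator at the top of each sub-chunk so every
    piece remains a valid, self-describing table."""
    header, sep, body = lines[0], lines[1], lines[2:]
    chunks = []
    for i in range(0, len(body), max_rows):
        chunks.append([header, sep] + body[i:i + max_rows])
    return chunks
```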
PDF Table Extraction
- PyMuPDF's `find_tables()` detects tables in PDFs and renders them as markdown pipe tables
- Replaces the previous whitespace-aligned garbled text with clean `| col | col |` format
- Phantom empty columns (from PyMuPDF layout misalignment) are automatically stripped
- Tables positioned correctly among surrounding text via y-coordinate ordering
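The rows-to-markdown step, including phantom-column stripping, might look like this — a sketch over the row lists that PyMuPDF's `Table.extract()` returns; the library's actual renderer may differ:

```python
def rows_to_markdown(rows):
    """Render extracted table rows (lists of cell strings, possibly
    None) as a markdown pipe table, dropping columns that are empty
    in every row ("phantom" columns from layout misalignment)."""
    cells = [[(c or "").strip() for c in row] for row in rows]
    # keep only columns with at least one non-empty cell
    keep = [j for j in range(len(cells[0])) if any(row[j] for row in cells)]
    cells = [[row[j] for j in keep] for row in cells]
    header, body = cells[0], cells[1:]
    lines = ["| " + " | ".join(header) + " |",
             "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```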
Image Extraction & Vision Summarization
- Images extracted from PDFs at parse time, filtered (skips icons < 50px and < 5KB)
- `[Figure: filename]` placeholders inserted in chunk text at the correct positions
- During chapter summarization, images are sent to the LLM as base64 content blocks
- LLM can now describe diagrams and figures in chapter summaries
- Capped at 5 images per chapter to control costs
- Automatic fallback to text-only if the configured model doesn't support vision
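The icon filter could be as simple as the predicate below. The exact combination of thresholds (whether the pixel and byte checks are ANDed or ORed) is an assumption read off the notes above:

```python
def keep_image(width_px, height_px, size_bytes,
               min_px=50, min_bytes=5 * 1024):
    """Filter out likely icons: drop images smaller than 50px on
    either side or under 5KB (assumed predicate, see lead-in)."""
    return min(width_px, height_px) >= min_px and size_bytes >= min_bytes
```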
Bug Fixes
- Fixed corpus pipeline crash from the `parse_pdf()` return type change
- Fixed `/knowledge` → `/agentlib-knowledge` references in ingestion commands
Testing
- 72 tests passing (up from ~34)
- Verified end-to-end on the CycloneDX Authoritative Guide to SBOM
Full Changelog: v1.4.0...v1.5.0
v1.4.0 — Concept Index Aliases
Concept Index Aliases for Semantic Search
Closes #7. The concept index now supports LLM-generated aliases — abbreviations, acronyms, and synonyms — so agents find concepts even when their query doesn't match the primary name.
What's new
- Ingestion-time aliases — the LLM generates 2-3 aliases per concept during ingestion (e.g., "CycloneDX" → `["CDX", "Cyclone DX"]`). Stored in manifests and `concepts.json`.
- Alias-aware search — `search_concepts` and the `library-researcher` agent check aliases alongside concept names. Zero runtime cost.
- Hit-rate metrics — new `concept_search_hits`/`concept_search_misses` fields in benchmark traces, with a post-hoc classifier wired into the scoring CLI.
- Navigation diagram — README now includes an ASCII diagram showing how agents navigate the library layers.
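Alias-aware matching can be sketched over the new `concepts.json` shape. `search_concepts` is the real name from this release, but the body and signature below are illustrative:

```python
def search_concepts(index, query):
    """Match a query against concept names and aliases,
    case-insensitively, allowing substring matches.
    `index` uses the new shape:
    {"concept": {"chunks": [...], "aliases": [...]}}."""
    q = query.strip().lower()
    hits = {}
    for name, entry in index.items():
        terms = [name] + entry.get("aliases", [])
        if any(q == t.lower() or q in t.lower() for t in terms):
            hits[name] = entry["chunks"]
    return hits
```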
Before → After
| Query | Before | After |
|---|---|---|
| "CDX" | ❌ miss → 5-6 calls, ~5k tokens | ✅ hit via alias → 2-3 calls, ~1k tokens |
| "SBOM" | ❌ miss on "Software Bill of Materials" | ✅ hit via alias |
| "Maturity Model" | ❌ miss on "SCVS BOM Maturity Model" | ✅ hit via alias |
Upgrade notes
- Existing books: re-ingest with `--force` to generate aliases. Books without aliases continue to work (empty aliases list).
- `concepts.json` format change: now `{"concept": {"chunks": [...], "aliases": [...]}}` instead of `{"concept": [...]}`. The library-researcher agent handles both.
- Backward compatible: old manifests deserialize cleanly.
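Handling both `concepts.json` shapes might look like this — a hypothetical normalisation helper, not the agent's actual code:

```python
def normalise_concepts(raw):
    """Accept both concepts.json shapes:
      old: {"concept": ["chunk-1", ...]}
      new: {"concept": {"chunks": [...], "aliases": [...]}}
    and return the new shape everywhere."""
    out = {}
    for name, value in raw.items():
        if isinstance(value, dict):
            out[name] = {"chunks": value.get("chunks", []),
                         "aliases": value.get("aliases", [])}
        else:  # old format: bare list of chunk IDs
            out[name] = {"chunks": list(value), "aliases": []}
    return out
```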
Files changed
- `lib/models.py` — `aliases` field on `ConceptEntry` and `CorpusConceptEntry`
- `lib/storage.py` — alias-aware `search_concepts`, updated `write_concept_index`
- `lib/summariser.py` — updated concept extraction prompt and parsing
- `preprocessing/books.py`, `preprocessing/corpus.py` — alias passthrough
- `agents/library-researcher.md` — updated prompt to check aliases
- `benchmark/` — hit-rate metrics, classifier, report integration
- `README.md` — navigation diagram
v1.3.0 — Corpus Ingestion & Research Agent
What's New
Corpus Ingestion Pipeline
- `/agentlib:agentlib-ingest-corpus` — ingest a folder of scientific PDFs as a paper collection
- 7-stage pipeline: discover → extract metadata → parse + chunk → summarise → cluster by topic → build cross-paper concept index → update navigation
- Two-level L0: topic clusters (L0a) → per-cluster paper lists with abstracts (L0b)
- Cross-paper concept index with 15-40 concepts mapped to papers and sections
Library Researcher Agent
- `library-researcher` custom agent runs in an isolated context window
- All navigation and chunk reading stays out of the main conversation
- Only the synthesized answer with citations returns — ~3k tokens of main context per query
- Uses the Haiku model for fast, cheap research with structured navigation
Skill Improvements
- Renamed to `/agentlib-knowledge` for consistent naming
- Skill now delegates to the research agent by default
- Navigation starts with `NAVIGATION.md` (covers both books and corpora)
- Fallback to direct reads if agent delegation fails
Benchmarks (real measurements)
- Agent delegation: 58% fewer total tokens vs raw PDF reading, 2x faster
- Multi-query session: 7k main context for 2 questions vs 30k+ without agent
- Corpus queries: 57% token reduction vs raw PDFs (33k vs 79k)
- Book queries: 47-82% token reduction with same answer quality
Bug Fixes
- Fixed metadata cache key mismatch in corpus pipeline (Qodo review)
- Fixed concept index degradation on cached re-runs (Qodo review)
- Fixed `~` path expansion in the research agent context
- Fixed author name in plugin metadata
- Removed duplicate corpus entry handling
Documentation
- New corpus walkthrough with Prof. Aharon Davidson physics papers example
- Updated SBOM walkthrough for agent delegation flow
- README updated with corpus support, benchmarks, agent architecture, and examples section
- Added MIT license file
v1.2.2 — New hero image
AgentLib v1.2.2
- Updated hero image
v1.2.1 — Documentation & Polish
AgentLib v1.2.1
Fixes
- Fixed skill loading — added the required `name` field to SKILL.md frontmatter
- Skill invocation is `/knowledge` (not `/agentlib:knowledge`)
- Fixed all command names to the plugin-qualified format (`/agentlib:agentlib-ingest-book`)
- Fixed the OWASP walkthrough to use the actual SBOM book with real ingested data
- Corrected PDF download link
- Bumped version to 1.2.x across plugin.json and pyproject.toml
Documentation
- Added SBOM walkthrough example (`examples/sbom-walkthrough.md`) with real data
- Added all 3 commands to the README (ingest, configure, library)
- Documented `/knowledge` explicit invocation for when you want the book's answer
- Removed stale MCP server references
- Descriptive result labels in README
v1.2.0 — Concept Index Fix + Documentation Overhaul
AgentLib v1.2.0
Fixes
- Concept index now populated correctly — chunk IDs are looked up programmatically after LLM extraction, fixing empty `concepts.json` for books where section-chunk mapping was incomplete
- Skill renamed from `agentlib:agentlib` to `agentlib:knowledge` — cleaner invocation, no more redundant naming
- All documentation aligned with the zero-server architecture — removed stale MCP tool/server references from commands and docs
Documentation
- Explicit invocation documented — use `/agentlib:knowledge <question>` when you want the book's answer over Claude's training data
- Auto-trigger + explicit invocation patterns explained in the README
- Skill trigger keywords improved — removed hardcoded domain terms (SBOM, CycloneDX), fixed contradictory read limits
- marketplace.json updated to "Up to 82% fewer tokens"
Code Quality
- Extracted `_section_id_from_chunk()` and `_title_from_path()` helpers
- Pre-built chapter→chunks lookup for O(1) concept fallback
- Removed `.scratch/` temp files from tracking
- Fixed stale `_data_root()` docstring
Usage
- Auto-trigger — just ask naturally: `What specific actor frameworks does the book mention?`
- Explicit — when you want the book's specific answer: `/agentlib:knowledge What defensive techniques protect against prompt injection?`
Real-World Results
- 87% token reduction (5.2k vs 38.6k) on fresh session test
- 82% reduction confirmed across multiple questions and books
- Correct answers with source citations every time
v1.1.0 — Zero-Server Mode
AgentLib v1.1.0 — Zero-Server Mode
AgentLib no longer requires an MCP server. The agent navigates preprocessed files directly using a skill that teaches the navigation pattern.
What changed
- No MCP server — removed `.mcp.json`, no process to manage
- File-based navigation — agent reads `catalog.json` → `concepts.json` → `chunks/*.md` directly
- Smart skill trigger — keyword-rich description activates automatically for research/knowledge questions
- Compact files on disk — `manifest.compact.json` (~500-2k tok) and `concepts.json` (~200 tok) written during ingestion
- Self-documenting library — `NAVIGATION.md` generated in the library root
Results
- 58% fewer content tokens vs raw PDF (6.1k vs 14.7k on SBOM maturity question)
- 4 file reads to answer — catalog → concepts → 2 chunks
- Automatic source citations in answers
- Zero infrastructure, zero server management
Architecture (simplified)
AgentLib = Ingestion Pipeline + Skill + File Convention
- `/agentlib-ingest-book ~/book.pdf` — preprocess once
- Ask questions — the skill teaches the agent to navigate metadata → chunks
- No server, no MCP, no tools — just files + instructions
Also in this release
- `/agentlib-library` command to browse ingested books
- Stronger skill trigger keywords (CandleKeep-inspired pattern)
- Token budget enforcement on all navigation layers
Install
- `claude --plugin-dir ~/path/to/agentlib`
- `/agentlib-ingest-book ~/path/to/book.pdf`

Then just ask questions naturally.