Releases: andi586/cleanslate-pdf-engine

v0.1 — Initial AI Automation Framework

04 Mar 03:01

CleanSlate Protocol v0.1 — Initial Release

First public version of the CleanSlate AI data standardization framework.

Highlights

  • CleanSlate Protocol (CSP) — A three-layer open standard for AI data preprocessing

    • Layer 1: Raw Extraction (source preservation + SHA-256 checksum)
    • Layer 2: Semantic Structure (heading detection, table analysis, entity extraction)
    • Layer 3: Verification (Merkle hash tree, Ed25519 signatures, timestamps)
  • PDF Engine v0.2 — Production-grade PDF parsing engine

    • Font-clustering heading detection with hierarchy inference
    • Spatial column analysis for table extraction
    • CJK-Latin mixed text spacing repair
    • OCR artifact cleanup
    • Overall accuracy score: 82/100 (up from 15/100 in PDF Engine v0.1)
  • 10+ Format Support — PDF, DOCX, XLSX, PPTX, HTML, CSV, TXT, JSON, PNG, JPEG, Markdown

  • CSP Playground — Browser-based interactive converter

    • Drag-and-drop file upload
    • Three output modes: CSP JSON / Markdown / Raw JSON
    • Real-time quality metrics display
    • 100% browser-local processing: files are never sent to a server
  • Professional Landing Page — Protocol showcase with architecture diagrams, code examples, ecosystem overview, and competitive comparison
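
As a rough illustration of what Layers 1 and 3 describe, the sketch below computes a SHA-256 checksum, folds per-chunk hashes into a Merkle root, and signs that root with Ed25519. It uses Node's built-in node:crypto module purely for convenience. The function names (sha256Hex, merkleRoot), the sample chunks, and the rule for handling an odd node are assumptions made for this example, not the CSP specification or this project's implementation.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Layer 1 (sketch): hex-encoded SHA-256 checksum of the raw source.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Layer 3 (sketch): Merkle root over content chunks.
// Each chunk is hashed into a leaf, then adjacent pairs are hashed upward.
// Assumption: an unpaired node at the end of a level is promoted unchanged.
function merkleRoot(chunks: string[]): string {
  if (chunks.length === 0) return sha256Hex("");
  let level = chunks.map((chunk) => sha256Hex(chunk));
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(
        i + 1 < level.length ? sha256Hex(level[i] + level[i + 1]) : level[i]
      );
    }
    level = next;
  }
  return level[0];
}

// Hypothetical document chunks (e.g. one per extracted section).
const chunks = ["# Title", "First paragraph.", "Second paragraph."];
const root = merkleRoot(chunks);

// Sign the Merkle root with a freshly generated Ed25519 key pair.
// For Ed25519, node:crypto expects `null` as the digest algorithm.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const signature = sign(null, Buffer.from(root, "hex"), privateKey);
const signatureValid = verify(null, Buffer.from(root, "hex"), publicKey, signature);
```

Changing any chunk changes the root, so a verifier that recomputes the root from the chunks and checks the signature can detect tampering. A browser-only build would need a different crypto API than node:crypto; which one this project uses is not stated here.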

Technical Stack

  • React 19 + TypeScript 5.6
  • Tailwind CSS 4 + shadcn/ui
  • Vite 7 build system
  • pdfjs-dist for PDF processing
  • Framer Motion for animations
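
To make the font-clustering heading detection mentioned under Highlights more concrete, here is a minimal sketch of one way it could work on text runs like those pdfjs-dist returns from getTextContent() (each item carries a str and a height). The TextRun interface, the classifyHeadings function, and the 0.5 pt tolerance are hypothetical names and values chosen for this example; the engine's actual algorithm is not documented here and may differ.

```typescript
// Local stand-in for a PDF text run (assumption: `height` approximates font size).
interface TextRun {
  str: string;
  height: number;
}

interface ClassifiedRun {
  text: string;
  level: number; // 0 = body text, 1 = largest heading size, 2 = next largest, ...
}

function classifyHeadings(runs: TextRun[], tolerance = 0.5): ClassifiedRun[] {
  // Group runs into font-size clusters; a run joins the first cluster
  // whose representative size is within `tolerance` of its own height.
  const clusters: { size: number; chars: number }[] = [];
  const clusterOf: number[] = [];
  for (const run of runs) {
    let idx = clusters.findIndex((c) => Math.abs(c.size - run.height) <= tolerance);
    if (idx === -1) {
      idx = clusters.length;
      clusters.push({ size: run.height, chars: 0 });
    }
    clusters[idx].chars += run.str.length;
    clusterOf.push(idx);
  }
  if (clusters.length === 0) return [];

  // Treat the cluster holding the most characters as body text.
  let bodyIdx = 0;
  clusters.forEach((c, i) => {
    if (c.chars > clusters[bodyIdx].chars) bodyIdx = i;
  });
  const bodySize = clusters[bodyIdx].size;

  // Clusters larger than body text become heading levels, largest first.
  const levelOf = new Map<number, number>();
  clusters
    .map((c, i) => ({ i, size: c.size }))
    .filter((c) => c.size > bodySize)
    .sort((a, b) => b.size - a.size)
    .forEach((c, rank) => levelOf.set(c.i, rank + 1));

  return runs.map((run, k) => ({
    text: run.str,
    level: levelOf.get(clusterOf[k]) ?? 0,
  }));
}

const sample: TextRun[] = [
  { str: "Annual Report", height: 24 },
  { str: "This paragraph is long enough to dominate the character count.", height: 10 },
  { str: "Financial Summary", height: 16 },
  { str: "Another paragraph of ordinary body text.", height: 10.2 },
];
const classified = classifyHeadings(sample);
```

Inferring the hierarchy from relative size rank rather than absolute point sizes keeps the sketch independent of any one document's typography. A real detector would likely also weigh font weight, position, and line length.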

Getting Started

git clone https://github.com/andi586/cleanslate-pdf-engine.git
cd cleanslate-pdf-engine
pnpm install
pnpm dev

Open http://localhost:3000 in your browser.

Documentation

License

MIT License — free for personal and commercial use.


CleanSlate Protocol — The standardization layer every AI agent needs.