Releases: andi586/cleanslate-pdf-engine
v0.1 — Initial AI Automation Framework
CleanSlate Protocol v0.1 — Initial Release
First public version of the CleanSlate AI data standardization framework.
Highlights
- CleanSlate Protocol (CSP): a three-layer open standard for AI data preprocessing
  - Layer 1: Raw Extraction (source preservation + SHA-256 checksum)
  - Layer 2: Semantic Structure (heading detection, table analysis, entity extraction)
  - Layer 3: Verification (Merkle hash tree, Ed25519 signatures, timestamps)
- PDF Engine v0.2: production-grade PDF parsing engine
  - Font-clustering heading detection with hierarchy inference
  - Spatial column analysis for table extraction
  - CJK-Latin mixed text spacing repair
  - OCR artifact cleanup
  - Overall accuracy: 82/100 (up from 15/100 in v0.1)
- 10+ Format Support: PDF, DOCX, XLSX, PPTX, HTML, CSV, TXT, JSON, PNG, JPEG, Markdown
- CSP Playground: browser-based interactive converter
  - Drag-and-drop file upload
  - Three output modes: CSP JSON / Markdown / Raw JSON
  - Real-time quality metrics display
  - 100% browser-local processing: zero uploads, zero servers
- Professional Landing Page: protocol showcase with architecture diagrams, code examples, ecosystem overview, and competitive comparison
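The checksum and Merkle tree named in Layers 1 and 3 can be illustrated with a short TypeScript sketch. This is not the project's implementation: `sha256Hex` and `merkleRoot` are hypothetical names, the odd-node rule (duplicate the last hash) is just one common convention, and Node's `node:crypto` is used for brevity where a browser build would more likely use Web Crypto. Ed25519 signing of the root is left out.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of a UTF-8 string or raw bytes, returned as lowercase hex.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Merkle root over a list of leaf hashes (hex strings).
// Adjacent nodes are concatenated and hashed; when a level has an odd
// number of nodes, the last one is paired with itself (assumed convention).
function merkleRoot(leafHashes: string[]): string {
  if (leafHashes.length === 0) {
    throw new Error("merkleRoot needs at least one leaf");
  }
  let level = leafHashes;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = i + 1 < level.length ? level[i + 1] : left;
      next.push(sha256Hex(left + right));
    }
    level = next;
  }
  return level[0];
}

// Usage: hash each extracted block, then commit to all of them with one root.
const blocks = ["# Heading", "First paragraph.", "Second paragraph."];
const root = merkleRoot(blocks.map((b) => sha256Hex(b)));
console.log(root.length); // 64 hex characters
```

Changing any single block changes its leaf hash and therefore the root, which is what lets one signed value vouch for the whole extracted document.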
Technical Stack
- React 19 + TypeScript 5.6
- Tailwind CSS 4 + shadcn/ui
- Vite 7 build system
- pdfjs-dist for PDF processing
- Framer Motion for animations
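As a rough illustration of the font-clustering heading detection listed in the highlights, the sketch below labels text runs by font size. It is a simplification, not the engine's actual algorithm: `TextRun`, `detectHeadings`, and the `minRatio` threshold are hypothetical, and it assumes a font size has already been derived for each run (for example from the text items that pdfjs-dist returns).

```typescript
// Minimal shape of a text run; `fontSize` is an assumed, pre-computed field.
interface TextRun {
  text: string;
  fontSize: number;
}

interface LabeledRun extends TextRun {
  level: number; // 0 = body text, 1 = largest heading, 2 = next largest, ...
}

// Cluster sizes to the nearest 0.5 so near-identical sizes group together.
function clusterSize(fontSize: number): number {
  return Math.round(fontSize * 2) / 2;
}

function detectHeadings(runs: TextRun[], minRatio = 1.15): LabeledRun[] {
  // Weight each size cluster by character count; the heaviest is taken as body text.
  const weight = new Map<number, number>();
  for (const run of runs) {
    const size = clusterSize(run.fontSize);
    weight.set(size, (weight.get(size) ?? 0) + run.text.length);
  }
  let bodySize = 0;
  let bestWeight = -1;
  weight.forEach((w, size) => {
    if (w > bestWeight) {
      bestWeight = w;
      bodySize = size;
    }
  });

  // Clusters clearly larger than body text become heading levels, largest first.
  const headingSizes = Array.from(weight.keys())
    .filter((size) => size >= bodySize * minRatio)
    .sort((x, y) => y - x);

  return runs.map((run) => {
    const index = headingSizes.indexOf(clusterSize(run.fontSize));
    return { ...run, level: index === -1 ? 0 : index + 1 };
  });
}
```

Ranking distinct heading sizes from largest to smallest gives a simple form of hierarchy inference; a fuller approach would presumably also weigh font weight, position, and numbering.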
Getting Started
git clone https://github.com/andi586/cleanslate-pdf-engine.git
cd cleanslate-pdf-engine
pnpm install
pnpm dev

Open http://localhost:3000 in your browser.
Documentation
- README.md — Project overview
- ARCHITECTURE.md — Technical deep-dive
- ROADMAP.md — Development roadmap
- CONTRIBUTING.md — Contribution guidelines
License
MIT License — free for personal and commercial use.
CleanSlate Protocol — The standardization layer every AI agent needs.