A Python pipeline for analyzing manga, comics, and documents. It extracts panels from PDFs, performs OCR, and uses an LLM to generate summaries, tags, and genre classifications.
- PDF page extraction and panel detection
- OCR text extraction from panels
- Smart token-limited chunking with uniform distribution
- LLM-powered analysis (OpenAI API compatible)
- Cross-platform support (Windows, macOS, Linux)
Make sure you have Python 3.10 or higher installed.
Poppler is required for PDF processing. Install it based on your OS:
Option A: Using Chocolatey (Recommended)

```
choco install poppler
```

Option B: Manual Installation

- Download poppler for Windows from: https://github.com/oschwartz10612/poppler-windows/releases
- Extract the zip file (e.g., to `C:\poppler`)
- Add the `bin` folder to your PATH:
  - Right-click "This PC" → Properties → Advanced system settings
  - Click "Environment Variables"
  - Under "System variables", find and edit "Path"
  - Add a new entry: `C:\poppler\Library\bin` (or wherever you extracted it)
  - Click OK and restart your terminal
macOS:

```
brew install poppler
```

Debian/Ubuntu:

```
sudo apt-get update
sudo apt-get install poppler-utils
```

Fedora:

```
sudo dnf install poppler-utils
```

Tesseract is required for text extraction.
Option A: Using Chocolatey
```
choco install tesseract
```

Option B: Manual Installation

- Download the installer from: https://github.com/UB-Mannheim/tesseract/wiki
- Run the installer
- Add Tesseract to your PATH (usually `C:\Program Files\Tesseract-OCR`)
- Restart your terminal
macOS:

```
brew install tesseract
```

Debian/Ubuntu:

```
sudo apt-get install tesseract-ocr
```

Fedora:

```
sudo dnf install tesseract
```

Clone the repository:

```
git clone https://github.com/sammwyy/cosmisum
cd cosmisum
```

Create and activate a virtual environment:

```
# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

Install the dependencies:

```
pip install -r requirements.txt
```

On first run, the script will create a default .env file. Edit it with your configuration:
```
# .env
OPENAI_API_KEY=sk-your-actual-api-key-here
OPENAI_API_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL_ID=gpt-4
```

Note: You can use any OpenAI-compatible API by changing `OPENAI_API_BASE_URL` (e.g., for local models, Azure OpenAI, etc.)

To get an API key:
- For OpenAI: https://platform.openai.com/api-keys
- For other providers: Check their documentation
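Since the `.env` file is a plain `KEY=VALUE` file, it can be read with a few lines of Python. The loader below is a minimal, hypothetical sketch (the script itself may well use a library such as python-dotenv instead); it ignores comments and blank lines and does not handle quoting:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: copies KEY=VALUE lines into os.environ.

    Hypothetical sketch, not the project's actual loader. Skips blank
    lines and '#' comments; no support for quoting or escape sequences.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Example usage (assumes a .env like the one shown above exists):
# load_env()
# api_key = os.environ["OPENAI_API_KEY"]
```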
```
python cosmisum.py input.pdf
```

For example:

```
python cosmisum.py my_manga_chapter.pdf
```

The script:

- Extracts panels from each PDF page
- Performs OCR on each panel to extract text
- Creates uniform chunks distributed across the document (respecting token limits)
- Sends the chunks to the LLM for analysis
- Outputs:
  - Plot summary
  - Thematic tags
  - Genre/category classification
The script will display results in the console:
```
================================================================================
Based on the provided text extracts, here is the analysis of the manga/comic:

1. **Plot Summary**: [Generated summary...]
2. **Thematic Tags**: action, adventure, friendship, ...
3. **Genre/Category**: Shonen manga / Action-Adventure
================================================================================
```
- Make sure poppler is installed (see Prerequisites)
- Verify it's in your PATH: run `pdfinfo -v` in a terminal
- On Windows, restart your terminal after adding to PATH
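You can also check from Python whether the external binaries are resolvable on PATH, using `shutil.which` from the standard library. The helper below is a hypothetical convenience, not part of the project:

```python
import shutil

def check_dependencies(binaries=("pdfinfo", "pdftoppm", "tesseract")):
    """Return the names of external binaries that are missing from PATH."""
    return [name for name in binaries if shutil.which(name) is None]

missing = check_dependencies()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All external dependencies found.")
```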
- Make sure Tesseract is installed (see Prerequisites)
- On Windows, you may need to specify the path in the script:
```python
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
```
- Edit the `.env` file and add your actual API key
- Make sure the `.env` file is in the same directory as `cosmisum.py`
- Check if your PDF contains actual text (not just images)
- Try adjusting the OCR language settings in the code
- Increase the PDF DPI in the `extract_all_panels()` function
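As a rough guide to why the DPI setting matters: the pixel width of a rasterized page is its physical width multiplied by the rendering DPI, so doubling the DPI doubles the detail Tesseract has to work with (the function below is illustrative only, not part of the project):

```python
def rendered_width_px(page_width_inches, dpi):
    """Pixel width of a rasterized PDF page: physical width x DPI."""
    return round(page_width_inches * dpi)

# A 6-inch-wide page rendered at two different DPI settings:
print(rendered_width_px(6, 200))  # 1200 px -- may be too coarse for small lettering
print(rendered_width_px(6, 400))  # 2400 px -- finer detail for the OCR engine
```

Higher DPI means slower rendering and more memory per page, so it's a trade-off rather than a setting to maximize.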
Edit .env to point to your local API:
```
OPENAI_API_KEY=not-needed
OPENAI_API_BASE_URL=http://localhost:1234/v1
OPENAI_MODEL_ID=local-model
```

Modify the `perform_ocr()` function to use different languages:

```python
text = pytesseract.image_to_string(img, lang='jpn+eng')  # Japanese + English
```

Download language data from: https://github.com/tesseract-ocr/tessdata
MIT
Pull requests are welcome. For major changes, please open an issue first.