REST API and job worker for poster extraction.
The API does not accept file uploads. The frontend uploads poster files to Bunny storage and creates ExtractionJob records in the database. This service polls the database for new jobs, downloads the file from Bunny, runs extraction, and writes results to PosterMetadata.
```bash
# Set required environment variables (see Configuration)
export DATABASE_URL="postgresql://..."
export BUNNY_STORAGE_ZONE="your-zone"
export BUNNY_ACCESS_KEY="your-storage-password"

# Start the API server (this also starts the background job worker)
python api.py

# Or via Docker
docker compose up
```

The API runs on http://localhost:8000 by default.
Simple health check returning API status.

Response:

```json
{
  "status": "ok",
  "service": "Poster Extraction API",
  "version": "1.0.0"
}
```

Detailed health status, including GPU and model availability.

Response:

```json
{
  "status": "healthy",
  "checks": {
    "api": "ok",
    "cuda": "ok",
    "gpu": "NVIDIA GeForce RTX 4090",
    "json_model": "ok"
  }
}
```

Run one job-worker cycle: if there is an uncompleted (pending) job, it is claimed and processed immediately. Call this after creating an `ExtractionJob` to start processing without waiting for the next poll interval.

Response: `204 No Content` (no body).
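A client that waits for the service to be ready before creating jobs could interpret the detailed health payload roughly as follows. This is a minimal sketch: `is_ready` is a hypothetical helper, and it assumes the response shape shown above, where every check except `gpu` (which carries the device name) reports `"ok"` when healthy. Checking for a `503` status code, as listed under the status codes, is the simpler alternative.

```python
import json


def is_ready(health_body: str) -> bool:
    """Return True when the detailed health payload reports a healthy service.

    Assumes the shape shown above: a top-level "status" plus a "checks"
    mapping in which each check reports "ok" when it passes.
    """
    payload = json.loads(health_body)
    if payload.get("status") != "healthy":
        return False
    checks = payload.get("checks", {})
    # "gpu" holds the device name rather than "ok", so it is skipped (assumption).
    return all(value == "ok" for name, value in checks.items() if name != "gpu")
```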
A background thread runs continuously, performing one cycle every `POLL_INTERVAL_SECONDS`:

- Poll the database for an `ExtractionJob` with `completed = false` and `status = 'pending'`.
- Claim the job (set `status = 'processing'`).
- Download the file from Bunny storage using the job's `filePath` (and optional `fileName`).
- Extract using the same pipeline as the CLI (the extraction logic itself is unchanged).
- Upsert `PosterMetadata` for the job's `posterId` with the extracted JSON (`creators`, `titles`, `posterContent`, `imageCaption`, `tableCaption`, etc.).
- Complete the job (`status = 'completed'`, `completed = true`) or fail it (`status = 'failed'`, `error` set).
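The cycle above can be sketched as follows. This is a minimal sketch, not the service's code: it uses the standard-library `sqlite3` module in place of PostgreSQL, assumes a simplified schema (an `id` key on `ExtractionJob` and a single JSON `data` column on `PosterMetadata`), and takes hypothetical `download` and `extract` callables standing in for the Bunny download and the extraction pipeline. With PostgreSQL, the poll-and-claim step would more commonly use `SELECT ... FOR UPDATE SKIP LOCKED` so that concurrent workers never pick the same row.

```python
import json
import sqlite3
import threading
from typing import Callable, Optional

# Hypothetical stand-ins: download(file_path, file_name) -> bytes fetches the file
# from Bunny; extract(blob) -> dict runs the extraction pipeline.
Downloader = Callable[[str, Optional[str]], bytes]
Extractor = Callable[[bytes], dict]


def run_cycle(conn: sqlite3.Connection, download: Downloader, extract: Extractor) -> Optional[int]:
    """Process at most one pending job and return its id, or None if none was claimed."""
    # Poll for the oldest uncompleted, pending job.
    row = conn.execute(
        "SELECT id, posterId, filePath, fileName FROM ExtractionJob "
        "WHERE completed = 0 AND status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, poster_id, file_path, file_name = row

    # Claim it; the status guard makes this a no-op if another worker won the race.
    claimed = conn.execute(
        "UPDATE ExtractionJob SET status = 'processing' WHERE id = ? AND status = 'pending'",
        (job_id,),
    ).rowcount
    conn.commit()
    if claimed != 1:
        return None

    try:
        # Download and extract.
        data = extract(download(file_path, file_name))
        # Upsert the metadata row for this poster.
        conn.execute(
            "INSERT INTO PosterMetadata (posterId, data) VALUES (?, ?) "
            "ON CONFLICT(posterId) DO UPDATE SET data = excluded.data",
            (poster_id, json.dumps(data)),
        )
        # Complete the job.
        conn.execute(
            "UPDATE ExtractionJob SET status = 'completed', completed = 1 WHERE id = ?",
            (job_id,),
        )
    except Exception as exc:
        # Undo any partial write, then record the failure on the job row.
        conn.rollback()
        conn.execute(
            "UPDATE ExtractionJob SET status = 'failed', error = ? WHERE id = ?",
            (str(exc), job_id),
        )
    conn.commit()
    return job_id


def worker_loop(conn: sqlite3.Connection, download: Downloader, extract: Extractor,
                poll_interval: float, stop: threading.Event) -> None:
    """Run one cycle, then wait poll_interval seconds, until stop is set."""
    while not stop.is_set():
        run_cycle(conn, download, extract)
        stop.wait(poll_interval)
```

Failed jobs keep `completed = 0`, but because their status is no longer `'pending'` they are not picked up again. If `worker_loop` runs in its own thread, open the `sqlite3` connection inside that thread, since connections are bound to their creating thread by default.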
Only one extraction runs at a time: the worker holds a lock that any HTTP-triggered extraction added in the future will also share.
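One way to get this behavior is a single process-wide lock that every extraction path acquires. The sketch below is an illustration under that assumption; `EXTRACTION_LOCK` and `run_exclusive` are hypothetical names, not taken from `api.py`.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical module-level lock shared by the background worker and any
# HTTP-triggered extraction, so at most one extraction runs per process.
EXTRACTION_LOCK = threading.Lock()


def run_exclusive(task: Callable[[], T]) -> T:
    """Run task while holding the extraction lock; concurrent callers wait their turn."""
    with EXTRACTION_LOCK:
        return task()
```

A `threading.Lock` only serializes work inside one process. If several API containers share the same database, the status-guarded claim in the database is what keeps two workers from processing the same job.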
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection URL (for `ExtractionJob` / `PosterMetadata`) | required |
| `BUNNY_STORAGE_ZONE` | Bunny storage zone name | required |
| `BUNNY_ACCESS_KEY` | Bunny storage zone password (`AccessKey`) | required |
| `BUNNY_REGION` | Optional region (e.g. `ny`, `uk`); omit to use the default `storage.bunnycdn.com` | none |
| `POLL_INTERVAL_SECONDS` | Seconds between job poll cycles | `30` |
| `PORT` | API server port | `8000` |
| `HOST` | API server host | `0.0.0.0` |
| `CUDA_VISIBLE_DEVICES` | GPU device(s) | All available |
| `PDFALTO_PATH` | Path to the pdfalto binary (for PDF processing) | See poster_extraction |
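The table above maps onto a small settings loader. The sketch below is an assumption-laden illustration, not the service's code: `Config`, `load_config`, and `bunny_download_request` are hypothetical names. It builds a download request following Bunny's Storage API layout as we understand it (`https://{host}/{zone}/{path}/{fileName}` with an `AccessKey` header, regional hosts of the form `{region}.storage.bunnycdn.com`), and it assumes `filePath` is a directory joined with the optional `fileName`. `CUDA_VISIBLE_DEVICES` and `PDFALTO_PATH` are left out because they are consumed by CUDA and the extraction pipeline rather than by the API itself.

```python
import os
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "BUNNY_STORAGE_ZONE", "BUNNY_ACCESS_KEY")


@dataclass(frozen=True)
class Config:
    database_url: str
    bunny_storage_zone: str
    bunny_access_key: str
    bunny_region: Optional[str]
    poll_interval_seconds: float
    host: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from the environment, failing fast if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Config(
        database_url=env["DATABASE_URL"],
        bunny_storage_zone=env["BUNNY_STORAGE_ZONE"],
        bunny_access_key=env["BUNNY_ACCESS_KEY"],
        bunny_region=env.get("BUNNY_REGION") or None,
        poll_interval_seconds=float(env.get("POLL_INTERVAL_SECONDS", "30")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )


def bunny_download_request(cfg: Config, file_path: str,
                           file_name: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for a file in the storage zone."""
    # Assumed regional host pattern; the default host is used when no region is set.
    host = f"{cfg.bunny_region}.storage.bunnycdn.com" if cfg.bunny_region else "storage.bunnycdn.com"
    parts = [cfg.bunny_storage_zone, file_path.strip("/")]
    if file_name:
        parts.append(file_name)
    path = urllib.parse.quote("/".join(parts))  # keeps "/" and escapes spaces etc.
    return urllib.request.Request(f"https://{host}/{path}",
                                  headers={"AccessKey": cfg.bunny_access_key})
```

Passing the returned request to `urllib.request.urlopen` would perform the download; keeping request construction separate makes the URL logic easy to check without network access.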
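For example, to run the API on port 9000:

```bash
PORT=9000 python api.py
```

| Code | Description |
|---|---|
| 200 | Success (health checks) |
| 204 | Success, no body (job-worker cycle) |
| 503 | Unhealthy (e.g. GPU or model unavailable) |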
Job failures are recorded in the database: `ExtractionJob.status` is set to `'failed'` and `ExtractionJob.error` is populated.
CORS is enabled by default for all origins. To restrict it, configure it in `api.py`:

```python
from flask_cors import CORS

# `app` is the Flask application defined in api.py
CORS(app, origins=["https://your-domain.com"])
```

- Docker Setup - Container deployment
- Architecture - Technical details
- Installation - Setup instructions