API Reference

REST API and job worker for poster extraction.

Overview

The API does not accept file uploads. The frontend uploads poster files to Bunny storage and creates ExtractionJob records in the database. This service polls the database for new jobs, downloads the file from Bunny, runs extraction, and writes results to PosterMetadata.

Quick Start

# Set required environment variables (see Configuration)
export DATABASE_URL="postgresql://..."
export BUNNY_STORAGE_ZONE="your-zone"
export BUNNY_ACCESS_KEY="your-storage-password"

# Start the API server (starts background job worker)
python api.py

# Or via Docker
docker compose up

The API runs on http://localhost:8000 by default.

Endpoints

Health Check

`GET /`

Simple health check returning API status.

Response:

{
  "status": "ok",
  "service": "Poster Extraction API",
  "version": "1.0.0"
}

`GET /health`

`GET /up`

Detailed health status including GPU and model availability.

Response:

{
  "status": "healthy",
  "checks": {
    "api": "ok",
    "cuda": "ok",
    "gpu": "NVIDIA GeForce RTX 4090",
    "json_model": "ok"
  }
}
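A client can reduce this payload to the few fields it usually cares about. The following is a minimal sketch, not part of the service: `summarize_health` is a hypothetical helper, and it assumes that any `status` value other than `"healthy"` means the service is not ready (the unhealthy payload is not documented here).

```python
import json


def summarize_health(body: str) -> dict:
    """Parse a /health (or /up) response body into a small summary.

    Hypothetical helper: field names follow the sample response above.
    """
    payload = json.loads(body)
    checks = payload.get("checks", {})
    return {
        # Assumption: anything other than "healthy" counts as not ready.
        "healthy": payload.get("status") == "healthy",
        # In the sample response this is a device name, not "ok".
        "gpu": checks.get("gpu"),
        "json_model_ok": checks.get("json_model") == "ok",
    }


sample = (
    '{"status": "healthy", "checks": {"api": "ok", "cuda": "ok", '
    '"gpu": "NVIDIA GeForce RTX 4090", "json_model": "ok"}}'
)
summary = summarize_health(sample)
print(summary["healthy"], summary["gpu"])
```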

Trigger job check

`POST /jobs/check`

Runs one job-worker cycle: if a pending (not yet completed) job exists, it is claimed and processed immediately. Call this after submitting a job to start processing without waiting for the next poll interval (`POLL_INTERVAL_SECONDS`).

Response: 204 No Content (no body).
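As an illustration, the call can be made from Python with only the standard library. This is a sketch under assumptions: `build_job_check_request` and `trigger_job_check` are hypothetical names, the base URL is the default from Quick Start, and the timeout is a guess, since this document does not say whether the request returns before or after the job finishes.

```python
import urllib.request


def build_job_check_request(
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST /jobs/check request without sending it."""
    url = base_url.rstrip("/") + "/jobs/check"
    # An empty body is enough; the endpoint takes no documented parameters.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_job_check(
    base_url: str = "http://localhost:8000", timeout: float = 60.0
) -> bool:
    """Send the request and report whether the service answered 204."""
    request = build_job_check_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 204


request = build_job_check_request()
print(request.get_method(), request.full_url)
```

Calling `trigger_job_check()` requires the API server to be running; building the request does not.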

Job Worker

A background thread runs continuously:

1. Poll the database for an ExtractionJob with completed = false and status = 'pending'.
2. Claim the job (set status = 'processing').
3. Download the file from Bunny storage using the job’s filePath (and optional fileName).
4. Extract using the same pipeline as the CLI (no extraction logic changes).
5. Upsert PosterMetadata for the job’s posterId with the extracted JSON (creators, titles, posterContent, imageCaption, tableCaption, etc.).
6. Complete the job (status = 'completed', completed = true) or fail it (status = 'failed', error set).

Only one extraction runs at a time (shared lock with any future HTTP-triggered work).
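The cycle described above can be sketched with in-memory stand-ins. This is not the service's implementation: the `Job` class, `run_cycle`, the list and dict that replace the database tables, and the fake download and extract functions are all hypothetical and exist only to show the order of the state transitions.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    """In-memory stand-in for an ExtractionJob row (hypothetical shape)."""
    poster_id: str
    file_path: str
    status: str = "pending"
    completed: bool = False
    error: Optional[str] = None


# Only one extraction runs at a time.
extraction_lock = threading.Lock()


def run_cycle(
    jobs: list,
    metadata: dict,
    download: Callable[[str], bytes],
    extract: Callable[[bytes], dict],
) -> Optional[Job]:
    """Run one worker cycle; return the processed job, or None if idle."""
    # 1. Poll for a job that is pending and not completed.
    job = next(
        (j for j in jobs if not j.completed and j.status == "pending"), None
    )
    if job is None:
        return None
    # 2. Claim it.
    job.status = "processing"
    try:
        with extraction_lock:
            data = download(job.file_path)  # 3. Download the file.
            result = extract(data)          # 4. Run extraction.
        metadata[job.poster_id] = result    # 5. Upsert by posterId.
        job.status, job.completed = "completed", True  # 6. Complete.
    except Exception as exc:
        job.status, job.error = "failed", str(exc)     # 6. Or fail.
    return job


def fake_download(path: str) -> bytes:
    if "missing" in path:
        raise FileNotFoundError(path)
    return b"%PDF"


def fake_extract(data: bytes) -> dict:
    return {"titles": ["Example"]}  # value shape is assumed


jobs = [Job("p1", "posters/a.pdf"), Job("p2", "posters/missing.pdf")]
metadata = {}
first = run_cycle(jobs, metadata, fake_download, fake_extract)
second = run_cycle(jobs, metadata, fake_download, fake_extract)
third = run_cycle(jobs, metadata, fake_download, fake_extract)
print(first.status, second.status, third)
```

In this sketch a failed job keeps completed = false but is not picked up again, because the poll step only matches status = 'pending'.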

Configuration

Environment Variables

Variable	Description	Default
`DATABASE_URL`	PostgreSQL connection URL (for ExtractionJob / PosterMetadata)	required
`BUNNY_STORAGE_ZONE`	Bunny storage zone name	required
`BUNNY_ACCESS_KEY`	Bunny storage zone password (AccessKey)	required
`BUNNY_REGION`	Optional region (e.g. `ny`, `uk`); omit for default `storage.bunnycdn.com`	—
`POLL_INTERVAL_SECONDS`	Seconds between job poll cycles	30
`PORT`	API server port	8000
`HOST`	API server host	0.0.0.0
`CUDA_VISIBLE_DEVICES`	GPU device(s)	All available
`PDFALTO_PATH`	Path to pdfalto binary (for PDF processing)	See poster_extraction
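One way to read these variables at startup is shown below. This is a sketch, not the code in api.py: `Settings` and `load_settings` are hypothetical names, and the region-prefixed host (for example `uk.storage.bunnycdn.com`) is an assumption inferred from the `BUNNY_REGION` description.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "BUNNY_STORAGE_ZONE", "BUNNY_ACCESS_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    bunny_storage_zone: str
    bunny_access_key: str
    bunny_region: Optional[str]
    poll_interval_seconds: int
    port: int
    host: str

    @property
    def bunny_host(self) -> str:
        # Assumption: a region is used as a hostname prefix.
        if self.bunny_region:
            return f"{self.bunny_region}.storage.bunnycdn.com"
        return "storage.bunnycdn.com"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate required variables and apply the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        bunny_storage_zone=env["BUNNY_STORAGE_ZONE"],
        bunny_access_key=env["BUNNY_ACCESS_KEY"],
        bunny_region=env.get("BUNNY_REGION") or None,
        poll_interval_seconds=int(env.get("POLL_INTERVAL_SECONDS", "30")),
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
    )


example_env = {
    "DATABASE_URL": "postgresql://...",
    "BUNNY_STORAGE_ZONE": "your-zone",
    "BUNNY_ACCESS_KEY": "your-storage-password",
}
settings = load_settings(example_env)
print(settings.port, settings.poll_interval_seconds, settings.bunny_host)
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.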

Starting with Custom Port

PORT=9000 python api.py

Error Handling

Code	Description
200	Success (health)
204	Job check cycle ran (`POST /jobs/check`); no body
503	Unhealthy (e.g. GPU or model unavailable)

Job failures are recorded in the database: ExtractionJob.status = 'failed' and ExtractionJob.error set.
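A deployment script might use the health status codes above to wait for the service to become ready. The sketch below is illustrative only: `wait_until_healthy` is a hypothetical helper, the `probe` callable stands in for a real request to `GET /health`, and the retry count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], int],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns 200; retry while it returns 503.

    Returns False if the service is still unhealthy after all attempts.
    """
    for attempt in range(attempts):
        code = probe()
        if code == 200:
            return True
        if code != 503:
            # Only 200 and 503 are documented for the health endpoints.
            raise RuntimeError(f"Unexpected health status code: {code}")
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Simulated probe: unhealthy twice, then healthy.
codes = iter([503, 503, 200])
ready = wait_until_healthy(lambda: next(codes), sleep=lambda seconds: None)
print(ready)
```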

CORS

CORS is enabled by default for all origins. To restrict it to specific origins, change the configuration in api.py:

from flask_cors import CORS
CORS(app, origins=["https://your-domain.com"])
