Install PyPDF2 and pdfrw via python3 -m pip install -r requirements.txt.
Also, install qpdf before using any script. Make sure qpdf can be found directly in your PATH.
If you need to use to_image.py, you'll also need poppler installed and available in your PATH.
- Windows: https://github.com/oschwartz10612/poppler-windows/releases/latest
- macOS:
brew install poppler
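To confirm that the external tools are reachable before running a script, a quick check along these lines may help. This is only a sketch: the helper name `missing_tools` is made up for illustration, and `pdftoppm` is assumed to be the poppler binary that to_image.py relies on (adjust it if your setup uses a different one).

```python
import shutil

# Assumed executable names: "qpdf" for qpdf, "pdftoppm" for poppler.
# These are illustrative; the scripts themselves may look for other binaries.
REQUIRED_TOOLS = ["qpdf", "pdftoppm"]


def missing_tools(names):
    """Return the tool names that cannot be found in PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

If a tool is reported as missing even though it is installed, the directory containing its executable is most likely not listed in PATH.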
remove_image.py combines the functionality of PyPDF2 and pdfrw to produce a smaller file than PyPDF2 alone. All scripts use qpdf to produce a linearized PDF that displays faster over slow network connections.
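For reference, the linearization step can be reproduced on its own with qpdf's `--linearize` option. The sketch below is not taken from the scripts; the function names are hypothetical, and it only assumes that qpdf is in your PATH.

```python
import subprocess


def linearize_command(src, dst):
    """Build the qpdf command line that writes a linearized copy of src to dst."""
    return ["qpdf", "--linearize", src, dst]


def linearize(src, dst):
    """Run qpdf and raise if it reports an error."""
    result = subprocess.run(linearize_command(src, dst))
    # qpdf exits with 0 on success and 3 when it finished with warnings only;
    # any other exit code is treated as a failure here.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed with exit code {result.returncode}")
```

Calling `linearize("input.pdf", "output.pdf")` (placeholder file names) should leave the original untouched and write the linearized copy next to it. You can verify the result with `qpdf --check-linearization output.pdf`.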
$ python3 remove_image.py "./pdfs/Space OCR.pdf"
$ ls -alh ./pdfs
-rw-r--r-- 1 host staff 25M Nov 3 14:24 Space OCR.pdf
-rw-r--r-- 1 host staff 9.7M Nov 7 11:20 image_removed_pypdf2_only.pdf
-rw-r--r-- 1 host staff 956K Nov 7 11:20 image_removed_pypdf2_pdfrw.pdf
$ python3 remove_image.py --aggressive "./pdfs/Space OCR.pdf"
$ ls -alh ./pdfs/image_removed_pypdf2_aggressive_pdfrw.pdf
-rw-r--r-- 1 host staff 699K Nov 7 11:21 image_removed_pypdf2_aggressive_pdfrw.pdf
Note that the PDF produced by side_by_side.py works better with PDF readers that support Two-Page View, e.g. Preview.app.