Strip + prune the venv to slim the runtime image (~30MB) #58
Open
jghoman wants to merge 1 commit into
Adds one RUN block in the builder stage that:

1. Installs binutils (builder-only; never lands in the final image).
2. Strips debug symbols from every .so in /app/.venv via `strip --strip-unneeded`. Most of the heft is PyArrow's libarrow.so and librdkafka.so; both ship with full debug tables.
3. Deletes bundled tests/, test/, __pycache__/, *.pyi, *.pyc inside the venv. These are install-time artifacts; the runtime doesn't need them.
4. Cleans /root/.cache, /tmp, and the apt lists.

Measured on this branch's Dockerfile (so the duckdb-cli + extensions + tools/ paths are unchanged):

Before: ghcr.io/posthog/millpond:latest 459 MB
After: millpond:main-stripped 429 MB
Delta: -30 MB

Sanity-tested: `python -c "import duckdb, pyarrow, confluent_kafka, prometheus_client; from millpond import ducklake, schema"` succeeds in the stripped image.

No behavior change; the strip is on debug symbols only and the prune targets install-time artifacts that have no runtime consumer.
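For readers who want to try steps 2 and 3 outside the Dockerfile, here is a minimal shell sketch. It is not the RUN block from this PR: the function name `slim_venv` is hypothetical, the `*.so*` pattern (to also catch versioned names such as `libfoo.so.1`) is an assumption, and steps 1 and 4 are left out because they depend on the image's apt and filesystem layout.

```shell
#!/bin/sh
# slim_venv VENV_DIR
# Hypothetical helper: strip shared objects and prune install-time
# artifacts under VENV_DIR. A sketch, not the PR's actual RUN block.
slim_venv() {
    venv="$1"
    [ -d "$venv" ] || { echo "no such directory: $venv" >&2; return 1; }

    # --strip-unneeded drops symbols that are not needed for relocation
    # processing; dynamic symbols used for linking at load time are kept.
    # Failures on files that are not ELF objects are tolerated.
    if command -v strip >/dev/null 2>&1; then
        find "$venv" -type f -name '*.so*' \
            -exec strip --strip-unneeded {} + 2>/dev/null || true
    fi

    # Remove bundled test directories and bytecode caches.
    # -prune stops find from descending into a directory it is about to delete.
    find "$venv" -type d \( -name tests -o -name test -o -name __pycache__ \) \
        -prune -exec rm -rf {} +

    # Remove type stubs and any stray bytecode files.
    find "$venv" -type f \( -name '*.pyi' -o -name '*.pyc' \) -delete
}
```

Calling `slim_venv /app/.venv` in the builder stage would apply it to the path named above; running it against a throwaway copy of a venv first is a cheap way to see what gets removed.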
Summary
Adds one `RUN` block in the Dockerfile's builder stage to (a) strip debug symbols from every `.so` in the venv via `strip --strip-unneeded`, and (b) delete bundled `tests/`, `__pycache__/`, `*.pyi`, and `*.pyc` artifacts that have no runtime consumer. `binutils` is installed only in the builder stage and never lands in the final image.

Numbers
| Image | Size |
| --- | --- |
| `ghcr.io/posthog/millpond:latest` (current main) | 459 MB |
| `millpond:main-stripped` (built locally) | 429 MB |
| Delta | −30 MB |

The biggest contributors stripped are `libarrow.so`, `libarrow_acero.so`, `libparquet.so`, and `librdkafka.so`; all ship with full debug tables that aren't needed at runtime.

Sanity check
All imports succeed in the stripped image. The strip is on debug symbols only and the prune targets install-time artifacts.
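The import check can be wrapped in a small shell function so it is easy to rerun against any tag. This is a sketch under assumptions: the name `smoke_test` and the `DOCKER` override variable are hypothetical, and it assumes `python` is on the image's PATH. The import list is the one from the commit message.

```shell
#!/bin/sh
# smoke_test IMAGE
# Hypothetical helper: run the import check inside IMAGE and return
# the container's exit status. Set DOCKER to use a different CLI binary.
smoke_test() {
    image="$1"
    # --entrypoint python bypasses whatever ENTRYPOINT the image defines,
    # so the arguments after the image name go straight to the interpreter.
    "${DOCKER:-docker}" run --rm --entrypoint python "$image" \
        -c "import duckdb, pyarrow, confluent_kafka, prometheus_client; from millpond import ducklake, schema"
}
```

For example, `smoke_test millpond:main-stripped && echo "imports ok"` prints the message only if every import succeeds.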
Why this is safe
- `tests/`, `__pycache__/`, `*.pyi`, `*.pyc`: bundled package test directories aren't imported by the application; type stubs (`.pyi`) are pure annotations consumed by type-checkers, not the runtime; `.pyc` files are regenerated by Python at first import if needed.
- The `binutils` install is in the builder stage; the final image's apt state is unchanged.

What this PR does not do
Other slimming levers we measured but skipped:
- Moving the `msk-iam` extra out of the default image → another −20MB, but an operational change (would need two published variants, and a real decision about which non-MSK deploys exist).
- `python3-debian12` is Python 3.11; we require 3.12.

The strip + prune is the zero-risk free win. Bigger cuts can come if there's a real driver.
Test plan
- `docker build .` succeeds (the only file touched is the Dockerfile)
- `docker images`
- `python -c "import …"` smoke test in the built image
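One way to make the size comparison repeatable is to read the byte counts from `docker image inspect` instead of eyeballing the `docker images` listing. The sketch below is illustrative only: the function names and the `DOCKER` override variable are hypothetical, and it rounds to decimal megabytes (1 MB = 1,000,000 bytes).

```shell
#!/bin/sh
# Hypothetical helpers for comparing two local image sizes.

# bytes_to_mb BYTES: decimal megabytes, rounded to the nearest integer.
bytes_to_mb() {
    echo $(( ($1 + 500000) / 1000000 ))
}

# image_bytes IMAGE: size of a local image in bytes.
image_bytes() {
    "${DOCKER:-docker}" image inspect --format '{{.Size}}' "$1"
}

# size_delta_mb BEFORE_IMAGE AFTER_IMAGE: after minus before, in MB.
# A negative number means the second image is smaller.
size_delta_mb() {
    delta_before=$(image_bytes "$1") || return 1
    delta_after=$(image_bytes "$2") || return 1
    echo $(( $(bytes_to_mb "$delta_after") - $(bytes_to_mb "$delta_before") ))
}
```

With both images present locally, `size_delta_mb ghcr.io/posthog/millpond:latest millpond:main-stripped` prints the difference in MB.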