Skip to content

[WIP] compression: add specific prefix for zstd:chunked#2183

Closed
giuseppe wants to merge 2 commits into
containers:mainfrom
giuseppe:zstd-detection
Closed

[WIP] compression: add specific prefix for zstd:chunked#2183
giuseppe wants to merge 2 commits into
containers:mainfrom
giuseppe:zstd-detection

Conversation

@giuseppe

Copy link
Copy Markdown
Member

it allows to differentiate between zstd and zstd:chunked

Needs: containers/storage#1756

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Needs: containers/storage#1756

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe marked this pull request as draft November 10, 2023 14:18

@mtrmac mtrmac left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation LGTM, but when do we need this?

The two zstd variants should be differentiated by annotations; and the layer can’t be pulled without the ManifestChecksumKey annotation present.[1] Are there any situations where need to treat such “chunked blob, but no chunked annotations” layers specially?


[1]… actually that sounds like another thing we don’t handle: when pushing from c/storage to a registry, TryReusingBlobWithOptions will find a record of a pre-existing zstd:chunked layer and return that it should be reused, but we don’t set the right annotations on that blob. (So we would need to record these annotations in BlobInfoCache, and take good care that we only record them when we created them ourselves, or after we have verified them when pulling.)

@giuseppe

Copy link
Copy Markdown
Member Author

I am not even sure if we need any of these. The blob info cache probably doesn't make sense when using a layer that was partially pulled.

@mtrmac

mtrmac commented Nov 14, 2023

Copy link
Copy Markdown
Collaborator

The compression data in the BIC exists for CandidateLocations2, i.e. for pushes only. We probably don’t need it for pulls, reuse (currently?) happens mostly (but not exclusively) relying on data stored in c/storage natively, not in the BIC.

But we do need it for pushes:

  • If the same image is pushed twice to the same destination, we should typically re-use blobs and not re-push.
  • In particular, if we (pull), build, + push the result, users expect us to optimize out re-pushing the base image layers if possible. Edit+build+push+run iterations should be fast.

Also, if users ask for zstd, I think it’s fine to reuse a zstd:chunked layer (possibly without adding the annotations, especially if we don’t trust them). If they ask for zstd:chunked, we need to be able to reuse only chunked layers and not the non-chunked ones. So eventually we should be able to reliably differentiate in the BIC between the two, and to carry the required annotations to allow reuse. Right now it seems to me that the difference is primarily trusted-annotation-driven, but there may well be something I’m missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants