HADOOP-19859. Speed up GHA jobs by image cache #8451
Conversation
note: the cache will only be used after this is merged, so don't expect this PR itself to benefit from the cache.
jobs:
  main:
    name: build-image-cache-${{ inputs.os }}-${{ github.ref_name }}
github.ref_name is the branch name for the push event
      - 'branch-*'
    paths:
      - 'dev-support/docker/**'
  workflow_dispatch:
Both push and workflow_dispatch indicate that the person has write permission on the repo, so it's safe.
Agreed. It is a "trusted" action. (We are still careful to use best practices below, and our CodeQL scanning helps enforce that in the future.)
@ajfabbri I followed your practice of adding "Security ..." comments too; could you take a look?
🎊 +1 overall
This message was automatically generated.
      - 'trunk'
      - 'branch-*'
    paths:
      - 'dev-support/docker/**'
Cool. Makes sense to rebuild when anything in here changes. In the future we might just publish an image and use it directly instead of always doing a cached build? We can iterate on it though. 👍
I think the cache is better:
- in most cases there is no Docker file change, so the cache hits and it works well
- if trunk merges a PR that changes the Docker files, it takes ~15 min to refresh the cache; a new push to a forked PR in that window will miss the cache and do a fresh build
- if a PR itself changes the Docker files, it must do a fresh build
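Putting the quoted trigger fragments together, the cache-refresh workflow's `on:` block presumably looks something like the sketch below. The exact nesting is an assumption reassembled from the branch and path patterns quoted in the diff hunks.

```yaml
# Sketch only: trigger section reassembled from the quoted diff fragments;
# the real workflow's layout may differ.
on:
  push:
    branches:
      - 'trunk'
      - 'branch-*'
    paths:
      - 'dev-support/docker/**'   # refresh the cache only when Docker files change
  workflow_dispatch:              # allow a manual first run / manual refresh
```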
    push: true
    tags: ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}-static
    cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}
    cache-to: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }},mode=max
Just getting familiar with this and reading the docs. Is this based on the Spark CI workflows?
type=registry (docs)
registry: embeds the build cache into a separate image and pushes it to a dedicated location, separate from the main output.
cache-to exports the cache to a particular backend (the registry) after a build; cache-from specifies how to import it at the start of a build. IIUC the local BuildKit cache is always enabled but has no persistence between runs, so it only helps with multiple builds within the same workflow run.
The locations passed in (ref=) act as the key for the cache lookup, and we separate these by OS and branch name.
mode=max exports all intermediate layers of the image build, whereas mode=min only exports those that end up in the final image. This looks good to me. 👍
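Based on the fragments above, the docker/build-push-action step presumably looks roughly like this sketch. The step name and `context` path are assumptions for illustration; the tags and cache refs come from the quoted diff.

```yaml
# Hypothetical sketch of the registry-backed cache step; tags and cache
# refs are taken from the quoted diff, the other fields are assumed.
- name: Build and push image cache
  uses: docker/build-push-action@v6
  with:
    context: dev-support/docker
    push: true
    tags: ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}-static
    # Import layers exported by a previous run for this OS/branch.
    cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}
    # mode=max exports all intermediate layers, not just final-image layers.
    cache-to: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }},mode=max
```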
Is this based on the Spark CI workflows?
I basically replicated it from Spark:
https://github.com/apache/spark/blob/branch-4.1/.github/workflows/build_infra_images_cache.yml
Thanks, merged to trunk. I will manually trigger the first image cache build; subsequent cache refreshes will happen automatically once the manually triggered image cache jobs complete: https://github.com/apache/hadoop/actions/runs/24869306467
Description of PR
This PR adds an image cache for the GHA workflow; on a cache hit, the build-image step likely decreases from ~15 min to ~1 min.
The image cache is created on pushing a commit to the apache/hadoop repo when dev-support/docker/** changes, or on a manual trigger. Since this is a public repo, forked repos can read the cache to speed up their image-building workflows.

How was this patch tested?
Tested on my forked repo.

For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling
No AI usage.