
HADOOP-19859. Speed up GHA jobs by image cache#8451

Merged
pan3793 merged 2 commits into apache:trunk from pan3793:HADOOP-19859
Apr 24, 2026

Conversation

@pan3793
Member

@pan3793 pan3793 commented Apr 22, 2026

Description of PR

This PR adds an image cache for the GHA workflow. On a cache hit, the build-image step drops from ~15 min to ~1 min.

The image cache is created on a push to the apache/hadoop repo when dev-support/docker/** changes, or on a manual trigger. Since this is a public repo, forked repos can read the cache to speed up their own image-building workflows.
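The trigger side of this, per the workflow fragments quoted later in the thread, takes roughly this shape (a sketch only; the surrounding keys and exact indentation in the real file are assumed):

```yaml
# Sketch of the cache-builder workflow triggers: rebuild on pushes to
# trunk/branch-* that touch the Docker files, or on manual dispatch.
on:
  push:
    branches:
      - 'trunk'
      - 'branch-*'
    paths:
      - 'dev-support/docker/**'
  workflow_dispatch:
```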

How was this patch tested?

Tested on my forked repo.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (HADOOP-19859)?
  • [na] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [na] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [na] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling

No AI usage.

@pan3793
Member Author

pan3793 commented Apr 22, 2026

Note: the cache will only be used after this is merged, so don't expect this PR itself to benefit from it.


jobs:
  main:
    name: build-image-cache-${{ inputs.os }}-${{ github.ref_name }}
Member Author

github.ref_name is the branch name for push events

@pan3793 pan3793 closed this Apr 22, 2026
@pan3793 pan3793 deleted the HADOOP-19859 branch April 22, 2026 06:00
@pan3793 pan3793 restored the HADOOP-19859 branch April 22, 2026 06:08
@pan3793 pan3793 reopened this Apr 22, 2026
@apache apache deleted a comment from hadoop-yetus Apr 22, 2026
@apache apache deleted a comment from hadoop-yetus Apr 22, 2026
@apache apache deleted a comment from hadoop-yetus Apr 22, 2026
@pan3793 pan3793 requested review from ajfabbri April 22, 2026 16:48
      - 'branch-*'
    paths:
      - 'dev-support/docker/**'
  workflow_dispatch:
Member Author

both push and workflow_dispatch indicate the person has write permission on the repo, so it's safe.

Contributor

Agreed. It is a "trusted" action. (We still are careful to use best practices below, and our CodeQL scanning helps enforce that in the future.)

@pan3793
Member Author

pan3793 commented Apr 23, 2026

@ajfabbri I followed your practice and added "Security ..." comments too; could you take a look?

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 yamllint 0m 1s yamllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ trunk Compile Tests _
+1 💚 shadedclient 31m 20s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 shadedclient 27m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 0m 37s The patch does not generate ASF License warnings.
61m 10s
Subsystem Report/Notes
Docker ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8451/5/artifact/out/Dockerfile
GITHUB PR #8451
Optional Tests dupname asflicense codespell detsecrets yamllint
uname Linux 5914304c2c9d 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2b51282
Max. process+thread count 635 (vs. ulimit of 10000)
modules C: . U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8451/5/console
versions git=2.43.0 maven=3.9.11
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@pan3793 pan3793 requested a review from steveloughran April 23, 2026 06:02
@apache apache deleted a comment from hadoop-yetus Apr 23, 2026
Contributor

@ajfabbri ajfabbri left a comment

LGTM

      - 'trunk'
      - 'branch-*'
    paths:
      - 'dev-support/docker/**'
Contributor

Cool. Makes sense to rebuild when anything in here changes. In the future we might just publish an image and use it directly instead of always doing a cached build? We can iterate on it though. 👍

Member Author

@pan3793 pan3793 Apr 24, 2026

I think the cache is better.

  1. In most cases there is no Docker file change, so the cache hits and it works well.
  2. If trunk merges a PR that changes the Docker files, refreshing the cache takes ~15 min; a new push to a forked PR in the meantime will miss the cache and do a fresh build.
  3. If a PR itself changes the Docker files, it must do a fresh build.
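As the list above suggests, forked repos benefit purely by importing the public cache. A hedged sketch of the consumer side (the step name, `@v6` pin, and `ubuntu` OS value are illustrative; a fork would typically point `cache-from` at the trunk tag, since its own branches have no cache entries):

```yaml
# Hypothetical read-only consumer step in a build workflow.
- name: Build the dev image using the shared cache
  uses: docker/build-push-action@v6
  with:
    context: dev-support/docker
    push: false  # build for local CI use only; no registry write needed
    # ghcr.io/apache/hadoop is public, so this import works from forks too.
    cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-ubuntu-image-cache:trunk
```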


push: true
tags: ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}-static
cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}
cache-to: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }},mode=max
Contributor

Just getting familiar with this and reading docs. Is this based on the Spark CI workflows?

type=registry (docs)

registry: embeds the build cache into a separate image, and pushes to a dedicated location separate from the main output.

cache-to: exports the cache to a particular backend (registry) after a build. cache-from specifies how to import at start of a build. IIUC the local BuildKit cache is always enabled, but has no persistence between runs, so only helps with multiple builds within the same workflow.

The locations passed in (ref=) act as the key for the cache lookup, and we separate these by OS and branch name.

mode=max means to export all intermediate layers of the image build, whereas mode=min only exports those which end up in the image. This looks good to me. 👍
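Pulling those pieces together, an annotated version of the quoted step (same values as above; the comments only restate the semantics described in this thread):

```yaml
with:
  push: true
  # The image itself, tagged per OS and branch.
  tags: ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}-static
  # Import: the registry ref effectively acts as the cache key (OS + branch).
  cache-from: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }}
  # Export after the build; mode=max keeps all intermediate layers,
  # mode=min would keep only the layers present in the final image.
  cache-to: type=registry,ref=ghcr.io/apache/hadoop/gha-build-${{ inputs.os }}-image-cache:${{ github.ref_name }},mode=max
```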

Member Author

Is this based on the Spark CI workflows?

I basically replicated it from Spark.

https://github.com/apache/spark/blob/branch-4.1/.github/workflows/build_infra_images_cache.yml

@pan3793 pan3793 merged commit 38f48fb into apache:trunk Apr 24, 2026
8 checks passed
@pan3793
Member Author

pan3793 commented Apr 24, 2026

Thanks, merged to trunk. I will manually trigger the first image cache build; subsequent cache refreshes will happen automatically whenever dev-support/docker/** changes.

the manually triggered image cache jobs are: https://github.com/apache/hadoop/actions/runs/24869306467

3 participants