build.yml: fix eve job cache handling #5665
The eve job was rebuilding arm64 packages from scratch instead of using the ones already built by the packages job. Investigating the root cause revealed several interrelated issues.

1. Redundant 'pkgs' target in the eve build command

The eve job ran 'make pkgs eve', but the packages job already builds and caches all packages. Since the eve job restores the cache first, the 'pkgs' target should be a no-op. Removed it.

2. arm64 packages were never restored from cache

The cache restore logic had a conditional: if the runner arch matched the matrix arch, it skipped both clearing the linuxkit cache and restoring the target arch cache. The assumption was that the first cache restore (for tool images) already had the right packages. But that first restore always fetched the amd64 generic cache, even on arm64 runners. So arm64 jobs were left with amd64 packages in the cache, and 'make pkgs' (issue #1) was silently rebuilding everything for arm64.

3. Tool images were hardcoded to amd64

The cache key for loading tool images (mkconf, mkimage-raw-efi, mkrootfs-squash, etc.) into docker was hardcoded to amd64. On arm64 runners this is wrong: they need arm64 tool images. Since for native builds the target cache already contains these tools, we now load them directly from the target cache. The two-cache dance (load tools from one arch, then restore packages from another) is only needed for riscv64 cross-builds on amd64.

4. The 'rt' platform maps to generic packages

No build-rt.yml files exist anywhere in pkg/, so PLATFORM=rt produces identical packages to PLATFORM=generic. Rather than adding a redundant amd64/rt entry to the packages matrix, we map 'rt' to 'generic' in the cache key.

The fix simplifies the eve job's cache handling:
- Native builds (amd64, arm64): restore target cache, load tools, build
- Cross-builds (riscv64): restore amd64 cache, load tools, clear, restore riscv64 cache, build

The "Arch Runner is Matrix" step is removed as it is no longer used.
Signed-off-by: Paul Gaiduk <paulg@zededa.com>
- name: Build EVE ${{ matrix.hv }}-${{ matrix.arch }}-${{ matrix.platform }}
  run: |
    make V=1 ROOTFS_VERSION="$VERSION" PLATFORM=${{ matrix.platform }} HV=${{ matrix.hv }} ZARCH=${{ matrix.arch }} pkgs eve # note that this already loads it into docker
@europaul, I'm not sure you can get rid of make pkgs, because all packages listed in PKGS_DOCKER_LOAD in the Makefile must be loaded into docker on the host. It is not only about them being available in the linuxkit cache; they must also be loaded into docker...
OK, I saw you added the command to load these tool packages.
When I type 'make eve' in my workspace it does not rebuild all of pkg/* from source. I think the intent is that the workflow does that (unless I'm missing something), so I don't know how this can be considered a no-op.
we first run |
Description
The eve job was rebuilding arm64 packages from scratch instead of using the ones already built by the packages job. Investigating the root cause revealed several interrelated issues.
Redundant 'pkgs' target in the eve build command
The eve job ran 'make pkgs eve', but the packages job already builds and caches all packages. Since the eve job restores the cache first, the 'pkgs' target should be a no-op. Removed it.
arm64 packages were never restored from cache
The cache restore logic had a conditional: if the runner arch matched the matrix arch, it skipped both clearing the linuxkit cache and restoring the target arch cache. The assumption was that the first cache restore (for tool images) already had the right packages. But that first restore always fetched the amd64 generic cache, even on arm64 runners. So arm64 jobs were left with amd64 packages in the cache, and 'make pkgs' (the first issue above) was silently rebuilding everything for arm64.
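To make the failure mode concrete, here is a minimal sketch of the old decision as described above. The function name `old_cached_arch` and its two arguments are hypothetical and do not appear in build.yml; the sketch only models which architecture's packages end up in the linuxkit cache.

```shell
# Hypothetical model of the OLD restore logic (names are illustrative only).
# Prints the arch whose packages are left in the linuxkit cache.
old_cached_arch() {
    runner_arch="$1"
    matrix_arch="$2"
    if [ "$runner_arch" = "$matrix_arch" ]; then
        # Clearing and the target restore were both skipped, so only the
        # first restore remains, and that one always fetched amd64 generic.
        echo "amd64"
    else
        # Cross-build: cache was cleared and the target arch restored.
        echo "$matrix_arch"
    fi
}
```

Under this model, `old_cached_arch arm64 arm64` yields amd64, which is the mismatch that forced the arm64 rebuild, while the amd64 native and riscv64 cross cases happen to come out right.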
Tool images were hardcoded to amd64
The cache key for loading tool images (mkconf, mkimage-raw-efi, mkrootfs-squash, etc.) into docker was hardcoded to amd64. On arm64 runners this is wrong — they need arm64 tool images. Since for native builds the target cache already contains these tools, we now load them directly from the target cache. The two-cache dance (load tools from one arch, then restore packages from another) is only needed for riscv64 cross-builds on amd64.
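The rule for where tool images come from can be sketched as a small shell helper. This is an illustration under assumptions, not the workflow's actual code: `tools_cache_arch` and `needs_second_restore` are hypothetical names, and the runner and matrix architectures are passed in as plain arguments.

```shell
# Hypothetical helpers (not taken from build.yml).

# Prints the arch of the cache that tool images should be loaded from.
tools_cache_arch() {
    runner_arch="$1"
    matrix_arch="$2"
    if [ "$runner_arch" = "$matrix_arch" ]; then
        # Native build: the target cache already contains the tools.
        echo "$matrix_arch"
    else
        # Cross-build (riscv64 on amd64): tools must match the runner.
        echo "$runner_arch"
    fi
}

# Succeeds (exit 0) only when the two-cache dance is required.
needs_second_restore() {
    [ "$1" != "$2" ]
}
```

With these definitions, an arm64 runner building for arm64 loads arm64 tools and needs a single restore, while an amd64 runner building for riscv64 loads amd64 tools and then restores the riscv64 packages separately.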
The 'rt' platform maps to generic packages
No build-rt.yml files exist anywhere in pkg/, so PLATFORM=rt produces identical packages to PLATFORM=generic. Rather than adding a redundant amd64/rt entry to the packages matrix, we map 'rt' to 'generic' in the cache key.
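The platform mapping can be pictured as a tiny shell function. The name `cache_platform` is hypothetical, and how the result is combined with the arch into a real cache key is not shown here, since the key layout is not given in this description.

```shell
# Hypothetical helper: map a build platform to the platform used in the
# package cache key. 'rt' has no build-rt.yml overrides, so it reuses the
# generic packages; every other platform is passed through unchanged.
cache_platform() {
    case "$1" in
        rt) echo "generic" ;;
        *)  echo "$1" ;;
    esac
}
```

Calling `cache_platform rt` gives generic, so an rt build looks up the same cached packages as a generic build without needing its own entry in the packages matrix.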
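The fix simplifies the eve job's cache handling:
- Native builds (amd64, arm64): restore target cache, load tools, build
- Cross-builds (riscv64): restore amd64 cache, load tools, clear, restore riscv64 cache, build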
The "Arch Runner is Matrix" step is removed as it is no longer used.
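The simplified flow (native builds restore the target cache once; cross-builds restore the runner's cache for tools, clear it, then restore the target cache) can be sketched as a function that prints the ordered steps. `eve_cache_steps` is a hypothetical name and the step strings are descriptive labels, not real workflow step names.

```shell
# Hypothetical sketch: print the ordered cache-handling steps of the eve job
# for a given runner arch and matrix (target) arch, one step per line.
eve_cache_steps() {
    runner_arch="$1"
    matrix_arch="$2"
    if [ "$runner_arch" = "$matrix_arch" ]; then
        # Native build (amd64, arm64).
        printf '%s\n' \
            "restore $matrix_arch cache" \
            "load tools" \
            "build"
    else
        # Cross-build (riscv64 on an amd64 runner).
        printf '%s\n' \
            "restore $runner_arch cache" \
            "load tools" \
            "clear cache" \
            "restore $matrix_arch cache" \
            "build"
    fi
}
```

For example, `eve_cache_steps amd64 riscv64` lists five steps starting with the amd64 restore, whereas `eve_cache_steps arm64 arm64` lists three and never touches an amd64 cache.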
PR dependencies
None
How to test and validate this PR
Run CI - the eve job from build.yml should not rebuild the packages again and should run quicker on every architecture.
Changelog notes
N/A
PR Backports
Checklist