cleanup-acr-images-official pipeline is broken #2056
Open
Description
The cleanup-acr-images-official pipeline has been broken for some time. This issue should be closed only once the pipeline runs successfully.
Current failure modes:
- Ghost manifest crashes GenerateEolAnnotationDataForAllImagesCommand - Handle 404 in GetAllImageDigestsFromRegistryAsync #2055.
  GetAllManifestPropertiesAsync lists a manifest in build-staging/2881074/dotnet/nightly/aspnet that no longer exists - GetManifestAsync returns 404 every time. Because the exception is unhandled, it crashes the entire Annotations job. This has been blocking the cleanup pipeline from completing at all.
- oras CLI is not authenticated in the Clean job - Revert "Remove unnecessary Docker login from CleanAcrImagesCommand" #2052.
  The pruneEol action calls IsDigestAnnotatedForEol, which shells out to oras discover. The oras CLI has no credentials in the Clean job - every call returns 401 Unauthorized.
- Duplicate annotations keep staging repos perpetually fresh.
  The oras discover output format change (pruneEol action deletes 0 images due to oras discover output format change #2045) caused ImageBuilder to conclude that no image digests had EOL lifecycle annotations. As a result, ImageBuilder attaches many duplicate annotations (see EOL images have multiple redundant end-of-life lifecycle annotations dotnet-docker#7121). Furthermore, staging image repos are pruned based on their LastUpdatedOn date. Months of duplicate annotation writes have kept updating LastUpdatedOn on staging repos, so the 15-day age check in the Delete build-staging/* step never passes. 1,840 staging repos have accumulated. This also inflates the publish pipeline's generateEolAnnotationDataForPublish to 78 minutes and 5M log lines, because it enumerates and checks all the accumulated digests and their bloated referrer lists.
- Clean public/dotnet* (PruneEol > 15d) step times out - Migrate all ORAS CLI usage to OrasDotNet #2050.
  The step calls oras discover as a CLI subprocess for every manifest in the registry (14,426 in this run). Due to the duplicate annotation bug (the oras v1.3.0 format change breaking annotation detection), each digest has ~23 redundant lifecycle annotations, so each call returns ~285 lines of JSON. ExecuteHelper unconditionally pipes all CLI stdout to Console.Out, producing a 320MB/4.1M-line log. The step timed out after ~45 minutes without finishing.
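Three of the failure modes above can be illustrated in miniature. The following is a rough Python sketch, not ImageBuilder's actual C# code: `ManifestNotFound`, `collect_manifests`, `is_expired`, the sample digests, and the dates are all hypothetical names and values chosen for illustration. It shows (1) skipping a listed-but-missing manifest instead of letting the 404 abort the run, (2) why a 15-day age check keyed on a last-updated timestamp never passes if annotation writes keep refreshing that timestamp, and (3) how the per-manifest output figures quoted above add up to the reported log size.

```python
from datetime import datetime, timedelta, timezone


class ManifestNotFound(Exception):
    """Hypothetical stand-in for the 404 raised when fetching a ghost manifest."""


def collect_manifests(listed_digests, fetch_manifest):
    """Fetch every listed digest, skipping ones the registry no longer has."""
    found, skipped = {}, []
    for digest in listed_digests:
        try:
            found[digest] = fetch_manifest(digest)
        except ManifestNotFound:
            # A listed-but-missing manifest is recorded and skipped
            # rather than crashing the whole job.
            skipped.append(digest)
    return found, skipped


def is_expired(last_updated_on, now, max_age_days=15):
    """Age check: eligible for deletion only if untouched for more than max_age_days."""
    return now - last_updated_on > timedelta(days=max_age_days)


# (1) Ghost manifest: the listing contains a digest that the fetch cannot find.
registry = {"sha256:aaa": {"tag": "ok"}}  # hypothetical registry contents


def fetch(digest):
    if digest not in registry:
        raise ManifestNotFound(digest)
    return registry[digest]


found, skipped = collect_manifests(["sha256:aaa", "sha256:ghost"], fetch)

# (2) Perpetually fresh staging repo: built 90 days ago, but an annotation
# write yesterday refreshed its last-updated timestamp (dates are made up).
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
built_on = now - timedelta(days=90)
last_annotation_write = now - timedelta(days=1)
expired_by_build_date = is_expired(built_on, now)
expired_by_last_update = is_expired(last_annotation_write, now)

# (3) Log volume: 14,426 manifests at roughly 285 lines of JSON each.
estimated_log_lines = 14_426 * 285

print(skipped, expired_by_build_date, expired_by_last_update, estimated_log_lines)
```

The estimate in (3) comes to about 4.1 million lines, which matches the log size reported for the timed-out step.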
List of related issues/PRs:
- Reference registry service connections in cleanup Clean job #2058
- Handle 404 in GetAllImageDigestsFromRegistryAsync #2055
- Cleanup pruneEol fails: "unauthorized: authentication required" #2051
- Migrate all ORAS CLI usage to OrasDotNet #2050
- Revert "Remove unnecessary Docker login from CleanAcrImagesCommand" #2052
- pruneEol action deletes 0 images due to oras discover output format change #2045
- Remove unnecessary Docker login from CleanAcrImagesCommand #2044
- In steps/clean-acr-images, use dryRunArg to fit pipeline usage #2035
- The steps/clean-acr-images.yml template always runs in dry run mode when called by stages/cleanup-acr-images.yml #2034
- Parallelize GetAllImageDigestsFromRegistryAsync #1909
- GetAllImageDigestsFromRegistryAsync is suddenly extremely slow #1905
- Fix cleanup-acr-images pipeline timeout by increasing EOL Annotations job timeout to 2 hours #1830
Status
In Progress