cleanup-acr-images-official pipeline is broken #2056

@lbussell

Description

The cleanup-acr-images-official pipeline has been broken for some time. This issue should be closed only once the pipeline runs successfully.

Current failure modes:

  1. Ghost manifest crashes GenerateEolAnnotationDataForAllImagesCommand - Handle 404 in GetAllImageDigestsFromRegistryAsync #2055.
    GetAllManifestPropertiesAsync lists a manifest in build-staging/2881074/dotnet/nightly/aspnet that no longer exists, so GetManifestAsync returns 404 for it every time. Because the exception is unhandled, it crashes the entire Annotations job, which has kept the cleanup pipeline from completing at all.
  2. oras CLI is not authenticated in the Clean job - Revert "Remove unnecessary Docker login from CleanAcrImagesCommand" #2052.
    The pruneEol action calls IsDigestAnnotatedForEol, which shells out to oras discover. The oras CLI has no credentials in the Clean job, so every call returns 401 Unauthorized.
  3. Duplicate annotations keep staging repos perpetually fresh.
    The bug described in "pruneEol action deletes 0 images due to oras discover output format change" (#2045) caused ImageBuilder to conclude that no image digests had EOL lifecycle annotations. As a result, ImageBuilder attached many duplicate annotations (see "EOL images have multiple redundant end-of-life lifecycle annotations", dotnet-docker#7121). Staging image repos are pruned based on their LastUpdatedOn date, and months of duplicate annotation writes have kept refreshing LastUpdatedOn on those repos, so the 15-day age check in the Delete build-staging/* step never passes. 1,840 staging repos have accumulated. This also inflates the publish pipeline's generateEolAnnotationDataForPublish to 78 minutes and 5M log lines, because it enumerates and checks all the accumulated digests and their bloated referrer lists.
  4. Clean public/dotnet* (PruneEol > 15d) step times out - Migrate all ORAS CLI usage to OrasDotNet #2050.
    The step calls oras discover as a CLI subprocess for every manifest in the registry (14,426 in this run). Due to the duplicate annotation bug (oras v1.3.0 format change breaking annotation detection), each digest has ~23 redundant lifecycle annotations, so each call returns ~285 lines of JSON. ExecuteHelper unconditionally pipes all CLI stdout to Console.Out, producing a 320MB/4.1M-line log. The step timed out after ~45 minutes without finishing.
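
For failure mode 1, one possible shape for the fix is to treat a 404 on a listed manifest as "skip and warn" instead of letting it propagate. The sketch below is in Python for brevity and is not the ImageBuilder code: `ManifestNotFoundError`, `collect_existing_digests`, and `fake_get_manifest` are hypothetical names standing in for the registry client behavior described above.

```python
from typing import Callable, Iterable, List


class ManifestNotFoundError(Exception):
    """Hypothetical stand-in for a registry 404 on a manifest lookup."""


def collect_existing_digests(
    listed_digests: Iterable[str],
    get_manifest: Callable[[str], object],
) -> List[str]:
    """Return the digests whose manifest can be fetched, skipping 404s."""
    existing = []
    for digest in listed_digests:
        try:
            get_manifest(digest)
        except ManifestNotFoundError:
            # The manifest was listed but cannot be retrieved (a "ghost").
            # Warn and continue so one bad entry does not fail the whole job.
            print(f"warning: skipping missing manifest {digest}")
            continue
        existing.append(digest)
    return existing


def fake_get_manifest(digest: str) -> dict:
    """Test double: one digest behaves like the ghost manifest."""
    if digest == "sha256:ghost":
        raise ManifestNotFoundError(digest)
    return {"digest": digest}


digests = collect_existing_digests(
    ["sha256:a", "sha256:ghost", "sha256:b"], fake_get_manifest
)
```

Only the not-found case is caught here; other errors (auth failures, timeouts) would still surface, which seems preferable to hiding them.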
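
The interaction in failure mode 3 can be illustrated with a small age check. This is a sketch under assumptions: `is_prunable` is a hypothetical helper, the 15-day threshold comes from the step description, and the dates are arbitrary. It shows why a repo that was created long ago still fails the check once an annotation write has refreshed its last-updated timestamp.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=15)  # threshold taken from the step description


def is_prunable(last_updated_on: datetime, now: datetime,
                max_age: timedelta = MAX_AGE) -> bool:
    """True when the repo has not been updated within max_age."""
    return now - last_updated_on > max_age


now = datetime(2025, 1, 1, tzinfo=timezone.utc)  # arbitrary reference time
created_on = now - timedelta(days=90)            # repo is three months old
last_annotation_write = now - timedelta(days=1)  # but was "updated" yesterday

# Judged by creation time, the repo would be eligible for deletion.
eligible_by_creation = is_prunable(created_on, now)
# Judged by the refreshed timestamp, it never is.
eligible_by_last_update = is_prunable(last_annotation_write, now)
```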
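
As a sanity check on the numbers in failure mode 4, the manifest count and per-call output size roughly reproduce the observed log size. The inputs are the approximate figures quoted above, and MB is assumed to mean 10^6 bytes.

```python
manifests = 14_426     # manifests enumerated in this run
lines_per_call = 285   # approximate JSON lines returned per oras discover call

estimated_lines = manifests * lines_per_call  # about 4.1M lines

log_bytes = 320 * 1000 * 1000                 # 320MB, assuming MB = 10^6 bytes
avg_bytes_per_line = log_bytes / estimated_lines

print(f"{estimated_lines:,} lines, ~{avg_bytes_per_line:.0f} bytes per line")
```

The estimate lands close to the reported 4.1M lines, which is consistent with the log being dominated by echoed oras discover output.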
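
Independently of the migration in #2050, the log volume comes from echoing subprocess output unconditionally. A minimal sketch of the alternative, capturing stdout and returning it to the caller without printing, might look like the following. `run_quietly` is a hypothetical helper, and the child command is a placeholder Python one-liner, not oras.

```python
import subprocess
import sys
from typing import List


def run_quietly(command: List[str]) -> str:
    """Run a command and return its stdout without echoing it to the console."""
    completed = subprocess.run(
        command,
        capture_output=True,  # keep stdout/stderr out of the parent's log
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return completed.stdout


# Placeholder child process that prints two lines.
output = run_quietly(
    [sys.executable, "-c", "print('line 1'); print('line 2')"]
)
line_count = len(output.splitlines())
```

The caller can then parse the captured text and log only a summary, for example the number of referrers found.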

List of related issues/PRs:

Status: In Progress
