This repository was archived by the owner on Mar 20, 2026. It is now read-only.

DB refresh: Decrease number of hard failures & notifications #1020

Description

@joeflack4

It is a burden to see an average of 1-3 DB refresh GitHub Action failure notifications in the inbox each day.

These are usually due to:

  • Draft cset finalization expansion not resolving in time: we set the threshold to 2 hours, but resolution sometimes takes longer than that. The refresh fails if the expansion is still unresolved after 2 hours.
  • Timeouts: see "Enclave wrangler: Recovering from timeouts" #1019.
  • Other miscellaneous, temporary infrastructure errors, such as a transient enclave 406 error, or GitHub or PyPI having a random blip.

What we should do:

  • Log every refresh error (and probably every success, too) in the DB. Only raise an error in the GitHub Action if the refresh is persistently failing, e.g., failing for 6 refreshes in a row (2 hours).
  • At that point, throw an error.
    • When the error is thrown, query the DB for all errors that have occurred since the last time an error was reported, and print them all in the log. Maybe show them as a table with one column for the datetime, one for the error type/name, and one for details. Sort by datetime, or by type and then datetime.
  • After that, throw at most one error per day. That is, if the refresh fails and an error has already been raised within the last 24 hours, just exit quietly (report "success"), because the error has already been raised and we already know about it.
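A minimal sketch of the proposed logic, in Python (assumed because the project is distributed via PyPI). It uses a SQLite table as a stand-in for the project's actual database. The table `refresh_log`, its columns, and the helpers `record_success`, `handle_failure`, and `format_table` are hypothetical names chosen for illustration. The thresholds (6 consecutive failures, a 24-hour cooldown) come from the proposal above.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds taken from the proposal: alert after 6 failures in a row,
# then at most once per 24 hours.
FAILURE_THRESHOLD = 6
ALERT_COOLDOWN = timedelta(hours=24)

# Hypothetical table; SQLite stands in for the real DB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS refresh_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- run order
    ts TEXT NOT NULL,                      -- ISO-8601 UTC timestamp
    status TEXT NOT NULL,                  -- 'success' or 'error'
    error_type TEXT,
    details TEXT,
    alerted INTEGER NOT NULL DEFAULT 0     -- 1 if this run failed the GitHub Action
)
"""


def _ts(dt):
    return dt.isoformat(timespec="seconds")


def record_success(conn, now=None):
    """Log a successful refresh (this breaks any run of consecutive failures)."""
    now = now or datetime.now(timezone.utc)
    conn.execute("INSERT INTO refresh_log (ts, status) VALUES (?, 'success')", (_ts(now),))
    conn.commit()


def format_table(rows):
    """Render (datetime, error type, details) rows as a plain-text table."""
    header = f"{'datetime':<25} | {'error type':<20} | details"
    lines = [header, "-" * len(header)]
    lines += [f"{ts:<25} | {etype or '':<20} | {details or ''}" for ts, etype, details in rows]
    return "\n".join(lines)


def handle_failure(conn, error_type, details, now=None):
    """Log a failed refresh. Return an error report if the Action should fail, else None."""
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT INTO refresh_log (ts, status, error_type, details) VALUES (?, 'error', ?, ?)",
        (_ts(now), error_type, details),
    )
    run_id = cur.lastrowid

    # Persistent failure: the most recent FAILURE_THRESHOLD runs are all errors.
    recent = [row[0] for row in conn.execute(
        "SELECT status FROM refresh_log ORDER BY id DESC LIMIT ?", (FAILURE_THRESHOLD,))]
    persistent = len(recent) == FAILURE_THRESHOLD and all(s == "error" for s in recent)

    # Cooldown: an error was already raised in the Action within the last 24 hours.
    last_alert = conn.execute(
        "SELECT id, ts FROM refresh_log WHERE alerted = 1 ORDER BY id DESC LIMIT 1").fetchone()
    in_cooldown = (last_alert is not None
                   and now - datetime.fromisoformat(last_alert[1]) < ALERT_COOLDOWN)

    if not persistent or in_cooldown:
        conn.commit()
        return None  # exit quietly ("success")

    # Collect every error since the previous alert, sorted by datetime.
    since_id = last_alert[0] if last_alert else 0
    rows = conn.execute(
        "SELECT ts, error_type, details FROM refresh_log "
        "WHERE status = 'error' AND id > ? ORDER BY ts", (since_id,)).fetchall()
    conn.execute("UPDATE refresh_log SET alerted = 1 WHERE id = ?", (run_id,))
    conn.commit()
    return format_table(rows)
```

In the GitHub Action, the refresh script would call `record_success` after a successful run. On failure it would call `handle_failure`, and only if that returns a report would it print the report and exit non-zero (e.g., `sys.exit(1)`); otherwise it exits 0. Keeping this state in the DB, rather than in the workflow, lets the throttle persist across independent Action runs.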

Metadata

Projects

Status
3. Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests