Skip to content

feat(console): summarize batch notifications by table#1323

Open
absorbb wants to merge 1 commit into
newjitsufrom
feat/newjitsu/notifications-summarize-by-tables
Open

feat(console): summarize batch notifications by table#1323
absorbb wants to merge 1 commit into
newjitsufrom
feat/newjitsu/notifications-summarize-by-tables

Conversation

@absorbb
Copy link
Copy Markdown
Contributor

@absorbb absorbb commented May 22, 2026

Summary

  • Batch notifications for connections that write to multiple tables now collapse into a single connection-level notification with a new PARTIAL status when only some tables are failing (instead of one notification per table).
  • The behavior is controlled by a new summarizeBatchNotificationsByTable toggle that defaults to true and is surfaced both per-user (My Email Notifications) and per Slack channel.
  • Per-table StatusChange rows are unchanged — only the notification routing is aggregated, so the bulker dashboard view is preserved.

Why

Many clients push to multiple tables through one connection. When a problem occurs at the connection level (e.g. destination broken), the cron at /api/admin/notifications produced one email/Slack per table — flooding inboxes for a single underlying issue.

How

  • UserNotificationsPreferences and NotificationChannel schemas gain summarizeBatchNotificationsByTable: boolean (default true); a Switch is added to the per-user settings and to the workspace Slack channel editor.
  • In processStatusChanges (pages/api/admin/notifications.ts):
    • The existing per-table channel loop now skips batch + per-table entries for channels that opt into summarization.
    • A new second pass per batch actor computes the connection-level aggregate from current per-table entity statuses:
      • all SUCCESSSUCCESS (rendered as FIRST_RUN/RECOVERED depending on prior state),
      • all non-SUCCESSFAILED,
      • mixed → PARTIAL with streamsFailed="N of M" and a per-table breakdown.
    • NotificationState for the aggregate is stored against tableName="", so flapping/recurring de-dup keeps working at the connection grain.
    • RECOVERED is detected via one Prisma lookup of the previously-notified statusChangeId.

Test plan

  • Hit /api/admin/notifications?dryRun=true against a workspace whose batch connection has tables in mixed SUCCESS/FAILED state — verify exactly one PARTIAL notification per channel.
  • Toggle "Summarize Batch Notifications by Table" off for a user and rerun — verify per-table notifications resume for that user only.
  • All tables of a previously partial/failed connection return to SUCCESS — verify one RECOVERED notification per channel.
  • Existing workspace Slack channels saved before this PR default to summarize=true (handled in loadNotificationsChannels).
  • Sync and dead-letter notifications are unaffected.

🤖 Generated with Claude Code

When a batch connection writes to multiple tables, a connection-level
problem (e.g. broken destination) used to produce one email/Slack per
table — clients with many tables got tons of duplicates. Now batch
notifications aggregate across tables and emit a single PARTIAL
notification per channel when some tables are failing and others are
succeeding (FAILED when all fail, RECOVERED when all return to SUCCESS).

Behavior is controlled by a new "Summarize Batch Notifications by Table"
toggle, enabled by default, surfaced both per-user (in My Email
Notifications) and per Slack channel.

Per-table StatusChange rows are still written as before — only the
notification delivery is aggregated, so the bulker dashboard view is
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@jitsu-code-review jitsu-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple of correctness issues in the new aggregate batch-notification flow that can suppress or misreport recovery notifications.

if (view.aggStatus === "SUCCESS") {
// Only notify on SUCCESS if it represents a recovery from a prior failure.
try {
const prevRow = await db.prisma().statusChange.findUnique({ where: { id: state.statusChangeId } });
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state.statusChangeId stores view.aggMaxId (a real per-table StatusChange id), not the previous aggregate status. That means recovery detection can be wrong: if the previous aggregate was PARTIAL but the max-id table row was SUCCESS, this branch suppresses RECOVERED (doNotify = false) even though the connection actually recovered.

aggIncidentDetails,
aggMaxId: maxId,
aggTimestamp: maxIdEntity.timestamp!,
aggStartedAt: earliestIncidentStart ?? maxIdEntity.startedAt!,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For failedCount === 0 (the recovery case), earliestIncidentStart is always undefined, so aggStartedAt falls back to maxIdEntity.startedAt (the latest success timestamp). Later RECOVERED uses this as incidentStartedAt, so the email reports the incident as starting at recovery time instead of when failures began.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant