Management controllers generate excessive 409s by using Create→AlreadyExists→Update instead of server-side apply #6155

@alexmt

Description

The project and serviceaccount management controllers create RBAC resources (Roles, RoleBindings, ClusterRoles, ClusterRoleBindings, ServiceAccounts) using a Create→AlreadyExists→Update fallback pattern. In clusters with many projects, this generates a high volume of unnecessary 409 errors on every reconcile cycle.

Affected files:

  • pkg/controller/management/serviceaccounts/serviceaccounts.go: ensureControllerPermissions
  • pkg/controller/management/projects/projects.go: ensureSystemPermissions, ensureControllerPermissions, ensureDefaultUserRoles, ensureExtendedPermissions

Current pattern

if err := r.client.Create(ctx, roleBinding); err != nil {
    if !apierrors.IsAlreadyExists(err) {
        return err
    }
    r.client.Update(ctx, roleBinding) // always fires if resource exists
}

The problem

  • ensureControllerPermissions in the serviceaccount reconciler iterates over all project namespaces and attempts to Create a RoleBinding in each. Since the RoleBindings already exist after the first reconcile, every subsequent reconcile generates N × 409s (where N = number of projects), each followed by an Update.
  • The project reconciler does the same across 5+ resource types on every reconciliation, which is triggered frequently by Warehouse/Stage health condition changes.

Observed impact

On a cluster with active Kargo usage, the k8s API server shows ~800 RBAC-related 409s per minute accumulating continuously:

26,589  rolebindings POST → 409
15,951  roles POST → 409
15,951  serviceaccounts POST → 409
 5,318  clusterroles POST → 409
 5,318  clusterrolebindings POST → 409

Every 409 is a wasted API server round trip (etcd read with no write). The subsequent Update then performs an unconditional etcd write even when the resource hasn't changed.

Suggested fix

Replace the Create/Update pattern with server-side apply:

if err := r.client.Patch(ctx, roleBinding, client.Apply,
    client.ForceOwnership,
    client.FieldOwner("kargo"),
); err != nil {
    return fmt.Errorf("error applying RoleBinding %q in namespace %q: %w",
        roleBinding.Name, roleBinding.Namespace, err)
}

Benefits:

  • Single API call instead of two
  • No 409s — server handles create-or-update atomically
  • No etcd write when the object hasn't changed (unlike unconditional Update)
  • Already the idiomatic pattern for Kubernetes controllers

Metadata

Labels

  • area/management-controller: Affects the controller that manages the Kargo control plane itself
  • kind/bug: Something isn't working as intended; if unsure that something IS a bug, start a discussion instead
  • kind/refactor: Non-functional changes to implementation details
  • priority/normal: This is the priority for most work
