
feat(zedkube): configurable VMI descheduler for failback #5885

Open
andrewd-zededa wants to merge 2 commits into lf-edge:master from andrewd-zededa:eve-k-deschedule-vmis

Conversation

@andrewd-zededa
Contributor

@andrewd-zededa andrewd-zededa commented Apr 30, 2026

Description

Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".

kubeapi/descheduler.go (new):

  • IsDeschedulerReady: returns (false, nil) until the local node is
    Ready and schedulable, all Longhorn daemonsets are ready, and
    (when present) the kubevirt CR reports Available.
  • TriggerDescheduler: Create-first idempotent job management — skips
    if an active Job already exists (handles multi-node boot race),
    otherwise deletes any stale completed/failed Job and recreates.
    Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
    descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
    policy ConfigMap before each run.
  • EnsureVMsDeschedulerAnnotated: idempotently stamps
    descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
    and live VMI in eve-kube-app namespace using StrategicMergePatch
    to avoid List→Update conflicts. No-op in base-k3s mode.
  • Inner functions accept injectable kubernetes.Interface and
    kubecli.KubevirtClient for unit testing with fake clients.
  • Stub added to nokube.go for non-kube builds.
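
The Create-first flow described for TriggerDescheduler can be sketched without a cluster. This is a minimal illustration, not the code in this PR: `jobStore`, `memStore`, `triggerJob`, and `errAlreadyExists` are hypothetical names, and the in-memory store stands in for the Kubernetes Jobs API.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for an "AlreadyExists" API error.
var errAlreadyExists = errors.New("job already exists")

type jobState int

const (
	jobActive   jobState = iota // still running
	jobFinished                 // completed or failed
)

// jobStore is a hypothetical abstraction over the Jobs API.
type jobStore interface {
	Create(name string) error // errAlreadyExists if the name is taken
	State(name string) (jobState, error)
	Delete(name string) error
}

// memStore is an in-memory jobStore used only for this sketch.
type memStore struct{ jobs map[string]jobState }

func (m *memStore) Create(name string) error {
	if _, ok := m.jobs[name]; ok {
		return errAlreadyExists
	}
	m.jobs[name] = jobActive
	return nil
}

func (m *memStore) State(name string) (jobState, error) {
	s, ok := m.jobs[name]
	if !ok {
		return 0, errors.New("job not found")
	}
	return s, nil
}

func (m *memStore) Delete(name string) error {
	delete(m.jobs, name)
	return nil
}

// triggerJob tries Create first. If the Job already exists and is active,
// it does nothing (another node won the race). If the existing Job is
// finished, it deletes the stale Job and creates a fresh one.
// It returns true only when this call created a Job.
func triggerJob(s jobStore, name string) (bool, error) {
	err := s.Create(name)
	if err == nil {
		return true, nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return false, err
	}
	state, err := s.State(name)
	if err != nil {
		return false, err
	}
	if state == jobActive {
		return false, nil
	}
	// Assumption: deletion is synchronous here; a real cluster deletes
	// asynchronously, so the second Create may need a retry.
	if err := s.Delete(name); err != nil {
		return false, err
	}
	if err := s.Create(name); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	s := &memStore{jobs: map[string]jobState{}}
	fmt.Println(triggerJob(s, "descheduler")) // first caller creates the Job
	fmt.Println(triggerJob(s, "descheduler")) // second caller sees an active Job
}
```

Trying Create before any Get keeps the common path to one API call and lets the "already exists" error itself signal that another node got there first.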

kubeapi/descheduler_test.go (new):

  • 9 unit tests covering IsDeschedulerReady (unschedulable, not-ready),
    ensureDeschedulerSetup (create-on-missing, update-on-existing),
    TriggerDescheduler (active job skips delete, recreates on
    completion), and EnsureVMsDeschedulerAnnotated (already annotated,
    patches missing VMIRS annotation, patches missing VMI annotation).

kubeapi/kubeapi.go:

  • waitForLonghornReady accepts kubernetes.Interface to support
    fake client injection in tests.

zedkube/descheduler.go (new):

  • deschedulerOnBootWatcher goroutine polls IsDeschedulerReady every
    15s then calls TriggerDescheduler once and exits.
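
A poll-then-trigger loop of this kind might look like the sketch below. It is an illustration under assumptions, not the PR's implementation: `pollThenTrigger`, `readyFunc`, and `demo` are hypothetical names, the context deadline stands in for the 30-minute limit mentioned in the later commit message, and returning on a readiness error (rather than retrying) is a choice made only for this example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// readyFunc mirrors the (bool, error) shape described for IsDeschedulerReady.
type readyFunc func() (bool, error)

// pollThenTrigger checks ready immediately and then once per interval.
// It calls trigger exactly once when ready reports true, and gives up
// when ctx is cancelled or its deadline passes.
func pollThenTrigger(ctx context.Context, interval time.Duration,
	ready readyFunc, trigger func() error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := ready()
		if err != nil {
			return err // assumption: readiness errors are fatal in this sketch
		}
		if ok {
			return trigger()
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			// next readiness check
		}
	}
}

// demo runs the loop with a fake readiness probe that turns true on the
// readyAfter-th check, using a 10ms interval in place of 15s.
func demo(readyAfter, timeoutMillis int) (checks int, triggered bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	ready := func() (bool, error) {
		checks++
		return checks >= readyAfter, nil
	}
	trigger := func() error {
		triggered = true
		return nil
	}
	err = pollThenTrigger(ctx, 10*time.Millisecond, ready, trigger)
	return checks, triggered, err
}

func main() {
	fmt.Println(demo(3, 1000))  // becomes ready on the third check
	fmt.Println(demo(1000, 50)) // never ready before the deadline
}
```

Passing a context gives the goroutine a cancellation path in addition to the deadline, so a caller can stop the watcher early.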

zedkube/zedkube.go:

  • deschedulerOnBootWatcher is launched after WaitForKubernetes
    returns. GlobalConfig (which controls OnBoot) is processed in the
    early ENCC wait loop before WaitForKubernetes, so pubKubeConfig
    already reflects operator intent and nodeName is guaranteed set at
    the launch site. The narrow window where config arrives exactly as
    k3s becomes ready (post-WaitForKubernetes) is documented but not
    handled.
  • handleVmiDescheduleEventsOverride parses the CSV config value using
    exact token matching (strings.Split + TrimSpace) and re-publishes
    KubeConfig on change.
  • deschedulerOnBootStarted bool ensures the watcher goroutine is
    launched at most once per boot.
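
The exact-token parsing can be illustrated with a short sketch. `parseVmiDescheduleEvents` is a hypothetical helper name; the struct and constant mirror the names listed in this description, but their definitions here are assumptions written for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// VmiDescheduleConfig mirrors the struct named in the description.
type VmiDescheduleConfig struct {
	OnBoot bool
}

// vmiDescheduleEventBoot mirrors the VmiDescheduleEventBoot constant.
const vmiDescheduleEventBoot = "boot"

// parseVmiDescheduleEvents splits the CSV value on commas, trims each
// token, and sets OnBoot only on an exact "boot" token.
func parseVmiDescheduleEvents(csv string) VmiDescheduleConfig {
	var cfg VmiDescheduleConfig
	for _, tok := range strings.Split(csv, ",") {
		if strings.TrimSpace(tok) == vmiDescheduleEventBoot {
			cfg.OnBoot = true
		}
	}
	return cfg
}

func main() {
	for _, v := range []string{"", "boot", " boot ,other", "reboot"} {
		// A substring check would wrongly accept "reboot".
		fmt.Printf("%q: token match=%v, strings.Contains=%v\n",
			v, parseVmiDescheduleEvents(v).OnBoot, strings.Contains(v, "boot"))
	}
}
```

The last input shows why token matching matters: `strings.Contains("reboot", "boot")` is true, while the token comparison rejects it.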

hypervisor/kubevirt.go:

  • CreateReplicaVMIConfig stamps DeschedulerEvictAnnotation on the
    VMIRS pod template so new app VMs are evictable without a
    separate pass.

domainmgr/domainmgr.go:

  • Calls EnsureVMsDeschedulerAnnotated at kubevirt-mode startup to
    retroactively annotate VMIRSes and VMIs that pre-date this change.
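
As a rough sketch of the idempotent annotation step, the example below builds the JSON patch bodies that would add the evict annotation to a VMI and to a VMIRS template, and skips objects that already carry it. `needsAnnotation` and `annotationPatch` are hypothetical helpers, and the field paths are assumptions based on the usual object layout; the actual client calls are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evictAnnotation is the annotation key named in the description.
const evictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// needsAnnotation reports whether the annotation is missing or not "true".
// Reading from a nil map is safe in Go and yields the empty string.
func needsAnnotation(annotations map[string]string) bool {
	return annotations[evictAnnotation] != "true"
}

// annotationPatch nests the annotation under the given field path and
// returns the JSON body, for example {"metadata":{"annotations":{...}}}.
func annotationPatch(path ...string) ([]byte, error) {
	var body interface{} = map[string]string{evictAnnotation: "true"}
	for i := len(path) - 1; i >= 0; i-- {
		body = map[string]interface{}{path[i]: body}
	}
	return json.Marshal(body)
}

func main() {
	// Assumed paths: object metadata for a VMI, template metadata for a VMIRS.
	vmi, _ := annotationPatch("metadata", "annotations")
	vmirs, _ := annotationPatch("spec", "template", "metadata", "annotations")
	fmt.Println(string(vmi))
	fmt.Println(string(vmirs))
	fmt.Println(needsAnnotation(map[string]string{evictAnnotation: "true"}))
}
```

Because the patch names only the one key it sets, it does not need the object's current resourceVersion, which is what avoids the List-then-Update conflict.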

types/global.go + types/zedkubetypes.go:

  • KubernetesVmiDescheduleEvents config key
    ("kubernetes.vmi.deschedule.events", default "") and
    VmiDescheduleEventBoot = "boot" constant.
  • VmiDescheduleConfig{OnBoot bool} struct;
    KubeConfig.VmiDescheduleEvents field.
  • Documented in docs/CONFIG-PROPERTIES.md.

docs/failover.md:

  • Documents the Go-based deschedulerOnBootWatcher goroutine, the
    opt-in config key, IsDeschedulerReady prerequisites, and
    TriggerDescheduler's Create-first idempotent job management
    replacing the removed shell function Update_RunDeschedulerOnBoot.

pkg/kube/:

  • Removes descheduler-job.yaml and Update_RunDeschedulerOnBoot from
    cluster-update.sh and its two call sites in cluster-init.sh.
  • Removes descheduler-job.yaml COPY from Dockerfile.
  • shellcheck source annotations and integer comparison quoting cleanup
    in cluster-init.sh; indentation fix in descheduler-utils.sh.

PR dependencies

Not a blocking dependency, but #5846 should be included so that the tests can run locally.

How to test and validate this PR

  • deploy three HV=k eve nodes
  • configure EdgeNodeClusterConfig to create a three node cluster
  • deploy one app instance to the cluster without strict node affinity configured
  • set the config property to enable failback 'kubernetes.vmi.deschedule.events:boot'
  • initiate a power failure to the node hosting the app instance
  • wait for the app instance to come ready/running on another node
  • restore power to the node
  • wait for the node to boot; once all pods are ready, the descheduler should evict the app so that it can reschedule to the original node

Changelog notes

Adds a configuration property to enable per-edge-node app failback triggers.

PR Backports

  • 16.0-stable: If requested
  • 14.5-stable: No, as the feature is not available there.
  • 13.4-stable: No, as the feature is not available there.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from efc21bf to 40ee3b1 Compare April 30, 2026 18:37
@andrewd-zededa andrewd-zededa changed the title from "feat(zedkube): on-boot VMI descheduler with typed deschedule-event co…" to "feat(zedkube): lay foundation for event-driven VMI descheduler failback" Apr 30, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from 1339d30 to d5b2810 Compare April 30, 2026 21:53
@codecov

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.10%. Comparing base (2281599) to head (196c662).
⚠️ Report is 647 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/pillar/cmd/domainmgr/domainmgr.go | 0.00% | 2 Missing ⚠️ |
| pkg/pillar/kubeapi/nokube.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5885      +/-   ##
==========================================
- Coverage   19.52%   17.10%   -2.43%     
==========================================
  Files          19      474     +455     
  Lines        3021    85697   +82676     
==========================================
+ Hits          590    14655   +14065     
- Misses       2310    69522   +67212     
- Partials      121     1520    +1399     

Comment thread docs/CONFIG-PROPERTIES.md
Comment thread pkg/pillar/hypervisor/kubevirt.go Outdated
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from d5b2810 to 0be7e40 Compare May 1, 2026 22:59
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from 0f3fe0b to e154710 Compare May 4, 2026 20:29
@andrewd-zededa
Contributor Author

rebased on latest master, testing locally for now

@andrewd-zededa andrewd-zededa changed the title from "feat(zedkube): lay foundation for event-driven VMI descheduler failback" to "feat(zedkube): configurable event-driven VMI descheduler failback" May 4, 2026
@andrewd-zededa andrewd-zededa changed the title from "feat(zedkube): configurable event-driven VMI descheduler failback" to "feat(zedkube): configurable VMI descheduler failback" May 4, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from e154710 to c537c7b Compare May 4, 2026 23:11
@andrewd-zededa andrewd-zededa changed the title from "feat(zedkube): configurable VMI descheduler failback" to "feat(zedkube): configurable VMI descheduler for failback" May 4, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from c537c7b to a275303 Compare May 4, 2026 23:36
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from a275303 to 196c662 Compare May 5, 2026 18:02
@andrewd-zededa andrewd-zededa marked this pull request as ready for review May 5, 2026 18:04
@andrewd-zededa
Contributor Author

/rerun red


@claude-rsp claude-rsp left a comment

Automated review by Claude (Opus 4.7) — focused on bugs, races, and missing tests. Severity-tagged inline comments below.

Summary

The shell→Go migration is a sensible direction and the trim-and-recreate Job logic in TriggerDescheduler correctly avoids the multi-node boot race. The main concerns are around (1) event-list parsing correctness, (2) a startup ordering race that can permanently disable the feature, (3) clobbering persisted publication state, and (4) the absence of any unit tests for ~430 new lines.

Findings (linked inline)

  • critical: strings.Contains will match substrings like "reboot"; split on "," instead.
  • critical: watcher reads pubKubeConfig before GlobalConfig is guaranteed processed; a single bad select ordering disables the feature for the boot.
  • suggestion: initial Publish overwrites a Persistent: true topic with defaults, briefly clobbering K3sVersion overrides.
  • suggestion: EnsureVMsDeschedulerAnnotated uses Get/Update with no Conflict retry; use Patch.
  • suggestion: verify the descheduler annotation actually reaches the virt-launcher Pod (descheduler evicts Pods, not VMIs).
  • suggestion: no unit tests for new kubeapi/zedkube code; fake.NewSimpleClientset would cover most of it.
  • suggestion: deschedulerOnBootWatcher has no cancellation path.
  • nit: shellcheck source= annotations downgraded to /dev/null, disabling cross-file lint.

Comment thread pkg/pillar/cmd/zedkube/zedkube.go
Comment thread pkg/pillar/cmd/zedkube/descheduler.go Outdated
Comment thread pkg/pillar/cmd/zedkube/zedkube.go Outdated
Comment thread pkg/pillar/kubeapi/descheduler.go Outdated
Comment thread pkg/pillar/hypervisor/kubevirt.go
Comment thread pkg/pillar/kubeapi/descheduler.go
Comment thread pkg/pillar/cmd/zedkube/descheduler.go
Comment thread pkg/kube/cluster-init.sh Outdated
@andrewd-zededa andrewd-zededa changed the title feat(zedkube): configurable VMI descheduler for failback feat(zedkube): configurable VMI descheduler for failback May 7, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from ace1fd7 to d9d85e7 Compare May 7, 2026 20:03
@andrewd-zededa
Contributor Author

resolved conflict, rebased on latest master

andrewd-zededa and others added 2 commits May 7, 2026 15:25
Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".

kubeapi/descheduler.go (new):
  - IsDeschedulerReady: returns (false, nil) until the local node is
    Ready and schedulable, all Longhorn daemonsets are ready, and
    (when present) the kubevirt CR reports Available.
  - TriggerDescheduler: Create-first idempotent job management — skips
    if an active Job already exists (handles multi-node boot race),
    otherwise deletes any stale completed/failed Job and recreates.
    Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
    descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
    policy ConfigMap before each run.
  - EnsureVMsDeschedulerAnnotated: idempotently stamps
    descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
    and live VMI in eve-kube-app namespace using StrategicMergePatch
    to avoid List→Update conflicts. No-op in base-k3s mode.
  - Inner functions accept injectable kubernetes.Interface and
    kubecli.KubevirtClient for unit testing with fake clients.
  - Stub added to nokube.go for non-kube builds.

kubeapi/descheduler_test.go (new):
  - 9 unit tests covering IsDeschedulerReady (unschedulable, not-ready),
    ensureDeschedulerSetup (create-on-missing, update-on-existing),
    TriggerDescheduler (active job skips delete, recreates on
    completion), and EnsureVMsDeschedulerAnnotated (already annotated,
    patches missing VMIRS annotation, patches missing VMI annotation).

kubeapi/kubeapi.go:
  - waitForLonghornReady accepts kubernetes.Interface to support
    fake client injection in tests.

zedkube/descheduler.go (new):
  - deschedulerOnBootWatcher goroutine polls IsDeschedulerReady every
    15s then calls TriggerDescheduler once and exits.
  - If descheduler is still unable to run after 30 minutes, then the
    watcher exits without running.

zedkube/zedkube.go:
  - deschedulerOnBootWatcher is launched after WaitForKubernetes
    returns. GlobalConfig (which controls OnBoot) is processed in the
    early ENCC wait loop before WaitForKubernetes, so pubKubeConfig
    already reflects operator intent and nodeName is guaranteed set at
    the launch site. The narrow window where config arrives exactly as
    k3s becomes ready (post-WaitForKubernetes) is documented but not
    handled.
  - handleVmiDescheduleEventsOverride parses the CSV config value using
    exact token matching (strings.Split + TrimSpace) and re-publishes
    KubeConfig on change.
  - deschedulerOnBootStarted bool ensures the watcher goroutine is
    launched at most once per boot.

hypervisor/kubevirt.go:
  - CreateReplicaVMIConfig stamps DeschedulerEvictAnnotation on the
    VMIRS pod template so new app VMs are evictable without a
    separate pass.

domainmgr/domainmgr.go:
  - Calls EnsureVMsDeschedulerAnnotated at kubevirt-mode startup to
    retroactively annotate VMIRSes and VMIs that pre-date this change.

types/global.go + types/zedkubetypes.go:
  - KubernetesVmiDescheduleEvents config key
    ("kubernetes.vmi.deschedule.events", default "") and
    VmiDescheduleEventBoot = "boot" constant.
  - VmiDescheduleConfig{OnBoot bool} struct;
    KubeConfig.VmiDescheduleEvents field.
  - Documented in docs/CONFIG-PROPERTIES.md.

docs/failover.md:
  - Documents the Go-based deschedulerOnBootWatcher goroutine, the
    opt-in config key, IsDeschedulerReady prerequisites, and
    TriggerDescheduler's Create-first idempotent job management
    replacing the removed shell function Update_RunDeschedulerOnBoot.

pkg/kube/:
  - Removes descheduler-job.yaml and Update_RunDeschedulerOnBoot from
    cluster-update.sh and its two call sites in cluster-init.sh.
  - Removes descheduler-job.yaml COPY from Dockerfile.
  - integer comparison quoting cleanup in cluster-init.sh
  - indentation fix in descheduler-utils.sh.

pkg/kube/Dockerfile, pkg/pillar/cmd/domainmgr/domainmgr.go,
pkg/pillar/cmd/zedkube/zedkube.go, pkg/pillar/docs/failover.md:
  - Various yetus revive and codespell issues.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrew Durbin <andrewd@zededa.com>
For new eve-k go tests.

Signed-off-by: Andrew Durbin <andrewd@zededa.com>
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from d9d85e7 to 7e7bb62 Compare May 7, 2026 21:27
@andrewd-zededa
Contributor Author

Addressed all codespell and revive issues that yetus flagged; awaiting the next run.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants