feat(zedkube): configurable VMI descheduler for failback#5885
feat(zedkube): configurable VMI descheduler for failback#5885andrewd-zededa wants to merge 2 commits intolf-edge:masterfrom
Conversation
efc21bf to
40ee3b1
Compare
1339d30 to
d5b2810
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5885 +/- ##
==========================================
- Coverage 19.52% 17.10% -2.43%
==========================================
Files 19 474 +455
Lines 3021 85697 +82676
==========================================
+ Hits 590 14655 +14065
- Misses 2310 69522 +67212
- Partials 121 1520 +1399 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d5b2810 to
0be7e40
Compare
0f3fe0b to
e154710
Compare
|
rebased on latest master, testing locally for now |
e154710 to
c537c7b
Compare
c537c7b to
a275303
Compare
a275303 to
196c662
Compare
|
/rerun red |
claude-rsp
left a comment
There was a problem hiding this comment.
Automated review by Claude (Opus 4.7) — focused on bugs, races, and missing tests. Severity-tagged inline comments below.
Summary
The shell→Go migration is a sensible direction and the trim-and-recreate Job logic in TriggerDescheduler correctly avoids the multi-node boot race. The main concerns are around (1) event-list parsing correctness, (2) a startup ordering race that can permanently disable the feature, (3) clobbering persisted publication state, and (4) the absence of any unit tests for ~430 new lines.
Findings (linked inline)
- critical —
strings.Containswill match substrings likereboot; split on,instead. - critical — watcher reads
pubKubeConfigbefore GlobalConfig is guaranteed processed; a single bad select ordering disables the feature for the boot. - suggestion — initial Publish overwrites a
Persistent: truetopic with defaults, briefly clobbering K3sVersion overrides. - suggestion —
EnsureVMsDeschedulerAnnotateduses Get/Update with no Conflict retry; use Patch. - suggestion — verify the descheduler annotation actually reaches the virt-launcher Pod (descheduler evicts Pods, not VMIs).
- suggestion — no unit tests for new kubeapi/zedkube code;
fake.NewSimpleClientsetwould cover most of it. - suggestion —
deschedulerOnBootWatcherhas no cancellation path. - nit — shellcheck
source=annotations downgraded to/dev/null, disabling cross-file lint.
ace1fd7 to
d9d85e7
Compare
|
resolved conflict, rebased on latest master |
Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".
kubeapi/descheduler.go (new):
- IsDeschedulerReady: returns (false, nil) until the local node is
Ready and schedulable, all Longhorn daemonsets are ready, and
(when present) the kubevirt CR reports Available.
- TriggerDescheduler: Create-first idempotent job management — skips
if an active Job already exists (handles multi-node boot race),
otherwise deletes any stale completed/failed Job and recreates.
Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
policy ConfigMap before each run.
- EnsureVMsDeschedulerAnnotated: idempotently stamps
descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
and live VMI in eve-kube-app namespace using StrategicMergePatch
to avoid List→Update conflicts. No-op in base-k3s mode.
- Inner functions accept injectable kubernetes.Interface and
kubecli.KubevirtClient for unit testing with fake clients.
- Stub added to nokube.go for non-kube builds.
kubeapi/descheduler_test.go (new):
- 9 unit tests covering IsDeschedulerReady (unschedulable, not-ready),
ensureDeschedulerSetup (create-on-missing, update-on-existing),
TriggerDescheduler (active job skips delete, recreates on
completion), and EnsureVMsDeschedulerAnnotated (already annotated,
patches missing VMIRS annotation, patches missing VMI annotation).
kubeapi/kubeapi.go:
- waitForLonghornReady accepts kubernetes.Interface to support
fake client injection in tests.
zedkube/descheduler.go (new):
- deschedulerOnBootWatcher goroutine polls IsDeschedulerReady every
15s then calls TriggerDescheduler once and exits.
- If descheduler is still unable to run after 30 minutes, then the
watcher exits without running.
zedkube/zedkube.go:
- deschedulerOnBootWatcher is launched after WaitForKubernetes
returns. GlobalConfig (which controls OnBoot) is processed in the
early ENCC wait loop before WaitForKubernetes, so pubKubeConfig
already reflects operator intent and nodeName is guaranteed set at
the launch site. The narrow window where config arrives exactly as
k3s becomes ready (post-WaitForKubernetes) is documented but not
handled.
- handleVmiDescheduleEventsOverride parses the CSV config value using
exact token matching (strings.Split + TrimSpace) and re-publishes
KubeConfig on change.
- deschedulerOnBootStarted bool ensures the watcher goroutine is
launched at most once per boot.
hypervisor/kubevirt.go:
- CreateReplicaVMIConfig stamps DeschedulerEvictAnnotation on the
VMIRS pod template so new app VMs are evictable without a
separate pass.
domainmgr/domainmgr.go:
- Calls EnsureVMsDeschedulerAnnotated at kubevirt-mode startup to
retroactively annotate VMIRSes and VMIs that pre-date this change.
types/global.go + types/zedkubetypes.go:
- KubernetesVmiDescheduleEvents config key
("kubernetes.vmi.deschedule.events", default "") and
VmiDescheduleEventBoot = "boot" constant.
- VmiDescheduleConfig{OnBoot bool} struct;
KubeConfig.VmiDescheduleEvents field.
- Documented in docs/CONFIG-PROPERTIES.md.
docs/failover.md:
- Documents the Go-based deschedulerOnBootWatcher goroutine, the
opt-in config key, IsDeschedulerReady prerequisites, and
TriggerDescheduler's Create-first idempotent job management
replacing the removed shell function Update_RunDeschedulerOnBoot.
pkg/kube/:
- Removes descheduler-job.yaml and Update_RunDeschedulerOnBoot from
cluster-update.sh and its two call sites in cluster-init.sh.
- Removes descheduler-job.yaml COPY from Dockerfile.
- integer comparison quoting cleanup in cluster-init.sh
- indentation fix in descheduler-utils.sh.
pkg/kube/Dockerfile, pkg/pillar/cmd/domainmgr/domainmgr.go,
pkg/pillar/cmd/zedkube/zedkube.go, pkg/pillar/docs/failover.md:
- Various yetus revive and codespell issues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrew Durbin <andrewd@zededa.com>
For new eve-k go tests. Signed-off-by: Andrew Durbin <andrewd@zededa.com>
d9d85e7 to
7e7bb62
Compare
|
Addressed all code spell and revive issues which yetus showed, awaiting next run. |
Description
Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".
kubeapi/descheduler.go (new):
Ready and schedulable, all Longhorn daemonsets are ready, and
(when present) the kubevirt CR reports Available.
if an active Job already exists (handles multi-node boot race),
otherwise deletes any stale completed/failed Job and recreates.
Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
policy ConfigMap before each run.
descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
and live VMI in eve-kube-app namespace using StrategicMergePatch
to avoid List→Update conflicts. No-op in base-k3s mode.
kubecli.KubevirtClient for unit testing with fake clients.
kubeapi/descheduler_test.go (new):
ensureDeschedulerSetup (create-on-missing, update-on-existing),
TriggerDescheduler (active job skips delete, recreates on
completion), and EnsureVMsDeschedulerAnnotated (already annotated,
patches missing VMIRS annotation, patches missing VMI annotation).
kubeapi/kubeapi.go:
fake client injection in tests.
zedkube/descheduler.go (new):
15s then calls TriggerDescheduler once and exits.
zedkube/zedkube.go:
returns. GlobalConfig (which controls OnBoot) is processed in the
early ENCC wait loop before WaitForKubernetes, so pubKubeConfig
already reflects operator intent and nodeName is guaranteed set at
the launch site. The narrow window where config arrives exactly as
k3s becomes ready (post-WaitForKubernetes) is documented but not
handled.
exact token matching (strings.Split + TrimSpace) and re-publishes
KubeConfig on change.
launched at most once per boot.
hypervisor/kubevirt.go:
VMIRS pod template so new app VMs are evictable without a
separate pass.
domainmgr/domainmgr.go:
retroactively annotate VMIRSes and VMIs that pre-date this change.
types/global.go + types/zedkubetypes.go:
("kubernetes.vmi.deschedule.events", default "") and
VmiDescheduleEventBoot = "boot" constant.
KubeConfig.VmiDescheduleEvents field.
docs/failover.md:
opt-in config key, IsDeschedulerReady prerequisites, and
TriggerDescheduler's Create-first idempotent job management
replacing the removed shell function Update_RunDeschedulerOnBoot.
pkg/kube/:
cluster-update.sh and its two call sites in cluster-init.sh.
in cluster-init.sh; indentation fix in descheduler-utils.sh.
PR dependencies
Not a blocking dependency but this (#5846) should be included to let tests run locally.
How to test and validate this PR
Changelog notes
Configuration property to enable per edge-node app failback triggers.
PR Backports
Checklist
And the last but not least:
check them.
Please, check the boxes above after submitting the PR in interactive mode.