join new etcd members as learners and auto-promote #7629
 type JoinCondition struct {
-	// +kubebuilder:validation:Enum=Joined
+	// +kubebuilder:validation:Enum=Joined;Promoted
Hmm, perhaps we should remove the validation altogether so that adding a new condition doesn't require a CRD change. This is rather limiting for conditions in general: They are free-form and are supposed to offer flexibility, but then we're restricting them to well-known values, which defeats the whole point of not having bespoke status fields for each condition.
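To illustrate the suggestion, here is a minimal sketch of how conditions could look without the enum marker. The type, constant, and field names are assumptions made for this example and may not match the actual k0s API package. The well-known values live only as Go constants, so a new condition type needs no CRD schema change.

```go
package main

// ConditionType is a free-form condition name. Hypothetical: there is no
// +kubebuilder:validation:Enum marker, so new values need no CRD change.
type ConditionType string

// Well-known condition types, kept as constants instead of schema enums.
const (
	ConditionTypeJoined   ConditionType = "Joined"
	ConditionTypePromoted ConditionType = "Promoted"
)

// JoinCondition is an assumed, simplified shape of the condition struct.
type JoinCondition struct {
	Type   ConditionType
	Status string
}

// findCondition returns a pointer to the first condition of the given
// type, or nil if none is present. Unknown types are handled like any other.
func findCondition(conds []JoinCondition, t ConditionType) *JoinCondition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}
```

The trade-off is that typos in condition names are no longer rejected by the API server, so consumers should rely on the constants.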
if got == nil {
	t.Fatalf("expected Promoted condition to be set")
}
We usually use testify assertions across the unit tests.
// Force a periodic resync even when no EtcdMember CR changes. Learners
// that join before their own controller is ready never have a matching
// CR, so the watch alone would never fire for them.
periodic := time.NewTicker(30 * time.Second)
We could merge this with the retry channel, and simply resync every 10 seconds (maybe adding wait.Jitter(...)), not only when a retry is necessary.
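A rough sketch of what a single jittered resync loop might look like. It uses only the standard library: `jitter` is a stand-in that mirrors the documented behavior of `wait.Jitter` from k8s.io/apimachinery, and `resyncLoop`, `resyncInterval`, and the channel names are hypothetical, not taken from the PR.

```go
package main

import (
	"math/rand"
	"time"
)

// resyncInterval is the base period suggested in the review comment.
const resyncInterval = 10 * time.Second

// jitter returns a duration between d and d + maxFactor*d. It is a stdlib
// stand-in for wait.Jitter; a non-positive maxFactor defaults to 1.0.
func jitter(d time.Duration, maxFactor float64) time.Duration {
	if maxFactor <= 0 {
		maxFactor = 1.0
	}
	return d + time.Duration(rand.Float64()*maxFactor*float64(d))
}

// resyncLoop calls reconcile whenever a watch event arrives or the jittered
// timer fires, and returns once stop is closed. A single timer replaces the
// separate retry channel and periodic ticker.
func resyncLoop(stop, events <-chan struct{}, interval time.Duration, reconcile func()) {
	timer := time.NewTimer(jitter(interval, 0.1))
	defer timer.Stop()
	for {
		select {
		case <-stop:
			return
		case <-events:
			reconcile()
		case <-timer.C:
			reconcile()
			// Re-arm with fresh jitter so controllers do not resync in lockstep.
			timer.Reset(jitter(interval, 0.1))
		}
	}
}
```

With this shape, learners that never get a matching CR are still picked up on the next timer tick, and failed reconciles are retried on the same schedule.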
}

// LearnerInfo describes a learner member as reported by etcd.
type LearnerInfo struct {
I was just about to add something similar for another etcd member issue I'm working on. What about merging the old member list with this? I don't think we need to have two separate methods. /xref 64b77ea.
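One way the merge could look, as a sketch only: a single member record carrying a learner flag, with learners derived by filtering. `MemberInfo` and `learners` are hypothetical names for this example. The `IsLearner` field is modeled on the flag that etcd reports for each member in its member list response.

```go
package main

// MemberInfo is a hypothetical merged record covering both voting members
// and learners, so one list method can serve both callers.
type MemberInfo struct {
	ID        uint64
	Name      string
	PeerURL   string
	IsLearner bool
}

// learners returns only the learner entries of a member list, preserving
// their order. Callers that need voting members can invert the check.
func learners(members []MemberInfo) []MemberInfo {
	var out []MemberInfo
	for _, m := range members {
		if m.IsLearner {
			out = append(out, m)
		}
	}
	return out
}
```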
Self-promotion is impossible in etcd, so we have to do it in etcd_member_reconcile.
Adds new etcd members through the k0s join API as raft learners instead of voting members. The leader-elected EtcdMemberReconciler promotes them once etcd reports them caught up. This prevents an unreachable joiner (e.g. one advertising a wrong-interface peer URL) from breaking quorum on the existing cluster, notably the 1-node case where the surviving controller went from quorum=1 to quorum=2 and stalled waiting for an unreachable peer.

Signed-off-by: amakhov <amakhov@mirantis.com>
Description
Adds new etcd members through the k0s join API as raft learners instead of voting members. The leader-elected EtcdMemberReconciler promotes them once etcd reports them caught up. This prevents an unreachable joiner (e.g. one advertising a wrong-interface peer URL) from breaking quorum on the existing cluster — notably the 1-node case where the surviving controller went from quorum=1 to quorum=2 and stalled waiting for an unreachable peer.
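The quorum arithmetic behind this can be sketched as follows. Raft quorum is a majority of the voting members, and learners do not vote, so adding a learner leaves quorum unchanged. The `member` type and `quorum` function are illustrative helpers, not part of k0s or etcd.

```go
package main

// member is a simplified cluster member used only for this illustration.
type member struct {
	name    string
	learner bool
}

// quorum returns the number of votes needed for a majority. Only voting
// members count; learners are excluded.
func quorum(members []member) int {
	voters := 0
	for _, m := range members {
		if !m.learner {
			voters++
		}
	}
	return voters/2 + 1
}
```

For a single controller, `quorum` is 1. Adding an unreachable joiner as a voting member raises it to 2, which the lone reachable node cannot satisfy. Adding the same joiner as a learner keeps it at 1, so the existing cluster stays writable.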
Fixes #7628