Reference: #2661
New nodes brought up into the cluster experience a delay in route programming, even though they are already marked as 'Ready' by Kubernetes (the kubelet on the node sets this). New pods that are scheduled on these nodes before the routes are programmed will not have any connectivity to the rest of the cluster until the routes are programmed several minutes later.
Symptoms of this are:
- DNS resolution fails - the Pod cannot connect to kube-dns through the Service IP
- Connections to Service or Pod IPs do not work as intended
This is the result of Calico/BIRD trying to peer with a node that no longer exists.
The BIRD documentation (http://bird.network.cz/?get_doc&f=bird-6.html#ss6.3) states:

> **graceful restart time number**
> The restart time is announced in the BGP graceful restart capability and specifies how long the neighbor would wait for the BGP session to re-establish after a restart before deleting stale routes. Default: 120 seconds.
This means it will take 120 seconds (since the default is not overridden) before the routes are programmed, due to the graceful-restart timer.
This is caused by the Calico etcd nodestore containing entries for nodes that no longer exist. Due to the ephemeral nature of AWS EC2 instances, new nodes are brought up with different hostnames, and nodes that are taken offline remain in the Calico nodestore. This is unlike most datacentre deployments, where the hostnames in a cluster are mostly static.
To solve this, we must keep the nodes in sync between Kubernetes and Calico. To do this, we can write a node controller that watches for nodes being removed from Kubernetes and reflects that in Calico (see the sketch below). We also need a periodic sync to make sure that missed events are accounted for.
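A minimal sketch of what such a controller could look like, assuming client-go for the Kubernetes side. `listCalicoNodes` and `deleteCalicoNode` are hypothetical placeholders for the corresponding libcalico-go calls, and the 5-minute sync interval is an arbitrary choice:

```go
package main

import (
	"context"
	"log"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// Hypothetical placeholders for the Calico datastore operations; a real
// controller would implement these with libcalico-go.
func listCalicoNodes() ([]string, error) { return nil, nil }
func deleteCalicoNode(name string) error { return nil }

// syncOnce removes any node from the Calico datastore that has no matching
// Kubernetes Node. This is the periodic sync that catches missed events.
func syncOnce(cs kubernetes.Interface) {
	k8sNodes, err := cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Printf("listing kubernetes nodes: %v", err)
		return
	}
	known := make(map[string]bool)
	for _, n := range k8sNodes.Items {
		known[n.Name] = true
	}
	calicoNodes, err := listCalicoNodes()
	if err != nil {
		log.Printf("listing calico nodes: %v", err)
		return
	}
	for _, name := range calicoNodes {
		if !known[name] {
			log.Printf("removing stale calico node %q", name)
			if err := deleteCalicoNode(name); err != nil {
				log.Printf("deleting %q: %v", name, err)
			}
		}
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Watch Kubernetes Node deletions and mirror them into Calico right away.
	factory := informers.NewSharedInformerFactory(cs, 0)
	factory.Core().V1().Nodes().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// Deletes can arrive wrapped in a tombstone after a missed watch event.
			node, ok := obj.(*v1.Node)
			if !ok {
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				if node, ok = tombstone.Obj.(*v1.Node); !ok {
					return
				}
			}
			if err := deleteCalicoNode(node.Name); err != nil {
				log.Printf("deleting %q: %v", node.Name, err)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Periodic full sync to account for missed delete events.
	for range time.Tick(5 * time.Minute) {
		syncOnce(cs)
	}
}
```

The watch handles the common case quickly, while the periodic comparison of the two node lists covers any delete events dropped while the controller was down.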
The controller needs to be deployed with kops when Calico is set as the network provider.
There is a proof-of-concept controller, though it is not production ready:
https://github.com/caseydavenport/calico-node-controller
Caution: At the time of writing this issue, running this controller has led to all nodes being removed from Calico's datastore. It could be something that I've done rather than a problem with the node controller itself. However, I'd advise running it with extreme caution in production.
EDIT: Though I have not tested this, I expect the move towards using the Kubernetes API as the datastore for Calico to solve this issue, since there will no longer be a need to sync a list of nodes.