Kubernetes nodes not working, runtime network not ready
I messed up and deleted my nodes with kubectl delete node, thinking that was how to restart them, but now they aren't working. I reduced the number of nodes in the pool and increased it again to create new nodes, but the new nodes are all NotReady when I check their status. Has anyone seen this and been able to fix it in Linode's managed Kubernetes environment?
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 19 Aug 2020 17:45:16 -0400 Wed, 19 Aug 2020 17:24:42 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 19 Aug 2020 17:45:16 -0400 Wed, 19 Aug 2020 17:24:42 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 19 Aug 2020 17:45:16 -0400 Wed, 19 Aug 2020 17:24:42 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 19 Aug 2020 17:45:16 -0400 Wed, 19 Aug 2020 17:24:42 -0400 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Hey there! Kubernetes can be pretty dense on the best of days, and punishingly so on the worst, so I'd be more than happy to provide a bit of context around what's happening with your LKE cluster.
Though some Kubernetes resources are gracefully removed when deleted through kubectl (such as pods and services), this is not the case for nodes -- when kubectl delete node <nodeName> is run, that node is removed ungracefully, meaning it's deleted regardless of the workloads it's running. This, as you've noticed, can leave your cluster in a pretty wonky state.
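For future reference, the safer way to take a node out of service is to drain it first so its workloads are rescheduled elsewhere before the node goes away. A minimal sketch (the node name here is a placeholder for one of your own):

```shell
# Mark the node unschedulable so no new pods land on it
kubectl cordon <nodeName>

# Gracefully evict the node's pods; DaemonSet pods can't be evicted,
# so they're skipped explicitly (pods using emptyDir volumes may also
# need --delete-emptydir-data, which discards that local data)
kubectl drain <nodeName> --ignore-daemonsets

# The node can now be safely recycled (for example, from the Linode
# Cloud Manager); once it's back, allow scheduling on it again
kubectl uncordon <nodeName>
```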
In this particular instance, it's likely that your cluster's webhook server (responsible for intercepting requests to your cluster's kube-apiserver and ensuring that the proper services can be deployed) was deleted along with its parent node, which deadlocked your cluster. The last line in the output you shared refers to your cluster's inability to start its Calico pods, which manage your cluster's internal networking.
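If you'd like to confirm that the CNI is what's stuck, you can inspect the Calico pods directly. This assumes Calico's default namespace and labels, which is what LKE uses:

```shell
# Calico runs one calico-node pod per node in kube-system;
# on a healthy cluster each should show Running and Ready
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide

# The Events section of a stuck pod usually spells out why
# the network plugin can't initialize
kubectl -n kube-system describe pods -l k8s-app=calico-node
```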
Part of your cluster's deadlock condition lies in that aforementioned webhook server -- this being the case, you should be able to resolve the deadlock by deleting its webhook configurations. You can do so by first saving your webhooks with kubectl get mutatingwebhookconfigurations -o yaml > mutatingwebhooks.txt (in case they're needed later), and then deleting them with kubectl delete mutatingwebhookconfigurations <NAME>. After doing so, you should be able to add new nodes to your cluster without issue. If you happen to notice otherwise, though, feel free to follow up here with the output from kubectl get events to get some assistance from the rest of the Community!
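Put together, the recovery steps above look something like this (the configuration name is a placeholder -- substitute whatever the get command lists for your cluster):

```shell
# Back up the current webhook configurations in case they're needed later
kubectl get mutatingwebhookconfigurations -o yaml > mutatingwebhooks.txt

# List the configurations, then delete the stuck one by name
kubectl get mutatingwebhookconfigurations
kubectl delete mutatingwebhookconfigurations <NAME>

# Watch the new nodes come up; they should transition to Ready
# once the Calico pods are able to start
kubectl get nodes -w
```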