Kubernetes nodes not working, runtime network not ready

I messed up and deleted my nodes using kubectl delete node, thinking that was how to restart nodes, but now they aren't working. I reduced the number of nodes in the pool and increased it again to create new nodes, but the new nodes all show NotReady when I look at their status. Has anyone seen this and been able to fix it in Linode's managed Kubernetes environment?

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 19 Aug 2020 17:45:16 -0400   Wed, 19 Aug 2020 17:24:42 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 19 Aug 2020 17:45:16 -0400   Wed, 19 Aug 2020 17:24:42 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 19 Aug 2020 17:45:16 -0400   Wed, 19 Aug 2020 17:24:42 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 19 Aug 2020 17:45:16 -0400   Wed, 19 Aug 2020 17:24:42 -0400   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

1 Reply

Hey there! Kubernetes can be pretty dense on the best of days, and punishingly so on the worst, so I'd be more than happy to provide a bit of context around what's happening with your LKE cluster.

The Issue

Though some Kubernetes resources are removed gracefully when deleted through kubectl (such as pods and services), this is not the case for nodes. When kubectl delete node <nodeName> is run, that node is removed ungracefully, which is to say that it is deleted regardless of the processes it's handling. This, as you've noticed, can leave your cluster in a pretty wonky state.
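For contrast with the ungraceful delete above, here is a minimal sketch of taking a node out of service gracefully. It assumes kubectl is already pointed at your cluster. drain_node is a hypothetical helper name, not something LKE provides.

```shell
# drain_node is a hypothetical helper name used only for this sketch.
drain_node() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: drain_node <nodeName>" >&2
    return 1
  fi
  # drain marks the node unschedulable (cordons it) and then evicts its pods.
  # --ignore-daemonsets skips DaemonSet-managed pods, which would otherwise
  # make the drain refuse to proceed.
  kubectl drain "$node" --ignore-daemonsets
}
```

After running drain_node <nodeName>, the workloads have been rescheduled onto other nodes and the node object still exists in the cluster. If the same node comes back into service, kubectl uncordon <nodeName> makes it schedulable again.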

In this particular instance, it's likely that your cluster's webhook controller server (responsible for interacting with your cluster's kube-apiserver and ensuring that the proper services are able to be deployed) was deleted along with its parent node, which deadlocked your cluster. The last line in the output you shared ("cni config uninitialized") refers to your cluster's inability to deploy the Calico nodes that manage your cluster's internal networking.
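To confirm that picture on your own cluster, a rough sketch like the following can help. The function names are hypothetical. The k8s-app=calico-node label and the kube-system namespace are assumptions based on the stock Calico manifests, so adjust them if your cluster labels its Calico pods differently.

```shell
# Hypothetical helper: print the names of nodes whose STATUS column
# does not start with "Ready" (so "Ready,SchedulingDisabled" is not listed).
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ {print $1}'
}

# Hypothetical helper: show the Calico node pods and where they are scheduled.
# ASSUMPTION: label k8s-app=calico-node in the kube-system namespace.
calico_pods() {
  kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
}
```

If not_ready_nodes lists your new nodes and calico_pods shows no pod on them (or shows pods stuck before Running), that matches the "network plugin is not ready" message in the node conditions.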

The Solution

Part of your cluster's deadlock condition lies in the webhook configuration mentioned above, so you should be able to resolve the deadlock by deleting that resource. First save your webhooks with kubectl get mutatingwebhookconfigurations -o yaml > mutatingwebhooks.txt (in case they're needed later), and then delete them with kubectl delete mutatingwebhookconfigurations <NAME>. After doing so, you should be able to add new nodes to your cluster without issue. If you happen to notice otherwise, though, feel free to follow up here with the output from kubectl get events to get some assistance from the rest of the Community!
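The two steps above can be sketched as a pair of small shell functions. This is only an illustration: backup_webhooks and delete_webhook are hypothetical helper names, and <NAME> still has to come from your own cluster's list of webhook configurations.

```shell
# Hypothetical helper: save every MutatingWebhookConfiguration to a file
# (defaults to mutatingwebhooks.txt) before anything is deleted.
backup_webhooks() {
  local backup_file="${1:-mutatingwebhooks.txt}"
  kubectl get mutatingwebhookconfigurations -o yaml > "$backup_file"
}

# Hypothetical helper: delete one webhook configuration by name.
# Refuses to run when no name is given.
delete_webhook() {
  local name="$1"
  if [ -z "$name" ]; then
    echo "usage: delete_webhook <NAME>" >&2
    return 1
  fi
  kubectl delete mutatingwebhookconfigurations "$name"
}
```

A typical run would be backup_webhooks, then kubectl get mutatingwebhookconfigurations to see the names, then delete_webhook <NAME> for the relevant entry. Afterwards, kubectl get nodes --watch lets you see whether the new nodes move to Ready.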
