Backend Nodes down for Kubernetes service type=LoadBalancer
Hi there,
I'm having a problem with our LKE cluster. I'm using nginx-ingress to expose services to the internet. Traffic reaches the ingress controller through a NodeBalancer that is created automatically by a Kubernetes service with type=LoadBalancer.
This worked like a charm for the last few months. But for the past few days, 3 of the 5 backend nodes of the NodeBalancer have been shown as DOWN in the Cloud Manager. I tried to debug the issue and found that on some of the LKE nodes the relevant port does indeed seem to be closed. But I can't figure out the underlying problem. Even recycling the nodes doesn't change anything.
I created another service of type LoadBalancer and the same problem didn't occur there. But I don't want to recreate the nginx service, because that would require updating all the DNS records pointing to the NodeBalancer.
Do you have any idea how to find out what causes this?
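For anyone wanting to reproduce the port check described above, here is a minimal sketch in Python. It only tests whether a TCP connection can be opened, which is an assumption about how the check was done; the node IPs and the port number you pass in are your own values, and `check_nodes` is a hypothetical helper name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_nodes(node_ips, port):
    """Map each node IP to whether the given port accepts connections."""
    return {ip: port_is_open(ip, port) for ip in node_ips}
```

Calling `check_nodes` with the node IPs and the service's node port returns a dictionary that can be compared with the UP/DOWN list in the Cloud Manager.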
2 Replies
This could be related to your LKE cluster's configuration, especially since it didn't resolve after recycling the nodes. NodeBalancers check the return traffic from health checks to determine if the backend node is UP or DOWN. If there is only one pod running the load balancing service, the other nodes that don't have replicas of that pod will show as DOWN. The ports being closed on the nodes showing as DOWN also points towards this being the cause, since closed ports indicate no service is listening for connections. If this is the case, increasing the replicas of that pod so that it's running on all of the backend nodes should result in the backend nodes showing as UP. Another option would be to add a defaultBackendService, which should be an option on the Helm chart.
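The reasoning above can be sketched as a small model. This is an illustration only, not how the NodeBalancer is implemented. It assumes the service uses `externalTrafficPolicy: Local`, which the thread does not confirm; with that setting Kubernetes only serves traffic on nodes that host a pod for the service, whereas the default `Cluster` policy lets any node forward traffic to a pod elsewhere. The node names and the `backend_status` function are hypothetical.

```python
from typing import Dict


def backend_status(pods_per_node: Dict[str, int],
                   traffic_policy: str = "Local") -> Dict[str, str]:
    """Model which backend nodes a health check would report as UP."""
    any_pod = any(count > 0 for count in pods_per_node.values())
    status = {}
    for node, count in pods_per_node.items():
        if traffic_policy == "Cluster":
            # Any node can forward to a pod somewhere in the cluster.
            healthy = any_pod
        else:
            # "Local": only nodes that host a pod answer the check.
            healthy = count > 0
        status[node] = "UP" if healthy else "DOWN"
    return status


# Hypothetical layout matching the thread: 5 nodes, pods on only 2 of them.
layout = {"node-a": 3, "node-b": 1, "node-c": 0, "node-d": 0, "node-e": 0}
print(backend_status(layout, "Local"))
print(backend_status(layout, "Cluster"))
```

Under these assumptions the `Local` case leaves three of the five nodes DOWN, while the `Cluster` case reports all of them UP.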
@jyoo Thanks for your answer. I tried scaling the nginx-ingress deployment. Because that didn't really have an effect, I had a look at which nodes the pods run on. In fact, they only run on the nodes that are also shown as UP in the NodeBalancer. Maybe the other nodes don't have enough resources to schedule the pods. Now one node is running 3 instances of nginx-ingress-controller. If I understand you correctly, it is the intended behavior that the service port is only available on the nodes where one of the pods is actually running? I was expecting that the other nodes would also open the port and just redirect the traffic to one of the other nodes.
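To see at a glance which nodes the controller pods landed on, the JSON output of `kubectl get pods -n <namespace> -o json` can be summarized with a few lines of Python. This is a rough sketch: the `pods_per_node` helper and the sample pod and node names are made up for illustration, and only the `items[].spec.nodeName` field of the pod list is used.

```python
import json
from collections import Counter


def pods_per_node(pod_list_json: str) -> Counter:
    """Count pods per node from the JSON printed by `kubectl get pods -o json`."""
    data = json.loads(pod_list_json)
    return Counter(
        # Pods that are still Pending have no nodeName yet.
        item.get("spec", {}).get("nodeName", "<unscheduled>")
        for item in data.get("items", [])
    )


# Hypothetical, heavily trimmed sample of the kubectl output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "ingress-1"}, "spec": {"nodeName": "node-a"}},
        {"metadata": {"name": "ingress-2"}, "spec": {"nodeName": "node-a"}},
        {"metadata": {"name": "ingress-3"}, "spec": {"nodeName": "node-b"}},
        {"metadata": {"name": "ingress-4"}, "spec": {}},
    ]
})
print(pods_per_node(sample))
```

Nodes that never appear in the result host no controller pod, and an `<unscheduled>` entry would hint at pods stuck in Pending, for example because of insufficient resources.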