How to delete a specific Node from an LKE Node Pool?
DigitalOcean, GCP, etc. offer the ability to delete a specific node from a node pool: https://www.digitalocean.com/docs/apis-clis/doctl/reference/kubernetes/cluster/node-pool/delete-node/
doctl kubernetes cluster node-pool delete-node "cluster-id|cluster-name" "pool-id|pool-name" "node-id" [flags]
Is this possible on Linode?
I tried
linode-cli linodes delete NODE-ID
on an LKE cluster, which put the cluster's pool in a bad state: a terminated node that still appeared active according to LKE, taking up 1 of the pool's "count". I had to recycle the pool to remove the deadweight node.
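For anyone who hits the same state: I believe the pool recycle can also be done from the CLI instead of the Cloud Manager. The action names below are from memory, so verify them with linode-cli lke --help first.
# List the cluster's pools to find the numeric pool ID (action name assumed).
linode-cli lke pools-list CLUSTER_ID
# Recycle every node in the pool to clear out the orphaned node.
linode-cli lke pool-recycle CLUSTER_ID POOL_ID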
Thank you for your time and help!
12 Replies
I wish we were able to downscale our node pool by passing a lower --count
in https://techdocs.akamai.com/linode-api/reference/api-summary#node-pool-update and choose exactly which node to remove, since my nodes are stateful and I don't want random nodes removed when downscaling.
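For reference, resizing by count today looks roughly like this (a sketch; the CLI action name is what I believe it is, so check linode-cli lke --help, and note that you can't choose which node goes):
# Shrink the pool to 2 nodes; LKE decides which node is removed.
linode-cli lke pool-update CLUSTER_ID POOL_ID --count 2
# Equivalent raw call to the node pool update endpoint linked above:
curl -X PUT "https://api.linode.com/v4/lke/clusters/$CLUSTER_ID/pools/$POOL_ID" \
  -H "Authorization: Bearer $LINODE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"count": 2}'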
At least it would be useful to know whether resizing a node pool will predictably prioritise removing nodes that are cordoned (SchedulingDisabled) over nodes that are Ready for scheduling.
In my recent attempt at downsizing a pool, the node that was removed was the one that was cordoned (which coincidentally was the oldest one and the one running an older version of the API), but I have no idea whether that was intended or I just got lucky.
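For anyone else experimenting, cordoning and checking which nodes are SchedulingDisabled before a resize is just stock kubectl:
# Mark a node unschedulable; it then shows up as Ready,SchedulingDisabled.
kubectl cordon lke12345-67890-abcdef000000   # placeholder node name
# List nodes oldest-first to guess which one a resize might remove.
kubectl get nodes --sort-by=.metadata.creationTimestamp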
I ran a bunch of tests on this tonight and determined that resizing Node Pools always deletes the oldest node. If you remove more than one node, they are deleted starting at the top of the list and going down (oldest to newest, as would be expected).
I tested after cordoning various nodes as well, and Linode still always deletes the oldest node. Cordoning has no effect on which nodes are removed from the pool.
Of course, the interface continues to warn that downsizing Node Pools will remove nodes at random. Even though that is currently false, Linode could change this behavior with no notice as long as the warning is present. So take care, but for now at least they are indeed removed from oldest to newest.
If anyone at Linode/Akamai is reading this, it would be awesome if you could document this behavior so that we know we can count on it going forward. Deterministic node deletion can be pretty important for certain cluster setups. (Deleting nodes based on which ones are cordoned would be even better, but oldest to newest is still far better than "at random".)
I ran a bunch of tests on this tonight and determined that resizing Node Pools always deletes the oldest node.
I wish Guy above was right -- would have saved me some hassle and downtime -- but unfortunately this was not my own experience.
Attempting an LKE upgrade from 1.27 to 1.28 on a 2-node cluster. I added a third node to serve as a float during the upgrade and cordoned the original two nodes, resulting in:
lke136306-200871-04bc9f980000 Ready,SchedulingDisabled <none> 73d v1.27.5 192.168.136.33 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-25-cloud-amd64 containerd://1.6.22
lke136306-200871-2f4831d00000 Ready <none> 11m v1.28.3 192.168.133.118 139.177.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-51e811860000 Ready,SchedulingDisabled <none> 73d v1.27.5 192.168.136.15 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-25-cloud-amd64 containerd://1.6.22
Next I drained and recycled lke136306-200871-04bc9f980000:
lke136306-200871-04bc9f980000 Ready <none> 4m59s v1.28.3 192.168.136.27 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-2f4831d00000 Ready <none> 10m v1.28.3 192.168.129.35 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-51e811860000 Ready,SchedulingDisabled <none> 73d v1.27.5 192.168.136.15 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-25-cloud-amd64 containerd://1.6.22
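(For completeness, the drain-and-recycle step above is roughly the following; I believe the per-node recycle action is lke node-recycle, but double-check with linode-cli lke --help.)
# Evict workloads from the node first (standard kubectl).
kubectl drain lke136306-200871-04bc9f980000 --ignore-daemonsets --delete-emptydir-data
# Recycle just that node so it comes back on the pool's current version. NODE_ID is the
# LKE node ID (not the Linode instance ID), which I believe pools-list shows.
linode-cli lke node-recycle CLUSTER_ID NODE_ID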
Finally I drained lke136306-200871-51e811860000, the last of the two original nodes, and clearly the oldest node in the cluster.
I performed a node pool resize from 3 nodes to 2. LKE nuked one of the newer nodes, resulting in:
lke136306-200871-2f4831d00000 Ready <none> 11m v1.28.3 192.168.129.35 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-51e811860000 Ready,SchedulingDisabled <none> 73d v1.27.5 192.168.136.15 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-25-cloud-amd64 containerd://1.6.22
So it destroyed lke136306-200871-04bc9f980000, which was in fact the newest node. But, ok, I had recycled that one rather than creating a net-new instance. So figuring this might be the key difference, I added a completely new instance to the cluster:
lke136306-200871-1404f39e0000 Ready <none> 7m4s v1.28.3 192.168.136.249 139.177.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-2f4831d00000 Ready <none> 19m v1.28.3 192.168.129.35 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-51e811860000 Ready,SchedulingDisabled <none> 73d v1.27.5 192.168.136.15 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-25-cloud-amd64 containerd://1.6.22
Stabilized my workloads, crossed my fingers, lit a candle, said a prayer, then shrunk the pool from 3 to 2 in the console. Once again, LKE nuked the newest node:
lke136306-200871-2f4831d00000 Ready <none> 21m v1.28.3 192.168.129.35 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-26-cloud-amd64 containerd://1.6.25
lke136306-200871-51e811860000 Ready,SchedulingDisabled <none> 73d v1.27.5 192.168.136.15 172.105.xxx.xxx Debian GNU/Linux 11 (bullseye) 5.10.0-25-cloud-amd64 containerd://1.6.22
LKE simply is not a suitable platform on which to run stateful applications. And this is to say nothing of the I/O failures and ext4 errors I observed on multiple nodes after migrating workloads.
I came to Linode after network stability problems with a competitor, but this is far worse than what I experienced there, so I'm moving back. The grass is always greener, right?
It should now be possible to delete a specific node with our API and CLI using these instructions here. I'm not sure when this feature was added, but please feel free to reach out if you have any trouble with it.
For the CLI, you'd run linode-cli lke node-delete with the clusterid and nodeid parameters.
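Something like the following should work, with placeholder IDs (the node ID is the LKE node ID from the pool's node list, not the underlying Linode instance ID), after draining the node as usual:
# Delete that specific node from its pool; this should also reduce the pool's count by one.
linode-cli lke node-delete CLUSTER_ID NODE_ID
# Equivalent call against the API directly:
curl -X DELETE "https://api.linode.com/v4/lke/clusters/$CLUSTER_ID/nodes/$NODE_ID" \
  -H "Authorization: Bearer $LINODE_TOKEN"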
@CassandraD that's really great to hear, thanks for following up. Were it not for the I/O and ext4 errors I saw from the PVs on the new nodes during this LKE upgrade, when migrating workloads from one node to another (using standard kubectl cordon/drain), I'd be tempted back, but at present I have extremely cold feet.
I completely understand you need to make the decision that works best for you. I don't have enough information to speak to the specific issues you mentioned, but if you have questions or information you'd like to share, please feel free to open a Support Ticket and we can try to address any specific concerns about your services. Thanks for providing this feedback.
I wish Guy above was right -- would have saved me some hassle and downtime -- but unfortunately this was not my own experience.
Yeah, alas, at some point since I posted my original testing results the behavior was inexplicably changed. On my last forced K8s upgrade (some time in the last couple of months) I tried to run the same playbook I tested at the start of this year, but now Linode always deletes the newest node instead of the oldest. I should have posted a follow-up, but it slipped my mind; sorry if that caused you unexpected downtime, @tack.
Good to know that perhaps this can be done with the API, but come on, this should be in the interface. The fact that there's already an API for it makes it even more crazy that it isn't supported via the Cloud Manager UI, especially given that this thread has been open for nearly three years now. This is a basic feature for Kubernetes deployments. Linode forces K8s upgrades ~2 times per year and the lack of this feature causes my services to have entirely unnecessary downtime every time.
I've recently been exploring competitors for the first time in over a decade of being a Linode customer. Post-acquisition, all I've seen is a price hike combined with increased instability (like servers in my cluster randomly being rebooted, and when they come back online they are unable to successfully attach block storage, causing significant downtime for my services), feature-unreliability (like the node pool deletion order silently changing as we're discussing here), and other issues. And not a single beneficial new feature. It truly hurts my heart, as I have been a vocal Linode proponent for so, so, so many years. But the glory days are over. I've only evaluated competitors on Kubernetes deployments, but I've been pleased to find that both Digital Ocean and Vultr include clearly more professional and well thought-out interfaces and features for Kubernetes (you can most certainly reduce node pools by specific nodes, for instance).
Linode is honestly one of my favorite web companies of all time, so it really does pain me to be considering leaving it behind. I'll always have a soft spot for it, but I just don't think it is what it used to be anymore. It might still be fine for one-off servers — I'm not running any right now so I can't speak to that — but if you're running Kubernetes, it's not worth the (recently jacked-up by 20%) price.
I want to thank you both for your feedback. We've seen this feature suggested before, but you've made a strong case here for its importance and I've shared that with the appropriate teams. I can't promise this is something that will be changed, but I think it's worth exploring on our end.
I've also shared the more general feedback about your recent experience and wanted to let you know that the Support Team is here 24/7 to help address any specific issues you run into.
I've only evaluated competitors on Kubernetes deployments, but I've been pleased to find that both Digital Ocean and Vultr include clearly more professional and well thought-out interfaces and features for Kubernetes (you can most certainly reduce node pools by specific nodes, for instance).
I thought it might be a little cheeky to discuss competitors on a Linode forum, but since you've opened the door, let me walk through it. :)
DOKS may be great, I can't say. Before I moved all my workloads to K8s a couple years ago I was with Digital Ocean. But, bafflingly, even after nearly 7 years of requests, their load balancer still doesn't support IPv6. And IPv6 is important to me.
VKE meanwhile looks great on paper (and, in fairness, VKE is the only one of the three that is dual-stack and so supports IPv6 for egress), but Vultr's network is not sufficiently monitored: I've had multiple occasions where their LB simply stops responding to IPv6 and requires intervention from the (usually responsive and helpful) Vultr support team. The most recent incarnation of this problem, in which it remained broken for a week while a support ticket went ignored after tier 1 couldn't immediately fix it, was what finally pushed me to Linode.
Moreover, you mentioned PVs not being able to attach on the new nodes when moving pods -- Vultr suffers from this exact same issue, and when I opened a support ticket about it they politely suggested I own the solution instead of them by opening a GitHub issue against kubernetes-csi/external-attacher. (I learned to live with it by manually deleting the offending volumeattachment object.)
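(For reference, that manual cleanup is just plain kubectl against the cluster-scoped VolumeAttachment objects; the object name below is made up.)
# Find the attachment stuck pointing at the old node.
kubectl get volumeattachments
# Remove it so the CSI attacher can re-create it against the new node.
kubectl delete volumeattachment csi-0123456789abcdef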
VKE's cluster upgrade orchestration is weak -- one example: before rebooting a node they don't (or at least didn't as of October) remove it from any affected LBs (which point to NodePorts) first, which causes 30-60 seconds of intermittent network disruption until the LB health check fails out the pool member. Also, the upgrade process doesn't consider PDBs, so you can very easily end up in a state where all pods in a workload are down, causing impact.
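(For anyone unfamiliar, a PDB is a small policy object that caps how many pods of a workload may be disrupted at once; here's a minimal sketch with a made-up app label.)
# Hypothetical example: keep at least 1 replica of app=my-api up during drains/upgrades.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-api
EOF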
This doesn't really seem all that complicated to me. If I were building a managed K8s platform, I'd consider all this stuff table stakes. In fact, I vastly prefer LKE's approach that makes you manually upgrade the nodes yourself over half-baked orchestration, because at least you're able to craft a procedure to avoid impact (modulo the topic of this thread for which there happily is now a solution).
LKE has all the same kinds of immaturity as the other budget cloud providers, so I think you generally trade one set of problems for another. That said, the I/O and ext4 issues I observed on my last (first!) LKE upgrade, which caused some pods to start with read-only filesystems (and in some cases not start at all) and which happened on two separate nodes, were the most concerning thing I've encountered on any of them. (I've moved back to Vultr for now, but I'm going to reproduce this problem with LKE and open a support ticket because I'm motivated to see this fixed so I have a place to land after Vultr's next inevitable IPv6-related outage.)
But I get it. These services are priced commensurate to their quality, and even though GKE, EKS, and AKS all provide more robust experiences, those cloud providers definitely aren't priced for individual enthusiasts, particularly because of their hostile data transfer pricing. So tolerating some of these rough edges is par for the course.
I'm going to reproduce this problem with LKE and open a support ticket because I'm motivated to see this fixed
And done. The I/O and ext4 errors problem was easy to reproduce doing a typical workload migration (drain and cordon) to a new node. Looks like a fairly serious CSI driver issue to me.
Also, I'm happy to report that the linode-cli solution CassandraD mentioned for targeting a specific node for deletion worked like a charm.
I'm looking for your ticket now and I'll follow up there when I have more information. Thanks so much for taking the time to reproduce this!