Emergency Maintenance - node went down for 59 mins
With that said, I am extremely frustrated this time around. My node was down for 59 minutes in total. That's a long, long time on the internet. The maintenance also happened to affect my single point of failure this time, unfortunately, so my whole site went down. I guess the positive thing out of this is that I learned that I have to fix this single point of failure.
I love Linode for its customer service and value. But if I can pay more for stability elsewhere, maybe I will have to take a hard look.
My question is:
How often do you guys experience this "Emergency Maintenance", and how long does it usually go down for?
28 Replies
Perhaps you have a problematic host? Try requesting a migration to a new box.
@jebblue I will send them a ticket to see how to prevent this from happening again.
So on a reasonably full host, if Linode does nothing more than reboot the host - no actual work - you could be back up in a few minutes or it could take much longer (I had one in the 40+ minutes range, though that was with older hosts). Obviously add extra time to that if Linode actually needs to do something with the host more than just a reboot. So the difference in your two events could be a combination of actual work time, and then where your guest ended up in the reboot queue.
In terms of the actual number of such maintenances, I've had very few (certainly as a fraction of total Linodes over time), but there have been some, and at one point, after a host had multiple incidents, I did request a migration after Linode couldn't confirm they had concretely resolved the issue.
From an expectations perspective, I'd recommend that if your host ends up going down (even if just to reboot immediately), a conservative assumption of up to an hour of interruption for the guest is probably reasonable. It should be better than that on average (and it's possible the newest hardware should be better), but someone has to boot last, and it could be you.
– David
PS: Oh, the above all assumes a Linode 1024, which has the maximum number of guests. Everything should get faster on larger plans since there are fewer guests to boot.
@simon604:
The first time it happened a couple of months ago my node was down for 10 minutes, which was much more acceptable.
@jebblue I will send them a ticket to see how to prevent this from happening again.
In general, hardware can fail; no ISP, no matter who they are or how large, can guarantee 100% uptime.
@simon604:
I guess the positive thing out of this is that I learned that I have to fix this single point of failure.
Exactly - you have no control over your host's setup (except for staying on or leaving that host), and numerous things can affect your uptime, so plan (or not) accordingly.
Just my 2 cents,
Jeff
"This host has experienced catastrophic hardware failure. We are now liaising with datacenter personnel to effect a transfer of the hard disks from this host into a new chassis. We will continue to keep you updated via this ticket as this work progresses."
I think I have given Linode a good chance. 4 emergency maintenance to my fleet of nodes within 2 months. I have no choice to look elsewhere.
@simon604:
Well, here it is again this morning. Another emergency maintenance that brought my master DB down (a different node than last time) for a good 2 hours!!
"This host has experienced catastrophic hardware failure. We are now liaising with datacenter personnel to effect a transfer of the hard disks from this host into a new chassis. We will continue to keep you updated via this ticket as this work progresses."
I think I have given Linode a good chance. Four emergency maintenances across my fleet of nodes within 2 months. I have no choice but to look elsewhere.
Did you submit a ticket to have your node migrated to a different machine? Why do you seem to have such failures while the rest of us do not? It seems odd to me.
I just want to reassure everyone that we do the very best we can to ensure the reliability of our hardware and network - from RAID arrays, dual PSUs, and redundant networking from our uplinks right down to each host. We use the highest quality networking and host components. We spend a lot of time diagnosing each and every hardware issue, corroborating failures across components with where they were sourced from, and ultimately, if need be, working with the vendors and manufacturers to rectify issues if a pattern of component failure is determined.
The incentives to do so are in obvious alignment with our customers - as downtime sucks for everyone. But even disregarding that fact, ignoring reliability issues is just not how we, as a company, want to do business. It's not in Linode's DNA. Unfortunately, some problems just aren't obvious at first and it can take time for them to be resolved to our satisfaction.
When host problems do occur, our team works to restore service as quickly and as safely as possible, and I think they do a fantastic job at it.
I also want to point out that, although it sounds like you've had a bad run and I'm sure the team will be looking into the specifics of your experiences with host issues, it's not due to a lack of caring or negligence on our end.
Sorry for the problems. Please let me/us know if there's anything else we can do.
-Chris
I am currently evaluating the cost/time of moving the stack over to AWS ELB, EC2, and RDS, which claims to make HA a couple of clicks away, versus setting up something similar on Linode. Can anyone point me to a straightforward resource for setting up an HA MySQL cluster?
There used to be a guide in the Linode library on HA MySQL, but it seems to have gone missing (or I'm just blind; it is morning).
One warning about AWS: you don't get anywhere near as much bang for your buck; it'll cost you a fortune to run compared to Linode.
My preferred way of doing MySQL HA is to have a slave running that pings the master and, if the master stops responding, takes control. Of course, you have to make sure your apps are notified of this, and you have to monitor that the slave is replicating properly (check out the Percona tools for this, especially pt-heartbeat and pt-table-checksum). A rough sketch of the watchdog idea is below.
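For illustration, here's a minimal sketch of that watchdog in Python, assuming a plain master/slave pair. The hostname, thresholds, and promotion steps are hypothetical placeholders, and repointing your applications at the new master (the hard part) is left out:

```python
#!/usr/bin/env python3
"""Minimal master-failure watchdog, meant to run on the slave.

Hypothetical example: the hostname and thresholds are placeholders,
and credentials are assumed to come from a local option file.
Adapt everything here to your own setup.
"""
import subprocess
import time

MASTER_HOST = "db-master.example.com"  # hypothetical hostname
CHECK_INTERVAL = 5                     # seconds between pings
FAILURE_THRESHOLD = 3                  # consecutive misses before promoting

def master_alive():
    # `mysqladmin ping` exits 0 when the server is answering.
    return subprocess.call(
        ["mysqladmin", "--host=" + MASTER_HOST, "ping"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def promote_self():
    # Stop applying replication events and allow writes locally
    # (requires SUPER privilege on the local server).
    subprocess.check_call(
        ["mysql", "-e", "STOP SLAVE; SET GLOBAL read_only = OFF;"])
    # ...then notify or repoint your apps (VIP move, DNS flip,
    # config push) - not shown here.

misses = 0
while True:
    misses = 0 if master_alive() else misses + 1
    if misses >= FAILURE_THRESHOLD:
        promote_self()
        break
    time.sleep(CHECK_INTERVAL)
```

In practice you'd also want to confirm the slave is caught up before promoting it (pt-heartbeat can report replication lag continuously, and pt-table-checksum can verify the data actually matches), since promoting a stale slave silently loses writes.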
AWS does handle failover for you (even across physical locations), it's pretty easy to make a redundant stack there but it really is expensive.
I won't go anywhere else unless forced
@Rayj:
I assume any Linode maintenance is scheduled unless it's some emergency-type situation? And all Linode renters are notified well in advance in the former case?
I certainly would hope so!!!
Ray
Most of the time they give you 10-14 days' notice, and you're notified by email and then again closer to the time. The maintenance normally involves moving you to a new host (or in my experience it does, anyway). Emergency maintenance is done immediately and is normally a reboot (i.e. a host kernel lockup) or an immediate migration. Again, you're notified by email.
@obs:
@Rayj: I assume any Linode maintenance is scheduled unless it's some emergency-type situation? And all Linode renters are notified well in advance in the former case?
I certainly would hope so!!!
Ray
Most of the time they give you 10-14 days' notice, and you're notified by email and then again closer to the time. The maintenance normally involves moving you to a new host (or in my experience it does, anyway). Emergency maintenance is done immediately and is normally a reboot (i.e. a host kernel lockup) or an immediate migration. Again, you're notified by email.
Very good. Thanks.
Ray
@peleus:
Is there also a way for users to get notified via email whenever there will be maintenance in the future?
That happens by default. Ticket creation and updates generate emails to the addresses on file for your user (in the "my profile" section).
- Les