Emergency Maintenance - node went down for 59 minutes

I have been a Linode customer for about 5 months now. In the past 2 months, some of my nodes have been shut down for "Emergency Maintenance" twice. I completely understand that every hosting business runs into issues and requires maintenance.

With that said, I am extremely frustrated this time. My node was down for 59 minutes in total. That's a long, long time on the internet. Unfortunately, the maintenance also hit my single point of failure, so my whole site was affected. The one positive is that I learned I have to fix this single point of failure.

I love Linode for its customer service and value. But if I can pay more for stability elsewhere, maybe I will have to take a hard look.

My question is:

How often do you guys experience this "Emergency Maintenance", and how long does it usually last?


I've only had emergency maintenance once in ~1.5 years or so, which took maybe 15-20 minutes or so.

I don't recall this happening to me. You might want to submit a ticket and ask them to move your instance to a different machine.

I've been hosting here for years. Some of my nodes have never had emergency maintenance. I've had a few "We've detected an issue on the host and scheduled a migration to a new host" notices, which I tend to think is their way of saying "We're getting rid of old hardware", or perhaps I'm just cynical ;). I've had a few host lockups which just required a reboot. As far as I can remember, I've never had an hour's downtime that wasn't a network problem.

Perhaps you have a problematic host? Try requesting a migration to a new box.

The first time it happened a couple of months ago my node was down for 10 minutes, which was much more acceptable.

@jebblue I will send them a ticket to see how to prevent this from happening again.

We've been using Linode for about two years and have never experienced "emergency maintenance". Our only downtime has been when migrating to the new hardware or the SSD Beta (both were done on our schedule - we placed our Linodes in the queue and had no wait time before our migrations started).

In terms of the total timing, my guess based on past experiences is that it depends where your guest ends up in the reboot cycle. The guests on a host are not all started simultaneously (that would take the host to its knees) but in sequence, which I think may be randomized on each boot.

So on a reasonably full host, if Linode does nothing more than reboot the host - no actual work - you could be back up in a few minutes or it could take much longer (I had one in the 40+ minutes range, though that was with older hosts). Obviously add extra time to that if Linode actually needs to do something with the host more than just a reboot. So the difference in your two events could be a combination of actual work time, and then where your guest ended up in the reboot queue.

In terms of the actual number of such maintenance events, I've had very few (certainly as a fraction of total Linodes over time), but there have been some. At one point, after a host had multiple incidents, I did request a migration after Linode couldn't confirm they had concretely resolved the issue.

From an expectations perspective, I'd recommend that if your host goes down (even if just to reboot immediately), you conservatively assume an interruption of up to an hour for the guest. It should be better than that on average (and the newest hardware may well be better), but someone has to boot last, and it could be you.

– David

PS: Oh, the above all assumes a Linode 1024, which has the maximum number of guests. Everything should get faster on larger plans since there are fewer guests to boot.
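David's point about the reboot queue is really simple arithmetic. The sketch below is a toy model, not how Linode's hosts actually schedule boots: it assumes guests start strictly one at a time, each takes the same fixed time, and your guest lands in a uniformly random slot. The guest counts (40 and 10) and the one-minute boot time are made-up illustrative values.

```python
import random
import statistics


def minutes_until_up(position: int, per_guest_minutes: float) -> float:
    # Guests boot one at a time, so the guest in 1-based slot `position`
    # waits for everyone ahead of it plus its own boot.
    return position * per_guest_minutes


def expected_minutes(guests: int, per_guest_minutes: float) -> float:
    # With a uniformly random slot in 1..guests, the average slot is (guests + 1) / 2.
    return (guests + 1) / 2 * per_guest_minutes


def worst_case_minutes(guests: int, per_guest_minutes: float) -> float:
    # Someone has to boot last.
    return guests * per_guest_minutes


def simulate(guests: int, per_guest_minutes: float, trials: int = 10_000, seed: int = 0) -> float:
    # Monte Carlo check of expected_minutes: draw a random slot per trial.
    rng = random.Random(seed)
    samples = [minutes_until_up(rng.randint(1, guests), per_guest_minutes) for _ in range(trials)]
    return statistics.mean(samples)


if __name__ == "__main__":
    # Hypothetical: a crowded host (many small guests) vs. one with fewer, larger guests.
    for guests in (40, 10):
        print(guests, expected_minutes(guests, 1.0), worst_case_minutes(guests, 1.0),
              round(simulate(guests, 1.0), 1))
```

Under these assumptions the average wait grows linearly with the number of guests, and the worst case is the whole queue, which is why fewer guests per host (larger plans) should come back faster.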

@simon604:

The first time it happened a couple of months ago my node was down for 10 minutes, which was much more acceptable.

@jebblue I will send them a ticket to see how to prevent this from happening again.

In general, hardware can fail. No ISP, no matter who they are or how large, can guarantee 100% uptime.
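To put the numbers in this thread in perspective, availability is just the fraction of a period that the service was up. A quick sketch, assuming a 30-day month: one 59-minute outage works out to roughly 99.86% availability, while a 99.9% ("three nines") target leaves a budget of about 43 minutes of downtime per month.

```python
MINUTES_PER_DAY = 24 * 60


def availability(downtime_minutes: float, period_days: float = 30) -> float:
    """Fraction of the period the service was up (assumes a 30-day month by default)."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 1 - downtime_minutes / total_minutes


def allowed_downtime_minutes(target: float, period_days: float = 30) -> float:
    """Downtime budget in minutes for an availability target such as 0.999."""
    return (1 - target) * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    print(f"{availability(59):.4%}")                   # one 59-minute outage in a 30-day month
    print(round(allowed_downtime_minutes(0.999), 1))   # "three nines" budget over 30 days
```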

@simon604:

I guess the positive thing out of this is that I learned that I have to fix this single point of failure.
Exactly. You have no control over your host's setup (except for staying on or leaving that host), and numerous things can affect your uptime, so plan (or not) accordingly.

I can only recall two downtime incidents with any of my Linodes since starting in 2009 or so. One was a large-scale incident; the other only involved a few servers. I have been using servers from hosting providers since 2000, and everyone has downtime sooner or later, even the likes of HostGator. My experience here has been exceptional.

Just my 2 cents,

Jeff

I had one incident of downtime in a single data center. Other websites in other data centers were fine though. It seems good. Far better than GoDaddy and their random domains that go down or MySQL databases that suddenly won't work unless you use the auto installer.

We've been on one of our Linodes since 2004, back when the VMs were UML instead of Xen, and haven't had ANY downtime except for an occasional reboot from our side. But I still try to build redundancy and plan for an outage should it ever happen :)

Well, here it is again this morning. Another emergency Maintenance that brought my master DB down (a different node than last time) for a good 2 hours!!

"This host has experienced catastrophic hardware failure. We are now liaising with datacenter personnel to effect a transfer of the hard disks from this host into a new chassis. We will continue to keep you updated via this ticket as this work progresses."

I think I have given Linode a good chance: 4 emergency maintenance events across my fleet of nodes within 2 months. I have no choice but to look elsewhere.

@simon604:

Well, here it is again this morning. Another emergency Maintenance that brought my master DB down (a different node than last time) for a good 2 hours!!

"This host has experienced catastrophic hardware failure. We are now liaising with datacenter personnel to effect a transfer of the hard disks from this host into a new chassis. We will continue to keep you updated via this ticket as this work progresses."

I think I have given Linode a good chance: 4 emergency maintenance events across my fleet of nodes within 2 months. I have no choice but to look elsewhere.

Did you submit a ticket to have your node migrated to a different machine? Why do you seem to have such failures while the rest of us do not? It seems odd to me.

@jebblue I would like to know the same. I have indeed opened a ticket about the reliability issues on our nodes. Let's see what Linode has to say.

If you don't mind me asking, how many nodes do you have and in what DCs?

@obs 5, all in Fremont.

Linode Staff

Hello.

I just want to reassure everyone that we do the very best we can to ensure the reliability of our hardware and network - from RAID arrays, dual PSUs, and redundant networking from our uplinks right down to each host. We use the highest quality networking and host components. We spend a lot of time diagnosing each and every hardware issue, corroborating failures across components with where they were sourced from, and ultimately, if need be, working with the vendors and manufacturers to rectify issues if a pattern of component failure is determined.

The incentives to do so are in obvious alignment with our customers - as downtime sucks for everyone. But even disregarding that fact, ignoring reliability issues is just not how we, as a company, want to do business. It's not in Linode's DNA. Unfortunately, some problems just aren't obvious at first and it can take time for them to be resolved to our satisfaction.

When host problems do occur, our team works to restore service as quickly and as safely as possible, and I think they do a fantastic job at it.

I also want to point out that, although it sounds like you've had a bad run and I'm sure the team will be looking into the specifics of your experiences with host issues, it's not due to a lack of caring or negligence on our end.

Sorry for the problems. Please let me/us know if there's anything else we can do.

-Chris

@Chris I have no doubt that Linode takes reliability and customer service very seriously. In fact, that is why I came over here in the first place. I might just be the unlucky one with a high percentage of node failure.

I am currently evaluating the cost and time of moving the stack over to AWS (ELB, EC2 and RDS), which claims to make HA a couple of clicks away, versus setting up something similar on Linode. Can anyone point me to straightforward resources for setting up an HA MySQL cluster?

Only 5 and you've had this many problems? That sounds like a very bad run of luck.

There used to be a guide in the Linode Library on HA MySQL, but it seems to have gone missing (or I'm just blind; it is morning).

One warning about AWS, you don't get anywhere near as much bang for your buck, it'll cost you a fortune to run compared to Linode.

My preferred way of doing MySQL HA is to have a slave running that pings the master and, if the master stops responding, takes control. Of course, you have to make sure your apps are notified of this, and you have to monitor that the slave is replicating properly (check out the Percona tools for this, especially heartbeat and checksum).
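The slave-takes-over approach boils down to a small decision loop. Below is a minimal sketch of that logic, not a production failover tool: the class and callback names are hypothetical, the thresholds (3 missed pings, 5 seconds of lag) are arbitrary assumptions, and the actual MySQL work is left to callbacks you would supply. The lag check assumes a heartbeat scheme in the style of Percona's heartbeat tool, where the master keeps writing a timestamp that replicates to the slave.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Hypothetical slave-side watchdog: promote the slave after
    `max_failures` consecutive failed pings of the master, but only
    if replication lag is small enough that few writes would be lost."""

    ping_master: Callable[[], bool]         # True if the master answers (e.g. wraps a connection ping)
    read_heartbeat_ts: Callable[[], float]  # newest master heartbeat timestamp seen on the slave
    promote: Callable[[], None]             # make the slave writable and repoint the apps
    max_failures: int = 3
    max_lag_seconds: float = 5.0
    clock: Callable[[], float] = time.time
    failures: int = 0
    promoted: bool = False

    def replication_lag(self) -> float:
        # Heartbeat-style lag: how far behind "now" the last replicated timestamp is.
        return self.clock() - self.read_heartbeat_ts()

    def tick(self) -> str:
        """Run one check; call this periodically, e.g. every few seconds."""
        if self.promoted:
            return "promoted"
        if self.ping_master():
            self.failures = 0
            return "master-ok"
        self.failures += 1
        if self.failures < self.max_failures:
            return "master-suspect"  # tolerate a blip before acting
        if self.replication_lag() > self.max_lag_seconds:
            # Promoting a badly lagging slave could lose writes; alert a human instead.
            return "lag-too-high"
        self.promote()
        self.promoted = True
        return "promoted"
```

In a real deployment `promote` would do things like stop replication, set `read_only` off, and update whatever your apps use to find the master (a config file, DNS, or a floating IP). You would also need some way to keep the old master from accepting writes if it comes back, otherwise you end up with two masters.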

AWS does handle failover for you (even across physical locations). It's pretty easy to make a redundant stack there, but it really is expensive.

Were all these failures on the same host server?

@Guspaz I don't think so since each emergency maintenance only affected one node at a time.

For what it's worth I have been using Linode for multiple years and have only had 2 outages, both due to power issues. Neither lasted more than an hour.

I won't go anywhere else unless forced :)

I assume any Linode maintenance is scheduled unless some emergency type situation? And all Linode renters are notified well in advance in the former case?

I certainly would hope so!!!

Ray

@Rayj:

I assume any Linode maintenance is scheduled unless some emergency type situation? And all Linode renters are notified well in advance in the former case?

I certainly would hope so!!!

Ray

Most of the time they give you 10-14 days' notice, and you're notified by email and then again closer to the time. The maintenance normally involves moving you to a new host (or in my experience it does, anyway). Emergency maintenance is done immediately and is normally a reboot (e.g. after a host kernel lockup) or an immediate migration. Again, you're notified by email.

@obs:

@Rayj:

I assume any Linode maintenance is scheduled unless some emergency type situation? And all Linode renters are notified well in advance in the former case?

I certainly would hope so!!!

Ray

Most of the time they give you 10-14 days' notice, and you're notified by email and then again closer to the time. The maintenance normally involves moving you to a new host (or in my experience it does, anyway). Emergency maintenance is done immediately and is normally a reboot (e.g. after a host kernel lockup) or an immediate migration. Again, you're notified by email.

Very good. Thanks.

Ray

Hey all. I just wanted to chime in and say that the only downtime I've had with my Linode was when I was rebooting my own server and breaking things. Lol. I'm using the Dallas data center and it has never given me a problem.

Host-specific issues generate a ticket, and everyone under your account who has access to that Linode is notified via email.

-Chris

@peleus:

Is there also a way for users to get notified via email about future maintenance?

That happens by default. Ticket creation and updates generate emails to the addresses on file for your user (in the "my profile" section).

-Les
