Management Redundancy

While I understand that the network outage that happened yesterday in Dallas wasn't Linode's fault, they should at least have the management console running in redundant datacenters. Yesterday when I was unable to connect to my server, I had no idea what was going on. I thought that Linode had simply vanished because even the website wouldn't come up!

Even if my server was located in Atlanta or Chicago, I'd have no way to manage it, because the Linode website, management console, and even the status tracker are in Dallas and were affected. Redundancy should be simple enough for the tech staff to set up. And status shouldn't even be on your server, but with a third party altogether, just in case something like this ever happens again.

I am willing to accept that things happen, but the issue could have been mitigated much better yesterday.

13 Replies

Agreed. And for the next outage, there are two independent status websites:

(1) http://twitter.com/linode

(2) http://linode.typepad.com/

However, of course, the Linode Manager would still be unavailable if the Dallas datacenter went down.

@wiley14:

> Even if my server was located in Atlanta or Chicago, I'd have no way to manage it, because the Linode website, management console, and even the status tracker are in Dallas and were affected.
That's perhaps a little broad. You still have direct access to your server, as well as its console via LISH (only the web console is affected by the management site - ssh will work fine), so most management activities should be unaffected. My Dallas Linode was offline during the outage, but those I have in Newark were fine, and I wouldn't have had any problem managing them. Also, I don't think the status site is in Dallas, but believe it's hosted in San Francisco.

> It should be simple enough for the tech staff to do.
I'd like to see HA for the Linode Manager too, but let's be fair here - any application involving changing state (which includes the Linode Manager) is going to be non-trivial to reliably replicate, and you then have to implement synchronization, failover and recovery procedures. Not impossible by any means, but also unlikely to be a trivial amount of work (though much of the data that already feeds the manager is remote from the individual hosts, so that might help a little architecturally). See the various discussions on the forums about doing HA for your own Linode. Would it be nice to have? Absolutely. Would I claim it's simple to do? No.

> And status shouldn't even be on your server, but with a third party altogether, just in case something like this ever happens again.
This was already done earlier this year with status.linode.com. In past outages, I've also had good luck getting status via IRC (also hosted elsewhere).

Oh, and it's a pretty safe bet something like this will happen again in the future. Very few (perhaps no) infrastructures can guarantee 100% uptime.

– David

People seem to be forgetting that Linode is an inexpensive VPS hosting service.

Expecting 100% uptime (or automatic failover or console redundancy) at these prices is very naive.

> Expecting 100% uptime (or automatic failover or console redundancy) at these prices is very naive.

I'm new to VPS hosting and linode.com in general. Should we expect better uptime from VPS hosting than shared hosting? Obviously the speed and amount of freedom are not comparable, but if your site isn't up, you don't need a bandwidth test to see how fast it is.

I only ask because I just moved from an awesome shared host to linode.com and I was hoping for similar uptime.

I've had excellent uptime on Linode (and unblemished uptime on Slicehost for two years before that) - until this recent spate of DNS issues and DDoS attacks.

Hopefully they can get it under control.

Cool jords, that's good to hear. I don't have 100% uptime expectations, especially when it comes to DDoS attacks; it's just that my last host left me with very high expectations. I've had 99.3% uptime over the last week, so if this is a rough patch, I have no complaints.
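For anyone who wants to put a number on a figure like that, here is a rough sketch that converts an uptime percentage into the downtime it implies. The function name `downtime_minutes` and the plain 7-day week are my own assumptions for illustration, not anything Linode publishes.

```python
def downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Minutes of downtime implied by an uptime percentage over a period."""
    # Fraction of the period that was down, scaled to minutes.
    return (100.0 - uptime_percent) / 100.0 * period_hours * 60.0


week_hours = 7 * 24  # assumption: the "last week" is a full 168 hours

# 99.3% over one week works out to about 70.6 minutes of downtime.
print(f"{downtime_minutes(99.3, week_hours):.1f} minutes")
```

The same function works for any period, so a monthly availability figure can be checked by passing the number of hours in that month.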

I don't understand why so many people blame Linode for DATACENTERS' problems. The recent failures were about five failures in the datacenter's own network (NOT controlled by Linode), and once or twice a DDoS attack aimed at someone in the same DC (NOT Linode's fault). The last thing I remember that could be blamed on Linode was that update bug that prevented the hardware hosts from booting. And that was quite a few months ago… and fixed quickly.

Understand this: Linode themselves are a "hurt customer" in these cases, just as you are (actually, BECAUSE you are).

And well, this is the Internet. Stuff Happens. The only way to REDUCE downtime (there's no 100% guarantee) is an HA setup spread among a few DCs in different locations.

(And yes, I had serious trouble because of the Newark network problem a few weeks ago myself. But I don't whine about it.)

@rsk Agreed. A lot of people either don't know or forget that Linode and other hosts rent their big pipes from an outside company. It's the natural order of the blame chain.

users blame you >> we blame our web host >> our web host deals with their providers

hakuna matata

And if we don't blame our web host, they won't blame their upstream and the chain will be broken! :lol:

More seriously, datacenter issues don't explain why ns1-ns4 were all down at the same time.

Hello,

They weren't down, but a remote nameserver wasn't doing its job. This has since been corrected (by correcting that issue, and adding five more, globally diverse, nameservers into the mix).

-Chris

OK, as I said, I have had great uptime except for the last week or so, so hopefully things will get better again :)

@rsk:

> I don't understand why so many people blame Linode for DATACENTERS' problems. The recent failures were about five failures in the datacenter's own network (NOT controlled by Linode), and once or twice a DDoS attack aimed at someone in the same DC (NOT Linode's fault).
I certainly appreciate the varying sources of failures, but at the same time, Linode selected the data center providers that they are using. Just as when we select a VPS provider, there is competition and there are differences at the data center level, so Linode isn't completely off the hook in my book just because the actual outage was within the data center. It's no different than if I were selling a service via my Linode: my customers would hold me responsible even if my service was down due to a Linode hardware failure.

Yes, Linode has been dependent on the data center providers to resolve most (if not all) recent interruptions, but the data center is a component in the solution Linode is selling - no different than the hardware they chose - and just as relevant to the service reliability. I was bummed to see the status update implying that a single router could take down connectivity in Dallas for so long, since I'd have expected a mesh switched fabric below the level of a router. The other outage with the failed switch card was a little more understandable, since a partial component failure - particularly at the switch level - can be very hard to diagnose, but even then it was disappointing how long it took to identify/resolve.

I'm also mildly disappointed that the data center can't transparently do whatever maintenance in Dallas they need to tomorrow morning (since I'd expect a major data center to have a way to shunt traffic around hardware they were working on), but I also understand sometimes it's necessary, and a warning about a 10 minute outage may also just be a CYA notice in case there is disruption during such a cutover. [Edit: Looks like no outages actually occurred]

With that said, I try to periodically watch status boards and customer forums for other providers I had considered when originally selecting Linode, and in general, have found Linode comparing quite favorably in terms of interruptions. I do think they suffered in comparison this past month (Newark and multiple significant Dallas outages), but this is the first month that's happened, and also the first and only month so far where some of my Linodes fell below 99.9% availability.

– David

For the whiners and complainers: http://www.youtube.com/watch?v=QtCqcKzSVCU&feature=player_embedded#!
