Disaster Recovery Plan

I need to create a disaster recovery plan for my application and supporting servers, all of which are hosted on Linode. Does anyone have an existing disaster recovery plan tailored to Linode that they would be willing to share? Or maybe just some suggestions on where and how to start developing a DR document?

Chris

11 Replies

My DR plan is as follows:

1) Buy virtual or dedicated servers from another supplier.

2) Restore my backups onto the new servers.

3) Update DNS at my registrar.

4) Fix any resultant problems with the apps.

You do have regular off-site backups, don't you?

I like to keep a scoped high powered rifle, two hand guns, around 1000 rounds for each weapon, and a month's worth of food and water for the five of us.

Oh, system disaster recovery plan.

DOCUMENT EVERYTHING for a bare-metal restore.

Have known up-to-date config file backups & (re)installation scripts.

Run my own DNS servers on a 3rd party DNS host.

Have warm spare server(s) at a different host.

Finally - TEST YOUR RECOVERY PLAN

This assumes you already keep up-to-date, known valid (and verified) off-site backups of your data.
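One cheap part of "known valid (and verified)" is checking that the off-site copy still matches a checksum recorded when the backup was made. Here is a minimal sketch of that check, assuming you store a SHA-256 digest alongside each backup file; the function names are made up for illustration. It only proves the copy is not corrupted, so an actual test restore is still needed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_hex: str) -> bool:
    """Compare a backup file's digest with the one recorded at backup time."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Run it against the off-site copy, not the original, since the off-site copy is the one you will be restoring from.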

Warm spare servers are not a requirement unless you need to be back up really quickly. If it's mail and web for a small to medium company it might be more pragmatic to accept some risk of downtime to save half the running costs. It all depends on the company. You need a minimum of one DNS server up at all times but the rest can be down for a few hours.

You can buy EC2, Hetzner, Bytemark, or whatever machines well within a couple of hours and have them running and tested an hour or so later. In companies that don't conduct their entire business on the web, that's normally enough disaster recovery.

Thanks for the replies, this gives me a lot to start with for my documents. How do I download my backups? Do they come in a format that can be imported easily and simply booted on another Xen provider?

Linode's backups only restore to Linodes. I think they are also limited to the same Linode location, but I'm not 100% sure about that last part. In any case, they depend on Linode functioning in general and also being available to you. Keep in mind that it doesn't have to be a "disaster" - there could be things like legal and contractual disputes that might lock you out of your nodes/data. The Linode people are good guys/gals, obviously - but you never know what can happen ;)

So, being a wee bit paranoid and not trusting any company 100%, I run backups myself using BackupPC + database replication and hourly snapshots to an undisclosed location in a different country. Just make sure the backup box has enough outbound bandwidth for a speedy restore.

I would probably also use Linode's backups for easier/faster intra-Linode restores, if they supported my setup.
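Hourly snapshots only help if they keep arriving, so it is worth alerting when the newest one gets too old. A rough sketch, assuming the snapshots land as files in a single directory on the backup box; the two-hour threshold and the function names are assumptions, not anything BackupPC provides.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_seconds(backup_dir: Path, now: Optional[float] = None) -> Optional[float]:
    """Age in seconds of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in backup_dir.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def snapshots_are_fresh(backup_dir: Path, max_age_seconds: float = 2 * 3600,
                        now: Optional[float] = None) -> bool:
    """True if at least one snapshot exists and the newest is within the allowed age."""
    age = newest_backup_age_seconds(backup_dir, now)
    return age is not None and age <= max_age_seconds
```

An empty directory counts as stale here on purpose: a backup job that silently writes nothing should fail the check.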

@trippeh:

Keep in mind that it doesn't have to be a "disaster" - there could be things like legal and contractual disputes that might lock you out of your nodes/data. The Linode people are good guys/gals, obviously - but you never know what can happen ;)

Exactly. The disaster recovery plan should cover the possibility that Linode is totally down and not coming back up. Fairly recent off-site backups are essential in any recovery situation, no matter who your servers are with. The Linode backups are not really good enough for disaster recovery. Not that Linode has ever had security issues (Cough Routers Cough Slush Cough bitcoinica. Ahem, Cough.)

Besides, any US company can get large chunks of equipment seized as evidence if the FBI suspects something criminal is happening. It would only take one cracked Linode running a child porn site and they will take the whole rack if they are in a bad mood. http://news.cnet.com/8301-1009_3-20073102-83/fbi-seizes-web-hosting-companys-servers/

It's probably also a good idea to define what a "disaster" is, and brainstorm some disaster scenarios. Accidental deletion of /var/lib/mysql/ibdata1, loss of physical host (temporary or permanent), loss of datacenter (temporary or permanent), loss of entire hosting provider, etc. It's also a really good idea to define what service levels you're going to provide under various disaster scenarios. Also, consider humans and physical facilities: if you can't run payroll because a category five hurricane is destroying your entire town, do you have an agreement with your bank to do the needful?

I prefer to refer to this sort of plan as "business continuity" rather than "disaster recovery", because it really involves more than servers and disasters, and really gets down into the core of defining what, exactly, you need to do to serve your customers and employees when bad stuff happens.

By disaster recovery I mean a long-term service disruption at a Linode datacenter, an event that would cause me to say Linode is no longer an option. I'm building an API that I want other businesses to rely on for a core part of their workflow. I want to be able to be up and running in 6 hours at most.
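A 6-hour target is easier to reason about once each recovery step has a time estimate next to it. Below is a minimal sketch of that budget check, using the four steps from the first reply. The durations are made-up placeholders, and the steps are assumed to run one after another; replace the numbers with timings from a real test restore.

```python
from datetime import timedelta

# Hypothetical estimates; measure these during an actual recovery drill.
RECOVERY_STEPS = [
    ("provision servers at another provider", timedelta(hours=2)),
    ("restore off-site backups", timedelta(hours=1, minutes=30)),
    ("update DNS", timedelta(minutes=30)),
    ("fix application problems and test", timedelta(hours=1)),
]


def total_recovery_time(steps):
    """Sum the estimated durations, assuming the steps run sequentially."""
    return sum((duration for _, duration in steps), timedelta())


def fits_objective(steps, objective):
    """True if the estimated total is within the recovery time objective."""
    return total_recovery_time(steps) <= objective


if __name__ == "__main__":
    target = timedelta(hours=6)
    print("estimated total:", total_recovery_time(RECOVERY_STEPS))
    print("within target:", fits_objective(RECOVERY_STEPS, target))
```

If the total lands close to the target, that is usually the argument for warm spares or for shortening the restore step.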

hoopycat - continuity of business, I like that. Servers are just part of the problem in such a situation.

I think I'm asking how I would package up the infrastructure I build on Linode and move it to a new provider (I'm not considering doing this).

Source code is already handled.

Database configs and MySQL data files can be backed up on their own, or the database can be dumped to an SQL file.

Install scripts can be written to rebuild the OS.

DNS - I'm not sure how I should handle DNS.
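For the database item above, a dump to an SQL file can be scripted so it runs the same way every time. This is a sketch rather than a finished backup job: it assumes `mysqldump` is on the PATH, that the tables are InnoDB (which `--single-transaction` relies on for a consistent snapshot), and that credentials live in an option file whose path you pass in. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_dump_command(defaults_file: Path) -> List[str]:
    """Build the mysqldump argument list; the option file must come first."""
    return [
        "mysqldump",
        f"--defaults-extra-file={defaults_file}",  # credentials kept off the command line
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--routines",            # include stored procedures and functions
        "--all-databases",
    ]


def dump_to_file(defaults_file: Path, output: Path) -> None:
    """Run mysqldump and write the SQL to output; raises CalledProcessError on failure."""
    with output.open("wb") as handle:
        subprocess.run(build_dump_command(defaults_file), stdout=handle, check=True)
```

The resulting file still has to be compressed, encrypted if it leaves your servers, and copied off-site.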

From the backup description: "The backup system must be able to mount your filesystem. If you've used fdisk on your images to create partitions, or created encrypted volumes, or LVM, or done anything other than use our deployment or disk image creation tools, we won't be able to back up the data. The backup system operates on files, not at the block level." This is good to know.

Host your own hidden master DNS server using BIND or NSD3 or equivalent, then use the Linode nameservers as slaves.

NSD3 configuration is easily portable, and I'm sure BIND's is as well.

My favorite approach so far is to use a configuration management system (chef, puppet, ansible, whatever) to define what each of your servers does. New ones can be deployed in a hurry. I tend to use fabric and libcloud to handle server deployment, so deploying the cluster on another provider involves changing very little. As a bonus, your servers are now described by source code. This is sort of an evolved "install script" concept, but it lasts for the entire lifecycle of a server.
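To make "servers described by source code" concrete, here is a toy sketch in plain Python. It is not chef, puppet, ansible, fabric, or libcloud, and every name in it is hypothetical. The point is only that the cluster definition is data, and the provider is a single argument you change when you move.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ServerRole:
    """What one kind of server does, independent of where it runs."""
    name: str
    count: int
    packages: Tuple[str, ...]


# Hypothetical cluster definition; in practice this lives in version control.
CLUSTER = (
    ServerRole("web", 2, ("nginx",)),
    ServerRole("db", 1, ("mysql-server",)),
)


def deployment_plan(cluster, provider: str) -> List[Dict[str, object]]:
    """Expand role definitions into one entry per server for the given provider."""
    plan = []
    for role in cluster:
        for index in range(1, role.count + 1):
            plan.append({
                "hostname": f"{role.name}{index}",
                "provider": provider,
                "packages": list(role.packages),
            })
    return plan
```

Calling `deployment_plan(CLUSTER, "some-other-provider")` yields the same hosts and packages with only the provider field changed, which is the property you want from a real tool as well.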

Deploy new servers to replace old servers once in a while. It might be interesting to try never rebooting your servers: instead, instantiate a new one, then destroy the old one. Your cluster is a multi-cellular organism.

I've had good luck with DNS Made Easy for general customer-facing domains, and Amazon Route 53 via libcloud for the server FQDN domain.

Your data (files, databases, etc.) would still need to be handled somehow, but Amazon S3 is quite workable for general static content storage/serving as well as storing of (encrypted, presumably) database dumps.

A common pattern here is diversity: relying on one provider for everything (even Linode!) is just plain silly. If your domain registrar, DNS host, mail provider, and VPS provider are the same company, you're going to have a bad time.
