Best practice for live backup?
Currently, for backups, I have a once-a-day script that saves my site's files and database to S3.
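For reference, a minimal sketch of that kind of nightly job (the paths, database name, and bucket are placeholders I've made up; the dump and upload steps are commented out since they need real credentials, so only the archiving step runs as-is):

```shell
STAMP=$(date +%F)
WORK=$(mktemp -d)     # staging area for tonight's backup
SITE=$(mktemp -d)     # stands in for the site's document root, e.g. /srv/www
echo '<h1>ok</h1>' > "$SITE/index.html"

# 1. Dump the database (needs credentials; shown but not run here):
# mysqldump --single-transaction mydb | gzip > "$WORK/db-$STAMP.sql.gz"

# 2. Archive the site's files:
tar czf "$WORK/files-$STAMP.tar.gz" -C "$SITE" .

# 3. Ship both to S3 (needs awscli and a bucket; shown but not run here):
# aws s3 cp "$WORK/files-$STAMP.tar.gz" "s3://my-backup-bucket/$STAMP/"

ls "$WORK"
```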
However, it would be devastating to my business to lose even a day's worth of data – so I would love to hear your thoughts on what the best practices are for maintaining a live backup.
I don't care about an hour of downtime once in a blue moon. What I do care about is never losing any data.
The live backup has to cover both my site's files and my site's DB.
Things I've considered include Amazon RDS for the DB, replicating the DB to another Linode, and rsyncing the files to another Linode or to S3.
6 Replies
@stex:
However, it would be devastating to my business to lose even a day's worth of data – so I would love to hear your thoughts on what the best practices are for maintaining a live backup.
You could look into various HA solutions (I think the Linode Library has some articles too), but that might be a lot of work, and a lot of that overhead is from aiming at quick failover to prevent outages, as opposed to just data loss.
My own setup seems more in line with your requirements, which is to take significant (but not extreme) steps to ensure data integrity but not sweat the small access outages, or the need to manually cut over during a larger outage.
In my case, I set up a second, mirror Linode with an identical configuration (initially cloned from the primary) and maintain the two in parallel. Both are enrolled in Linode's backup service, so I always have a quick way to do a bare-metal recovery to a point in time no more than 24 hours old.
Use a file synchronization tool (e.g., rsync, unison, etc.) to reflect pure filesystem changes between the two machines for any of your own files. Run it frequently, at whatever interval matches the amount of data you can afford to lose. Note that it's easy to say "absolutely no data loss", but typically the risk in a window of under a minute is rather small, for example.
Set up an appropriate replication system for your database, depending on the engine. For example, I use a warm standby with WAL shipping for my PostgreSQL database, with an update interval of at most 30-60 seconds (so at worst I could lose the last 30-60 seconds of transactions). Once I upgrade to PostgreSQL 9.x I'll probably use the more real-time hot standby replication. MySQL has similar replication options.
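For the curious, a PostgreSQL warm standby of that sort boils down to a few settings; the paths and hostname below are assumptions, and the exact parameters vary a bit by version (this reflects the 8.x continuous-archiving setup, where pg_standby from contrib keeps the standby replaying as new segments arrive):

```ini
# primary: postgresql.conf
archive_mode    = on
archive_command = 'rsync -a %p standby:/var/lib/postgresql/wal_archive/%f'
archive_timeout = 60    # force a WAL segment switch at least every 60s

# standby: recovery.conf
restore_command = 'pg_standby /var/lib/postgresql/wal_archive %f %p'
```

The archive_timeout setting is what bounds the loss window: even on a quiet database, a partial WAL segment gets shipped at least that often.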
If you have any other applications whose state can not be reflected by a filesystem or database synchronization, identify a way to replicate its changes and include that as well.
Having the standby Linode in the same data center lets you use the private network for all transfers which lets you crank down the latency and not worry about bandwidth usage. It won't protect against a network outage affecting clients (but you said that wasn't a major deal) or a data center wide catastrophe, but you could provision a second machine in a different data center for that purpose and sync from your primary warm standby.
This approach has only a modest cost, with pretty good coverage. It does require manual intervention in a true disaster, and it doesn't completely eliminate the window for loss. But trying to have no window at all severely increases cost and management time, in my experience, and introduces its own failure modes. You used the term "devastating", which might imply you're willing to jump through all the hoops required to eliminate that last little bit of risk. In practice, though, there's almost always a little wiggle room in the risk/cost analysis, and it's far easier to say "absolutely no risk of data loss" than to implement it, much less test, and keep testing, to be sure you got it right.
-- David
As for files, you could rsync your /var/www/ and other critical directories.
MySQL High Availability
Other random chunks of advice:
Friends don't let friends use MyISAM.
rsync is a great tool, but it is not a strategy. OS and software managed by a configuration management system, code and templates stored in a VCS and checked out to the servers, data stored in a database.
Natural disasters, man-made disasters, and business disasters are all disasters, but we generally only think of the first two, or often just the first one when planning for disasters. If Linode ceased operations right now and simultaneously turned off every single host in every datacenter, would your data survive adequately?
If you know how to make a proper bulleted list in this blasted forum software, please tell the world immediately: if you were to get hit by a bus tomorrow morning, I'd be stuck doing the * thing in perpetuity.
@hoopycat:
- If you know how to make a proper bulleted list in this blasted forum software, please tell the world immediately: if you were to get hit by a bus tomorrow morning, I'd be stuck doing the * thing in perpetuity.
Not sure if this qualifies as "proper", but perhaps:
* See
this
example
"If you know how to make a proper bulleted list in this blasted forum software, please tell the world immediately: if you were to get hit by a bus tomorrow morning, I'd be stuck doing the * thing in perpetuity." which was created by:
[list][*]See
[*]this
[*]example
[*]"If you know how to make a proper bulleted list in this blasted forum software, please tell the world immediately: if you were to get hit by a bus tomorrow morning, I'd be stuck doing the * thing in perpetuity."[/list]
I stick the opening and closing operation on the same line as surrounding text to avoid stray vertical space in the rendered output.
See also
– David
My backup strategy for a whole whackload of servers is nightly rsyncs (after a MySQL dump) to my file server, which then takes a ZFS snapshot on a pool with both compression and deduplication enabled. I've had to use it in the past, when I accidentally deleted some files from an anime club's library system while doing maintenance. I like that getting into a ZFS snapshot is as simple as cd'ing into a hidden directory: there's one for each snapshot, and once you're in it, you're looking at the snapshot. I have high hopes that btrfs will eventually do all this stuff as well as ZFS so that I can get off Solaris, whose network drivers are horrendously buggy/crashy.
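Sketched out, the backup-server side looks roughly like this (the pool/dataset names, hostname, and snapshot date are all made up, and the commands need a ZFS-capable host, so treat it as a sketch rather than something to paste):

```
# one-time: a dataset with compression and dedup enabled
zfs create -o compression=on -o dedup=on tank/backups

# nightly, after the remote mysqldump: pull the files, then snapshot
rsync -az web1:/srv/ /tank/backups/web1/
zfs snapshot tank/backups@$(date +%F)

# recovery is just a cd -- each snapshot is a hidden read-only directory
ls /tank/backups/.zfs/snapshot/
cp /tank/backups/.zfs/snapshot/2011-05-01/web1/somefile ./restored-somefile
```

Since snapshots are copy-on-write, keeping many of them costs only the space of the blocks that actually changed between runs.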