Cross-datacenter failover questions
For clarity's sake: I'm really only interested in redundancy in the (hopefully rare) case that the primary datacenter is unavailable. I don't care about load balancing, etc.; I just want a backup to reduce the chance of site downtime. The sites I host don't get a huge amount of traffic, but we're moving from Dreamhost, where frequent random outages have my clients all in a huff.
Basically, I'd like to know what additional or different steps a cross-datacenter setup requires compared with the tutorial. Is it as simple as using the public static IPs in place of the private IPs in the "Configure Private Networking" section? In my experience nothing is ever that simple, and I expect there's more to it than that.
Thanks
I'm not sure I'd use DRBD. I'd probably use whatever replication the database engine provides, perhaps with rsync (or DRBD, if it can handle that) for relatively static files.
For my own purposes, I've more or less concluded that I'm better off with a basic synchronization process to a secondary (rsync/unison for static filesystem content, the database's own replication support for the data), while leaving the cut-over process under manual control.
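As a rough illustration of the file side of that approach, here is a minimal Python sketch that mirrors static content to a standby with rsync over SSH. The host name `standby.example.com` and the paths are hypothetical placeholders; the rsync options are standard (`-a` archive mode, `-z` compression, `--delete` to mirror removals). Cut-over is deliberately left out, in keeping with doing that step by hand.

```python
import shlex
import subprocess

# Hypothetical values: substitute your own standby host and paths.
STANDBY_HOST = "standby.example.com"
LOCAL_DIR = "/var/www/site/"   # trailing slash: copy the directory's contents
REMOTE_DIR = "/var/www/site/"


def rsync_command(local_dir: str, host: str, remote_dir: str) -> list[str]:
    """Build an rsync invocation that mirrors local_dir onto host:remote_dir."""
    return [
        "rsync",
        "-az",        # archive mode (perms, times, symlinks) plus compression
        "--delete",   # remove files on the standby that were deleted locally
        "-e", "ssh",  # transport over SSH
        local_dir,
        f"{host}:{remote_dir}",
    ]


def sync_static_files() -> None:
    """Run one sync pass; raises CalledProcessError if rsync fails."""
    cmd = rsync_command(LOCAL_DIR, STANDBY_HOST, REMOTE_DIR)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `sync_static_files()` from cron at whatever interval your bandwidth allows gives the periodic sync; the standby just sits there until you decide to switch over.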
hoopycat's bandwidth comment is well taken too. For example, in one of my node pairs the main standby node is in the same DC as the primary, with a maximum sync latency of 60s between the two over the private network. Doing the same between DCs could eat a full node's monthly bandwidth allotment, so it might require allowing a somewhat larger sync latency (say, 5 minutes) or simply allocating that bandwidth to the task.
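A rough way to see why the sync interval matters: each sync run has some fixed overhead (for rsync, exchanging the file list) no matter how little changed, while the data that actually changes adds up to the same monthly total however often you sync. The sketch below models only that; the 2 MB per-run overhead and 100 MB/day of changes are made-up figures, not measurements.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # treat a month as 30 days


def monthly_transfer_gb(interval_s: float,
                        overhead_mb_per_run: float,
                        changed_mb_per_day: float) -> float:
    """Estimate monthly sync traffic in GB (1 GB = 1024 MB).

    Each run costs a fixed overhead, plus the data that really changed,
    which totals the same per month regardless of the sync interval.
    """
    runs_per_month = SECONDS_PER_MONTH / interval_s
    overhead_mb = runs_per_month * overhead_mb_per_run
    changed_mb = changed_mb_per_day * 30
    return (overhead_mb + changed_mb) / 1024


# Hypothetical figures: 2 MB overhead per run, 100 MB of real changes per day.
print(monthly_transfer_gb(60, 2, 100))    # 60 s interval -> 87.3046875
print(monthly_transfer_gb(300, 2, 100))   # 5 min interval -> 19.8046875
```

With these invented numbers, going from 60 s to 5 minutes cuts the estimate by more than three quarters. rsync's `--stats` output is one way to get real per-run figures to plug in.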
Of course, this does impose a minimum latency on any eventual cut-over (mostly the time for me to decide to take the step, plus DNS propagation), but to be honest, I'm more concerned with protecting against an unexpected outage of several hours or more caused by a serious failure than with a few minutes here or there.
The incremental cost (time, configuration, expense) of achieving zero-latency HA is pretty extreme for the benefits, at least in my own scenarios, and without some mechanism to work around DNS propagation, there will always be some delay before a standby takes over anyway.
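The DNS part of that delay can be put in numbers. With a manual cut-over, a rough worst case before visitors reach the standby is the time to notice the outage, plus the time to decide and update the record, plus the record's TTL, because a resolver that cached the old address may keep using it until the TTL expires. The figures below are hypothetical, and some resolvers don't honor TTLs strictly, so treat this as an estimate rather than a guarantee.

```python
def worst_case_cutover_min(notice_min: float, act_min: float, ttl_s: int) -> float:
    """Rough worst-case minutes until TTL-respecting resolvers reach the standby.

    notice_min: time to notice the primary is down
    act_min:    time to decide and update the DNS record
    ttl_s:      TTL of the record in seconds; a resolver may keep the
                old address cached for up to this long after the change
    """
    return notice_min + act_min + ttl_s / 60


# Hypothetical: 10 minutes to notice, 5 minutes to act.
print(worst_case_cutover_min(10, 5, 300))     # 5-minute TTL -> 20.0
print(worst_case_cutover_min(10, 5, 86400))   # 1-day TTL    -> 1455.0
```

The TTL term is the one you control ahead of time: a short TTL on the site's records keeps it small, at the cost of more DNS queries.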
– David
Here's what I'm thinking of doing as a compromise. First off, the site doesn't change much on a regular basis aside from new user registrations (which are purely database changes). I do all of the new article posting myself anyway, so I can sync any images/files manually when required. For the database, I think I can do much the same: sync manually whenever I make changes to articles, etc. I'll also turn off the registered-users area on the backup site so no database changes can happen there. That way no other synchronization is necessary, and the public portion of the site can still be served from the secondary node if the primary fails.
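The manual database sync could be a single command run after each batch of article edits. A minimal Python sketch, assuming MySQL or MariaDB (the thread never says which database is in use), SSH access to the standby, and credentials stored in `~/.my.cnf` on both ends; the host and database names are hypothetical. It streams `mysqldump` straight into the `mysql` client on the standby, so no dump file is left lying around.

```python
import subprocess

# Hypothetical names; this assumes MySQL/MariaDB with credentials in
# ~/.my.cnf on both machines and key-based SSH to the standby.
STANDBY_HOST = "standby.example.com"
DB_NAME = "sitedb"


def dump_command(db: str) -> list[str]:
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking the live site for the duration of the dump.
    return ["mysqldump", "--single-transaction", db]


def load_command(host: str, db: str) -> list[str]:
    # Feed the dump into the mysql client on the standby over SSH.
    return ["ssh", host, "mysql", db]


def push_database(host: str = STANDBY_HOST, db: str = DB_NAME) -> None:
    """Copy the primary's database to the standby as one streamed pipeline."""
    dump = subprocess.Popen(dump_command(db), stdout=subprocess.PIPE)
    load = subprocess.Popen(load_command(host, db), stdin=dump.stdout)
    dump.stdout.close()  # lets mysqldump see SIGPIPE if the loader exits early
    load_rc = load.wait()
    dump_rc = dump.wait()
    if dump_rc != 0 or load_rc != 0:
        raise RuntimeError(
            f"sync failed: mysqldump exit {dump_rc}, remote mysql exit {load_rc}"
        )
```

Because mysqldump's default options include `--add-drop-table`, re-running the import replaces the standby's tables rather than piling duplicate rows on top of them.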
Honestly, I think anything more is overkill for this particular client, and you all make very valid points about how difficult it is to go further.
I was looking at DNS Made Easy to handle DNS failover (I've seen it mentioned in other posts here, and it looks like a good, easy solution). Are there any other good options for handling that part?
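For a sense of what a DNS failover service decides on your behalf: it polls the primary and repoints the record once the primary looks down, and a common safeguard is to require several consecutive failed checks so that a single dropped request doesn't trigger a flip. The Python below only illustrates that decision logic; it is not DNS Made Easy's implementation, and the threshold of three and the 10-second timeout are hypothetical.

```python
import urllib.request


def site_is_up(url: str, timeout_s: float = 10.0) -> bool:
    """One health check: True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError and HTTPError (4xx/5xx) are OSError subclasses, as are
        # refused connections and timeouts, so all count as "down".
        return False


def should_fail_over(history: list[bool], threshold: int = 3) -> bool:
    """Fail over only after `threshold` consecutive failed checks."""
    recent = history[-threshold:]
    return len(recent) == threshold and not any(recent)
```

A monitor would append `site_is_up(...)` results to `history` on each poll and switch the record when `should_fail_over(history)` turns true; raising the threshold trades slower detection for fewer false alarms.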
For more timely updates when you post an article: if you have some sort of article-posting script, you can just have it run the same script the nightly cron job runs (or, if you have no such script, run it yourself). The sync should be fast since little will have changed, and while the import on the secondary box might take a while, that shouldn't matter because your primary box doesn't need to wait for it.
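A posting hook like that needs to start the sync without making the publishing step wait for it. A minimal Python sketch, where `/usr/local/bin/sync-to-standby` is a hypothetical stand-in for whatever script the nightly cron job runs:

```python
import subprocess

SYNC_SCRIPT = ["/usr/local/bin/sync-to-standby"]  # hypothetical path


def trigger_sync(cmd: list[str] = SYNC_SCRIPT) -> subprocess.Popen:
    """Start the sync in the background and return without waiting.

    start_new_session=True (POSIX) puts the child in its own session, so a
    Ctrl-C or terminal hangup aimed at the posting script doesn't also
    kill the sync. Output is discarded here; logging it to a file would
    make failures easier to spot.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

The posting script calls `trigger_sync()` as its last step and exits; the sync, and the slower import on the secondary, carry on independently.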