Multiple datacenter redundancy - sync/replication thoughts
I did some experimentation a while back with two Apache servers and two A records (one pointing at the IP of each server). It seems that all the major browsers, when presented with dual A records, will pick one of the servers at random and then stick with that server until either a) the client is rebooted, or b) IP connectivity to the chosen server is lost (in which case it seems to move immediately to the alternative server listed in DNS). So it seems that two Linodes in different DCs with dual A records would provide rudimentary failover as well as some spread of traffic.
As our sites are database driven and also rely on files on disk, this raises the issue of how to keep the two locations in sync, so that the data is as close as possible to "live" should one server go away. I'm not too worried about replication being 100% immediate, since browsers seem to stick with the same server once they've chosen an A record (unless that server fails). As long as replication occurs within a minute or so, we should be in decent shape.
We've experimented with some options here. GlusterFS (server and client running on both servers, with files accessed via the client mountpoint) seems to do a decent job: with a low timeout (~3 seconds) it fails over to "local mode" nicely if connectivity is lost, and then self-heals once connectivity is restored.
For MySQL, we're looking at master-master replication, where each server is both a master and a slave of the other. Provided that our auto_increment values are offset (e.g. 1,3,5,7 on one server vs 2,4,6,8 on the other), this seems like it would work.
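To make the offset idea concrete, here is a minimal sketch in Python (not MySQL itself) of the ID sequences each server would hand out on an empty table. It assumes the two MySQL system variables `auto_increment_increment` and `auto_increment_offset` are set to 2 and 1 on one server and 2 and 2 on the other; the function name is made up for illustration.

```python
def auto_increment_ids(offset, increment, count):
    """Return the first `count` ids a server would generate on an empty table.

    Models the sequence offset, offset + increment, offset + 2*increment, ...
    """
    return [offset + n * increment for n in range(count)]


# Assumed settings: both servers step by 2, with different starting offsets.
server_a = auto_increment_ids(offset=1, increment=2, count=4)
server_b = auto_increment_ids(offset=2, increment=2, count=4)

print(server_a)  # [1, 3, 5, 7]
print(server_b)  # [2, 4, 6, 8]
```

Note that this only keeps newly generated keys from colliding. It does nothing about two servers updating the same existing row at the same time.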
Does anyone have any experience of using GlusterFS or MySQL replication between two different datacenters like this? Or can anyone recommend a different solution that would work better?
I'm also interested in any ideas people have for security (for GlusterFS and/or SQL replication). Gluster only supports ROT13 "encryption" (i.e. no encryption), so I'm thinking we would need to wrap the traffic in either a VPN tunnel or an SSH port forward between the two machines.
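For the SSH port forward option, a rough sketch of what the MySQL side might look like, written as a small Python helper that builds the `ssh` command line. The hostname, user name and local port 3307 are placeholders I've made up; `-N` (no remote command) and `-L` (local port forward) are standard OpenSSH options. The local slave would then replicate from 127.0.0.1:3307 instead of talking to the remote server directly.

```python
def ssh_tunnel_command(remote_host, user="tunnel", local_port=3307, remote_port=3306):
    """Build an argv list for an SSH local port forward to a remote MySQL server.

    Connections to 127.0.0.1:local_port on this machine are carried over SSH
    and delivered to 127.0.0.1:remote_port on remote_host.
    """
    forward = f"{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{remote_host}"]


# Hypothetical host name, for illustration only.
command = ssh_tunnel_command("db2.example.com")
print(" ".join(command))
```

The list could be handed to `subprocess.Popen` or a process supervisor so the tunnel gets restarted if it drops. This only covers a single TCP port, so it suits MySQL replication better than Gluster.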
Any thoughts / comments / suggestions / experience on this would be greatly appreciated!
2 Replies
@jeffery:
It seems that all the major browsers, when presented with dual A records, will pick one of the servers at random and then stick with that server until either a) the client is rebooted, or b) IP connectivity to the chosen server is lost (in which case it seems to move immediately to the alternative server listed in DNS). So it seems that two Linodes in different DCs with dual A records would provide rudimentary failover as well as some spread of traffic.
No. If a hostname points to two different IP addresses and one of them goes down, chances are that clients connecting to the dead server will sit through connection timeouts rather than switching over immediately. You'd end up having to remove the dead server's IP address from DNS.
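To illustrate where the delay comes from, here is a rough Python sketch of a client that walks a list of addresses in order. It is not how any particular browser behaves, just the general pattern: the client only moves on after the attempt against the dead address has failed or timed out. The function name and the injectable `connect` parameter are my own, added so the logic can be exercised without a network.

```python
import socket


def connect_first_available(addresses, port, timeout=3.0, connect=socket.create_connection):
    """Try each address in order and return (address, connection) for the first success.

    Each unreachable address costs up to `timeout` seconds before the next
    one is tried, which is the delay a visitor would sit through.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect((address, port), timeout=timeout)
        except OSError as exc:  # covers refused connections and timeouts
            errors.append((address, exc))
    raise ConnectionError(f"no address reachable: {errors}")
```

With two A records and the first server down, every new client that happens to pick the dead address first waits out the full timeout before it reaches the live one.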
Instead of dealing with the headaches of multi-master, just do master-slave. Have a system in place to alert you if the master goes down, so you can judge whether the downtime is severe enough to switch to the slave. And test your failover procedure regularly to make sure it will work when you need it.
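As a rough idea of the alerting side, something like the following Python sketch could decide when to page a human. It assumes some external check records True (healthy) or False (failed) for the master at a regular interval; the function name and the threshold of three consecutive failures are arbitrary choices for illustration, not a recommendation.

```python
def should_alert(check_results, threshold=3):
    """Return True if the most recent `threshold` health checks all failed.

    `check_results` is a list of booleans, oldest first, True meaning healthy.
    Requiring several failures in a row avoids paging on a single blip.
    """
    if len(check_results) < threshold:
        return False
    return not any(check_results[-threshold:])


print(should_alert([True, False, False, False]))  # True
print(should_alert([False, True, False, False]))  # False
```

The decision to actually promote the slave stays with whoever gets the alert.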