enact failover across data centers
Hi,
I currently lease Linodes in Atlanta, Dallas, Fremont, Newark, Toronto, and Tokyo 2. Over the years I have occasionally found that issues affect service or connectivity to a specific data center. As I build failover for some of my products, I am looking for something like IP sharing, or a simple way to route traffic to (a clone in) a different data center when needed.
What do you recommend?
2 Replies
As I understand it, IP ranges are data centre-specific, so you couldn’t use IP failover solutions across data centres.
The only option I can think of is to change the IP in DNS to point to your clone when you need it, and keep a low TTL on these records so it doesn’t take long for clients to pick up the change.
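For example, you can check the TTL that clients are actually caching for your record (a minimal sketch using dnspython; the hostname is just a placeholder):

```python
# Check the TTL clients will cache for an A record.
# Requires dnspython (pip install dnspython); "www.example.com" is a placeholder.
import dns.resolver

answer = dns.resolver.resolve("www.example.com", "A")
print("TTL:", answer.rrset.ttl)        # e.g. 300 -> clients re-resolve within ~5 minutes
for rdata in answer:
    print("A record:", rdata.address)  # the address currently being served
```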
If you want this to be fully automated you would need an external source to verify your primary has gone down, and use the Linode API (or your DNS provider’s if you don’t use Linode for your DNS) to update the entry accordingly. You’d also need to replicate the state between your primary and your clone so it’s as up-to-date as possible when the switch occurs.
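To make the idea concrete, here’s a rough sketch of such a watchdog, assuming the Linode API v4 domain-record endpoint; the IPs, domain/record IDs, health-check URL, and token variable are all placeholders you’d replace with your own:

```python
# External watchdog: run it somewhere outside both data centers. It points a
# DNS A record at the clone when the primary stops answering health checks.
import os
import time
import requests

PRIMARY_IP = "203.0.113.10"    # placeholder: primary Linode
CLONE_IP   = "198.51.100.20"   # placeholder: clone in another data center
DOMAIN_ID  = 123456            # placeholder: Linode domain ID
RECORD_ID  = 789012            # placeholder: ID of the A record to update
TOKEN      = os.environ["LINODE_TOKEN"]

def primary_healthy() -> bool:
    """Treat the primary as down only after several consecutive failures."""
    for _ in range(3):
        try:
            if requests.get(f"http://{PRIMARY_IP}/healthz", timeout=5).ok:
                return True
        except requests.RequestException:
            pass
        time.sleep(10)
    return False

def point_dns_at(ip: str) -> None:
    """Update the A record's target via the Linode API (v4)."""
    url = f"https://api.linode.com/v4/domains/{DOMAIN_ID}/records/{RECORD_ID}"
    resp = requests.put(url, json={"target": ip},
                        headers={"Authorization": f"Bearer {TOKEN}"}, timeout=10)
    resp.raise_for_status()

if __name__ == "__main__":
    current = None
    while True:
        desired = PRIMARY_IP if primary_healthy() else CLONE_IP
        if desired != current:
            point_dns_at(desired)
            current = desired
        time.sleep(60)   # re-check about once a minute; keep the record's TTL similarly low
```

You’d want the watchdog itself to run somewhere independent of the data centres it’s monitoring, otherwise it can go down along with the primary.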
@andysh --
You write:
You’d also need to replicate the state between your primary and your clone so it’s as up-to-date as possible when the switch occurs.
This is no easy feat to pull off either… The OP is going to have to be judicious about which changes trigger an immediate replication and which can wait for a scheduled one.
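As a rough illustration of that split (replicate() here is a hypothetical stand-in for whatever mechanism actually ships data to the clone, and the "critical" list is just an example policy):

```python
# Split changes into "replicate now" vs "batch for the next scheduled sync".
import threading

CRITICAL_FIELDS = {"password", "billing_plan"}   # example policy, not a recommendation
_pending = []
_lock = threading.Lock()

def record_change(field, value):
    if field in CRITICAL_FIELDS:
        replicate([(field, value)])              # push immediately
    else:
        with _lock:
            _pending.append((field, value))      # wait for the scheduled flush

def scheduled_flush():                           # run from cron or a timer
    global _pending
    with _lock:
        batch, _pending = _pending, []
    if batch:
        replicate(batch)                         # one bigger transfer instead of many small ones

def replicate(changes):
    """Hypothetical: send changes to the clone (rsync, DB replication, an API call, …)."""
    print(f"replicating {len(changes)} change(s)")
```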
This is also going to have a tremendous impact on bandwidth across all the OP's Linodes. I once worked on a project that involved LDAP… The customer (mistakenly) thought that every change had to trigger an immediate replication. OK…no worries… He was mighty unhappy when he discovered that nearly 100% of his bandwidth was being sucked up by LDAP replications triggered from any one of 5 sites around the world. Change a status bit; trigger a replication.
He opined that this problem had to be because of our "faulty software". When we showed him the logs of the number/magnitude of replications (sometimes queued 10 or more deep), he had to back off the blame game and start approaching the problem more rationally.
-- sw