High availability strange error
A strange issue is occurring, seemingly at random. After running for a while in the HA configuration (from here
If ha1 is the master, ha2 starts trying to take over the configured services. It tries to bring up the "floating" IP which it manages (and thus causes all my sites to go down), but it can't mount the DRBD drive as it's already mounted on ha1.
On ha1, crm_mon shows that ha1 is online and ha2 is OFFLINE.
On ha2, crm_mon shows that ha2 is online and ha1 is OFFLINE.
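When each node lists the other as OFFLINE, it usually suggests the two nodes have lost cluster communication with each other, so it helps to capture exactly what each side reports. Below is a minimal sketch, not a definitive tool: `offline_nodes` is a hypothetical helper name, and it assumes the one-shot report from `crm_mon -1` contains a line of the form `OFFLINE: [ ha2 ]`, which may differ between Pacemaker versions.

```shell
#!/bin/sh
# Hypothetical helper: list the nodes that a crm_mon one-shot report marks
# as OFFLINE. The report is read from stdin, so it can come from a live
# `crm_mon -1` or from a saved copy of its output.
# Assumption: the report contains a line like "OFFLINE: [ ha2 ]"
# (optionally indented or bulleted with "*").
offline_nodes() {
    # Strip the "OFFLINE:" label and the surrounding brackets,
    # leaving only the space-separated node names.
    sed -n 's/^[ *]*OFFLINE: *\[ *\(.*[^ ]\) *]$/\1/p'
}

# On a live cluster node (requires Pacemaker's crm_mon):
#   crm_mon -1 | offline_nodes
```

Running it on both ha1 and ha2 at the same moment and comparing the results makes the disagreement easy to paste into a post or a support ticket.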
I'm not sure which logs I should be looking at, so if anyone could help, that'd be appreciated.
Rebooting ha2 seems to work fine, so I'm guessing it might have something to do with that server… I have not tried it the other way around yet.
3 Replies
here
Hopefully someone else here will chime in.
–
Travis
I opened a Linode ticket. They couldn't really do much, but they did point me to the LISH shell, which showed an error about DRBD split brain, as I mentioned in my previous post. Unfortunately, as I suspected, that's not the cause of the problem, just a consequence of it. The actual error has something to do with the cluster management layer, which I'm clueless about.
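For anyone wanting to confirm whether DRBD itself is in a split-brain state, a rough sketch follows. It assumes DRBD 8.x, where `/proc/drbd` exists and each resource line looks like ` 0: cs:Connected ro:Primary/Secondary ...`; `drbd_cstates` is a hypothetical helper name, not part of DRBD.

```shell
#!/bin/sh
# Hypothetical helper: print "<minor> <connection state>" for each DRBD
# resource found in a /proc/drbd-style report read from stdin.
# Assumption: DRBD 8.x line layout, for example
#   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----"
drbd_cstates() {
    # Keep only resource lines and extract the minor number and cs: value.
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*$/\1 \2/p'
}

# On a live node (assumes DRBD 8.x, where /proc/drbd exists):
#   drbd_cstates < /proc/drbd
```

A healthy pair normally reports `Connected`. After a split brain, at least one node typically sits in `StandAlone`, so seeing `Connected` on both sides would support the view that DRBD is not where the trouble starts.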
At the moment I've set ha2 to standby (crm node standby ha2), which stops it from taking all my sites down, but the underlying error still exists. It's worth noting that because ha2 hasn't tried to take over, the DRBD split-brain situation hasn't arisen, which is why I think that's not the root of the problem.
Even stranger is that yesterday ha2 was standby + OFFLINE, but today (without restarting ha2), it is just standby (therefore online).
I don't even know what to look for in the logs… I'm considering just dropping the second Linode completely and going back to a single Linode. This hassle just isn't worth the extra money I'm spending…