Automated failover strategy for master-slave MySQL replication - why would this not work?

Solution 1:

What you want is a High Availability cluster, and I think your suggested approach is a bit strange.

A good way to achieve this is to create a Linux HA cluster and replicate your MySQL data with DRBD, which mirrors the underlying block device between the nodes.

In such a setup you have 3 things:

  1. The Cluster Messaging Layer (Linux-HA or Corosync)
  2. The Cluster Resource Manager (Pacemaker)
  3. The disk sync (DRBD)

Instead of writing a lot of failover code in your application, you use a virtual IP address that moves to whichever node is currently active, so clients always connect to the same address. You also use STONITH (Shoot The Other Node In The Head (I did not make this up)) to make sure that the first node is actually dead before the second node tries to take over the resources.
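To illustrate the ordering described above (fence first, then take over), here is a minimal Python sketch. It is not how Pacemaker is implemented or configured; `take_over`, `fence_peer` and the resource names are hypothetical and only model the rule that nothing is started until the other node is confirmed dead.

```python
from typing import Callable, List, Tuple

# Hypothetical model of "fence first, then take over".
# fence_peer: returns True only if the other node is confirmed powered off.
# resources:  (name, start function) pairs, started in the given order.
def take_over(
    fence_peer: Callable[[], bool],
    resources: List[Tuple[str, Callable[[], None]]],
) -> List[str]:
    """Return the names of the resources started on this node."""
    if not fence_peer():
        # The peer may still be alive and writing; starting anything now
        # risks two active nodes (split brain), so do nothing.
        return []
    started = []
    for name, start in resources:
        start()
        started.append(name)
    return started


if __name__ == "__main__":
    noop = lambda: None
    # Assumed order: storage first, then the database, then the virtual IP.
    plan = [("drbd_primary", noop), ("mysql", noop), ("virtual_ip", noop)]
    print(take_over(lambda: True, plan))
    print(take_over(lambda: False, plan))
```

In a real cluster the resource manager enforces this ordering for you; the point of the sketch is only that the virtual IP moves last and never moves if fencing fails.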

There's some great material to read at these links: http://www.linux-ha.org/wiki/Main_Page http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo http://theclusterguy.clusterlabs.org/

Solution 2:

The reason automatic failover is risky with replication has to do with replication lag. If the slave happens to be behind when failover occurs, you may end up writing updates against keys that do not exist yet, because the inserts from the master have not been applied on the slave. The more replication lag, the bigger this problem is. At my company we use DRBD for automatic failover, since the DRBD server you fail over to is an exact disk-level copy of the original master. As a policy, we fail over manually for master/slave and master/master setups.
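As a rough illustration of the lag concern, here is a small Python sketch that decides whether a slave looks caught up enough to be a failover candidate. It assumes you have already fetched one row of SHOW SLAVE STATUS into a dict; the function name `lag_allows_failover` and the `max_lag_seconds` threshold are made up for this example. Note that Seconds_Behind_Master is only an estimate and is NULL when the SQL thread is not running, so the sketch treats an unknown value as unsafe.

```python
from typing import Any, Mapping

def lag_allows_failover(status: Mapping[str, Any], max_lag_seconds: int = 0) -> bool:
    """Return True only if the slave reports lag within the allowed threshold.

    status is one row of SHOW SLAVE STATUS as a dict (assumed input).
    """
    # If the SQL thread is not running, relay log events are not being
    # applied, so the slave cannot be assumed to be current.
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL means the lag is unknown; treat unknown as unsafe.
        return False
    # Some connectors return the value as a string, so normalise it.
    return int(lag) <= max_lag_seconds


if __name__ == "__main__":
    caught_up = {"Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0}
    behind = {"Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 42}
    print(lag_allows_failover(caught_up))
    print(lag_allows_failover(behind))
```

A check like this only narrows the window; it does not remove it, which is why a disk-level copy or a manual decision is the safer policy.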