I’ve been thinking recently about the failure scenarios of MySQL replication clusters, such as master-master pairs or master-master-with-slaves. A few tools are designed to help manage failover and load balancing in such clusters by moving virtual IP addresses around. The ones I’m familiar with don’t always do the right thing when they detect an irregularity, so I’ve been debating what the best way to build a replication cluster with automatic failover really is.
I’d like to hear your thoughts on the following question: which failure scenarios call for which kind of response from such a tool?
I can think of a number of failures. Let me give just a few simple examples in a master-master pair:
- Problem: Query overload on the writable master makes mysqld unresponsive
- Do nothing. Moving the queries to another server will just overload that server too, causing cascading failures.
- Problem: The …