In MySQL a big portion of the problems you’re facing is different replication delays so there’s no surprise this is one of the most common interview questions on DBA interviews and I still found people having problems with explaining what they would do in a certain situation. This is why I decided to write a bit about the subject.
1. IO
99% of times the reason is IO. The slave cannot keep up with the amount of writes it gets from the binary logs while in parallel it has to return results of queries as well. In spite of the common belief in MySQL it’s much easier to saturate the disk subsystem (even a raid 10 with 8 SAS disk and cache) than the network.
In this situation all you can do is try to remove some pressure from the slaves.
One way to do this is setting innodb_flush_log_at_trx_commit to 2 if it used to be 1. Usually this is enough.
…[Read more]