High disk IO with MySQL master-master replication

3 votes
3 answers
1342 views
mysql replication master-master-replication semi-sync-replication mysql-8.0
                          We've been evaluating an active-passive master-master replication setup for our MySQL database, running on version 8.0. While diagnosing some general slowness issues, we were surprised to see very high disk IO ranging from 500MB/s - 1500MB/s on both servers.

I've found that stopping the slave on either server brings the disk IO back to expected levels, 15MB/s - 30 MB/s, and I'm concerned that we have a misconfiguration somewhere.

Here is what I've checked:

* The server ID's are unique.
* We are doing row-based replication.
* We are using the semi-synchronous replication plugin.
* The writes are happening in an LVM volume that exclusively houses the data directory.
* vmstat shows no swapping.
* From the network perspective, we see a few megabits per second going between the servers at most.

Here is the redacted configuration:
    
    [mysqld]
    ...
    
    # General
    server_id                        =                          
    event-scheduler                  = ON                              
    sync_binlog                      = 1                               
    log_bin                          = /var/lib/mysql/mysql-bin        
    binlog_format                    = ROW                             
    innodb_file_per_table                                              
    innodb_flush_log_at_trx_commit   =  1                              
    skip-external-locking                                                                                                
    
    
    # Replication
    relay_log                        = /var/lib/mysql/mysql-relay-bin
    relay_log_recovery               = ON
    log_slave_updates                = 1
    sync_master_info                 = 1
    sync_relay_log                   = 1
    sync_relay_log_info              = 1
    rpl_semi_sync_master_enabled     = 1
    rpl_semi_sync_master_timeout     = 10000    # 10 seconds
    rpl_semi_sync_slave_enabled      = 1
    
    
    # Fine Tuning
    key_buffer_size          = 16M
    max_allowed_packet       = 16M
    thread_stack             = 192K
    thread_cache_size        = 8
    myisam-recover-options   = BACKUP           
    max_connections          = 800
    max_connect_errors       = 1000
    innodb_buffer_pool_size  = 192G    # Server has 256GB RAM
    innodb_log_buffer_size   = 256M
    innodb_log_file_size     = 2G
    innodb_flush_method      = O_DIRECT
    innodb_lock_wait_timeout = 60
    ...

My problem sounds suspiciously similar to this person's dead thread  from 2013 (Who were you, DenverCoder9? WHAT DID YOU SEE!? )

**EDIT:**

In an effort to zero in on the problem, we simplified the setup for testing. We used one master and one slave, both running 8.0.11, same configuration as above, and we saw the following write patterns according to iostat (master is blue, slave is red):



(The "Time" on the X-axis is the minutes and seconds of the current time.)

Those per-minute 1+ GB spikes on the master were tracked down to a script updating two small tables (1000-2000 rows each), around 10-15 MB in size on disk. These tables had keys, but the keys were not primary or unique. The script would DELETE all rows in the tables and then repopulate them by performing one INSERT statement per row, all within a transaction. Despite the glaring problems, I still don't understand how operations on tables this small can result in this much disk IO.

Without fixing the problems, we then returned to the master-master active-passive setup and saw the following (active master is blue, passive master is red):



So some additional questions I have are this. Why is the slave consistently writing much more than the master in the first graph? Also, why in master-master replication is there the ping-pong effect seen above where writes to the master result in heavy IO on the slave, which results in heavy IO again on the master?



                        
Asked by billyw (151 rep)
May 21, 2018, 05:35 PM
Last activity: Jun 18, 2018, 06:37 PM
High disk IO with MySQL master-master replication

Related Questions