
MariaDB master feeds slave's Slave_IO with a low rate

2 votes
0 answers
56 views
I have a pair of identical servers: Ubuntu 22.04 LTS with MariaDB 11.7.2, 32 CPU cores, 128 GB RAM, 2x2TB NVMe, i210 gigabit Ethernet. Classic master-slave replication (logfile/logpos) is configured. Everything works fine except that the slave slowly lags behind the master. The slave is in this state:
Slave_SQL_State: Slave has read all relay log; waiting for more updates
       Slave_IO_State: Waiting for master to send event
      Master_Log_File: log-bin.062358
  Read_Master_Log_Pos: 43451522
    . . . . 
Seconds_Behind_Master: 0
The master is in this state:
1201054 repl 1.2.3.4:45678  NULL  Binlog Dump  81402  Writing to net  NULL  0.000
The problem is that the newest master binlog file is log-bin.069669, 7200+ files ahead at 100 MB each, so the slave is 700+ GB behind the master.

There are a lot of updates on the master, approximately 500 MB of binlog per minute. 500 MB/min = 4000 Mb/min ≈ 67 Mb/s, which is far below the available 1 Gb/s. The average load on the master is quite low. Ping between the servers is 25 ms.

I have changed the binlog_row_image variable from FULL to MINIMAL (advice from https://dba.stackexchange.com/a/289768/7895) with no visible effect.

The only odd symptom is that the slave shows zero seconds behind master most of the time, and only sometimes am I lucky enough to see the real lag with show slave status\G.

Has anyone encountered a similar problem? What was the cause, and how did you overcome it?

UPDATE
------
Master:
MariaDB [(none)]> show master status;
+----------------+----------+-----------------+------------------+
| File           | Position | Binlog_Do_DB    | Binlog_Ignore_DB |
+----------------+----------+-----------------+------------------+
| log-bin.073714 | 96274193 | aaa,bbb,ccc,ddd |                  |
+----------------+----------+-----------------+------------------+
1 row in set (0.000 sec)
Slave:
MariaDB [(none)]> show all slaves status \G
*************************** 1. row ***************************
               Connection_name:
               Slave_SQL_State: Slave has read all relay log; waiting for more updates
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 1.2.3.4
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: log-bin.065969
           Read_Master_Log_Pos: 65381018
                Relay_Log_File: relay-bin.012836
                 Relay_Log_Pos: 6879217
         Relay_Master_Log_File: log-bin.065969
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 65345606
               Relay_Log_Space: 18336490
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: Yes
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: Yes
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 7
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 0-7-1737276444
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: conservative
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
              Slave_DDL_Groups: 1
Slave_Non_Transactional_Groups: 0
    Slave_Transactional_Groups: 33122121
          Replicate_Rewrite_DB:
          Retried_transactions: 1
            Max_relay_log_size: 1073741824
          Executed_log_entries: 142786513
     Slave_received_heartbeats: 0
        Slave_heartbeat_period: 30.000
                Gtid_Slave_Pos: 0-7-1737276444
        Master_last_event_time: 2025-06-06 19:43:36
         Slave_last_event_time: 2025-06-06 19:43:36
        Master_Slave_time_diff: 0
1 row in set (0.001 sec)
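
Since Seconds_Behind_Master reads 0 here, one way to see the real gap is to compare the master's File/Position with the slave's Master_Log_File/Read_Master_Log_Pos. The following is a rough sketch in Python using the values from the two outputs above; the helper names are hypothetical, and the 100 MB file size is an assumption taken from the figure quoted in the question, not read from the server.

```python
import re

# Assumption: every binlog file is roughly 100 MB (decimal), as stated in the question.
ASSUMED_BINLOG_FILE_BYTES = 100 * 1000 * 1000


def binlog_index(file_name):
    """Return the numeric suffix of a binlog name such as 'log-bin.073714'."""
    match = re.search(r"\.(\d+)$", file_name)
    if match is None:
        raise ValueError(f"no numeric suffix in {file_name!r}")
    return int(match.group(1))


def io_thread_lag(master_file, master_pos, slave_file, slave_pos,
                  file_bytes=ASSUMED_BINLOG_FILE_BYTES):
    """Return (files_behind, approx_bytes_behind) for the slave I/O thread."""
    files_behind = binlog_index(master_file) - binlog_index(slave_file)
    # n whole files plus the difference of the offsets inside the two files.
    approx_bytes = files_behind * file_bytes + (master_pos - slave_pos)
    return files_behind, approx_bytes


# Values copied from SHOW MASTER STATUS and SHOW ALL SLAVES STATUS above.
files, approx = io_thread_lag("log-bin.073714", 96274193,
                              "log-bin.065969", 65381018)
print(f"{files} files, roughly {approx / 1e9:.0f} GB not yet fetched")
```

The byte figure is only as good as the assumed file size; the file count is exact.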

MariaDB [(none)]> show global variables like '%parallel%';
+-------------------------------+--------------+
| Variable_name                 | Value        |
+-------------------------------+--------------+
| slave_domain_parallel_threads | 0            |
| slave_parallel_max_queued     | 131072       |
| slave_parallel_mode           | conservative |
| slave_parallel_threads        | 4            |
| slave_parallel_workers        | 4            |
+-------------------------------+--------------+
5 rows in set (0.001 sec)
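
The back-of-envelope numbers from the original question (files behind, backlog size, and the bandwidth needed to keep up) can be re-derived as follows. This is a small sketch with hypothetical function names; it assumes decimal units (1 GB = 1000 MB, 1 MB = 8 Mb) and the 100 MB file size quoted in the question.

```python
def binlog_backlog(newest_index, slave_index, file_mb=100):
    """Return (files_behind, backlog_gb), assuming file_mb megabytes per binlog file."""
    files_behind = newest_index - slave_index
    return files_behind, files_behind * file_mb / 1000


def mb_per_min_to_mbit_per_s(mb_per_min):
    """Convert megabytes per minute to megabits per second."""
    return mb_per_min * 8 / 60


# log-bin.069669 on the master versus log-bin.062358 on the slave.
files_behind, backlog_gb = binlog_backlog(69669, 62358)
# About 500 MB of binlog is produced per minute.
required_mbit_s = mb_per_min_to_mbit_per_s(500)
# Share of a 1 Gb/s link (1000 Mb/s) that this rate would occupy.
link_share = required_mbit_s / 1000

print(f"{files_behind} files, {backlog_gb:.1f} GB behind")
print(f"{required_mbit_s:.1f} Mb/s needed, {link_share:.1%} of 1 Gb/s")
```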
UPDATE #2
---------
When I run stop slave; start slave; the lag decreases for some time, but then everything reverts. A steady decrease in lag was achieved only when replication was restarted every minute or so.

[Image: Slave lag]

I think this proves that the VPN is the cause of the problem. I'm not a VPN guy; can anyone at least give me some keywords to google?

UPDATE #3
---------
I started a netcat listener on the slave host:
nc -l 23456 | pv -r -b > /dev/null
and a feeder on the master side:
cat /mnt/repl/log-bin.*[0-9] | pv -L 10M -s 100G -S -r -b | nc -4 1.2.3.4 23456
I got a stable 10 MB/s stream, so I'm now confident it isn't a networking issue.
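
To put the 10 MB/s figure in context, it can be compared with the roughly 500 MB/min of binlog the master produces. The sketch below estimates how long a 700 GB backlog would take to drain if the slave really received 10 MB/s. The function name is hypothetical, units are treated as decimal for simplicity, and because pv -L caps the stream, 10 MB/s is only a lower bound on what the link can carry.

```python
def catch_up_days(backlog_mb, transfer_mb_per_s, produced_mb_per_min):
    """Days needed to drain the backlog, or None if it would never shrink."""
    surplus_mb_per_min = transfer_mb_per_s * 60 - produced_mb_per_min
    if surplus_mb_per_min <= 0:
        return None  # the master writes binlog at least as fast as it is fetched
    return backlog_mb / surplus_mb_per_min / (60 * 24)


# Figures from the question: ~700 GB backlog, 10 MB/s stream, ~500 MB/min produced.
days = catch_up_days(backlog_mb=700_000, transfer_mb_per_s=10,
                     produced_mb_per_min=500)
print(f"about {days:.1f} days to catch up")
```

At 10 MB/s the surplus over the production rate is small, so even a healthy link at that rate would need days to clear the backlog.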
Asked by Kondybas (4800 rep)
Jun 7, 2025, 10:26 AM
Last activity: Jun 12, 2025, 07:27 AM