
Cluster MariaDB Primary + Two Replicas orchestrated via MaxScale: how to recover from a major disruption?

0 votes
0 answers
76 views
I am working on a MariaDB primary/replica cluster handled via MaxScale, to which I successfully added a third node. When I left things a few days ago, everything was fine. Today I found that all nodes were down, supposedly because of a power outage. This is a lab environment, so nobody really cared to check before rebooting the hardware. Because it is a lab environment, I could easily restart from the last known-good point, but since I'm here, I'd like to take the chance and learn something.

I managed to reboot all nodes except one, which is not getting back into the cluster. These are my variables right now:
mariadb221 [(none)]> show global variables like '%gtid%';

Variable_name            Value
gtid_binlog_pos          1-3000-1359255
gtid_binlog_state        1-2000-358368,1-1000-1359225,1-3000-1359255
gtid_cleanup_batch_size  64
gtid_current_pos         1-3000-1359255
gtid_domain_id           1
gtid_ignore_duplicates   OFF
gtid_pos_auto_engines
gtid_slave_pos           1-3000-1359255
gtid_strict_mode         ON
wsrep_gtid_domain_id     0
wsrep_gtid_mode          OFF
mariadb222 [(none)]> show global variables like '%gtid%';

Variable_name            Value
gtid_binlog_pos          1-1000-1359231
gtid_binlog_state        1-2000-358368,1-3000-359229,1-1000-1359231
gtid_cleanup_batch_size  64
gtid_current_pos         1-1000-1359254
gtid_domain_id           1
gtid_ignore_duplicates   OFF
gtid_pos_auto_engines
gtid_slave_pos           1-1000-1359254
gtid_strict_mode         ON
wsrep_gtid_domain_id     0
wsrep_gtid_mode          OFF
mariadb223 [(none)]> show global variables like '%gtid%';
Variable_name            Value
gtid_binlog_pos          1-3000-1359255
gtid_binlog_state        1-1000-1359230,1-2000-358368,1-3000-1359255
gtid_cleanup_batch_size  64
gtid_current_pos         1-3000-1359255
gtid_domain_id           1
gtid_ignore_duplicates   OFF
gtid_pos_auto_engines
gtid_slave_pos           1-3000-1359255
gtid_strict_mode         ON
wsrep_gtid_domain_id     0
wsrep_gtid_mode          OFF
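For context, the usual first step with output like the above is to compare each replica's GTID state against the primary's and read the replication error on the stalled node. The following is a minimal sketch of that check and of one common re-point procedure when a replica has diverged under `gtid_strict_mode=ON`; the hostname and the GTID value are illustrative assumptions, not values confirmed by this post, and `RESET MASTER` destroys the node's own binlogs, so treat this as a lab-only sketch:

```sql
-- On the stalled replica: find out why replication stopped.
-- Look at Last_IO_Error / Last_SQL_Error and Gtid_IO_Pos.
SHOW SLAVE STATUS\G

-- If this node's gtid_binlog_state contains transactions the primary never
-- saw (divergence), gtid_strict_mode=ON will refuse to reconnect it.
-- One option: discard the diverged binlog and re-seed the GTID position
-- from a point known to exist on the primary (value assumed here):
STOP SLAVE;
RESET MASTER;                                   -- wipes this node's binlogs and gtid_binlog_state
SET GLOBAL gtid_slave_pos = '1-1000-1359225';   -- assumed last GTID shared with the primary
CHANGE MASTER TO
  MASTER_HOST = 'mariadb221',                   -- assumed current primary
  MASTER_USE_GTID = slave_pos;
START SLAVE;
```

If the replica then stops again on a duplicate-key or missing-row error, the data itself has diverged and re-seeding from a backup (as described below) is the safer route.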
Aside from deleting the bad node and recreating it as if it were a new one (taking a backup of the remaining replica, restoring that backup onto it, and adding it to the cluster again), is there anything I can do to recover from this position? Any suggestions on anything else I should check would also be appreciated. P.S.: the bad node is the second one (mariadb222).
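For completeness, the rebuild-from-backup path mentioned above is typically done with `mariabackup`. A rough sketch, assuming the healthy replica is mariadb223, the bad node is mariadb222, and the paths and user are placeholders:

```shell
# On the healthy replica (mariadb223): take a full physical backup.
mariabackup --backup --target-dir=/backup/full --user=backup_user -p

# Prepare the backup so the data files are consistent.
mariabackup --prepare --target-dir=/backup/full

# Copy /backup/full to mariadb222, then on mariadb222 with mariadbd stopped:
mariabackup --copy-back --target-dir=/backup/full
chown -R mysql:mysql /var/lib/mysql

# The GTID position to resume from is recorded in the backup's
# xtrabackup_binlog_info file; use it for SET GLOBAL gtid_slave_pos
# before pointing the node back at the primary with MASTER_USE_GTID=slave_pos.
```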
Asked by albea798 (1 rep)
Oct 20, 2024, 09:57 AM
Last activity: Oct 23, 2024, 11:06 AM