
How to restart MariaDB Galera cluster?

5 votes
3 answers
47678 views
After all nodes crashed, I tried to recover the cluster, but without success. I have only 2 nodes. As the documentation says, I set a parameter on one of the nodes:

set global wsrep_provider_options="pc.bootstrap=true";

And then I tried to start the first node:

systemctl start mariadb

After that I get an error:

Oct 11 16:11:12 proxy1 sh: 2016-10-11 16:11:12 140291677038720 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB) starting as process 2402 ...
Oct 11 16:11:15 proxy1 sh: WSREP: Recovered position b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] /usr/sbin/mysqld (mysqld 10.1.18-MariaDB) starting as process 2434 ...
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Read nil XID from storage engines, skipping position init
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_load(): Galera 25.3.18(r3632) by Codership Oy loaded successfully.
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: CRC-32C: using hardware acceleration.
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Found saved state: b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:-1
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.0.41; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false;
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140046790919936 [Note] WSREP: Service thread queue flushed.
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Assign initial position for certification: 141, protocol version: -1
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: wsrep_sst_grab()
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Start replication
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Setting initial position to b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: protonet asio version 0
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: Using CRC-32C for message checksums.
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: backend: asio
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: gcomm thread scheduling priority set to other:0
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: restore pc from disk failed
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: GMCast version 0
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: EVS version 0
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: gcomm: connecting to group 'test_cluster', peer '192.168.0.41:,192.168.0.42:'
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 30a7b2e6 tcp://192.168.0.41:4567
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.41:4567' points to own listening address, blacklisting
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 30a7b2e6 tcp://192.168.0.41:4567
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') connection established to 1ef15511 tcp://192.168.0.42:4567
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: declaring 1ef15511 at tcp://192.168.0.42:4567 stable
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Warning] WSREP: no nodes coming from prim view, prim not possible
Oct 11 16:11:15 proxy1 mysqld: 2016-10-11 16:11:15 140047023368320 [Note] WSREP: view(view_id(NON_PRIM,1ef15511,2) memb {
Oct 11 16:11:15 proxy1 mysqld: 1ef15511,0
Oct 11 16:11:15 proxy1 mysqld: 30a7b2e6,0
Oct 11 16:11:15 proxy1 mysqld: } joined {
Oct 11 16:11:15 proxy1 mysqld: } left {
Oct 11 16:11:15 proxy1 mysqld: } partitioned {
Oct 11 16:11:15 proxy1 mysqld: })
Oct 11 16:11:18 proxy1 mysqld: 2016-10-11 16:11:18 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting off
Oct 11 16:11:19 proxy1 mysqld: 2016-10-11 16:11:19 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.42:4567
Oct 11 16:11:20 proxy1 mysqld: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: forgetting 1ef15511 (tcp://192.168.0.42:4567)
Oct 11 16:11:20 proxy1 mysqld: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: (30a7b2e6, 'tcp://0.0.0.0:4567') turning message relay requesting off
Oct 11 16:11:20 proxy1 mysqld: 2016-10-11 16:11:20 140047023368320 [Warning] WSREP: no nodes coming from prim view, prim not possible
Oct 11 16:11:20 proxy1 mysqld: 2016-10-11 16:11:20 140047023368320 [Note] WSREP: view(view_id(NON_PRIM,30a7b2e6,3) memb {
Oct 11 16:11:20 proxy1 mysqld: 30a7b2e6,0
Oct 11 16:11:20 proxy1 mysqld: } joined {
Oct 11 16:11:20 proxy1 mysqld: } left {
Oct 11 16:11:20 proxy1 mysqld: } partitioned {
Oct 11 16:11:20 proxy1 mysqld: 1ef15511,0
Oct 11 16:11:20 proxy1 mysqld: })
Oct 11 16:11:25 proxy1 mysqld: 2016-10-11 16:11:25 140047023368320 [Note] WSREP: cleaning up 1ef15511 (tcp://192.168.0.42:4567)
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [Note] WSREP: view((empty))
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Oct 11 16:11:46 proxy1 mysqld: at gcomm/src/pc.cpp:connect():162
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1380: Failed to open channel 'test_cluster' at 'gcomm://192.168.0.41,192.168.0.42': -110 (Connection timed out)
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: gcs connect failed: Connection timed out
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [ERROR] WSREP: wsrep::connect(gcomm://192.168.0.41,192.168.0.42) failed: 7
Oct 11 16:11:46 proxy1 mysqld: 2016-10-11 16:11:46 140047023368320 [ERROR] Aborting
Oct 11 16:11:47 proxy1 systemd: mariadb.service: main process exited, code=exited, status=1/FAILURE
Oct 11 16:11:47 proxy1 systemd: Failed to start MariaDB database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
--
-- The result is failed.
Oct 11 16:11:47 proxy1 systemd: Unit mariadb.service entered failed state.
Oct 11 16:11:47 proxy1 systemd: mariadb.service failed.
Oct 11 16:11:47 proxy1 polkitd: Unregistered Authentication Agent for unix-process:2360:148848 (system bus name :1.15, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

How can I recover the cluster?
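
The startup log contains the line "WSREP: Recovered position b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141", which gives the cluster UUID and the last committed sequence number on this node. When every node has gone down, this position is usually compared across the nodes so that the cluster is bootstrapped from the most advanced one. Below is a minimal Python sketch of that comparison. It assumes the log text of each node is available as a string; the function names and the dictionary layout are illustrative only and are not part of MariaDB or Galera.

```python
import re

# Matches the line printed during wsrep position recovery, for example:
# "WSREP: Recovered position b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141"
RECOVERED_RE = re.compile(r"WSREP: Recovered position ([0-9a-f-]{36}):(-?\d+)")


def recovered_position(log_text):
    """Return (cluster_uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def most_advanced_node(positions):
    """Given {node_name: (uuid, seqno)}, return the node with the highest seqno.

    Raises ValueError if the nodes do not share a single cluster UUID,
    because seqnos from different cluster histories are not comparable.
    """
    uuids = {uuid for uuid, _ in positions.values()}
    if len(uuids) != 1:
        raise ValueError("nodes report different cluster UUIDs: %s" % sorted(uuids))
    return max(positions, key=lambda name: positions[name][1])


# Line taken from the log in the question.
sample = ("Oct 11 16:11:15 proxy1 sh: WSREP: Recovered position "
          "b6c1dc93-8fa7-11e6-933e-e64cd44e3be0:141")
print(recovered_position(sample))
```

Running `recovered_position` on the log of each node and passing the results to `most_advanced_node` shows which node holds the latest state. The position of the second node is not shown in the question, so it would have to be read from that node's own log.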
Asked by Oleksandr (403 rep)
Oct 11, 2016, 10:20 AM
Last activity: Jul 29, 2023, 12:09 PM