Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0 votes, 1 answer, 139 views
How do I add 2 missing nodes to MariaDB Galera Cluster using IST and not SST
I inherited an issue where a MariaDB Galera Cluster has only one of its 3 nodes working.
Node1: up and running in production and accepting reads and writes
Node2: fresh install, it just needs to sync with Node1
Node3: fresh install, it just needs to sync with Node1
If I run `service mysql start` on Node2/Node3, the state of wsrep_local_state_comment changes from Synced to Donor/Desynced and an SST is triggered, but that means a lot of downtime.
I have 2TB of databases and it's going to take about 5-7 hours for a newly added node to sync, and I can't afford the downtime (to give an idea of the write volume, during the day I generate 1GB/minute of binlogs and at night about 4GB/hour).
I take a DB backup every night with Percona XtraBackup and I know how to restore it on a fresh server.
I have the Galera cache (gcache.size) set at 32GB.
Is there a way of restoring the other 2 nodes without triggering an SST?
I was thinking of restoring the backup of Node1 to Node2; maybe that would trigger an IST.
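For reference, a minimal sketch of that backup-seeding idea, assuming the nightly XtraBackup copy carries an xtrabackup_galera_info file; the datadir and backup paths are placeholders, and whether IST actually happens still depends on Node1's gcache covering the gap since the backup:

    # On Node2: stop MySQL and restore the prepared XtraBackup copy taken from Node1.
    service mysql stop
    rm -rf /var/lib/mysql/*                              # assumed datadir
    xtrabackup --copy-back --target-dir=/backups/latest  # placeholder backup path
    chown -R mysql:mysql /var/lib/mysql

    # xtrabackup_galera_info holds "<cluster_uuid>:<seqno>" as of backup time.
    UUID_SEQNO=$(cat /backups/latest/xtrabackup_galera_info)

    # Seed grastate.dat so the joiner reports a known position instead of -1.
    cat > /var/lib/mysql/grastate.dat <<EOF
    # GALERA saved state
    version: 2.1
    uuid:    ${UUID_SEQNO%:*}
    seqno:   ${UUID_SEQNO#*:}
    safe_to_bootstrap: 0
    EOF
    chown mysql:mysql /var/lib/mysql/grastate.dat

    # If Node1's gcache still holds every write-set since that seqno,
    # the node should join via IST rather than a full SST.
    service mysql start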
netrangermike
(1 rep)
Feb 21, 2023, 12:16 AM
• Last activity: Aug 5, 2025, 09:06 PM
0 votes, 1 answer, 1355 views
InnoDB Buffer causing a lot of deadlocks and connection/client timeouts
I have a PXC cluster (5.7) with three nodes running on CentOS 7 VMs, each with the following system info:
mysql_db# free
total used free shared buff/cache available
Mem: 8009248 2591576 358504 146040 5059168 4875716
Swap: 5242876 3592 5239284
mysql_db# free -g
total used free shared buff/cache available
Mem: 7 2 0 0 4 4
Swap: 4 0 4
mysql_db# nproc
8
mysql_db# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cl-root 44G 20G 25G 46% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 146M 3.7G 4% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sda1 1014M 272M 743M 27% /boot
tmpfs 783M 0 783M 0% /run/user/1318214225
tmpfs 783M 0 783M 0% /run/user/0
mysql_db# top
top - 07:39:38 up 50 days, 4:54, 2 users, load average: 0.46, 0.61, 0.61
Tasks: 227 total, 1 running, 226 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20.3 us, 4.0 sy, 0.0 ni, 75.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8009248 total, 357176 free, 2592348 used, 5059724 buff/cache
KiB Swap: 5242876 total, 5239284 free, 3592 used. 4874700 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1574 mysql 20 0 9241472 2.3g 238932 S 104.8 30.1 6343:02 mysqld
3988 root 20 0 172268 2348 1580 R 19.0 0.0 0:00.12 top
13164 root 20 0 177264 94680 4676 S 19.0 1.2 2693:42 mysqld_exporter
1 root 20 0 191404 4420 2612 S 0.0 0.1 187:50.00 systemd
mysql_db# iostat
Linux 3.10.0-862.14.4.el7.x86_64 (dev_mysql_02.pd.local) 11/20/2018 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.35 0.00 0.36 0.05 0.00 97.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 3.83 153.19 153.17 664744979 664666279
dm-0 3.74 139.26 153.17 604320443 664647691
dm-1 0.00 0.00 0.00 2236 2460
mysql_db# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 1
Model name: AMD Opteron(TM) Processor 6276
Stepping: 2
CPU MHz: 2300.028
BogoMIPS: 4600.05
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 12288K
NUMA node0 CPU(s): 0-7
Below are some of the important InnoDB and server variables in the my.cnf config file:
...
back_log = 65535
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
default_storage_engine = InnoDB
enforce-gtid-consistency = 1
expand_fast_index_creation = 1
gtid_mode = ON
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 3
innodb_buffer_pool_size = 3G
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_log_file_size = 256M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_write_io_threads = 64
long_query_time = 1
log_bin_trust_function_creators = 1
master_info_repository = TABLE
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 2600
max_user_connections = 2500
min_examined_row_limit = 1000
relay_log_info_repository = TABLE
relay-log-recovery = TRUE
skip-name-resolve
slave_parallel_workers = 8
slow_query_log = 1
#slow_query_log_timestamp_always = 1
#thread_cache = 1024
tmpdir = /srv/tmp
transaction_isolation = READ-COMMITTED
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
userstat
table_open_cache = 4096
innodb_open_files = 10240
open_files_limit = 10240
connect_timeout=60
thread_cache_size = 4096
sql_mode = NO_ENGINE_SUBSTITUTION
query_cache_size = 0
slave_pending_jobs_size_max=32M
range_optimizer_max_mem_size=8M
log_timestamps=SYSTEM
server-id= 172162 #use IP
userstat=1
...
As seen above, I have set innodb_buffer_pool_size to 3GB, but I am not sure whether this is a good value given the system info above, especially since total RAM is only 7.6GB, with about 5GB allocated to swap. We have noticed frequent timeouts and deadlocks in the MySQL error logs, and running SHOW ENGINE INNODB STATUS shows this:
Status:
=====================================
2018-11-20 07:46:14 0x7f6acc0f9700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 8 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 201141 srv_active, 0 srv_shutdown, 2836117 srv_idle
srv_master_thread log flush and writes: 3037214
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 32658
OS WAIT ARRAY INFO: signal count 33393
RW-shared spins 0, rounds 156414, OS waits 27395
RW-excl spins 0, rounds 222081, OS waits 1171
RW-sx spins 11313, rounds 167795, OS waits 1583
Spin rounds per wait: 156414.00 RW-shared, 222081.00 RW-excl, 14.83 RW-sx
------------------------
LATEST FOREIGN KEY ERROR
------------------------
2018-11-20 07:06:51 0x7f6aad2eb700 Transaction:
TRANSACTION 25842563, ACTIVE 1 sec updating or deleting
mysql tables in use 1, locked 1
12 lock struct(s), heap size 1136, 6 row lock(s), undo log entries 3
MySQL thread id 577793, OS thread handle 140096148780800, query id 3550710 10.168.103.11 slashire_dev update
INSERT INTO member_bank_acct (
ba_member_id
,ba_create_user
,ba_update_user
,ba_country
,ba_name
,ba_currency
,ba_bank_name
,ba_bank_acct_num
,ba_swift_code
,ba_city
,ba_branch
,ba_iban
,ba_beneficiary_bank
) VALUES ('00000000000000000252','SYSTEM','SYSTEM','','','','','','','','','','') ON DUPLICATE KEY UPDATE ba_country = '',ba_name = '',ba_currency = '',ba_bank_name = '',ba_bank_acct_num = '',ba_swift_code = '',ba_city = '',ba_branch = '',ba_iban = '',ba_beneficiary_bank = ''
Foreign key constraint fails for table `slashire_dev`.`member_bank_acct`:
, CONSTRAINT `member_bank_acct_fk02` FOREIGN KEY (`ba_country`) REFERENCES `country` (`c_code`)
Trying to add in child table, in index member_bank_acct_fk02 tuple:
DATA TUPLE: 2 fields;
0: len 0; hex ; asc ;;
1: len 20; hex 3030303030303030303030303030303030323532; asc 00000000000000000252;;
But in parent table `slashire_dev`.`country`, in index PRIMARY,
the closest match we can find is record:
PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 2; hex 3030; asc 00;;
1: len 6; hex 00000000566c; asc Vl;;
2: len 7; hex b0000001eb0110; asc ;;
3: len 3; hex 416c6c; asc All;;
4: len 2; hex 8000; asc ;;
5: len 5; hex 999f0c0000; asc ;;
6: len 6; hex 53595354454d; asc SYSTEM;;
7: len 5; hex 999f0c0000; asc ;;
8: len 6; hex 53595354454d; asc SYSTEM;;
------------
TRANSACTIONS
------------
Trx id counter 25989367
Purge done for trx's n:o < 25989079 undo n:o < 0 state: running but idle
History list length 27
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421576625901616, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625897008, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625900464, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625899312, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625895856, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625898160, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625890096, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625893552, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625888944, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625867056, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625894704, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625892400, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625891248, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625886640, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625887792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625885488, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625884336, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625883184, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625882032, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625880880, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625879728, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625878576, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625877424, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625876272, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625875120, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625873968, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625872816, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625871664, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625870512, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625869360, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421576625868208, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (read thread)
I/O thread 7 state: waiting for completed aio requests (read thread)
I/O thread 8 state: waiting for completed aio requests (read thread)
I/O thread 9 state: waiting for completed aio requests (read thread)
I/O thread 10 state: waiting for completed aio requests (read thread)
I/O thread 11 state: waiting for completed aio requests (read thread)
I/O thread 12 state: waiting for completed aio requests (read thread)
I/O thread 13 state: waiting for completed aio requests (read thread)
I/O thread 14 state: waiting for completed aio requests (read thread)
I/O thread 15 state: waiting for completed aio requests (read thread)
I/O thread 16 state: waiting for completed aio requests (read thread)
I/O thread 17 state: waiting for completed aio requests (read thread)
I/O thread 18 state: waiting for completed aio requests (read thread)
I/O thread 19 state: waiting for completed aio requests (read thread)
I/O thread 20 state: waiting for completed aio requests (read thread)
I/O thread 21 state: waiting for completed aio requests (read thread)
I/O thread 22 state: waiting for completed aio requests (read thread)
I/O thread 23 state: waiting for completed aio requests (read thread)
I/O thread 24 state: waiting for completed aio requests (read thread)
I/O thread 25 state: waiting for completed aio requests (read thread)
I/O thread 26 state: waiting for completed aio requests (read thread)
I/O thread 27 state: waiting for completed aio requests (read thread)
I/O thread 28 state: waiting for completed aio requests (read thread)
I/O thread 29 state: waiting for completed aio requests (read thread)
I/O thread 30 state: waiting for completed aio requests (read thread)
I/O thread 31 state: waiting for completed aio requests (read thread)
I/O thread 32 state: waiting for completed aio requests (read thread)
I/O thread 33 state: waiting for completed aio requests (read thread)
I/O thread 34 state: waiting for completed aio requests (read thread)
I/O thread 35 state: waiting for completed aio requests (read thread)
I/O thread 36 state: waiting for completed aio requests (read thread)
I/O thread 37 state: waiting for completed aio requests (read thread)
I/O thread 38 state: waiting for completed aio requests (read thread)
I/O thread 39 state: waiting for completed aio requests (read thread)
I/O thread 40 state: waiting for completed aio requests (read thread)
I/O thread 41 state: waiting for completed aio requests (read thread)
I/O thread 42 state: waiting for completed aio requests (read thread)
I/O thread 43 state: waiting for completed aio requests (read thread)
I/O thread 44 state: waiting for completed aio requests (read thread)
I/O thread 45 state: waiting for completed aio requests (read thread)
I/O thread 46 state: waiting for completed aio requests (read thread)
I/O thread 47 state: waiting for completed aio requests (read thread)
I/O thread 48 state: waiting for completed aio requests (read thread)
I/O thread 49 state: waiting for completed aio requests (read thread)
I/O thread 50 state: waiting for completed aio requests (read thread)
I/O thread 51 state: waiting for completed aio requests (read thread)
I/O thread 52 state: waiting for completed aio requests (read thread)
I/O thread 53 state: waiting for completed aio requests (read thread)
I/O thread 54 state: waiting for completed aio requests (read thread)
I/O thread 55 state: waiting for completed aio requests (read thread)
I/O thread 56 state: waiting for completed aio requests (read thread)
I/O thread 57 state: waiting for completed aio requests (read thread)
I/O thread 58 state: waiting for completed aio requests (read thread)
I/O thread 59 state: waiting for completed aio requests (read thread)
I/O thread 60 state: waiting for completed aio requests (read thread)
I/O thread 61 state: waiting for completed aio requests (read thread)
I/O thread 62 state: waiting for completed aio requests (read thread)
I/O thread 63 state: waiting for completed aio requests (read thread)
I/O thread 64 state: waiting for completed aio requests (read thread)
I/O thread 65 state: waiting for completed aio requests (read thread)
I/O thread 66 state: waiting for completed aio requests (write thread)
I/O thread 67 state: waiting for completed aio requests (write thread)
I/O thread 68 state: waiting for completed aio requests (write thread)
I/O thread 69 state: waiting for completed aio requests (write thread)
I/O thread 70 state: waiting for completed aio requests (write thread)
I/O thread 71 state: waiting for completed aio requests (write thread)
I/O thread 72 state: waiting for completed aio requests (write thread)
I/O thread 73 state: waiting for completed aio requests (write thread)
I/O thread 74 state: waiting for completed aio requests (write thread)
I/O thread 75 state: waiting for completed aio requests (write thread)
I/O thread 76 state: waiting for completed aio requests (write thread)
I/O thread 77 state: waiting for completed aio requests (write thread)
I/O thread 78 state: waiting for completed aio requests (write thread)
I/O thread 79 state: waiting for completed aio requests (write thread)
I/O thread 80 state: waiting for completed aio requests (write thread)
I/O thread 81 state: waiting for completed aio requests (write thread)
I/O thread 82 state: waiting for completed aio requests (write thread)
I/O thread 83 state: waiting for completed aio requests (write thread)
I/O thread 84 state: waiting for completed aio requests (write thread)
I/O thread 85 state: waiting for completed aio requests (write thread)
I/O thread 86 state: waiting for completed aio requests (write thread)
I/O thread 87 state: waiting for completed aio requests (write thread)
I/O thread 88 state: waiting for completed aio requests (write thread)
I/O thread 89 state: waiting for completed aio requests (write thread)
I/O thread 90 state: waiting for completed aio requests (write thread)
I/O thread 91 state: waiting for completed aio requests (write thread)
I/O thread 92 state: waiting for completed aio requests (write thread)
I/O thread 93 state: waiting for completed aio requests (write thread)
I/O thread 94 state: waiting for completed aio requests (write thread)
I/O thread 95 state: waiting for completed aio requests (write thread)
I/O thread 96 state: waiting for completed aio requests (write thread)
I/O thread 97 state: waiting for completed aio requests (write thread)
I/O thread 98 state: waiting for completed aio requests (write thread)
I/O thread 99 state: waiting for completed aio requests (write thread)
I/O thread 100 state: waiting for completed aio requests (write thread)
I/O thread 101 state: waiting for completed aio requests (write thread)
I/O thread 102 state: waiting for completed aio requests (write thread)
I/O thread 103 state: waiting for completed aio requests (write thread)
I/O thread 104 state: waiting for completed aio requests (write thread)
I/O thread 105 state: waiting for completed aio requests (write thread)
I/O thread 106 state: waiting for completed aio requests (write thread)
I/O thread 107 state: waiting for completed aio requests (write thread)
I/O thread 108 state: waiting for completed aio requests (write thread)
I/O thread 109 state: waiting for completed aio requests (write thread)
I/O thread 110 state: waiting for completed aio requests (write thread)
I/O thread 111 state: waiting for completed aio requests (write thread)
I/O thread 112 state: waiting for completed aio requests (write thread)
I/O thread 113 state: waiting for completed aio requests (write thread)
I/O thread 114 state: waiting for completed aio requests (write thread)
I/O thread 115 state: waiting for completed aio requests (write thread)
I/O thread 116 state: waiting for completed aio requests (write thread)
I/O thread 117 state: waiting for completed aio requests (write thread)
I/O thread 118 state: waiting for completed aio requests (write thread)
I/O thread 119 state: waiting for completed aio requests (write thread)
I/O thread 120 state: waiting for completed aio requests (write thread)
I/O thread 121 state: waiting for completed aio requests (write thread)
I/O thread 122 state: waiting for completed aio requests (write thread)
I/O thread 123 state: waiting for completed aio requests (write thread)
I/O thread 124 state: waiting for completed aio requests (write thread)
I/O thread 125 state: waiting for completed aio requests (write thread)
I/O thread 126 state: waiting for completed aio requests (write thread)
I/O thread 127 state: waiting for completed aio requests (write thread)
I/O thread 128 state: waiting for completed aio requests (write thread)
I/O thread 129 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] , aio writes: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
8353 OS file reads, 2256885 OS file writes, 649203 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 6.37 writes/s, 1.75 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 108 merges
merged operations:
insert 1, delete mark 1, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 796871, node heap has 46 buffer(s)
Hash table size 796871, node heap has 8 buffer(s)
Hash table size 796871, node heap has 7 buffer(s)
Hash table size 796871, node heap has 14 buffer(s)
Hash table size 796871, node heap has 23 buffer(s)
Hash table size 796871, node heap has 16 buffer(s)
Hash table size 796871, node heap has 15 buffer(s)
Hash table size 796871, node heap has 94 buffer(s)
234.22 hash searches/s, 8.87 non-hash searches/s
---
LOG
---
Log sequence number 1427669147
Log flushed up to 1427669147
Pages flushed up to 1427669147
Last checkpoint at 1427669138
Max checkpoint age 434154333
Checkpoint age target 420587011
Modified age 0
Checkpoint age 9
0 pending log flushes, 0 pending chkp writes
452361 log i/o's done, 1.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 3353346048
Dictionary memory allocated 11634313
Internal hash tables (constant factor + variable factor)
Adaptive hash index 54687040 (50999744 + 3687296)
Page hash 1107112 (buffer pool 0 only)
Dictionary cache 24384249 (12749936 + 11634313)
File system 1676640 (812272 + 864368)
Lock system 8004712 (7969496 + 35216)
Recovery system 0 (0 + 0)
Buffer pool size 196584
Buffer pool size, bytes 3220832256
Free buffers 146566
Database pages 49795
Old database pages 18321
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 10000, not young 35358
0.00 youngs/s, 0.00 non-youngs/s
Pages read 7697, created 42280, written 1645380
0.00 reads/s, 0.00 creates/s, 5.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 49795, unzip_LRU len: 0
I/O sum:cur, unzip sum:cur
----------------------
INDIVIDUAL BUFFER POOL INFO
----------------------
---BUFFER POOL 0
Buffer pool size 65528
Buffer pool size, bytes 1073610752
Free buffers 48863
Database pages 16588
Old database pages 6103
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3655, not young 13239
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2611, created 14049, written 334207
0.00 reads/s, 0.00 creates/s, 0.50 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 16588, unzip_LRU len: 0
I/O sum:cur, unzip sum:cur
---BUFFER POOL 1
Buffer pool size 65528
Buffer pool size, bytes 1073610752
Free buffers 48627
Database pages 16823
Old database pages 6190
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3253, not young 12647
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2476, created 14408, written 1147163
0.00 reads/s, 0.00 creates/s, 3.87 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 16823, unzip_LRU len: 0
I/O sum:cur, unzip sum:cur
---BUFFER POOL 2
Buffer pool size 65528
Buffer pool size, bytes 1073610752
Free buffers 49076
Database pages 16384
Old database pages 6028
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 3092, not young 9472
0.00 youngs/s, 0.00 non-youngs/s
Pages read 2610, created 13823, written 164010
0.00 reads/s, 0.00 creates/s, 0.62 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 16384, unzip_LRU len: 0
I/O sum:cur, unzip sum:cur
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
0 RW transactions active inside InnoDB
Process ID=1574, Main thread ID=140097892615936, state: sleeping
Number of rows inserted 25413524, updated 63914, deleted 6172, read 1035018804
65.87 inserts/s, 0.12 updates/s, 0.00 deletes/s, 97.36 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
Are there any telltale signs in the above info pointing to the cause of these timeouts and deadlocks? How can I find out the percentage of the InnoDB buffer pool that is utilized, relative to total RAM?
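For reference, a minimal sketch of how that utilisation could be derived from the standard status counters (on 5.7 the counters live in performance_schema.global_status; the RAM figure 8009248 KiB is taken from the free output above):

    # Raw counters: pages holding data vs. total pages in the pool.
    mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';"

    # Percentage of the pool in use, and pool size as a share of total RAM.
    mysql -NBe "SELECT
        ROUND(100 *
          (SELECT variable_value FROM performance_schema.global_status
            WHERE variable_name = 'Innodb_buffer_pool_pages_data') /
          (SELECT variable_value FROM performance_schema.global_status
            WHERE variable_name = 'Innodb_buffer_pool_pages_total'), 1) AS pct_pool_used,
        ROUND(100 * @@innodb_buffer_pool_size / (8009248 * 1024), 1) AS pct_of_total_ram;"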
# Ping Results
64 bytes from 10.1.5.100: icmp_seq=1 ttl=64 time=0.958 ms
64 bytes from 10.1.5.100: icmp_seq=2 ttl=64 time=1.09 ms
64 bytes from 10.1.5.100: icmp_seq=4 ttl=64 time=1.22 ms
64 bytes from 10.1.5.100: icmp_seq=5 ttl=64 time=2.24 ms
DEADLOCK INFO
*** Priority TRANSACTION:
TRANSACTION 165015546, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
MySQL thread id 14, OS thread handle 140101363758848, query id 11358891 System lock
*** Victim TRANSACTION:
TRANSACTION 165015100, ACTIVE 1 sec
, undo log entries 1
MySQL thread id 1027074, OS thread handle 140096133596928, query id 11358888 10.168.103.11 membership wsrep: initiating replication for write set (-1)
COMMIT
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2481 page no 17 n bits 80 index PRIMARY of table `membership`.`member_token` trx id 165015100 lock_mode X locks rec but not gap
2018-12-06T03:12:23.732058-00:00 14 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-06T03:12:23.732119-00:00 14 [Note] WSREP: cluster conflict due to high priority abort for threads:
2018-12-06T03:12:23.732331-00:00 14 [Note] WSREP: Winning thread:
THD: 14, mode: applier, state: executing, conflict: no conflict, seqno: 152121
SQL: (null)
2018-12-06T03:12:23.732392-00:00 14 [Note] WSREP: Victim thread:
THD: 1027074, mode: local, state: committing, conflict: no conflict, seqno: -1
SQL: COMMIT
The Georgia
(343 rep)
Nov 20, 2018, 07:49 AM
• Last activity: Aug 4, 2025, 01:00 PM
0 votes, 2 answers, 7805 views
MariaDB Galera cluster node won't start
I have configured a MariaDB Galera cluster with three nodes. The master node and one slave-master node are up and running, but the third node just won't start. The system specs and cluster config are the same on all three nodes (except that wsrep_node_address and wsrep_node_name differ on each node).
Here is the config for the nodes (wsrep_node_address and the wsrep_node_name are unique on each node):
[galera]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
datadir=/var/lib/mysql
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.1.111,192.168.1.112,192.168.1.113"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='192.168.9.113'
wsrep_node_name='george-db4'
wsrep_sst_method=xtrabackup
wsrep_sst_auth=username:password
It seems like MariaDB does not like the line
wsrep_cluster_address="gcomm://192.168.1.111,192.168.1.112,192.168.1.113"
When I comment it out, MariaDB restarts just fine. When I uncomment it, it just displays this message:
Starting MySQL................................... ERROR!
ERROR! Failed to restart server.
When I change wsrep_cluster_address to just
wsrep_cluster_address="gcomm://
MariaDB runs with no problem. So I am not sure why it has a problem with the node IP addresses (even though the same setting works fine on the other two nodes). I have tried emptying the galera.cache file, re-installing the cluster, rebooting the machine, etc., and nothing works so far. I even truncated the log file /var/lib/mysql/db1.err with `truncate logfile --size 0` (and now I cannot figure out how to get the log file back, as I realize I need it).
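For reference, a minimal sketch of getting the error log back and watching it while the node tries to join (assumes the log path from above and that MariaDB runs as the mysql user):

    # Re-create the truncated error log with the right ownership,
    # then follow it while attempting to start the node.
    touch /var/lib/mysql/db1.err
    chown mysql:mysql /var/lib/mysql/db1.err

    tail -f /var/lib/mysql/db1.err &
    service mysql start    # the wsrep/gcomm errors should now show up in the log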
The Georgia
(343 rep)
Nov 4, 2014, 03:09 AM
• Last activity: Jul 31, 2025, 10:04 AM
0 votes, 1 answer, 913 views
Mariadb Galera cluster empty set
I am playing around with a MariaDB Galera Master Master Cluster and I have some issues which I do not understand.
[mysqld]
bind-address=0.0.0.0
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
default_storage_engine=InnoDB
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='192.168.2.67'
wsrep_node_name='KVM-1'
wsrep_sst_method=rsync
wsrep_sst_auth=repl_user:PASS
On the other nodes:
[mysqld]
bind-address=0.0.0.0
[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
default_storage_engine=InnoDB
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://IP_ADDR_MAIN_NODE,OTHER_NODES_ADDRS"
wsrep_cluster_name='galera_cluster'
wsrep_node_address='192.168.2.68'
wsrep_node_name='KVM-2'
wsrep_sst_method=rsync
wsrep_sst_auth=repl_user:PASS
When I log in to MySQL and run
SHOW STATUS LIKE 'wsrep_cluster_size';
I get
Empty set (0.00 sec)
Am I missing anything?
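For reference, a minimal sketch of checks that usually narrow this down; an empty result for a wsrep_% status variable generally means the wsrep provider never loaded, for example because the [galera] section is not being read or wsrep_on/wsrep_provider are not in effect:

    # Is wsrep enabled and is the provider path actually set on this server?
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_on';"
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider';"

    # If the provider is loaded, this returns many rows instead of an empty set.
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';"

    # Confirm which option files and groups the server really reads.
    mysqld --print-defaults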
mohan reddy
(11 rep)
May 18, 2018, 05:10 PM
• Last activity: Jul 31, 2025, 12:09 AM
0 votes, 1 answer, 153 views
Can MySQL Commercial Server do Multi-Master Replication using Galera Cluster?
Can MySQL Commercial Server 8.x do multi-master replication using Galera 4 and MySQL wsrep?
user3562918
(1 rep)
Nov 7, 2022, 11:36 AM
• Last activity: Jul 29, 2025, 04:06 AM
0 votes, 1 answer, 392 views
MariaDB 10.2.22 Galera Cluster Using Bad Indexes After Analyze
I'm using Drupal on a 3-node MariaDB Galera 10.2.22 cluster and I'm having problems with one query. I have an InnoDB table called field_data_field_department which has a bunch of indexes. If I dump the 5 tables used in my query and load them into a new DB, the query runs fine using the index field_department_target_id. But once I analyze that table (or MariaDB does an internal stats rebuild), the query stops using that index and uses the PRIMARY index instead. I don't understand why it does this, and the query is MUCH slower using PRIMARY (90-120 seconds vs. under 1/10th of a second).
Does anyone know how I can diagnose why the planner is picking the wrong index, or why ANALYZEing the table makes it break? (I would think that would make things better.) The query is dynamically generated by Drupal, so I can't just go into the code and add FORCE INDEX.
Here's the explain for the fast result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE file_managed ref uri,status,file_type status 1 const 7403 Using where; Using temporary; Using filesort
1 SIMPLE field_data_field_department ref PRIMARY,entity_type,deleted,entity_id PRIMARY 391 const,drupal_authoring.file_managed.fid,const 1 Using where
1 SIMPLE taxonomy_term_data_field_data_field_department eq_ref PRIMARY PRIMARY 4 drupal_authoring.field_data_field_department.field_department_target_id 1 Using where; Using index
1 SIMPLE taxonomy_term_data_field_data_field_department__field_data_field_department ref PRIMARY,entity_type,deleted,field_department_target_id field_department_target_id 390 drupal_authoring.taxonomy_term_data_field_data_field_department.tid,const 15 Using where; Using index
1 SIMPLE file_usage ref PRIMARY,fid_count,fid_module fid_count 4 drupal_authoring.file_managed.fid 1 Using index
And for the slow query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE file_managed ref uri,status,file_type file_type 152 const 7592 Using index condition; Using where; Using temporary; Using filesort
1 SIMPLE field_data_field_department ref PRIMARY,entity_type,deleted,entity_id PRIMARY 391 const,drupal_authoring.file_managed.fid,const 1 Using where
1 SIMPLE taxonomy_term_data_field_data_field_department eq_ref PRIMARY PRIMARY 4 drupal_authoring.field_data_field_department.field_department_target_id 1 Using where; Using index
1 SIMPLE taxonomy_term_data_field_data_field_department__field_data_field_department ref PRIMARY,entity_type,deleted,field_department_target_id PRIMARY 386 const 7985 Using where
1 SIMPLE file_usage ref PRIMARY,fid_count,fid_module fid_count 4 drupal_authoring.file_managed.fid 1 Using index
And here is the output of SHOW INDEXES:
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
field_data_field_department 0 PRIMARY 1 entity_type A 4 NULL NULL BTREE
field_data_field_department 0 PRIMARY 2 entity_id A 15742 NULL NULL BTREE
field_data_field_department 0 PRIMARY 3 deleted A 15742 NULL NULL BTREE
field_data_field_department 0 PRIMARY 4 delta A 15742 NULL NULL BTREE
field_data_field_department 0 PRIMARY 5 language A 15742 NULL NULL BTREE
field_data_field_department 1 entity_type 1 entity_type A 4 NULL NULL BTREE
field_data_field_department 1 bundle 1 bundle A 24 NULL NULL BTREE
field_data_field_department 1 deleted 1 deleted A 2 NULL NULL BTREE
field_data_field_department 1 entity_id 1 entity_id A 15742 NULL NULL BTREE
field_data_field_department 1 revision_id 1 revision_id A 15742 NULL NULL YES BTREE
field_data_field_department 1 language 1 language A 2 NULL NULL BTREE
field_data_field_department 1 field_department_target_id 1 field_department_target_id A 715 NULL NULL BTREE
Any ideas? Even if I could just do something to stop MariaDB from re-analyzing the table, that would help. I can dump and reload it as a temp fix, but that only lasts for a little while.
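For reference, a minimal sketch of the per-table persistent-statistics options that control automatic recalculation (standard InnoDB table options; the database name is taken from the EXPLAIN output above, 256 sample pages is an arbitrary example value, and whether steadier stats also fix the plan choice would need testing):

    # Stop InnoDB from re-sampling stats on its own for this table, and sample
    # more pages when ANALYZE is run explicitly, for steadier cardinality estimates.
    mysql drupal_authoring -e "ALTER TABLE field_data_field_department
        STATS_AUTO_RECALC=0, STATS_PERSISTENT=1, STATS_SAMPLE_PAGES=256;"

    # Rebuild the statistics once under the new settings.
    mysql drupal_authoring -e "ANALYZE TABLE field_data_field_department;"

    # Inspect what the optimizer now believes about each index.
    mysql drupal_authoring -e "SELECT index_name, stat_name, stat_value
        FROM mysql.innodb_index_stats
        WHERE table_name = 'field_data_field_department';"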
Charles Tassell
(9 rep)
Feb 23, 2019, 03:58 AM
• Last activity: Jul 28, 2025, 10:04 AM
0 votes, 1 answer, 1103 views
Galera Cluster Setup - Primary and Secondary Site Scenario
I'm very new to Galera Cluster and am exploring a potential setup with reasonable resiliency to node failure and network failure. Looking at the very bottom part of this documentation, the **Weighted Quorum for a Primary and Secondary Site Scenario** is quite promising. For ease of reading, I've extracted the setup from the document as follows:
> When configuring quorum weights for primary and secondary sites, use
> the following pattern:
>
> Primary Site:
> node1: pc.weight = 2
> node2: pc.weight = 2
>
> Secondary Site:
> node3: pc.weight = 1
> node4: pc.weight = 1
>
> Under this pattern, some nodes are located at the primary site while
> others are at the secondary site. In the event that the secondary site
> goes down or if network connectivity is lost between the sites, the
> nodes at the primary site remain the Primary Component. Additionally,
> either node1 or node2 can crash without the rest of the nodes becoming
> non-primary components.
But there seem to be two drawbacks:
1. If there are two failed nodes and one of them happens to be on the primary site, the quorum will be <= 50% and the remaining two nodes will become non-primary components.
2. Although pc.weight is a dynamic option that can be changed while the server is running, flipping between the primary and secondary site requires modification on all nodes, which is a bit troublesome.
So I've come up with another idea: leave the weight at 1 for all nodes, and add a Galera Arbitrator at the primary site. In this case:
- The primary site will remain the Primary Component on a network issue, just like in the original setup.
- The cluster still functions even if two nodes fail.
- Flipping between the primary and secondary site just requires moving the Galera Arbitrator.
May I know if there's anything wrong with my idea, or whether there would be any practical difficulties? I'd appreciate it if you could share your thoughts.
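For reference, a minimal sketch of the runtime weight change that the second drawback refers to (pc.weight is set through wsrep_provider_options; the change is not persisted to my.cnf):

    # Run on each node whose weight should change, e.g. when promoting the secondary site.
    mysql -e "SET GLOBAL wsrep_provider_options = 'pc.weight=2';"

    # Verify the weight currently in effect.
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';" | grep -o 'pc.weight = [^;]*'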
CLDev
(141 rep)
Oct 6, 2017, 03:51 PM
• Last activity: Jul 28, 2025, 02:02 AM
2 votes, 1 answer, 2284 views
Percona mysql xtradb cluster doesn't start properly and node restarts don't work
**tl;dr**
When starting a fresh Percona cluster of 3 Kubernetes pods, the grastate.dat seqno is set to -1 and doesn't change. On deleting one pod and watching it restart, expecting it to rejoin the cluster, it sets its initial position to 00000000-0000-0000-0000-000000000000:-1 and tries to connect to itself (its former IP), maybe because it had been the first pod in the cluster? It then times out in its erroneous connection to itself:
2017-03-26T08:38:05.374058Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
**The cluster doesn't get started properly and I'm unable to successfully restart pods in the cluster.**
**Full**
When I start the cluster from scratch, with blank data directories and a fresh etcd cluster, everything seems to come up. However, when I look at the grastate.dat I find that the seqno for each pod is -1:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
At this point I can do `mysql -h percona -u wordpress -p` and connect, and WordPress works too.
Scenario:
I have 3 percona pods
/ # jonathan@ubuntu:~/Projects/k8wp$ kubectl get pods
NAME READY STATUS RESTARTS AGE
etcd-0 1/1 Running 1 12h
etcd-1 1/1 Running 0 12h
etcd-2 1/1 Running 3 12h
etcd-3 1/1 Running 1 12h
percona-0 1/1 Running 0 8m
percona-1 1/1 Running 0 57m
percona-2 1/1 Running 0 57m
When I try to restart percona-0, it gets kicked out of the cluster on restarting. percona-0's gvwstate.dat file shows:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/gvwstate.dat
my_uuid: b7571ff8-11f8-11e7-bd2d-8b50487e1523
#vwbeg
view_id: 3 b7571ff8-11f8-11e7-bd2d-8b50487e1523 3
bootstrap: 0
member: b7571ff8-11f8-11e7-bd2d-8b50487e1523 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
The other 2 pods in the cluster show:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/gvwstate.dat
my_uuid: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/gvwstate.dat
my_uuid: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
Here are what I think are the relevant errors from percona-0's startup:
2017-03-26T08:37:58.370605Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'
2017-03-26T08:38:01.373345Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:38:01.373682Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-03-26T08:38:01.373750Z 0 [Note] WSREP: view(view_id(NON_PRIM,b7571ff8,5) memb {
b7571ff8,0
} joined {
} left {
} partitioned {
})
2017-03-26T08:38:01.373838Z 0 [Note] WSREP: gcomm: connected
2017-03-26T08:38:01.373872Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-26T08:38:01.373987Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-03-26T08:38:01.374012Z 0 [Note] WSREP: Opened channel 'wordpress-001'
2017-03-26T08:38:01.374108Z 0 [Note] WSREP: Waiting for SST to complete.
2017-03-26T08:38:01.374417Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-03-26T08:38:01.374469Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-03-26T08:38:01.374491Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-03-26T08:38:01.374560Z 1 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
The IP it's trying to connect to, 10.52.0.26, in
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'
is actually that pod's previous IP. Here's the listing of keys in etcd I did before deleting percona-0:
/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/wordpress
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress-001/10.52.0.26
/pxc-cluster/wordpress-001/10.52.0.26/hostname
/pxc-cluster/wordpress-001/10.52.0.26/ipaddr
After kubectl delete pods/percona-0:
/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress
Also during the restart percona-0 tried to register to etcd with:
{"action":"create","node":{"key":"/pxc-cluster/queue/wordpress-001/00000000000000009886","value":"10.52.0.27","expiration":"2017-03-26T08:38:57.980325718Z","ttl":60,"modifiedIndex":9886,"createdIndex":9886}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/ipaddr","value":"10.52.0.27","expiration":"2017-03-26T08:38:28.01814818Z","ttl":30,"modifiedIndex":9887,"createdIndex":9887}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/hostname","value":"percona-0","expiration":"2017-03-26T08:38:28.037188157Z","ttl":30,"modifiedIndex":9888,"createdIndex":9888}}
{"action":"update","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"expiration":"2017-03-26T08:38:28.054726795Z","ttl":30,"modifiedIndex":9889,"createdIndex":9887},"prevNode":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"modifiedIndex":9887,"createdIndex":9887}}
which doesn't work.
From the second member of the cluster, percona-1:
2017-03-26T08:37:44.069583Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.52.0.26:4567
2017-03-26T08:37:45.069756Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') reconnecting to b7571ff8 (tcp://10.52.0.26:4567), attempt 0
2017-03-26T08:37:48.570332Z 0 [Note] WSREP: (bd05a643, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:37:49.605089Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspecting node: b7571ff8
2017-03-26T08:37:49.605276Z 0 [Note] WSREP: evs::proto(bd05a643, GATHER, view_id(REG,b7571ff8,3)) suspected node without join message, declaring inactive
2017-03-26T08:37:50.104676Z 0 [Note] WSREP: declaring c33d6a73 at tcp://10.52.2.33:4567 stable
**New Info:**
I restarted percona-0 again, and this time it somehow came up! After a few tries I realised the pod needs to be restarted twice to come up, i.e. after deleting it the first time it comes up with the above errors, and after deleting it the second time it comes up okay and syncs with the other members. Could this be because it was the first pod in the cluster?
I've tested deleting the other pods but they all come back up okay.
The issue only lies with percona-0.
Also: if all the pods go down at once, for example if my node were to crash, the pods don't come back up at all! I suspect it's because no state is saved to grastate.dat, i.e. seqno remains -1 even though the global ID may change. The pods exit after mysqld shuts down, with the following errors:
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-2 | grep ERROR
2017-03-26T11:20:25.795085Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:25.795276Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:25.795544Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.2.36': -110 (Connection timed out)
2017-03-26T11:20:25.795618Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:25.795645Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.2.36) failed: 7
2017-03-26T11:20:25.795693Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-1 | grep ERROR
2017-03-26T11:20:27.093780Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:27.093977Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:27.094145Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.1.49': -110 (Connection timed out)
2017-03-26T11:20:27.094200Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:27.094227Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.1.49) failed: 7
2017-03-26T11:20:27.094247Z 0 [ERROR] Aborting
jonathan@ubuntu:~/Projects/k8wp$ kubectl logs percona-0 | grep ERROR
2017-03-26T11:20:52.040214Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
2017-03-26T11:20:52.040279Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-03-26T11:20:52.040385Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1437: Failed to open channel 'wordpress-001' at 'gcomm://10.52.2.36': -110 (Connection timed out)
2017-03-26T11:20:52.040437Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-03-26T11:20:52.040471Z 0 [ERROR] WSREP: wsrep::connect(gcomm://10.52.2.36) failed: 7
2017-03-26T11:20:52.040508Z 0 [ERROR] Aborting
grastate.dat on deleting all pods:
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid: a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno: -1
safe_to_bootstrap: 0
And there is no gvwstate.dat.
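For reference, a minimal sketch of the usual full-cluster recovery when every grastate.dat shows seqno -1 (the datadir path is as mounted inside the pod; in Kubernetes the image entrypoint may wrap these steps):

    # Ask InnoDB for the real committed position on each node.
    mysqld_safe --wsrep-recover   # writes "Recovered position: <uuid>:<seqno>" to the error log

    # On the node with the highest recovered seqno, mark it bootstrappable ...
    sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat

    # ... and start it as the new Primary Component; the remaining pods then rejoin via IST/SST.
    mysqld --wsrep-new-cluster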
Jonathan
(121 rep)
Mar 26, 2017, 09:18 AM
• Last activity: Jul 23, 2025, 03:01 AM
1 vote, 1 answer, 146 views
MariaDB Parallel Replication Drifting at random times
I have recently been thrown in the deep end to manage our database servers since our DBA left.
The current set-up is MariaDB (mariadb Ver 15.1 Distrib 10.1.38-MariaDB, for debian-linux-gnu) with InnoDB (I turned on innodb_file_per_table as we had some 200+GB ibdata1 files). We use parallel (conservative) replication.
A little bit of the config (just without the SSL certs):
innodb_file_per_table = On
innodb_thread_concurrency = 0
innodb_buffer_pool_size = 40G
innodb_buffer_pool_instances = 20
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
table_open_cache = 8192
thread_cache_size = 256
table_cache = 70000
open_files_limit = 100000
log_slave_updates
collation-server = utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
## Logging
log-output = FILE
slow-query-log = 1
slow-query-log-file = /var/lib/mysql/slow-log
log-slow-slave-statements
long-query-time = 30
log_warnings = 2
I have noticed that sometimes our servers drift, but this usually corrects itself; I assume it's caused by large statement(s), which is one of the reasons I turned on logging. But some servers will just not correct themselves, and one of these only has a 2.6GB ibdata1 file. I have restarted the slave a few times but it is just very slow. I am good on the Linux and hardware side and have ensured there are no other problems with the disks. The network link between the servers is over the web, but each server has a 1Gbps connection and we see no latency (~6ms ping, stable). Some servers drift when there is no load on them, as if the slave is not requesting from the master, and it certainly doesn't seem to use the full bandwidth. Overnight one of our servers has gone from being synced to 3 hours behind, and the lag seems to be increasing. I cannot see any issue on the other databases or even any users logged in to our application. Could something be locking it from replicating?
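For reference, a minimal sketch of status checks that show whether the SQL threads are actually executing or waiting while the lag grows (standard client commands; they only narrow things down, they do not fix the drift):

    # How far behind is the slave, and are both threads running without errors?
    mysql -e "SHOW SLAVE STATUS\G" | egrep 'Running|Seconds_Behind_Master|Last_.*Err'

    # What are the parallel workers doing right now (executing, or waiting for prior commit)?
    mysql -e "SHOW PROCESSLIST;"

    # Anything holding locks or long-running transactions on the slave?
    mysql -e "SHOW ENGINE INNODB STATUS\G"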
Any help or suggestions would be really appreciated, as I am kind of thrown in the deep end here. I am not a DBA, but while we do not currently have one I am the "best fit", and we need to get these servers replicating correctly for production.
If anyone also has suggestions on learning resources, please share; I think maybe this setup needs to be rebuilt from scratch. We do not use GTID, for example, and I wonder if we even use the correct replication method.
Many Thanks.
hexce
(11 rep)
Sep 8, 2019, 05:24 PM
• Last activity: Jul 19, 2025, 05:05 AM
1 vote, 1 answer, 154 views
Updating gcache.size on a Three Node Cluster
Is it possible to update this value on Percona 5.7 without restarting a node?
Also, do I need to bring the entire cluster down to update the setting, or can I update one node at a time?
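For reference, a minimal way to read the value currently in effect; gcache.size is embedded in wsrep_provider_options rather than being a standalone variable:

    # Show the gcache.size the running node is using.
    mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';" | grep -o 'gcache.size = [^;]*'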
Ndeb
(11 rep)
Feb 14, 2023, 06:36 PM
• Last activity: Jul 11, 2025, 10:10 AM
2 votes, 2 answers, 1616 views
Galera, MariaDB and multiple datacenter
Looking into using Galera and MariaDB across 3 data centers. We have a metroE connection between the sites. Connectivity is pretty stable, but my concern is what happens if a WAN link goes down at one of the sites while I am writing to the local nodes.
How does it handle that situation?
How many nodes at minimum at each site is recommended?
This question was similar, but different:
https://stackoverflow.com/questions/27777134/galera-cluster-replication-between-two-data-centers/27793927#27793927
Carol AndorMarten Liebster
(121 rep)
Jun 13, 2015, 08:44 PM
• Last activity: Jul 9, 2025, 12:43 AM
0 votes, 1 answer, 179 views
MariaDB Galera and Geo redundancy
I am trying to simulate 2 clusters in different datacenters/locations, using VMware ESXi and 6 VMs: 3 are located in DC1 and the other 3 in DC2. All nodes are RHEL 9.
Both DCs communicate through the same VLAN and this works fine.
I have installed MariaDB 10.5.22 + Galera.
In DC1, the 3 nodes are all data nodes; node #1 was bootstrapped using the galera_new_cluster script.
In DC2, 2 nodes are data nodes and I added 1 Arbitrator.
When starting nodes, all 6 are connected on the same cluster, sharing data, and this seems to be working fine.
The problem I find is when I simulate a network outage: I change the VLAN ID on DC2 and both DCs lose connectivity.
Looking at each node's logs, they effectively lose connectivity, but:
- all nodes (3 from DC1 and 2 from DC2) are unavailable for queries
- the Arbitrator keeps retrying to connect to the nodes, e.g.:
Oct 7 20:08:16 ire-lab-se3 garb-systemd: 2024-10-07 20:08:16.562 INFO: (dd79df53-972e, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.20.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 358971429 cwnd: 1 last_queued_since: 359271583299206 last_delivered_since: 359271583299206 send_queue_length: 0 send_queue_bytes: 0
I was expecting that:
- DC1 would remain available and synced; after all, the bootstrap was on node #1 and there are 3 nodes available on this network
- the garbd arbitrator would handle the remaining 2 nodes in DC2, allowing applications on DC2 to use the current data
I have tried multiple test scenarios, and this one (the network outage) is the only one that fails; I can't seem to solve the issue.
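For reference, a minimal sketch of the status checks that show whether each side kept quorum during the split (standard wsrep status variables):

    # Run on any node in DC1 and in DC2 while the VLANs are split.
    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status';"       # Primary vs. non-Primary
    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"         # members still visible from this node
    mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"  # Synced / Initialized / ...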
The configuration file on each Galera data node is similar to this (only the node name and ID differ):
#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql/
#
# this is read by the standalone daemon and embedded servers
[server]
log_error=/var/log/mariadb/mariadb.err
log_warnings=9
#default_time_zone='UTC'
table_definition_cache=4096
table_open_cache=4096
#innodb_read_only_compressed=OFF # only for 10.6 and later
# this is only for the mysqld standalone daemon
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mysqld/mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid
# disabling symlinks is recommended to prevent assorted security risks
symbolic_links=0
#enable binary logging
log_bin=/var/log/mariadb/mariadb-bin
log_bin_index=/var/log/mariadb/mariadb-bin.index
#enable relay log files
relay_log=/var/log/mariadb/relay-bin
relay_log_index=/var/log/mariadb/relay-bin.index
log_slave_updates=1
performance_schema=ON
interactive_timeout=180
wait_timeout=180
max_connections=500
#
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_on=ON
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
# this is only for embedded server
[embedded]
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
# This group is only read by MariaDB-10.5 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.5]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
wsrep_on=ON
query_cache_size=0
query_cache_type=0
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
# below parameter should not include Arbitrators, only MariaDB nodes
wsrep_cluster_address="gcomm://10.20.0.1,10.20.0.2,10.20.0.3,10.20.0.4,10.20.0.5"
wsrep_cluster_name='galera_cluster_LAB'
wsrep_node_address='10.20.0.1'
wsrep_node_name='galera_LAB_1'
wsrep_sst_method='rsync'
#'mariabackup'
#wsrep_sst_auth=backupuser;backupuser
server_id=2
wsrep_provider_options='gmcast.segment=1;gcache.size=2G'
#wsrep_sst_donor="galera_1"
and the garbd arbitrator is configured as:
# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.
# A comma-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES="10.20.0.4:4567 10.20.0.5:4567"
# Galera cluster name, should be the same as on the rest of the nodes.
GALERA_GROUP="galera_cluster_LAB"
# Optional Galera internal options string (e.g. SSL settings)
# see https://galeracluster.com/library/documentation/galera-parameters.html
GALERA_OPTIONS=""
# Log file for garbd. Optional, by default logs to syslog
# FOR SOME REASON IT DOESN'T WORK; RETURNS AN ERROR PROBABLY RELATED TO PERMISSIONS
# LOG_FILE="/var/log/mariadb/garbd.log"
# Where to persist necessary data
# WORK_DIR=""
So I think this is pretty much standard....
I googled a lot, but could not find anything that makes this work as expected. The requirements are that, in production, both DCs should communicate freely; if a network outage happens, application servers on both DCs should continue working with their local MariaDB Galera nodes (querying and inserting new rows if required); and once the network is up again, all nodes should sync data with each other (new data from DC1 exported to DC2, new data from DC2 exported to DC1, and so on).
Am I missing something here? Could anyone help?
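Adding a sketch of the quorum arithmetic in case it is relevant: 3 data nodes in DC1 plus 2 data nodes and garbd in DC2 give 6 votes in total, and a clean split leaves each side with exactly 3 of 6, which is not a majority, so both sides are expected to go non-primary. If the intent is for DC1 to stay writable during an outage, weighted quorum could be tried; the values below are illustrative only:
# DC1 data nodes: raise the weight (example values)
wsrep_provider_options='gmcast.segment=1;gcache.size=2G;pc.weight=2'
# DC2 data nodes: keep the default weight
wsrep_provider_options='gmcast.segment=2;gcache.size=2G;pc.weight=1'
# after a DC1/DC2 split: DC1 holds 3*2 = 6 of 9 votes and stays primary,
# while DC2 (2*1 + 1 for garbd = 3 of 9) goes non-primary
Note that keeping both sides writable during a split would be a split-brain, which Galera deliberately refuses; the non-primary side should rejoin and resync once the network is back.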
Thanks
Sergio
scharrua
(11 rep)
Oct 7, 2024, 09:54 PM
• Last activity: Jul 4, 2025, 08:08 AM
1
votes
1
answers
216
views
Is it possible to join MySQL 8.0 to a cluster of MySQL 5.7
I have a MySQL 5.7 (5.7.19-17-57-log Percona XtraDB Cluster) cluster using Galera, but I am in the middle of upgrading to 8.0 -- 8.0.23-14 Percona Server (GPL), Release '14'. I wasn't sure if it was possible to join the 8.0 box to the clustered 5.7.
I have a MySQL 5.7 (5.7.19-17-57-log Percona XtraDB Cluster) cluster using Galera, but I am in the middle of upgrading to 8.0 -- 8.0.23-14 Percona Server (GPL), Release '14'. I wasn't sure if it was possible to join the 8.0 box to the clustered 5.7.
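Not a definitive answer, but a quick way to compare what the two sides actually speak before attempting the join is to check the wsrep status on each server:
-- run on an existing 5.7 node and on the 8.0 candidate
SHOW GLOBAL STATUS LIKE 'wsrep_protocol_version';
SHOW GLOBAL STATUS LIKE 'wsrep_provider_version';
SELECT VERSION();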
user3525290
(113 rep)
May 25, 2021, 10:18 AM
• Last activity: Jun 22, 2025, 12:02 PM
0
votes
1
answers
231
views
mariadb galera : member count keeps changing between 5 and 3
I have a 5 nodes cluster, 2 nodes per DC, and an arbiter on a 3rd DC. The cluster is running fine in the sense that data is synced, DBs don't have any errors. The network is fine too. However, in the logs, galera shows members leaving and rejoining the cluster every few hours. The same behaviour is...
I have a 5 nodes cluster, 2 nodes per DC, and an arbiter on a 3rd DC.
The cluster is running fine in the sense that data is synced, DBs don't have any errors.
The network is fine too.
However, in the logs, Galera shows members leaving and rejoining the cluster every few hours. The same behaviour is observed repeatedly, and it would be good to understand why.
2022-10-10 11:33:39 2 [Note] WSREP: Non-primary view
2022-10-10 11:33:39 2 [Note] WSREP: Server status change synced -> connected
2022-10-10 11:33:39 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-10-10 11:33:39 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-10-10 11:33:39 0 [Note] WSREP: view(view_id(NON_PRIM,599bc3d2-b9fe,13720) memb {
599bc3d2-b9fe,0
} joined {
} left {
} partitioned {
37d9a4cb-997b,0
47b6cf8d-9d2a,0
5de8a52e-aa58,0
d5464472-acc7,0
})
2022-10-10 11:33:39 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2022-10-10 11:33:39 0 [Note] WSREP: Flow-control interval: [16, 16]
2022-10-10 11:33:39 0 [Note] WSREP: Received NON-PRIMARY.
2022-10-10 11:33:39 2 [Note] WSREP: ================================================
View:
id: 2e8c4b3d-3d2c-11e9-ba33-9f9156834d83:50217575
status: non-primary
protocol_version: 4
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERE
D, STREAMING, NBO
final: no
own_index: 0
members(1):
0: 599bc3d2-444c-11ed-b9fe-93ddbbf65238, p1mariadb1a
=================================================
2022-10-10 11:33:39 2 [Note] WSREP: Non-primary view
2022-10-10 11:33:39 2 [Note] WSREP: Server status change connected -> connected
# 5 nodes coming together
2022-10-10 11:33:40 0 [Note] WSREP: re-bootstrapping prim from partitioned components
2022-10-10 11:33:40 0 [Note] WSREP: view(view_id(PRIM,37d9a4cb-997b,13721) memb {
37d9a4cb-997b,0
47b6cf8d-9d2a,0
599bc3d2-b9fe,0
5de8a52e-aa58,0
d5464472-acc7,0
} joined {
} left {
} partitioned {
})
2022-10-10 11:33:40 0 [Warning] WSREP: Quorum: No node with complete state:
....
# cluster ok again:
2022-10-10 11:33:40 0 [Note] WSREP: Full re-merge of primary xx found: 5 of 5.
2022-10-10 11:33:40 0 [Note] WSREP: Quorum results:
version = 6,
component = PRIMARY,
conf_id = 7760,
members = 5/5 (joined/total),
sudo zgrep "members =" /var/log/mysql/error.log-202210*
/var/log/mysql/error.log-20221009.gz: members = 4/4 (joined/total),
/var/log/mysql/error.log-20221009.gz: members = 4/5 (joined/total),
/var/log/mysql/error.log-20221011.gz: members = 5/5 (joined/total),
/var/log/mysql/error.log-20221011.gz: members = 5/5 (joined/total),
/var/log/mysql/error.log-20221011.gz: members = 3/5 (joined/total),
show status where variable_name in ('wsrep_cluster_size','wsrep_cluster_status','wsrep_flow_control_paused','wsrep_ready','wsrep_connected','wsrep_local_state_comment');
| Variable_name | Value |
| wsrep_flow_control_paused | 0.00954013 |
| wsrep_local_state_comment | Synced |
| wsrep_cluster_size | 5 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_ready | ON |
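For the record, the group-communication timeouts are the usual knobs to look at when members are suspected and re-merged even though the network looks healthy; a sketch of loosening them (example values, not recommendations), appended to wsrep_provider_options on every node:
# example only: longer keepalive/suspect/inactive timeouts for higher-latency links
wsrep_provider_options="evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M"
Not every provider option can be changed at runtime, so a rolling restart may be needed after editing this.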
boran
(101 rep)
Oct 11, 2022, 06:52 AM
• Last activity: Jun 13, 2025, 04:06 PM
1
votes
2
answers
53
views
Why Maxscale doesn't switch master?
I have a galera cluster with mariadb 10.6.22 and maxscale 24.02.2 in ubuntu 22.04 server. This is my configuration: [maxscale] threads=auto [srv1] type=server address=127.0.0.1 port=3306 [srv2] type=server address=10.0.0.2 port=3306 [Galera-Cluster] type=monitor module=galeramon servers=srv1,srv2 us...
I have a Galera cluster with MariaDB 10.6.22 and MaxScale 24.02.2 on an Ubuntu 22.04 server.
This is my configuration:
[maxscale]
threads=auto
[srv1]
type=server
address=127.0.0.1
port=3306
[srv2]
type=server
address=10.0.0.2
port=3306
[Galera-Cluster]
type=monitor
module=galeramon
servers=srv1,srv2
user=maxscale
password=xxxxxxxxxx
monitor_interval=2s
root_node_as_master=true
[RW-Router]
type=service
router=readwritesplit
cluster=Galera-Cluster
user=maxscale
password=xxxxxxxxxxx
[Read-Write-Listener]
type=listener
service=RW-Router
protocol=mariadbprotocol
address=0.0.0.0
port=4008
I'm experiencing this issue: when I stop the master node, MaxScale doesn't promote the slave to master.
┌────────┬────────────┬──────┬─────────────┬────────────────────────┬──────┬────────────────┐
│ Server │ Address │ Port │ Connections │ State │ GTID │ Monitor │
├────────┼────────────┼──────┼─────────────┼────────────────────────┼──────┼────────────────┤
│ srv1 │ 127.0.0.1 │ 3306 │ 0 │ Slave, Synced, Running │ │ Galera-Cluster │
├────────┼────────────┼──────┼─────────────┼────────────────────────┼──────┼────────────────┤
│ srv2 │ 10.0.0.11 │ 3306 │ 0 │ Down │ │ Galera-Cluster │
└────────┴────────────┴──────┴─────────────┴────────────────────────┴──────┴────────────────┘
I cannot understand the reason. Looking into the MaxScale log, I don't find anything useful:
MariaDB MaxScale /var/log/maxscale/maxscale.log Thu Jun 5 15:04:18 2025
----------------------------------------------------------------------------
notice : Module 'galeramon' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libgaleramon.so'.
notice : Module 'readwritesplit' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so'.
notice : The logging of info messages has been enabled.
notice : Using up to 1.16GiB of memory for query classifier cache
notice : syslog logging is disabled.
notice : maxlog logging is enabled.
notice : Host: 'srv1.cinebot.it' OS: Linux@5.15.0-141-generic, #151-Ubuntu SMP Sun May 18 21:35:19 UTC 2025, x86_64 with 2 processor cores (2.00 available).
notice : Total main memory: 7.75GiB (7.75GiB usable).
notice : MaxScale is running in process 93218
notice : MariaDB MaxScale 24.02.2 started (Commit: b362d654969c495ec50fdf028f419514a854dd0a)
notice : Configuration file: /etc/maxscale.cnf
notice : Log directory: /var/log/maxscale
notice : Data directory: /var/lib/maxscale
notice : Module directory: /usr/lib/x86_64-linux-gnu/maxscale
notice : Service cache: /var/cache/maxscale
notice : Working directory: /var/log/maxscale
notice : Query classification results are cached and reused. Memory used per thread: 595.34MiB
notice : Password encryption key file '/var/lib/maxscale/.secrets' not found, using configured passwords as plaintext.
notice : The systemd watchdog is Enabled. Internal timeout = 30s
notice : Module 'pp_sqlite' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libpp_sqlite.so'.
info : pp_sqlite loaded.
notice : [MariaDBProtocol] Parser plugin loaded.
info : [pp_sqlite] In-memory sqlite database successfully opened for thread 140044797709888.
info : No 'auto_tune' parameters specified, no auto tuning will be performed.
notice : Using HS256 for JWT signatures
warning: The MaxScale GUI is enabled but encryption for the REST API is not enabled, the GUI will not be enabled. Configure
admin_ssl_key
and admin_ssl_cert
to enable HTTPS or add admin_secure_gui=false
to allow use of the GUI without encryption.
notice : Started REST API on [127.0.0.1]:8989
notice : srv1 sent version string '10.6.22-MariaDB-0ubuntu0.22.04.1'. Detected type: MariaDB, version: 10.6.22.
notice : Server 'srv1' charset: utf8mb4_general_ci
info : Variables have changed on 'srv1': 'character_set_client = utf8mb4', 'character_set_connection = utf8mb4', 'character_set_results = utf8mb4', 'max_allowed_packet = 16777216', 'session_track_system_variables = autocommit,character_set_client,character_set_connection,character_set_results,time_zone', 'system_time_zone = CEST', 'time_zone = SYSTEM', 'tx_isolation = REPEATABLE-READ', 'wait_timeout = 28800'
error : Monitor was unable to connect to server srv2[10.0.0.11:3306] : 'Can't connect to server on '10.0.0.11' (115)'
notice : [galeramon] Found cluster members
notice : Starting a total of 1 services...
notice : (Read-Write-Listener); Listening for connections at [0.0.0.0]:4008
notice : Service 'RW-Router' started (1/1)
info : [pp_sqlite] In-memory sqlite database successfully opened for thread 140044754216512.
info : Epoll instance for listening sockets added to worker epoll instance.
info : [pp_sqlite] In-memory sqlite database successfully opened for thread 140044745823808.
info : Epoll instance for listening sockets added to worker epoll instance.
notice : MaxScale started with 2 worker threads.
notice : Read 19 user@host entries from 'srv1' for service 'RW-Router'.
info : Accept authentication from 'admin', using password. Request: /v1/servers
info : Accept authentication from 'admin', using password. Request: /v1/servers
The galera status seems to be ok for the slave server:
wsrep_local_state 4
wsrep_local_state_comment Synced
wsrep_cluster_status Primary
wsrep_local_index 1
Do you have any idea why MaxScale doesn't switch the slave to master when the master is down?
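One hunch, not a confirmed diagnosis: with root_node_as_master=true, galeramon only assigns the Master role to the node whose wsrep_local_index is 0, and the surviving node above reports wsrep_local_index 1, so it would never be promoted. A minimal variation of the monitor section to test, everything else unchanged:
[Galera-Cluster]
type=monitor
module=galeramon
servers=srv1,srv2
user=maxscale
password=xxxxxxxxxx
monitor_interval=2s
# let galeramon pick any synced node as master instead of requiring index 0
root_node_as_master=false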
Tobia
(211 rep)
Jun 5, 2025, 01:30 PM
• Last activity: Jun 10, 2025, 04:52 AM
2
votes
2
answers
1400
views
Mysql And Galera cluster mariaDB , Private and Public IP Addresses multisite
I am just wondering whether anyone has come across this before, and what the best method is to go about it. We have a customer that wants multi-site replication over a Galera cluster. I am familiar with setting this up but not in the way they are asking for. First of all the client has a dedicated L2 connecti...
I am just wondering whether anyone has come across this before, and what the best method is to go about it.
We have a customer that wants multi-site replication over a Galera cluster.
I am familiar with setting this up but not in the way they are asking for.
First of all, the client has a dedicated L2 connection between the datacentres, so I have set up the cluster with private IP addresses going across the dedicated L2 link. The issue is that if the L2 link goes down they lose the replication between sites, so I was thinking of NATing the nodes' private IP addresses to public IPs.
My question is: Is there a way I can set the galera cluster so that if it can't reach the private IPs it will go out over the internet through the firewall and hit the public IPs?
Could I maybe put the private and public IPs of each site within this setting:
wsrep_cluster_address=gcomm://server1private_IP1, server1public_IP1, server2private_IP1,server2public_ip1 etc
Saying that server1 is local in my datacentre and server2 is in another datacentre.
Or would that write the data twice over the 2 links?
JamesG
(21 rep)
Dec 13, 2016, 04:56 PM
• Last activity: May 28, 2025, 09:07 PM
2
votes
1
answers
6035
views
'kill <thread id>' gives ERROR 1095 (HY000): You are not owner of thread ... even though I am
I did an `ALTER TABLE` on a huge table on a MariaDB Galera Cluster 10.2, but after a while of waiting I changed my mind and tried to kill the connection. However, MariaDB said `ERROR 1095 (HY000): You are not owner of thread 123` even though I most definitely am the owner. So what is going on?
I did an
ALTER TABLE
on a huge table on a MariaDB Galera Cluster 10.2, but after a while of waiting I changed my mind and tried to kill the connection. However, MariaDB said ERROR 1095 (HY000): You are not owner of thread 123
even though I most definitely am the owner. So what is going on?
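For the record, it can help to first confirm what thread 123 actually is and what your own connection id is, since on a Galera node the reported id may belong to a system or applier thread rather than the client session:
-- id of the current client session
SELECT CONNECTION_ID();
-- list all threads, including system/applier threads, to see what thread 123 is
SHOW FULL PROCESSLIST;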
dbdemon
(7029 rep)
Nov 1, 2018, 11:18 PM
• Last activity: May 25, 2025, 07:17 PM
1
votes
1
answers
1582
views
Innodb not started?
I installed MariaDB Galera Server 10.0, but when I check the error log I see this: 170118 14:49:09 [Note] InnoDB: 128 rollback segment(s) are active. 170118 14:49:09 [Note] InnoDB: Waiting for purge to start 170118 14:49:09 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.34-79.1 started;...
I installed MariaDB Galera Server 10.0, but when I check the error log I see this:
170118 14:49:09 [Note] InnoDB: 128 rollback segment(s) are active.
170118 14:49:09 [Note] InnoDB: Waiting for purge to start
170118 14:49:09 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.34-79.1 started; log sequence number 1627308
170118 14:49:09 [Note] Plugin ‘FEEDBACK’ is disabled.
170118 14:49:09 [Note] WSREP: Service disconnected.
170118 14:49:10 [Note] WSREP: Some threads may fail to exit.
170118 14:49:10 [Note] InnoDB: FTS optimize thread exiting.
170118 14:49:10 [Note] InnoDB: Starting shutdown...
170118 14:49:10 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
170118 14:49:11 [Note] InnoDB: Shutdown completed; log sequence number 1627318
This is the cluster configuration /etc/mysql/conf.d/cluster.cnf:
[mysqld]
query_cache_size=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
query_cache_type=0
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options="gcache.size=32G"
wsrep_cluster_name="test_cluster"
wsrep_cluster_address=gcomm://192.168.10.231,192.168.10.233
wsrep_sst_method=rsync
wsrep_sst_auth = wsrep_sst_user:wsrep_sst_pass
wsrep_node_address='192.168.10.231'
wsrep_node_name="yasoo"
and my.cnf:
# MariaDB database server configuration file.
#
# You can copy this file to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# especially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs
# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
log-bin=/var/log/mysql-bin.log
log=/var/log/mysql.log
#log-error=/var/log/mysqld.error.log
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
# * Basic Settings
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc_messages_dir = /usr/share/mysql
lc_messages = en_US
skip-external-locking
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address = 127.0.0.1

# * Fine Tuning
max_connections = 100
connect_timeout = 5
wait_timeout = 600
max_allowed_packet = 16M
thread_cache_size = 128
sort_buffer_size = 4M
bulk_insert_buffer_size = 16M
tmp_table_size = 32M
max_heap_table_size = 32M

# * MyISAM
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched. On error, make copy and try a repair.
myisam_recover_options = BACKUP
key_buffer_size = 128M
#open-files-limit = 2000
table_open_cache = 400
myisam_sort_buffer_size = 512M
concurrent_insert = 2
read_buffer_size = 2M
read_rnd_buffer_size = 1M

# * Query Cache Configuration
# Cache only tiny result sets, so we can fit more in the query cache.
query_cache_limit = 128K
# for more write intensive setups, set to DEMAND or OFF
#query_cache_type = DEMAND

# * Logging and Replication
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
general_log_file = /var/log/mysql/mysql.log
general_log = 1
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#log-bin=/var/log/mysql-bin.log
#log=/var/log/mysql.log
#log-error=/var/log/mysqld.error.log
# we do want to know about network errors and such
log_warnings = 2
# Enable the slow query log to see queries with especially long duration
#slow_query_log[={0|1}]
slow_query_log_file = /var/log/mysql/mariadb-slow.log
long_query_time = 10
log_slow_verbosity = query_plan
#log-queries-not-using-indexes
#log_slow_admin_statements
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
server-id = 121
#report_host = master1
#auto_increment_increment = 2
#auto_increment_offset = 1
log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
# not fab for performance, but safer
#sync_binlog = 1
expire_logs_days = 10
max_binlog_size = 100M
# slaves
relay_log = /var/log/mysql/relay-bin
relay_log_index = /var/log/mysql/relay-bin.index
relay_log_info_file = /var/log/mysql/relay-bin.info
log_slave_updates
# If applications support it, this stricter sql_mode prevents some
# mistakes like inserting invalid dates etc.
#sql_mode = NO_ENGINE_SUBSTITUTION,TRADITIONAL

# * InnoDB
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/
# Read the manual for more InnoDB related options. There are many!
default_storage_engine = InnoDB
# you can't just change log file size, requires special procedure
innodb_log_file_size = 50M
innodb_buffer_pool_size = 256M
innodb_log_buffer_size = 8M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 400
innodb_flush_method = O_DIRECT

# * Security Features
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem

# * Galera-related settings
[galera]
# Mandatory settings
log-error=/var/log/mysql/mysql.err
log-bin=/var/log/mysql/mysql-replication.log
# Allow server to accept connections on all interfaces.
#bind-address=0.0.0.0
# Optional setting
wsrep_slave_threads=16
#innodb_flush_log_at_trx_commit=0

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completion

[isamchk]
key_buffer = 16M

# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
!includedir /etc/mysql/conf.d/
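Two quick sanity checks that may help with this kind of startup problem, assuming the paths from the configuration above: confirm that the Galera provider library really exists at the configured location, and print the options the server will actually read once all included files are merged:
# does the library referenced by wsrep_provider exist?
ls -l /usr/lib64/libgalera_smm.so
# which options will mysqld actually pick up (including /etc/mysql/conf.d/)?
mysqld --print-defaults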
Yaser Jawi
(11 rep)
Jan 19, 2017, 10:01 AM
• Last activity: May 24, 2025, 09:07 PM
0
votes
1
answers
282
views
Galera Cluster in read-write split and query cache
I currently have 3 MariaDB 10.4 servers, configured as a Galera cluster. In front of it, a Maxscale router to split writes (1 master) and reads (2 slaves). QC is disabled in all 3 servers. My question is: as the servers 2 and 3 are receiving READS only, is there any (real) benefit to enable QC on th...
I currently have 3 MariaDB 10.4 servers, configured as a Galera cluster. In front of it, a MaxScale router splits writes (1 master) and reads (2 slaves). The query cache (QC) is disabled on all 3 servers.
My question is: as servers 2 and 3 are receiving READs only, is there any real benefit to enabling QC on those 2 servers, such as a decrease in server load?
I've already tried the Maxscale cache filter out of the box, but some queries have problems with it.
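If anyone wants to experiment anyway, enabling the query cache only on the two read nodes is a per-node change along these lines (sizes are examples; every write hitting the cluster still invalidates affected cache entries, so the benefit depends on how repetitive the read traffic is):
# my.cnf on the read-only nodes only, example sizes
query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 1M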
CrazyRabbit
(111 rep)
Jan 13, 2022, 03:12 PM
• Last activity: May 22, 2025, 04:10 PM
2
votes
1
answers
2347
views
Galera - single database backup & restore
I have a MariaDB Galera cluster. One of its nodes is dedicated for the backups (pc.weight=0 and the MaxScale SQL proxy doesn't know it exists). I'm using `mariabackup --backup --galera-info --target-dir=/path/to/my/dir` to do the daily backups. So this backs up everything. I want to be able to do 2...
I have a MariaDB Galera cluster. One of its nodes is dedicated for the backups (pc.weight=0 and the MaxScale SQL proxy doesn't know it exists).
I'm using
mariabackup --backup --galera-info --target-dir=/path/to/my/dir
to do the daily backups. So this backs up everything.
I want to be able to do 2 kinds of restore:
1. full restore in case of disaster (e.g. data center explosion)
2. single database restore in case of data loss (e.g. human error/hacking)
The case 1 is easy, I just take the backup I have and use mariabackup --prepare
and mariabackup --copy-back
on the new freshly installed Galera nodes.
But how can I achieve case 2?
I know I could do a mariabackup --prepare
and then for each table of the database I want to restore:
1. discard the table space of table X
2. copy back the file(s) of table X from my backup
3. import the table space
The problem is that this process is too low-level for Galera/wsrep to notice it and the changes are of course not replicated to the other nodes. So I would need to repeat this process on all the nodes.
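For illustration, the per-table sequence just described would look roughly like this (mydb.t1 is a hypothetical table name):
-- on the node chosen for the restore, repeated for every table of the database
ALTER TABLE mydb.t1 DISCARD TABLESPACE;
-- copy t1.ibd (and t1.cfg if present) from the prepared backup into the datadir,
-- then fix file ownership before importing
ALTER TABLE mydb.t1 IMPORT TABLESPACE;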
I could also:
1. take a logical backup (mysqldump)
2. drop the database on one of the Galera nodes
3. re-create the database
4. import the dump I just did
This would be replicated to all the nodes. But it's highly inefficient and I would need to setup a temporary MariaDB server to run the mysqldump
on the backed up data.
There must be a better way, right? How are you doing it?
Your answers are much appreciated!
HORF
(21 rep)
Nov 21, 2018, 10:57 AM
• Last activity: May 6, 2025, 12:05 AM
Showing page 1 of 20 total questions