PostgreSQL process stuck in a "forked new backend" loop when used with Patroni
1
vote
1
answer
712
views
I have a 3 node Patroni2.1.1 + PGSQL13.4 cluster and one of the nodes is acting weird once it finished the basebackup bootstrap. I have another identical node working fine when put thru the same process.
Basically once Patroni tries to connect to localhost:5432 the server seems to die and refork (from my understanding). You'll see below the "loop" starting at
connection received: host=::1 port=XXXXX
.
2021-11-01 12:01:43.395 EDT LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 12:01:43.396 EDT LOG: listening on IPv6 address "::1", port 5432
2021-11-01 12:01:43.396 EDT LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-11-01 12:01:43.397 EDT LOG: listening on IPv4 address "192.168.23.13", port 5432
2021-11-01 12:01:43.398 EDT LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-01 12:01:43.401 EDT LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-11-01 12:01:43.405 EDT LOG: database system was interrupted; last known up at 2021-11-01 12:01:25 EDT
2021-11-01 12:01:43.405 EDT DEBUG: removing all temporary WAL segments
2021-11-01 12:01:43.436 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/00000077.history pg_wal/RECOVERYHISTORY"
cp: cannot stat '/var/lib/pgsql/13/archive/00000077.history': No such file or directory
2021-11-01 12:01:43.440 EDT DEBUG: could not restore file "00000077.history" from archive: child process exited with exit code 1
2021-11-01 12:01:43.441 EDT LOG: entering standby mode
2021-11-01 12:01:43.441 EDT DEBUG: backup time 2021-11-01 12:01:25 EDT in file "backup_label"
2021-11-01 12:01:43.441 EDT DEBUG: backup label pg_basebackup base backup in file "backup_label"
2021-11-01 12:01:43.441 EDT DEBUG: backup timeline 118 in file "backup_label"
2021-11-01 12:01:43.441 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/00000076.history pg_wal/RECOVERYHISTORY"
2021-11-01 12:01:43.538 EDT LOG: restored log file "00000076.history" from archive
2021-11-01 12:01:43.540 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000000 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.689 EDT LOG: restored log file "000000760000000100000000" from archive
2021-11-01 12:01:43.692 EDT DEBUG: got WAL segment from archive
2021-11-01 12:01:43.693 EDT DEBUG: checkpoint record is at 1/60
2021-11-01 12:01:43.693 EDT DEBUG: redo record is at 1/28; shutdown false
2021-11-01 12:01:43.693 EDT DEBUG: next transaction ID: 531; next OID: 24606
2021-11-01 12:01:43.693 EDT DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2021-11-01 12:01:43.693 EDT DEBUG: oldest unfrozen transaction ID: 478, in database 1
2021-11-01 12:01:43.693 EDT DEBUG: oldest MultiXactId: 1, in database 1
2021-11-01 12:01:43.693 EDT DEBUG: commit timestamp Xid oldest/newest: 0/0
2021-11-01 12:01:43.693 EDT DEBUG: transaction ID wrap limit is 2147484125, limited by database with OID 1
2021-11-01 12:01:43.693 EDT DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2021-11-01 12:01:43.693 EDT DEBUG: starting up replication slots
2021-11-01 12:01:43.693 EDT DEBUG: starting up replication origin progress state
2021-11-01 12:01:43.697 EDT DEBUG: resetting unlogged relations: cleanup 1 init 0
2021-11-01 12:01:43.698 EDT DEBUG: initializing for hot standby
2021-11-01 12:01:43.698 EDT DEBUG: my backend ID is 1
2021-11-01 12:01:43.698 EDT LOG: redo starts at 1/28
2021-11-01 12:01:43.698 EDT DEBUG: prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: 0 KnownAssignedXids (num=0 tail=0 head=0)
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: recovery snapshots are now enabled
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/D8 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: end of backup reached
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/110 for XLOG/BACKUP_END: 1/28
2021-11-01 12:01:43.699 EDT DEBUG: checkpointer updated shared memory configuration values
2021-11-01 12:01:43.700 EDT LOG: consistent recovery state reached at 1/138
2021-11-01 12:01:43.700 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000001 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.700 EDT LOG: database system is ready to accept read only connections
cp: cannot stat '/var/lib/pgsql/13/archive/000000760000000100000001': No such file or directory
2021-11-01 12:01:43.704 EDT DEBUG: could not restore file "000000760000000100000001" from archive: child process exited with exit code 1
2021-11-01 12:01:43.704 EDT DEBUG: could not open file "pg_wal/000000760000000100000001": No such file or directory
2021-11-01 12:01:43.704 EDT DEBUG: switched WAL source from archive to stream after failure
2021-11-01 12:01:43.705 EDT DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver"
2021-11-01 12:01:43.706 EDT DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver.so"
2021-11-01 12:01:44.370 EDT DEBUG: forked new backend, pid=242938 socket=12
2021-11-01 12:01:44.371 EDT LOG: connection received: host=::1 port=36714
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: exit(0)
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:51.236 EDT DEBUG: forked new backend, pid=242941 socket=12
2021-11-01 12:01:51.237 EDT LOG: connection received: host=127.0.0.1 port=43428
2021-11-01 12:01:51.237 EDT DEBUG: reaping dead processes
2021-11-01 12:01:51.237 EDT DEBUG: server process (PID 242938) exited with exit code 0
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: exit(0)
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:58.285 EDT DEBUG: reaping dead processes
2021-11-01 12:01:58.285 EDT DEBUG: server process (PID 242941) exited with exit code 0
2021-11-01 12:01:59.295 EDT DEBUG: forked new backend, pid=242944 socket=12
2021-11-01 12:01:59.295 EDT LOG: connection received: host=::1 port=36758
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: exit(0)
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: forked new backend, pid=242948 socket=12
2021-11-01 12:02:06.302 EDT DEBUG: reaping dead processes
2021-11-01 12:02:06.302 EDT DEBUG: server process (PID 242944) exited with exit code 0
2021-11-01 12:02:06.302 EDT LOG: connection received: host=127.0.0.1 port=43470
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: exit(0)
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:13.441 EDT DEBUG: reaping dead processes
2021-11-01 12:02:13.441 EDT DEBUG: server process (PID 242948) exited with exit code 0
2021-11-01 12:02:14.148 EDT DEBUG: postmaster received signal 2
2021-11-01 12:02:14.148 EDT LOG: received fast shutdown request
2021-11-01 12:02:14.150 EDT LOG: aborting any active transactions
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: exit(0)
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(1): 7 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: removing all KnownAssignedXids
2021-11-01 12:02:14.150 EDT DEBUG: release all standby locks
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(1): 2 callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: exit(1)
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.151 EDT DEBUG: reaping dead processes
2021-11-01 12:02:14.151 EDT DEBUG: reaping dead processes
2021-11-01 12:02:14.449 EDT DEBUG: forked new backend, pid=242954 socket=12
2021-11-01 12:02:14.450 EDT LOG: connection received: host=::1 port=36798
2021-11-01 12:02:15.915 EDT FATAL: terminating walreceiver process due to administrator command
2021-11-01 12:02:15.915 EDT DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT DEBUG: proc_exit(1): 2 callbacks to make
2021-11-01 12:02:15.915 EDT DEBUG: exit(1)
2021-11-01 12:02:15.916 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.917 EDT DEBUG: reaping dead processes
2021-11-01 12:02:15.917 EDT LOG: shutting down
2021-11-01 12:02:15.917 EDT DEBUG: skipping restartpoint, already performed at 1/28
2021-11-01 12:02:15.919 EDT DEBUG: updated min recovery point to 1/1000000 on timeline 118
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: exit(0)
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.921 EDT DEBUG: reaping dead processes
2021-11-01 12:02:15.921 EDT DEBUG: writing stats file "pg_stat/global.stat"
2021-11-01 12:02:15.921 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.922 EDT DEBUG: reaping dead processes
2021-11-01 12:02:17.890 EDT DEBUG: postmaster received signal 2
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: exit(0)
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.618 EDT DEBUG: reaping dead processes
2021-11-01 12:02:18.618 EDT DEBUG: server process (PID 242954) exited with exit code 0
2021-11-01 12:02:18.618 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT DEBUG: cleaning up dynamic shared memory control segment with ID 2104592565
2021-11-01 12:02:18.631 EDT DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:18.631 EDT LOG: database system is shut down
2021-11-01 12:02:18.631 EDT DEBUG: exit(0)
2021-11-01 12:02:18.631 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: logger shutting down
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: proc_exit(0): 0 callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: exit(0)
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: proc_exit(-1): 0 callbacks to make
If I stop Patroni, the server starts running fine and joins the replication as expected. The user postgres can connect just fine locally or remotely using the psql utility or even using pgadmin.
Could someone explain what's going on here? I'm still learning PGSQL and I'm not quite comfortable with those logs yet... Thank you!
For more context you can also look at Patroni issue #2104 I posted.
Asked by JulioQc
(143 rep)
Nov 1, 2021, 04:20 PM
Last activity: Nov 3, 2021, 05:14 PM
Last activity: Nov 3, 2021, 05:14 PM