Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
0
answers
50
views
Postgres pg_wal size increasing possibly from patroni outage
I've recently hit an issue of pg_wal files not being cleaned up on a replica postgres instance despite these being archived on the master instance. Patroni was down during that time due to an etcd outage; after correcting this and Patroni re-establishing connections, the WAL files began to be cleaned up on the read-only instance. I saw no replication lag during this time. The second replica was not affected in the same way. All 3 servers had etcd down.
System
* Postgres 16
* Patroni 4.0.4
* 3 nodes in the cluster: 1 leader, 2 read-only replicas
* cluster is managed by patroni
* physical replication is used
Question
From my understanding of Patroni, an outage of Patroni shouldn't prevent the WAL files on the replica instance from being removed. Am I missing something here in terms of the active responsibilities of Patroni?
I ensured that the read-only replica was still in standby mode and operating as such during that time, so it wasn't that it was promoted during the outage.
During that time Patroni was repeatedly failing with:
Jul 23 05:02:24 patroni: 2025-07-23 05:02:24,141 ERROR: watchprefix failed: ProtocolError("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes rea
Jul 23 05:02:25 patroni: Traceback (most recent call last):
...
Jul 23 05:02:25 patroni: File "/usr/lib/python3/dist-packages/patroni/dcs/etcd.py", line 262, in _do_http_request Jul 23 05:02:25 patroni: raise etcd.EtcdConnectionFailed('No more machines in the cluster')
Jul 23 05:02:25 patroni: etcd.EtcdConnectionFailed: No more machines in the cluster
Jul 23 05:02:25 systemd: patroni.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 05:02:25 systemd: patroni.service: Failed with result 'exit-code'.
Jul 23 05:02:25 systemd: patroni.service: Unit process 407680 (postgres) remains running after unit stopped.
Jul 23 05:02:25 systemd: patroni.service: Unit process 407683 (postgres) remains running after unit stopped.
...
From this, Patroni is crashing whilst the postgres process remains up (the PID of the postgres server process is 407680).
During this time I had also looked at pg_stat_activity and hadn't seen any long running queries that I'd expect to hold the WAL back.
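For anyone hitting the same thing, a minimal sketch of checks on the replica that should reveal what is pinning WAL there (assuming the standard PostgreSQL 16 catalogs):
-- Slots defined on this standby (e.g. for cascading replicas) that could pin WAL
SELECT slot_name, slot_type, active, restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_replay_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
-- Settings that influence how much WAL a standby keeps around
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('wal_keep_size', 'max_slot_wal_keep_size', 'max_wal_size', 'checkpoint_timeout');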
In terms of postgres logs, the only thing of note was from when pg_wal started to increase; this is somewhat confirmed in the logs:
2025-07-22 06:46:02.235 UTC LOG: restartpoint complete: wrote 517224 buffers (6.2%); 0 WAL file(s) added, 246 removed, 23 recycled; write=66.028 s, sync=0.499 s, total=66.721 s; sync files=686, longest=0.109 s, average=0.001 s; distance=4407526 kB, estimate=4407526 kB; lsn=21204/6CAA5F0, redo lsn=21203/13048F80
2025-07-22 06:47:27.708 UTC LOG: restartpoint complete: wrote 523662 buffers (6.2%); 0 WAL file(s) added, 1 removed, 22 recycled; write=77.386 s, sync=0.263 s, total=77.676 s; sync files=676, longest=0.012 s, average=0.001 s; distance=4407428 kB, estimate=4407516 kB; lsn=21205/159E6E88, redo lsn=21204/2006A040
2025-07-22 06:49:06.802 UTC LOG: restartpoint complete: wrote 562491 buffers (6.7%); 1 WAL file(s) added, 0 removed, 0 recycled; write=89.970 s, sync=0.256 s, total=90.321 s; sync files=598, longest=0.019 s, average=0.001 s; distance=4407504 kB, estimate=4407515 kB; lsn=21206/2207D1D0, redo lsn=21205/2D09E288
where the number of WAL file(s) removed drops to 0,
and then on recovery postgres receives a SIGHUP, which causes it to reload its configuration:
2025-07-23 08:16:53.955 UTC LOG: received SIGHUP, reloading configuration files
Iamterribleatcoding
(1 rep)
Jul 23, 2025, 11:37 AM
• Last activity: Jul 23, 2025, 09:49 PM
0
votes
1
answers
224
views
On which machine should we install WAL-G?
We are using PostgreSQL version 12.1. Currently we have high availability and automatic failover using Patroni and it's working fine.
Now we are planning for disaster recovery and we are planning to use WAL-G for this.
We read some articles to find out on which machine we should install WAL-G (master, slave, or both) but couldn't get a clear idea.
Can you please help us understand on which machine we should set up WAL-G?
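For context, a rough sketch of how WAL-G is commonly wired into the Patroni-managed PostgreSQL parameters; the exact values (and any storage settings such as WALG_S3_PREFIX in the environment) are assumptions, not taken from the question:
postgresql:
  parameters:
    archive_mode: 'on'
    # fires only on the node that is currently the primary; replicas do not archive
    archive_command: 'wal-g wal-push %p'
Restores go the other way (wal-g wal-fetch %f %p as the restore_command, wal-g backup-fetch for base backups), and since any node can become the primary after a failover, the binary and its storage credentials would need to be available on every node.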
Tekchand Dagar
(101 rep)
Mar 5, 2020, 10:49 AM
• Last activity: Jun 7, 2025, 02:00 PM
1
votes
1
answers
98
views
Is this Patroni topology safe?
My organization is planning to implement a high-availability PostgreSQL cluster using Patroni and etcd. However, we only have two data center sites available, which makes deploying a standard 3-node etcd cluster across separate failure domains challenging.
We understand that running only a 2-node etcd cluster increases the risk of split-brain or unavailability if one site becomes unreachable, due to the lack of a quorum.
To address this, we came up with the following topology:
DC (Primary Site):
- 192.168.30.80: PostgreSQL node running Patroni (initial master)
- 192.168.30.83: etcd node
DRC (Disaster Recovery Site):
- 192.168.30.81: PostgreSQL node running Patroni (replica)
- 192.168.30.82: backup etcd node
Each site runs a single-node etcd cluster. We have tested that failover still works in this setup: we use the etcd mirror maker feature to continuously relay key creates and updates to the separate cluster in the DRC, and we use keepalived to manage a floating IP between the etcd clusters, which is used by Patroni on both nodes to access etcd.
My questions are:
- What risks are involved in running this kind of setup?
- Would it be better to add a lightweight third etcd node in a separate site (e.g., the cloud) to form a proper quorum?

Ilham Syamsuddin
(35 rep)
May 21, 2025, 09:53 AM
• Last activity: May 26, 2025, 01:33 PM
0
votes
0
answers
123
views
etcd is out of space, getting NOSPACE alarm
We have a two-node patroni postgres cluster in the preprod environment. Today I tried to do patronictl edit-config and saw that etcd was out of space. I looked at the /var/lib/etcd/member directory and it was around 2 GB. Because of this I can't edit the postgresql.conf file, nor manage the patroni cluster.
I tried the etcdctl compact and defrag commands but they did nothing; the etcd database size remained the same. I'd like to decrease the size of etcd with the correct methods, so I'd really appreciate your help.
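For reference, the sequence usually documented for clearing a NOSPACE alarm is compaction to a current revision, then defragmentation, then explicitly disarming the alarm; a sketch, assuming the etcd v3 API and default endpoints:
# current revision of the keyspace
rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | grep -oE '"revision":[0-9]+' | grep -oE '[0-9]+')
# compact away old revisions, then defragment to actually give the space back
ETCDCTL_API=3 etcdctl compact "$rev"
ETCDCTL_API=3 etcdctl defrag
# the NOSPACE alarm stays raised (and writes stay blocked) until it is disarmed
ETCDCTL_API=3 etcdctl alarm disarm
ETCDCTL_API=3 etcdctl alarm list
Compacting without a recent revision, or skipping the alarm disarm step, can make it look as if the commands did nothing.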
Aras Etiya
(1 rep)
Dec 26, 2024, 12:59 PM
1
votes
0
answers
78
views
How to prevent PostgreSQL from logging Patroni health check connections?
I have PostgreSQL 15.2 from the Percona repo, along with Patroni 3.0.1 as the management tool, HAProxy 2.5.11 for proxying, PGBouncer 1.18.0 as a server-side pool, and keepalived 2.1 for the virtual IP.
I have 3 nodes, I will use these IPs for the sake of this post:
10.0.1.1 is the primary, node1
10.0.1.2 is node2
10.0.1.3 is node3
HAProxy is set up to do a health check every 3 seconds:
listen primary
bind *:9999
option httpchk /primary
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server node1 10.1.0.1:6432 check port 8008
server node2 10.1.0.2:6432 check port 8008
server node3 10.1.0.3:6432 check port 8008
6432 is the PGBouncer port which will actually receive the SQL traffic, but health check is done over http on port 8008 with Patroni.
This causes Patroni each time to make a TCP ping to each PostgreSQL node, resulting in the following very ugly PostgreSQL log entries every 3 seconds on every node:
2024-11-26 20:45:04.200 CET - Process=, Application=[[unknown]], User=[[unknown]]@[10.0.1.1(36024)]/[[unknown]], SessionID=[67462540.4d387], Session start=[2024-11-26 20:45:04 CET], Command tag=[]: LOG: could not receive data from client: Connection reset by peer
2024-11-26 20:45:05.152 CET - Process=, Application=[[unknown]], User=[[unknown]]@[10.0.1.2(48138)]/[[unknown]], SessionID=[67462541.4d391], Session start=[2024-11-26 20:45:05 CET], Command tag=[]: LOG: could not receive data from client: Connection reset by peer
2024-11-26 20:45:06.202 CET - Process=, Application=[[unknown]], User=[[unknown]]@[10.0.1.3(51212)]/[[unknown]], SessionID=[67462542.4d39d], Session start=[2024-11-26 20:45:06 CET], Command tag=[]: LOG: could not receive data from client: Connection reset by peer
On a network level it is a simple 3-way TCP handshake, closed immediately.
I didn't find any way to hide this in the PostgreSQL log. It is an unnecessary and misleading log message. Any idea?
Gábor Major
(163 rep)
Nov 26, 2024, 09:01 PM
• Last activity: Nov 27, 2024, 05:29 PM
1
votes
2
answers
368
views
Postgres hot standby stuck starting, waiting for contrecord, despite WAL file being present
We are experimenting with [patroni Multi Datacenter HA](https://patroni.readthedocs.io/en/latest/ha_multi_dc.html) with postgres16 using replication slots. At a high level, to build the remote cluster from scratch we run a base backup while the active cluster is receiving writes.
We use a replication slot for the hot standby via patroni. Here is an example of the active cluster config:
postgresql:
parameters:
archive_command: /bin/true
archive_mode: 'on'
checkpoint_completion_target: '0.9'
checkpoint_timeout: 300
default_statistics_target: 100
effective_cache_size: 12GB
effective_io_concurrency: 2
fsync: 'on'
full_page_writes: 'on'
hot_standby: true
hot_standby_feedback: true
idle_in_transaction_session_timeout: 10min
log_min_duration_statement: 10000
log_min_error_statement: ERROR
log_min_messages: WARNING
log_temp_files: 4000
maintenance_work_mem: 3GB
max_connections: 512
max_parallel_maintenance_workers: 4
max_parallel_workers: 10
max_parallel_workers_per_gather: 2
max_prepared_transactions: 1024
max_replication_slots: 10
max_slot_wal_keep_size: 20GB
max_wal_senders: 10
max_wal_size: 2GB
max_worker_processes: 10
min_wal_size: 512MB
random_page_cost: 2
shared_buffers: 8GB
synchronous_commit: remote_write
track_functions: all
track_io_timing: 'on'
wal_buffers: 16MB
wal_compression: 'on'
wal_keep_segments: 128
wal_level: replica
work_mem: 16MB
use_pg_rewind: true
use_slots: true
retry_timeout: 14
What we see in experiments is that once PG is started post-basebackup via patroni, it moves to the patroni leader/running state; we then apply the standby cluster config:
curl -X PATCH http://localhost:8008/config \
-H "Content-Type: application/json" \
-d '{
"standby_cluster": {
"host": "standby.example.com",
"port": 5432,
"primary_slot_name": "standby_slot"
}
}' \
--data-urlencode "force=true"
This results in a pg_rewind, followed by the cluster moving to the Standby leader / streaming state most of the time. But at other times it moves to the Standby leader / starting state at the end of the rewind. What we see in the pg logs is something similar to this:
2024-08-19 20:02:00 UTC:::@::LOG: contrecord is requested by 93/4C000028
2024-08-19 20:02:00 UTC:::@::LOG: waiting for WAL to become available at 93/4C000040
Further analysis of the LSN in the replication slot indicates this record was generated after the base backup time and was archived by the leader just after the base backup. In the standby cluster, the WAL file associated with the LSN is present in the pg_wal dir, but the database is stuck starting up; it never goes to the streaming state, and the current state of the replication slot is not active. Why is the WAL file not being sent to the hot standby, despite being present on the active cluster, so that it can move to the streaming state?
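For anyone diagnosing the same state, a minimal sketch of the views worth checking (the slot name is the one from the PATCH above; catalogs as in PostgreSQL 16):
-- On the active cluster's primary: is the physical slot active, and where is its restart_lsn?
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'standby_slot';
-- On the standby leader: whether a WAL receiver is running at all, and what it is connected to
SELECT status, receive_start_lsn, flushed_lsn, slot_name, conninfo
FROM pg_stat_wal_receiver;
If pg_stat_wal_receiver is empty, the standby has not started a WAL receiver yet and is still trying to obtain that record from the archive or pg_wal.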
Ganesh
(121 rep)
Aug 19, 2024, 09:48 PM
• Last activity: Sep 7, 2024, 04:12 AM
0
votes
1
answers
4487
views
Patroni with etcd starting up
I have an existing PostgreSQL cluster v11.5 with streaming replication.
I want to set up high availability managed by Patroni.
I installed Patroni on both machines, and I set up an etcd cluster on three other virtual machines ['http://10.A.A.191:2379', 'http://10.B.B.99:2379', 'http://10.C.C.170:2379'], but Patroni fails to start with systemctl.
When I start Patroni with the patroni -c /etc/patroni.yml command, I have the errors below:
2019-12-26 16:51:38,207 DEBUG: New etcd client created for http://127.0.0.1:4001
2019-12-26 16:51:38,207 DEBUG: Selection next machine in cache. Available machines: ['http://10.A.A.191:2379 ', 'http://10.B.B.99:2379 ', 'http://10.C.C.170:2379 ']
2019-12-26 16:51:38,207 INFO: Selected new etcd server http://10.C.C.170:2379
2019-12-26 16:51:38,208 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2019-12-26 16:51:38,208 DEBUG: Starting new HTTP connection (1): 10.C.C.170:2379
2019-12-26 16:51:38,210 DEBUG: http://10.C.C.170:2379 "GET /v2/machines HTTP/1.1" 404 19
2019-12-26 16:51:38,211 ERROR: Failed to get list of machines from http://10.C.C.170:2379/v2 : EtcdException('Bad response : 404 page not found\n',)
2019-12-26 16:51:38,211 INFO: Retrying on http://10.A.A.191:2379
2019-12-26 16:51:38,211 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2019-12-26 16:51:38,211 DEBUG: Starting new HTTP connection (1): 10.237.132.191:2379
2019-12-26 16:51:38,214 DEBUG: http://10.A.A.191:2379 "GET /v2/machines HTTP/1.1" 404 19
2019-12-26 16:51:38,214 ERROR: Failed to get list of machines from http://10.A.A.191:2379/v2 : EtcdException('Bad response : 404 page not found\n',)
2019-12-26 16:51:38,214 INFO: Retrying on http://10.B.B.99:2379
2019-12-26 16:51:38,215 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2019-12-26 16:51:38,215 DEBUG: Starting new HTTP connection (1): 10.B.B.99:2379
2019-12-26 16:51:38,217 DEBUG: http://10.B.B.99:2379 "GET /v2/machines HTTP/1.1" 404 19
2019-12-26 16:51:38,217 ERROR: Failed to get list of machines from http://10.B.B.99:2379/v2 : EtcdException('Bad response : 404 page not found\n',)
2019-12-26 16:51:38,217 INFO: waiting on etcd
The problem seems to be at the API level: DEBUG: http://10.B.B.99:2379 "GET /v2/machines HTTP/1.1" 404 19
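A quick way to confirm where the 404 comes from is to hit the endpoints directly (a sketch, using one of the hosts from above):
# reports the server version regardless of which key API is enabled
curl -s http://10.B.B.99:2379/version
# the URL patroni's etcd (v2) client is requesting; a 404 here means this
# endpoint is not serving the v2 keys API
curl -i http://10.B.B.99:2379/v2/machines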
etcd is reachable from the PostgreSQL host:
curl -s http://10.B.B.99:2380/members |jq -r
returns
[
{
"id": 730XXXXXXXXXXX000,
"peerURLs": [
"http://10.B.B.99:2380 "
],
"name": "etcd1",
"clientURLs": [
"http://0.0.0.0:2379 "
]
},
{
"id": 956XXXXXXXXXXX000,
"peerURLs": [
"http://10.C.C.170:2380 "
],
"name": "etcd3",
"clientURLs": [
"http://0.0.0.0:2379 "
]
},
{
"id": 1551XXXXXXXXXXX000,
"peerURLs": [
"http://10.A.A.191:2380 "
],
"name": "etcd2",
"clientURLs": [
"http://0.0.0.0:2379 "
]
}
]
***etcd1 start command:***
ExecStart=/usr/local/bin/etcd \
--name etcd1 \
--advertise-client-urls http://0.0.0.0:2379 \
--listen-client-urls http://0.0.0.0:2379 \
--initial-advertise-peer-urls http://10.B.B.99:2380 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-cluster etcd1=http://10.B.B.99:2380,etcd2=http://10.A.A.191:2380,etcd3=http://10.C.C.170:2380 \
--initial-cluster-token etcd-cluster-atom \
--initial-cluster-state new \
--logger zap \
--log-outputs stdout \
--log-level debug
Could someone help me please?
What is wrong?
Many thanks
**Versions:**
etcd Version: _3.4.0_
etcdctl version: _3.4.0_
API version: _3.4_
patroni _1.6.3_
python-etcd==_0.4.5_
Mika
(361 rep)
Dec 26, 2019, 04:29 PM
• Last activity: May 18, 2024, 04:52 PM
1
votes
1
answers
4224
views
Is it good practice to use both PGBouncer and PGPool together?
We have set up streaming replication on Postgres 13 and it all works fine. The current setup is as follows.
We have one primary Postgres and one secondary Postgres connected via streaming replication. The replication and cluster failover management is handled by Patroni. We have PgBouncer for connection pooling and it is currently connected to the PG master. This works fine without issues.
We wanted to use the secondary Postgres for reads to reduce the read load on the primary, so we wanted to introduce Pgpool into the ecosystem to get the benefit of read/write splitting. A couple of articles suggested that we could use both PgBouncer and Pgpool together to get better results from both tools (https://scalegrid.io/blog/postgresql-connection-pooling-part-4-pgbouncer-vs-pgpool/). So now our setup adds Pgpool-II between PgBouncer and PG, with connection pooling disabled.
This setup also works fine. However, I am not able to understand the connection pooling: I can see many connections in Pgpool when I run SHOW POOL_POOLS;
- Who is holding on to the connections for connection pooling: PgBouncer or Pgpool?
- Why do I see connections when I issue SHOW POOL_POOLS in Pgpool when connection pooling is disabled?
- Is this setup right and scalable? What configuration do I need to change to get the best out of both tools?


Pramod
(111 rep)
Oct 26, 2021, 10:19 AM
• Last activity: Apr 30, 2024, 09:16 AM
0
votes
0
answers
960
views
Patroni failed to get list of machines from etcd
I am running a patroni cluster (3.4) on Linux with an etcd cluster. Normally the cluster runs perfectly fine, but sometimes I get errors saying the request to the etcd server failed (ReadTimeoutError, NewConnectionError, ConnectTimeoutError).
ETCD:
10.100.10.4
10.100.11.3
10.100.11.5
Patroni/PostgreSQL Nodes
10.100.10.10
10.100.11.6
2024-04-21 04:45:42,868 DEBUG: Writing {"conn_url":"postgres://10.100.10.10:5432/postgres","api_url":"http://10.100.10.10:8008/patroni ","state":"running","role":"master","version":"3.3.0","xlog_location":1642347112,"timeline":5} to key /db/mycluster/members/postgresql0 ttl=30 dir=False append=False
2024-04-21 04:45:42,869 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:42,871 DEBUG: http://10.100.11.3:2379 "PUT /v2/keys/db/mycluster/members/postgresql0 HTTP/1.1" 200 790
2024-04-21 04:45:42,871 INFO: no action. I am (postgresql0), the leader with the lock
2024-04-21 04:45:46,136 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': }
2024-04-21 04:45:46,136 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:46,138 DEBUG: http://10.100.11.3:2379 "GET /v2/keys/db/mycluster/?recursive=true&quorum=false HTTP/1.1" 200 None
2024-04-21 04:45:46,139 DEBUG: API thread: 10.100.11.6 - - "GET /cluster HTTP/1.1" 200 - latency: 3.354 ms
2024-04-21 04:45:51,981 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': }
2024-04-21 04:45:51,983 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:51,987 DEBUG: http://10.100.11.3:2379 "GET /v2/keys/db/mycluster/?recursive=true&quorum=false HTTP/1.1" 200 None
2024-04-21 04:45:51,989 DEBUG: API thread: 10.100.11.6 - - "GET /cluster HTTP/1.1" 200 - latency: 16.522 ms
2024-04-21 04:45:52,859 DEBUG: Issuing read for key /db/mycluster/ with args {'recursive': True, 'quorum': False, 'retry': }
2024-04-21 04:45:52,861 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,198 ERROR: Request to server http://10.100.11.3:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'10.100.11.3\', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'10.100.11.3\', port=2379): Read timed out. (read timeout=3.332937417338447)"))')
2024-04-21 04:45:56,198 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:56,198 INFO: Retrying on http://10.100.10.4:2379
2024-04-21 04:45:56,199 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,199 DEBUG: Starting new HTTP connection (1): 10.100.10.4:2379
2024-04-21 04:45:56,200 ERROR: Request to server http://10.100.10.4:2379 failed: MaxRetryError("HTTPConnectionPool(host='10.100.10.4', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))")
2024-04-21 04:45:56,200 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:56,200 INFO: Retrying on http://10.100.11.5:2379
2024-04-21 04:45:56,200 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:56,200 DEBUG: Starting new HTTP connection (1): 10.100.11.5:2379
2024-04-21 04:45:57,870 ERROR: Request to server http://10.100.11.5:2379 failed: MaxRetryError("HTTPConnectionPool(host='10.100.11.5', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.5 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:45:57,870 INFO: Reconnection allowed, looking for another server.
2024-04-21 04:45:57,871 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:57,871 DEBUG: Starting new HTTP connection (1): 10.100.11.5:2379
2024-04-21 04:45:59,540 ERROR: Failed to get list of machines from http://10.100.11.5:2379/v2 : MaxRetryError("HTTPConnectionPool(host='10.100.11.5', port=2379): Max retries exceeded with url: /v2/machines (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.5 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:45:59,541 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:45:59,541 DEBUG: Starting new HTTP connection (1): 10.100.11.3:2379
2024-04-21 04:46:01,210 ERROR: Failed to get list of machines from http://10.100.11.3:2379/v2 : MaxRetryError("HTTPConnectionPool(host='10.100.11.3', port=2379): Max retries exceeded with url: /v2/machines (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.3 timed out. (connect timeout=1.6666666666666667)'))")
2024-04-21 04:46:01,211 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:46:01,211 DEBUG: Starting new HTTP connection (1): 10.100.10.4:2379
2024-04-21 04:46:01,212 ERROR: Failed to get list of machines from http://10.100.10.4:2379/v2 : MaxRetryError("HTTPConnectionPool(host='10.100.10.4', port=2379): Max retries exceeded with url: /v2/machines (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))")
2024-04-21 04:46:01,212 DEBUG: Failed to update list of etcd nodes: EtcdException('Could not get the list of servers, maybe you provided the wrong host(s) to connect to?')
2024-04-21 04:46:01,484 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2024-04-21 04:46:01,484 DEBUG: Starting new HTTP connection (1): 10.100.11.3:2379
2024-04-21 04:46:02,486 ERROR: Request to server http://10.100.11.3:2379 failed: MaxRetryError("HTTPConnectionPool(host='10.100.11.3', port=2379): Max retries exceeded with url: /v2/keys/db/mycluster/?recursive=true&quorum=false (Caused by ConnectTimeoutError(, 'Connection to 10.100.11.3 timed out. (connect timeout=1.0)'))")
2024-04-21 04:46:02,486 INFO: Reconnection allowed, looking for another server.
The firewall should not be a problem, but maybe timeouts?
This error only appears on one node (10.100.10.10).
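If timeouts are the suspicion, these are the Patroni settings in the dynamic (DCS) configuration that govern how long etcd requests may be retried before a node gives up; a sketch of where they live (the values are only illustrative, not a recommendation):
# patronictl edit-config (or bootstrap.dcs in patroni.yml)
ttl: 60
loop_wait: 10
retry_timeout: 20
Patroni derives its per-request etcd connect/read timeouts from retry_timeout (which matches the fractional timeouts visible in the log above), so that is the usual knob to try first.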
If you need more information, please let me know!
Thank you!
mymarcelsql
(21 rep)
Apr 21, 2024, 05:23 AM
• Last activity: Apr 22, 2024, 09:20 AM
0
votes
0
answers
211
views
Pgbackrest connection to Patroni cluster with HAProxy (VIP)
I have Patroni cluster with:
- 1 VIP
- 3 nodes HAProxy
- 3 nodes PostgreSQL Patroni
- 3 nodes etcd
And a dedicated pgbackrest server, from which I want to connect to the Patroni cluster and take backups. But I don't want to connect to the physical servers, because I've configured backup from standby, and so after switchovers or failovers I would have to change the pgbackrest.conf file each time.
Is it acceptable to connect through the HAProxy VIP to the Patroni cluster from pgbackrest.conf?
pgbackrest.conf:
[global]
#repo1-path=/var/lib/pgbackrest
repo1-host-user=postgres
repo1-host-type=ssh
log-level-console=debug
log-level-file=detail
compress=n
start-fast=y
stop-auto=y
[pg_test]
repo1-path=/var/lib/pgbackrest/pg_test
pg1-host=VIP
pg1-host-user=pgbackrest
pg1-path=/opt/postgres/data/
pg1-port=5000
pg2-host=VIP
pg2-host-user=pgbackrest
pg2-path=/opt/postgres/data/
pg2-port=5001
backup-standby=y
archive-async=y
process-max=2
repo1-retention-full=2
My patroni parameters:
parameters:
archive_command: pgbackrest --stanza=pg_test archive-push %p
archive_mode: true
autovacuum_analyze_scale_factor: 0.05
autovacuum_vacuum_scale_factor: 0.05
checkpoint_completion_target: 0.9
checkpoint_timeout: 15 min
effective_cache_size: 11 GB
effective_io_concurrency: 100
huge_pages: false
maintenance_work_mem: 320 MB
max_connections: 1000;
max_parallel_maintenance_workers: 4
max_parallel_workers: 8
max_parallel_workers_per_gather: 4
max_wal_senders: 10
max_wal_size: 1024 MB
max_worker_processes: 8
min_wal_size: 512 MB
parallel_leader_participation: true
random_page_cost: 1.25
shared_buffers: 4096 MB
work_mem: 32 MB
And this is the error I'm getting when creating the stanza:
-------------------PROCESS START-------------------
2024-03-18 08:41:49.679 P00 INFO: stanza-create command begin 2.50: --exec-id=1598083-c6b0e543 --log-level-console=info --log-level-file=detail --pg1-host=VIP_adress --pg2-host=VIP_adress --pg1-host-user=pgbackrest --pg2-host-user=pgbackrest --pg1-path=/opt/postgres/data/ --pg2-path=/opt/postgres/data/ --pg1-port=5000 --pg2-port=5001 --repo1-host-type=ssh --repo1-path=/var/lib/pgbackrest/pg_test --stanza=pg_test
2024-03-18 08:41:50.055 P00 WARN: unable to check pg1: [OptionRequiredError] remote-0 process on 'VIP_adress' terminated unexpectedly : ERROR: : stanza-create command requires option: pg2-path
HINT: does this stanza exist?
2024-03-18 08:41:50.292 P00 WARN: unable to check pg2: [OptionRequiredError] remote-0 process on 'VIP_adress' terminated unexpectedly : ERROR: : stanza-create command requires option: pg2-path
HINT: does this stanza exist?
2024-03-18 08:41:50.292 P00 ERROR: : unable to find primary cluster - cannot proceed
HINT: are all available clusters in recovery?
2024-03-18 08:41:50.292 P00 INFO: stanza-create command end: aborted with exception
Farid Jafarzadeh
(1 rep)
Mar 18, 2024, 10:51 AM
• Last activity: Mar 22, 2024, 06:58 AM
0
votes
1
answers
171
views
High Availability postgres db cluster with Patroni and Etcd
I am building an HA Postgres db cluster with one master and multiple slaves. Please look at the points below and correct me if I am wrong:
1. For each slave node, I need to use pg_basebackup to sync and set up streaming replication with the master node (a sketch of this follows the list).
2. Set up Patroni on each node so that the leader can be elected from any of the available nodes in case of master failure. (In this case, if the master fails, a slave node will be elected as master and write operations will be handled by it. Now, if the old master comes back online and becomes the master of the cluster again, will it sync with the previously promoted master, i.e. the updated slaves?)
3. Setup Etcd on a node and configure it in the Patroni files of each node.
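A minimal sketch of point 1 (host name, replication role and data directory are placeholders):
pg_basebackup -h master-host -U replicator -D /var/lib/pgsql/data -X stream -R -P
Note that when Patroni manages the cluster it normally runs pg_basebackup itself while bootstrapping a new replica, so this step is usually not done by hand.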
Abizer Susnerwala
(1 rep)
Nov 28, 2023, 06:47 AM
• Last activity: Nov 28, 2023, 10:53 AM
0
votes
0
answers
2888
views
TimeZone Setting On PostgreSQL Databases
I have a **Patroni cluster** with two **Ubuntu 22 db servers**. The timezone of my instances was set to **UTC**. I needed to change them to the local timezone, **Europe/Istanbul**. I made the timezone change at the OS level; when I checked the timezone setting with the timedatectl command, I saw the local time was correct.
And I applied the setting one by one to all the databases on my server:
ALTER DATABASE "db_name" SET timezone TO 'Europe/Istanbul';
Also applied the setting:
ALTER SYSTEM SET TimeZone TO 'Europe/Istanbul';
And I edited my patroni config file and entered the timezone setting like:
: Europe/Istanbul
After that, I restarted my patroni cluster. After restarting the cluster, I saw that some of the databases had applied the change but others had not. I checked the configuration of those databases; it was the same as on the ones where the change applied.
I still see the old timezone setting, which was Etc/UTC, when I execute the SQL command below:
SHOW timezone;
I am a little bit confused about it; it's a strange situation. What might cause this? Am I missing something?
user274715
Sep 6, 2023, 11:19 AM
1
votes
2
answers
787
views
What happens after the WAL LSN value of type pg_lsn reaches its limit?
I am trying to determine the optimal standby for automated failover based on the highest value of pg_last_wal_receive_lsn() across all slaves. But what if max_wal_size is reached: do pg_lsn values go to 0?
Please suggest if there's any better approach to determine the node with the best RPO out of all the secondaries.
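For comparing the candidates, a minimal sketch of what can be run on each standby (functions as in PostgreSQL 10 and later):
SELECT pg_last_wal_receive_lsn() AS received,
       pg_last_wal_replay_lsn()  AS replayed,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn()) AS replay_gap_bytes;
A pg_lsn value is a 64-bit position in the WAL stream, so it keeps increasing; max_wal_size only limits how much WAL is kept on disk between checkpoints and does not reset the LSN to 0.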
Lucky Shofar
(13 rep)
Oct 27, 2022, 08:33 AM
• Last activity: Jun 17, 2023, 06:33 AM
7
votes
1
answers
19011
views
Could not receive data from WAL stream: ERROR: requested WAL segment has already been removed
Configuration: Postgres 9.6 with a 3-node cluster. db1 is the master, db2 and db3 are replicas. WAL files are archived in AWS S3 using a custom pgrsync tool. The cluster is managed by patroni. The archive_command and restore_command are properly configured on all the nodes.
To simulate: On db1, do heavy writes (like vacuuming a large table) and then stop db1 (sudo systemctl stop patroni). db3 becomes the new leader. db2 requests more WAL files, which it gets via the proper restore command from AWS S3, and becomes a replica of db3.
Now, start db1 again (sudo systemctl start patroni). But db1 (the old leader and the new to-be-replica) never comes up as a replica and gives the error message:
>could not receive data from WAL stream: ERROR: requested WAL segment 0000002400053C55000000AE has already been removed.
This error message is reported by db3 (the leader); db1 just logs it.
So, let's look at the timeline. Initially db1 was on timeline 35 (0x23) and wrote the following files to the archive:
0000002300053C55000000AE (TL is 23 and archive at 53C55/AE)
..
0000002300053C5600000043 (TL is 23 and archive at 53C56/43)
db1 is stopped at this point.
db3's logs show this:
received promote request
redo done at 53C56/A3FFC8C0
selected new timeline ID: 36
database system is ready to accept connections
and db3 copies the following files to archives
00000024.history
0000002300053C56000000A3.partial
0000002400053C56000000A3
0000002200053C55000000AE
As db3 became leader, db2 starts the process to become replica to db3 (which it successfully becomes) and here is the summary of the logs:
restored log file "00000024.history" from archive
restored log file "0000002300053C55000000AE" from archive
restored log file "00000023.history" from archive
redo starts at 53C55/AE0001A8
restored log file "0000002300053C55000000AF" from archive
...
restored log file "0000002300053C5600000042" from archive
Retrieving 0000002300053C5600000043 to pg_xlog/RECOVERYXLOG FAILED (log by pgrsync)
...
Retrieving 0000002300053C56000000A2 to pg_xlog/RECOVERYXLOG FAILED
restored log file "0000002400053C56000000A3" from archive
consistent recovery state reached at 53C56/A3FFE900
db1 is started now and here are the logs:
LOG: database system was interrupted while in recovery at log time 2021-01-28 04:08:01 UTC
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target
LOG: invalid record length at 53C55/AE0001E0: wanted 24, got 0
LOG: started streaming WAL from primary at 53C55/AE000000 on timeline 36
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000002400053C55000000AE has already been removed
... and repeats this over and over
Points to note:
1. 0000002400053C55000000AE was never written to the archives by any Postgres node. The old leader (db1) copied the archive 0000002300053C55000000AE (note: 0023, not 0024) before it was stopped.
2. The new leader (db3) copied 0000002200053C55000000AE (note: 0022, not 0024).
3. max_wal_size is set to 1024 on all nodes.
4. After db3 became the new leader, there was hardly any activity on the nodes. db3 only writes WAL files every 10 mins (archive_timeout=600s).
Questions:
1. Is there anything wrong in the configuration that makes the old leader ask for a WAL segment which the new leader does not have?
2. How can I restore the old leader (db1) in this state, without having to erase and start over?
Lots of disk space is available. The problem can be simulated at will. I had tried pg_rewind on the old primary, pointing to the new primary. It just said it is already on the same timeline (not the exact words). Note: it was not an error message. But even after that, it was showing the same error when starting Postgres.
We are on 9.6.19. Just a few days back 9.6.21 (and the last 9.5.x release) was announced, which points out a fix for exactly this specific problem ("Fix WAL-reading logic so that standbys can handle timeline switches correctly. This issue could have shown itself with errors like 'requested WAL segment has already been removed'."). However, even after upgrading to 9.6.21, the same problem exists for us.
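For reference, the shape of the pg_rewind invocation referred to above (the data directory and connection string are placeholders; PostgreSQL must be cleanly shut down on db1 first):
pg_rewind --target-pgdata=/var/lib/pgsql/9.6/data --source-server="host=db3 port=5432 user=postgres dbname=postgres" --progress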
rsmoorthy
(157 rep)
Jan 28, 2021, 05:51 AM
• Last activity: Feb 26, 2023, 09:04 AM
0
votes
1
answers
9526
views
Issue with Patroni - postgres not starting
I am learning Postgres and I have installed 3 VMs as follows:
node1 - Installed Postgres and etcd here
node2 - Installed Postgres and etcd here
node3 - Installed etcd here
I have set up streaming replication between node1 and node2 and it worked fine (even etcd health showed as fine for all 3 nodes).
Then I installed and configured Patroni on node1 and node2, and when I first started it, node2 was promoted to master and node1 was acting as a slave. At that time I could access psql from both nodes, but a change I made on the master (I inserted a row in a table) was not propagated to the slave.
I then stopped Patroni on node1 (the slave) and restarted it, but postgres on the slave is not starting and I cannot access psql on the slave either now. I don't know whether I should be able to access the DB from the slave or not when using Patroni (I could with streaming replication), i.e. should the postgres service remain switched off by design on the slave when you have Patroni? From my understanding, if it finds that there is a master holding a lock, postgres on the slave will fail to start.
I then stopped postgres on the master as I thought it would fail over to node1 (and node1 would become master), but no automatic failover occurred. Furthermore, the postgres service is not starting on either node now (not even on node2, but I can connect using psql on node2, which still seems to be the master). Should the postgres service show as running on the master, or does it also fail to start by design and one should only check the status of Patroni? Patroni is active and running on both nodes.
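For what it's worth, in a Patroni setup the member state is normally inspected through patronictl rather than through the postgres systemd unit (the config path below is an assumption):
patronictl -c /etc/patroni.yml list
This prints each member's role (Leader/Replica) and state, which tends to be more telling than systemctl status.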
When I try to start postgres on node2 (the node holding the lock):
Log on node1 (the slave):
etcd is working fine:
From my research I found that pg_rewind might help to get the slave in sync with my master, however, I'm not sure whether that is even my issue.
What do I need to do to fix this situation?
1. I mainly need to start postgres on the master (I have no idea how I can still use psql when the postgres service is failing to start on the master).
2. I need the slave to be able to promote itself to master (through Patroni) if the master fails.
My config files seem fine (especially since it was working fine prior to trying to switch off one instance for a failover to occur)





user1930901
(349 rep)
May 27, 2021, 06:05 PM
• Last activity: Jan 7, 2023, 01:01 PM
2
votes
1
answers
1318
views
High Available PostgreSQL with Patroni and Replication
I want to build a highly available PostgreSQL cluster using patroni with etcd. I want to have 3 servers (1 master and 2 slaves) in my cluster, so if something goes wrong one of the slaves automatically takes over the master role. Only the master server should accept write operations; the slaves are only for safety and read operations.
On the other hand, I also need approximately 10 servers with an up-to-date copy of the data in the cluster.
These servers are only there to read data and none of them can ever be promoted to master. They shouldn't even be part of the highly available cluster.
They are just supposed to get the data using streaming replication etc.
Is such a setup even possible? Can I use replication to other servers when they are part of a patroni cluster?
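If the extra read-only servers did end up inside the Patroni cluster, Patroni's per-node tags are the usual way to keep a member permanently non-promotable; a sketch of the relevant snippet in such a node's patroni.yml:
tags:
  nofailover: true
  noloadbalance: true
Plain streaming replicas fed from a cluster member can of course also stay completely outside Patroni, which is closer to the setup described above.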
Tobias
(21 rep)
Sep 1, 2020, 10:34 AM
• Last activity: Aug 4, 2022, 12:06 PM
1
votes
2
answers
1053
views
PostgreSQL-Patroni-Cluster: Installing oracle_fdw, dependency problem
For Oracle to PostgreSQL migrations we're currently looking into ora2pg. However, the performance using either writes to file or direct read from Oracle and write to PostgreSQL has been very underwhelming in both cases. There's not really good information about that, though some of what I found suggested it could be related to the Oracle Perl driver.
A new version of ora2pg introduced a new way of converting the data using the Foreign Data Wrapper oracle_fdw. It is written in C and performance benchmarks seem to suggest much higher performance, so we would like to use oracle_fdw. The target for our conversion is a Patroni cluster setup with 2 nodes and a 3rd witness node using etcd.
However, adding oracle_fdw to the Patroni cluster has been difficult and so far unsuccessful. After compiling oracle_fdw, trying to load it gives the following message:
> ERROR: could not load library "/usr/lib/postgresql14/lib64/oracle_fdw.so": libclntsh.so.19.1: cannot open shared object file: No such file or directory
That library exists on the system, but in the directory of the Oracle Instant Client (/dbdata/oracle/instantclient_19_12). For the library to be found, we think we would need the following standard environment variables added to the postgres process that Patroni starts:
export LD_LIBRARY_PATH=/dbdata/oracle/instantclient_19_12
export ORACLE_HOME=/dbdata/oracle/instantclient_19_12
export PATH=/dbdata/oracle/instantclient_19_12:$PATH
However, we have been wholly unsuccessful at injecting these variables into the postgres process started by Patroni. The process is started as the system user postgres, as seen here:
dbserver01 root 6 (/dbdata/oracle/instantclient_19_12): ps -ef|grep postgres
root 9236 9120 0 07:56 pts/0 00:00:00 su - postgres
postgres 9238 9236 0 07:56 pts/0 00:00:00 -bash
postgres 9289 9238 0 07:56 pts/0 00:00:00 psql
postgres 9290 27443 0 07:56 ? 00:00:00 postgres: pgcluster2: postgres postgres [local] idle
root 9307 9180 0 07:58 pts/1 00:00:00 grep --color=auto postgres
postgres 27423 1 0 Mar23 ? 00:01:41 /opt/patroni/bin/python3 /usr/bin/patroni /etc/patroni/postgres.yml
postgres 27443 1 0 Mar23 ? 00:00:04 postgres -D /dbdata/pgcluster --config-file=/dbdata/pgcluster/postgresql.conf --listen_addresses=0.0.0.0 --port=5432 --cluster_name=pgcluster2 --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=10 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=off --max_replication_slots=10 --max_worker_processes=8 --wal_log_hints=on
The following methods have been tried:
* Adding the export statements to the .bashrc of the postgres user
* Adding the export statements to the .bash_profile of the postgres user
* Adding the export statements to the global /etc/profile of the server
* Adding the /dbdata/oracle/instantclient_19_12/ path to /etc/ld.so.conf
Each time Patroni was restarted, and so was Postgres (checked by the ever-changing PID), yet the environment always looks like this:
dbserver01 root 7 (/dbdata/oracle/instantclient_19_12): cat /proc/27443/environ | xargs -0 -n 1
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOME=/var/lib/pgsql
LOGNAME=postgres
USER=postgres
SHELL=/bin/bash
INVOCATION_ID=3806a21dfa52455baad431cc2bbfa533
JOURNAL_STREAM=9:5426402
dbserver01 root 8 (/dbdata/oracle/instantclient_19_12):
The Patroni documentation has loads of information about Patroni specific environment variables, yet I have so far not found any documentation that helps pass variables to the Postgres process.
Copying the dependencies around might work around the problem; however, adding the variables would be the preferred way of fixing it. Is there a right or supported way to add environment variables to the postgres process started by Patroni?
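Since the environ dump above contains INVOCATION_ID and JOURNAL_STREAM, patroni is launched by systemd and never sources .bashrc/.bash_profile, so one candidate (a sketch, not a confirmed fix) is a systemd drop-in for the patroni unit, whose environment is inherited by the postgres child processes:
# /etc/systemd/system/patroni.service.d/oracle.conf (e.g. created with "systemctl edit patroni")
[Service]
Environment=ORACLE_HOME=/dbdata/oracle/instantclient_19_12
Environment=LD_LIBRARY_PATH=/dbdata/oracle/instantclient_19_12
After systemctl daemon-reload and a restart of patroni, the variables should show up in /proc/<postgres pid>/environ.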
Edit 1: The installation was done from a checkout of the repository as the root user, after setting the above-mentioned exports, then compiled using make and make install. Here's the output of both commands currently:
dbserver01 root 25 (/tmp/oracle_fdw): make
make: Nothing to be done for 'all'.
goeccdb11l root 26 (/tmp/oracle_fdw): make install
/usr/bin/mkdir -p '/usr/lib/postgresql14/lib64'
/usr/bin/mkdir -p '/usr/share/postgresql14/extension'
/usr/bin/mkdir -p '/usr/share/postgresql14/extension'
/usr/bin/mkdir -p '/usr/share/doc/packages/postgresql14/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/lib/postgresql14/lib64/oracle_fdw.so'
/usr/bin/install -c -m 644 .//oracle_fdw.control '/usr/share/postgresql14/extension/'
/usr/bin/install -c -m 644 .//oracle_fdw--1.2.sql .//oracle_fdw--1.0--1.1.sql .//oracle_fdw--1.1--1.2.sql '/usr/share/postgresql14/extension/'
/usr/bin/install -c -m 644 .//README.oracle_fdw '/usr/share/doc/packages/postgresql14/extension/'
dbserver01 root 27 (/tmp/oracle_fdw): ls -la /usr/lib/postgresql14/lib64
Edit 2: I've tried the installation again, running make clean and then the make and make install process again:
dbserver01 root 11 (/tmp/oracle_fdw): make clean
rm -f oracle_fdw.so liboracle_fdw.a liboracle_fdw.pc
rm -f oracle_fdw.o oracle_utils.o oracle_gis.o oracle_fdw.bc oracle_utils.bc oracle_gis.bc
rm -rf results/ regression.diffs regression.out tmp_check/ tmp_check_iso/ log/ output_iso/
dbserver01 root 12 (/tmp/oracle_fdw): export LD_LIBRARY_PATH=/dbdata/oracle/instantclient_19_12
dbserver01 root 13 (/tmp/oracle_fdw): export ORACLE_HOME=/dbdata/oracle/instantclient_19_12
dbserver01 root 14 (/tmp/oracle_fdw): export PATH=/dbdata/oracle/instantclient_19_12:$PATH
dbserver01 root 15 (/tmp/oracle_fdw): make
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIC -I"/dbdata/oracle/instantclient_19_12/sdk/include" -I"/dbdata/oracle/instantclient_19_12/oci/include" -I"/dbdata/oracle/instantclient_19_12/rdbms/public" -I"/dbdata/oracle/instantclient_19_12/" -I/usr/include/oracle/21/client64 -I/usr/include/oracle/19.14/client64 -I/usr/include/oracle/19.12/client64 -I/usr/include/oracle/19.12/client -I/usr/include/oracle/19.11/client64 -I/usr/include/oracle/19.11/client -I/usr/include/oracle/19.10/client64 -I/usr/include/oracle/19.10/client -I/usr/include/oracle/19.9/client -I/usr/include/oracle/19.9/client64 -I/usr/include/oracle/19.8/client -I/usr/include/oracle/19.8/client64 -I/usr/include/oracle/19.6/client -I/usr/include/oracle/19.6/client64 -I/usr/include/oracle/19.3/client -I/usr/include/oracle/19.3/client64 -I/usr/include/oracle/18.5/client -I/usr/include/oracle/18.5/client64 -I/usr/include/oracle/18.3/client -I/usr/include/oracle/18.3/client64 -I/usr/include/oracle/12.2/client -I/usr/include/oracle/12.2/client64 -I/usr/include/oracle/12.1/client -I/usr/include/oracle/12.1/client64 -I/usr/include/oracle/11.2/client -I/usr/include/oracle/11.2/client64 -I. -I./ -I/usr/include/pgsql/server -I/usr/include/pgsql/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o oracle_fdw.o oracle_fdw.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIC -I"/dbdata/oracle/instantclient_19_12/sdk/include" -I"/dbdata/oracle/instantclient_19_12/oci/include" -I"/dbdata/oracle/instantclient_19_12/rdbms/public" -I"/dbdata/oracle/instantclient_19_12/" -I/usr/include/oracle/21/client64 -I/usr/include/oracle/19.14/client64 -I/usr/include/oracle/19.12/client64 -I/usr/include/oracle/19.12/client -I/usr/include/oracle/19.11/client64 -I/usr/include/oracle/19.11/client -I/usr/include/oracle/19.10/client64 -I/usr/include/oracle/19.10/client -I/usr/include/oracle/19.9/client -I/usr/include/oracle/19.9/client64 -I/usr/include/oracle/19.8/client -I/usr/include/oracle/19.8/client64 -I/usr/include/oracle/19.6/client -I/usr/include/oracle/19.6/client64 -I/usr/include/oracle/19.3/client -I/usr/include/oracle/19.3/client64 -I/usr/include/oracle/18.5/client -I/usr/include/oracle/18.5/client64 -I/usr/include/oracle/18.3/client -I/usr/include/oracle/18.3/client64 -I/usr/include/oracle/12.2/client -I/usr/include/oracle/12.2/client64 -I/usr/include/oracle/12.1/client -I/usr/include/oracle/12.1/client64 -I/usr/include/oracle/11.2/client -I/usr/include/oracle/11.2/client64 -I. -I./ -I/usr/include/pgsql/server -I/usr/include/pgsql/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o oracle_utils.o oracle_utils.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIC -I"/dbdata/oracle/instantclient_19_12/sdk/include" -I"/dbdata/oracle/instantclient_19_12/oci/include" -I"/dbdata/oracle/instantclient_19_12/rdbms/public" -I"/dbdata/oracle/instantclient_19_12/" -I/usr/include/oracle/21/client64 -I/usr/include/oracle/19.14/client64 -I/usr/include/oracle/19.12/client64 -I/usr/include/oracle/19.12/client -I/usr/include/oracle/19.11/client64 -I/usr/include/oracle/19.11/client -I/usr/include/oracle/19.10/client64 -I/usr/include/oracle/19.10/client -I/usr/include/oracle/19.9/client -I/usr/include/oracle/19.9/client64 -I/usr/include/oracle/19.8/client -I/usr/include/oracle/19.8/client64 -I/usr/include/oracle/19.6/client -I/usr/include/oracle/19.6/client64 -I/usr/include/oracle/19.3/client -I/usr/include/oracle/19.3/client64 -I/usr/include/oracle/18.5/client -I/usr/include/oracle/18.5/client64 -I/usr/include/oracle/18.3/client -I/usr/include/oracle/18.3/client64 -I/usr/include/oracle/12.2/client -I/usr/include/oracle/12.2/client64 -I/usr/include/oracle/12.1/client -I/usr/include/oracle/12.1/client64 -I/usr/include/oracle/11.2/client -I/usr/include/oracle/11.2/client64 -I. -I./ -I/usr/include/pgsql/server -I/usr/include/pgsql/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o oracle_gis.o oracle_gis.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g -fPIC -shared -o oracle_fdw.so oracle_fdw.o oracle_utils.o oracle_gis.o -L/usr/lib/postgresql14/lib64 -L/usr/lib64 -Wl,--as-needed -L"/dbdata/oracle/instantclient_19_12/" -L"/dbdata/oracle/instantclient_19_12/bin" -L"/dbdata/oracle/instantclient_19_12/lib" -L"/dbdata/oracle/instantclient_19_12/lib/amd64" -lclntsh -L/usr/lib/oracle/21/client64/lib -L/usr/lib/oracle/19.14/client64/lib -L/usr/lib/oracle/19.12/client64/lib -L/usr/lib/oracle/19.12/client/lib -L/usr/lib/oracle/19.11/client64/lib -L/usr/lib/oracle/19.11/client/lib -L/usr/lib/oracle/19.10/client64/lib -L/usr/lib/oracle/19.10/client/lib -L/usr/lib/oracle/19.9/client/lib -L/usr/lib/oracle/19.9/client64/lib -L/usr/lib/oracle/19.8/client/lib -L/usr/lib/oracle/19.8/client64/lib -L/usr/lib/oracle/19.6/client/lib -L/usr/lib/oracle/19.6/client64/lib -L/usr/lib/oracle/19.3/client/lib -L/usr/lib/oracle/19.3/client64/lib -L/usr/lib/oracle/18.5/client/lib -L/usr/lib/oracle/18.5/client64/lib -L/usr/lib/oracle/18.3/client/lib -L/usr/lib/oracle/18.3/client64/lib -L/usr/lib/oracle/12.2/client/lib -L/usr/lib/oracle/12.2/client64/lib -L/usr/lib/oracle/12.1/client/lib -L/usr/lib/oracle/12.1/client64/lib -L/usr/lib/oracle/11.2/client/lib -L/usr/lib/oracle/11.2/client64/lib
dbserver01 root 16 (/tmp/oracle_fdw): make install
/usr/bin/mkdir -p '/usr/lib/postgresql14/lib64'
/usr/bin/mkdir -p '/usr/share/postgresql14/extension'
/usr/bin/mkdir -p '/usr/share/postgresql14/extension'
/usr/bin/mkdir -p '/usr/share/doc/packages/postgresql14/extension'
/usr/bin/install -c -m 755 oracle_fdw.so '/usr/lib/postgresql14/lib64/oracle_fdw.so'
/usr/bin/install -c -m 644 .//oracle_fdw.control '/usr/share/postgresql14/extension/'
/usr/bin/install -c -m 644 .//oracle_fdw--1.2.sql .//oracle_fdw--1.0--1.1.sql .//oracle_fdw--1.1--1.2.sql '/usr/share/postgresql14/extension/'
/usr/bin/install -c -m 644 .//README.oracle_fdw '/usr/share/doc/packages/postgresql14/extension/'
dbserver01 root 17 (/tmp/oracle_fdw):
Edit 3: Added a 4th method to get the path into the postgres user process.
0xCAFEBABE
(217 rep)
Mar 25, 2022, 07:24 AM
• Last activity: Mar 28, 2022, 08:07 AM
1
votes
1
answers
118
views
Many PG Clusters on more than one machine
I want to run ~10k PostgreSQL clusters.
Each cluster will be quite small with moderate traffic.
I will have roughly 100 virtual machines. Each VM should run roughly 100 clusters. Each cluster should run in one container.
100 VMs. Each running 100 containers. One cluster per container --> 10k clusters
Features which are not available from PG out-of-the-box:
* If I create a new cluster I need to check: Which VM has room for a new cluster?
* I want to be able to move a cluster to a different VM
Before I re-invent the wheel: Is there already a PostgreSQL-way to handle such a scenario?
Does it make sense to use a [Kubernetes Postgres-Operator](https://blog.flant.com/comparing-kubernetes-operators-for-postgresql/) or Patroni?
Background: It will be some kind of open source learning/teaching environment. Each user should get full database superuser permissions. That's why it is not enough to give each user a database. Each user needs a cluster. At least that's my current impression. Please leave a comment if you have an idea for a different architecture.
guettli
(1591 rep)
Jan 6, 2022, 07:30 AM
• Last activity: Jan 6, 2022, 07:15 PM
1
votes
1
answers
557
views
Could Barman backup only one database from PostgreSQL?
In my system, I have PostgreSQL 12 and Barman v2.13. I have a few backup disks of around 1 TB each, and I have recently added a new DB which is 5 TB. Previously my PostgreSQL cluster was no more than 400 GB.
I think Barman cannot compress the backed-up DB, right? So could I back up only specific DBs? Is pgBackRest similar?
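(For context on the single-database part: Barman and pgBackRest take physical, cluster-level base backups, so backing up just one database typically means a logical dump instead. A hedged example, where the database name bigdb is made up:)
# Hypothetical illustration -- database name and connection details are assumptions.
# A compressed, custom-format logical dump of a single database:
pg_dump --format=custom --compress=9 --file=bigdb.dump bigdb
(A dump in custom format is restored with pg_restore rather than through the physical recovery path.)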
Melih
(284 rep)
Aug 1, 2021, 10:32 PM
• Last activity: Nov 22, 2021, 05:28 PM
1
votes
1
answers
712
views
PostgreSQL process stuck in a "forked new backend" loop when used with Patroni
I have a 3-node Patroni 2.1.1 + PostgreSQL 13.4 cluster, and one of the nodes is acting weird once it finishes the basebackup bootstrap. I have another, identical node that works fine when put through the same process. Basically, once Patroni tries to connect to localhost:5432, the server seems to die and refork (fr...
I have a 3-node Patroni 2.1.1 + PostgreSQL 13.4 cluster, and one of the nodes is acting weird once it finishes the basebackup bootstrap. I have another, identical node that works fine when put through the same process.
Basically, once Patroni tries to connect to localhost:5432, the server seems to die and refork (from my understanding). You'll see the "loop" below, starting at "connection received: host=::1 port=XXXXX".
2021-11-01 12:01:43.395 EDT LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 12:01:43.396 EDT LOG: listening on IPv6 address "::1", port 5432
2021-11-01 12:01:43.396 EDT LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-11-01 12:01:43.397 EDT LOG: listening on IPv4 address "192.168.23.13", port 5432
2021-11-01 12:01:43.398 EDT LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-01 12:01:43.401 EDT LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-11-01 12:01:43.405 EDT LOG: database system was interrupted; last known up at 2021-11-01 12:01:25 EDT
2021-11-01 12:01:43.405 EDT DEBUG: removing all temporary WAL segments
2021-11-01 12:01:43.436 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/00000077.history pg_wal/RECOVERYHISTORY"
cp: cannot stat '/var/lib/pgsql/13/archive/00000077.history': No such file or directory
2021-11-01 12:01:43.440 EDT DEBUG: could not restore file "00000077.history" from archive: child process exited with exit code 1
2021-11-01 12:01:43.441 EDT LOG: entering standby mode
2021-11-01 12:01:43.441 EDT DEBUG: backup time 2021-11-01 12:01:25 EDT in file "backup_label"
2021-11-01 12:01:43.441 EDT DEBUG: backup label pg_basebackup base backup in file "backup_label"
2021-11-01 12:01:43.441 EDT DEBUG: backup timeline 118 in file "backup_label"
2021-11-01 12:01:43.441 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/00000076.history pg_wal/RECOVERYHISTORY"
2021-11-01 12:01:43.538 EDT LOG: restored log file "00000076.history" from archive
2021-11-01 12:01:43.540 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000000 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.689 EDT LOG: restored log file "000000760000000100000000" from archive
2021-11-01 12:01:43.692 EDT DEBUG: got WAL segment from archive
2021-11-01 12:01:43.693 EDT DEBUG: checkpoint record is at 1/60
2021-11-01 12:01:43.693 EDT DEBUG: redo record is at 1/28; shutdown false
2021-11-01 12:01:43.693 EDT DEBUG: next transaction ID: 531; next OID: 24606
2021-11-01 12:01:43.693 EDT DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2021-11-01 12:01:43.693 EDT DEBUG: oldest unfrozen transaction ID: 478, in database 1
2021-11-01 12:01:43.693 EDT DEBUG: oldest MultiXactId: 1, in database 1
2021-11-01 12:01:43.693 EDT DEBUG: commit timestamp Xid oldest/newest: 0/0
2021-11-01 12:01:43.693 EDT DEBUG: transaction ID wrap limit is 2147484125, limited by database with OID 1
2021-11-01 12:01:43.693 EDT DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2021-11-01 12:01:43.693 EDT DEBUG: starting up replication slots
2021-11-01 12:01:43.693 EDT DEBUG: starting up replication origin progress state
2021-11-01 12:01:43.697 EDT DEBUG: resetting unlogged relations: cleanup 1 init 0
2021-11-01 12:01:43.698 EDT DEBUG: initializing for hot standby
2021-11-01 12:01:43.698 EDT DEBUG: my backend ID is 1
2021-11-01 12:01:43.698 EDT LOG: redo starts at 1/28
2021-11-01 12:01:43.698 EDT DEBUG: prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: 0 KnownAssignedXids (num=0 tail=0 head=0)
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: recovery snapshots are now enabled
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/D8 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT DEBUG: end of backup reached
2021-11-01 12:01:43.698 EDT CONTEXT: WAL redo at 1/110 for XLOG/BACKUP_END: 1/28
2021-11-01 12:01:43.699 EDT DEBUG: checkpointer updated shared memory configuration values
2021-11-01 12:01:43.700 EDT LOG: consistent recovery state reached at 1/138
2021-11-01 12:01:43.700 EDT DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000001 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.700 EDT LOG: database system is ready to accept read only connections
cp: cannot stat '/var/lib/pgsql/13/archive/000000760000000100000001': No such file or directory
2021-11-01 12:01:43.704 EDT DEBUG: could not restore file "000000760000000100000001" from archive: child process exited with exit code 1
2021-11-01 12:01:43.704 EDT DEBUG: could not open file "pg_wal/000000760000000100000001": No such file or directory
2021-11-01 12:01:43.704 EDT DEBUG: switched WAL source from archive to stream after failure
2021-11-01 12:01:43.705 EDT DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver"
2021-11-01 12:01:43.706 EDT DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver.so"
2021-11-01 12:01:44.370 EDT DEBUG: forked new backend, pid=242938 socket=12
2021-11-01 12:01:44.371 EDT LOG: connection received: host=::1 port=36714
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: exit(0)
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:51.236 EDT DEBUG: forked new backend, pid=242941 socket=12
2021-11-01 12:01:51.237 EDT LOG: connection received: host=127.0.0.1 port=43428
2021-11-01 12:01:51.237 EDT DEBUG: reaping dead processes
2021-11-01 12:01:51.237 EDT DEBUG: server process (PID 242938) exited with exit code 0
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: exit(0)
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:58.285 EDT DEBUG: reaping dead processes
2021-11-01 12:01:58.285 EDT DEBUG: server process (PID 242941) exited with exit code 0
2021-11-01 12:01:59.295 EDT DEBUG: forked new backend, pid=242944 socket=12
2021-11-01 12:01:59.295 EDT LOG: connection received: host=::1 port=36758
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: exit(0)
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:06.301 EDT DEBUG: forked new backend, pid=242948 socket=12
2021-11-01 12:02:06.302 EDT DEBUG: reaping dead processes
2021-11-01 12:02:06.302 EDT DEBUG: server process (PID 242944) exited with exit code 0
2021-11-01 12:02:06.302 EDT LOG: connection received: host=127.0.0.1 port=43470
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: exit(0)
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:13.441 EDT DEBUG: reaping dead processes
2021-11-01 12:02:13.441 EDT DEBUG: server process (PID 242948) exited with exit code 0
2021-11-01 12:02:14.148 EDT DEBUG: postmaster received signal 2
2021-11-01 12:02:14.148 EDT LOG: received fast shutdown request
2021-11-01 12:02:14.150 EDT LOG: aborting any active transactions
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: exit(0)
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(1): 7 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: removing all KnownAssignedXids
2021-11-01 12:02:14.150 EDT DEBUG: release all standby locks
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(1): 2 callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: exit(1)
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.151 EDT DEBUG: reaping dead processes
2021-11-01 12:02:14.151 EDT DEBUG: reaping dead processes
2021-11-01 12:02:14.449 EDT DEBUG: forked new backend, pid=242954 socket=12
2021-11-01 12:02:14.450 EDT LOG: connection received: host=::1 port=36798
2021-11-01 12:02:15.915 EDT FATAL: terminating walreceiver process due to administrator command
2021-11-01 12:02:15.915 EDT DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT DEBUG: proc_exit(1): 2 callbacks to make
2021-11-01 12:02:15.915 EDT DEBUG: exit(1)
2021-11-01 12:02:15.916 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.917 EDT DEBUG: reaping dead processes
2021-11-01 12:02:15.917 EDT LOG: shutting down
2021-11-01 12:02:15.917 EDT DEBUG: skipping restartpoint, already performed at 1/28
2021-11-01 12:02:15.919 EDT DEBUG: updated min recovery point to 1/1000000 on timeline 118
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: exit(0)
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.921 EDT DEBUG: reaping dead processes
2021-11-01 12:02:15.921 EDT DEBUG: writing stats file "pg_stat/global.stat"
2021-11-01 12:02:15.921 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.922 EDT DEBUG: reaping dead processes
2021-11-01 12:02:17.890 EDT DEBUG: postmaster received signal 2
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: exit(0)
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.618 EDT DEBUG: reaping dead processes
2021-11-01 12:02:18.618 EDT DEBUG: server process (PID 242954) exited with exit code 0
2021-11-01 12:02:18.618 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT DEBUG: cleaning up dynamic shared memory control segment with ID 2104592565
2021-11-01 12:02:18.631 EDT DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:18.631 EDT LOG: database system is shut down
2021-11-01 12:02:18.631 EDT DEBUG: exit(0)
2021-11-01 12:02:18.631 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: logger shutting down
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: proc_exit(0): 0 callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: exit(0)
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT DEBUG: proc_exit(-1): 0 callbacks to make
If I stop Patroni, the server runs fine and joins the replication as expected. The postgres user can connect just fine, locally or remotely, using the psql utility or even pgAdmin.
Could someone explain what's going on here? I'm still learning PostgreSQL and I'm not quite comfortable with those logs yet... Thank you!
For more context, you can also look at Patroni issue #2104, which I posted.
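(The repeated short-lived connections from ::1 and 127.0.0.1 in the log look like Patroni's periodic local checks. Purely as a hedged way to poke at this by hand, with connection parameters assumed rather than taken from the post, you could open the same kind of local connection yourself and see whether a plain connection is enough to trigger the restart:)
# Assumption: default port and a "postgres" superuser reachable locally.
psql "host=localhost port=5432 user=postgres dbname=postgres connect_timeout=5" \
     -c "SELECT pg_is_in_recovery();"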
JulioQc
(143 rep)
Nov 1, 2021, 04:20 PM
• Last activity: Nov 3, 2021, 05:14 PM