
How to perform point-in-time recovery in PostgreSQL with PgBackRest and repmgr

1 vote
0 answers
449 views
I have the following setup:

* 2 VMs running Rocky Linux 8 for a PostgreSQL primary + standby setup
* PostgreSQL 15 from the official RPM repo
* [repmgr](http://www.repmgr.org/) for replication
* [PgBackRest](https://pgbackrest.org) for backups to S3 (a MinIO instance)

My goal is to find out how to restore to a specific backup. I performed the following actions:

1. Set up a test database with data from [pagila](https://github.com/devrimgunduz/pagila)
2. Make a full backup with `sudo -u postgres pgbackrest --stanza=cluster --log-level-console=info --type=full backup`
3. Drop the film table: `drop table film cascade;`

At this point I want to restore to the backup.
[root@rocky2 ~]# sudo -u postgres pgbackrest --stanza=cluster info
stanza: cluster
    status: ok
    cipher: none

    db (current)
        wal archive min/max (15): 000000010000000000000001/00000003000000000000000B

        full backup: 20231228-165042F
            timestamp start/stop: 2023-12-28 16:50:42 / 2023-12-28 16:51:32
            wal start/stop: 000000010000000000000005 / 000000010000000000000006
            database size: 36.4MB, database backup size: 36.4MB
            repo1: backup set size: 4.8MB, backup size: 4.8MB

        full backup: 20231228-171910F
            timestamp start/stop: 2023-12-28 17:19:10 / 2023-12-28 17:20:04
            wal start/stop: 000000010000000000000009 / 000000010000000000000009
            database size: 47.7MB, database backup size: 47.7MB
            repo1: backup set size: 7.3MB, backup size: 7.3MB
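
A note on reading these timestamps: PostgreSQL can only end recovery at a point after the base backup finished, so a recovery target has to be later than the `stop` time of the backup being restored. The start time of the second backup above (17:19:10), for example, precedes its stop time (17:20:04). A minimal sketch of that check follows; `target_after_backup_stop` is a hypothetical helper, and it assumes both timestamps are in the same time zone and in the format printed by `pgbackrest info`.

```shell
# Hypothetical helper: succeeds only when the recovery target ($1) is strictly
# later than the backup stop time ($2). Both arguments must use the fixed-width
# "YYYY-MM-DD HH:MM:SS" format and refer to the same time zone (assumption).
target_after_backup_stop() {
    # expr compares non-numeric operands as strings; fixed-width timestamps
    # sort chronologically. The "x" prefix keeps expr from parsing them as numbers.
    [ "$(expr "x$1" \> "x$2")" = 1 ]
}

# Values taken from the second full backup in the info output above.
if target_after_backup_stop "2023-12-28 17:19:10" "2023-12-28 17:20:04"; then
    echo "target is after the backup stop time"
else
    echo "target precedes the backup stop time"
fi
```

The comparison is purely textual, so it says nothing about time zone differences between the shell, pgBackRest, and the PostgreSQL log.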
1. Stop PostgreSQL on the primary and the standby: `systemctl stop postgresql-15`
2. Remove all data files: `find /var/lib/pgsql/15/data -mindepth 1 -delete`
3. Restore the latest backup: `sudo -u postgres pgbackrest --stanza=cluster restore`
4. Start the primary again

Unfortunately, PostgreSQL applied all the WAL records from S3, so the film table still does not exist. I then tried a point-in-time restore:

1. Take a look at the backup start timestamp
2. Restore to this time: `sudo -u postgres pgbackrest --stanza=cluster --type=time --target="2023-12-28 17:19:10" restore`
3. Start PostgreSQL on the primary: `systemctl start postgresql-15`

I see the following lines in the PostgreSQL logs:
[16-1] user=,db=,app=,client= LOG:  restored log file "00000003000000000000000C" from archive
[17-1] user=,db=,app=,client= LOG:  recovery stopping before commit of transaction 1018, time 2023-12-28 18:14:58.468349
[18-1] user=,db=,app=,client= LOG:  pausing at the end of recovery
[19-1] user=,db=,app=,client= HINT:  Execute pg_wal_replay_resume() to promote.
So PostgreSQL is in recovery mode:
demo=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)
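
The `pausing at the end of recovery` message matches the default target action, which is to pause. As a sketch of an alternative, the hypothetical `build_restore_cmd` helper below only prints a restore command and does not run it. It adds `--target-action=promote` so that PostgreSQL promotes on reaching the target instead of pausing, and it falls back to `--type=immediate` (stop as soon as the chosen backup is consistent) when no target time is given. The option names are pgBackRest restore options; the helper itself and the use of `--set` with a label from the info output are illustrative assumptions.

```shell
# Hypothetical dry-run helper: prints a pgbackrest restore command, never runs it.
# $1 = stanza, $2 = backup label passed to --set, $3 = recovery target time (optional).
build_restore_cmd() {
    if [ -n "${3:-}" ]; then
        # Time target: replay WAL up to $3, then promote instead of pausing.
        printf 'pgbackrest --stanza=%s --set=%s --type=time --target="%s" --target-action=promote restore\n' "$1" "$2" "$3"
    else
        # No target time: end recovery as soon as the backup is consistent.
        printf 'pgbackrest --stanza=%s --set=%s --type=immediate --target-action=promote restore\n' "$1" "$2"
    fi
}

# Print the command for restoring exactly the second full backup listed above.
build_restore_cmd cluster 20231228-171910F
```

Printing the command first makes it easy to review before running it as the postgres user.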
I then execute `pg_wal_replay_resume()`, as the hint in the logs suggests. The film table exists, as expected. Then I try to attach the standby:
sudo -u postgres /usr/pgsql-15/bin/repmgr -h 192.168.1.125 -U repmgr -d repmgr -f /etc/repmgr/15/repmgr.conf standby clone
systemctl start postgresql-15
sudo -u postgres /usr/pgsql-15/bin/repmgr standby register -F
When I check the status of the cluster, it shows that something is wrong with the standby:
[root@rocky2 ~]# sudo -u postgres /usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show
 ID | Name          | Role    | Status    | Upstream      | Location | Priority | Timeline | Connection string                         
----+---------------+---------+-----------+---------------+----------+----------+----------+--------------------------------------------
 1  | 192.168.1.125 | primary | * running |               | default  | 100      | 4        | host=192.168.1.125 user=repmgr user=repmgr
 2  | 192.168.1.126 | standby |   running | 192.168.1.125 | default  | 100      | 4        | host=192.168.1.126 user=repmgr user=repmgr

WARNING: following issues were detected
  - WAL replay is paused on node "192.168.1.126" (ID: 2) with WAL replay pending; this node cannot be manually promoted until WAL replay is resumed
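
One possibility worth ruling out here (an assumption, not something the output above confirms): on PostgreSQL 12 and later, pgBackRest writes the recovery settings for a restore into `postgresql.auto.conf`, and a clone taken from the restored primary may carry those settings over to the standby, which would then stop at the same target and pause. A minimal sketch for spotting such leftovers follows; `list_recovery_targets` is a hypothetical helper.

```shell
# Hypothetical helper: print the active recovery_target* settings found in a
# PostgreSQL configuration file. Lines that are commented out are ignored.
# Returns non-zero when no such setting is present.
list_recovery_targets() {
    grep -E '^[[:space:]]*recovery_target' "$1"
}

# Example invocation on the standby, using the data directory from the question:
#   list_recovery_targets /var/lib/pgsql/15/data/postgresql.auto.conf
```

If the function prints nothing on the standby, the pause has some other cause.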
What am I doing wrong? If I run `select pg_wal_replay_resume();` on the standby, it is promoted to primary, so I have not done that. So the question is: how do I properly perform point-in-time recovery on a primary-standby setup?
Asked by 0e39bf7b (11 rep)
Dec 29, 2023, 08:22 AM
Last activity: Dec 29, 2023, 08:41 AM