Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0 votes, 0 answers, 12 views
wal-g/postgresql: How to turn continuous into punctual backups for sparsifying archival?
I would like to implement a backup scheme in which for the past week, I have daily full backups with complete WAL logs for point-in-time-recovery. Past a week, I would like to keep only daily snapshots (without PITR capabilities) for another week and then weekly snapshots for three months, monthly snapshots for another year and permanent yearly snapshots.
The details of the schedule don't matter but I am wondering how I could truncate the archived WALs such that for every base backup I decide to keep according to the schedule, I have just enough WAL to reach from the base backup to the closest consistent database state.
I expect I would need the sparsified daily full backups, plus a relatively small amount of WAL archive for each.
Is this something I can achieve with WAL-G?
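For reference, a minimal sketch of the retention side of such a scheme, assuming a working WAL-G setup (storage and credentials via the usual WALG_* environment variables) and that the `backup-list` and `delete retain` subcommands behave as documented for your WAL-G version:

```bash
#!/usr/bin/env bash
# Sketch only: WAL-G configuration is assumed to be in place; adjust the
# retention count to match the schedule described above.

# Inspect which base backups exist before thinning them out.
wal-g backup-list

# Keep the newest 7 full backups and drop older ones together with the
# archived WAL that only those older backups needed.
wal-g delete retain FULL 7 --confirm
```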
Tilman Vogel
(101 rep)
Jul 30, 2025, 03:53 PM
4 votes, 1 answer, 238 views
Create WAL at interval, but only if data changed
I have a PostgreSQL 8.4 installation, and I have configured the `archive_timeout` setting to 5 minutes to improve the odds that I always have a complete WAL file to apply in case of system failure.
However, this creates a LOT of big files, even if nothing has changed! Is there a way to tell PostgreSQL to only create these WAL files at the timeout interval IF something has changed?
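For context, one workaround used on old releases is to drive the segment switch from cron instead of `archive_timeout`, and only switch when the WAL position has actually moved. A hedged sketch using the 8.4-era function names (`pg_current_xlog_location`, `pg_switch_xlog`); the state-file path is a placeholder and the script is assumed to run as the postgres OS user:

```bash
#!/usr/bin/env bash
# Compare the current WAL write position with the one recorded last run and
# only force a segment switch (and hence an archive) if it has changed.
STATE=/var/lib/postgresql/last_wal_position   # hypothetical location

current=$(psql -At -c "SELECT pg_current_xlog_location();")
previous=$(cat "$STATE" 2>/dev/null)

if [ "$current" != "$previous" ]; then
    psql -At -c "SELECT pg_switch_xlog();" >/dev/null
    # Record the position *after* the switch so an idle system stays idle.
    psql -At -c "SELECT pg_current_xlog_location();" > "$STATE"
fi
```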
TSG
(193 rep)
Mar 7, 2018, 11:06 PM
• Last activity: Jun 2, 2025, 03:09 PM
0 votes, 0 answers, 31 views
PostgreSQL Logical Replication Slot Lagging with Empty Tables—Why Does Setup Order Matter?
I’m running a **wal-consumer** pod in a PostgreSQL environment that streams WAL logs from a **replication slot**. However, I noticed that the order in which I set up my services affects whether the consumer works correctly or lags.
Two Different Setup Orders, Two Different Outcomes:
**Setup A (Causes Lag)**
- Create the replication slot
- Run database migrations (creating/modifying tables)
- Create the publication
**Issue:** The replication slot starts lagging, and the wal-consumer does not consume WAL logs properly—even though no inserts, updates, or deletes are happening. The last confirmed flushed LSN stays where it was initiated.
**Setup B (Works as Expected)**
- Run database migrations.
- Create the publication
- Create the replication slot
**Outcome:** The wal-consumer streams WAL logs as expected with no lag.
**Why Does This Happen?**
The replication slot appears to accumulate WAL logs in Setup A, even when no data is changing. But when the publication is created first (as in Setup B), everything works smoothly.
**Why does the order of creating the replication slot matter?**
Does PostgreSQL handle WAL retention differently depending on whether a publication exists at the time the slot is created? Would love to understand what’s happening under the hood!
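For anyone reproducing this, a hedged way to watch the slot's confirmed position against the server's current WAL position (function and column names as of PostgreSQL 10 and later):

```bash
# Shows how far each logical slot's confirmed position trails the current WAL position.
psql -x -c "
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_lsn() AS current_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';"
```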
Nayan Pahuja
(1 rep)
Mar 28, 2025, 06:37 PM
1 vote, 2 answers, 717 views
How should I configure my postgresql.conf file to backup the database once per hour?
My goal is to back up the PostgreSQL 10 database on a Ubuntu 18.04 VPS once per hour.
However, I clearly do not understand how to achieve this. Here is what I've done.
I modify the postgresql.conf file in four ways:
```
wal_level = replica                  # minimal, replica, or logical
archive_mode = on                    # enables archiving; off, on, or always
archive_command = 'cp %p /test/%f'   # copy the database file to the /test directory
archive_timeout = 3600               # force a logfile segment switch after this
                                     # number of seconds; 0 disables
```
These are the only entries in postgresql.conf that I've changed. AIUI, I'm saying: copy the current database file every hour to /test using the same filename as the database filename.
I then restart PostgreSQL using `/etc/init.d/postgresql reload` (I've also tried ).
However, nothing ever copies to /test. Also, the /etc/postgresql/10/main/pg_wal directory does not contain any files.
What am I doing wrong? Maybe other entries in the file need to be changed? Thanks.
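A few hedged checks that are often useful with this kind of setup on PostgreSQL 10 (the service name and the use of the postgres OS user below are assumptions for a Debian/Ubuntu layout):

```bash
# Confirm the settings the server actually loaded and whether the archiver has run.
sudo -u postgres psql -c "SHOW archive_mode;"
sudo -u postgres psql -c "SHOW archive_command;"
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"

# archive_mode only takes effect after a full restart, not a reload.
sudo systemctl restart postgresql

# Force a segment switch so archive_command fires without waiting an hour.
sudo -u postgres psql -c "SELECT pg_switch_wal();"
```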
user203748
(111 rep)
Feb 18, 2020, 10:18 PM
• Last activity: Dec 9, 2024, 02:01 PM
2 votes, 1 answer, 1417 views
When are wal released or removed in postgresql?
I am relatively new to PostgreSQL administration and I still have doubts about how WAL files are managed, specifically how these files are released or deleted. My background is Oracle, which is what I know best; there, redo logs perform a similar function of saving the changes, but the administrator explicitly sets their size, number and groups. In PostgreSQL this is not clear to me: WAL is generated to record the activity (changes) in the database, and if there is a lot of such activity, more WAL is generated, filling up the file system. When are these WAL files deleted?
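For reference, a hedged look at the knobs and objects that decide how long segments stay in `pg_wal` before a checkpoint recycles or removes them (setting names vary by version, as noted in the comments):

```bash
# Size bounds on pg_wal: old segments are recycled or removed at checkpoint
# time once they are no longer needed within these limits.
psql -c "SHOW min_wal_size;"
psql -c "SHOW max_wal_size;"
psql -c "SHOW wal_keep_size;"   # wal_keep_segments on releases before 13

# Replication slots and an un-archived backlog also pin WAL and keep it from
# being removed, so check for stalled slots as well.
psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"
```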
miguel ramires
(169 rep)
Jun 17, 2023, 04:36 AM
• Last activity: Nov 22, 2024, 08:08 PM
8 votes, 2 answers, 14893 views
PostgreSQL checkpoint log explained
I know what PostgreSQL checkpoint is and when it is happening.
I need some additional information about the logs produced by the `log_checkpoints = on` parameter, so please explain some points of it to me:
```
2017-09-09 16:31:37 EEST [6428-6524] LOG: checkpoint complete: wrote 30057 buffers (22.9%); 0 transaction log file(s) added, 0 removed, 47 recycled; write=148.465 s, sync=34.339 s, total=182.814 s; sync files=159, longest=16.143 s, average=0.215 s
```
1. I know that 22.9% of shared buffers are written (I have 1024 MB `shared_buffers`, so that means 234 MB are written out).
2. I know that 47 WAL files are recycled, i.e., they are not needed anymore for crash recovery, because the real data from them is already on disk.
**Question A**. But what about `write=148.465 s` and `sync=34.339 s`? What is the difference? What is `write`, and why does it take far longer than the `fsync()` operation?
**Question B**. What are `sync files`? Which files are these: WAL files? Why are there 159 sync files but only 47 recycled files? What is the relation between these?
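For reference, the same write/sync split is exposed as cumulative counters, so a hedged way to watch it over time (column names as found in `pg_stat_bgwriter` on 9.x through 16):

```bash
# Cumulative checkpoint timings since the last stats reset, in milliseconds.
psql -c "SELECT checkpoints_timed, checkpoints_req,
                checkpoint_write_time, checkpoint_sync_time
         FROM pg_stat_bgwriter;"
```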
Thank you!
inivanoff1
(183 rep)
Sep 9, 2017, 02:11 PM
• Last activity: Oct 28, 2024, 07:32 PM
1 vote, 1 answer, 772 views
How to purge old files from pg_xlogs directory
I have an archive command copying WAL files from pg_xlogs to my archive directory. However, I notice that the pg_xlogs directory is growing in size (but is limiting itself to around 7 files). What controls this "7" value? Is there a risk it will grow to a huge number of files?
How do I limit the number of log files, or even better, limit to no more than 48 hours worth? I found a similarly titled question here but it doesn't really apply. I'm using PostgreSQL version 8.4.20 (so settings/commands from v9 won't apply)
Can I just create a cron job that deletes files from pg_xlogs older than 48 hours? Will this crash/corrupt the pgsql server? Is there a better way?
Can I use my archive_command to delete the source file after moving to my archive directory?
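A hedged sketch of the safe variant of that cron idea: prune the *archive* directory by age rather than pg_xlog itself, since the server manages pg_xlog on its own (on 8.4 its size is driven mainly by `checkpoint_segments`). The archive path below is a placeholder:

```bash
#!/usr/bin/env bash
# Remove archived WAL older than 48 hours. This only touches the copies made
# by archive_command, never the live pg_xlog directory.
ARCHIVE_DIR=/mnt/server/archivedir   # hypothetical archive location

find "$ARCHIVE_DIR" -type f -mmin +2880 -delete
```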
TSG
(193 rep)
Feb 10, 2018, 11:15 PM
• Last activity: Oct 13, 2024, 08:07 PM
1 vote, 1 answer, 53 views
Database crash just before appending the checkpoint entry to write ahead log
- From what I read about WAL, it's an append-only file where all the operations on the DB are written before the operations are actually applied to the data.
- There is also a concept of a "checkpoint" which is when the DB actually writes the data to disk from memory, and appends a special checkpoint entry at the end of the WAL.
- Now if the DB crashes at any point, it can read the WAL starting from the latest checkpoint entry and redo all the subsequent operations.
- But how does the DB ensure that the checkpoint WAL entry and the actual flushing of the data to disk happen in a transactional way?
- What if the data is flushed but the DB crashes before the checkpoint entry is made in the WAL?
- Conversely, if the WAL is modified first, then what happens if the DB crashes after the checkpoint entry but before the data is actually flushed.
For example, consider the following case:
- We have a dummy table `Person(name, age, salary)`.
- It has an entry `John, 25, 100`.
- At time T1, a new transaction arrives: `UPDATE Person SET salary += 100 WHERE name='John'`.
- Assume that before T1, all the data had been flushed and the checkpoint entry had been appended to the WAL.
- Now after this transaction, the DB will first append the log with the exact transaction statement `UPDATE Person SET salary += 100 WHERE name='John'`.
- Now the data becomes `John, 25, 200`.
- Then after some time, let's say the DB decides to flush the data to disk at time T2.
- Then at time T3 (just after T2), the DB attempts to write the checkpoint entry to the WAL.
- However, before it could finish, there was a power failure between T2 and T3.
- Now when the DB restarts and tries to recover, it will notice that there is one transaction after the latest checkpoint and will try to execute it: `UPDATE Person SET salary += 100 WHERE name='John'`
- But since the transaction was already executed before the crash, this time the salary will take the value 300, although it should have been 200.
How does the DB prevent these redundant updates during the recovery?
Anmol Singh Jaggi
(113 rep)
Oct 2, 2024, 05:25 AM
• Last activity: Oct 2, 2024, 11:38 AM
1 vote, 1 answer, 3148 views
Delete WAL archive files
I have around 200Gb of **WAL archives**. Can I just delete them all without considering any database restoration? I am just asking whether it is safe to delete them all or at least 50% of them.
I read in a book that one must issue `pg_basebackup` and then delete WAL archives that are older than the "**.backup**" file. But the Postgres manual says that `pg_basebackup` is for cluster backup. My database is not running in cluster mode. So, is `pg_basebackup` still relevant for me? If not, how do I do a backup? I want to implement PITR and may need something like a base backup.
I am using Postgres 12.
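"Cluster" in the pg_basebackup docs simply means one PostgreSQL instance, so it applies to a single server too. A hedged starting point for PostgreSQL 12 (connection details and the target path are placeholders):

```bash
# Take a compressed tar-format base backup; the WAL needed to make it
# consistent is fetched at the end rather than streamed.
pg_basebackup -h localhost -U postgres -D /backups/base_$(date +%F) -Ft -z -X fetch -P

# Archived WAL older than the .backup history file written for this base
# backup is no longer needed to restore *this* backup.
```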
Thanks
padjee
(337 rep)
Sep 6, 2022, 04:36 PM
• Last activity: Jul 22, 2024, 08:04 AM
0 votes, 0 answers, 44 views
PostgreSQL Failure To Reinitialise Replica With pg_basebackup
I have a master-replica PostgreSQL 9.5 setup.
I'm trying to reinitialize the replica by doing the following:
- Stop replica service (`/etc/init.d/postgresql stop`)
- Clear data directory (`rm -rf /path/to/postgresql/data`)
- Start pg_basebackup (`sudo pg_basebackup -h master-host -D /path/to/postgresql/data/ --checkpoint=fast -X stream -R -P -v -U replication`)
- Start replica service (`/etc/init.d/postgresql start`)
Some details:
- The pg_basebackup process takes around 45 minutes to complete.
- I'm using `-X stream` to keep up with the changes happening on the master while the backup is taking place.
- When the process has finished, the data directory on the replica is ~2GB smaller than the master's (743GB vs 745GB).
When starting the service, these are the output logs:
```
LOG: database system is shut down
LOG: database system was interrupted; last known up at 2024-03-18 20:48:03 UTC
LOG: entering standby mode
LOG: redo starts at 1820/41000098
[unknown]@[unknown] LOG: incomplete startup packet
LOG: consistent recovery state reached at 1820/4A8CB950
LOG: started streaming WAL from primary at 1820/4B000000 on timeline 1
postgres@template1 FATAL: the database system is starting up
postgres@template1 FATAL: the database system is starting up
```
The service is started but any interaction with it results in the fatal error: database system is starting up
Am I missing something in the reinitialisation of the replica? What is causing this behaviour?
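For reference, a hedged way to check whether the standby is still replaying WAL rather than hung, using the 9.5-era function names:

```bash
# Once the standby accepts connections (it needs hot_standby = on and a
# consistent recovery state for that), these show replay progress:
psql -c "SELECT pg_is_in_recovery(),
                pg_last_xlog_replay_location(),
                pg_last_xact_replay_timestamp();"
```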
Aaron Ullal
(111 rep)
Mar 19, 2024, 07:43 AM
1 vote, 1 answer, 454 views
How to create a pg_basebackup for without streaming?
I have a primary and a replica Postgres database that are never connected to each other. We have another layer of technology that we store a base backup and the WAL files to, and those will get synced to the other server eventually.
On the primary db, I set up the WAL files this way:
```
wal_level = logical
fsync = on
synchronous_commit = local
wal_sync_method = fsync
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/data/wal/%f && cp %p /var/lib/postgresql/data/wal/%f'
archive_timeout = 3600
```
This works and every hour a new WAL file gets created. Now I'm trying to create a base backup because I believe WAL files don't work by themselves. They need an initial base backup.
I tried running this command:
```
docker exec primary-db pg_basebackup -h localhost -p 5432 -U postgres -D /var/lib/postgresql/data/basebackup -Ft -z -Xs -P
```
But I'm getting the following error:
> pg_basebackup: WAL streaming can only be used in plain mode
I believe I'm not doing WAL streaming because the two databases are never connected to each other.
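That error comes from combining the tar output format (`-Ft`) with `-X stream` on a server version where streaming is only supported for plain-format backups. A hedged alternative is to let pg_basebackup fetch the required WAL at the end instead:

```bash
# Same command, but with WAL collected via fetch mode, which works with -Ft.
docker exec primary-db pg_basebackup -h localhost -p 5432 -U postgres \
    -D /var/lib/postgresql/data/basebackup -Ft -z -X fetch -P
```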
dokgu
(123 rep)
Feb 15, 2024, 10:26 PM
• Last activity: Feb 16, 2024, 06:50 AM
1 vote, 1 answer, 996 views
How to lock an entire Postgres database
I have a process (for backup and replication using binary logs) that works in MySQL that I want to replicate for Postgres databases (using WAL):
1. Lock database (`FLUSH TABLES WITH READ LOCK;`)
2. Disable binary logging (`SET sql_log_bin = 0;`)
3. Flush binary logs (`FLUSH BINARY LOGS;`)
4. Copy binary logs and apply to a remote database for replication purposes
5. Enable binary logging (`SET sql_log_bin = 1;`)
6. Unlock database (`UNLOCK TABLES;`)
This process works and I'm trying to do the same for Postgres. I was able to find that [`CHECKPOINT`](https://www.postgresql.org/docs/current/sql-checkpoint.html) is what I need to flush the binary logs for Postgres but locking and unlocking the database isn't as straightforward as I hoped it would be.
Here's what I found so far:
- [`LOCK DATABASE`](https://wiki.postgresql.org/wiki/Lock_database) is a database-level locking mechanism but is not implemented
- `BEGIN; LOCK TABLE users IN ACCESS EXCLUSIVE MODE NOWAIT;` seems to be able to lock the tables individually but apparently there is no [`UNLOCK TABLE`](https://www.postgresql.org/docs/current/sql-lock.html) command because locks are always released at transaction end
- `default_transaction_read_only` - I tried to look for the equivalent of `FLUSH TABLES WITH READ LOCK;` in Postgres and found [this question](https://dba.stackexchange.com/questions/271685/postgresql-9-3-set-database-or-all-to-read-only-mode)
I tried the `default_transaction_read_only` solution and it does prevent new records from being added, but I'm a little worried about what the answer in there said:
> _This takes effect for new sessions, not for sessions that are already connected._
Also when I tried to revert the read-only mode with `BEGIN read write; ALTER DATABASE my_database SET default_transaction_read_only to OFF;` it appears to have succeeded, but when I try to add a new record, I get the following error:
> ERROR: cannot execute INSERT in a read-only transaction
Any ideas on how I could perform the same process as I am doing for MySQL but for Postgres?
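One hedged note on the mapping: the closer PostgreSQL analogue to `FLUSH BINARY LOGS` is a WAL segment switch rather than `CHECKPOINT`, since the switch is what closes the current segment and hands it to `archive_command`. A sketch with PostgreSQL 10+ function names:

```bash
# Close out the current WAL segment so it becomes archivable.
psql -c "SELECT pg_switch_wal();"

# Optionally also flush dirty buffers to the data files.
psql -c "CHECKPOINT;"
```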
dokgu
(123 rep)
Feb 12, 2024, 04:42 PM
• Last activity: Feb 13, 2024, 07:56 AM
0 votes, 1 answer, 1614 views
How to flush WAL files?
I'm trying to force WAL files to be flushed on to the disk so that I can copy them somewhere else. I believe [`CHECKPOINT`](https://www.postgresql.org/docs/current/sql-checkpoint.html) is the command I'm interested in but it doesn't seem to work.
I have a docker container of Postgres running and I use adminer to add new records to a table I created. I tried to flush the WAL files manually:
```
$ docker exec db psql pgtest -U postgres -c "CHECKPOINT;"
CHECKPOINT
```
I believe that should flush the data and I would see a new WAL file created, however there are no new files - I only see the old WAL files. When I restart the container:
```
$ docker restart db
db
```
Then I see a new WAL file.
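For what it's worth, a hedged explanation of that observation: `CHECKPOINT` flushes dirty buffers but does not close the current WAL segment, so no new file shows up; a segment switch does. A sketch using the PostgreSQL 10+ name (`pg_switch_xlog()` on older releases):

```bash
# Finish the current WAL segment so it is handed to archive_command and a new
# segment file appears, without restarting the container.
docker exec db psql pgtest -U postgres -c "SELECT pg_switch_wal();"
```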
This is what my postgresql.conf looks like:
```
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
# - Settings -
wal_level = logical # minimal, replica, or logical
# (change requires restart)
fsync = on # flush data to disk for crash safety
# (turning this off can cause
# unrecoverable data corruption)
synchronous_commit = local # synchronization level;
# off, local, remote_write, remote_apply, or on
wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_compression = off # enable compression of full-page writes
#wal_log_hints = off # also do full page writes of non-critical updates
# (change requires restart)
#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds
wal_writer_flush_after = 1MB # measured in pages, 0 disables
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
#checkpoint_timeout = 1min # range 30s-1d
#max_wal_size = 1GB
#min_wal_size = 4MB
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 256kB # measured in pages, 0 disables
#checkpoint_warning = 30s # 0 disables
# - Archiving -
archive_mode = on # enables archiving; off, on, or always
# (change requires restart)
archive_command = 'test ! -f /var/lib/postgresql/data/wal/%f && cp %p /var/lib/postgresql/data/wal/%f'
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0 # force a logfile segment switch after this
# number of seconds; 0 disables
```
dokgu
(123 rep)
Feb 12, 2024, 05:51 PM
• Last activity: Feb 12, 2024, 06:31 PM
1 vote, 0 answers, 44 views
Advice on postgreSQL table logging
I am thinking I can turn off logging for certain tables in my PostgreSQL analytic data warehouse. There are a core set of very large tables (between 50 and 500Gb) that are completely rebuilt at the start of each month. Each table is truncated, identity reset, indexes removed, then reloaded, then reindexed. Once reloaded, the tables are permissioned to select-only and don't get any updates or inserts. Is there any reason for these tables to use the WAL? I should get a performance benefit for the monthly reloads if I turn logging off, and we're not using replication, but I've never used this table option before and want to be sure I'm not missing something important.
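The workload described maps onto unlogged tables, which skip WAL entirely (and are therefore truncated after a crash and not replicated). A hedged sketch of the relevant DDL, available since PostgreSQL 9.5; the table name is hypothetical:

```bash
# Stop WAL-logging the table before the monthly truncate/reload/reindex cycle.
psql -c "ALTER TABLE monthly_fact_table SET UNLOGGED;"

# Switching back with SET LOGGED rewrites and WAL-logs the whole table, so
# warehouses that can tolerate reloading after a crash often leave it UNLOGGED.
psql -c "ALTER TABLE monthly_fact_table SET LOGGED;"
```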
zambo
(21 rep)
Nov 29, 2023, 03:53 PM
1 vote, 1 answer, 735 views
Shared buffers, WAL buffers and Checkpointers
I am taking this EDB PostgreSQL essential course and the instructor explained the PostgreSQL architecture, referring to a diagram: whenever a client makes an update request and the data is already present in the shared buffers (so there is no need to fetch it from file storage), an entry is made in the WAL buffers, and upon *committing* the WAL writer writes the transaction to the transaction logs and makes it permanent, though not yet in the data files (as far as I've understood, that's the task of the *checkpointer*, below). So far so good.
**image courtesy training.enterprisedb.com**
Now comes the checkpointer: *it is a process which runs at a certain interval ("usually 5 mins is an ideal time") and writes anything in the shared buffers out to file storage.*
My question is: suppose the checkpointer just ran, and right after that I initiated an atomic transaction and transferred 100 bucks to my friend. How is it that my friend can see it immediately? Is Postgres querying the transaction logs? Or how is this happening?
But upon a little pondering, I realized that when a request is made to update the data, Postgres has to bring it into main memory, and a viable way to do that is to keep track of dirty data in the shared buffers and update the data in the shared buffers themselves; and in the transaction logs we could keep a 0/1 flag with every DML transaction entry to identify whether the data is present in the shared buffers or not. This could also come in handy while doing analysis.
Can someone help me understand?
Thanks in advance!
commonSense
(123 rep)
Nov 25, 2023, 01:40 AM
• Last activity: Nov 25, 2023, 03:08 PM
2 votes, 1 answer, 1158 views
Why does PostgreSQL want max_wal_senders equal to 0 and greater than 0 at the same time?
We're running a backup process based on `pg_basebackup` of a Postgres 14.5 database that creates a lot of WAL archive files and uses up huge amounts of disk space. As we don't really need the WAL archive, I thought to disable WAL archiving and I changed these values (everything else is pure default):
```
archive_mode = off      # was 'on' before
wal_level = minimal     # was 'replica' before (there's only a single db instance running)
max_wal_senders = 0     # was '10' before (but wal_level = minimal seems to require '0')
```
Maybe there's no real need to lower the `wal_level` from `replica` to `minimal`, but the [WAL documentation](https://www.postgresql.org/docs/current/runtime-config-wal.html) states what seems exactly sufficient for my less-than-critical database:
> `minimal` removes all logging except the information required to recover from a crash or immediate shutdown.
Unfortunately, after the above configuration, the backup process now fails with this error:
> FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
In other words, `max_wal_senders` must be zero and also greater than zero at the same time! What can I do to get out of this deadlock?
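For reference, a hedged way out under the constraint that `pg_basebackup` itself connects as a WAL sender: keep `wal_level = replica` and a non-zero `max_wal_senders`, and disable only the archiving (all three settings need a restart):

```bash
# Applied with ALTER SYSTEM (written to postgresql.auto.conf), then restart.
psql -c "ALTER SYSTEM SET archive_mode = off;"
psql -c "ALTER SYSTEM SET wal_level = 'replica';"
psql -c "ALTER SYSTEM SET max_wal_senders = 10;"
```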
dokaspar
(123 rep)
Sep 18, 2023, 06:28 PM
• Last activity: Sep 18, 2023, 07:13 PM
-1 votes, 1 answer, 546 views
General questions about write ahead logs and recovery
I've recently learned about write ahead logs and failure recovery in the context of key-value stores like Cassandra and have some follow-up questions:
1. Do WALs get persisted into smaller snapshots so that we only need to recover from the latest snapshot instead of the entire commit history? What is the high-level mechanism for persisting and recovering from these snapshots?
2. Do logs get batched when writing to disk for performance? Is this at the risk of permanently losing data?
3. What happens when a machine fails during recovery from a WAL? Do we end up with partial results in the database? How do we recover from this state the next time the DB starts?
ejtt
(101 rep)
Apr 6, 2023, 03:19 PM
• Last activity: Apr 13, 2023, 10:50 AM
0 votes, 2 answers, 161 views
About "wal_buffers" of Postgresql-11 with "synchronous_commit=off"
Over the last few days I tuned my PG11 for a lot of huge writing transactions, and got an amazing result.
The only worry in my mind is whether there is more risk of losing data now that I have turned off `synchronous_commit` and used a large `wal_buffers` value.
Leon
(411 rep)
Feb 11, 2023, 02:12 AM
• Last activity: Feb 14, 2023, 02:58 AM
7 votes, 2 answers, 5950 views
What causes large INSERT to slow down and disk usage to explode?
I have a table of about 3.1 million rows with the following definition and indexes:
CREATE TABLE digiroad_liikenne_elementti (
ogc_fid serial NOT NULL,
wkb_geometry geometry(Geometry,4258),
tiee_tila numeric(9,0),
vaylatyypp numeric(9,0),
toiminnall numeric(9,0),
eurooppati character varying(254),
kansalline numeric(9,0),
tyyppi numeric(9,0),
liikennevi numeric(9,0),
ens_talo_o numeric(9,0),
talonumero numeric(9,0),
ens_talo_v numeric(9,0),
oik_puol_t character varying(254),
tieosan_ta numeric(9,0),
viim_talo_ numeric(9,0),
viim_tal_1 numeric(9,0),
vas_puol_t character varying(254),
laut_tyypp numeric(9,0),
lautta_lii numeric(9,0),
inv_paalu_ numeric(19,11),
inv_paal_1 numeric(19,11),
liitalue_o numeric(9,0),
ketju_oid numeric(9,0),
tietojoukk numeric(9,0),
ajoratanum numeric(4,0),
viite_guid character varying(254),
"timestamp" date,
tiee_kunta numeric(9,0),
toissij_ti character varying(254),
viite_oid numeric(9,0),
k_elem_id numeric(9,0),
region character varying(40) DEFAULT 'REGION'::character varying,
CONSTRAINT digiroad_liikenne_elementti_pkey PRIMARY KEY (ogc_fid)
);
CREATE INDEX digiroad_liikenne_elementti_wkb_geometry_geom_idx
ON digiroad_liikenne_elementti USING gist (wkb_geometry);
CREATE INDEX dle_k_elem_id_idx
ON digiroad_liikenne_elementti USING btree (k_elem_id);
CREATE INDEX dle_ogc_fid_idx
ON digiroad_liikenne_elementti USING btree (ogc_fid);
CREATE INDEX dle_region_idx
ON digiroad_liikenne_elementti USING btree (region COLLATE pg_catalog."default");
Another table with 8.6 million rows contains attributes for the rows of the first table; the tables can be joined with `k_elem_id` AND `region`.
CREATE TABLE digiroad_segmentti (
ogc_fid serial NOT NULL,
wkb_geometry geometry(Geometry,4258),
segm_tila numeric(9,0),
tyyppi numeric(9,0),
loppupiste numeric(19,11),
alkupiste numeric(19,11),
vaikutuska numeric(9,0),
vaikutussu numeric(9,0),
vaikutusai character varying(254),
tieosanume numeric(19,11),
tienumero numeric(9,0),
dyn_arvo numeric(9,0),
dyn_tyyppi numeric(9,0),
omistaja_t numeric(9,0),
pysakki_va numeric(9,0),
pysakki_ty numeric(9,0),
pysakki_su numeric(9,0),
pysakki_ka numeric(9,0),
pysakki_yl character varying(254),
palvelu_pa numeric(9,0),
toissijain numeric(9,0),
siltataitu numeric(9,0),
rdtc_tyypp numeric(9,0),
rdtc_alaty numeric(9,0),
rdtc_paikk numeric(19,11),
rdtc_luokk numeric(9,0),
rdtc_liitt character varying(254),
palvelu_ob numeric(9,0),
ketju_oid numeric(9,0),
tietojoukk numeric(9,0),
ajoratanum numeric(4,0),
viite_guid character varying(254),
"timestamp" date,
sivusiirty numeric(19,11),
toissij_ti character varying(254),
viite_oid numeric(9,0),
k_elem_id numeric(9,0),
region character varying(40) DEFAULT 'REGION'::character varying,
CONSTRAINT digiroad_segmentti_pkey PRIMARY KEY (ogc_fid)
);
CREATE INDEX digiroad_segmentti_wkb_geometry_geom_idx
ON digiroad_segmentti USING gist (wkb_geometry);
CREATE INDEX ds_dyn_arvo_idx
ON digiroad_segmentti USING btree (dyn_arvo);
CREATE INDEX ds_dyn_tyyppi_idx
ON digiroad_segmentti USING btree (dyn_tyyppi);
CREATE INDEX ds_k_elem_id_idx
ON digiroad_segmentti USING btree (k_elem_id);
CREATE INDEX ds_ogc_fid_idx
ON digiroad_segmentti USING btree (ogc_fid);
CREATE INDEX ds_region_idx
ON digiroad_segmentti USING btree (region COLLATE pg_catalog."default");
CREATE INDEX ds_tyyppi_idx
ON digiroad_segmentti USING btree (tyyppi);
I am trying to insert the rows of the first table (with some modification) into a new table:
CREATE TABLE edge_table (
id serial NOT NULL,
geom geometry,
source integer,
target integer,
km double precision,
kmh double precision DEFAULT 60,
kmh_winter double precision DEFAULT 50,
cost double precision,
cost_winter double precision,
reverse_cost double precision,
reverse_cost_winter double precision,
x1 double precision,
y1 double precision,
x2 double precision,
y2 double precision,
k_elem_id integer,
region character varying(40),
CONSTRAINT edge_table_pkey PRIMARY KEY (id)
);
Since running a single insert statement would take a long time and I would not be able to see if the statement is stuck or something, I have decided to do it in smaller chunks inside a loop in a function.
The function looks like this:
DROP FUNCTION IF EXISTS insert_function();
CREATE OR REPLACE FUNCTION insert_function()
RETURNS VOID AS
$$
DECLARE
const_type_1 CONSTANT int := 5;
const_type_2 CONSTANT int := 11;
i int := 0;
row_count int;
BEGIN
CREATE TABLE IF NOT EXISTS edge_table (
id serial PRIMARY KEY,
geom geometry,
source integer,
target integer,
km double precision,
kmh double precision DEFAULT 60,
kmh_winter double precision DEFAULT 50,
cost double precision,
cost_winter double precision,
reverse_cost double precision,
reverse_cost_winter double precision,
x1 double precision,
y1 double precision,
x2 double precision,
y2 double precision,
k_elem_id integer,
region varchar(40)
);
    batch_size := 1000;
    SELECT COUNT(*) FROM digiroad_liikenne_elementti INTO row_count;
    WHILE i*batch_size ...
        -- ... INSERT ... SELECT from digiroad_liikenne_elementti DLE left-joined to
        -- digiroad_segmentti DS, batched on DLE.ogc_fid >= i * batch_size AND DLE.ogc_fid < ...

The plan for one batch of that SELECT looks like this:

Nested Loop Left Join (cost=0.86..361121.68 rows=1031 width=23) (actual time=61.901..3377.609 rows=986 loops=1)
  Buffers: shared hit=32279 read=3646
  -> Index Scan using dle_ogc_fid_idx on digiroad_liikenne_elementti dle (cost=0.43..85.12 rows=1031 width=19) (actual time=31.918..57.309 rows=986 loops=1)
       Index Cond: ((ogc_fid >= 200000) AND (ogc_fid < ...))
  -> Index Scan using ds_k_elem_id_idx on digiroad_segmentti ds (cost=0.44..350.16 rows=1 width=23) (actual time=2.861..3.337 rows=0 loops=986)
       Index Cond: (k_elem_id = dle.k_elem_id)
       Filter: ((tyyppi = 5::numeric) AND (dyn_tyyppi = 11::numeric) AND (vaikutussu = 3::numeric) AND ((region)::text = (dle.region)::text))
       Rows Removed by Filter: 73
       Buffers: shared hit=31266 read=3588
Total runtime: 3405.270 ms
My system is running PostgreSQL 9.3.5 on Windows 8 with 8Gb of RAM.
I have experimented with different batch sizes, doing the query in different ways and increasing the memory variables in Postgres configuration, but nothing seems to have really solved the issue.
Configuration variables that have been changed from their default values:
shared_buffers = 2048MB
work_mem = 64MB
effective_cache_size = 6000MB
I'd like to find out what is causing this to happen and what could be done about it.
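Since this sits in the WAL tag, one hedged thing to check is whether checkpoint and WAL churn coincide with the slowdown; the setting and view names here are the 9.3-era ones:

```bash
# Frequent "requested" checkpoints during the load point at WAL pressure.
psql -c "SHOW checkpoint_segments;"            # replaced by max_wal_size in 9.5+
psql -c "SHOW checkpoint_completion_target;"
psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"
```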
jeran
(73 rep)
Sep 12, 2015, 03:51 PM
• Last activity: Jan 25, 2023, 02:18 AM
5 votes, 1 answer, 5465 views
Postgresql 11: terminating walsender process due to replication timeout
I have found some questions about the same error, but didn't find any of them answering my problem.
The setup is that I have two Postgres11 clusters (A and B) and they are making use of publication and subscription features to copy data from A to B.
A (source DB- publication) --------------> B (target DB - subscription)
This works fine, but often (not always) when the data volume being inserted on a table in node A increases, it gives the following error.
> "terminating walsender process due to replication timeout"
The data volume being entered at the moment is about 30K rows per second, continuously for hours, through the `COPY` command.
Earlier the `wal_sender_timeout` was set to 5 sec and I would see this error much more often. I then increased it to 1 min and the frequency of this error reduced. But I don't want to keep increasing it without understanding what is causing it. I looked at the code of `walsender.c` and found that it was coming from here.
if (wal_sender_timeout > 0 && last_processing >= timeout)
{
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
* standby.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));
WalSndShutdown();
}
But I am still not clear which parameter is making the sender assume that the receiver node is inactive and that it should therefore stop the `wal_sender`.
**SourceDB**
sourcedb=# show wal_sender_timeout;
wal_sender_timeout
--------------------
1min
(1 row)
sourcedb=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
------------------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+----------------+--------------------
-
sub_target_DB | pgoutput | logical | 16501 | sourcedb | f | t | 68229 | | 98839088 | 116D0/C36886F8 | 116D0/C3E5D370
**TargetDB**
targetdb=# show wal_receiver_timeout;
wal_receiver_timeout
----------------------
1min
(1 row)
targetdb=# show wal_retrieve_retry_interval ;
wal_retrieve_retry_interval
-----------------------------
5s
(1 row)
targetdb=# show wal_receiver_status_interval;
wal_receiver_status_interval
------------------------------
2s
(1 row)
targetdb=# select * from pg_stat_subscription;
subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | l
atest_end_time
------------+------------------------------------+-------+-------+----------------+-------------------------------+-------------------------------+----------------+---------
----------------------
2378695757 | sub_target_DB | 62371 | | 116D1/2BA8F170 | 2021-08-20 09:05:15.398423+09 | 2021-08-20 09:05:15.398471+09 | 116D1/2BA8F170 | 2021-08-
20 09:05:15.398423+09
**Edit 1:**
Are there any disadvantages to keeping the `wal_sender_timeout` or `wal_receiver_timeout` at much higher values? I know that in case of an actual failure, the WAL segments would keep piling up in the `pg_wal` folder of the sender. But is there a safe limit?
**Edit 2:**
Increased the `wal_sender_timeout` to 5 mins and the error started appearing more frequently instead. Not only that, it even killed the active subscription and stopped replicating data. I had to restart it. So clearly, just increasing the `wal_sender_timeout` hasn't helped.
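For reference, a hedged way to see how far the subscriber is falling behind before the sender gives up, using `pg_stat_replication` on the publisher (column names as of PostgreSQL 11):

```bash
# Byte lag of what has been sent/replayed versus the current WAL position,
# plus the time-based lag columns PostgreSQL 10+ maintains.
psql -x -c "
SELECT application_name, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)   AS send_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;"
```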
Abhishek B
(131 rep)
Aug 20, 2021, 12:11 AM
• Last activity: Nov 14, 2022, 10:07 PM
Showing page 1 of 20 total questions