How to fix checkpoint requests problems?

0 votes

1 answer

641 views

postgresql performance transaction-log performance-tuning

On PostgreSQL v15 every morning at 5:15 AM starts heavy inserts in our database and finishes around 7:00 AM. I try to see if there are some parameters that can be tuned like checkpoints. I have written a script that executes the following SQL: SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter to get "checkpoints timed" vs. "checkpoints requested". Or in other words do "checkpoints requested" appears at morning heavy inserts. I have set log_checkpoints=onpostgresql.conf parameter checkpoints info to be written in database server log file. I see the following related to checkpoints:

2024-09-09 05:22:34 LOG:  checkpoint starting: time
2024-09-09 05:33:37 LOG:  checkpoint complete: wrote 779881 buffers (8.9%); 0 WAL file(s) added, 483 removed, 664 recycled; write=652.802 s, sync=0.165 s, total=662.212 s; sync files=170, longest=0.059 s, average=0.001 s; distance=12326973 kB, estimate=12326973 kB

2024-09-09 05:34:27 LOG:  checkpoint starting: wal
2024-09-09 05:48:27 LOG:  checkpoint complete: wrote 1818364 buffers (20.7%); 0 WAL file(s) added, 0 removed, 1009 recycled; write=837.323 s, sync=0.097 s, total=839.269 s; sync files=175, longest=0.024 s, average=0.001 s; distance=16533226 kB, estimate=16533226 kB

2024-09-09 05:50:24 LOG:  checkpoint starting: wal
2024-09-09 06:06:02 LOG:  checkpoint complete: wrote 1302347 buffers (14.8%); 0 WAL file(s) added, 0 removed, 1010 recycled; write=938.102 s, sync=0.068 s, total=938.629 s; sync files=301, longest=0.011 s, average=0.001 s; distance=16547328 kB, estimate=16547328 kB

2024-09-09 06:08:20 LOG:  checkpoint starting: wal
2024-09-09 06:35:20 LOG:  checkpoint complete: wrote 1015915 buffers (11.6%); 0 WAL file(s) added, 0 removed, 880 recycled; write=1619.413 s, sync=0.038 s, total=1619.779 s; sync files=488, longest=0.020 s, average=0.001 s; distance=16548651 kB, estimate=16548651 kB

2024-09-09 06:38:20 LOG:  checkpoint starting: time
2024-09-09 07:05:20 LOG:  checkpoint complete: wrote 320834 buffers (3.7%); 0 WAL file(s) added, 0 removed, 376 recycled; write=1619.469 s, sync=0.004 s, total=1619.906 s; sync files=755, longest=0.001 s, average=0.001 s; distance=13792920 kB, estimate=16273078 kB

Whole day and night I see _checkpoint starting: time_ events (just like in first line). I assume those are _checkpoints timed_ events. Exception is at the time: 5:34, 5:50 and 6:08 (see above log) where _checkpoint starting: wal_ appears three times in sequence. I assume those are _checkpoint requested_ event. According to theory those events slow down the database performance and should be avoided. If we look at line 4, there is: - 1818364 buffers. If multiplied by 8192 (block_size) and divided by 67 * 1024 * 1024 * 1024 (shared_buffers) and we get 20.7% - 20.7% is the percentage of memory buffers changed (dirty blocks from checkpoint starting point to checkpoint completed events) that are requested to be saved by checkpoint to $PGDATA/base/"database"/ Data Files. - 0 WAL file(s) added: There was no need to add additional WAL files, because all are already added (according to max_wal_size parameter) - 1009 recycled - number of old WAL files renamed and reused - write=837.323 s - is 13 min and 57 seconds - time how long checkpoint needed to complete (also the time from "checkpoint starting: wal" to "checkpoint complete") - sync=0.097 s - probably some disk fsync or similar (negligible in our case, because of very small number) - total=839.269 s - probably "write + sync + 'something'" - sync files=175 - is number of Data Files written by checkpoint in $PGDATA/base/"database"/ directory from checkpoint "start" to "complete". - longest=0.024 s - maximal time for individual Data File to be written - average=0.001 s - average time for Data File to be written Now there are two parameters that I don't understand the meaning: - distance=16533226 kB - 15.7 GB - estimate=16533226 kB - 15.7 GB QUESTIONS: 1. What does distance/estimate means? 2. Why is in my case checkpoint request fired? 3. Is increasing max_wal_size recommended? If yes, how to calculate new parameter size? 4. Is some other parameter recommended like decrease checkpoint_completion_target to e.g. 0.8? PostgreSQL instance settings:

shared_buffers: 67GB
 wal_buffers:    32MB
 full_page_writes: on
 max_wal_size:   30GB
 checkpoint_timeout: 30min
 checkpoint_completion_target: 0.9
 min_wal_size: 512MB
 block_size: 8192
 wal_segment_siz: 16MB

Regards

Asked by folow (523 rep)

Sep 9, 2024, 12:29 PM
Last activity: Jul 13, 2025, 10:03 AM

How to fix checkpoint requests problems?

Related Questions