pg_cron is dramatically faster than running the same query manually

0 votes
0 answers
565 views
I have a simple table with 75 million rows (and roughly 47 million unique ref_ids):
CREATE TABLE tbl (
	id serial NOT NULL PRIMARY KEY,
	ref_id text NOT NULL,
	ts timestamp NOT NULL
);
I am trying to run the following query on the table (essentially, delete rows where there is an existing row with the same ref_id and a newer ts):
DELETE FROM tbl t
WHERE EXISTS (
	SELECT 1 FROM tbl t2 WHERE t.id < t2.id AND t.ref_id = t2.ref_id AND t.ts <= t2.ts
);
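To make the intent concrete, here is a small sketch of the same DELETE on a throwaway table. The table name tbl_demo and the sample rows are made up for illustration; only the shape of the table and the query come from the real case.

```sql
-- Hypothetical scratch table with the same shape as tbl
CREATE TEMP TABLE tbl_demo (
	id serial NOT NULL PRIMARY KEY,
	ref_id text NOT NULL,
	ts timestamp NOT NULL
);

-- Made-up sample data: three rows for ref_id 'a', one for 'b'
INSERT INTO tbl_demo (ref_id, ts) VALUES
	('a', '2023-01-01 00:00:00'),  -- id 1
	('a', '2023-01-02 00:00:00'),  -- id 2
	('b', '2023-01-01 00:00:00'),  -- id 3
	('a', '2023-01-03 00:00:00');  -- id 4

-- Same predicate as the real query: remove a row when a row with the same
-- ref_id, a higher id, and an equal or newer ts exists
DELETE FROM tbl_demo t
WHERE EXISTS (
	SELECT 1 FROM tbl_demo t2 WHERE t.id < t2.id AND t.ref_id = t2.ref_id AND t.ts <= t2.ts
);

-- Only ids 3 and 4 remain: one row per ref_id
SELECT id, ref_id, ts FROM tbl_demo ORDER BY id;
```

Note that a row is only removed when the other row has both a higher id and an equal or newer ts, so a later-inserted row with an older ts does not cause a deletion.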
If I run this query manually (i.e. via pgAdmin), it runs for several hours before I give up on it. However, if I put the same query in a procedure and schedule it to run 5 minutes later via pg_cron, it completes in ~70 seconds. There is no traffic to this table during either the manual or the cron runs. To confirm that the pg_cron run actually did the work, a count of the rows afterwards shows 47 million rows, where before the run it had 75 million. I have also seen similar behavior with other queries, and I am trying to figure out what could cause such a massive performance difference between pg_cron and running queries manually. I am running PostgreSQL 14.4.
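One way to narrow this down, offered as a sketch rather than a diagnosis, is to compare what each session sees. The queries below can be run from pgAdmin and, wrapped in a job that stores or logs the output, from pg_cron. The list of settings is only a guess at planner-relevant parameters, and the last query assumes a pg_cron version that provides the cron.job_run_details table.

```sql
-- Show the plan the manual session would use, without executing the DELETE
EXPLAIN
DELETE FROM tbl t
WHERE EXISTS (
	SELECT 1 FROM tbl t2 WHERE t.id < t2.id AND t.ref_id = t2.ref_id AND t.ts <= t2.ts
);

-- Compare session settings that can change the chosen plan; "source" shows
-- where each value comes from (default, configuration file, session, ...)
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN (
	'work_mem',
	'hash_mem_multiplier',
	'max_parallel_workers_per_gather',
	'jit',
	'enable_hashjoin'
)
ORDER BY name;

-- Assumption: this pg_cron version records runs in cron.job_run_details;
-- lists the most recent runs with their start and end times
SELECT jobid, status, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 5;
```

If the plans or settings differ between the two sessions, that difference would be the first thing to look at; if they are identical, the cause lies elsewhere.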
Asked by perennial_ (101 rep)
Jun 6, 2023, 03:32 AM