
PostgreSQL too slow selecting 1 billion records in a table

0 votes
0 answers
1745 views
I created a PostgreSQL database in Google Cloud SQL with a single table that stores 1 billion records:

```sql
CREATE TABLE a (
    user_id INT NOT NULL,
    dt DATE NOT NULL DEFAULT CURRENT_DATE,
    id VARCHAR(255) NOT NULL,
    count SMALLINT NOT NULL DEFAULT 1
);

CREATE INDEX user_id_idx ON a(user_id);
CREATE INDEX dt_idx ON a(dt);
CREATE INDEX user_id_dt_idx ON a(user_id, dt);
```

The result of `SELECT COUNT(*) FROM a WHERE user_id = 1 AND dt BETWEEN '2021-08-01' AND '2021-12-31'` is 10 million rows, which is pretty big.

What I want is to count `id` with some conditions, like below (the second condition on `cnt_sum` was cut off when posting):

```sql
SELECT COUNT(*)
FROM (
    SELECT t.id, SUM(t.count) cnt_sum
    FROM (
        SELECT id, count
        FROM a
        WHERE user_id = 1
          AND dt BETWEEN '2021-08-01' AND '2021-12-31'
    ) t
    GROUP BY t.id
) t2
WHERE t2.cnt_sum > 1 AND t2.cnt_sum ...
```

Here is the execution plan:

```
Aggregate  (cost=432974.05..432974.06 rows=1 width=8) (actual time=76873.827..76873.829 rows=1 loops=1)
  Buffers: shared hit=199160, temp read=172244 written=172504
  ->  GroupAggregate  (cost=430508.09..432967.22 rows=546 width=41) (actual time=58606.309..76050.907 rows=9089942 loops=1)
        Group Key: a.id
        Filter: ((sum(a.count) > 1) AND (sum(a.count) ...
        ->  Sort  (cost=430508.09..430781.32 rows=109295 width=35) (actual time=58606.287..70580.733 rows=10100259 loops=1)
              Sort Key: a.id
              Sort Method: external merge  Disk: 454704kB
              Buffers: shared hit=199160, temp read=172244 written=172504
              ->  Index Scan using user_id_idx on a  (cost=0.57..418372.26 rows=109295 width=35) (actual time=0.025..3334.426 rows=10100259 loops=1)
                    Index Cond: (user_id = 778)
                    Filter: ((dt >= '2021-06-01'::date) AND (dt <= '2022-01-20'::date))
                    Buffers: shared hit=199160
Planning Time: 0.184 ms
Execution Time: 76930.444 ms
```

However, this query took over 80 seconds to complete. Is there anything I can do to make it perform better?
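The plan above hints at two likely costs: the index scan on `user_id_idx` still has to visit the heap to filter on `dt` and fetch `id`/`count`, and the sort spills ~450 MB to disk ("Sort Method: external merge"). A common first step is a covering index plus more sort memory; this is a hedged sketch (the index name and the `work_mem` value are my assumptions, not tested settings), not a verified fix:

```sql
-- Covering index: filter columns (user_id, dt) in the key, with the
-- selected columns in INCLUDE so the query can use an index-only scan
-- instead of an index scan followed by a heap filter on dt.
-- (INCLUDE requires PostgreSQL 11 or later.)
CREATE INDEX user_id_dt_covering_idx ON a (user_id, dt) INCLUDE (id, count);

-- Give the sort/aggregate enough memory to avoid spilling to disk.
-- The right value depends on your instance's RAM; 512MB is only an example.
SET work_mem = '512MB';
```

With enough `work_mem`, the planner may also switch from Sort + GroupAggregate to a HashAggregate, which avoids sorting 10 million rows by `id` entirely since the query only needs per-`id` sums.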
Asked by Eric Lee (121 rep)
Jan 20, 2022, 08:33 AM
Last activity: Jan 21, 2022, 06:23 AM