PostgreSQL too slow selecting 1 billion records in a table
0 votes | 0 answers | 1745 views
I created a PostgreSQL database in Google Cloud SQL with a single table that stores 1 billion records:
CREATE TABLE a (
user_id INT NOT NULL,
dt DATE NOT NULL DEFAULT CURRENT_DATE,
id VARCHAR(255) NOT NULL,
count SMALLINT NOT NULL DEFAULT 1
);
CREATE INDEX user_id_idx ON a(user_id);
CREATE INDEX dt_idx ON a(dt);
CREATE INDEX user_id_dt_idx ON a(user_id, dt);
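As an aside, the composite index on (user_id, dt) already serves lookups on user_id alone, so the standalone user_id_idx is largely redundant. One hedged option for this workload (a sketch; the index name and INCLUDE list are illustrative, and INCLUDE requires PostgreSQL 11+) is a covering index, so the aggregate can be answered by an index-only scan instead of heap fetches:

```sql
-- Sketch (assumed names): drop the redundant single-column index and
-- add a covering index so the filtered scan can be index-only.
DROP INDEX user_id_idx;
CREATE INDEX user_id_dt_cover_idx ON a (user_id, dt) INCLUDE (id, count);
```

Whether the planner chooses an index-only scan also depends on the table's visibility map, so running VACUUM after bulk loads matters here.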
The result from

SELECT COUNT(*) FROM a WHERE user_id = 1 AND dt BETWEEN '2021-08-01' AND '2021-12-31'

is about 10 million rows, which is fairly large.
What I want is to count the id values that satisfy conditions like the following:
SELECT COUNT(*)
FROM (
    SELECT t.id, SUM(t.count) cnt_sum
    FROM (
        SELECT id, count
        FROM a
        WHERE user_id = 1 AND dt BETWEEN '2021-08-01' AND '2021-12-31'
    ) t
    GROUP BY t.id
) t2
WHERE t2.cnt_sum > 1 AND t2.cnt_sum < ...
Aggregate  (cost=432974.05..432974.06 rows=1 width=8) (actual time=76873.827..76873.829 rows=1 loops=1)
  Buffers: shared hit=199160, temp read=172244 written=172504
  ->  GroupAggregate  (cost=430508.09..432967.22 rows=546 width=41) (actual time=58606.309..76050.907 rows=9089942 loops=1)
        Group Key: a.id
        Filter: ((sum(a.count) > 1) AND (sum(a.count) < ...))
        ->  Sort  (cost=430508.09..430781.32 rows=109295 width=35) (actual time=58606.287..70580.733 rows=10100259 loops=1)
              Sort Key: a.id
              Sort Method: external merge  Disk: 454704kB
              Buffers: shared hit=199160, temp read=172244 written=172504
              ->  Index Scan using user_id_idx on a  (cost=0.57..418372.26 rows=109295 width=35) (actual time=0.025..3334.426 rows=10100259 loops=1)
                    Index Cond: (user_id = 778)
                    Filter: ((dt >= '2021-06-01'::date) AND (dt <= '2022-01-20'::date))
                    Buffers: shared hit=199160
Planning Time: 0.184 ms
Execution Time: 76930.444 ms
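One thing the plan itself shows: the sort spills roughly 450 MB to disk (Sort Method: external merge  Disk: 454704kB), which accounts for much of the elapsed time. A hedged first experiment is to raise work_mem for the session so the sort and aggregation can stay in memory (the value below is illustrative, sized to the observed spill, not a recommendation from the post):

```sql
-- Sketch: raise per-sort-node memory for this session only.
-- 500MB is an illustrative value matching the ~450MB on-disk sort.
SET work_mem = '500MB';
```

Note that work_mem applies per sort/hash node per backend, so session-level SET is safer than changing it globally.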
However, this query took over 80 seconds to complete.
Is there anything I can do to make it perform better?
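For reference, the nested subqueries can be flattened into a single GROUP BY with a HAVING clause (a sketch of an equivalent form; the upper bound on the sum is omitted here because it is truncated in the query above). This mainly improves readability, since PostgreSQL plans both forms similarly:

```sql
-- Sketch: same count as the nested version, expressed with HAVING.
SELECT COUNT(*)
FROM (
    SELECT id
    FROM a
    WHERE user_id = 1
      AND dt BETWEEN '2021-08-01' AND '2021-12-31'
    GROUP BY id
    HAVING SUM(count) > 1
) t;
```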
Asked by Eric Lee
(121 rep)
Jan 20, 2022, 08:33 AM
Last activity: Jan 21, 2022, 06:23 AM