PostgreSQL too slow selecting 1 billion records in a table
0 votes | 0 answers | 1745 views
I created a PostgreSQL database in Google Cloud SQL with a single table that stores 1 billion records:
CREATE TABLE a (
user_id INT NOT NULL,
dt DATE NOT NULL DEFAULT CURRENT_DATE,
id VARCHAR(255) NOT NULL,
count SMALLINT NOT NULL DEFAULT 1
);
CREATE INDEX user_id_idx ON a(user_id);
CREATE INDEX dt_idx ON a(dt);
CREATE INDEX user_id_dt_idx ON a(user_id, dt);
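As an aside, the composite index on (user_id, dt) already serves lookups on user_id alone, so the standalone user_id_idx is largely redundant. One hedged option for this workload (a sketch; the index name and INCLUDE list are illustrative, and INCLUDE requires PostgreSQL 11+) is a covering index, so the aggregate can be answered by an index-only scan instead of heap fetches:

```sql
-- Sketch (assumed names): drop the redundant single-column index and
-- add a covering index so the filtered scan can be index-only.
DROP INDEX user_id_idx;
CREATE INDEX user_id_dt_cover_idx ON a (user_id, dt) INCLUDE (id, count);
```

Whether the planner chooses an index-only scan also depends on the table's visibility map, so running VACUUM after bulk loads matters here.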
The result from

SELECT COUNT(*) FROM a WHERE user_id = 1 AND dt BETWEEN '2021-08-01' AND '2021-12-31'

is about 10 million rows, which is fairly large.
What I want is to count the id values that satisfy conditions like the following:
SELECT COUNT(*)
FROM (
    SELECT t.id, SUM(t.count) cnt_sum
    FROM (
        SELECT id, count
        FROM a
        WHERE user_id = 1 AND dt BETWEEN '2021-08-01' AND '2021-12-31'
    ) t
    GROUP BY t.id
) t2
WHERE t2.cnt_sum > 1 AND t2.cnt_sum < ...
Aggregate  (cost=432974.05..432974.06 rows=1 width=8) (actual time=76873.827..76873.829 rows=1 loops=1)
  Buffers: shared hit=199160, temp read=172244 written=172504
  ->  GroupAggregate  (cost=430508.09..432967.22 rows=546 width=41) (actual time=58606.309..76050.907 rows=9089942 loops=1)
        Group Key: a.id
        Filter: ((sum(a.count) > 1) AND (sum(a.count) < ...))
        ->  Sort  (cost=430508.09..430781.32 rows=109295 width=35) (actual time=58606.287..70580.733 rows=10100259 loops=1)
              Sort Key: a.id
              Sort Method: external merge  Disk: 454704kB
              Buffers: shared hit=199160, temp read=172244 written=172504
              ->  Index Scan using user_id_idx on a  (cost=0.57..418372.26 rows=109295 width=35) (actual time=0.025..3334.426 rows=10100259 loops=1)
                    Index Cond: (user_id = 778)
                    Filter: ((dt >= '2021-06-01'::date) AND (dt <= '2022-01-20'::date))
                    Buffers: shared hit=199160
Planning Time: 0.184 ms
Execution Time: 76930.444 ms
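One thing the plan itself shows: the sort spills roughly 450 MB to disk (Sort Method: external merge  Disk: 454704kB), which accounts for much of the elapsed time. A hedged first experiment is to raise work_mem for the session so the sort and aggregation can stay in memory (the value below is illustrative, sized to the observed spill, not a recommendation from the post):

```sql
-- Sketch: raise per-sort-node memory for this session only.
-- 500MB is an illustrative value matching the ~450MB on-disk sort.
SET work_mem = '500MB';
```

Note that work_mem applies per sort/hash node per backend, so session-level SET is safer than changing it globally.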
However, this query took over 80 seconds to complete.
Is there anything I can do to make it perform better?
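For reference, the nested subqueries can be flattened into a single GROUP BY with a HAVING clause (a sketch of an equivalent form; the upper bound on the sum is omitted here because it is truncated in the query above). This mainly improves readability, since PostgreSQL plans both forms similarly:

```sql
-- Sketch: same count as the nested version, expressed with HAVING.
SELECT COUNT(*)
FROM (
    SELECT id
    FROM a
    WHERE user_id = 1
      AND dt BETWEEN '2021-08-01' AND '2021-12-31'
    GROUP BY id
    HAVING SUM(count) > 1
) t;
```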
Asked by Eric Lee
(121 rep)
Jan 20, 2022, 08:33 AM
Last activity: Jan 21, 2022, 06:23 AM