WHERE A=x DISTINCT ON (B), with a composite index on (A, B, C)

2 votes

1 answer

83 views

postgresql query-performance index postgresql-performance greatest-n-per-group

I have huge table with a composite index on (A, B, C).

-- psql (13.16 (Debian 13.16-0+deb11u1), server 14.12)

\d index_a_b_c
         Index "public.index_a_b_c"
  Column  |         Type          | Key? | 
----------+-----------------------+------+
 A        | character varying(44) | yes  |
 B        | numeric(20,0)         | yes  |
 C        | numeric(20,0)         | yes  |
btree, for table "public.table_a_b_c"

#### I need all distinct Bs. This query runs with Index Only Scan, but, scans over the all A matches. Which is not scale for my case since for some As there as millions of rows. Millions of Index Only Scan row is slow.

EXPLAIN (ANALYZE true) 
SELECT DISTINCT ON ("B") "B"
  FROM "table_a_b_c"
 WHERE "A" = 'astring'

-- Execution time: 0.172993s
-- Unique  (cost=0.83..105067.18 rows=1123 width=5) (actual time=0.037..19.468 rows=67 loops=1)
--  ->  Index Only Scan using index_a_b_c on table_a_b_c  (cost=0.83..104684.36 rows=153129 width=5) (actual time=0.036..19.209 rows=1702 loops=1)
--        Index Cond: (A = 'astring'::text)
--        Heap Fetches: 351
-- Planning Time: 0.091 ms
-- Execution Time: 19.499 ms

As you see, runs over 1.7k rows and manually filter and returns 67 rows. 20ms getting tens of seconds when 1.7k to millions. #### I also need all biggest Cs for distinct Bs. Same thing as in *1)*. In theory, Postgres could know possible Bs, and not need to check the whole list matched to A.

EXPLAIN (ANALYZE true)
SELECT DISTINCT ON ("B") *
  FROM "table_a_b_c"
 WHERE "A" = 'astring'
 ORDER BY "B" DESC,
          "C" DESC

-- Execution time: 0.822705s 
-- Unique  (cost=0.83..621264.51 rows=1123 width=247) (actual time=0.957..665.927 rows=67 loops=1)
--   ->  Index Scan using index_a_b_c on table_a_b_c  (cost=0.83..620881.69 rows=153130 width=247) (actual time=0.955..664.408 rows=1702 loops=1)
--         Index Cond: (a = 'astring'::text)
-- Planning Time: 0.116 ms
-- Execution Time: 665.978 ms

But for instance, this is fast:

SELECT * WHERE A="x" AND B=1 ORDER BY C DESC
  UNION
SELECT * WHERE A="x" AND B=2 ORDER BY C DESC
  UNION
....

for all possible Bs. It is like loop with number of B time. ### Questions a) Shouldn't the index on (A, B, C) be a superset of (A, B) in theory? (A, B) will be super fast for distinct. b) Why is it hard to find distinct Bs for Postgres? c) How to handle this without new index?

Asked by kadircancetin (23 rep)

Oct 10, 2024, 09:34 AM
Last activity: Oct 12, 2024, 12:22 AM

WHERE A=x DISTINCT ON (B), with a composite index on (A, B, C)

Related Questions