WHERE A=x DISTINCT ON (B), with a composite index on (A, B, C)
I have a huge table with a composite index on (A, B, C).
-- psql (13.16 (Debian 13.16-0+deb11u1), server 14.12)
\d index_a_b_c
Index "public.index_a_b_c"
Column | Type | Key? |
----------+-----------------------+------+
A | character varying(44) | yes |
B | numeric(20,0) | yes |
C | numeric(20,0) | yes |
btree, for table "public.table_a_b_c"
#### 1) I need all distinct Bs.
This query runs with an Index Only Scan, but it scans every row that matches A. That does not scale for my case, because some As have millions of rows, and an Index Only Scan over millions of rows is slow.
EXPLAIN (ANALYZE true)
SELECT DISTINCT ON ("B") "B"
FROM "table_a_b_c"
WHERE "A" = 'astring'
-- Execution time: 0.172993s
-- Unique (cost=0.83..105067.18 rows=1123 width=5) (actual time=0.037..19.468 rows=67 loops=1)
-- -> Index Only Scan using index_a_b_c on table_a_b_c (cost=0.83..104684.36 rows=153129 width=5) (actual time=0.036..19.209 rows=1702 loops=1)
-- Index Cond: (A = 'astring'::text)
-- Heap Fetches: 351
-- Planning Time: 0.091 ms
-- Execution Time: 19.499 ms
As you can see, it reads 1.7k index rows, de-duplicates them, and returns 67 rows. The 20 ms becomes tens of seconds when the 1.7k rows become millions.
#### 2) I also need the biggest C for each distinct B.
Same thing as in *1)*. In theory, Postgres could find the possible Bs from the index and would not need to check every row that matches A.
EXPLAIN (ANALYZE true)
SELECT DISTINCT ON ("B") *
FROM "table_a_b_c"
WHERE "A" = 'astring'
ORDER BY "B" DESC,
"C" DESC
-- Execution time: 0.822705s
-- Unique (cost=0.83..621264.51 rows=1123 width=247) (actual time=0.957..665.927 rows=67 loops=1)
-- -> Index Scan using index_a_b_c on table_a_b_c (cost=0.83..620881.69 rows=153130 width=247) (actual time=0.955..664.408 rows=1702 loops=1)
-- Index Cond: (a = 'astring'::text)
-- Planning Time: 0.116 ms
-- Execution Time: 665.978 ms
But for instance, this is fast:
(SELECT * FROM "table_a_b_c" WHERE "A" = 'x' AND "B" = 1 ORDER BY "C" DESC LIMIT 1)
UNION
(SELECT * FROM "table_a_b_c" WHERE "A" = 'x' AND "B" = 2 ORDER BY "C" DESC LIMIT 1)
UNION
....
for all possible Bs. It is like a loop that runs once per distinct B.
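The loop described above can probably be written as a single statement with a recursive CTE that hops from one B to the next. This is only a sketch that has not been run against this table. It assumes that "B" is never NULL for the matching rows and that the quoted column names from the queries above are correct. The names distinct_b, b, t, d and r are made up for illustration.

```sql
WITH RECURSIVE distinct_b(b) AS (
    -- Seed: the smallest B for this A (a single probe into index_a_b_c)
    (SELECT "B"
     FROM "table_a_b_c"
     WHERE "A" = 'astring'
     ORDER BY "B"
     LIMIT 1)
    UNION ALL
    -- Step: the next B strictly greater than the previous one.
    -- The scalar subquery yields NULL when there is no next B,
    -- and the WHERE clause then stops the recursion.
    SELECT (SELECT t."B"
            FROM "table_a_b_c" t
            WHERE t."A" = 'astring' AND t."B" > d.b
            ORDER BY t."B"
            LIMIT 1)
    FROM distinct_b d
    WHERE d.b IS NOT NULL
)
-- For each distinct B, fetch the row with the biggest C
SELECT r.*
FROM distinct_b d
CROSS JOIN LATERAL (
    SELECT *
    FROM "table_a_b_c" t
    WHERE t."A" = 'astring' AND t."B" = d.b
    ORDER BY t."C" DESC
    LIMIT 1
) r
ORDER BY r."B" DESC;
```

Each step should be one short index probe, so the work would grow with the number of distinct Bs (67 here) instead of the number of rows matching A. For case *1)*, replacing the final SELECT with `SELECT b FROM distinct_b WHERE b IS NOT NULL` should return only the distinct Bs.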
### Questions
a) Shouldn't the index on (A, B, C) be a superset of an index on (A, B) in theory? An index on (A, B) would be super fast for distinct.
b) Why is it hard for Postgres to find the distinct Bs?
c) How can I handle this without a new index?
Asked by kadircancetin (23 rep) on Oct 10, 2024, 09:34 AM
Last activity: Oct 12, 2024, 12:22 AM