Understanding composite BTREE + GIN_TRGM_OPS index prioritization & odd lower() behavior
4 votes, 1 answer, 3267 views
Hoping someone can help me decipher some index behavior. I'm working on enabling simple `contains`-type lookups on various user-data columns (~varchar). The plan for the first query (filtering only on user_category and user_type) looks like this:
```sql
Sort (cost=52220.05..52777.38 rows=222932 width=4) (actual time=251.069..285.652 rows=222408 loops=1)
Output: id
Sort Key: people.id
Sort Method: external merge Disk: 3064kB
-> Bitmap Heap Scan on public.people (cost=3070.23..29368.23 rows=222932 width=4) (actual time=35.156..198.549 rows=222408 loops=1)
Output: id
Recheck Cond: (people.user_category = 2)
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 111278
Heap Blocks: exact=21277
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email (cost=0.00..3014.50 rows=334733 width=0) (actual time=32.017..32.017 rows=333686 loops=1)
Index Cond: (people.user_category = 2)
Planning Time: 0.293 ms
Execution Time: 359.247 ms
```
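For context, the composite indexes these plans refer to look roughly like this (approximate DDL; gin_trgm_ops comes from pg_trgm, and the scalar columns need btree_gin to participate in a GIN index):

```sql
-- Approximate definitions of the composite GIN indexes named in the plans.
CREATE EXTENSION IF NOT EXISTS pg_trgm;    -- provides gin_trgm_ops
CREATE EXTENSION IF NOT EXISTS btree_gin;  -- GIN opclasses for scalar columns

CREATE INDEX idx_people_gin_user_category_and_user_type_and_email
    ON people USING gin (user_category, user_type, email gin_trgm_ops);

CREATE INDEX idx_people_gin_user_category_and_user_type_and_last_name
    ON people USING gin (user_category, user_type, last_name gin_trgm_ops);
```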
Is the original b-tree index totally redundant at this point? I expected the planner might still pick it when only those two columns are filtered on, if the b-tree were faster for those data types, but that does not seem to be the case.
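(I suppose I could check by dropping the old b-tree inside a transaction and comparing plans; something like the sketch below, where the b-tree index name is made up:)

```sql
-- Sketch only: the b-tree index name is hypothetical. DROP INDEX takes an
-- ACCESS EXCLUSIVE lock on the table, so run this on a non-production copy.
BEGIN;
DROP INDEX idx_people_user_category_user_type;  -- hypothetical b-tree index
EXPLAIN ANALYZE
SELECT id
FROM people
WHERE user_category = 2
  AND user_type != 'Ogre';
ROLLBACK;  -- the index comes back untouched
```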
Next, I noticed that our existing queries were relying on `lower()` and seemed to completely ignore the GIN indexes; or rather, they seemed to use whichever index was created last, even if that column was not used in the query:
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (LOWER(last_name) LIKE LOWER('%a62%'))
```
Results (the query filters on last_name, but the email index is used):
```sql
HashAggregate (cost=28997.16..29086.33 rows=8917 width=4) (actual time=175.204..175.554 rows=1677 loops=1)
Output: id
Group Key: people.id
-> Gather (cost=4016.73..28974.87 rows=8917 width=4) (actual time=39.947..181.936 rows=1677 loops=1)
Output: id
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on public.people (cost=3016.73..27083.17 rows=3715 width=4) (actual time=22.037..156.233 rows=559 loops=3)
Output: id
Recheck Cond: (people.user_category = 2)
Filter: (((people.user_type)::text <> 'Ogre'::text) AND (lower((people.last_name)::text) ~~ '%a62%'::text))
Rows Removed by Filter: 110670
Heap Blocks: exact=7011
Worker 0: actual time=13.573..147.844 rows=527 loops=1
Worker 1: actual time=13.138..147.867 rows=584 loops=1
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email (cost=0.00..3014.50 rows=334733 width=0) (actual time=35.445..35.445 rows=333686 loops=1)
Index Cond: (people.user_category = 2)
Planning Time: 7.546 ms
Execution Time: 189.186 ms
```
Whereas switching to ILIKE
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
```
Results are way faster and use the expected index. What is it about the `lower()` call that seems to make the planner skip a beat?
```sql
Unique (cost=161.51..161.62 rows=22 width=4) (actual time=27.144..27.570 rows=1677 loops=1)
Output: id
-> Sort (cost=161.51..161.56 rows=22 width=4) (actual time=27.137..27.256 rows=1677 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 127kB
-> Bitmap Heap Scan on public.people (cost=32.34..161.02 rows=22 width=4) (actual time=16.470..26.798 rows=1677 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 766
Heap Blocks: exact=2291
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=16.058..16.058 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 10.577 ms
Execution Time: 27.746 ms
```
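My working theory on the `lower()` question is that the trigram index is declared on last_name itself, so a predicate on lower(last_name) cannot match it; an expression index along these lines (untested sketch, made-up name) would presumably be needed to serve the LOWER() form:

```sql
-- Untested sketch: index the lowercased expression so that
-- lower(last_name) LIKE '%...%' can be matched against trigrams.
CREATE INDEX idx_people_gin_lower_last_name
    ON people USING gin (lower(last_name) gin_trgm_ops);
```

That said, ILIKE against the plain column clearly hits the existing index, so maybe just switching the queries to ILIKE is the simpler route.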
Next, adding another field into things...
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
AND (first_name ILIKE '%EAD%')
```
Is still pretty speedy overall
```sql
Unique (cost=161.11..161.11 rows=1 width=4) (actual time=10.854..10.860 rows=12 loops=1)
Output: id
-> Sort (cost=161.11..161.11 rows=1 width=4) (actual time=10.853..10.854 rows=12 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on public.people (cost=32.33..161.10 rows=1 width=4) (actual time=3.895..10.831 rows=12 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Filter: (((people.user_type)::text <> 'Ogre'::text) AND ((people.first_name)::text ~~* '%EAD%'::text))
Rows Removed by Filter: 2431
Heap Blocks: exact=2291
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=3.173..3.173 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 0.257 ms
Execution Time: 10.897 ms
```
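(Presumably first_name only shows up in the Filter here because it has no trigram index of its own; something like the following, with a made-up name, would give the planner a second bitmap index scan to AND in:)

```sql
-- Hypothetical: a standalone trigram index on first_name, mirroring the
-- one on email, so a first_name ILIKE predicate gets its own bitmap scan.
CREATE INDEX idx_people_gin_first_name
    ON people USING gin (first_name gin_trgm_ops);
```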
Yet going back to that extra, non-tuple (single-column) index created at the bottom: filtering on email additionally seems to pull a second index into the plan?
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
AND (user_type != 'Ogre')
AND (last_name ILIKE '%A62%')
AND (email ILIKE '%0F9%')
```
Has a different path:
```sql
Unique (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.184 rows=7 loops=1)
Output: id
-> Sort (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.180 rows=7 loops=1)
Output: id
Sort Key: people.id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on public.people (cost=136.34..140.36 rows=1 width=4) (actual time=4.145..4.174 rows=7 loops=1)
Output: id
Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text) AND ((people.email)::text ~~* '%0F9%'::text))
Filter: ((people.user_type)::text <> 'Ogre'::text)
Rows Removed by Filter: 4
Heap Blocks: exact=11
-> BitmapAnd (cost=136.34..136.34 rows=1 width=0) (actual time=4.125..4.125 rows=0 loops=1)
-> Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name (cost=0.00..32.33 rows=33 width=0) (actual time=3.089..3.089 rows=2443 loops=1)
Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
-> Bitmap Index Scan on idx_people_gin_email (cost=0.00..103.76 rows=10101 width=0) (actual time=0.879..0.879 rows=7138 loops=1)
Index Cond: ((people.email)::text ~~* '%0F9%'::text)
Planning Time: 0.291 ms
Execution Time: 4.217 ms
```
The cost here is only negligibly lower, but I'm wondering what this means for a fairly dynamic set of columns that could be filtered on. Would it be ideal to make a non-tuple (single-column) index for every field too?
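By a non-tuple index I mean a standalone trigram index like the one on email, which I believe looks roughly like this:

```sql
-- Roughly how the standalone email index is defined; the question is whether
-- every searchable varchar column should get one of these alongside (or
-- instead of) the composite GIN indexes.
CREATE INDEX idx_people_gin_email
    ON people USING gin (email gin_trgm_ops);
```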
Sorry for the length; I've spun my wheels for a while trying to work through it all, but any insight would be pretty awesome (it also seems like there aren't many examples of composite GIN indexes like this out there, though maybe I am missing something more fundamental overall).
Asked by Jeff B. (141 rep), Nov 28, 2019, 10:56 PM
Last activity: Nov 29, 2019, 02:55 AM