
Understanding composite BTREE + GIN_TRGM_OPS index prioritization & odd lower() behavior

4 votes
1 answer
3267 views
Hoping someone can help me decipher some index behavior. I'm working on enabling some simple contains-type lookups on various user-data columns (~varchar

```
Sort  (cost=52220.05..52777.38 rows=222932 width=4) (actual time=251.069..285.652 rows=222408 loops=1)
  Output: id
  Sort Key: people.id
  Sort Method: external merge  Disk: 3064kB
  ->  Bitmap Heap Scan on public.people  (cost=3070.23..29368.23 rows=222932 width=4) (actual time=35.156..198.549 rows=222408 loops=1)
        Output: id
        Recheck Cond: (people.user_category = 2)
        Filter: ((people.user_type)::text <> 'Ogre'::text)
        Rows Removed by Filter: 111278
        Heap Blocks: exact=21277
        ->  Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email  (cost=0.00..3014.50 rows=334733 width=0) (actual time=32.017..32.017 rows=333686 loops=1)
              Index Cond: (people.user_category = 2)
Planning Time: 0.293 ms
Execution Time: 359.247 ms
```
Is the original b-tree totally redundant at this point? I expected the planner might still pick it when only those two columns were used, if the b-tree were faster for those data types, but that doesn't seem to be the case.
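For reference, the definitions I'm working from look roughly like this. The column lists are reconstructed from the index names in the plans, so treat them as assumptions (the b-tree name in particular is hypothetical); mixing plain equality columns into a GIN index like this relies on the `btree_gin` extension:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;    -- provides gin_trgm_ops
CREATE EXTENSION IF NOT EXISTS btree_gin;  -- lets GIN index plain scalar columns

-- The original b-tree on the equality columns (name assumed):
CREATE INDEX idx_people_user_category_and_user_type
    ON people (user_category, user_type);

-- The composite GIN index appearing in the plan above (columns assumed):
CREATE INDEX idx_people_gin_user_category_and_user_type_and_email
    ON people USING gin (user_category, user_type, email gin_trgm_ops);
```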

Next, I noticed that our existing queries depended on lower() and seemed to completely ignore the GIN indexes; or rather, the planner seemed to use whichever index was created last, even if that column was not used in the query:
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
  AND (user_type != 'Ogre')
  AND (LOWER(last_name) LIKE LOWER('%a62%'))
```
Results (filtering on last_name, but using the email index):
```
HashAggregate  (cost=28997.16..29086.33 rows=8917 width=4) (actual time=175.204..175.554 rows=1677 loops=1)
  Output: id
  Group Key: people.id
  ->  Gather  (cost=4016.73..28974.87 rows=8917 width=4) (actual time=39.947..181.936 rows=1677 loops=1)
        Output: id
        Workers Planned: 2
        Workers Launched: 2
        ->  Parallel Bitmap Heap Scan on public.people  (cost=3016.73..27083.17 rows=3715 width=4) (actual time=22.037..156.233 rows=559 loops=3)
              Output: id
              Recheck Cond: (people.user_category = 2)
              Filter: (((people.user_type)::text <> 'Ogre'::text) AND (lower((people.last_name)::text) ~~ '%a62%'::text))
              Rows Removed by Filter: 110670
              Heap Blocks: exact=7011
              Worker 0: actual time=13.573..147.844 rows=527 loops=1
              Worker 1: actual time=13.138..147.867 rows=584 loops=1
              ->  Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_email  (cost=0.00..3014.50 rows=334733 width=0) (actual time=35.445..35.445 rows=333686 loops=1)
                    Index Cond: (people.user_category = 2)
Planning Time: 7.546 ms
Execution Time: 189.186 ms
```
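The only theory I have is that the trigram indexes are on the raw columns, so a predicate on lower(last_name) can't match them; presumably only an expression index would serve that form, e.g. something like (name and definition hypothetical, not an index we actually have):

```sql
-- Hypothetical expression index that would match the LOWER()/LIKE predicate;
-- our existing trigram indexes are on the raw columns, not on lower(...).
CREATE INDEX idx_people_gin_lower_last_name
    ON people USING gin (lower(last_name) gin_trgm_ops);
```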
Whereas switching to ILIKE:
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
  AND (user_type != 'Ogre')
  AND (last_name ILIKE '%A62%')
```
The results are way faster, and use the expected index. What is it about the lower() call that makes the planner skip a beat?
```
Unique  (cost=161.51..161.62 rows=22 width=4) (actual time=27.144..27.570 rows=1677 loops=1)
  Output: id
  ->  Sort  (cost=161.51..161.56 rows=22 width=4) (actual time=27.137..27.256 rows=1677 loops=1)
        Output: id
        Sort Key: people.id
        Sort Method: quicksort  Memory: 127kB
        ->  Bitmap Heap Scan on public.people  (cost=32.34..161.02 rows=22 width=4) (actual time=16.470..26.798 rows=1677 loops=1)
              Output: id
              Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
              Filter: ((people.user_type)::text <> 'Ogre'::text)
              Rows Removed by Filter: 766
              Heap Blocks: exact=2291
              ->  Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name  (cost=0.00..32.33 rows=33 width=0) (actual time=16.058..16.058 rows=2443 loops=1)
                    Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 10.577 ms
Execution Time: 27.746 ms
```
Next, adding another field into things...
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
  AND (user_type != 'Ogre')
  AND (last_name ILIKE '%A62%')
  AND (first_name ILIKE '%EAD%')
```
It is still pretty speedy overall:
```
Unique  (cost=161.11..161.11 rows=1 width=4) (actual time=10.854..10.860 rows=12 loops=1)
  Output: id
  ->  Sort  (cost=161.11..161.11 rows=1 width=4) (actual time=10.853..10.854 rows=12 loops=1)
        Output: id
        Sort Key: people.id
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on public.people  (cost=32.33..161.10 rows=1 width=4) (actual time=3.895..10.831 rows=12 loops=1)
              Output: id
              Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
              Filter: (((people.user_type)::text <> 'Ogre'::text) AND ((people.first_name)::text ~~* '%EAD%'::text))
              Rows Removed by Filter: 2431
              Heap Blocks: exact=2291
              ->  Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name  (cost=0.00..32.33 rows=33 width=0) (actual time=3.173..3.173 rows=2443 loops=1)
                    Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
Planning Time: 0.257 ms
Execution Time: 10.897 ms
```
Yet going back to that extra single-column index created at the bottom, filtering on email seems to pull another index into the plan:
```sql
EXPLAIN ANALYZE VERBOSE
SELECT DISTINCT id
FROM people
WHERE user_category = 2
  AND (user_type != 'Ogre')
  AND (last_name ILIKE '%A62%')
  AND (email ILIKE '%0F9%')
```
Has a different path:
```
Unique  (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.184 rows=7 loops=1)
  Output: id
  ->  Sort  (cost=140.37..140.38 rows=1 width=4) (actual time=4.180..4.180 rows=7 loops=1)
        Output: id
        Sort Key: people.id
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on public.people  (cost=136.34..140.36 rows=1 width=4) (actual time=4.145..4.174 rows=7 loops=1)
              Output: id
              Recheck Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text) AND ((people.email)::text ~~* '%0F9%'::text))
              Filter: ((people.user_type)::text <> 'Ogre'::text)
              Rows Removed by Filter: 4
              Heap Blocks: exact=11
              ->  BitmapAnd  (cost=136.34..136.34 rows=1 width=0) (actual time=4.125..4.125 rows=0 loops=1)
                    ->  Bitmap Index Scan on idx_people_gin_user_category_and_user_type_and_last_name  (cost=0.00..32.33 rows=33 width=0) (actual time=3.089..3.089 rows=2443 loops=1)
                          Index Cond: ((people.user_category = 2) AND ((people.last_name)::text ~~* '%A62%'::text))
                    ->  Bitmap Index Scan on idx_people_gin_email  (cost=0.00..103.76 rows=10101 width=0) (actual time=0.879..0.879 rows=7138 loops=1)
                          Index Cond: ((people.email)::text ~~* '%0F9%'::text)
Planning Time: 0.291 ms
Execution Time: 4.217 ms
```

The cost seems only negligibly lower, but I'm wondering what this means for a fairly dynamic set of columns that could be filtered on. Would it be ideal to make a single-column index for every field too? Sorry for the length; I've spun my wheels for a while trying to figure this out, but any insight would be awesome (it seems like there aren't many examples of GIN indexes like this, though maybe I'm missing something more fundamental).
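For context, the single-column email index in that plan was created roughly like this (the column/opclass are inferred from the index name, so treat them as an assumption), and I could imagine one such index per filterable column so the planner can BitmapAnd any combination; the first_name/last_name siblings below are hypothetical, not indexes we have today:

```sql
-- The existing single-column trigram index (definition inferred from the plan):
CREATE INDEX idx_people_gin_email
    ON people USING gin (email gin_trgm_ops);

-- Hypothetical siblings, one per dynamically-filterable column:
CREATE INDEX idx_people_gin_first_name
    ON people USING gin (first_name gin_trgm_ops);
CREATE INDEX idx_people_gin_last_name
    ON people USING gin (last_name gin_trgm_ops);
```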
Asked by Jeff B. (141 rep)
Nov 28, 2019, 10:56 PM
Last activity: Nov 29, 2019, 02:55 AM