Sample Header Ad - 728x90

Trigram index for ILIKE patterns not working as expected

4 votes
2 answers
2869 views
I've got a simple but slow query: SELECT DISTINCT title FROM ja_jobs WHERE title ILIKE '%RYAN WER%' AND clientid = 31239 AND time_job > 1457826264 ORDER BY title LIMIT 10; **Explain analyze:** Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1) -> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1) -> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1) Sort Key: "title" Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1) Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264)) Filter: (("title")::"text" ~~* '%RYAN WER%'::"text") Rows Removed by Filter: 791 -> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1) Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264)) Total runtime: 2746.879 ms Then, I created a trigram index: CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops); **Explain analyze after adding the index:** (Yes, I analyze) Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1) -> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1) -> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1) Sort Key: "title" Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1) Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text")) Rows Removed by Index Recheck: 4 -> BitmapAnd (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1) -> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1) Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264)) -> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1) Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text") Total runtime: 3720.653 ms As you can see, the index did not work. Table public.ja_jobs: CREATE TABLE public.ja_jobs ( id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"), refnum character varying(100) NOT NULL DEFAULT ''::character varying, clientid bigint NOT NULL DEFAULT 0, customerid bigint, time_job bigint, priority smallint NOT NULL DEFAULT 0, status character varying(255) NOT NULL DEFAULT 'active'::"bpchar", title character varying(100) NOT NULL DEFAULT ''::character varying, -- some other irrelevant columns ) Indexes on public.ja_jobs: Indexes: "ja_jobs_pkey" PRIMARY KEY, "btree" ("id") "ix_bill_customer_jobs" "btree" ("customerid", "bill_customer") "ix_clientid_jobs" "btree" ("clientid") "ix_customerid_job" "btree" ("customerid") "ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date") "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) "ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops") "ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date") "ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date") "ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date") "ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count")) "ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival") "ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description")) "ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postc ode"::"text"))) "ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text")) "ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text")) "ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text")) "ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postc ode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text"))) "ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid"))) "ix_jobs_status_label_ids" "btree" ("status_label_id") "ix_jobs_top_by_client" "btree" ("id", "clientid") "ix_mobiuser_jobs" "btree" ("accepted_mobile_user") "ix_recurrenceid_jobs" "btree" ("recurrenceid") "ix_timejob_jobs" "btree" ("time_job") "ja_jobs_client_type" "btree" ("clientid", "jobtype") "ja_jobs_the_geom_idx" "gist" ("the_geom") ###Question: What can I do to improve the query? Why is the trigram index not working as expected? **UPDATE:** Re-ran the explain analyze buffer Limit (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1) Buffers: shared hit=26947 read=101574 dirtied=438 -> Sort (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1) Sort Key: "title" Sort Method: quicksort Memory: 25kB Buffers: shared hit=26947 read=101574 dirtied=438 -> Bitmap Heap Scan on "ja_jobs" (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1) Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599)) Filter: (("title")::"text" ~~* '%Hislop%'::"text") Rows Removed by Filter: 207654 Buffers: shared hit=26942 read=101574 dirtied=438 -> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1) Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599)) Buffers: shared hit=121 read=5772 Total runtime: 31524.874 ms After removing DISTINCT and the left %: explain (analyze, buffers) SELECT title FROM ja_jobs WHERE title ILIKE 'Hislop 13035%' AND clientid = 2565 AND time_job > 1382496599 ORDER BY title LIMIT 10; Limit (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1) Buffers: shared hit=4940 read=448 I/O Timings: read=83.285 -> Sort (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1) Sort Key: "title" Sort Method: quicksort Memory: 25kB Buffers: shared hit=4940 read=448 I/O Timings: read=83.285 -> Bitmap Heap Scan on "ja_jobs" (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1) Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text") Filter: (("time_job" > 1382496599) AND ("clientid" = 2565)) Buffers: shared hit=4940 read=448 I/O Timings: read=83.285 -> Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1) Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text") Buffers: shared hit=4939 read=448 I/O Timings: read=83.285 Total runtime: 3492.531 ms - As you can see, the query is hitting the new index but it's slower. Then I removed ORDER BY but the query is still slow. Also, I tried to use LIKE (with is much faster), but LIKE is case-sensitive, so for that reason I got no rows back. Can't use it.
Asked by user83914
May 12, 2016, 02:07 AM
Last activity: Aug 1, 2023, 05:15 PM