Sample Header Ad - 728x90

Best index for similarity function

18 votes
1 answer
16691 views
So I have this table with 6.2 millions records and I have to perform search queries with similarity for one for the column. The queries can be: SELECT "lca_test".* FROM "lca_test" WHERE (similarity(job_title, 'sales executive') > 0.6) AND worksite_city = 'los angeles' ORDER BY salary ASC LIMIT 50 OFFSET 0 More conditions can be added in the where(year = X, worksite_state = N, status = 'certified', visa_class = Z). Running some of those queries can take a really long time, over 30seconds. Sometimes more than a minutes. EXPLAIN ANALYZE of the previously mentioned query gives me this: > Limit (cost=0.43..42523.04 rows=50 width=254) (actual time=9070.268..33487.734 rows=2 loops=1) > -> Index Scan using index_lca_test_on_salary on lca_test (cost=0.43..23922368.16 rows=28129 width=254) (actual time=9070.265..33487.727 rows=2 loops=1) > >>>> Filter: (((worksite_city)::text = 'los angeles'::text) AND (similarity((job_title)::text, 'sales executive'::text) > 0.6::double precision)) > >>>> Rows Removed by Filter: 6330130 Total runtime: 33487.802 ms > Total runtime: 33487.802 ms I can't figure out how I should index my column to make it blazing fast. EDIT: Here is the postgres version: > PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2, 64-bit Here is the table definition: Table "public.lca_test" Column | Type | Modifiers | Storage | Stats target | Description ------------------------+-------------------+-------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('lca_test_id_seq'::regclass) | plain | | raw_id | integer | | plain | | year | integer | | plain | | company_id | integer | | plain | | visa_class | character varying | | extended | | employement_start_date | character varying | | extended | | employement_end_date | character varying | | extended | | employer_name | character varying | | extended | | employer_address1 | character varying | | extended | | employer_address2 | character varying | | extended | | employer_city | character varying | | extended | | employer_state | character varying | | extended | | employer_postal_code | character varying | | extended | | employer_phone | character varying | | extended | | employer_phone_ext | character varying | | extended | | job_title | character varying | | extended | | soc_code | character varying | | extended | | naic_code | character varying | | extended | | prevailing_wage | character varying | | extended | | pw_unit_of_pay | character varying | | extended | | wage_unit_of_pay | character varying | | extended | | worksite_city | character varying | | extended | | worksite_state | character varying | | extended | | worksite_postal_code | character varying | | extended | | total_workers | integer | | plain | | case_status | character varying | | extended | | case_no | character varying | | extended | | salary | real | | plain | | salary_max | real | | plain | | prevailing_wage_second | real | | plain | | lawyer_id | integer | | plain | | citizenship | character varying | | extended | | class_of_admission | character varying | | extended | | Indexes: "lca_test_pkey" PRIMARY KEY, btree (id) "index_lca_test_on_id_and_salary" btree (id, salary) "index_lca_test_on_id_and_salary_and_year" btree (id, salary, year) "index_lca_test_on_id_and_salary_and_year_and_wage_unit_of_pay" btree (id, salary, year, wage_unit_of_pay) "index_lca_test_on_id_and_visa_class" btree (id, visa_class) "index_lca_test_on_id_and_worksite_state" btree (id, worksite_state) "index_lca_test_on_lawyer_id" btree (lawyer_id) "index_lca_test_on_lawyer_id_and_company_id" btree (lawyer_id, company_id) "index_lca_test_on_raw_id_and_visa_and_pw_second" btree (raw_id, visa_class, prevailing_wage_second) "index_lca_test_on_raw_id_and_visa_class" btree (raw_id, visa_class) "index_lca_test_on_salary" btree (salary) "index_lca_test_on_visa_class" btree (visa_class) "index_lca_test_on_wage_unit_of_pay" btree (wage_unit_of_pay) "index_lca_test_on_worksite_state" btree (worksite_state) "index_lca_test_on_year_and_company_id" btree (year, company_id) "index_lca_test_on_year_and_company_id_and_case_status" btree (year, company_id, case_status) "index_lcas_job_title_trigram" gin (job_title gin_trgm_ops) "lca_test_company_id" btree (company_id) "lca_test_employer_name" btree (employer_name) "lca_test_id" btree (id) "lca_test_on_year_and_companyid_and_wage_unit_and_salary" btree (year, company_id, wage_unit_of_pay, salary) Foreign-key constraints: "fk_rails_8a90090fe0" FOREIGN KEY (lawyer_id) REFERENCES lawyers(id) Has OIDs: no
Asked by bl0b (363 rep)
Jun 11, 2015, 05:50 PM
Last activity: Aug 1, 2023, 06:05 PM