PostgreSQL query performance issue

1 vote
3 answers
269 views
We are sometimes getting poor performance (~14s) when using the following query (PostgreSQL 9.6) to fetch rows from the table items whose ID is present in the table items_categories:

SELECT items.*
FROM items
WHERE EXISTS (
    SELECT item_id
    FROM items_categories
    WHERE item_id = items.id
      AND category_id = 626
)
AND items.active = TRUE
-- possibly some other "AND" clauses here to apply more filters on "items",
-- but not considered for this question
ORDER BY modified_at DESC
LIMIT 10

Relevant parts of our schema:

                           Table "public.items"
    Column    |            Type             |                     Modifiers
--------------+-----------------------------+----------------------------------------------------
 id           | integer                     | not null default nextval('items_id_seq'::regclass)
 active       | boolean                     | default true
 modified_at  | timestamp without time zone | default now()
Indexes:
    "items_pkey" PRIMARY KEY, btree (id)
    "active_idx" btree (active)
    "aggregate_idx" btree (id)
    "items_modified_at_idx" btree (modified_at)

   Table "public.items_categories"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 item_id     | integer | not null
 category_id | integer | not null
Indexes:
    "unique_cat_item_assoc" UNIQUE CONSTRAINT, btree (item_id, category_id)
    "items_categories_1_idx" btree (category_id)
    "items_categories_2_idx" btree (item_id)
Foreign-key constraints:
    "items_categories_category_id_fkey" FOREIGN KEY (category_id) REFERENCES categories(id)
    "items_categories_item_id_fkey" FOREIGN KEY (item_id) REFERENCES items(id)

The table items contains ~2 M rows and the table items_categories contains ~4 M rows.

When we ask for 10 items (i.e. LIMIT 10 at the end of the above query) and 10 or more rows match in items_categories, performance is good (~10ms). But when fewer than 10 rows match in items_categories, the query takes ~14s because it does a backward index scan on items.modified_at, looking at each of the ~2 M rows.

Query plan when fewer than 10 rows match in items_categories (poor performance):

Limit  (cost=0.86..11696.68 rows=10 width=1797) (actual time=168.376..14484.854 rows=7 loops=1)
  ->  Nested Loop Semi Join  (cost=0.86..2746178.23 rows=2348 width=1797) (actual time=168.376..14484.836 rows=7 loops=1)
        ->  Index Scan Backward using items_modified_at_idx on items  (cost=0.43..1680609.95 rows=2243424 width=1797) (actual time=0.054..7611.300 rows=2251395 loops=1)
              Filter: active
              Rows Removed by Filter: 2467
        ->  Index Only Scan using unique_cat_item_assoc on items_categories  (cost=0.43..0.47 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=2251395)
              Index Cond: ((item_id = items.id) AND (category_id = 626))
              Heap Fetches: 7
Planning time: 3.082 ms
Execution time: 14485.057 ms

Query plan when more than 10 rows match in items_categories (good performance):

Limit  (cost=0.86..24.07 rows=10 width=1857) (actual time=3.575..3.757 rows=10 loops=1)
  ->  Nested Loop Semi Join  (cost=0.86..2763459.56 rows=1190819 width=1857) (actual time=3.574..3.752 rows=10 loops=1)
        ->  Index Scan Backward using items_modified_at_idx on items  (cost=0.43..1684408.22 rows=2246967 width=1857) (actual time=0.013..2.205 rows=751 loops=1)
              Filter: active
        ->  Index Only Scan using unique_cat_item_assoc on items_categories  (cost=0.43..0.47 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=751)
              Index Cond: ((item_id = items.id) AND (category_id = 20))
              Heap Fetches: 10
Planning time: 1.650 ms
Execution time: 3.868 ms

How can we tune this query to handle both situations, i.e. get good performance no matter how many rows of items_categories match?
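For reference, the plans above are standard EXPLAIN ANALYZE output; both cases can be reproduced by swapping the sparse category (626, 7 matches) for a dense one (20):

EXPLAIN ANALYZE
SELECT items.*
FROM items
WHERE EXISTS (
    SELECT item_id
    FROM items_categories
    WHERE item_id = items.id
      AND category_id = 626  -- 20 (or any dense category) gives the fast case
)
AND items.active = TRUE
ORDER BY modified_at DESC
LIMIT 10;

If we read the Limit node correctly, the planner assumes the estimated 2348 matching rows are spread evenly through the index, so it prices the limited scan at roughly 2746178.23 × 10 / 2348 ≈ 11697; when only 7 rows actually match, the backward scan ends up walking the whole index instead.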
I have a POC working where I first count the number of matching rows in items_categories (a separate query); if that count is low, I use a CTE to work on the small subset of items instead of all of its rows. But it's really a dirty temporary hack, in my opinion: if the count is high, the CTE takes too long and it's more efficient NOT to use it (i.e. the above query performs better). Thank you!
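For what it's worth, here is a rough sketch of that POC (the threshold check lives in application code, and the hard-coded category ID is a placeholder; it relies on the CTE being an optimization fence in 9.6, so the planner materializes the few matching IDs first instead of walking items_modified_at_idx backwards):

-- Step 1 (separate query): count the associations for this category
SELECT count(*) FROM items_categories WHERE category_id = 626;

-- Step 2, only when the count is below the LIMIT: start from the matching
-- item IDs instead of from the modified_at index. The unique constraint on
-- (item_id, category_id) guarantees the join produces no duplicate items.
WITH matching AS (
    SELECT item_id
    FROM items_categories
    WHERE category_id = 626
)
SELECT items.*
FROM items
JOIN matching ON matching.item_id = items.id
WHERE items.active = TRUE
ORDER BY items.modified_at DESC
LIMIT 10;

When the count is high, the CTE materializes far too many rows and the original query wins, hence the ugly branching in application code.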
Asked by xav (113 rep)
Jul 26, 2021, 11:31 AM
Last activity: May 29, 2025, 11:01 AM