
PostgreSQL query very slow when subquery added

15 votes
3 answers
24506 views
I have a relatively simple query on a table with 1.5M rows:

    SELECT mtid FROM publication
    WHERE mtid IN (9762715) OR last_modifier=21321
    LIMIT 5000;

`EXPLAIN ANALYZE` output:

>     Limit  (cost=8.84..12.86 rows=1 width=8) (actual time=0.985..0.986 rows=1 loops=1)
>       ->  Bitmap Heap Scan on publication  (cost=8.84..12.86 rows=1 width=8) (actual time=0.984..0.985 rows=1 loops=1)
>             Recheck Cond: ((mtid = 9762715) OR (last_modifier = 21321))
>             ->  BitmapOr  (cost=8.84..8.84 rows=1 width=0) (actual time=0.971..0.971 rows=0 loops=1)
>                   ->  Bitmap Index Scan on publication_pkey  (cost=0.00..4.42 rows=1 width=0) (actual time=0.295..0.295 rows=1 loops=1)
>                         Index Cond: (mtid = 9762715)
>                   ->  Bitmap Index Scan on publication_last_modifier_btree  (cost=0.00..4.42 rows=1 width=0) (actual time=0.674..0.674 rows=0 loops=1)
>                         Index Cond: (last_modifier = 21321)
>     Total runtime: 1.027 ms

So far so good: it is fast and uses the available indexes.

Now, if I modify the query just a bit:

    SELECT mtid FROM publication
    WHERE mtid IN (SELECT 9762715) OR last_modifier=21321
    LIMIT 5000;

The `EXPLAIN ANALYZE` output is:

>     Limit  (cost=0.01..2347.74 rows=5000 width=8) (actual time=2735.891..2841.398 rows=1 loops=1)
>       ->  Seq Scan on publication  (cost=0.01..349652.84 rows=744661 width=8) (actual time=2735.888..2841.393 rows=1 loops=1)
>             Filter: ((hashed SubPlan 1) OR (last_modifier = 21321))
>             SubPlan 1
>               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
>     Total runtime: 2841.442 ms

Not so fast, and it uses a sequential scan...

Of course, the original query run by the application is a bit more complex and even slower, and of course the Hibernate-generated original is not `(SELECT 9762715)`, but the slowness is there even for that `(SELECT 9762715)`! The query is generated by Hibernate, so it is quite a challenge to change it, and some features are not available (e.g. `UNION` is not available, which would be fast).

### The questions

1. Why can't the indexes be used in the second case? How could they be used?
2. Can I improve query performance some other way?

### Additional thoughts

It seems that we could use the first case by manually doing a SELECT and then putting the resulting list into the query. Even with 5000 numbers in the `IN()` list, it is four times faster than the second solution. However, it just seems _WRONG_ (also, it could be 100 times faster :) ). It is completely incomprehensible to me why the query planner uses a completely different method for these two queries, so I would like to find a nicer solution to this problem.
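To make the manual workaround concrete (run the inner SELECT first, then put its results into the outer query as constants), here is a minimal sketch. It rests on several assumptions: it uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a server, so it shows only the shape of the two-step approach and does not reproduce the PostgreSQL planner behavior; the helper names `build_in_list_query`, `two_step` and `demo_connection` and the three demo rows are hypothetical; and the `?` placeholder style is specific to `sqlite3` (other drivers use a different syntax).

```python
import sqlite3


def build_in_list_query(ids):
    """Build the 'fast' form of the query: the IN list holds constants only."""
    if not ids:
        # PostgreSQL rejects an empty "IN ()", so keep only the other predicate.
        return "SELECT mtid FROM publication WHERE last_modifier = ? LIMIT 5000"
    placeholders = ", ".join("?" for _ in ids)
    return (
        "SELECT mtid FROM publication "
        f"WHERE mtid IN ({placeholders}) OR last_modifier = ? LIMIT 5000"
    )


def two_step(conn, inner_sql, last_modifier):
    """Run the subquery separately, then feed its ids back in as constants."""
    # Step 1: execute what used to be the subquery and collect the ids.
    ids = [row[0] for row in conn.execute(inner_sql)]
    # Step 2: run the outer query with the ids bound as plain values.
    sql = build_in_list_query(ids)
    rows = conn.execute(sql, (*ids, last_modifier))
    return sorted(row[0] for row in rows)


def demo_connection():
    """Hypothetical three-row table, only to make the sketch runnable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publication (mtid INTEGER PRIMARY KEY, last_modifier INTEGER)"
    )
    conn.executemany(
        "INSERT INTO publication VALUES (?, ?)",
        [(9762715, 1), (2, 21321), (3, 7)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(two_step(conn, "SELECT 9762715", 21321))
```

The cost of this approach is an extra round trip and a query text whose length grows with the number of ids, which is part of why it feels like a workaround rather than a solution.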
Asked by P.Péter (911 rep)
Sep 7, 2015, 09:05 AM
Last activity: Dec 5, 2017, 09:30 AM