PostgreSQL query very slow when subquery added
15
votes
3
answers
24506
views
I have a relatively simple query on a table with 1.5M rows:
SELECT mtid FROM publication
WHERE mtid IN (9762715) OR last_modifier=21321
LIMIT 5000;
EXPLAIN ANALYZE output:
> Limit (cost=8.84..12.86 rows=1 width=8) (actual time=0.985..0.986 rows=1 loops=1)
> -> Bitmap Heap Scan on publication (cost=8.84..12.86 rows=1 width=8) (actual time=0.984..0.985 rows=1 loops=1)
> Recheck Cond: ((mtid = 9762715) OR (last_modifier = 21321))
> -> BitmapOr (cost=8.84..8.84 rows=1 width=0) (actual time=0.971..0.971 rows=0 loops=1)
> -> Bitmap Index Scan on publication_pkey (cost=0.00..4.42 rows=1 width=0) (actual time=0.295..0.295 rows=1 loops=1)
> Index Cond: (mtid = 9762715)
> -> Bitmap Index Scan on publication_last_modifier_btree (cost=0.00..4.42 rows=1 width=0) (actual time=0.674..0.674 rows=0 loops=1)
> Index Cond: (last_modifier = 21321)
> Total runtime: 1.027 ms
So far so good: it is fast and uses the available indexes.
Now, if I modify the query just a bit:
SELECT mtid FROM publication
WHERE mtid IN (SELECT 9762715) OR last_modifier=21321
LIMIT 5000;
The EXPLAIN ANALYZE output is:
> Limit (cost=0.01..2347.74 rows=5000 width=8) (actual time=2735.891..2841.398 rows=1 loops=1)
> -> Seq Scan on publication (cost=0.01..349652.84 rows=744661 width=8) (actual time=2735.888..2841.393 rows=1 loops=1)
> Filter: ((hashed SubPlan 1) OR (last_modifier = 21321))
> SubPlan 1
> -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
> Total runtime: 2841.442 ms
Not so fast, and using seq scan...
Of course, the original query run by the application is a bit more complex and even slower, and the Hibernate-generated original is not (SELECT 9762715), but the slowness is there even with that (SELECT 9762715)! The queries are generated by Hibernate, so changing them is quite a challenge, and some features are not available (e.g. UNION, which would be fast, is not available).
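For reference, a minimal sketch of the UNION form mentioned above, which cannot be produced through Hibernate here. It assumes mtid is unique (the index name publication_pkey suggests it is the primary key), so the duplicate removal done by UNION returns the same rows as the OR version. The trailing LIMIT applies to the combined result.

```sql
-- Each branch has a single simple condition, so the planner can
-- consider an index for each one independently.
SELECT mtid FROM publication
WHERE mtid IN (SELECT 9762715)
UNION
SELECT mtid FROM publication
WHERE last_modifier = 21321
LIMIT 5000;
```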
### The questions
1. Why can't the indexes be used in the second case? How could they be used?
2. Can I improve query performance some other way?
### Additional thoughts
It seems that we could get the first case by manually running the inner SELECT first and then putting the resulting list into the query. Even with 5000 numbers in the IN() list, that is four times faster than the second solution. However, it just seems _WRONG_ (also, it could be 100 times faster :) ). It is completely incomprehensible to me why the query planner uses a completely different method for these two queries, so I would like to find a nicer solution to this problem.
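The two-step workaround described above would look roughly like this. It is only a sketch: (SELECT 9762715) stands in for the real Hibernate-generated subquery, and the application is assumed to collect the ids from step 1 and splice them into step 2 as literals.

```sql
-- Step 1: run the inner query on its own and collect the ids
-- in the application.
SELECT 9762715;

-- Step 2: put the collected ids into the IN() list as constants,
-- which gives the same query shape as the fast first case.
SELECT mtid FROM publication
WHERE mtid IN (9762715 /* , further ids from step 1 */)
   OR last_modifier = 21321
LIMIT 5000;
```

This costs one extra round trip, and the id list can get long (up to 5000 entries in my test).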
Asked by P.Péter
(911 rep)
Sep 7, 2015, 09:05 AM
Last activity: Dec 5, 2017, 09:30 AM