PostgreSQL query very slow when subquery added

15 votes
3 answers
24506 views
postgresql performance subquery hibernate query-performance
                          
I have a relatively simple query on a table with 1.5M rows:

    SELECT mtid FROM publication
    WHERE mtid IN (9762715) OR last_modifier=21321
    LIMIT 5000;

EXPLAIN ANALYZE output:

>     Limit  (cost=8.84..12.86 rows=1 width=8) (actual time=0.985..0.986 rows=1 loops=1)
>       ->  Bitmap Heap Scan on publication  (cost=8.84..12.86 rows=1 width=8) (actual time=0.984..0.985 rows=1 loops=1)
>             Recheck Cond: ((mtid = 9762715) OR (last_modifier = 21321))
>             ->  BitmapOr  (cost=8.84..8.84 rows=1 width=0) (actual time=0.971..0.971 rows=0 loops=1)
>                   ->  Bitmap Index Scan on publication_pkey  (cost=0.00..4.42 rows=1 width=0) (actual time=0.295..0.295 rows=1 loops=1)
>                         Index Cond: (mtid = 9762715)
>                   ->  Bitmap Index Scan on publication_last_modifier_btree  (cost=0.00..4.42 rows=1 width=0) (actual time=0.674..0.674 rows=0 loops=1)
>                         Index Cond: (last_modifier = 21321)
>     Total runtime: 1.027 ms

So far so good: it is fast and uses the available indexes.  
Now, if I modify the query just a bit, replacing the literal in the IN list with a trivial subquery:

    SELECT mtid FROM publication
    WHERE mtid IN (SELECT 9762715) OR last_modifier=21321
    LIMIT 5000;

The EXPLAIN ANALYZE output is:

>     Limit  (cost=0.01..2347.74 rows=5000 width=8) (actual time=2735.891..2841.398 rows=1 loops=1)
>       ->  Seq Scan on publication  (cost=0.01..349652.84 rows=744661 width=8) (actual time=2735.888..2841.393 rows=1 loops=1)
>             Filter: ((hashed SubPlan 1) OR (last_modifier = 21321))
>             SubPlan 1
>               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
>     Total runtime: 2841.442 ms

Not so fast, and using seq scan...

Of course, the original query run by the application is a bit more complex and even slower, and the Hibernate-generated subquery is not literally (SELECT 9762715), but the slowdown is there even with that trivial subquery. Because the queries are generated by Hibernate, changing them is quite a challenge, and some features are not available (e.g. UNION, which would be fast).
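For reference, the UNION form mentioned above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, the four sample rows are made up, and the sketch checks only that the OR form and the UNION form return the same rows. It says nothing about how PostgreSQL would plan either query.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL. This demonstrates
# result equivalence of the two query shapes, not planner behaviour or speed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication (mtid INTEGER PRIMARY KEY, last_modifier INTEGER)"
)
conn.execute(
    "CREATE INDEX publication_last_modifier_btree ON publication (last_modifier)"
)
# Hypothetical sample rows, chosen so that both conditions match something.
conn.executemany(
    "INSERT INTO publication (mtid, last_modifier) VALUES (?, ?)",
    [(9762715, 1), (100, 21321), (101, 21321), (102, 7)],
)

# Shape from the question: a subquery in IN () combined with OR.
or_query = """
    SELECT mtid FROM publication
    WHERE mtid IN (SELECT 9762715) OR last_modifier = 21321
    LIMIT 5000
"""

# UNION rewrite: each branch carries a single condition on one column.
# UNION (not UNION ALL) drops rows that match both branches, which keeps
# the result identical to the OR form. LIMIT applies to the combined result.
union_query = """
    SELECT mtid FROM publication WHERE mtid IN (SELECT 9762715)
    UNION
    SELECT mtid FROM publication WHERE last_modifier = 21321
    LIMIT 5000
"""

or_rows = sorted(row[0] for row in conn.execute(or_query))
union_rows = sorted(row[0] for row in conn.execute(union_query))
print(or_rows, union_rows)
```

Since neither query has an ORDER BY, the two forms agree on the full set of matching rows, but which 5000 rows survive the LIMIT is unspecified in both when more than 5000 match.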

### The questions

 1. Why can't the indexes be used in the second case? How could they be used?
 2. Can I improve query performance some other way?

### Additional thoughts

It seems that we could fall back to the first form by running the subquery manually as a separate SELECT, and then putting the resulting list of IDs into the query as literals. Even with 5000 numbers in the IN () list, that is four times faster than the subquery version. However, it just seems _WRONG_ (also, it could be 100 times faster :) ). I do not understand why the query planner chooses a completely different plan for these two queries, so I would like to find a nicer solution to this problem.
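
The two-step workaround described above could look roughly like the sketch below. It is a minimal illustration, not the application's actual code: the helper `fetch_mtids`, the sample rows, and the use of an in-memory SQLite database in place of PostgreSQL are all assumptions made so the example runs on its own.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication (mtid INTEGER PRIMARY KEY, last_modifier INTEGER)"
)
conn.executemany(
    "INSERT INTO publication (mtid, last_modifier) VALUES (?, ?)",
    [(9762715, 1), (100, 21321), (101, 21321), (102, 7)],
)


def fetch_mtids(conn, id_subquery, last_modifier, limit=5000):
    """Hypothetical helper: run the subquery first, then inline its ids."""
    # Step 1: execute the subquery on its own and collect the ids.
    # int() guarantees that only integers end up in the SQL text.
    ids = [int(row[0]) for row in conn.execute(id_subquery)]

    # Step 2: build the main query with the ids as a literal IN list, so
    # the database sees constants instead of a subquery.
    if ids:
        in_list = ", ".join(str(i) for i in ids)
        where = f"mtid IN ({in_list}) OR last_modifier = ?"
    else:
        # PostgreSQL rejects an empty IN (), so drop that branch entirely.
        where = "last_modifier = ?"

    sql = f"SELECT mtid FROM publication WHERE {where} LIMIT ?"
    return [row[0] for row in conn.execute(sql, (last_modifier, limit))]


result = sorted(fetch_mtids(conn, "SELECT 9762715", 21321))
print(result)
```

The cost of this approach is a second round trip and a query text that changes with the id list, which is part of why it feels like a workaround rather than a fix.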
Asked by P.Péter (911 rep)
Sep 7, 2015, 09:05 AM
Last activity: Dec 5, 2017, 09:30 AM