Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
4
votes
1
answers
154
views
Postgresql: Why is a join or subquery so much slower than a literal in a where clause?
I'm using PostgreSQL 14.17.
My database schema has two tables:
Table "public.log_records"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------+--------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
id | bigint | | not null | generated always as identity | plain | | |
log_id | bigint | | | | plain | | |
payload | jsonb | | not null | | extended | | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP | plain | | |
Indexes:
"log_records_pkey" PRIMARY KEY, btree (id)
"idx_log_records_log_created" btree (log_id, created_at)
"idx_log_records_log_id" btree (log_id)
Foreign-key constraints:
"log_records_log_id_fkey" FOREIGN KEY (log_id) REFERENCES logs(id)
Access method: heap
Table "public.logs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
------------------+--------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
id | bigint | | not null | generated always as identity | plain | | |
name | character varying(255) | | not null | | extended | | |
retention_period | integer | | not null | | plain | | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP | plain | | |
Indexes:
"logs_pkey" PRIMARY KEY, btree (id)
"idx_logs_id_name" btree (id, name)
"logs_name_key" UNIQUE CONSTRAINT, btree (name)
Referenced by:
TABLE "log_records" CONSTRAINT "log_records_log_id_fkey" FOREIGN KEY (log_id) REFERENCES logs(id)
Access method: heap
There are a small number of records in logs and not all of the log_ids are used in the log_records table:
=# select count(*) from logs;
count
-------
13
(1 row)
=# select count(*) from log_records;
count
----------
14552534
(1 row)
=# select count(distinct log_id) from log_records;
count
-------
8
(1 row)
I want to query the log_records table for log records belonging to a named log. However, if I pass the name of a log that has no records in the log_records table, the query is very slow:
# explain analyze
SELECT e.id, e.payload, e.created_at, e.headers
FROM log_records e JOIN logs l ON l.id = e.log_id
WHERE l.name = 'Log1'
ORDER BY e.id
LIMIT 100;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..244.61 rows=100 width=395) (actual time=7371.696..7371.698 rows=0 loops=1)
-> Nested Loop (cost=0.43..2729140.04 rows=1117680 width=395) (actual time=7371.696..7371.697 rows=0 loops=1)
Join Filter: (e.log_id = l.id)
Rows Removed by Join Filter: 14552570
-> Index Scan using log_records_pkey on log_records e (cost=0.43..2511191.29 rows=14529839 width=403) (actual time=0.012..4955.006 rows=14552570 loops=1)
-> Materialize (cost=0.00..1.17 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=14552570)
-> Seq Scan on logs l (cost=0.00..1.16 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=1)
Filter: ((name)::text = 'Log1'::text)
Rows Removed by Filter: 12
Planning Time: 0.233 ms
Execution Time: 7371.717 ms
(11 rows)
I get a different plan and better but still poor performance for a scalar subquery:
# explain analyze
SELECT e.id, e.payload, e.created_at, e.headers
FROM log_records e
WHERE e.log_id =
(SELECT id FROM logs WHERE name = 'Log1')
ORDER BY e.id
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.60..141.86 rows=100 width=395) (actual time=3686.420..3686.421 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on logs (cost=0.00..1.16 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
Filter: ((name)::text = 'Log1'::text)
Rows Removed by Filter: 12
-> Index Scan using log_records_pkey on log_records e (cost=0.43..2547511.66 rows=1816230 width=395) (actual time=3686.419..3686.419 rows=0 loops=1)
Filter: (log_id = $0)
Rows Removed by Filter: 14552594
Planning Time: 0.318 ms
Execution Time: 3686.487 ms
(10 rows)
However if I hardcode the log_id as a literal I get good performance:
# explain analyze
SELECT e.id, e.payload, e.created_at, e.headers
FROM log_records e
WHERE e.log_id = 13
ORDER BY e.id
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.08..4.09 rows=1 width=395) (actual time=0.015..0.015 rows=0 loops=1)
-> Sort (cost=4.08..4.09 rows=1 width=395) (actual time=0.014..0.015 rows=0 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_log_records_log_id on log_records e (cost=0.43..4.07 rows=1 width=395) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: (log_id = 13)
Planning Time: 0.098 ms
Execution Time: 0.028 ms
(8 rows)
Why can't I get this type of performance from the joins or subquery? Why is the subquery faster than the join when most advice would say don't use subqueries?
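For context on the estimates behind that first plan: the planner expects roughly 1.1 million matching rows (the rows=1117680 on the Nested Loop), so it assumes that walking log_records_pkey in id order will hit 100 matches almost immediately. A quick way to inspect the column statistics feeding that estimate is pg_stats; a minimal sketch, using the table and column names from the question:
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'log_records'
  AND attname = 'log_id';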
willn-cpx
(43 rep)
Aug 1, 2025, 10:24 AM
• Last activity: Aug 6, 2025, 12:50 AM
1
votes
0
answers
31
views
Necessary to include filtered index filter column if value always NULL in PostgreSQL?
My question is about PostgreSQL. I found similar questions for MS SQL server but I don't know if the answers apply here.
My table looks like this:
scores
======
| ID | UserID | ValidFrom | ValidUntil | ScorePoints |
+----+--------+------------+------------+-------------|
| 1 | 1 | 2025-08-01 | 2025-08-02 | 80 |
| 2 | 1 | 2025-08-02 | NULL | 85 |
There will be a lot of queries to find the currently valid score for a given UserID (= the row for UserID = ? AND ValidUntil IS NULL).
I have a unique index like this:
CREATE UNIQUE INDEX ix_uq_scores ON scores ( userid ) WHERE validuntil IS NULL;
Or should it be:
CREATE UNIQUE INDEX ix_uq_scores ON scores ( userid, validuntil ) WHERE validuntil IS NULL;
A query might look like
SELECT u.id, u.username, s.scorepoints
FROM users u
INNER JOIN scores s ON s.userid = u.id AND s.validuntil IS NULL
WHERE u.id = 123;
My filtered index will only include rows where validuntil **IS** NULL. So do I have to add this constant NULL value to the index tuple?
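For reference, a quick way to compare the two definitions empirically is to create one and look at the plan for the typical lookup; a minimal sketch using the names from the question (the single-column variant shown, since every indexed row already satisfies validuntil IS NULL):
CREATE UNIQUE INDEX ix_uq_scores ON scores (userid) WHERE validuntil IS NULL;

EXPLAIN
SELECT s.scorepoints
FROM scores s
WHERE s.userid = 123
  AND s.validuntil IS NULL;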
MrSnrub
(181 rep)
Aug 4, 2025, 02:32 AM
3
votes
1
answers
461
views
How to deal with a seldom-used big database in PostgreSQL?
I loaded OpenStreetMap data for the whole of Europe (400 GB) into PostgreSQL (9.3). Then I installed a geocoding API (Nominatim) that queries the database with GIS queries.
My problem is:
> This database is queried a few times a day, but because postgres loads
> the data in its buffer on demand, my first query on a particular gps
> point is always slow, and I do only one query per GPS point, so it's
> always slow (like 60 sec against 100 ms when the data is buffered).
What could I do here?
- I'm using a A6 Azure instance (4 Cores, 28 GB memory, HDD).
- Server is Ubuntu 14.04.4 LTS
- Swapping is enabled
- There is index on the geometry columns queried.
- The shared buffer is 10GB
- work mem is 256MB
- maintenance work mem is 4GB
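One commonly suggested option for this "cold cache after idle" pattern is to warm the relevant relations explicitly after a restart or idle period. A hedged sketch: pg_prewarm ships as a contrib module from PostgreSQL 9.4 onward (the 9.3 instance here would need an upgrade or an OS-level equivalent such as reading the files with cat/dd), and the relation names below are assumptions, not taken from the question:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- warm the heap and the geometry index of the main Nominatim search table (names assumed)
SELECT pg_prewarm('placex');
SELECT pg_prewarm('idx_placex_geometry');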
remi bourgarel
(259 rep)
Oct 4, 2016, 12:31 PM
• Last activity: Aug 2, 2025, 12:08 PM
1
votes
2
answers
3309
views
REINDEX takes significantly more time on production system than on local snapshot
We have a table with time-series-like data (~200million rows, 30GB data size + index) in a PG 11 database, and for the first time since we started writing into it, we have to delete from it (~30%).
To keep things performant after the delete we want to
- rebuild the indexes
- run a full vacuum on it
To get a better understanding of how much time this will take I have created a dump of the given table, loaded it into a local database, deleted the data, rebuilt the index, and ran a full vacuum:
REINDEX TABLE my_schema.my_table;
VACUUM (FULL, VERBOSE, ANALYZE) my_schema.my_table;
To my surprise, both finished quite fast (2 x 30min).
I know that to get the exact time required to run these commands on the server, I would need to back up the filesystem to get the exact same local snapshot as the server and run it on the same hardware. However, from my limited understanding, because we have never deleted or updated rows in the table, the restored local snapshot should be physically very similar to the table on the server, so the operations on the server should take a similar amount of time to finish.
This is not what I observe. I deleted the data, and re-indexing took about 6x longer than on the local system. Similarly, the vacuum took around 6x longer as well. This is a significant difference that I would never have expected.
My local hardware is similar to the server (16 GB vs 20 GB), but with more CPU cores (2 vCPU on the server vs an 8-vCPU i7 locally). I have an SSD on my local machine; I don't know what storage the server uses, but I would be surprised if it were non-SSD (we are using Azure Single Server for PostgreSQL).
The CPU utilization was below 10% on the server all the time so I think the operation was not CPU bound. (I am just guessing here.)
The table has 4 indexes: 3 of them are single-column B-tree indexes with mostly even distribution (date-time, synced source file ID, consumer ID), plus a compound index on these 3 columns together.
Why is the difference so big? Was my local testing flawed?
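One sanity check worth adding to this comparison: a dump-and-restore produces a freshly packed table and freshly built indexes, while the production table and its indexes may already carry some bloat even without prior deletes. A minimal sketch for comparing physical sizes on both systems (schema and table names taken from the question):
SELECT pg_size_pretty(pg_table_size('my_schema.my_table'))   AS heap_size,
       pg_size_pretty(pg_indexes_size('my_schema.my_table')) AS index_size;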
NoNameProvided
(111 rep)
Jan 30, 2022, 04:44 PM
• Last activity: Jul 30, 2025, 03:02 PM
1
votes
1
answers
850
views
Remap array columns to single column with value
Here is the sample database schema.
| id | Critical | High | Low |
--------------------------------------------
| 1 | {apple, ball} | {cat} | {dog, egg} |
Now I want to remap these to another table. Critical set to 1, High to 2 and Low to 3 like:
| element | priority |
----------------------
| apple | 1 |
| ball | 1 |
| cat | 2 |
| dog | 3 |
| egg | 3 |
1, 2, 3 could be changed to the column names too.
I've written a query that does it for one column at a time:
insert into new_table (element, priority)
select unnest(critical), 1 from old_table where id=1;
But I want to do all the 3 columns in this single query.
I want it to perform well. Is UNION the most efficient way?
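For reference, a minimal single-statement sketch of the three-column version using UNION ALL (names taken from the question; whether this is the most efficient form is exactly what is being asked here):
INSERT INTO new_table (element, priority)
SELECT unnest(critical), 1 FROM old_table WHERE id = 1
UNION ALL
SELECT unnest(high), 2 FROM old_table WHERE id = 1
UNION ALL
SELECT unnest(low), 3 FROM old_table WHERE id = 1;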
PaxPrz
(219 rep)
Mar 15, 2021, 04:57 AM
• Last activity: Jul 30, 2025, 12:01 PM
0
votes
1
answers
3405
views
PostgreSQL update column in one table with value from another, based on key from first table
Sorry about the title, I wasn't sure how best to word this. My situation is this:
I have two tables:
paper_author [paper_id, author_id, author_name]
authors [author_id, author_name]
Right now, the author_name column in paper_author is NULL. I would like to select the author name from the authors table and insert it into the author_name column in the paper_author table. I've written the following query:
UPDATE paper_author pa
SET author_name = (SELECT author_name FROM authors a WHERE a.author_id = pa.author_id);
Which I think will do what I want, but this is taking a very long time to run (days). For reference, the table paper_author has ~900M rows and the table authors has ~200M rows. author_id is the primary key in the authors table, and author_id has an index in the paper_author table. The rows in paper_author are not unique on author_id (i.e. each author_id may appear multiple times, associated with different papers).
Is there a more efficient way to write this? Have I missed something?
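For comparison, the join-style form of the same update that PostgreSQL supports; it is semantically equivalent here because author_id is the primary key of authors, so each paper_author row matches at most one author (a sketch, not a measured result):
UPDATE paper_author pa
SET    author_name = a.author_name
FROM   authors a
WHERE  a.author_id = pa.author_id;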
Gabriel Getzie
(1 rep)
Oct 6, 2020, 02:07 AM
• Last activity: Jul 30, 2025, 12:04 AM
0
votes
1
answers
65
views
RDS postgres slow read IO
We are running Postgres 14.12 in RDS and are experiencing very slow I/O reads, around 30 MB/s on index scans. We can't figure out what might be the cause. Any ideas on what we should or could check?
**configuration**
instance class: db.m6idn.8xlarge (which should support 3125 MB/s throughput)
RAM: 128GB
vCPU: 32
storage type: gp3, with 25000 IOPS (we only reach 18K) and 4000 MiB/s throughput.
Most of our slow queries are due to slow I/O reads.
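One way to confirm that the time really goes to storage reads rather than planning or CPU is to capture per-query I/O timings; a minimal sketch, assuming track_io_timing can be enabled (on RDS it is typically set via the parameter group) and substituting one of the slow queries for the placeholder:
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM some_large_table;  -- placeholder; use one of the slow index-scan queries
-- the "I/O Timings: read=..." lines show milliseconds spent waiting on reads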
edena
(1 rep)
Mar 4, 2025, 04:36 PM
• Last activity: Jul 28, 2025, 04:09 PM
0
votes
1
answers
33
views
Is there a way to get transaction time statistics on specific tables
I saw some log entries that indicated transaction time outliers of up to 10s at times, where transaction times are typically below 1s. To get a view of how often this happens, is there a way to get transaction time statistics in Postgres for transactions involving particular tables?
Even just a count of the number of transactions over some threshold would help, which I could then compare to the total number.
Thank you.
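For what it's worth, the closest built-in approximation is usually per-statement rather than per-transaction timing; a hedged sketch using the pg_stat_statements extension, assuming it is installed and the table name appears in the statement text (my_table is a placeholder; the column names are for PostgreSQL 13+, older releases use total_time/mean_time/max_time):
SELECT calls, mean_exec_time, max_exec_time, query
FROM pg_stat_statements
WHERE query ILIKE '%my_table%'
ORDER BY max_exec_time DESC
LIMIT 20;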
nsandersen
(241 rep)
Jul 25, 2025, 12:44 PM
• Last activity: Jul 26, 2025, 03:43 PM
6
votes
1
answers
1035
views
PostgreSQL predicate not pushed down (through join conditions)
Consider the following data model in a PostgreSQL v13 system;
[parent-child data model diagram]
Here, parent table dim contains a small set of reference data, and child table fact contains a much higher volume of records. A typical use case for these data sets would be to query all fact::value data belonging to a dim::name. Note that dim::name holds a UNIQUE constraint.
While I think this is a very common scenario, I was somewhat taken aback that the style of queries I've been using for years on other RDBMSs (Oracle, MSSQL) didn't perform _at all_ on PostgreSQL the way I imagined they would. That is, when querying a dataset (fact) using a highly selective, but implicit, predicate (fact::dim_id eq X) through a join condition, I expect the index on fact::dim_id to be used (in a nested loop). Instead, a hash join is used, requiring a full table scan of fact.
**Question:** is there some way I can nudge the query planner into considering any predicate I issue on a joined relation to not need a full table scan? (without impacting other DB loads)
To illustrate the problem with an example, these tables are populated with some random data;
CREATE TABLE dim(
id SERIAL NOT NULL
, name TEXT NOT NULL
, CONSTRAINT pk_dim PRIMARY KEY (id)
, CONSTRAINT uq_dim UNIQUE (name)
);
CREATE TABLE fact(
id SERIAL NOT NULL
, dim_id INTEGER NOT NULL
, value TEXT
, CONSTRAINT pk_fact PRIMARY KEY (id)
, CONSTRAINT fk_facts_dim FOREIGN KEY (dim_id) REFERENCES dim (id)
);
CREATE INDEX idx_fact_dim ON fact(dim_id);
INSERT INTO dim(name)
SELECT SUBSTRING(md5(random()::TEXT) FOR 5)
FROM generate_series(1,50)
UNION
SELECT 'key';
INSERT INTO fact(dim_id, value)
SELECT (SELECT id FROM dim ORDER BY random() LIMIT 1)
, md5(random()::TEXT)
FROM generate_series(1,1000000);
ANALYZE dim;
ANALYZE fact;
EXPLAIN ANALYZE
SELECT f.*
FROM fact AS f
JOIN dim AS d
ON (d.id = f.dim_id)
WHERE d.name = 'key'; -- Note: UNIQUE
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1001.65..18493.29 rows=20588 width=41) (actual time=319.331..322.582 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=1.65..15434.49 rows=8578 width=41) (actual time=306.193..306.195 rows=0 loops=3)
Hash Cond: (f.dim_id = d.id)
-> Parallel Seq Scan on fact f (cost=0.00..14188.98 rows=437498 width=41) (actual time=0.144..131.050 rows=350000 loops=3)
-> Hash (cost=1.64..1.64 rows=1 width=4) (actual time=0.138..0.139 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on dim d (cost=0.00..1.64 rows=1 width=4) (actual time=0.099..0.109 rows=1 loops=3)
Filter: (name = 'key'::text)
Rows Removed by Filter: 50
Planning Time: 1.059 ms
Execution Time: 322.662 ms
Now, we execute the same question, but instead of filtering using an inner join, we filter using a scalar subquery;
EXPLAIN ANALYZE
SELECT *
FROM fact
WHERE dim_id = (SELECT id FROM dim WHERE name = 'key');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_fact_dim on fact (cost=2.07..15759.53 rows=524998 width=41) (actual time=0.096..0.097 rows=0 loops=1)
Index Cond: (dim_id = $0)
InitPlan 1 (returns $0)
-> Seq Scan on dim (cost=0.00..1.64 rows=1 width=4) (actual time=0.046..0.054 rows=1 loops=1)
Filter: (name = 'key'::text)
Rows Removed by Filter: 50
Planning Time: 0.313 ms
Execution Time: 0.156 ms
As shown, the performance difference is huge. Somehow, the query planner did not consider the predicate on the unique dim::name attribute to be equal to a predicate on fact::dim_id in the first query.
Michiel T
(161 rep)
Jan 15, 2021, 10:55 PM
• Last activity: Jul 25, 2025, 03:10 AM
0
votes
1
answers
145
views
Postgresql table function: get where condition
I have defined a function that returns a table. In this function, I use multiple temporary tables to pre-process data.
Here is a very simple example:
create or replace function public.fn_world_cities()
returns table (
city_id int,
city_name varchar,
country_id int
)
as $$
begin
---- Extract temp table with all cities
create temporary table tmp_all_cities on commit drop
as
select city_id, city_name, country_id
from public.table_world_cities
where row_status = 'A'; ---- filter active record
---- Return data
return query
select city_id, city_name, country_id
from tmp_all_cities;
end; $$ language plpgsql;
The temporary table creation is time-expensive.
When I use the function, I always use some WHERE condition, like:
select * from public.fn_world_cities()
where country_id = 10;
In my function there are many fields on which I can set a WHERE condition, so it's impossible to add function parameters for every one.
Is there a way, at run time, to know the WHERE condition values? That way I could pre-filter the data during temporary table creation.
I wish I could have a function like this:
create or replace function public.fn_world_cities()
returns table (
city_id int,
city_name varchar,
country_id int
)
as $$
begin
---- Extract temp table with all cities
create temporary table tmp_all_cities on commit drop
as
select city_id, city_name, country_id
from public.table_world_cities
where row_status = 'A' ---- filter active record
and ***WHERE CONDITION APPLIED TO FUNCTION CALL***
;
---- Return data
return query
select city_id, city_name, country_id
from tmp_all_cities;
end; $$ language plpgsql;
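For context, a caller's WHERE clause can only reach inside the function when the planner is able to inline it, which rules out plpgsql bodies and temporary tables; a hedged sketch of the inlinable variant (plain SQL, single SELECT, no temp table, names taken from the question):
create or replace function public.fn_world_cities()
returns table (
city_id int,
city_name varchar,
country_id int
)
as $$
select city_id, city_name, country_id
from public.table_world_cities
where row_status = 'A'   ---- filter active record
$$ language sql stable;

-- the caller's filter can then be pushed into the underlying scan:
select * from public.fn_world_cities() where country_id = 10;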
Radioleao
(153 rep)
Apr 9, 2019, 03:26 PM
• Last activity: Jul 24, 2025, 11:06 PM
0
votes
2
answers
614
views
Postgresql Aggregate Query With Sort Very Slow
I have a simple aggregate query that is running very slowly (198s) on a table with ~4.6mil rows, can anyone suggest how to optimise?
SELECT u.user_id,u.firstname,u.surname,u.image_url,p.pool_name, count(s.session_id) as totalSwims,sum(s.total_time_in_seconds) as totalTotalTime
from sessions as s
left join user_sessions as us on us.session_id = s.session_id
left join users as u on u.user_id = us.user_id
left join user_prefs as up on up.user_id = u.user_id
left join pools as p on p.pool_id = u.local_pool
where p.client_id = 1
and s.requires_checking is false
and s.swim_rating > 10
and us.user_id is not null
and s.date_swam > '2023-03-01 01:59:54.962+00'
and s.date_swam < [...]
[...]
-> Seq Scan on public.pools p (cost=0.00..235.00 rows=62 width=33) (actual time=0.018..0.794 rows=61 loops=3)
Output: p.pool_name, p.client_id, p.pool_id
Filter: (p.client_id = 1)
Rows Removed by Filter: 5674
Buffers: shared hit=486
Worker 0: actual time=0.018..0.651 rows=61 loops=1
Buffers: shared hit=162
Worker 1: actual time=0.019..0.790 rows=61 loops=1
Buffers: shared hit=162
-> Index Scan using user_prefs_pkey on public.user_prefs up (cost=0.42..3.77 rows=1 width=4) (actual time=0.079..0.079 rows=0 loops=61372)
Output: up.user_id
Index Cond: (up.user_id = u.user_id)
Filter: (up.enable_public_leaderboards IS TRUE)
Rows Removed by Filter: 1
Buffers: shared hit=239339 read=6309 dirtied=59
I/O Timings: read=4577.017
Worker 0: actual time=0.071..0.071 rows=0 loops=20288
Buffers: shared hit=79332 read=1879 dirtied=18
I/O Timings: read=1354.770
Worker 1: actual time=0.081..0.081 rows=0 loops=20803
Buffers: shared hit=81068 read=2195 dirtied=18
I/O Timings: read=1605.396
-> Index Only Scan using user_sessions_pkey on public.user_sessions us (cost=0.43..140.52 rows=154 width=8) (actual time=0.619..1.525 rows=73 loops=11592)
Output: us.user_id, us.session_id
Index Cond: ((us.user_id = u.user_id) AND (us.user_id IS NOT NULL))
Heap Fetches: 488381
Buffers: shared hit=498091 read=18665
I/O Timings: read=16850.427
Worker 0: actual time=0.634..1.555 rows=74 loops=3830
Buffers: shared hit=167709 read=6288
I/O Timings: read=5689.264
Worker 1: actual time=0.624..1.549 rows=73 loops=3790
Buffers: shared hit=164737 read=6208
I/O Timings: read=5598.264
-> Index Scan using sessions_pkey on public.sessions s (cost=0.43..4.00 rows=1 width=10) (actual time=0.676..0.676 rows=0 loops=845589)
Output: s.session_id, s.total_time_in_seconds
Index Cond: (s.session_id = us.session_id)
Filter: ((s.requires_checking IS FALSE) AND (s.swim_rating > 10) AND (s.date_swam > '2023-03-01 01:59:54.962+00'::timestamp with time zone) AND (s.date_swam < [...]))
[...] > 10. I then dropped the session_id primary key index to force the planner to use the new index; it gives the following plan (updated a second time!):
Limit (cost=107902.84..107902.89 rows=20 width=140) (actual
time=13760.459..13802.105 rows=20 loops=1)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, (count(s.session_id)), (sum(s.total_time_in_seconds)), p.client_id, p.pool_id
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Sort (cost=107902.84..107902.94 rows=43 width=140) (actual time=13760.458..13802.101 rows=20 loops=1)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, (count(s.session_id)), (sum(s.total_time_in_seconds)), p.client_id, p.pool_id
Sort Key: (sum(s.total_time_in_seconds)) DESC
Sort Method: top-N heapsort Memory: 33kB
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Finalize GroupAggregate (cost=107896.10..107901.69 rows=43 width=140) (actual time=13748.876..13801.273 rows=1476 loops=1)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, count(s.session_id), sum(s.total_time_in_seconds), p.client_id, p.pool_id
Group Key: u.user_id, p.pool_id
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Gather Merge (cost=107896.10..107900.71 rows=36 width=140) (actual time=13748.858..13797.814 rows=3340 loops=1)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, (PARTIAL count(s.session_id)), (PARTIAL sum(s.total_time_in_seconds)), p.client_id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Partial GroupAggregate (cost=106896.08..106896.53 rows=18 width=140) (actual time=13743.988..13746.974 rows=1113 loops=3)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, PARTIAL count(s.session_id), PARTIAL sum(s.total_time_in_seconds), p.client_id
Group Key: u.user_id, p.pool_id
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
Worker 0: actual time=13741.803..13744.770 rows=1123 loops=1
Buffers: shared hit=262454 read=9464 dirtied=14, temp read=7403 written=7316
I/O Timings: read=12590.170
Worker 1: actual time=13742.976..13745.801 rows=1035 loops=1
Buffers: shared hit=258266 read=9090 dirtied=26, temp read=7763 written=8300
I/O Timings: read=12656.606
-> Sort (cost=106896.08..106896.12 rows=18 width=110) (actual time=13743.971..13744.390 rows=3256 loops=3)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, s.session_id, s.total_time_in_seconds, p.client_id
Sort Key: u.user_id, p.pool_id
Sort Method: quicksort Memory: 811kB
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
Worker 0: actual time=13741.780..13742.171 rows=3221 loops=1
Sort Method: quicksort Memory: 701kB
Buffers: shared hit=262454 read=9464 dirtied=14, temp read=7403 written=7316
I/O Timings: read=12590.170
Worker 1: actual time=13742.958..13743.323 rows=2736 loops=1
Sort Method: quicksort Memory: 608kB
Buffers: shared hit=258266 read=9090 dirtied=26, temp read=7763 written=8300
I/O Timings: read=12656.606
-> Parallel Hash Join (cost=34085.80..106895.70 rows=18 width=110) (actual time=13647.907..13741.713 rows=3256 loops=3)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, s.session_id, s.total_time_in_seconds, p.client_id
Hash Cond: (s.session_id = us.session_id)
Buffers: shared hit=771793 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
Worker 0: actual time=13655.793..13739.514 rows=3221 loops=1
Buffers: shared hit=262447 read=9464 dirtied=14, temp read=7403 written=7316
I/O Timings: read=12590.170
Worker 1: actual time=13644.134..13741.201 rows=2736 loops=1
Buffers: shared hit=258259 read=9090 dirtied=26, temp read=7763 written=8300
I/O Timings: read=12656.606
-> Parallel Bitmap Heap Scan on public.sessions s (cost=447.58..73224.60 rows=8746 width=10) (actual time=7.704..24.683 rows=15126 loops=3)
Output: s.session_id, s.total_time_in_seconds
Recheck Cond: ((s.date_swam > '2023-03-01 01:59:54.962+00'::timestamp with time zone) AND (s.date_swam < [...]))
[...] > 10))
Heap Blocks: exact=7524
Buffers: shared hit=19743
Worker 0: actual time=6.706..17.200 rows=7148 loops=1
Buffers: shared hit=3877
Worker 1: actual time=8.199..27.980 rows=20303 loops=1
Buffers: shared hit=8214
-> Bitmap Index Scan on sessions_date_swam_idx (cost=0.00..442.33 rows=20990 width=0) (actual time=4.946..4.946 rows=47051 loops=1)
Index Cond: ((s.date_swam > '2023-03-01 01:59:54.962+00'::timestamp with time zone) AND (s.date_swam < [...]))
[...]
-> Parallel Hash (cost=33585.40..33585.40 rows=4226 width=104) (actual time=13599.941..13599.945 rows=283349 loops=3)
Output: us.session_id, u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, p.client_id, p.pool_id
Buckets: 32768 (originally 16384) Batches: 32 (originally 1) Memory Usage: 4032kB
Buffers: shared hit=752034 read=27894 dirtied=52, temp read=10015 written=23284
I/O Timings: read=38115.334
Worker 0: actual time=13599.432..13599.437 rows=288936 loops=1
Buffers: shared hit=258562 read=9464 dirtied=14, temp read=2720 written=7188
I/O Timings: read=12590.170
Worker 1: actual time=13597.604..13597.609 rows=285810 loops=1
Buffers: shared hit=250037 read=9090 dirtied=26, temp read=3648 written=8172
I/O Timings: read=12656.606
-> Nested Loop (cost=239.54..33585.40 rows=4226 width=104) (actual time=6.219..13166.722 rows=283349 loops=3)
Output: us.session_id, u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, p.client_id, p.pool_id
Buffers: shared hit=752034 read=27894 dirtied=52
I/O Timings: read=38115.334
Worker 0: actual time=3.284..13087.841 rows=288936 loops=1
Buffers: shared hit=258562 read=9464 dirtied=14
I/O Timings: read=12590.170
Worker 1: actual time=7.931..13106.351 rows=285810 loops=1
Buffers: shared hit=250037 read=9090 dirtied=26
I/O Timings: read=12656.606
-> Nested Loop (cost=239.11..15989.40 rows=124 width=104) (actual time=5.013..1891.004 rows=3880 loops=3)
Output: u.user_id, u.firstname, u.surname, u.image_url, up.user_id, p.pool_name, p.client_id, p.pool_id
Inner Unique: true
Buffers: shared hit=254626 read=4579 dirtied=52
I/O Timings: read=5183.492
Worker 0: actual time=1.601..1793.850 rows=4041 loops=1
Buffers: shared hit=90103 read=1608 dirtied=14
I/O Timings: read=1612.670
Worker 1: actual time=6.604..2114.870 rows=3960 loops=1
Buffers: shared hit=83066 read=1497 dirtied=26
I/O Timings: read=1968.382
-> Hash Join (cost=238.69..13655.78 rows=624 width=100) (actual time=0.954..42.871 rows=20490 loops=3)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, p.client_id, p.pool_id
Inner Unique: true
Hash Cond: (u.local_pool = p.pool_id)
Buffers: shared hit=13161
Worker 0: actual time=0.980..45.020 rows=21799 loops=1
Buffers: shared hit=4452
Worker 1: actual time=0.778..40.547 rows=19986 loops=1
Buffers: shared hit=4573
-> Parallel Seq Scan on public.users u (cost=0.00..13263.23 rows=58571 width=71) (actual time=0.007..27.779 rows=46880 loops=3)
Output: u.user_id, u.firstname, u.surname, u.image_url, u.local_pool
Filter: (u.disabled IS FALSE)
Rows Removed by Filter: 664
Buffers: shared hit=12669
Worker 0: actual time=0.006..29.047 rows=49923 loops=1
Buffers: shared hit=4288
Worker 1: actual time=0.005..25.210 rows=45846 loops=1
Buffers: shared hit=4409
-> Hash (cost=237.90..237.90 rows=63 width=33) (actual time=0.931..0.932 rows=61 loops=3)
Output: p.pool_name, p.client_id, p.pool_id
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=492
Worker 0: actual time=0.961..0.962 rows=61 loops=1
Buffers: shared hit=164
Worker 1: actual time=0.761..0.762 rows=61 loops=1
Buffers: shared hit=164
-> Seq Scan on public.pools p (cost=0.00..237.90 rows=63 width=33) (actual time=0.018..0.905 rows=61 loops=3)
Output: p.pool_name, p.client_id, p.pool_id
Filter: (p.client_id = 1)
Rows Removed by Filter: 5682
Buffers: shared hit=492
Worker 0: actual time=0.022..0.931 rows=61 loops=1
Buffers: shared hit=164
Worker 1: actual time=0.018..0.738 rows=61 loops=1
Buffers: shared hit=164
-> Index Scan using user_prefs_pkey on public.user_prefs up (cost=0.42..3.74 rows=1 width=4) (actual time=0.090..0.090 rows=0 loops=61471)
Output: up.user_id
Index Cond: (up.user_id = u.user_id)
Filter: (up.enable_public_leaderboards IS TRUE)
Rows Removed by Filter: 1
Buffers: shared hit=241465 read=4579 dirtied=52
I/O Timings: read=5183.492
Worker 0: actual time=0.080..0.080 rows=0 loops=21799
Buffers: shared hit=85651 read=1608 dirtied=14
I/O Timings: read=1612.670
Worker 1: actual time=0.103..0.103 rows=0 loops=19986
Buffers: shared hit=78493 read=1497 dirtied=26
I/O Timings: read=1968.382
-> Index Only Scan using user_sessions_pkey on public.user_sessions us (cost=0.43..140.35 rows=155 width=8) (actual time=0.788..2.892 rows=73 loops=11640)
Output: us.user_id, us.session_id
Index Cond: ((us.user_id = u.user_id) AND (us.user_id IS NOT NULL))
Heap Fetches: 492751
Buffers: shared hit=497408 read=23315
I/O Timings: read=32931.842
Worker 0: actual time=0.774..2.781 rows=72 loops=4041
Buffers: shared hit=168459 read=7856
I/O Timings: read=10977.501
Worker 1: actual time=0.727..2.761 rows=72 loops=3960
Buffers: shared hit=166971 read=7593
I/O Timings: read=10688.224
Planning:
Buffers: shared hit=101
Planning Time: 1.549 ms
Execution Time: 13802.195 ms
DaveB
(319 rep)
Mar 30, 2023, 03:29 PM
• Last activity: Jul 24, 2025, 09:03 AM
0
votes
1
answers
142
views
Postgres stuck with ALTER query on production
So I have this big database in production with partitions on a date field.
I ran this migration to add a new column, and now, an hour in, the DB is stuck, not responding, and still migrating.
ALTER TABLE public.ur_sms_traffic ADD COLUMN trace_id varchar(16);
The table has no relations; it's just a stand-alone table.
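For context, ADD COLUMN without a DEFAULT is normally a metadata-only change, but it still needs an ACCESS EXCLUSIVE lock on the table (and, for a partitioned table, on every partition), so it can hang behind any long-running query or idle-in-transaction session. A quick sketch for seeing what it is waiting on, using standard catalog views (the table name is from the question):
SELECT pid, state, wait_event_type, wait_event, query_start, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
   OR query ILIKE '%ur_sms_traffic%';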
Shaz Hemani
(101 rep)
Dec 15, 2021, 08:41 AM
• Last activity: Jul 19, 2025, 09:05 PM
1
votes
1
answers
48
views
Postgres query planner join selectivity greater than 1?
I am using PostgreSQL 14.17.
I am trying to debug a query planner failure in a bigger query, but I think I've narrowed down the
problem to a self-join on a join table:
SELECT t2.item_id
FROM item_sessions t1
JOIN item_sessions t2
ON t1.session_key = t2.session_key
WHERE t1.item_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
After running ANALYZE on the table, EXPLAIN gives this plan (which matches the subplan in the larger query):
Nested Loop (cost=1.12..119.60 rows=7398 width=16)
-> Index Only Scan using item_sessions_item_id_session_key_uniq on item_sessions t1 (cost=0.56..8.58 rows=1 width=33)
Index Cond: (item_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'::uuid)
-> Index Only Scan using item_sessions_session_idx on item_sessions t2 (cost=0.56..110.11 rows=91 width=49)
Index Cond: (session_key = (t1.session_key)::text)
**Why is the loop estimating 7398 rows when the two child nodes estimate 1 and 91 respectively?** I would have expected the loop total to be less than 1 * 91.
FWIW, the child estimates seem correct. item_id has n_distinct at -0.77649677, so the expected row count is 1.3, and session_key has n_distinct at 149555 out of an estimated 1.36e+07 tuples, which gives 90.9 expected tuples per session_key.
The indexes referenced in the plan are:
- item_sessions_session_idx btree (session_key, item_id)
- item_sessions_item_id_session_key_uniq UNIQUE CONSTRAINT, btree (item_id, session_key)
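For anyone wanting to reproduce the per-column numbers quoted above, they come straight from the column statistics; a minimal sketch of the lookup (table and column names from the question):
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'item_sessions'
  AND attname IN ('item_id', 'session_key');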
ETA: I created a minimal reproduction [here](https://github.com/felipeochoa/pg-plan-selectivity-gt1) . The failure is visible [in the job logs](https://github.com/felipeochoa/pg-plan-selectivity-gt1/actions/runs/16359411460/job/46224463766) on 17.5, 16.9, and 15.13
Felipe
(317 rep)
Jul 17, 2025, 04:41 AM
• Last activity: Jul 18, 2025, 09:14 AM
0
votes
1
answers
160
views
Postgresql same query is slower on new hardware
I migrated my data to new server hardware that, according to my benchmarks, looks faster in terms of CPU and disk I/O.
Unfortunately, my query seems slower on that new hardware with the same data and the same query.
Here is my query:
EXPLAIN ANALYZE SELECT AVG(CAST(completeness->>'fr' AS INTEGER)) as average FROM product INNER JOIN channel ON channel.id = product.channel_id WHERE channel_id = 'myUuid';
On the old server my times are:
Planning Time: 0.281 ms
Execution Time: 50.683 ms
On the new server my times are:
Planning Time: 0.162 ms
Execution Time: 115.268 ms
That is significantly slower. I have already tried a VACUUM FULL and REINDEX of mydatabase,
and also increasing default_statistics_target, but none of this changed anything.
Of course I'm using the same PostgreSQL version (12.14) and have tried everything I could find in similar questions. Do you have any other solutions?
EDIT:
For some more details:
Old and faster:
Aggregate (cost=58441.85..58441.86 rows=1 width=32) (actual time=66.284..66.285 rows=1 loops=1)
Buffers: shared hit=1132 read=18379 written=16
I/O Timings: read=36.167 write=0.101
-> Nested Loop (cost=594.96..58185.68 rows=25617 width=25) (actual time=5.003..61.783 rows=25985 loops=1)
Buffers: shared hit=1132 read=18379 written=16
I/O Timings: read=36.167 write=0.101
-> Seq Scan on channel (cost=0.00..1.30 rows=1 width=16) (actual time=0.028..0.031 rows=1 loops=1)
Filter: (id = 'myuuid'::uuid)
Rows Removed by Filter: 23
Buffers: shared hit=1
-> Bitmap Heap Scan on product (cost=594.96..57928.21 rows=25617 width=41) (actual time=4.969..59.276 rows=25985 loops=1)
Recheck Cond: (channel_id = 'myuuid'::uuid)
Heap Blocks: exact=19408
Buffers: shared hit=1131 read=18379 written=16
I/O Timings: read=36.167 write=0.101
-> Bitmap Index Scan on idx_8ac439d272f5a1aa (cost=0.00..588.55 rows=25617 width=0) (actual time=3.177..3.177 rows=25985 loops=1)
Index Cond: (channel_id = 'myuuid'::uuid)
Buffers: shared read=102
I/O Timings: read=0.389
Planning Time: 1.945 ms
Execution Time: 66.585 ms
(21 rows)
New and slower:
Aggregate (cost=59666.14..59666.15 rows=1 width=32) (actual time=146.506..146.508 rows=1 loops=1)
Buffers: shared hit=108 read=19371 written=172
I/O Timings: read=85.949 write=1.528
-> Nested Loop (cost=617.08..59402.07 rows=26407 width=25) (actual time=7.752..139.180 rows=25985 loops=1)
Buffers: shared hit=108 read=19371 written=172
I/O Timings: read=85.949 write=1.528
-> Seq Scan on channel (cost=0.00..1.30 rows=1 width=16) (actual time=0.012..0.016 rows=1 loops=1)
Filter: (id = 'myuuid'::uuid)
Rows Removed by Filter: 23
Buffers: shared hit=1
-> Bitmap Heap Scan on product (cost=617.08..59136.70 rows=26407 width=41) (actual time=7.737..134.826 rows=25985 loops=1)
Recheck Cond: (channel_id = 'myuuid'::uuid)
Heap Blocks: exact=19374
Buffers: shared hit=107 read=19371 written=172
I/O Timings: read=85.949 write=1.528
-> Bitmap Index Scan on idx_8ac439d272f5a1aa (cost=0.00..610.48 rows=26407 width=0) (actual time=5.073..5.073 rows=25987 loops=1)
Index Cond: (channel_id = 'myuuid'::uuid)
Buffers: shared read=102
I/O Timings: read=0.688
Planning Time: 0.773 ms
Execution Time: 146.750 ms
(21 rows)
TomLorenzi
(1 rep)
Jul 12, 2023, 01:24 PM
• Last activity: Jul 17, 2025, 10:08 PM
0
votes
0
answers
32
views
VACUUM: ERROR: invalid memory alloc request size postgresql
I am getting
>ERROR: invalid memory alloc request size 2727388320
after I tried vacuuming in EnterpriseDB. It is getting bigger and bigger. How can I solve this issue?
MD Nasirul Islam
(101 rep)
Jul 16, 2025, 06:49 AM
• Last activity: Jul 16, 2025, 01:20 PM
0
votes
1
answers
159
views
Selecting columns based on time ranges
I have a table which looks like this:
Table t1
timestamp (pkey) | A | B | C
----------------------------------
2024-01-01 12:00:00 | 1 | 2 | 2
2024-01-01 12:00:01 | 1 | 2 | 2
2024-01-01 12:00:03 | 15 | 4 | 2
Table creation SQL:
CREATE TABLE IF NOT EXISTS t1 ("id" serial NOT NULL, "timestamp" timestamp PRIMARY KEY, "A" real, "B" real, "C" real)
Some tables might have 100+ columns.
I'd like to select several columns while omitting time ranges on a per-column basis. The way I do it today is the following:
SELECT
(CASE WHEN timestamp NOT BETWEEN '2017-06-12T17:01' AND '2018-06-12T21:57'
AND timestamp NOT BETWEEN '2020-06-12T17:01' AND '2021-02-15T21:57' THEN A ELSE NULL END) "A",
(CASE WHEN timestamp NOT BETWEEN '2018-07-14' AND '2019-01-01'
AND timestamp NOT BETWEEN '2020-06-12T17:01' AND '2021-02-15T21:57' THEN B ELSE NULL END) "B",
C
FROM t1
WHERE timestamp > '2014-01-01'
timestamp is indexed, obviously (it is the primary key).
Is there a better / more effective way to do it?
Miro
(113 rep)
Mar 25, 2024, 07:22 PM
• Last activity: Jul 14, 2025, 10:04 AM
0
votes
1
answers
157
views
postgres_fdw: where are the resources utilized, remote or local?
I have an OLTP database and ETL jobs are running in the same database in the background.
I was thinking of separating the OLTP and ETL instances so that resource utilization would be distributed. Basically, the OLTP instance should have fewer or no ETL overhead.
The idea is to create foreign tables on the ETL instance connecting to OLTP remote server using *postgres_fdw*.
I understand that Postgres will fetch chunks of data from the remote server using the cursor.
Can someone please confirm whether my understanding is right that running a complex query involving foreign tables uses resources (RAM, CPU) on the local server, and whether the remote server is safe from this execution overhead?
And if I am wrong, which instance's resources would Postgres use to run complex SQL with joins on foreign tables?
Thanks in advance!
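One concrete way to see which side does the work for a given statement is the Remote SQL line that postgres_fdw adds under EXPLAIN VERBOSE: whatever appears there is executed on the remote (OLTP) server, and every plan node above it runs with the local (ETL) instance's CPU and RAM. A minimal sketch, where remote_orders is a hypothetical foreign table:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT customer_id, count(*)
FROM remote_orders
GROUP BY customer_id;
-- look for the "Remote SQL:" line in the output; joins and aggregates that cannot be
-- pushed down are performed locally on the fetched rows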
Sajith P Shetty
(312 rep)
Dec 13, 2022, 02:00 PM
• Last activity: Jul 14, 2025, 06:02 AM
1
votes
1
answers
158
views
Postgres 13 Sort Performance
Sort performance on one of our queries is so bad that it takes up to 14 seconds to run the query.
Here is the Query:
SELECT "stock_quant".id
FROM "stock_quant"
WHERE ((((("stock_quant"."qty" > 0.0)
AND "stock_quant"."reservation_id" IS NULL )
AND ("stock_quant"."location_id" in (34)))
AND ("stock_quant"."product_id" = 203330))
AND ("stock_quant"."company_id" = 5))
ORDER BY "stock_quant"."in_date" ,"stock_quant"."id"
limit 10;
When I used EXPLAIN, this is what Postgres says:
explain (analyze,buffers) SELECT "stock_quant".id FROM "stock_quant" WHERE ((((("stock_quant"."qty" > 0.0) AND "stock_quant"."reservation_id" IS NULL ) AND ("stock_quant"."location_id" in (34))) AND ("stock_quant"."product_id" = 203330)) AND ("stock_quant"."company_id" = 5)) ORDER BY "stock_quant"."in_date" ,"stock_quant"."id" limit 10;
Limit (cost=0.56..4723.78 rows=10 width=12) (actual time=15754.259..15754.260 rows=0 loops=1)
Buffers: shared hit=9988201 read=94226
-> Index Scan using stock_quant_multisort_idx on stock_quant (cost=0.56..1923768.25 rows=4073 width=12) (actual time=15754.257..15754.257 rows=0 loops=1)
Filter: ((reservation_id IS NULL) AND (qty > '0'::double precision) AND (location_id = 34) AND (product_id = 203330) AND (company_id = 5))
Rows Removed by Filter: 24052667
Buffers: shared hit=9988201 read=94226
Planning Time: 0.291 ms
Execution Time: 15754.288 ms
(8 rows)
explain SELECT "stock_quant".id FROM "stock_quant" WHERE ((((("stock_quant"."qty" > 0.0) AND "stock_quant"."reservation_id" IS NULL ) AND ("stock_quant"."location_id" in (34))) AND ("stock_quant"."product_id" = 203330)) AND ("stock_quant"."company_id" = 5)) ORDER BY "stock_quant"."in_date" ,"stock_quant"."id" limit 10;
Limit (cost=0.56..4723.82 rows=10 width=12)
-> Index Scan using stock_quant_multisort_idx on stock_quant (cost=0.56..1923781.40 rows=4073 width=12)
Filter: ((reservation_id IS NULL) AND (qty > '0'::double precision) AND (location_id = 34) AND (product_id = 203330) AND (company_id = 5))
(3 rows)
And here are the indexes in the table:
"stock_quant_pkey" PRIMARY KEY, btree (id)
"stock_quant_company_id_index" btree (company_id)
"stock_quant_location_id_index" btree (location_id)
"stock_quant_lot_id_index" btree (lot_id)
"stock_quant_multisort_idx" btree (in_date, id)
"stock_quant_owner_id_index" btree (owner_id)
"stock_quant_package_id_index" btree (package_id)
"stock_quant_product_id_index" btree (product_id)
"stock_quant_product_location_index" btree (product_id, location_id, company_id, qty, in_date, reservation_id)
"stock_quant_propagated_from_id_index" btree (propagated_from_id)
"stock_quant_qty_index" btree (qty)
"stock_quant_reservation_id_index" btree (reservation_id)
Work Mem is set at **512MB**
Any idea what needs to be changed? Without sort, the same query executes in less than 200ms.
Update:
Explain Analyze without Order by
explain (analyze,buffers) SELECT "stock_quant".id FROM "stock_quant" WHERE ((((("stock_quant"."qty" > 0.0) AND "stock_quant"."reservation_id" IS NULL ) AND ("stock_quant"."location_id" in (34))) AND ("stock_quant"."product_id" = 203330)) AND ("stock_quant"."company_id" = 5)) limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..33.76 rows=10 width=4) (actual time=0.661..0.662 rows=0 loops=1)
Buffers: shared hit=2 read=2
-> Index Scan using stock_quant_product_location_index on stock_quant (cost=0.56..13524.04 rows=4074 width=4) (actual time=0.660..0.660 rows=0 loops=1)
Index Cond: ((product_id = 203330) AND (location_id = 34) AND (company_id = 5) AND (qty > '0'::double precision) AND (reservation_id IS NULL))
Buffers: shared hit=2 read=2
Planning:
Buffers: shared hit=248 read=16
Planning Time: 7.005 ms
Execution Time: 0.691 ms
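For context, the query without the ORDER BY can be satisfied entirely from stock_quant_product_location_index, while the ordered query walks stock_quant_multisort_idx hoping to find 10 qualifying rows early and never does (24 million rows removed by the filter). A hedged sketch of the kind of index that could serve the filter and the ORDER BY ... LIMIT together (the column order and the partial predicate are assumptions, not a tested recommendation):
CREATE INDEX stock_quant_pick_idx
    ON stock_quant (product_id, location_id, company_id, in_date, id)
    WHERE reservation_id IS NULL;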
Abul Hassan
(11 rep)
Oct 27, 2021, 03:56 AM
• Last activity: Jul 12, 2025, 06:04 AM
1
votes
1
answers
168
views
Postgresql - optimizing multicolumn search with "prefix" search pattern as one of the filters
I'm now working on the project that automates "contracts" creation and signing. Recently I received a requirements to implement search functionality for "contracts", so I struggle to find performant way for doing it (I'm kinda new to postgres so far).
Contract table looks like
CREATE TABLE contracts(
id UUID,
status TEXT,
name TEXT,
issuers JSONB,
approver JSONB,
period JSONB,
created TIMESTAMP WITHOUT TIME ZONE
)
*status* - contract status, one of *draft, signed, active, expired*
*issuers* - json array of objects like {"id": , email: , name: , ...}
*approver* - json obj like {"id": , email: , name: , ...}
*period* - json obj like {"start": , "end": ...}
Search requirements:
1. "prefix" search with user-specified value where any of contract name, contract issuer (any) email, contract issuer (any) name, approver name, approver email matches condition.
2. Filtering by contract status using IN operator.
3. Ordering by created timestamp, period start date, period end date
4. Paging
As the application is user-centric, all queries will contain either issuer id, approver id, or both as additional filter criteria.
Search examples:
1. For issuer with id "1" display contracts matching predicate "test@gmail" order by created timestamp. Such search is supposed to return following contracts
{"name": "test@gmail contract name e.g.", issuers:[{id: "1"...}...], ...}
{"name": "Name", issuers: [{id: "1"...}, {... email: "test@gmail.com"...}...]}
{"name": "Name1",issuers:[{id: "1"...}...], approver: {.."email": "test@gmail.com"..}}
2. For approver with id "11" display contracts matching predicate "aggre" order by start date. Such search is supposed to return following contracts
{"name": "aggreement 11232", approver:{"id": "11"...}, ...}
{"name": "Name", issuers: [{"name: "Aggre Xyz"...},...], approver:["id": "11"...}}
3. For issuer with id "1" and approver with id "11" display contracts matching predicate "aggre" order by start date. Such search is supposed to return following contracts
{"name": "aggreement 11232", approver:{"id": "11"...}, issuers: [{id: "1"...} ...}
Considerations:
1. With the current DB layout I only have a GIN index to the rescue, which will not help with the sorting as far as I know, apart from the other complications it brings.
2. Rebuilding the schema into a more classical one-to-many contract-issuer relation, plus saving the approver and period object fields as columns. On the one hand it would let me use btree indexes for at least issuer id / approver id filtering and perform efficient sorting. On the other hand, finding matches by issuer name/email doesn't seem easy and would involve many joins, plus it's a huge change to the schema.
3. Adding a separate "search-optimized" table.
Two options
3.1
CREATE TABLE search(
issuer_id UUID,
approver_id UUID,
attribute_name TEXT,
attribute_value TEXT,
contract_id
)
*attribute_name* - contract-name, issuer_name...
With that approach search seems to be pretty easy, but the number of additional rows is a concern.
3.2
Something like this, with the attributes merged into an array:
CREATE TABLE search(
issuer_id UUID,
approver_id UUID,
search_attributes TEXT[],
contract_id
)
search_attributes will contain all the strings that might match the "prefix" search. While it complicates writes (one more DB object to be updated...), it seems beneficial for reads.
SELECT c.*
FROM contracts c
INNER JOIN (
SELECT contract_id
FROM search
WHERE issuer_id = 'id'
AND EXISTS(
SELECT 1
FROM unnest(search_attributes) attr
WHERE attr LIKE 'input%'
)
) s ON s.contract_id = c.id
I would appreciate any suggestion.
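For what it's worth, layout 3.1 is the one that maps most directly onto a plain btree: equality on the owner column plus a prefix pattern on the attribute value. A hedged sketch of the supporting index and lookup (text_pattern_ops keeps LIKE 'input%' indexable regardless of the database collation; the literal values are placeholders):
CREATE INDEX search_issuer_prefix_idx
    ON search (issuer_id, attribute_value text_pattern_ops);

SELECT DISTINCT contract_id
FROM search
WHERE issuer_id = '00000000-0000-0000-0000-000000000001'   -- placeholder issuer id
  AND attribute_value LIKE 'test@gmail%';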
EPe
(11 rep)
Nov 20, 2023, 01:13 PM
• Last activity: Jul 11, 2025, 09:04 AM
-3
votes
1
answers
371
views
slow postgres performance with simple queries
After a lot of reading I found multiple sources saying Postgres should be able to handle ~100 million rows.
I set up a simple table with:
CREATE TABLE trade
(
id INT PRIMARY KEY NOT NULL,
symbol VARCHAR,
open_time TIMESTAMP,
end_price INT
)
I have 12 million of these records.
My simple query of
SELECT * FROM table WHERE symbol=x and open_time>start_time and open_time<end_time
This query always returns fewer than 1000 rows, yet it takes 1100 milliseconds. This seems like a lot for a simple table with 10x fewer rows than it should be able to handle.
I don't have any indexes yet, though, because I don't know what the best indexes for this table would be.
Is it possible to get this down to 100ms?
Any help writing a more performant query or db would be appreciated
EDIT 1:
After reading the comments I put the following indexes on the table
CREATE INDEX open_time ON trade(open_time);
CREATE INDEX symbol ON trade(symbol);
CREATE INDEX end_price ON trade(open_price);
After adding these, the query time is 240 ms. Is this the maximum achievable?
I have also noticed that when querying the first 40k rows the query time drops to 60 ms,
but after that it rises to 240 ms. What is causing this?
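For reference, three separate single-column indexes still force the planner to pick one of them (or bitmap-AND them); a composite index matching the query's shape, with the equality column first and the range column second, is the usual next step. A sketch under that assumption (not a measured result):
CREATE INDEX trade_symbol_open_time_idx ON trade (symbol, open_time);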
sir snibbles
(11 rep)
May 19, 2022, 07:53 PM
• Last activity: Jul 9, 2025, 02:43 PM
Showing page 1 of 20 total questions