Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
4
votes
1
answers
154
views
Postgresql: Why is a join or subquery so much slower than a literal in a where clause?
I'm using PostgreSQL 14.17.
My database schema has two tables:
Table "public.log_records"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------+--------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
id | bigint | | not null | generated always as identity | plain | | |
log_id | bigint | | | | plain | | |
payload | jsonb | | not null | | extended | | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP | plain | | |
Indexes:
"log_records_pkey" PRIMARY KEY, btree (id)
"idx_log_records_log_created" btree (log_id, created_at)
"idx_log_records_log_id" btree (log_id)
Foreign-key constraints:
"log_records_log_id_fkey" FOREIGN KEY (log_id) REFERENCES logs(id)
Access method: heap
Table "public.logs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
------------------+--------------------------+-----------+----------+------------------------------+----------+-------------+--------------+-------------
id | bigint | | not null | generated always as identity | plain | | |
name | character varying(255) | | not null | | extended | | |
retention_period | integer | | not null | | plain | | |
created_at | timestamp with time zone | | | CURRENT_TIMESTAMP | plain | | |
Indexes:
"logs_pkey" PRIMARY KEY, btree (id)
"idx_logs_id_name" btree (id, name)
"logs_name_key" UNIQUE CONSTRAINT, btree (name)
Referenced by:
TABLE "log_records" CONSTRAINT "log_records_log_id_fkey" FOREIGN KEY (log_id) REFERENCES logs(id)
Access method: heap
There are a small number of records in logs and not all of the log_ids are used in the log_records table:
=# select count(*) from logs;
count
-------
13
(1 row)
=# select count(*) from log_records;
count
----------
14552534
(1 row)
=# select count(distinct log_id) from log_records;
count
-------
8
(1 row)
I want to query the log_records table for log records belonging to a named log. However, if I pass the name of a log that has no records in the log_records table, the query is very slow:
# explain analyze
SELECT e.id, e.payload, e.created_at, e.headers
FROM log_records e JOIN logs l ON l.id = e.log_id
WHERE l.name = 'Log1'
ORDER BY e.id
LIMIT 100;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..244.61 rows=100 width=395) (actual time=7371.696..7371.698 rows=0 loops=1)
-> Nested Loop (cost=0.43..2729140.04 rows=1117680 width=395) (actual time=7371.696..7371.697 rows=0 loops=1)
Join Filter: (e.log_id = l.id)
Rows Removed by Join Filter: 14552570
-> Index Scan using log_records_pkey on log_records e (cost=0.43..2511191.29 rows=14529839 width=403) (actual time=0.012..4955.006 rows=14552570 loops=1)
-> Materialize (cost=0.00..1.17 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=14552570)
-> Seq Scan on logs l (cost=0.00..1.16 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=1)
Filter: ((name)::text = 'Log1'::text)
Rows Removed by Filter: 12
Planning Time: 0.233 ms
Execution Time: 7371.717 ms
(11 rows)
I get a different plan and better but still poor performance for a scalar subquery:
# explain analyze
SELECT e.id, e.payload, e.created_at, e.headers
FROM log_records e
WHERE e.log_id =
(SELECT id FROM logs WHERE name = 'Log1')
ORDER BY e.id
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.60..141.86 rows=100 width=395) (actual time=3686.420..3686.421 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on logs (cost=0.00..1.16 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
Filter: ((name)::text = 'Log1'::text)
Rows Removed by Filter: 12
-> Index Scan using log_records_pkey on log_records e (cost=0.43..2547511.66 rows=1816230 width=395) (actual time=3686.419..3686.419 rows=0 loops=1)
Filter: (log_id = $0)
Rows Removed by Filter: 14552594
Planning Time: 0.318 ms
Execution Time: 3686.487 ms
(10 rows)
However if I hardcode the log_id as a literal I get good performance:
# explain analyze
SELECT e.id, e.payload, e.created_at, e.headers
FROM log_records e
WHERE e.log_id = 13
ORDER BY e.id
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.08..4.09 rows=1 width=395) (actual time=0.015..0.015 rows=0 loops=1)
-> Sort (cost=4.08..4.09 rows=1 width=395) (actual time=0.014..0.015 rows=0 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Index Scan using idx_log_records_log_id on log_records e (cost=0.43..4.07 rows=1 width=395) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: (log_id = 13)
Planning Time: 0.098 ms
Execution Time: 0.028 ms
(8 rows)
Why can't I get this type of performance from the joins or subquery? Why is the subquery faster than the join when most advice would say don't use subqueries?
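For context on the estimates behind that first plan: the planner expects roughly 1.1 million matching rows (the rows=1117680 on the Nested Loop), so it assumes that walking log_records_pkey in id order will hit 100 matches almost immediately. A quick way to inspect the column statistics feeding that estimate is pg_stats; a minimal sketch, using the table and column names from the question:
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'log_records'
  AND attname = 'log_id';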
willn-cpx
(43 rep)
Aug 1, 2025, 10:24 AM
• Last activity: Aug 6, 2025, 12:50 AM
1
votes
0
answers
31
views
Necessary to include filtered index filter column if value always NULL in PostgreSQL?
My question is about PostgreSQL. I found similar questions for MS SQL server but I don't know if the answers apply here.
My table looks like this:
scores
======
| ID | UserID | ValidFrom | ValidUntil | ScorePoints |
+----+--------+------------+------------+-------------|
| 1 | 1 | 2025-08-01 | 2025-08-02 | 80 |
| 2 | 1 | 2025-08-02 | NULL | 85 |
There will be a lot of queries to find the currently valid score for a given UserID (= the row for UserID = ? AND ValidUntil IS NULL).
I have a unique index like this:
CREATE UNIQUE INDEX ix_uq_scores ON scores ( userid ) WHERE validuntil IS NULL;
Or should it be:
CREATE UNIQUE INDEX ix_uq_scores ON scores ( userid, validuntil ) WHERE validuntil IS NULL;
A query might look like
SELECT u.id, u.username, s.scorepoints
FROM users u
INNER JOIN scores s ON s.userid = u.id AND s.validuntil IS NULL
WHERE u.id = 123;
My filtered index will only include rows where validuntil **IS** NULL. So do I have to add this constant NULL value to the index tuple?
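For reference, a quick way to compare the two definitions empirically is to create one and look at the plan for the typical lookup; a minimal sketch using the names from the question (the single-column variant shown, since every indexed row already satisfies validuntil IS NULL):
CREATE UNIQUE INDEX ix_uq_scores ON scores (userid) WHERE validuntil IS NULL;

EXPLAIN
SELECT s.scorepoints
FROM scores s
WHERE s.userid = 123
  AND s.validuntil IS NULL;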
MrSnrub
(181 rep)
Aug 4, 2025, 02:32 AM
3
votes
1
answers
461
views
How to deal with a seldom-used big database in PostgreSQL?
I loaded OpenStreetMap data for the whole of Europe (400 GB) into PostgreSQL (9.3). Then I installed a geocoding API (Nominatim) that queries the database with GIS queries.
My problem is:
> This database is queried a few times a day, but because postgres loads
> the data in its buffer on demand, my first query on a particular gps
> point is always slow, and I do only one query per GPS point, so it's
> always slow (like 60 sec against 100 ms when the data is buffered).
What could I do here?
- I'm using a A6 Azure instance (4 Cores, 28 GB memory, HDD).
- Server is Ubuntu 14.04.4 LTS
- Swapping is enabled
- There is index on the geometry columns queried.
- The shared buffer is 10GB
- work mem is 256MB
- maintenance work mem is 4GB
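One commonly suggested option for this "cold cache after idle" pattern is to warm the relevant relations explicitly after a restart or idle period. A hedged sketch: pg_prewarm ships as a contrib module from PostgreSQL 9.4 onward (the 9.3 instance here would need an upgrade or an OS-level equivalent such as reading the files with cat/dd), and the relation names below are assumptions, not taken from the question:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- warm the heap and the geometry index of the main Nominatim search table (names assumed)
SELECT pg_prewarm('placex');
SELECT pg_prewarm('idx_placex_geometry');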
remi bourgarel
(259 rep)
Oct 4, 2016, 12:31 PM
• Last activity: Aug 2, 2025, 12:08 PM
1
votes
2
answers
3309
views
REINDEX takes significantly more time on production system than on local snapshot
We have a table with time-series-like data (~200million rows, 30GB data size + index) in a PG 11 database, and for the first time since we started writing into it, we have to delete from it (~30%).
To keep things performant after the delete we want to
- rebuild the indexes
- run a full vacuum on it
To get a better understanding of how much time this will take I have created a dump of the given table, loaded it into a local database, deleted the data, rebuilt the index, and ran a full vacuum:
REINDEX TABLE my_schema.my_table;
VACUUM (FULL, VERBOSE, ANALYZE) my_schema.my_table;
To my surprise, both finished quite fast (2 x 30min).
I know that to get the exact time required to run these commands on the server, I would need to back up the filesystem to get the exact same local snapshot as the server and run it on the same hardware. However, from my limited understanding, because we have never deleted or updated rows in the table, the restored local snapshot should be physically very similar to the table on the server, so the operations on the server should take a similar amount of time to finish.
This is not what I observe. I deleted the data, and re-indexing took about 6x longer than on the local system. Similarly, the vacuum took around 6x longer as well. This is a significant difference that I would never have expected.
My local hardware is similar to the server (16 GB vs 20 GB), but with more CPU cores (2 vCPU on the server vs an 8-vCPU i7 locally). I have an SSD on my local machine; I don't know what storage the server uses, but I would be surprised if it were non-SSD (we are using Azure Single Server for PostgreSQL).
The CPU utilization was below 10% on the server all the time so I think the operation was not CPU bound. (I am just guessing here.)
The table has 4 indexes: 3 of them are single-column B-tree indexes with mostly even distribution (date-time, synced source file ID, consumer ID), plus a compound index on these 3 columns together.
Why is the difference so big? Was my local testing flawed?
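One sanity check worth adding to this comparison: a dump-and-restore produces a freshly packed table and freshly built indexes, while the production table and its indexes may already carry some bloat even without prior deletes. A minimal sketch for comparing physical sizes on both systems (schema and table names taken from the question):
SELECT pg_size_pretty(pg_table_size('my_schema.my_table'))   AS heap_size,
       pg_size_pretty(pg_indexes_size('my_schema.my_table')) AS index_size;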
NoNameProvided
(111 rep)
Jan 30, 2022, 04:44 PM
• Last activity: Jul 30, 2025, 03:02 PM
1
votes
1
answers
850
views
Remap array columns to single column with value
Here is the sample database schema.
| id | Critical | High | Low |
--------------------------------------------
| 1 | {apple, ball} | {cat} | {dog, egg} |
Now I want to remap these to another table. Critical set to 1, High to 2 and Low to 3 like:
| element | priority |
----------------------
| apple | 1 |
| ball | 1 |
| cat | 2 |
| dog | 3 |
| egg | 3 |
1, 2, 3 could be changed to the column names too.
I've written a query that does it for one column at a time:
insert into new_table (element, priority)
select unnest(critical), 1 from old_table where id=1;
But I want to do all the 3 columns in this single query.
I want it to perform well. Is UNION the most efficient way?
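For reference, a minimal single-statement sketch of the three-column version using UNION ALL (names taken from the question; whether this is the most efficient form is exactly what is being asked here):
INSERT INTO new_table (element, priority)
SELECT unnest(critical), 1 FROM old_table WHERE id = 1
UNION ALL
SELECT unnest(high), 2 FROM old_table WHERE id = 1
UNION ALL
SELECT unnest(low), 3 FROM old_table WHERE id = 1;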
PaxPrz
(219 rep)
Mar 15, 2021, 04:57 AM
• Last activity: Jul 30, 2025, 12:01 PM
0
votes
1
answers
3405
views
PostgreSQL update column in one table with value from another, based on key from first table
Sorry about the title, I wasn't sure how best to word this. My situation is this:
I have two tables:
paper_author [paper_id, author_id, author_name]
authors [author_id, author_name]
Right now, the author_name column in paper_author is NULL. I would like to select the author name from the authors table and insert it into the author_name column in the paper_author table. I've written the following query:
UPDATE paper_author pa
SET author_name = (SELECT author_name FROM authors a WHERE a.author_id = pa.author_id);
Which I think will do what I want, but this is taking a very long time to run (days). For reference, the table paper_author has ~900M rows and the table authors has ~200M rows. author_id is the primary key in the authors table, and author_id has an index in the paper_author table. The rows in paper_author are not unique on author_id (i.e. each author_id may appear multiple times, associated with different papers).
Is there a more efficient way to write this? Have I missed something?
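For comparison, the join-style form of the same update that PostgreSQL supports; it is semantically equivalent here because author_id is the primary key of authors, so each paper_author row matches at most one author (a sketch, not a measured result):
UPDATE paper_author pa
SET    author_name = a.author_name
FROM   authors a
WHERE  a.author_id = pa.author_id;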
Gabriel Getzie
(1 rep)
Oct 6, 2020, 02:07 AM
• Last activity: Jul 30, 2025, 12:04 AM
0
votes
1
answers
65
views
RDS postgres slow read IO
We are running Postgres 14.12 in RDS and are experiencing very slow I/O reads, around 30 MB/s on index scans. We can't figure out what might be the cause. Any ideas on what we should or could check?
**configuration**
instance class: db.m6idn.8xlarge (which should support 3125 MB/s throughput)
RAM: 128GB
vCPU: 32
storage type: gp3, with 25000 IOPS (we only reach 18K) and 4000 MiB/s throughput.
Most of our slow queries are due to slow I/O reads.
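One way to confirm that the time really goes to storage reads rather than planning or CPU is to capture per-query I/O timings; a minimal sketch, assuming track_io_timing can be enabled (on RDS it is typically set via the parameter group) and substituting one of the slow queries for the placeholder:
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM some_large_table;  -- placeholder; use one of the slow index-scan queries
-- the "I/O Timings: read=..." lines show milliseconds spent waiting on reads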
edena
(1 rep)
Mar 4, 2025, 04:36 PM
• Last activity: Jul 28, 2025, 04:09 PM
0
votes
1
answers
33
views
Is there a way to get transaction time statistics on specific tables
I saw some log entries that indicated transaction time outliers of up to 10s at times, where transaction times are typically below 1s. To get a view of how often this happens, is there a way to get transaction time statistics in Postgres for transactions involving particular tables?
Even just a count of the number of transactions over some threshold would help, which I could then compare to the total number.
Thank you.
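For what it's worth, the closest built-in approximation is usually per-statement rather than per-transaction timing; a hedged sketch using the pg_stat_statements extension, assuming it is installed and the table name appears in the statement text (my_table is a placeholder; the column names are for PostgreSQL 13+, older releases use total_time/mean_time/max_time):
SELECT calls, mean_exec_time, max_exec_time, query
FROM pg_stat_statements
WHERE query ILIKE '%my_table%'
ORDER BY max_exec_time DESC
LIMIT 20;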
nsandersen
(241 rep)
Jul 25, 2025, 12:44 PM
• Last activity: Jul 26, 2025, 03:43 PM
6
votes
1
answers
1035
views
PostgreSQL predicate not pushed down (through join conditions)
Consider the following data model in a PostgreSQL v13 system;
[parent-child data model diagram]
Here, parent table dim contains a small set of reference data, and child table fact contains a much higher volume of records. A typical use case for these data sets would be to query all fact::value data belonging to a dim::name. Note that dim::name holds a UNIQUE constraint.
While I think this is a very common scenario, I was somewhat taken aback that the style of queries I've been using for years on other RDBMSs (Oracle, MSSQL) didn't perform _at all_ on PostgreSQL the way I imagined they would. That is, when querying a dataset (fact) using a highly selective, but implicit, predicate (fact::dim_id eq X) through a join condition, I expect the index on fact::dim_id to be used (in a nested loop). Instead, a hash join is used, requiring a full table scan of fact.
**Question:** is there some way I can nudge the query planner into considering any predicate I issue on a joined relation to not need a full table scan? (without impacting other DB loads)
To illustrate the problem with an example, these tables are populated with some random data;
CREATE TABLE dim(
id SERIAL NOT NULL
, name TEXT NOT NULL
, CONSTRAINT pk_dim PRIMARY KEY (id)
, CONSTRAINT uq_dim UNIQUE (name)
);
CREATE TABLE fact(
id SERIAL NOT NULL
, dim_id INTEGER NOT NULL
, value TEXT
, CONSTRAINT pk_fact PRIMARY KEY (id)
, CONSTRAINT fk_facts_dim FOREIGN KEY (dim_id) REFERENCES dim (id)
);
CREATE INDEX idx_fact_dim ON fact(dim_id);
INSERT INTO dim(name)
SELECT SUBSTRING(md5(random()::TEXT) FOR 5)
FROM generate_series(1,50)
UNION
SELECT 'key';
INSERT INTO fact(dim_id, value)
SELECT (SELECT id FROM dim ORDER BY random() LIMIT 1)
, md5(random()::TEXT)
FROM generate_series(1,1000000);
ANALYZE dim;
ANALYZE fact;
EXPLAIN ANALYZE
SELECT f.*
FROM fact AS f
JOIN dim AS d
ON (d.id = f.dim_id)
WHERE d.name = 'key'; -- Note: UNIQUE
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1001.65..18493.29 rows=20588 width=41) (actual time=319.331..322.582 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Join (cost=1.65..15434.49 rows=8578 width=41) (actual time=306.193..306.195 rows=0 loops=3)
Hash Cond: (f.dim_id = d.id)
-> Parallel Seq Scan on fact f (cost=0.00..14188.98 rows=437498 width=41) (actual time=0.144..131.050 rows=350000 loops=3)
-> Hash (cost=1.64..1.64 rows=1 width=4) (actual time=0.138..0.139 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on dim d (cost=0.00..1.64 rows=1 width=4) (actual time=0.099..0.109 rows=1 loops=3)
Filter: (name = 'key'::text)
Rows Removed by Filter: 50
Planning Time: 1.059 ms
Execution Time: 322.662 ms
Now, we execute the same question, but instead of filtering using an inner join, we filter using a scalar subquery;
EXPLAIN ANALYZE
SELECT *
FROM fact
WHERE dim_id = (SELECT id FROM dim WHERE name = 'key');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_fact_dim on fact (cost=2.07..15759.53 rows=524998 width=41) (actual time=0.096..0.097 rows=0 loops=1)
Index Cond: (dim_id = $0)
InitPlan 1 (returns $0)
-> Seq Scan on dim (cost=0.00..1.64 rows=1 width=4) (actual time=0.046..0.054 rows=1 loops=1)
Filter: (name = 'key'::text)
Rows Removed by Filter: 50
Planning Time: 0.313 ms
Execution Time: 0.156 ms
As shown, the performance difference is huge. Somehow, the query planner did not consider the predicate on the unique dim::name attribute to be equal to a predicate on fact::dim_id in the first query.
Michiel T
(161 rep)
Jan 15, 2021, 10:55 PM
• Last activity: Jul 25, 2025, 03:10 AM
0
votes
1
answers
145
views
Postgresql table function: get where condition
I have defined a function that returns a table. In this function, I use multiple temporary tables to pre-process data.
Here is a very simple example:
create or replace function public.fn_world_cities()
returns table (
city_id int,
city_name varchar,
country_id int
)
as $$
begin
---- Extract temp table with all cities
create temporary table tmp_all_cities on commit drop
as
select city_id, city_name, country_id
from public.table_world_cities
where row_status = 'A'; ---- filter active record
---- Return data
return query
select city_id, city_name, country_id
from tmp_all_cities;
end; $$ language plpgsql;
The temporary table creation is time-expensive.
When I use the function, I always use some WHERE condition, like:
select * from public.fn_world_cities()
where country_id = 10;
In my function there are many fields on which I can set a WHERE condition, so it's impossible to add function parameters for every one.
Is there a way, at run time, to know the WHERE condition values? That way I could pre-filter the data during temporary table creation.
I wish I could have a function like this:
create or replace function public.fn_world_cities()
returns table (
city_id int,
city_name varchar,
country_id int
)
as $$
begin
---- Extract temp table with all cities
create temporary table tmp_all_cities on commit drop
as
select city_id, city_name, country_id
from public.table_world_cities
where row_status = 'A' ---- filter active record
and ***WHERE CONDITION APPLIED TO FUNCTION CALL***
;
---- Return data
return query
select city_id, city_name, country_id
from tmp_all_cities;
end; $$ language plpgsql;
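For context, a caller's WHERE clause can only reach inside the function when the planner is able to inline it, which rules out plpgsql bodies and temporary tables; a hedged sketch of the inlinable variant (plain SQL, single SELECT, no temp table, names taken from the question):
create or replace function public.fn_world_cities()
returns table (
city_id int,
city_name varchar,
country_id int
)
as $$
select city_id, city_name, country_id
from public.table_world_cities
where row_status = 'A'   ---- filter active record
$$ language sql stable;

-- the caller's filter can then be pushed into the underlying scan:
select * from public.fn_world_cities() where country_id = 10;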
Radioleao
(153 rep)
Apr 9, 2019, 03:26 PM
• Last activity: Jul 24, 2025, 11:06 PM
0
votes
2
answers
614
views
Postgresql Aggregate Query With Sort Very Slow
I have a simple aggregate query that is running very slowly (198s) on a table with ~4.6mil rows, can anyone suggest how to optimise?
SELECT u.user_id,u.firstname,u.surname,u.image_url,p.pool_name, count(s.session_id) as totalSwims,sum(s.total_time_in_seconds) as totalTotalTime
from sessions as s
left join user_sessions as us on us.session_id = s.session_id
left join users as u on u.user_id = us.user_id
left join user_prefs as up on up.user_id = u.user_id
left join pools as p on p.pool_id = u.local_pool
where p.client_id = 1
and s.requires_checking is false
and s.swim_rating > 10
and us.user_id is not null
and s.date_swam > '2023-03-01 01:59:54.962+00'
and s.date_swam < [...]
[...]
-> Seq Scan on public.pools p (cost=0.00..235.00 rows=62 width=33) (actual time=0.018..0.794 rows=61 loops=3)
Output: p.pool_name, p.client_id, p.pool_id
Filter: (p.client_id = 1)
Rows Removed by Filter: 5674
Buffers: shared hit=486
Worker 0: actual time=0.018..0.651 rows=61 loops=1
Buffers: shared hit=162
Worker 1: actual time=0.019..0.790 rows=61 loops=1
Buffers: shared hit=162
-> Index Scan using user_prefs_pkey on public.user_prefs up (cost=0.42..3.77 rows=1 width=4) (actual time=0.079..0.079 rows=0 loops=61372)
Output: up.user_id
Index Cond: (up.user_id = u.user_id)
Filter: (up.enable_public_leaderboards IS TRUE)
Rows Removed by Filter: 1
Buffers: shared hit=239339 read=6309 dirtied=59
I/O Timings: read=4577.017
Worker 0: actual time=0.071..0.071 rows=0 loops=20288
Buffers: shared hit=79332 read=1879 dirtied=18
I/O Timings: read=1354.770
Worker 1: actual time=0.081..0.081 rows=0 loops=20803
Buffers: shared hit=81068 read=2195 dirtied=18
I/O Timings: read=1605.396
-> Index Only Scan using user_sessions_pkey on public.user_sessions us (cost=0.43..140.52 rows=154 width=8) (actual time=0.619..1.525 rows=73 loops=11592)
Output: us.user_id, us.session_id
Index Cond: ((us.user_id = u.user_id) AND (us.user_id IS NOT NULL))
Heap Fetches: 488381
Buffers: shared hit=498091 read=18665
I/O Timings: read=16850.427
Worker 0: actual time=0.634..1.555 rows=74 loops=3830
Buffers: shared hit=167709 read=6288
I/O Timings: read=5689.264
Worker 1: actual time=0.624..1.549 rows=73 loops=3790
Buffers: shared hit=164737 read=6208
I/O Timings: read=5598.264
-> Index Scan using sessions_pkey on public.sessions s (cost=0.43..4.00 rows=1 width=10) (actual time=0.676..0.676 rows=0 loops=845589)
Output: s.session_id, s.total_time_in_seconds
Index Cond: (s.session_id = us.session_id)
Filter: ((s.requires_checking IS FALSE) AND (s.swim_rating > 10) AND (s.date_swam > '2023-03-01 01:59:54.962+00'::timestamp with time zone) AND (s.date_swam < [...]))
[...] > 10. I then dropped the session_id primary key index to force the planner to use the new index; it gives the following plan (updated a second time!):
Limit (cost=107902.84..107902.89 rows=20 width=140) (actual
time=13760.459..13802.105 rows=20 loops=1)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, (count(s.session_id)), (sum(s.total_time_in_seconds)), p.client_id, p.pool_id
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Sort (cost=107902.84..107902.94 rows=43 width=140) (actual time=13760.458..13802.101 rows=20 loops=1)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, (count(s.session_id)), (sum(s.total_time_in_seconds)), p.client_id, p.pool_id
Sort Key: (sum(s.total_time_in_seconds)) DESC
Sort Method: top-N heapsort Memory: 33kB
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Finalize GroupAggregate (cost=107896.10..107901.69 rows=43 width=140) (actual time=13748.876..13801.273 rows=1476 loops=1)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, count(s.session_id), sum(s.total_time_in_seconds), p.client_id, p.pool_id
Group Key: u.user_id, p.pool_id
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Gather Merge (cost=107896.10..107900.71 rows=36 width=140) (actual time=13748.858..13797.814 rows=3340 loops=1)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, (PARTIAL count(s.session_id)), (PARTIAL sum(s.total_time_in_seconds)), p.client_id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
-> Partial GroupAggregate (cost=106896.08..106896.53 rows=18 width=140) (actual time=13743.988..13746.974 rows=1113 loops=3)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, PARTIAL count(s.session_id), PARTIAL sum(s.total_time_in_seconds), p.client_id
Group Key: u.user_id, p.pool_id
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
Worker 0: actual time=13741.803..13744.770 rows=1123 loops=1
Buffers: shared hit=262454 read=9464 dirtied=14, temp read=7403 written=7316
I/O Timings: read=12590.170
Worker 1: actual time=13742.976..13745.801 rows=1035 loops=1
Buffers: shared hit=258266 read=9090 dirtied=26, temp read=7763 written=8300
I/O Timings: read=12656.606
-> Sort (cost=106896.08..106896.12 rows=18 width=110) (actual time=13743.971..13744.390 rows=3256 loops=3)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, s.session_id, s.total_time_in_seconds, p.client_id
Sort Key: u.user_id, p.pool_id
Sort Method: quicksort Memory: 811kB
Buffers: shared hit=771807 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
Worker 0: actual time=13741.780..13742.171 rows=3221 loops=1
Sort Method: quicksort Memory: 701kB
Buffers: shared hit=262454 read=9464 dirtied=14, temp read=7403 written=7316
I/O Timings: read=12590.170
Worker 1: actual time=13742.958..13743.323 rows=2736 loops=1
Sort Method: quicksort Memory: 608kB
Buffers: shared hit=258266 read=9090 dirtied=26, temp read=7763 written=8300
I/O Timings: read=12656.606
-> Parallel Hash Join (cost=34085.80..106895.70 rows=18 width=110) (actual time=13647.907..13741.713 rows=3256 loops=3)
Output: u.user_id, p.pool_id, u.firstname, u.surname, u.image_url, p.pool_name, s.session_id, s.total_time_in_seconds, p.client_id
Hash Cond: (s.session_id = us.session_id)
Buffers: shared hit=771793 read=27894 dirtied=52, temp read=23232 written=23668
I/O Timings: read=38115.334
Worker 0: actual time=13655.793..13739.514 rows=3221 loops=1
Buffers: shared hit=262447 read=9464 dirtied=14, temp read=7403 written=7316
I/O Timings: read=12590.170
Worker 1: actual time=13644.134..13741.201 rows=2736 loops=1
Buffers: shared hit=258259 read=9090 dirtied=26, temp read=7763 written=8300
I/O Timings: read=12656.606
-> Parallel Bitmap Heap Scan on public.sessions s (cost=447.58..73224.60 rows=8746 width=10) (actual time=7.704..24.683 rows=15126 loops=3)
Output: s.session_id, s.total_time_in_seconds
Recheck Cond: ((s.date_swam > '2023-03-01 01:59:54.962+00'::timestamp with time zone) AND (s.date_swam < [...]))
[...] > 10))
Heap Blocks: exact=7524
Buffers: shared hit=19743
Worker 0: actual time=6.706..17.200 rows=7148 loops=1
Buffers: shared hit=3877
Worker 1: actual time=8.199..27.980 rows=20303 loops=1
Buffers: shared hit=8214
-> Bitmap Index Scan on sessions_date_swam_idx (cost=0.00..442.33 rows=20990 width=0) (actual time=4.946..4.946 rows=47051 loops=1)
Index Cond: ((s.date_swam > '2023-03-01 01:59:54.962+00'::timestamp with time zone) AND (s.date_swam < [...]))
[...]
-> Parallel Hash (cost=33585.40..33585.40 rows=4226 width=104) (actual time=13599.941..13599.945 rows=283349 loops=3)
Output: us.session_id, u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, p.client_id, p.pool_id
Buckets: 32768 (originally 16384) Batches: 32 (originally 1) Memory Usage: 4032kB
Buffers: shared hit=752034 read=27894 dirtied=52, temp read=10015 written=23284
I/O Timings: read=38115.334
Worker 0: actual time=13599.432..13599.437 rows=288936 loops=1
Buffers: shared hit=258562 read=9464 dirtied=14, temp read=2720 written=7188
I/O Timings: read=12590.170
Worker 1: actual time=13597.604..13597.609 rows=285810 loops=1
Buffers: shared hit=250037 read=9090 dirtied=26, temp read=3648 written=8172
I/O Timings: read=12656.606
-> Nested Loop (cost=239.54..33585.40 rows=4226 width=104) (actual time=6.219..13166.722 rows=283349 loops=3)
Output: us.session_id, u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, p.client_id, p.pool_id
Buffers: shared hit=752034 read=27894 dirtied=52
I/O Timings: read=38115.334
Worker 0: actual time=3.284..13087.841 rows=288936 loops=1
Buffers: shared hit=258562 read=9464 dirtied=14
I/O Timings: read=12590.170
Worker 1: actual time=7.931..13106.351 rows=285810 loops=1
Buffers: shared hit=250037 read=9090 dirtied=26
I/O Timings: read=12656.606
-> Nested Loop (cost=239.11..15989.40 rows=124 width=104) (actual time=5.013..1891.004 rows=3880 loops=3)
Output: u.user_id, u.firstname, u.surname, u.image_url, up.user_id, p.pool_name, p.client_id, p.pool_id
Inner Unique: true
Buffers: shared hit=254626 read=4579 dirtied=52
I/O Timings: read=5183.492
Worker 0: actual time=1.601..1793.850 rows=4041 loops=1
Buffers: shared hit=90103 read=1608 dirtied=14
I/O Timings: read=1612.670
Worker 1: actual time=6.604..2114.870 rows=3960 loops=1
Buffers: shared hit=83066 read=1497 dirtied=26
I/O Timings: read=1968.382
-> Hash Join (cost=238.69..13655.78 rows=624 width=100) (actual time=0.954..42.871 rows=20490 loops=3)
Output: u.user_id, u.firstname, u.surname, u.image_url, p.pool_name, p.client_id, p.pool_id
Inner Unique: true
Hash Cond: (u.local_pool = p.pool_id)
Buffers: shared hit=13161
Worker 0: actual time=0.980..45.020 rows=21799 loops=1
Buffers: shared hit=4452
Worker 1: actual time=0.778..40.547 rows=19986 loops=1
Buffers: shared hit=4573
-> Parallel Seq Scan on public.users u (cost=0.00..13263.23 rows=58571 width=71) (actual time=0.007..27.779 rows=46880 loops=3)
Output: u.user_id, u.firstname, u.surname, u.image_url, u.local_pool
Filter: (u.disabled IS FALSE)
Rows Removed by Filter: 664
Buffers: shared hit=12669
Worker 0: actual time=0.006..29.047 rows=49923 loops=1
Buffers: shared hit=4288
Worker 1: actual time=0.005..25.210 rows=45846 loops=1
Buffers: shared hit=4409
-> Hash (cost=237.90..237.90 rows=63 width=33) (actual time=0.931..0.932 rows=61 loops=3)
Output: p.pool_name, p.client_id, p.pool_id
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=492
Worker 0: actual time=0.961..0.962 rows=61 loops=1
Buffers: shared hit=164
Worker 1: actual time=0.761..0.762 rows=61 loops=1
Buffers: shared hit=164
-> Seq Scan on public.pools p (cost=0.00..237.90 rows=63 width=33) (actual time=0.018..0.905 rows=61 loops=3)
Output: p.pool_name, p.client_id, p.pool_id
Filter: (p.client_id = 1)
Rows Removed by Filter: 5682
Buffers: shared hit=492
Worker 0: actual time=0.022..0.931 rows=61 loops=1
Buffers: shared hit=164
Worker 1: actual time=0.018..0.738 rows=61 loops=1
Buffers: shared hit=164
-> Index Scan using user_prefs_pkey on public.user_prefs up (cost=0.42..3.74 rows=1 width=4) (actual time=0.090..0.090 rows=0 loops=61471)
Output: up.user_id
Index Cond: (up.user_id = u.user_id)
Filter: (up.enable_public_leaderboards IS TRUE)
Rows Removed by Filter: 1
Buffers: shared hit=241465 read=4579 dirtied=52
I/O Timings: read=5183.492
Worker 0: actual time=0.080..0.080 rows=0 loops=21799
Buffers: shared hit=85651 read=1608 dirtied=14
I/O Timings: read=1612.670
Worker 1: actual time=0.103..0.103 rows=0 loops=19986
Buffers: shared hit=78493 read=1497 dirtied=26
I/O Timings: read=1968.382
-> Index Only Scan using user_sessions_pkey on public.user_sessions us (cost=0.43..140.35 rows=155 width=8) (actual time=0.788..2.892 rows=73 loops=11640)
Output: us.user_id, us.session_id
Index Cond: ((us.user_id = u.user_id) AND (us.user_id IS NOT NULL))
Heap Fetches: 492751
Buffers: shared hit=497408 read=23315
I/O Timings: read=32931.842
Worker 0: actual time=0.774..2.781 rows=72 loops=4041
Buffers: shared hit=168459 read=7856
I/O Timings: read=10977.501
Worker 1: actual time=0.727..2.761 rows=72 loops=3960
Buffers: shared hit=166971 read=7593
I/O Timings: read=10688.224
Planning:
Buffers: shared hit=101
Planning Time: 1.549 ms
Execution Time: 13802.195 ms
DaveB
(319 rep)
Mar 30, 2023, 03:29 PM
• Last activity: Jul 24, 2025, 09:03 AM
0
votes
1
answers
142
views
Postgres stuck with ALTER query on production
So I have this big database in production with partitions on a date field.
I ran this migration to add a new column, and now, an hour in, the DB is stuck, not responding, and still migrating.
ALTER TABLE public.ur_sms_traffic ADD COLUMN trace_id varchar(16);
The table has no relations; it's just a stand-alone table.
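For context, ADD COLUMN without a DEFAULT is normally a metadata-only change, but it still needs an ACCESS EXCLUSIVE lock on the table (and, for a partitioned table, on every partition), so it can hang behind any long-running query or idle-in-transaction session. A quick sketch for seeing what it is waiting on, using standard catalog views (the table name is from the question):
SELECT pid, state, wait_event_type, wait_event, query_start, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
   OR query ILIKE '%ur_sms_traffic%';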
Shaz Hemani
(101 rep)
Dec 15, 2021, 08:41 AM
• Last activity: Jul 19, 2025, 09:05 PM
1
votes
1
answers
48
views
Postgres query planner join selectivity greater than 1?
I am using PostgreSQL 14.17.
I am trying to debug a query planner failure in a bigger query, but I think I've narrowed down the
problem to a self-join on a join table:
SELECT t2.item_id
FROM item_sessions t1
JOIN item_sessions t2
ON t1.session_key = t2.session_key
WHERE t1.item_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
After running ANALYZE on the table, EXPLAIN gives this plan (which matches the subplan in the larger query):
Nested Loop (cost=1.12..119.60 rows=7398 width=16)
-> Index Only Scan using item_sessions_item_id_session_key_uniq on item_sessions t1 (cost=0.56..8.58 rows=1 width=33)
Index Cond: (item_id = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'::uuid)
-> Index Only Scan using item_sessions_session_idx on item_sessions t2 (cost=0.56..110.11 rows=91 width=49)
Index Cond: (session_key = (t1.session_key)::text)
**Why is the loop estimating 7398 rows when the two child nodes estimate 1 and 91 respectively?** I would have expected the loop total to be less than 1 * 91.
FWIW, the child estimates seem correct. item_id has n_distinct at -0.77649677, so the expected row count is 1.3, and session_key has n_distinct at 149555 out of an estimated 1.36e+07 tuples, which gives 90.9 expected tuples per session_key.
The indexes referenced in the plan are:
- item_sessions_session_idx btree (session_key, item_id)
- item_sessions_item_id_session_key_uniq UNIQUE CONSTRAINT, btree (item_id, session_key)
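For anyone wanting to reproduce the per-column numbers quoted above, they come straight from the column statistics; a minimal sketch of the lookup (table and column names from the question):
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'item_sessions'
  AND attname IN ('item_id', 'session_key');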
ETA: I created a minimal reproduction [here](https://github.com/felipeochoa/pg-plan-selectivity-gt1) . The failure is visible [in the job logs](https://github.com/felipeochoa/pg-plan-selectivity-gt1/actions/runs/16359411460/job/46224463766) on 17.5, 16.9, and 15.13
Felipe
(317 rep)
Jul 17, 2025, 04:41 AM
• Last activity: Jul 18, 2025, 09:14 AM
0
votes
1
answers
160
views
Postgresql same query is slower on new hardware
I migrated my data to new server hardware that, according to my benchmarks, looks faster in terms of CPU and disk I/O.
Unfortunately, my query seems slower on that new hardware with the same data and the same query.
Here is my query:
EXPLAIN ANALYZE SELECT AVG(CAST(completeness->>'fr' AS INTEGER)) as average FROM product INNER JOIN channel ON channel.id = product.channel_id WHERE channel_id = 'myUuid';
On the old server my times are:
Planning Time: 0.281 ms
Execution Time: 50.683 ms
On the new server my times are:
Planning Time: 0.162 ms
Execution Time: 115.268 ms
That is significantly slower. I have already tried a VACUUM FULL and REINDEX of mydatabase,
and also increasing default_statistics_target, but none of this changed anything.
Of course I'm using the same PostgreSQL version (12.14) and have tried everything I could find in similar questions. Do you have any other solutions?
EDIT:
For some more details:
Old and faster:
Aggregate (cost=58441.85..58441.86 rows=1 width=32) (actual time=66.284..66.285 rows=1 loops=1)
Buffers: shared hit=1132 read=18379 written=16
I/O Timings: read=36.167 write=0.101
-> Nested Loop (cost=594.96..58185.68 rows=25617 width=25) (actual time=5.003..61.783 rows=25985 loops=1)
Buffers: shared hit=1132 read=18379 written=16
I/O Timings: read=36.167 write=0.101
-> Seq Scan on channel (cost=0.00..1.30 rows=1 width=16) (actual time=0.028..0.031 rows=1 loops=1)
Filter: (id = 'myuuid'::uuid)
Rows Removed by Filter: 23
Buffers: shared hit=1
-> Bitmap Heap Scan on product (cost=594.96..57928.21 rows=25617 width=41) (actual time=4.969..59.276 rows=25985 loops=1)
Recheck Cond: (channel_id = 'myuuid'::uuid)
Heap Blocks: exact=19408
Buffers: shared hit=1131 read=18379 written=16
I/O Timings: read=36.167 write=0.101
-> Bitmap Index Scan on idx_8ac439d272f5a1aa (cost=0.00..588.55 rows=25617 width=0) (actual time=3.177..3.177 rows=25985 loops=1)
Index Cond: (channel_id = 'myuuid'::uuid)
Buffers: shared read=102
I/O Timings: read=0.389
Planning Time: 1.945 ms
Execution Time: 66.585 ms
(21 rows)
New and slower:
Aggregate (cost=59666.14..59666.15 rows=1 width=32) (actual time=146.506..146.508 rows=1 loops=1)
Buffers: shared hit=108 read=19371 written=172
I/O Timings: read=85.949 write=1.528
-> Nested Loop (cost=617.08..59402.07 rows=26407 width=25) (actual time=7.752..139.180 rows=25985 loops=1)
Buffers: shared hit=108 read=19371 written=172
I/O Timings: read=85.949 write=1.528
-> Seq Scan on channel (cost=0.00..1.30 rows=1 width=16) (actual time=0.012..0.016 rows=1 loops=1)
Filter: (id = 'myuuid'::uuid)
Rows Removed by Filter: 23
Buffers: shared hit=1
-> Bitmap Heap Scan on product (cost=617.08..59136.70 rows=26407 width=41) (actual time=7.737..134.826 rows=25985 loops=1)
Recheck Cond: (channel_id = 'myuuid'::uuid)
Heap Blocks: exact=19374
Buffers: shared hit=107 read=19371 written=172
I/O Timings: read=85.949 write=1.528
-> Bitmap Index Scan on idx_8ac439d272f5a1aa (cost=0.00..610.48 rows=26407 width=0) (actual time=5.073..5.073 rows=25987 loops=1)
Index Cond: (channel_id = 'myuuid'::uuid)
Buffers: shared read=102
I/O Timings: read=0.688
Planning Time: 0.773 ms
Execution Time: 146.750 ms
(21 rows)
TomLorenzi
(1 rep)
Jul 12, 2023, 01:24 PM
• Last activity: Jul 17, 2025, 10:08 PM
0
votes
0
answers
32
views
VACUUM: ERROR: invalid memory alloc request size postgresql
I am getting
>ERROR: invalid memory alloc request size 2727388320
after I tried vacuuming in EnterpriseDB. It is getting bigger and bigger. How can I solve this issue?
MD Nasirul Islam
(101 rep)
Jul 16, 2025, 06:49 AM
• Last activity: Jul 16, 2025, 01:20 PM
0
votes
1
answers
159
views
Selecting columns based on time ranges
I have a table which looks like this:
Table t1
timestamp (pkey) | A | B | C
----------------------------------
2024-01-01 12:00:00 | 1 | 2 | 2
2024-01-01 12:00:01 | 1 | 2 | 2
2024-01-01 12:00:03 | 15 | 4 | 2
Table creation SQL:
CREATE TABLE IF NOT EXISTS t1 ("id" serial NOT NULL, "timestamp" timestamp PRIMARY KEY, "A" real, "B" real, "C" real)
Some tables might have 100+ columns.
I'd like to select several columns while omitting time ranges on a per-column basis. The way I do it today is the following:
SELECT
(CASE WHEN timestamp NOT BETWEEN '2017-06-12T17:01' AND '2018-06-12T21:57'
AND timestamp NOT BETWEEN '2020-06-12T17:01' AND '2021-02-15T21:57' THEN A ELSE NULL END) "A",
(CASE WHEN timestamp NOT BETWEEN '2018-07-14' AND '2019-01-01'
AND timestamp NOT BETWEEN '2020-06-12T17:01' AND '2021-02-15T21:57' THEN B ELSE NULL END) "B",
C
FROM t1
WHERE timestamp > '2014-01-01'
timestamp is indexed, obviously (it is the primary key).
Is there a better / more effective way to do it?
Miro
(113 rep)
Mar 25, 2024, 07:22 PM
• Last activity: Jul 14, 2025, 10:04 AM
0
votes
1
answers
157
views
postgres_fdw: where are the resources utilized, remote or local?
I have an OLTP database and ETL jobs are running in the same database in the background.
I was thinking of separating the OLTP and ETL instances so that resource utilization would be distributed. Basically, the OLTP instance should have fewer or no ETL overhead.
The idea is to create foreign tables on the ETL instance connecting to OLTP remote server using *postgres_fdw*.
I understand that Postgres will fetch chunks of data from the remote server using the cursor.
Can someone please confirm whether my understanding is right that running a complex query involving foreign tables uses resources (RAM, CPU) on the local server, and whether the remote server is safe from this execution overhead?
And if I am wrong, which instance's resources would Postgres use to run complex SQL with joins on foreign tables?
Thanks in advance!
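One concrete way to see which side does the work for a given statement is the Remote SQL line that postgres_fdw adds under EXPLAIN VERBOSE: whatever appears there is executed on the remote (OLTP) server, and every plan node above it runs with the local (ETL) instance's CPU and RAM. A minimal sketch, where remote_orders is a hypothetical foreign table:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT customer_id, count(*)
FROM remote_orders
GROUP BY customer_id;
-- look for the "Remote SQL:" line in the output; joins and aggregates that cannot be
-- pushed down are performed locally on the fetched rows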
Sajith P Shetty
(312 rep)
Dec 13, 2022, 02:00 PM
• Last activity: Jul 14, 2025, 06:02 AM
1
votes
1
answers
158
views
Postgres 13 Sort Performance
Sort performance on one of our queries is so bad that it takes up to 14 seconds to run the query.
Here is the Query:
SELECT "stock_quant".id
FROM "stock_quant"
WHERE ((((("stock_quant"."qty" > 0.0)
AND "stock_quant"."reservation_id" IS NULL )
AND ("stock_quant"."location_id" in (34)))
AND ("stock_quant"."product_id" = 203330))
AND ("stock_quant"."company_id" = 5))
ORDER BY "stock_quant"."in_date" ,"stock_quant"."id"
limit 10;
When I used EXPLAIN, this is what Postgres says:
explain (analyze,buffers) SELECT "stock_quant".id FROM "stock_quant" WHERE ((((("stock_quant"."qty" > 0.0) AND "stock_quant"."reservation_id" IS NULL ) AND ("stock_quant"."location_id" in (34))) AND ("stock_quant"."product_id" = 203330)) AND ("stock_quant"."company_id" = 5)) ORDER BY "stock_quant"."in_date" ,"stock_quant"."id" limit 10;
Limit (cost=0.56..4723.78 rows=10 width=12) (actual time=15754.259..15754.260 rows=0 loops=1)
Buffers: shared hit=9988201 read=94226
-> Index Scan using stock_quant_multisort_idx on stock_quant (cost=0.56..1923768.25 rows=4073 width=12) (actual time=15754.257..15754.257 rows=0 loops=1)
Filter: ((reservation_id IS NULL) AND (qty > '0'::double precision) AND (location_id = 34) AND (product_id = 203330) AND (company_id = 5))
Rows Removed by Filter: 24052667
Buffers: shared hit=9988201 read=94226
Planning Time: 0.291 ms
Execution Time: 15754.288 ms
(8 rows)
explain SELECT "stock_quant".id FROM "stock_quant" WHERE ((((("stock_quant"."qty" > 0.0) AND "stock_quant"."reservation_id" IS NULL ) AND ("stock_quant"."location_id" in (34))) AND ("stock_quant"."product_id" = 203330)) AND ("stock_quant"."company_id" = 5)) ORDER BY "stock_quant"."in_date" ,"stock_quant"."id" limit 10;
Limit (cost=0.56..4723.82 rows=10 width=12)
-> Index Scan using stock_quant_multisort_idx on stock_quant (cost=0.56..1923781.40 rows=4073 width=12)
Filter: ((reservation_id IS NULL) AND (qty > '0'::double precision) AND (location_id = 34) AND (product_id = 203330) AND (company_id = 5))
(3 rows)
And here are the indexes in the table:
"stock_quant_pkey" PRIMARY KEY, btree (id)
"stock_quant_company_id_index" btree (company_id)
"stock_quant_location_id_index" btree (location_id)
"stock_quant_lot_id_index" btree (lot_id)
"stock_quant_multisort_idx" btree (in_date, id)
"stock_quant_owner_id_index" btree (owner_id)
"stock_quant_package_id_index" btree (package_id)
"stock_quant_product_id_index" btree (product_id)
"stock_quant_product_location_index" btree (product_id, location_id, company_id, qty, in_date, reservation_id)
"stock_quant_propagated_from_id_index" btree (propagated_from_id)
"stock_quant_qty_index" btree (qty)
"stock_quant_reservation_id_index" btree (reservation_id)
Work Mem is set at **512MB**
Any idea what needs to be changed? Without sort, the same query executes in less than 200ms.
Update:
Explain Analyze without Order by
explain (analyze,buffers) SELECT "stock_quant".id FROM "stock_quant" WHERE ((((("stock_quant"."qty" > 0.0) AND "stock_quant"."reservation_id" IS NULL ) AND ("stock_quant"."location_id" in (34))) AND ("stock_quant"."product_id" = 203330)) AND ("stock_quant"."company_id" = 5)) limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..33.76 rows=10 width=4) (actual time=0.661..0.662 rows=0 loops=1)
Buffers: shared hit=2 read=2
-> Index Scan using stock_quant_product_location_index on stock_quant (cost=0.56..13524.04 rows=4074 width=4) (actual time=0.660..0.660 rows=0 loops=1)
Index Cond: ((product_id = 203330) AND (location_id = 34) AND (company_id = 5) AND (qty > '0'::double precision) AND (reservation_id IS NULL))
Buffers: shared hit=2 read=2
Planning:
Buffers: shared hit=248 read=16
Planning Time: 7.005 ms
Execution Time: 0.691 ms
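For context, the query without the ORDER BY can be satisfied entirely from stock_quant_product_location_index, while the ordered query walks stock_quant_multisort_idx hoping to find 10 qualifying rows early and never does (24 million rows removed by the filter). A hedged sketch of the kind of index that could serve the filter and the ORDER BY ... LIMIT together (the column order and the partial predicate are assumptions, not a tested recommendation):
CREATE INDEX stock_quant_pick_idx
    ON stock_quant (product_id, location_id, company_id, in_date, id)
    WHERE reservation_id IS NULL;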
Abul Hassan
(11 rep)
Oct 27, 2021, 03:56 AM
• Last activity: Jul 12, 2025, 06:04 AM
1
votes
1
answers
168
views
Postgresql - optimizing multicolumn search with "prefix" search pattern as one of the filters
I'm now working on the project that automates "contracts" creation and signing. Recently I received a requirements to implement search functionality for "contracts", so I struggle to find performant way for doing it (I'm kinda new to postgres so far).
Contract table looks like
CREATE TABLE contracts(
id UUID,
status TEXT,
name TEXT,
issuers JSONB,
approver JSONB,
period JSONB,
created TIMESTAMP WITHOUT TIME ZONE
)
*status* - contract status, one of *draft, signed, active, expired*
*issuers* - json array of objects like {"id": , email: , name: , ...}
*approver* - json obj like {"id": , email: , name: , ...}
*period* - json obj like {"start": , "end": ...}
Search requirements:
1. "prefix" search with user-specified value where any of contract name, contract issuer (any) email, contract issuer (any) name, approver name, approver email matches condition.
2. Filtering by contract status using IN operator.
3. Ordering by created timestamp, period start date, period end date
4. Paging
As the application is user-centric, all queries will contain either issuer id, approver id, or both as additional filter criteria.
Search examples:
1. For issuer with id "1" display contracts matching predicate "test@gmail" order by created timestamp. Such search is supposed to return following contracts
{"name": "test@gmail contract name e.g.", issuers:[{id: "1"...}...], ...}
{"name": "Name", issuers: [{id: "1"...}, {... email: "test@gmail.com"...}...]}
{"name": "Name1",issuers:[{id: "1"...}...], approver: {.."email": "test@gmail.com"..}}
2. For approver with id "11" display contracts matching predicate "aggre" order by start date. Such search is supposed to return following contracts
{"name": "aggreement 11232", approver:{"id": "11"...}, ...}
{"name": "Name", issuers: [{"name: "Aggre Xyz"...},...], approver:["id": "11"...}}
3. For issuer with id "1" and approver with id "11" display contracts matching predicate "aggre" order by start date. Such search is supposed to return following contracts
{"name": "aggreement 11232", approver:{"id": "11"...}, issuers: [{id: "1"...} ...}
Considerations:
1. With the current DB layout I only have a GIN index to the rescue, which will not help with the sorting as far as I know, apart from the other complications it brings.
2. Rebuilding the schema into a more classical one-to-many contract-issuer relation, plus saving the approver and period object fields as columns. On the one hand it would let me use btree indexes for at least issuer id / approver id filtering and perform efficient sorting. On the other hand, finding matches by issuer name/email doesn't seem easy and would involve many joins, plus it's a huge change to the schema.
3. Adding a separate "search-optimized" table.
Two options
3.1
CREATE TABLE search(
issuer_id UUID,
approver_id UUID,
attribute_name TEXT,
attribute_value TEXT,
contract_id
)
*attribute_name* - contract-name, issuer_name...
With that approach search seems to be pretty easy, but the number of additional rows is a concern.
3.2
Something like this, with the attributes merged into an array:
CREATE TABLE search(
issuer_id UUID,
approver_id UUID,
search_attributes TEXT[],
contract_id
)
search_attributes will contain all the strings that might match the "prefix" search. While it complicates writes (one more DB object to be updated...), it seems beneficial for reads.
SELECT c.*
FROM contracts c
INNER JOIN (
SELECT contract_id
FROM search
WHERE issuer_id = 'id'
AND EXISTS(
SELECT 1
FROM unnest(search_attributes) attr
WHERE attr LIKE 'input%'
)
) s ON s.contract_id = c.id
I would appreciate any suggestion.
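For what it's worth, layout 3.1 is the one that maps most directly onto a plain btree: equality on the owner column plus a prefix pattern on the attribute value. A hedged sketch of the supporting index and lookup (text_pattern_ops keeps LIKE 'input%' indexable regardless of the database collation; the literal values are placeholders):
CREATE INDEX search_issuer_prefix_idx
    ON search (issuer_id, attribute_value text_pattern_ops);

SELECT DISTINCT contract_id
FROM search
WHERE issuer_id = '00000000-0000-0000-0000-000000000001'   -- placeholder issuer id
  AND attribute_value LIKE 'test@gmail%';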
EPe
(11 rep)
Nov 20, 2023, 01:13 PM
• Last activity: Jul 11, 2025, 09:04 AM
-3
votes
1
answers
371
views
slow postgres performance with simple queries
After a lot of reading I found multiple sources saying Postgres should be able to handle ~100 million rows.
I set up a simple table with:
CREATE TABLE trade
(
id INT PRIMARY KEY NOT NULL,
symbol VARCHAR,
open_time TIMESTAMP,
end_price INT
)
I have 12 million of these records.
My simple query of
SELECT * FROM table WHERE symbol=x and open_time>start_time and open_time<end_time
This query always returns fewer than 1000 rows, yet it takes 1100 milliseconds. This seems like a lot for a simple table with 10x fewer rows than it should be able to handle.
I don't have any indexes yet, though, because I don't know what the best indexes for this table would be.
Is it possible to get this down to 100ms?
Any help writing a more performant query or db would be appreciated
EDIT 1:
After reading the comments I put the following indexes on the table
CREATE INDEX open_time ON trade(open_time);
CREATE INDEX symbol ON trade(symbol);
CREATE INDEX end_price ON trade(open_price);
After adding these, the query time is 240 ms. Is this the maximum achievable?
I have also noticed that when querying the first 40k rows the query time drops to 60 ms,
but after that it rises to 240 ms. What is causing this?
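For reference, three separate single-column indexes still force the planner to pick one of them (or bitmap-AND them); a composite index matching the query's shape, with the equality column first and the range column second, is the usual next step. A sketch under that assumption (not a measured result):
CREATE INDEX trade_symbol_open_time_idx ON trade (symbol, open_time);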
sir snibbles
(11 rep)
May 19, 2022, 07:53 PM
• Last activity: Jul 9, 2025, 02:43 PM
Showing page 1 of 20 total questions