
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
1 answer
494 views
Postgres psql command intermittently 'stalls' in PowerShell
I'm in the process of moving a Postgres database to a new server. I have a number of large tables, partitioned by month, each containing several tens of millions of rows. I have a PowerShell script that does the following:

- Copy the backup file (for a single partition) from the archive location (using robocopy).
- Load the file back into the database using psql.
- Drop the backup file.
- Loop back to the beginning and move on to the next partition.

Copying the file from the archive takes maybe 10 minutes, loading it typically 40 minutes, then it moves on to the next partition.

**The problem** is that the PowerShell console seems to occasionally get 'stuck' after loading a partition. Occasionally a partition seems to be taking too long (say an hour). I check the Windows Task Manager, and none of the processes seem to be doing anything. I check the dashboard in pgAdmin4 (no locks, most sessions idle). Then I check the record count for the partition being loaded - the record count seems to be static (still the same 5 minutes later). So as far as I can tell the partition has been loaded successfully, and the database is mostly idle. But the PowerShell console that's running the load session hasn't changed. Then I shift focus to the PowerShell console, press return a couple of times, and suddenly it bursts into life. The console tells me it's moved on to the next partition, Task Manager indicates that the powershell and postgres processes are running again, and pgAdmin4 also indicates the database has sprung back into life. The PowerShell script logs the current time to the screen (using Write-Host) between each step, so I always know what the most recent step was.

My conclusion is that the interaction between PowerShell and psql is somehow 'broken' - PowerShell hasn't worked out that psql has finished and doesn't know to move on to the next step.

Other info - there's no sign of any errors (that I can spot), all the tables seem to get populated correctly (eventually), and the script works fine on all the smaller partitions. This only happens on about 1 partition in 10, and the only cure seems to be pressing Enter a couple of times.

This isn't a critical problem - occasionally hitting Enter a couple of times is a viable solution - but an understanding of what might be going on (or even a solution) would be much appreciated.
ConanTheGerbil (1303 rep)
Feb 20, 2022, 12:03 PM • Last activity: Jul 13, 2025, 10:03 PM
0 votes
1 answer
157 views
Is it possible for Pgpool to delete all PostgreSQL data directories?
Recently I found that suddenly all data directories were deleted, on both the primary and the standbys. I am assuming it happened due to failover. Example: suppose there are 4 nodes, where 1 is primary and the other 3 are standbys. Suppose that, due to failover, nodes 1 and 2 end up as primary at the same time - both somehow in active mode, with the other two on standby. Is it possible for Pgpool to delete the PostgreSQL data directories of all 4 nodes, primary and standby? Need expert opinion.
Sheikh Wasiu Al Hasib (283 rep)
Sep 14, 2023, 06:24 PM • Last activity: Jul 11, 2025, 01:03 PM
0 votes
1 answer
162 views
Is there a way to query / set the epoch portion of txid_current?
The Postgres documentation says that txid_current is the combination of the epoch and the current transaction ID, done so that every transaction is assigned a unique 64-bit number. Is there any way to check what the epoch currently is? Also - is there a way to get the server to increment the epoch? I am using Postgres 14.5.
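A sketch of the checks available on PostgreSQL 14 (nothing here comes from the question itself); as far as I know the epoch cannot be bumped manually - it only advances when the 32-bit XID counter wraps around:

SELECT txid_current();                         -- bigint: (epoch << 32) | xid; assigns an XID if the transaction has none yet
SELECT txid_current() >> 32 AS epoch;          -- just the epoch portion
SELECT pg_current_xact_id();                   -- xid8 equivalent (PostgreSQL 13+)

SELECT next_xid FROM pg_control_checkpoint();  -- also shows the epoch, formatted as 'epoch:xid'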
zelinka (101 rep)
Nov 4, 2022, 09:11 PM • Last activity: Jul 7, 2025, 10:04 AM
1 vote
0 answers
27 views
Different PARTITION BY field values but getting a row number greater than 1 when using row_number
I want to keep only the first row of duplicate records in PostgreSQL 14. This is what the SQL looks like:

WITH xzqh_info AS (
    SELECT xzqhdm, xzqhmc
    FROM area_table
    WHERE 1=1 AND sfyx = '1' AND xzqhdm IN ('411122')
),
latest_records AS (
    SELECT a.bdcdy_id, b.bddjzmh_wz, a.djlx, a.qllx, a.djdy_id,
           ROW_NUMBER() OVER (PARTITION BY a.bdcdy_id, b.bddjzmh_wz ORDER BY a.dbsj DESC) AS rn
    FROM base_table a
    LEFT JOIN another_biz_table b ON a.djdy_id = b.djdy_id
    WHERE a.sfbcx = '0'
      AND a.qllx IN ('qllx036', 'qllx039')
      AND a.dbsj BETWEEN to_date('2025-07-01', 'YYYY-MM-DD')
                     AND (to_date('2025-07-01', 'YYYY-MM-DD') + interval '1 month' - interval '1 day')
      AND a.xzqh IN ('411122')
)
SELECT * FROM latest_records WHERE djlx = 'djlx001' AND qllx = 'qllx036'

Now I find records with row number 2 in the result, but their a.bdcdy_id and b.bddjzmh_wz values do not look the same. I have rechecked again and again and still cannot figure out what is going wrong. This is the result record (screenshot omitted). Am I missing something?
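A diagnostic sketch (column and CTE names are taken from the query above): it exposes differences in the partition keys that a screenshot hides, such as trailing spaces or NULLs introduced by the LEFT JOIN - ROW_NUMBER() puts two NULL keys in the same group, but strings differing only in whitespace in different groups:

-- re-run the CTEs above with this as the final SELECT
SELECT rn,
       bdcdy_id,
       bddjzmh_wz,
       bddjzmh_wz IS NULL                   AS key_is_null,
       length(bddjzmh_wz)                   AS key_length,
       md5(coalesce(bddjzmh_wz, '<NULL>'))  AS key_fingerprint
FROM   latest_records
ORDER  BY bdcdy_id, key_fingerprint, rn;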
Dolphin (939 rep)
Jul 5, 2025, 02:36 PM
0 votes
1 answer
206 views
How to append 2 default values to a JSONB column in PostgreSQL 14?
Previously I had 4 default values for a jsonb column in my table cities.
create_table "cities", force: :cascade do |t| t.jsonb "config", default: {"A"=>5, "B"=>7, "C"=>10, "D"=>10} Now I want to add 2 more default values "E"=10 and "F"=0 to this column without changing the values of A,B,C and D. Note that the values of A,B,C and D have been updated differently for different cities. So I just want to append the default values of E and F to the column without changing the existing ones of A,B,C and D. I cant't quite figure out the UPDATE query for this. I am quite new to this so its a little confusing. Thanks in advance for your help.
Priyanshu0007 (1 rep)
Dec 13, 2022, 06:07 AM • Last activity: Jun 29, 2025, 10:07 PM
0 votes
1 answer
218 views
Using ENUM as a partition key for RANGE partitioning in PostgreSQL
I am working with PostgreSQL 14. I have a table called Order with a status column:
CREATE TABLE "Order" (
    "orderId" BIG SERIAL PRIMARY KEY NOT NULL,
    "orderDescription" TEXT NOT NULL,
    "statusId" SMALLINT NOT NULL
) PARTITION BY RANGE ("statusId");
As shown, this table is partitioned by RANGE based on the statusId. We need to place orders that have moved to a closed status in a separate partition. To achieve this, I implemented a simple trick. I defined a table called OrderStatus and assigned open statuses in the range of 10 to 20 and closed statuses in the range of 20 to 30:
CREATE TABLE "OrderStatus" (
    "statusId" SMALLINT PRIMARY KEY NOT NULL,
    "statusName" VARCHAR UNIQUE NOT NULL
);

INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (11, 'WAITING');   -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (12, 'OPEN');      -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (13, 'CANCELING'); -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (14, 'SENDING');   -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (21, 'FINISHED');  -- close
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (22, 'CANCELED');  -- close
Based on this, the partitions are defined as follows:
CREATE TABLE "Order_Open" PARTITION OF "Order" FOR VALUES FROM (10) TO (20);
CREATE TABLE "Order_Close" PARTITION OF "Order" FOR VALUES FROM (20) TO (30);
So far, everything works as expected. Now we are undergoing a system refactor, and one of the proposed changes is to convert the order statuses to an enum type. However, if we do this, since the numeric values of enums (the enumsortorder column in the pg_enum table) are assigned by PostgreSQL, it seems that the trick we used for categorizing open and closed statuses will no longer be applicable.

Here are the potential solutions I have considered, each with its own explanation, but I'm not sure if they are the best approaches:

1. Add a new boolean column isOpen to indicate whether the order is open or closed, and partition the table based on this column. I feel that having an additional column for this information is not ideal.
2. Manipulate the enumsortorder values of the enum in the pg_enum table to retain our categorization trick (I haven't tried this yet to see if it's possible). This approach seems risky and not quite right to me.
3. Change the partitioning type to LIST (a sketch of this option follows below). While this does not pose significant issues, if we need to add a new value to the enum in the future, we would have to change the table definitions, which is not appealing.
4. Abandon the idea of using an enum altogether and still use the statusId of the OrderStatus table.

I feel that for cases where we need to partition a table based on a column, using an enum might not be suitable, especially for RANGE partitioning. Maybe there's a best practice for this situation that I'm not aware of. If anyone can provide insights or advice on this, I would greatly appreciate it.
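A sketch of option 3 with a DEFAULT partition (all names are illustrative): the DEFAULT partition means an enum value added later does not break inserts, although rows landing there would eventually need to be re-homed:

CREATE TYPE order_status AS ENUM
    ('WAITING', 'OPEN', 'CANCELING', 'SENDING', 'FINISHED', 'CANCELED');

CREATE TABLE "Order" (
    "orderId"          BIGINT GENERATED ALWAYS AS IDENTITY,
    "orderDescription" TEXT NOT NULL,
    "status"           order_status NOT NULL,
    PRIMARY KEY ("orderId", "status")          -- the partition key must be part of the PK
) PARTITION BY LIST ("status");

CREATE TABLE "Order_Open" PARTITION OF "Order"
    FOR VALUES IN ('WAITING', 'OPEN', 'CANCELING', 'SENDING');
CREATE TABLE "Order_Close" PARTITION OF "Order"
    FOR VALUES IN ('FINISHED', 'CANCELED');
CREATE TABLE "Order_Other" PARTITION OF "Order" DEFAULT;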
Mofarah (35 rep)
Jun 22, 2024, 01:06 PM • Last activity: Jun 20, 2025, 10:03 AM
0 votes
0 answers
22 views
how to avoid the postgresql 14 shared buffer swap out
I am facing an issue where a SQL query is slow the first time it runs. I read the execution plan and found that the indexes are not in shared buffers on the first run. So I prewarm the index like this:

SELECT pg_prewarm('public.pk_w_03002_kt_h','buffer');

and check that the index exists in shared buffers:

SELECT c.relname,
       pg_size_pretty(count(*) * 8192) AS buffered,
       round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer, 1) AS buffer_percent,
       round(100.0 * count(*) * 8192 / pg_table_size(c.oid), 1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;

Now I am facing the issue that the cached index sometimes gets evicted. Is it possible to keep the index always in shared buffers? The index size is 60 MB and shared_buffers is configured at 8 GB, yet this still has not solved the slow first-time query.
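As far as I know shared buffers cannot pin a relation; a sketch of the usual mitigation with pg_prewarm's autoprewarm background worker, which records the buffer contents periodically and reloads them after a restart (it does not prevent eviction under memory pressure):

-- postgresql.conf (restart required):
--   shared_preload_libraries = 'pg_prewarm'
--   pg_prewarm.autoprewarm = on
--   pg_prewarm.autoprewarm_interval = 300s

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- manual warm-up after a restart or after the index has been evicted
SELECT pg_prewarm('public.pk_w_03002_kt_h', 'buffer');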
Dolphin (939 rep)
May 20, 2025, 03:51 AM
1 vote
0 answers
38 views
how to make the execution plan time the same as the actual query in PostgreSQL 14
When I use EXPLAIN in PostgreSQL 14, I find that the execution plan reports a much shorter time than the actual query. This is how I generate the execution plan:

explain (analyze true, timing true) long sql...

The EXPLAIN takes 16 s, and the actual query takes 23 s; the timing reported in the plan is also 16 s. Is it possible to make the EXPLAIN time match the actual SQL? (If they differ, the actual run differs from the explained one, which may mislead the user.) Maybe we should lock the explain.
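One note that may explain part of the gap (a sketch; <long sql> is a placeholder for the real statement): EXPLAIN (ANALYZE) executes the query but discards the result set, so the reported 16 s excludes the time spent serializing rows and sending them to the client, which the 23 s wall-clock run includes. Comparing the two with client-side timing in psql makes that visible:

\timing on
EXPLAIN (ANALYZE, BUFFERS, TIMING) <long sql>;   -- server-side execution only, no rows sent to the client
<long sql>;                                      -- execution plus result transfer
EXPLAIN (ANALYZE, TIMING OFF) <long sql>;        -- lower instrumentation overhead, still reports total time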
Dolphin (939 rep)
May 10, 2025, 05:41 AM • Last activity: May 10, 2025, 06:57 AM
0 votes
0 answers
31 views
is it possible to make the CTE parallel in PostgreSQL 14
I have an analytical query in PostgreSQL 14 that uses many CTEs, like this:

with a0 as (select count(*) from ccc0)
, a1 as (select count(*) from ccc1)
, a2 as (select count(*) from ccc2)
, a3 as (select count(*) from ccc3)
, a4 as (select count(*) from ccc4)
, a5 as (select count(*) from ccc5)
, a6 as (select count(*) from ccc6)
, a7 as (select count(*) from ccc7)
, a8 as (select count(*) from ccc8)
, a9 as (select count(*) from ccc9)
, a10 as (select count(*) from ccc10)
, a11 as (select count(*) from ccc11)
, a12 as (select count(*) from ccc12)
, a13 as (select count(*) from ccc13)
, a14 as (select count(*) from ccc14)
, a15 as (select count(*) from ccc15)
, a16 as (select count(*) from ccc16)
, a17 as (select count(*) from ccc17)
, a18 as (select count(*) from ccc18)
, a19 as (select count(*) from ccc19)
, a20 as (select count(*) from ccc20)
, a21 as (select count(*) from ccc21)
, a22 as (select count(*) from ccc22)
, a23 as (select count(*) from ccc23)
select * from a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21,a22,a23;

I read some docs saying that setting max_parallel_workers_per_gather can make the CTEs run in parallel (https://github.com/digoal/blog/blob/master/201903/20190317_17.md), but the https://www.postgresql.org/docs/current/parallel-safety.html docs say:

> The following operations are always parallel restricted:
>
> Scans of common table expressions (CTEs).
>
> Scans of temporary tables.
>
> Scans of foreign tables, unless the foreign data wrapper has an IsForeignScanParallelSafe API that indicates otherwise.
>
> Plan nodes that reference a correlated SubPlan.

Is it possible to make the CTEs run in parallel? Right now each CTE takes more than 7 s, which makes the query very slow. I have tried adding indexes, and the result of each CTE covers more than 100 million rows (the data cannot be filtered any further). If the CTEs run in parallel, the result can be fetched in about 7 s; if not, it takes minutes.
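Two things that might be worth trying, sketched below as session-level settings plus an alternative query shape (assuming spare CPU): on PostgreSQL 12+ single-reference CTEs are usually inlined, and each count(*) can then use a parallel sequential scan if the cost settings allow it; writing the counts as UNION ALL branches gives the planner one independent, parallelizable scan per table instead of a wide cross join of CTE results:

-- make parallel scans cheaper to choose (session level)
SET max_parallel_workers_per_gather = 8;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

-- alternative shape: one row per table instead of one wide cross-joined row
SELECT 'ccc0' AS tbl, count(*) AS cnt FROM ccc0
UNION ALL
SELECT 'ccc1', count(*) FROM ccc1
UNION ALL
SELECT 'ccc2', count(*) FROM ccc2
-- ... repeat for ccc3 .. ccc23
;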
Dolphin (939 rep)
May 10, 2025, 04:06 AM
0 votes
0 answers
19 views
PostgreSQL row_number function with a large result set
I have SQL like this in PostgreSQL 14:

SELECT biz_id, login_time,
       ROW_NUMBER() OVER (PARTITION BY biz_id ORDER BY login_time DESC) AS priority_rn
FROM example

Then I need to get the first row per group in the outer query. Now I am facing the issue that the inner query result is too large (>3,000,000 rows), and it looks like this row_number approach has a performance problem (>4 min to get the result). Is it possible to optimize this query, or do I need to change to another approach?
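A sketch of the usual rewrite for "newest row per group" (the index name is invented): DISTINCT ON with a matching two-column index removes the window function and the outer filter; it is not guaranteed to be faster, but it is normally the first thing to try before more elaborate loose-index-scan emulations:

CREATE INDEX IF NOT EXISTS example_biz_login_idx
    ON example (biz_id, login_time DESC);

SELECT DISTINCT ON (biz_id) biz_id, login_time
FROM   example
ORDER  BY biz_id, login_time DESC;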
Dolphin (939 rep)
May 8, 2025, 09:04 AM
0 votes
0 answers
30 views
Postgres - would multiple triggers for different record types on the same table improve performance?
I have a trigger currently defined like this:

CREATE TRIGGER trig1
AFTER INSERT ON public.posMessage
FOR EACH ROW EXECUTE PROCEDURE public.parsemess();

The procedure parsemess() parses one of the fields (a long text string) into individual fields and inserts them into a second table. The first thing parsemess() does is split the incoming messages into different types; it then parses each type slightly differently depending on the message type. I think it may be possible to split the processing earlier into different processes by having multiple triggers on the same table, each for a given message type. Something like the following:

CREATE TRIGGER tempA
AFTER INSERT ON public.posMessage
FOR EACH ROW
WHEN (new.messageType like '%COW%')
EXECUTE PROCEDURE public.parsemessA();

CREATE TRIGGER tempB
AFTER INSERT ON public.posMessage
FOR EACH ROW
WHEN (new.messageType like '%DOG%')
EXECUTE PROCEDURE public.parsemessB();

CREATE TRIGGER tempC
AFTER INSERT ON public.posMessage
FOR EACH ROW
WHEN (new.messageType like '%CAT%')
EXECUTE PROCEDURE public.parsemessC();

Would this give me any performance benefits? My thinking (probably flawed) is that Postgres would split the processing up into one process for each trigger, making the overall processing a lot faster.
ConanTheGerbil (1303 rep)
Apr 23, 2025, 07:18 PM
0 votes
1 answer
544 views
Convert bytes to a certain unit
What's the function to convert a byte value to a certain unit in PostgreSQL 14? For example:

Byte Value   Convert Unit   Converted Value
----------   ------------   ---------------
1            MB             0.000001
3000000      GB             0.0029296875
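A sketch on PostgreSQL 14: there is no built-in "convert to a specific unit" function, but pg_size_pretty() picks a human-readable unit automatically, and a plain division gives a fixed target unit:

SELECT pg_size_pretty(3000000::bigint);           -- '2930 kB' (binary units)
SELECT 1 / 1000000.0          AS mb;              -- 0.000001  (decimal megabytes)
SELECT 3000000 / 1073741824.0 AS gib;             -- ~0.0028   (binary gibibytes, 1024^3)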
DevDon (53 rep)
Mar 3, 2023, 08:31 AM • Last activity: Apr 17, 2025, 11:03 AM
1 vote
1 answer
64 views
Poor performance bucketing historical data with DISTINCT
I'm running into performance problems with a query that buckets a set of historical records of daily counts into a form that can be rendered easily as a daily graph.

## Context

Essentially, the table in question is a record of daily audit reports in which the number of problems of varying seriousness is reported for a set of location IDs. Each record in the table contains the counts of problems, of several severities, found during the audit at one location on one day.
CREATE TABLE IF NOT EXISTS problem_reports (
    internal_id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    day              timestamp         NOT NULL,
    location_id      varchar(16)       NOT NULL,
    bad              integer DEFAULT 0 NOT NULL,
    awful            integer DEFAULT 0 NOT NULL,
    catastrophic     integer DEFAULT 0 NOT NULL
);
Since audits are not performed at every location on every day, I saved space by only recording rows for days on which an audit was performed. If the number of problems found was zero, I record a row with all-zero counts for that location, to indicate that the number of problems of each severity is now zero as of that date. The output I need from the query is a running daily total of all problems found at all locations, by severity. So, if an audit wasn't performed for a location on a given day, the query needs to "look back" to find the most recent audit for that location and include that in the aggregation. I achieved this using a DISTINCT clause:
WITH days AS (
    SELECT GENERATE_SERIES(
        DATE_TRUNC('day', '2025-03-01'::date),
        DATE_TRUNC('day', '2025-03-07'::date),
        '1 day'::interval
    ) AS day
),
counts AS (
    SELECT DISTINCT ON (days.day, pr.location_id)
        days.day, pr.location_id, pr.bad, pr.awful, pr.catastrophic
    FROM problem_reports pr
        RIGHT JOIN days ON pr.day   Unique  (cost=46930.59..48855.59 rows=40000 width=86) (actual time=0.109..0.120 rows=18 loops=1)                                                                                                                                                                                       |
|        Output: (generate_series(date_trunc('day'::text, ('2025-03-01'::date)::timestamp with time zone), date_trunc('day'::text, ('2025-03-07'::date)::timestamp with time zone), '1 day'::interval)), pr.location_id, pr.bad, pr.awful, pr.catastrophic, pr.day, pr.internal_id            |
|        Buffers: shared hit=1                                                                                                                                                                                                                                                                |
|        ->  Sort  (cost=46930.59..47572.26 rows=256667 width=86) (actual time=0.108..0.111 rows=32 loops=1)                                                                                                                                                                                  |
|              Output: (generate_series(date_trunc('day'::text, ('2025-03-01'::date)::timestamp with time zone), date_trunc('day'::text, ('2025-03-07'::date)::timestamp with time zone), '1 day'::interval)), pr.location_id, pr.bad, pr.awful, pr.catastrophic, pr.day, pr.internal_id      |
|              Sort Key: (generate_series(date_trunc('day'::text, ('2025-03-01'::date)::timestamp with time zone), date_trunc('day'::text, ('2025-03-07'::date)::timestamp with time zone), '1 day'::interval)), pr.location_id, pr.day DESC, pr.internal_id DESC                             |
|              Sort Method: quicksort  Memory: 27kB                                                                                                                                                                                                                                           |
|              Buffers: shared hit=1                                                                                                                                                                                                                                                          |
|              ->  Nested Loop Left Join  (cost=0.00..11584.65 rows=256667 width=86) (actual time=0.048..0.077 rows=32 loops=1)                                                                                                                                                               |
|                    Output: (generate_series(date_trunc('day'::text, ('2025-03-01'::date)::timestamp with time zone), date_trunc('day'::text, ('2025-03-07'::date)::timestamp with time zone), '1 day'::interval)), pr.location_id, pr.bad, pr.awful, pr.catastrophic, pr.day, pr.internal_id|
|                    Join Filter: (pr.day   ProjectSet  (cost=0.00..5.03 rows=1000 width=8) (actual time=0.024..0.027 rows=7 loops=1)                                                                                                                                                                            |
|                          Output: generate_series(date_trunc('day'::text, ('2025-03-01'::date)::timestamp with time zone), date_trunc('day'::text, ('2025-03-07'::date)::timestamp with time zone), '1 day'::interval)                                                                       |
|                          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)                                                                                                                                                                             |
|                    ->  Materialize  (cost=0.00..21.55 rows=770 width=78) (actual time=0.003..0.004 rows=8 loops=7)                                                                                                                                                                          |
|                          Output: pr.location_id, pr.bad, pr.awful, pr.catastrophic, pr.day, pr.internal_id                                                                                                                                                                                  |
|                          Buffers: shared hit=1                                                                                                                                                                                                                                              |
|                          ->  Seq Scan on sto.problem_reports pr  (cost=0.00..17.70 rows=770 width=78) (actual time=0.016..0.019 rows=8 loops=1)                                                                                                                                             |
|                                Output: pr.location_id, pr.bad, pr.awful, pr.catastrophic, pr.day, pr.internal_id                                                                                                                                                                            |
|                                Buffers: shared hit=1                                                                                                                                                                                                                                        |
|Planning Time: 0.186 ms                                                                                                                                                                                                                                                                      |
|Execution Time: 0.188 ms                                                                                                                                                                                                                                                                     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
## The Problem

This works fine for moderate numbers of days and locations, but as the number of those grows, the DISTINCT clause in the CTE starts to perform very poorly. I considered recording a row for every location on every day, to eliminate the need for the DISTINCT "look-back", but that will quickly result in more rows than I'd like to deal with, as the number of locations is expected to grow exponentially. In some cases, I need to query for only a subset of locations and render a graph of that, so recording a preprocessed aggregate of just the daily counts to the DB wouldn't work.

Is there a more efficient way of storing and querying this type of historical data to produce the graph data? Any suggestions appreciated!
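A sketch of an alternative shape that may be worth benchmarking (the index name is invented, everything else comes from the schema above): an index on (location_id, day DESC) plus a LATERAL "latest row" lookup per day and location avoids sorting the whole days x reports product that DISTINCT ON builds:

CREATE INDEX IF NOT EXISTS problem_reports_loc_day_idx
    ON problem_reports (location_id, day DESC);

SELECT d.day,
       sum(pr.bad)          AS bad,
       sum(pr.awful)        AS awful,
       sum(pr.catastrophic) AS catastrophic
FROM   generate_series(timestamp '2025-03-01',
                       timestamp '2025-03-07',
                       interval '1 day') AS d(day)
CROSS  JOIN (SELECT DISTINCT location_id FROM problem_reports) loc
CROSS  JOIN LATERAL (
         SELECT p.bad, p.awful, p.catastrophic
         FROM   problem_reports p
         WHERE  p.location_id = loc.location_id
         AND    p.day <= d.day
         ORDER  BY p.day DESC
         LIMIT  1
       ) pr
GROUP  BY d.day
ORDER  BY d.day;

A location only contributes on days at or after its first audit, which matches the look-back semantics of the original query.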
hampercm (111 rep)
Apr 1, 2025, 06:49 PM • Last activity: Apr 4, 2025, 09:45 PM
1 vote
1 answer
821 views
Create user mapping without password: how to configure authentication
I am trying to create a user mapping in PostgreSQL without a password, but I am encountering an error:

local_db=> select * from employee;
ERROR:  could not connect to server "testmachine02"
DETAIL:  connection to server at "192.168.56.10", port 5432 failed: fe_sendauth: no password supplied

Here is the command that I used to create the user mapping:

CREATE USER MAPPING FOR app_user SERVER testmachine02 OPTIONS (password_required 'false');

I also created a pgpass file under /root/.pgpass with the following entries:

localhost:5432:local_db:app_user:app_user123
192.168.56.10:5432:admin:admin:admin123
192.168.56.10:5432:remote_db:test:test123

Despite these steps, I am still unable to access the table without a password. How can I create a user mapping without a password and access the table?
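Two things worth knowing here, as far as I understand postgres_fdw (the sketch uses the names from the question): a password-less connection is normally only allowed for superusers unless password_required is disabled, and even then the remote side must accept the connection (e.g. trust in its pg_hba.conf) or libpq must find a password - and the ~/.pgpass it reads belongs to the OS account running the local PostgreSQL server (typically postgres), not /root/.pgpass. The simplest reliable setup stores the remote credentials in the mapping itself:

DROP USER MAPPING IF EXISTS FOR app_user SERVER testmachine02;

CREATE USER MAPPING FOR app_user SERVER testmachine02
    OPTIONS (user 'test', password 'test123');   -- credentials taken from the question's .pgpass entry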
Aymen Rahal (11 rep)
Feb 18, 2023, 08:26 PM • Last activity: Mar 11, 2025, 01:02 PM
-1 votes
1 answer
520 views
Index row size 2976 exceeds maximum in PostgreSQL
We have seen some index creations failing with the error below:

ERROR:  index row size 2976 exceeds btree version 4 maximum 2704 for index "idx1"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

Index:

CREATE INDEX idx1 ON table1 USING btree (upper(col1));

Table column length: 4000
pg_column_size for that column: 2931

Can you please suggest how to fix this issue?
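A sketch of the two fixes the HINT points at (index names invented); both support equality lookups on upper(col1) but not LIKE or range scans:

-- btree index on an MD5 hash of the expression
CREATE INDEX idx1_md5 ON table1 (md5(upper(col1)));

-- queries must use the same expression to hit it
SELECT *
FROM   table1
WHERE  md5(upper(col1)) = md5(upper('some value'))
AND    upper(col1) = upper('some value');   -- second predicate guards against md5 collisions

-- alternative: a hash index on the original expression (equality only)
CREATE INDEX idx1_hash ON table1 USING hash (upper(col1));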
Ram (147 rep)
Nov 22, 2023, 08:04 AM • Last activity: Mar 9, 2025, 04:06 PM
1 vote
2 answers
1083 views
What is an acceptable level of BTree index fragmentation in Postgres?
I wonder at which point it makes sense to rebuild a BTree index on relatively large tables (a couple hundred GB). To me, anything above 30% or 40% fragmentation appears to be a good candidate, but I have seen some say even 70% is not too bad for Postgres. I'm getting the fragmentation from the pgstattuple pgstatindex function, if it matters.
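For reference, a sketch of how the numbers can be read together (the index name is a placeholder): pgstatindex also reports avg_leaf_density, and since low density is what actually causes extra page reads, many people trigger a rebuild on density falling well below the fillfactor (default 90 for btree) rather than on leaf_fragmentation alone; REINDEX CONCURRENTLY keeps the table writable during the rebuild:

SELECT avg_leaf_density, leaf_fragmentation
FROM   pgstatindex('public.my_big_index');

REINDEX INDEX CONCURRENTLY public.my_big_index;   -- PostgreSQL 12+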
a1ex07 (9060 rep)
Sep 8, 2023, 10:09 PM • Last activity: Feb 16, 2025, 09:51 PM
2 votes
0 answers
308 views
like_regex vs jsonb_build_object performance
I'm working with Postgres 14.8 and using JSONB to store data. The queries that I'm working with perform filtering and JOINing based on values in this JSONB column named data. We need to perform JOINs to get the data we need. The setup is simple: we have one table that is partitioned based on one of our columns, and then we JOIN rows in different partitions. Each partition has the same number of rows in our tests - around 8 million. We'd like to have one place to INSERT rows, and partitioning improved our performance. I'm using a GIN index like this:

CREATE INDEX ix_tab_data ON public.tab USING gin (data jsonb_path_ops);

And a btree index for pagination:

CREATE INDEX ix_tab_created_timestamp ON public.tab USING btree (created_timestamp DESC);

They're then inherited by the partitions. Most queries look something like this:
SELECT User.data->>'Name' AS User_Name
, User.data->>'Surname' AS User_Surname
, User.data->>'Gender' AS User_Gender
, Address.data->>'StreetName' AS Address_StreetName
 FROM public.tab_user AS User
 JOIN public.tab_address AS Address ON Address.data @> jsonb_build_object('Id', User.data->'AddressId')
 WHERE
  Address.data @> jsonb_build_object('StreetName', 'Test')
 ORDER BY User.created_timestamp DESC
 LIMIT 500
 OFFSET 1000;
Query plan:
Limit  (cost=10000001275.74..10000001829.33 rows=500 width=136) (actual time=193.677..293.630 rows=500 loops=1)
  Buffers: shared hit=106258
  ->  Nested Loop  (cost=10000000168.54..149189116161.91 rows=125713639119 width=136) (actual time=0.333..293.527 rows=1500 loops=1)
        Buffers: shared hit=106258
        ->  Index Scan using tab_user_created_timestamp_idx on tab_user user  (cost=0.43..9720087.72 rows=8556337 width=205) (actual time=0.012..7.544 rows=8728 loops=1)
              Buffers: shared hit=8754
        ->  Bitmap Heap Scan on tab_address address  (cost=168.11..15972.39 rows=14692 width=373) (actual time=0.032..0.032 rows=0 loops=8728)
              Recheck Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @> jsonb_build_object('StreetName', 'Test')))
              Rows Removed by Index Recheck: 0
              Heap Blocks: exact=1502
              Buffers: shared hit=97504
              ->  Bitmap Index Scan on tab_address_data_idx  (cost=0.00..164.44 rows=14692 width=0) (actual time=0.030..0.030 rows=0 loops=8728)
                    Index Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @> jsonb_build_object('StreetName', 'Test')))
                    Buffers: shared hit=96002
Planning:
  Buffers: shared hit=2
Planning Time: 0.175 ms
Execution Time: 293.705 ms
GIN indexes are used and the query plans look fine. From what I saw online, this @> jsonb_build_object method is recommended by most. However, when we ran tests sending many queries, we found they don't perform as fast as we'd like, so we started working on improvements. Since jsonb_build_object creates a new JSONB which we then check for containment, we tried using a json_path instead. The query looked like this:
SELECT User.data->>'Name' AS User_Name
, User.data->>'Surname' AS User_Surname
, User.data->>'Gender' AS User_Gender
, Address.data->>'StreetName' AS Address_StreetName
 FROM public.tab_user AS User
 JOIN public.tab_address AS Address ON Address.data@>jsonb_build_object('Id', User.data->'AddressId')
 WHERE
  Address.data @? '$.StreetName ? (@ == "Test")'
 ORDER BY User.created_timestamp DESC
 LIMIT 500
 OFFSET 1000;
Query plan:
Limit  (cost=10000001273.23..10000001825.58 rows=500 width=136) (actual time=189.903..287.950 rows=500 loops=1)
  Buffers: shared hit=106258
  ->  Nested Loop  (cost=10000000168.54..148874820513.06 rows=125713639119 width=136) (actual time=0.207..287.849 rows=1500 loops=1)
        Buffers: shared hit=106258
        ->  Index Scan using tab_user_created_timestamp_idx on tab_user user  (cost=0.43..9720087.72 rows=8556337 width=205) (actual time=0.007..7.470 rows=8728 loops=1)
              Buffers: shared hit=8754
        ->  Bitmap Heap Scan on tab_address address  (cost=168.11..15935.66 rows=14692 width=373) (actual time=0.031..0.031 rows=0 loops=8728)
              Recheck Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ == "Test")'::jsonpath))
              Rows Removed by Index Recheck: 0
              Heap Blocks: exact=1502
              Buffers: shared hit=97504
              ->  Bitmap Index Scan on tab_address_data_idx  (cost=0.00..164.43 rows=14692 width=0) (actual time=0.030..0.030 rows=0 loops=8728)
                    Index Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ == "Test")'::jsonpath))
                    Buffers: shared hit=96002
Planning:
  Buffers: shared hit=2
Planning Time: 0.141 ms
Execution Time: 288.016 ms
Performance was the same as with jsonb_build_object. We then tried using like_regex instead of == in the json_path, and surprisingly it was the fastest. The current query looks like this:
SELECT User.data->>'Name' AS User_Name
, User.data->>'Surname' AS User_Surname
, User.data->>'Gender' AS User_Gender
, Address.data->>'StreetName' AS Address_StreetName
 FROM public.tab_user AS User
 JOIN public.tab_address AS Address ON Address.data@>jsonb_build_object('Id', User.data->'AddressId')
 WHERE
  Address.data @? '$.StreetName ? (@ like_regex "Test")'
 ORDER BY User.created_timestamp DESC
 LIMIT 500
 OFFSET 1000;
Query plan:
Limit  (cost=10050199374.23..10051799360.17 rows=500 width=136) (actual time=40.426..61.151 rows=500 loops=1)
  Buffers: shared hit=61138
  ->  Nested Loop  (cost=10046999402.34..402290157972457.19 rows=125713639119 width=136) (actual time=0.054..61.049 rows=1500 loops=1)
        Buffers: shared hit=61138
        ->  Index Scan using tab_user_created_timestamp_idx on tab_user user  (cost=0.43..9720087.72 rows=8556337 width=205) (actual time=0.008..6.836 rows=8728 loops=1)
              Buffers: shared hit=8754
        ->  Bitmap Heap Scan on tab_address address  (cost=46999401.91..47015169.46 rows=14692 width=373) (actual time=0.006..0.006 rows=0 loops=8728)
              Recheck Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ like_regex "Test")'::jsonpath))
              Rows Removed by Index Recheck: 1
              Heap Blocks: exact=8744
              Buffers: shared hit=52384
              ->  Bitmap Index Scan on tab_address_data_idx  (cost=0.00..46999398.24 rows=14692 width=0) (actual time=0.003..0.003 rows=1 loops=8728)
                    Index Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ like_regex "Test")'::jsonpath))
                    Buffers: shared hit=43640
Planning:
  Buffers: shared hit=2
Planning Time: 0.171 ms
Execution Time: 61.214 ms
All queries return the same rows, but like_regex is fastest - which is counterintuitive. Around 16% of rows in public.tab_address have the StreetName we're looking for in our queries. The more JSONB filters in the WHERE clause, the bigger the performance difference. We are currently looking for an explanation of this behavior. From what I saw online, regex searches tend to be more resource-intensive. Maybe when combined with the jsonb_path_ops index, Postgres can optimize the path traversal and quickly identify relevant records before applying the regex filter? It seems like query #2 should be faster than a regex search. I guess the performance depends on how the jsonb_path_ops index handles equality checks versus pattern matching. Does anyone have an idea why like_regex is faster than jsonb_build_object and == in a json_path?
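One experiment that might help isolate the cause (a sketch; the index name is invented): a btree expression index on the extracted key gives the planner real statistics and an exact equality access path for StreetName, which the jsonb_path_ops GIN index cannot provide, so it makes a useful baseline when comparing the @>, == and like_regex timings:

CREATE INDEX ix_tab_address_streetname
    ON public.tab_address ((data ->> 'StreetName'));

-- then filter with the same expression so the index is usable:
-- WHERE Address.data ->> 'StreetName' = 'Test'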
MRzeczkowski (21 rep)
Dec 12, 2023, 12:40 PM • Last activity: Feb 9, 2025, 06:02 PM
1 vote
1 answer
1387 views
Moving records from one table to another - a few records at a time
I have been using this query (successfully) to move records from one table to another in a Postgres database:

WITH moved_rows AS (
    DELETE FROM tableB
    RETURNING *
)
INSERT INTO tableA
SELECT * FROM moved_rows;

For relatively small row counts this works fine, but as the number of rows in tableB grows larger I need to move rows at a rate of a few million each time. I tried this:

WITH moved_rows AS (
    DELETE FROM tableB
    RETURNING *
)
INSERT INTO tableA
SELECT * FROM moved_rows LIMIT 100000;

This inserts 100000 records into tableA, but deletes ALL records from tableB. How can I transfer a fixed number of records from one table to another? Obviously I need to ensure that only those rows successfully copied into tableA actually get deleted from tableB. I have no particular selection criteria for exactly which records get moved; I just want to limit the time and resources for each transaction.

UPDATE - I've tried both of these variants, and neither works:

WITH moved_rows AS (
    DELETE FROM tableB LIMIT 100000
    RETURNING *
)
INSERT INTO tableA
SELECT * FROM moved_rows;

Or

WITH moved_rows AS (
    DELETE FROM tableB
    RETURNING * LIMIT 100000
)
INSERT INTO tableA
SELECT * FROM moved_rows;

Both return syntax error at or near "LIMIT".
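A sketch of the batching pattern I'd try here (no assumptions beyond the two table names): DELETE has no LIMIT clause, so select the batch by ctid in a preliminary CTE and delete exactly those rows; only the deleted-and-returned rows are inserted, and the whole statement runs in one transaction:

WITH batch AS (
    SELECT ctid FROM tableB LIMIT 100000
), moved_rows AS (
    DELETE FROM tableB
    WHERE  ctid IN (SELECT ctid FROM batch)
    RETURNING *
)
INSERT INTO tableA
SELECT * FROM moved_rows;

If other sessions modify tableB concurrently, adding FOR UPDATE SKIP LOCKED to the batch subquery is the usual refinement for picking the batch safely.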
ConanTheGerbil (1303 rep)
Feb 28, 2023, 02:45 PM • Last activity: Feb 6, 2025, 10:00 AM
1 vote
1 answer
69 views
Adding a column to a PostgreSQL database caused an indefinite hang
I recently deployed a migration to production that added a new column to a table. The migration was very simple and should have only locked the table for milliseconds: ALTER TABLE "table_1" ADD COLUMN "last_status_update" TIMESTAMP(3); However, it seemed to hang indefinitely when deployed, causing application downtime. During the downtime, I ran the following command and exported the results:
SELECT pid, age(clock_timestamp(), query_start), state, wait_event, query 
FROM pg_stat_activity 
WHERE state != 'idle' 
ORDER BY query_start;
(Screenshot: output of pg_stat_activity.) Unfortunately, I didn't think to do something similar with pg_locks. When I manually killed the transaction and ran the same command by manually connecting to the database, it resolved almost immediately without issue. I can't figure out why the ALTER TABLE seemed to hang. I thought it might be a developer holding an open transaction as described in [this answer](https://dba.stackexchange.com/a/133047), but I think we would have seen that idle in transaction query in pg_stat_activity if that was the case. The table in question, table_1, is small, ~11k rows. There are no constraints besides a PRIMARY KEY constraint, no indexes besides the primary key, and no virtual columns or triggers. In case it's relevant, we're using Prisma as an ORM. I considered that maybe Prisma failed to release the [advisory lock that it holds during migrations](https://www.prisma.io/docs/orm/prisma-migrate/workflows/development-and-production#advisory-locking), but I believe advisory would have shown up as the wait_event instead of relation. Anyone know what might be going on? Thanks in advance :)
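For the next occurrence, a sketch of a lock-oriented view worth capturing alongside pg_stat_activity: pg_blocking_pids() names the sessions holding whatever the ALTER TABLE is queued behind, and an idle-in-transaction holder shows up here even though its query column looks finished:

SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       query
FROM   pg_stat_activity
WHERE  cardinality(pg_blocking_pids(pid)) > 0;

Setting a lock_timeout for the migration session (e.g. SET lock_timeout = '5s') is a common guard, so a blocked ALTER TABLE fails fast instead of queueing behind a long-lived lock and blocking every later query on the table.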
mdole (11 rep)
Jan 27, 2025, 10:26 AM • Last activity: Jan 28, 2025, 02:00 PM
0 votes
1 answer
217 views
PostgreSQL throwing "53100: could not extend file "base/xxxxx/xxxxx" with FileFallocate()" despite ample space in the volume
I recently upgraded PostgreSQL 14 to PostgreSQL 16 using the --clone method. Both data directories (old and new cluster) were on the same volume. After a few days, I started receiving the error below.

> 53100: could not extend file "base/160560/t64_168297303" with FileFallocate(): No space left on device

At the times the errors occurred, I checked the free space on the volume (df -hT) and found that more than 30% of the space was available. In all cases, the query that produces this error is a CREATE TEMP TABLE query. I also checked the free inodes during the same period (df -i) and found that there were enough available. I also removed the old data directory, but that did not resolve the error. I have one primary server and one replica server in async replication using Patroni.

PostgreSQL version: PostgreSQL 16.6 (Ubuntu 16.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

I was able to manually create a large file of 40 GB using fallocate successfully, which pushed the used space on the volume up to 96%. It is an EBS volume attached to an EC2 instance, storing nothing but the PG 16 data directory, which has a symbolic link for pg_wal pointing to a different volume. I kept the volume at 96% full and waited through a period where the error usually occurs, but it didn't happen. One thing to mention: after removing the old PG 14 data directory, the error has occurred only once. Earlier, there were many occurrences, but not continuous ones.
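A monitoring sketch that may help correlate the spikes (stock system views only): CREATE TEMP TABLE writes its files under base/<database-oid>/, which pg_database_size() includes, while temp_files/temp_bytes in pg_stat_database count sort/hash spill files; sampling both around the error window shows what is momentarily filling the volume:

SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS db_size,        -- includes temp-table relation files
       temp_files,
       pg_size_pretty(temp_bytes)                AS spill_written
FROM   pg_stat_database
WHERE  datname IS NOT NULL
ORDER  BY pg_database_size(datname) DESC;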
UdIt Solanki (135 rep)
Jan 9, 2025, 06:23 AM • Last activity: Jan 10, 2025, 09:34 AM