
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
137 views
PostGIS distance filter does not work as expected
I have an issue with a simple PostGIS query. Here is the query:

```
SELECT id, location,
       ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) AS distance
FROM pins
WHERE ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) > 6008000.1836031
ORDER BY distance ASC, id ASC
LIMIT 11;
```

And here is the result:

```
 id  | location                                           | distance
-----+----------------------------------------------------+------------------
  58 | 0101000020E6100000E253B4CB61C1E13FB7ED0CC45974EF3F | 6008000.1836031
 162 | 0101000020E6100000DE133C027D95E03F3C15FD84A6BAEF3F | 6008157.46639754
 186 | 0101000020E6100000D339D99403BFE33F541C1E242AD5EE3F | 6008257.4084038
  55 | 0101000020E6100000B8042A79A500E63F775E82BA1908EE3F | 6008878.66960698
  98 | 0101000020E6100000CD48F884CD1BED3FAD51BA15F017EC3F | 6008923.81556693
  56 | 0101000020E6100000EECE76EEFE68E63F1681C811377DED3F | 6010325.52640522
 195 | 0101000020E6100000EB87B04B4DC0E33F0AEAE22BECADED3F | 6012118.48976013
 100 | 0101000020E6100000603039EEF27FD83FE70E98797AAEEF3F | 6012318.05377066
 108 | 0101000020E610000045BA52BB3EA0E23F4B96A28081D5ED3F | 6012631.63019378
 117 | 0101000020E61000003ABE6BC7FAD9D43F4D1CF4D130EEEF3F | 6013175.7742802
  91 | 0101000020E610000093B62A082965ED3F911CF0DA8FBFEA3F | 6013178.70288832
```

For some reason the row with id = 58 is returned, with a distance of 6008000.1836031, even though I used the WHERE filter `ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) > 6008000.1836031`. I guess PostGIS calculates the distance with greater precision than what is displayed? How can I avoid this situation?
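The displayed distance is the double-precision value rounded for output; the WHERE clause recomputes the full value, which can be fractionally above the literal copied from the printed result. A minimal sketch of one workaround, computing the distance once in a subquery and adding a small tolerance (the epsilon value is an assumption, not from the question):

```
-- Sketch: compare against the cutoff plus a tiny epsilon so a row whose
-- distance merely *prints* as the cutoff is not re-admitted.
SELECT id, location, distance
FROM (
    SELECT id, location,
           ST_Distance_Sphere(location,
               ST_MakePoint(21.012228699999998, 52.2296756)) AS distance
    FROM pins
) AS d
WHERE distance > 6008000.1836031 + 1e-4
ORDER BY distance ASC, id ASC
LIMIT 11;
```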
user606521 (1415 rep)
Aug 25, 2016, 08:49 AM • Last activity: Aug 4, 2025, 11:05 AM
5 votes
2 answers
2354 views
List PostgreSQL extensions per database
I have a Postgres 9.5 server with 30 databases. I would like to find out which database uses which extension(s) in one go, preferably in a query that is usable in pgAdmin. I know that

```
SELECT * FROM pg_extension
```

gives me the extensions for the database I am connected to. I also know that

```
SELECT * FROM pg_available_extensions
```

lists, well, all available extensions. I also know that

```
SELECT * FROM pg_database
```

lists all existing databases. **Question** How can I create a list that includes the database name and the extensions used in each database?
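Since pg_extension is per-database, one common approach is to loop over the databases with dblink. A minimal sketch, assuming the dblink extension is installed and the connecting role is allowed into every database without extra credentials:

```
CREATE EXTENSION IF NOT EXISTS dblink;

-- Query pg_extension in each connectable database and collect the results.
SELECT d.datname, e.extname
FROM pg_database d
CROSS JOIN LATERAL dblink(
         'dbname=' || d.datname,
         'SELECT extname FROM pg_extension'
     ) AS e(extname name)
WHERE d.datallowconn
ORDER BY d.datname, e.extname;
```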
copossum (151 rep)
May 17, 2021, 07:36 AM • Last activity: Aug 1, 2025, 04:08 PM
0 votes
1 answer
180 views
PostgreSQL has more DB connections on the staging server than on the production server
We have PostgreSQL 9.5 backing a Confluence application. The max connection setting is the same in staging and in production, but staging uses more connections than production: staging has 170 connections to the DB while production has 105, even though staging is used very rarely. Of the 170 staging connections, 169 are idle. Can anybody explain this behavior, where the heavily used DB has fewer connections in use and the rarely used application has more?
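Idle sessions usually come from an application-side connection pool holding connections open; Confluence's pool size is configured in the application, not in PostgreSQL. A quick way to see who owns the idle sessions:

```
-- Break sessions down by database, user, application and state to find
-- the client holding the 169 idle connections.
SELECT datname, usename, application_name, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state
ORDER BY count(*) DESC;
```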
Debasish Sahoo (1 rep)
Dec 26, 2018, 03:25 PM • Last activity: Jun 30, 2025, 01:03 PM
2 votes
1 answer
1186 views
Does VACUUM FULL ANALYZE on a partition lock the main table in PostgreSQL?
I'm using PostgreSQL 9.5 and pg_partman for partitioning tables. There is a table that has 30 partitions, and the volume of one of them is much higher than the others, so I need to run VACUUM FULL on that partition, but I don't want to lock the main table. Is that possible?
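With pg_partman on 9.5 the partitions are ordinary inheritance children, and VACUUM FULL takes its ACCESS EXCLUSIVE lock only on the table it names. A sketch under that assumption (the partition name is hypothetical); note that queries scanning the parent will still block when they reach the locked child:

```
-- Locks only this child table, not the parent or sibling partitions.
VACUUM FULL ANALYZE my_table_p2018_12;  -- hypothetical partition name
```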
Alireza (123 rep)
Dec 22, 2018, 12:22 PM • Last activity: Jun 19, 2025, 12:04 PM
0 votes
1 answer
4979 views
Dropped tables but space not reclaimed in Postgres 12
I upgraded PostgreSQL 9.5 to PostgreSQL 12.4 a few days ago using the pg_upgrade utility with the link (-k) option, so I now have two data directories: the old one (v9.5) and the currently running one (v12.4). Yesterday I dropped two tables of 700GB and 300GB. Connecting with psql, I can see that the size of the database whose tables were dropped has decreased (with \l+), but what worries me is that only ~50GB has been freed from the storage partition. I ran vacuumdb on that database, with no luck. I checked for deleted-but-open files at the OS level using lsof, but there are none. Note: I have not yet deleted the old data directory (v9.5); I am not sure whether deleting it will impact my running Postgres 12, since I used the 'link' option in pg_upgrade. Looking for a solution.
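A likely explanation: pg_upgrade with --link hard-links the data files, so a file's blocks are returned to the OS only when both directory entries (old and new cluster) are gone; space from tables that predate the upgrade stays pinned by the old directory until it is removed. A sketch to confirm how much space the running cluster itself still accounts for:

```
-- List the largest relations still present in the current database.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```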
erTugRul (153 rep)
Nov 25, 2020, 01:31 AM • Last activity: Jun 9, 2025, 06:09 AM
5 votes
2 answers
623 views
PostgreSQL - How does a multicolumn B-Tree index work with ORDER BY on the 1st column and an IN lookup on the 2nd?
I created such a table (similar to the example from http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability ):

```
CREATE TABLE scale_data (
    section NUMERIC NOT NULL,
    id1     NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
    id2     NUMERIC NOT NULL  -- a kind of Type
);
```

Populate it with:

```
INSERT INTO scale_data
SELECT sections.sections,
       sections.sections*10000 + gen.gen,
       CEIL(RANDOM()*100)
FROM GENERATE_SERIES(1, 300) sections,
     GENERATE_SERIES(1, 90000) gen
WHERE gen ...
```

Explain analyze #1 (a single value in IN):

```
-> Index Only Scan Backward using id1_id2_idx on scale_data (cost=0.56..311588.74 rows=132353 width=11) (actual time=0.045..5.060 rows=500 loops=1)
     Index Cond: (id2 = '50'::numeric)
     Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5.177 ms
```

Select #2 (more values in IN; the plan has changed):

```
select id2 from scale_data
where id2 in (50, 52)
order by id1 desc
limit 500
```

Explain analyze #2:

```
Limit (cost=0.56..857.20 rows=500 width=11) (actual time=0.061..8.703 rows=500 loops=1)
  -> Index Only Scan Backward using id1_id2_idx on scale_data (cost=0.56..445780.74 rows=260190 width=11) (actual time=0.059..8.648 rows=500 loops=1)
       Filter: (id2 = ANY ('{50,52}'::numeric[]))
       Rows Removed by Filter: 25030
       Heap Fetches: 0
Planning time: 0.153 ms
Execution time: 8.771 ms
```

Why do the plans differ? Why does #1 show an **Index Cond** while #2 shows a **Filter** and the number of index cells scanned? Doesn't SQL #1 traverse the index the same way the explain for SQL #2 shows? On the real/production DB, #2 works much slower, even though searching by each of the two keys separately is fast. PG 9.5.
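A well-known rewrite for the multi-value case is to fetch the top N rows per key and merge them, so each branch gets a real index condition instead of a filter. A minimal sketch, assuming an index leading on (id2, id1) exists (the index in the plans above appears to lead on id1):

```
-- Hypothetical supporting index for this rewrite:
-- CREATE INDEX id2_id1_idx ON scale_data (id2, id1 DESC);

SELECT t.id2, t.id1
FROM unnest(ARRAY[50, 52]::numeric[]) AS v(id2)
CROSS JOIN LATERAL (
    SELECT id1, id2
    FROM scale_data
    WHERE id2 = v.id2        -- index condition per value
    ORDER BY id1 DESC
    LIMIT 500
) t
ORDER BY t.id1 DESC
LIMIT 500;
```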
ALZ (171 rep)
May 25, 2017, 03:13 PM • Last activity: Jun 1, 2025, 07:04 AM
9 votes
1 answer
2004 views
AWS RDS Postgres massive disk use, small tables
I can't figure out why our AWS Postgres server has consumed all of its space. We just had to increase the storage allocated to it, but can't find any hint from Postgres that it's using that much space. Amazon says that we've eaten up about 60GB in the last 2 weeks; Postgres says our whole DB is barely over 5GB. The DB operations are predominantly INSERT and SELECT. How can I track down and reclaim our storage? Here's the output of some size commands in psql:

```
select pg_size_pretty(pg_database_size('my_db')); -- 3730 MB

SELECT pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind <> 'i'
  AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```

Gives

```
1540 MB
1286 MB
 235 MB
 191 MB
```

Curiously the autovacuumer has never triggered on any of the tables (I'm assuming this is because we rarely delete a row?). The number of dead rows is quite low relative to table size (these tables have row counts in the hundreds of thousands or millions):

```
SELECT vacuum_count, autovacuum_count, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 5;

0 0 161
0 0 119
1 0 76
0 0 25
0 0 11
```

Predominantly the operations on the database are INSERT and SELECT, with some things getting UPDATEs, and very rarely a DELETE. We use a lot of JSONB. **UPDATE** \l+ shows the same as the other queries: 3730 MB.
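pg_database_size counts only the data files of one database, while the instance also stores WAL, server logs, and temporary files. Summing all databases is the first sanity check before blaming tables:

```
-- Compare the sum of all database sizes against what RDS reports;
-- the remainder is WAL, logs, temp files, or free space on the volume.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```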
ChrisJ (621 rep)
Aug 17, 2018, 01:39 AM • Last activity: May 31, 2025, 06:07 AM
32 votes
3 answers
27192 views
How to get the ID of the conflicting row in upsert?
I have a table `tag` with 2 columns: `id` (uuid) and `name` (text). I now want to insert a new tag into the table, but if the tag already exists, I want to simply get the `id` of the existing record. I assumed I could just use `ON CONFLICT DO NOTHING` in combination with `RETURNING "id"`:

```
INSERT INTO "tag" ("name")
VALUES ('foo')
ON CONFLICT DO NOTHING
RETURNING "id";
```

But this returns an empty result set, if the tag with the name "foo" already exists. I then changed the query to use a noop `DO UPDATE` clause:

```
INSERT INTO "tag" ("name")
VALUES ('foo')
ON CONFLICT ("name") DO UPDATE
SET "name" = 'foo'
RETURNING "id";
```

This works as intended, but it is somewhat confusing, because I'm just setting the name to the already existing value. Is this the way to go about this problem, or is there a simpler approach I'm missing?
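A commonly cited alternative that avoids the dummy update is to fall back to a plain SELECT when the insert conflicts. A sketch (there is still a small race window if the row is deleted between the two branches):

```
WITH ins AS (
    INSERT INTO "tag" ("name")
    VALUES ('foo')
    ON CONFLICT ("name") DO NOTHING
    RETURNING "id"
)
SELECT "id" FROM ins                            -- freshly inserted row ...
UNION ALL
SELECT "id" FROM "tag" WHERE "name" = 'foo'     -- ... or the existing one
LIMIT 1;
```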
Oliver Salzburg (422 rep)
Feb 16, 2016, 03:59 PM • Last activity: May 27, 2025, 03:34 PM
0 votes
1 answer
370 views
pglogical library
I would like to try pglogical to set up an archive server from a live database, filtering on INSERT and UPDATE statements (DELETE statements will not be replicated to the archive server). I followed the official documentation, but when I set shared_preload_libraries = 'pglogical' in my configuration file (postgresql.conf), the server would not start. There is no pglogical file in the ./usr/pgsql-9.6/lib/ directory. How do I set up all the prerequisites so the server starts properly (with v9.5 and v9.6)? I am using PostgreSQL v9.5 and v9.6 for a POC on CentOS 7. Many thanks.
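shared_preload_libraries only names a library that must already exist on disk, so the missing file under ./usr/pgsql-9.6/lib/ suggests the pglogical package is simply not installed for that server version; it ships as a separate package, not with PostgreSQL itself. After installing it, a quick check before touching postgresql.conf:

```
-- If the package is installed, the extension is visible here even
-- before shared_preload_libraries is set and the server restarted.
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'pglogical';
```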
Mika (361 rep)
Mar 4, 2018, 04:09 PM • Last activity: May 26, 2025, 03:03 PM
0 votes
1 answer
1255 views
Default Postgres privileges not showing on \ddp
As far as I understand, there is a set of default Postgres public privileges. I created a user and granted him a privilege using:
CREATE USER foo WITH PASSWORD 'foo';
\connect bar;
GRANT SELECT ON ALL TABLES IN SCHEMA public to foo;
and I can connect to the DB with no issues. Then I thought I must have a connection privilege in the default privilege set, since I didn't grant CONNECT on the database explicitly. When I run \ddp I get 0 rows, and as far as I understood, that is the way of checking default privileges per [this page](https://www.postgresql.org/docs/9.0/sql-alterdefaultprivileges.html) . Can someone help me understand how connecting to the DB is permitted when I haven't explicitly allowed it?
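CONNECT is governed by the database-level ACL rather than by default privileges; \ddp lists only entries created with ALTER DEFAULT PRIVILEGES, so 0 rows is expected. When a database's ACL is NULL, the built-in default applies, which grants CONNECT and TEMPORARY to PUBLIC. A quick way to inspect it:

```
-- A NULL datacl means the hard-wired default: CONNECT and TEMP for PUBLIC.
SELECT datname, datacl
FROM pg_database
WHERE datname = 'bar';
```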
Dusan Gligoric (51 rep)
Apr 16, 2020, 01:43 PM • Last activity: May 3, 2025, 06:09 PM
0 votes
1 answer
5986 views
PostgreSQL: permission denied for schema for a user created from the master user
I'm trying to write a function that will eventually rotate users. Currently I have the code below, which works and creates a new user using the prior user; the new user ultimately has the same rights as the master user, i.e., it can do everything.

```
select mysch.dblink('dbname=mydb user=themasteruser password=abc123 connect_timeout=200000 host=localhost',
    'CREATE USER newuname WITH PASSWORD ''pass1'' CREATEDB CREATEROLE;GRANT rds_superuser TO newuname;');

DROP SERVER IF EXISTS fs_link_b CASCADE;

CREATE SERVER fs_link_b FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'mydb', connect_timeout '200000');

CREATE USER MAPPING FOR newuname SERVER fs_link_b
    OPTIONS ("user" 'newuname', password 'pass1');

ALTER SERVER fs_link_b OWNER TO newuname;
```

However, when I connect with the newly created user and try to run the query below:

```
select * from mysch.dblink('link_b', 'select usename from PG_USER limit 1;') as t(uu text);
```

I get this error:

> SQL Error: ERROR: permission denied for schema mysch

If I created the user based on a master user, shouldn't it have all the rights of the master user, as created above? Or do I still need to issue a bunch of individual GRANTs?
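On RDS, rds_superuser is a powerful role but not a true superuser, so ordinary schema and object privileges still apply to the new user. A hedged sketch of the grants that would typically be missing (run as the owner of mysch or the master user):

```
-- Allow the new user to resolve names in the schema and call dblink there.
GRANT USAGE ON SCHEMA mysch TO newuname;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA mysch TO newuname;
```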
user1013388 (111 rep)
Apr 1, 2020, 03:55 AM • Last activity: Apr 16, 2025, 01:06 AM
0 votes
3 answers
346 views
Postgres Specific table access
We have a requirement in our Postgres database: we want to give a particular set of users permissions on specific tables. We also have Airflow in our environment, which syncs tables. Sometimes new columns are added to a table, so we have to drop and recreate it, at which point the users' access to that specific table is gone. Access to a specific table is granted through GRANT. Can you suggest a way to give specific table access that remains if the table is dropped and recreated?
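One way to make access survive a drop-and-recreate is to grant to a group role and declare default privileges for the role that recreates the tables, so each new table picks up the grant automatically. A sketch with hypothetical role names:

```
-- "airflow_user" (the role that recreates tables), "readers" and
-- "some_end_user" are hypothetical names; adjust to your environment.
CREATE ROLE readers;
GRANT readers TO some_end_user;

ALTER DEFAULT PRIVILEGES FOR ROLE airflow_user IN SCHEMA public
    GRANT SELECT ON TABLES TO readers;
```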
Jatin Bisht (3 rep)
Dec 3, 2020, 06:42 AM • Last activity: Jan 25, 2025, 12:04 PM
0 votes
1 answer
705 views
How to calculate the quantity of WAL in a DB without archiving?
I would like to find out the quantity of WAL files used by a PostgreSQL database in the last 24 hours. I was assigned to set up log shipping, but before I do, I want to know how many WAL files are produced in 24 hours, in order to predict how much space I need on external storage. Unfortunately pgmetrics doesn't have this information. Do you know how to get it?

```
WAL Files:
WAL Archiving? no
WAL Files:     5
+--------------------+--------------+
| Setting            | Value        |
+--------------------+--------------+
| wal_level          | minimal      |
| archive_timeout    | 0            |
| wal_compression    | off          |
| max_wal_size       | 64 (1.0 GiB) |
| min_wal_size       | 5 (80 MiB)   |
| checkpoint_timeout | 300          |
| full_page_writes   | on           |
| wal_keep_segments  | 0            |
+--------------------+--------------+
```
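On 9.x the WAL insert position can be sampled and diffed, which measures WAL volume directly instead of counting recycled segment files. Note that wal_level = minimal produces less WAL than the archive/replica level that log shipping requires, so it is worth re-measuring after changing it. A sketch (the LSN literals are hypothetical placeholders for values captured 24 hours apart):

```
-- Capture now, and again 24 hours later:
SELECT pg_current_xlog_location();

-- Then compute the bytes of WAL generated between the two samples:
SELECT pg_size_pretty(
    pg_xlog_location_diff('2A/DEADBEEF', '29/12345678')  -- hypothetical LSNs
);
```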
KUE (25 rep)
May 29, 2019, 12:22 PM • Last activity: Dec 30, 2024, 12:00 PM
2 votes
1 answer
498 views
How do I insert a record and put the ID into an existing record in another table?
I want to insert a record in one table and save the resulting ID in a record I already have in another table. Which sounds easy. Simplistic example:

```
create table aaa (id serial primary key, foo int);
create table foos (id serial primary key, foo int, aaa_id int);
```

Imagine foos has rows which I want to insert into aaa, but I want a record of aaa.id stored in foos.aaa_id. Here's some code that definitely does not work:

```
with rows as (
    insert into aaa (foo)
    select foo from foos
    returning foos.id as foos_id, aaa.id as aaa_id
)
update foos
set aaa_id = (select aaa_id from rows where foos_id = foos.id);
```

I can't find a way of getting the RETURNING clause to return anything other than records from the table it inserted into. Surely there must be a simple way to do this?
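RETURNING can only reference columns of the rows actually inserted, so the usual workaround is to return a source value that identifies the row and join back on it. A sketch, assuming foo values are unique within foos (if they are not, this mapping is ambiguous):

```
WITH ins AS (
    INSERT INTO aaa (foo)
    SELECT foo FROM foos
    RETURNING id, foo
)
UPDATE foos f
SET aaa_id = ins.id
FROM ins
WHERE f.foo = ins.foo;
```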
Andy Jones (123 rep)
Jul 8, 2016, 11:18 AM • Last activity: Dec 21, 2024, 07:37 AM
2 votes
2 answers
487 views
How to select blocks with sequential data and aggregate the ids
I have the following table:

```
id(int) startDate(timestamp)          endDate(timestamp)            plan_id(int) planned(bool) machine(int)
--------------------------------------------------------------------------------------------------------------
2005    '2019-01-16 08:29:24.872736'  '2019-01-16 08:30:23.529706'  34           true          6
2004    '2019-01-16 08:19:28.011148'  '2019-01-16 08:29:22.680828'  34           true          6
2003    '2019-01-16 08:18:27.074312'  '2019-01-16 08:19:25.753475'  34           true          6
2002    '2019-01-16 08:08:30.206288'  '2019-01-16 08:18:24.856308'  34           true          6
2001    '2019-01-16 08:07:29.163124'  '2019-01-16 08:08:27.949013'  34           true          6
2000    '2019-01-16 07:59:03.221309'  '2019-01-16 08:00:14.654391'  null         false         7
1999    '2019-01-16 08:00:00.986367'  '2019-01-16 08:00:03.221309'  null         false         6
1998    '2019-01-16 07:57:30.711044'  '2019-01-16 07:59:58.778444'  null         false         6
1997    '2019-01-16 07:56:32.466508'  '2019-01-16 07:57:28.489287'  null         false         6
1996    '2019-01-16 07:50:06.887349'  '2019-01-16 07:56:30.237725'  null         false         6
1995    '2019-01-16 07:46:34.327582'  '2019-01-16 07:50:04.619592'  33           true          6
1994    '2019-01-16 07:45:33.813483'  '2019-01-16 07:46:32.014849'  33           true          6
1993    '2019-01-16 07:24:39.267365'  '2019-01-16 07:39:23.786911'  null         false         6
1992    '2019-01-16 07:23:39.646218'  '2019-01-16 07:24:37.093414'  null         false         6
1991    '2019-01-16 07:13:41.166337'  '2019-01-16 07:23:37.403375'  null         false         6
1990    '2019-01-16 07:12:39.961234'  '2019-01-16 07:13:38.907838'  null         false         6
1989    '2019-01-16 07:10:46.984236'  '2019-01-16 07:12:37.647108'  null         false         6
1988    '2019-01-15 17:05:59.832834'  '2019-01-15 17:08:21.603931'  31           true          6
1987    '2019-01-15 17:04:59.567046'  '2019-01-15 17:05:57.565188'  31           true          6
1986    '2019-01-15 17:00:01.411266'  '2019-01-15 17:10:57.255158'  31           true          7
```

I have to select the IDs of the blocks of unplanned records for a specific machine. I have been trying window functions, but unfortunately I couldn't work out the logic. The problem here is that, since we have different machines, we cannot rely on sequential IDs, only on the fact that the endDate of one row in a sequence is very close to the next row's startDate (it is OK to set a tolerance constant, e.g. 3 seconds). I would like a query whose result is: the min startDate, the max endDate, and the IDs of each block. For this sample with machine = 6, it would be:

```
blockStartDate                blockEndDate                  ids
-------------------------------------------------------------------------------
"2019-01-16 07:50:06.887349"  "2019-01-16 08:00:03.221309"  [1999,1998,1997,1996]
"2019-01-16 07:10:46.984236"  "2019-01-16 07:39:23.786911"  [1989,1990,1991,1992,1993]
```

Note that the answer in this case has sequential IDs, but this is not always the case. I am working on providing real data where 2 machines are producing data at the same time and the IDs become useless.
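This is the classic gaps-and-islands shape: flag a row as starting a new block when the gap to the previous row's endDate exceeds the tolerance, turn the flags into block numbers with a running sum, then aggregate. A sketch, with "readings" as a hypothetical name for the table above:

```
SELECT min(startDate)                   AS blockStartDate,
       max(endDate)                     AS blockEndDate,
       array_agg(id ORDER BY startDate) AS ids
FROM (
    SELECT *,
           sum(new_block) OVER (ORDER BY startDate) AS block_no
    FROM (
        SELECT *,
               CASE
                   WHEN lag(endDate) OVER (ORDER BY startDate) IS NULL
                     OR startDate - lag(endDate) OVER (ORDER BY startDate)
                        > interval '3 seconds'
                   THEN 1 ELSE 0
               END AS new_block
        FROM readings
        WHERE machine = 6
          AND NOT planned
    ) flagged
) numbered
GROUP BY block_no
ORDER BY blockStartDate;
```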
Attyla Fellipe (23 rep)
Jan 17, 2019, 02:00 PM • Last activity: Nov 14, 2024, 01:44 AM
3 votes
2 answers
2618 views
How do I set the isolation level for a SELECT statement?
Using PostgreSQL 9.5 I would like to make sure that when I run a SELECT statement I get consistent results, i.e. if the table is modified while the SELECT is running, I want to get the state that was valid when the SELECT started. Basically, it's like: Take a snapshot of a table, and don't allow updates to the table while the snapshot is being taken, and then deliver the snapshot, while updates are allowed again. If I understood isolation levels correctly, REPEATABLE READ is what I'm looking for, isn't it? Now my question is: How can I run a SELECT with REPEATABLE READ isolation level? Or am I missing something, and my approach is wrong? How would I solve this?
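Worth noting first: in PostgreSQL a single statement always runs against one snapshot, even under the default READ COMMITTED level, so one SELECT already gets a consistent view without blocking writers. REPEATABLE READ matters when several statements must share that snapshot; a sketch of setting it per transaction:

```
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM my_table;   -- "my_table" is hypothetical; every statement
SELECT * FROM my_table;   -- in this transaction sees the same snapshot
COMMIT;
```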
Golo Roden (422 rep)
Feb 4, 2016, 11:44 AM • Last activity: Aug 18, 2024, 02:04 PM
1 votes
1 answer
1509 views
How to call lo_unlink(lo) in batches
In **Postgres 9.5** I am trying to reproduce the Postgres tool vacuumlo to get rid of unreferenced large objects and free disk space. References are stored in **text**-type columns by Hibernate, so using the tool as-is is not an option. Looking at the vacuumlo sources (vacuumlo.c) I created a *vacuum_l* table like so:

```
CREATE TABLE vacuum_l AS
SELECT oid AS lo, false AS deleted
FROM pg_largeobject_metadata;
```

and proceeded to delete all rows from the *vacuum_l* table whose OIDs are referenced by user tables in my database. Now it is safe to call *lo_unlink(lo)* for all items remaining in the *vacuum_l* table. Unfortunately the table is very big and, as reported in the vacuumlo.c source code:

> We don't want to run each delete as an individual transaction, because the commit overhead would be high. However, since 9.0 the backend will acquire a lock per deleted LO, so deleting too many LOs per transaction risks running out of room in the shared-memory lock table. Accordingly, we delete up to transaction_limit LOs per transaction.

So far I tried to unlink all OIDs with this SQL function:

```
CREATE OR REPLACE FUNCTION unlinkOrphanedLobs() RETURNS void AS
$BODY$
DECLARE
    count_variable int;
BEGIN
    LOOP
        SELECT COUNT(*) INTO count_variable FROM vacuum_l WHERE deleted = false;
        EXIT WHEN count_variable ...
```

but it fails with:

> ERROR: out of shared memory
>
> HINT: You might need to increase max_locks_per_transaction.

Why is clearly explained by the code comments quoted above: this function won't close the transaction at the end of each loop iteration. What would be the correct way to do it? Thanks for any help.
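In 9.5 a plpgsql function cannot commit mid-execution, so the transaction boundary has to live in the client: a sketch of a function that unlinks one bounded batch per call, with the caller invoking it repeatedly (each call in its own transaction, e.g. under autocommit) until it returns 0:

```
CREATE OR REPLACE FUNCTION unlink_orphaned_lobs_batch(batch_size int DEFAULT 1000)
RETURNS int AS
$BODY$
DECLARE
    r record;
    n int := 0;
BEGIN
    -- At most batch_size large objects are unlinked, so at most that
    -- many lock-table entries are taken per transaction.
    FOR r IN SELECT lo FROM vacuum_l WHERE NOT deleted LIMIT batch_size LOOP
        PERFORM lo_unlink(r.lo);
        UPDATE vacuum_l SET deleted = true WHERE lo = r.lo;
        n := n + 1;
    END LOOP;
    RETURN n;  -- caller repeats "SELECT unlink_orphaned_lobs_batch();" until 0
END;
$BODY$ LANGUAGE plpgsql;
```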
Luky (121 rep)
Mar 22, 2021, 04:43 PM • Last activity: Jun 2, 2024, 09:07 PM
5 votes
2 answers
30010 views
How do I select arrays that are not empty?
Why is this so tricky? What is the token set to, that it isn't equal to null nor an empty string?

```
SELECT lexemes FROM ts_debug('This is a title') WHERE alias = 'asciiword';
 lexemes
---------
 {}
 {}
 {}
 {titl}
(4 rows)
```

OK, so I want to get rid of {}:

```
SELECT lexemes FROM ts_debug('This is a title')
WHERE alias = 'asciiword'
  AND lexemes <> '{}'
  AND lexemes <> ARRAY[]::text[]
  AND lexemes IS NOT NULL
  AND lexemes <> ARRAY[' ']
  AND lexemes <> ARRAY[null]::text[];
```

**I knew most of these wouldn't work**, but I'm totally confused why `<> '{}'` wouldn't work, nor `<> ARRAY[]::text[]`. How do I filter this out?
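Testing the array's length sidesteps the comparison question entirely; a sketch using cardinality (available since 9.4):

```
SELECT lexemes
FROM ts_debug('This is a title')
WHERE alias = 'asciiword'
  AND lexemes IS NOT NULL
  AND cardinality(lexemes) > 0;  -- array_length(lexemes, 1) >= 1 also works
```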
Evan Carroll (65502 rep)
Sep 23, 2017, 04:11 AM • Last activity: May 29, 2024, 01:32 PM
15 votes
1 answer
30515 views
Changing foreign key to ON DELETE CASCADE with least amount of impact
I have an existing foreign key that has ON DELETE NO ACTION defined. I need to change this foreign key to ON DELETE CASCADE. I can do this within a transaction:

```
begin;
alter table posts drop constraint posts_blog_id_fkey;
alter table posts add constraint posts_blog_id_fkey
    foreign key (blog_id) references blogs (id)
    on update no action on delete cascade;
commit;
```

The problem is that the posts table is large (4 million rows), which means that validating the foreign key can take a non-trivial amount of time (I've tested this with a copy of the database). Dropping/adding the foreign key [acquires an ACCESS EXCLUSIVE lock](https://www.postgresql.org/docs/9.5/static/sql-altertable.html#AEN72289) on posts. So adding the foreign key blocks *all* access to the posts table for a decent amount of time, because the lock is held while constraint validation occurs. I need to perform an online migration (I don't have a dedicated downtime window). I know that I can perform **2 transactions** to help with the check taking a long time:

```
begin;
alter table posts drop constraint posts_blog_id_fkey;
alter table posts add constraint posts_blog_id_fkey
    foreign key (blog_id) references blogs (id)
    on update no action on delete cascade
    not valid;
commit;

begin;
alter table posts validate constraint posts_blog_id_fkey;
commit;
```

The advantage of this approach is that the ACCESS EXCLUSIVE lock is held for a very short time for dropping/adding the constraint; validating the constraint then takes only [a SHARE UPDATE EXCLUSIVE lock on posts and a ROW SHARE lock on blogs](https://www.postgresql.org/docs/9.5/static/sql-altertable.html) , since I'm on Postgres 9.5. Are there downsides to this? I know that adding NOT VALID to the constraint means that *existing* data is not validated, but any rows inserted/updated before the VALIDATE CONSTRAINT *will* be checked. Because the foreign key is dropped/added in the same transaction, is there any possibility of creating inconsistent data?
TheCloudlessSky (337 rep)
Oct 27, 2017, 02:02 PM • Last activity: May 16, 2024, 08:16 PM
10 votes
4 answers
27667 views
PostgreSQL extremely slow count (with index, simple query)
I need to run these simple queries on a table with millions of rows:

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;
```

The count result for both queries, for project 123, is about 5M. I have an index in place on (project_id, created_at), and also another index on (project_id, trashed_at):

```
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
```
The problem is that both queries are extremely slow and take about 17s each. These are the results of EXPLAIN ANALYZE:

```
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
         Recheck Cond: (project_id = 123)
         Rows Removed by Index Recheck: 23746378
         Heap Blocks: exact=131205 lossy=1480411
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
               Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
```

```
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
         Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
         Rows Removed by Index Recheck: 23746273
         Heap Blocks: exact=131144 lossy=1480409
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
               Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
```
What is the problem? What can I do to improve the performance (i.e. count in a few seconds)?
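The plans show mostly *lossy* bitmap heap blocks (1.48M of them), meaning work_mem is too small to hold an exact bitmap, so every row on those pages is rechecked. Two cheap experiments, as a sketch (the work_mem value is an assumption to tune):

```
-- 1) Give this session enough memory for an exact bitmap:
SET work_mem = '256MB';

-- 2) Refresh the visibility map so an index-only scan becomes viable:
VACUUM (VERBOSE, ANALYZE) subscriptions;

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
```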
collimarco (653 rep)
Aug 22, 2019, 11:54 AM • Last activity: Apr 10, 2024, 01:12 PM