
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
137 views
PostGIS distance filter does not work as expected
I have an issue with a simple PostGIS query. Here is the query:

```
SELECT id, location,
       ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) AS distance
FROM pins
WHERE ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) > 6008000.1836031
ORDER BY distance ASC, id ASC
LIMIT 11;
```

And here is the result:

```
 id  | location                                           | distance
-----+----------------------------------------------------+------------------
  58 | 0101000020E6100000E253B4CB61C1E13FB7ED0CC45974EF3F | 6008000.1836031
 162 | 0101000020E6100000DE133C027D95E03F3C15FD84A6BAEF3F | 6008157.46639754
 186 | 0101000020E6100000D339D99403BFE33F541C1E242AD5EE3F | 6008257.4084038
  55 | 0101000020E6100000B8042A79A500E63F775E82BA1908EE3F | 6008878.66960698
  98 | 0101000020E6100000CD48F884CD1BED3FAD51BA15F017EC3F | 6008923.81556693
  56 | 0101000020E6100000EECE76EEFE68E63F1681C811377DED3F | 6010325.52640522
 195 | 0101000020E6100000EB87B04B4DC0E33F0AEAE22BECADED3F | 6012118.48976013
 100 | 0101000020E6100000603039EEF27FD83FE70E98797AAEEF3F | 6012318.05377066
 108 | 0101000020E610000045BA52BB3EA0E23F4B96A28081D5ED3F | 6012631.63019378
 117 | 0101000020E61000003ABE6BC7FAD9D43F4D1CF4D130EEEF3F | 6013175.7742802
  91 | 0101000020E610000093B62A082965ED3F911CF0DA8FBFEA3F | 6013178.70288832
```

For some reason the row with id = 58 is returned, with a distance of 6008000.1836031, even though I used the WHERE filter `ST_Distance_Sphere(location, ST_MakePoint(21.012228699999998, 52.2296756)) > 6008000.1836031`. I guess PostGIS calculates the distance with greater precision than what is displayed? How can I avoid this situation?
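The displayed distance is the double-precision value rounded for output; the WHERE clause recomputes the full value, which can be fractionally above the literal copied from the printed result. A minimal sketch of one workaround, computing the distance once in a subquery and adding a small tolerance (the epsilon value is an assumption, not from the question):

```
-- Sketch: compare against the cutoff plus a tiny epsilon so a row whose
-- distance merely *prints* as the cutoff is not re-admitted.
SELECT id, location, distance
FROM (
    SELECT id, location,
           ST_Distance_Sphere(location,
               ST_MakePoint(21.012228699999998, 52.2296756)) AS distance
    FROM pins
) AS d
WHERE distance > 6008000.1836031 + 1e-4
ORDER BY distance ASC, id ASC
LIMIT 11;
```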
user606521 (1415 rep)
Aug 25, 2016, 08:49 AM • Last activity: Aug 4, 2025, 11:05 AM
5 votes
2 answers
2354 views
List PostgreSQL extensions per database
I have a Postgres 9.5 server with 30 databases. I would like to find out which database uses which extension(s) in one go, preferably in a query that is usable in pgAdmin. I know that

```
SELECT * FROM pg_extension
```

gives me the extensions for the database I am connected to. I also know that

```
SELECT * FROM pg_available_extensions
```

lists, well, all available extensions. I also know that

```
SELECT * FROM pg_database
```

lists all existing databases. **Question** How can I create a list that includes the database name and the extensions used in each database?
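Since pg_extension is per-database, one common approach is to loop over the databases with dblink. A minimal sketch, assuming the dblink extension is installed and the connecting role is allowed into every database without extra credentials:

```
CREATE EXTENSION IF NOT EXISTS dblink;

-- Query pg_extension in each connectable database and collect the results.
SELECT d.datname, e.extname
FROM pg_database d
CROSS JOIN LATERAL dblink(
         'dbname=' || d.datname,
         'SELECT extname FROM pg_extension'
     ) AS e(extname name)
WHERE d.datallowconn
ORDER BY d.datname, e.extname;
```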
copossum (151 rep)
May 17, 2021, 07:36 AM • Last activity: Aug 1, 2025, 04:08 PM
0 votes
1 answer
180 views
PostgreSQL has more DB connections on the staging server than on the production server
We have PostgreSQL 9.5 backing a Confluence application. The max connection setting is the same in staging and in production, but staging uses more connections than production: staging has 170 connections to the DB while production has 105, even though staging is used very rarely. Of the 170 staging connections, 169 are idle. Can anybody explain this behavior, where the heavily used DB has fewer connections in use and the rarely used application has more?
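Idle sessions usually come from an application-side connection pool holding connections open; Confluence's pool size is configured in the application, not in PostgreSQL. A quick way to see who owns the idle sessions:

```
-- Break sessions down by database, user, application and state to find
-- the client holding the 169 idle connections.
SELECT datname, usename, application_name, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state
ORDER BY count(*) DESC;
```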
Debasish Sahoo (1 rep)
Dec 26, 2018, 03:25 PM • Last activity: Jun 30, 2025, 01:03 PM
2 votes
1 answer
1186 views
Does VACUUM FULL ANALYZE on a partition lock the main table in PostgreSQL?
I'm using PostgreSQL 9.5 and pg_partman for partitioning tables. There is a table that has 30 partitions, and the volume of one of them is much higher than the others, so I need to run VACUUM FULL on that partition, but I don't want to lock the main table. Is that possible?
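With pg_partman on 9.5 the partitions are ordinary inheritance children, and VACUUM FULL takes its ACCESS EXCLUSIVE lock only on the table it names. A sketch under that assumption (the partition name is hypothetical); note that queries scanning the parent will still block when they reach the locked child:

```
-- Locks only this child table, not the parent or sibling partitions.
VACUUM FULL ANALYZE my_table_p2018_12;  -- hypothetical partition name
```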
Alireza (123 rep)
Dec 22, 2018, 12:22 PM • Last activity: Jun 19, 2025, 12:04 PM
0 votes
1 answer
4979 views
Dropped tables but space not reclaimed in Postgres 12
I upgraded PostgreSQL 9.5 to PostgreSQL 12.4 a few days ago using the pg_upgrade utility with the link (-k) option, so I now have two data directories: the old one (v9.5) and the currently running one (v12.4). Yesterday I dropped two tables of 700GB and 300GB. Connecting with psql, I can see that the size of the database whose tables were dropped has decreased (with \l+), but what worries me is that only ~50GB has been freed from the storage partition. I ran vacuumdb on that database, with no luck. I checked for deleted-but-open files at the OS level using lsof, but there are none. Note: I have not yet deleted the old data directory (v9.5); I am not sure whether deleting it will impact my running Postgres 12, since I used the 'link' option in pg_upgrade. Looking for a solution.
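A likely explanation: pg_upgrade with --link hard-links the data files, so a file's blocks are returned to the OS only when both directory entries (old and new cluster) are gone; space from tables that predate the upgrade stays pinned by the old directory until it is removed. A sketch to confirm how much space the running cluster itself still accounts for:

```
-- List the largest relations still present in the current database.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```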
erTugRul (153 rep)
Nov 25, 2020, 01:31 AM • Last activity: Jun 9, 2025, 06:09 AM
5 votes
2 answers
623 views
PostgreSQL - How does a multicolumn B-Tree index work with ORDER BY on the 1st column and an IN lookup on the 2nd?
I created such a table (similar to the example from http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability ):

```
CREATE TABLE scale_data (
    section NUMERIC NOT NULL,
    id1     NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
    id2     NUMERIC NOT NULL  -- a kind of Type
);
```

Populate it with:

```
INSERT INTO scale_data
SELECT sections.sections,
       sections.sections*10000 + gen.gen,
       CEIL(RANDOM()*100)
FROM GENERATE_SERIES(1, 300) sections,
     GENERATE_SERIES(1, 90000) gen
WHERE gen ...
```

Explain analyze #1 (a single value in IN):

```
-> Index Only Scan Backward using id1_id2_idx on scale_data (cost=0.56..311588.74 rows=132353 width=11) (actual time=0.045..5.060 rows=500 loops=1)
     Index Cond: (id2 = '50'::numeric)
     Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5.177 ms
```

Select #2 (more values in IN; the plan has changed):

```
select id2 from scale_data
where id2 in (50, 52)
order by id1 desc
limit 500
```

Explain analyze #2:

```
Limit (cost=0.56..857.20 rows=500 width=11) (actual time=0.061..8.703 rows=500 loops=1)
  -> Index Only Scan Backward using id1_id2_idx on scale_data (cost=0.56..445780.74 rows=260190 width=11) (actual time=0.059..8.648 rows=500 loops=1)
       Filter: (id2 = ANY ('{50,52}'::numeric[]))
       Rows Removed by Filter: 25030
       Heap Fetches: 0
Planning time: 0.153 ms
Execution time: 8.771 ms
```

Why do the plans differ? Why does #1 show an **Index Cond** while #2 shows a **Filter** and the number of index cells scanned? Doesn't SQL #1 traverse the index the same way the explain for SQL #2 shows? On the real/production DB, #2 works much slower, even though searching by each of the two keys separately is fast. PG 9.5.
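A well-known rewrite for the multi-value case is to fetch the top N rows per key and merge them, so each branch gets a real index condition instead of a filter. A minimal sketch, assuming an index leading on (id2, id1) exists (the index in the plans above appears to lead on id1):

```
-- Hypothetical supporting index for this rewrite:
-- CREATE INDEX id2_id1_idx ON scale_data (id2, id1 DESC);

SELECT t.id2, t.id1
FROM unnest(ARRAY[50, 52]::numeric[]) AS v(id2)
CROSS JOIN LATERAL (
    SELECT id1, id2
    FROM scale_data
    WHERE id2 = v.id2        -- index condition per value
    ORDER BY id1 DESC
    LIMIT 500
) t
ORDER BY t.id1 DESC
LIMIT 500;
```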
ALZ (171 rep)
May 25, 2017, 03:13 PM • Last activity: Jun 1, 2025, 07:04 AM
9 votes
1 answer
2004 views
AWS RDS Postgres massive disk use, small tables
I can't figure out why our AWS Postgres server has consumed all of its space. We just had to increase the storage allocated to it, but can't find any hint from Postgres that it's using that much space. Amazon says that we've eaten up about 60GB in the last 2 weeks; Postgres says our whole DB is barely over 5GB. The DB operations are predominantly INSERT and SELECT. How can I track down and reclaim our storage? Here's the output of some size commands in psql:

```
select pg_size_pretty(pg_database_size('my_db')); -- 3730 MB

SELECT pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind <> 'i'
  AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```

Gives

```
1540 MB
1286 MB
 235 MB
 191 MB
```

Curiously the autovacuumer has never triggered on any of the tables (I'm assuming this is because we rarely delete a row?). The number of dead rows is quite low relative to table size (these tables have row counts in the hundreds of thousands or millions):

```
SELECT vacuum_count, autovacuum_count, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 5;

0 0 161
0 0 119
1 0 76
0 0 25
0 0 11
```

Predominantly the operations on the database are INSERT and SELECT, with some things getting UPDATEs, and very rarely a DELETE. We use a lot of JSONB. **UPDATE** \l+ shows the same as the other queries: 3730 MB.
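pg_database_size counts only the data files of one database, while the instance also stores WAL, server logs, and temporary files. Summing all databases is the first sanity check before blaming tables:

```
-- Compare the sum of all database sizes against what RDS reports;
-- the remainder is WAL, logs, temp files, or free space on the volume.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```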
ChrisJ (621 rep)
Aug 17, 2018, 01:39 AM • Last activity: May 31, 2025, 06:07 AM
32 votes
3 answers
27192 views
How to get the ID of the conflicting row in upsert?
I have a table `tag` with 2 columns: `id` (uuid) and `name` (text). I now want to insert a new tag into the table, but if the tag already exists, I want to simply get the `id` of the existing record. I assumed I could just use `ON CONFLICT DO NOTHING` in combination with `RETURNING "id"`:

```
INSERT INTO "tag" ("name")
VALUES ('foo')
ON CONFLICT DO NOTHING
RETURNING "id";
```

But this returns an empty result set, if the tag with the name "foo" already exists. I then changed the query to use a noop `DO UPDATE` clause:

```
INSERT INTO "tag" ("name")
VALUES ('foo')
ON CONFLICT ("name") DO UPDATE
SET "name" = 'foo'
RETURNING "id";
```

This works as intended, but it is somewhat confusing, because I'm just setting the name to the already existing value. Is this the way to go about this problem, or is there a simpler approach I'm missing?
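A commonly cited alternative that avoids the dummy update is to fall back to a plain SELECT when the insert conflicts. A sketch (there is still a small race window if the row is deleted between the two branches):

```
WITH ins AS (
    INSERT INTO "tag" ("name")
    VALUES ('foo')
    ON CONFLICT ("name") DO NOTHING
    RETURNING "id"
)
SELECT "id" FROM ins                            -- freshly inserted row ...
UNION ALL
SELECT "id" FROM "tag" WHERE "name" = 'foo'     -- ... or the existing one
LIMIT 1;
```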
Oliver Salzburg (422 rep)
Feb 16, 2016, 03:59 PM • Last activity: May 27, 2025, 03:34 PM
0 votes
1 answer
370 views
pglogical library
I would like to try pglogical to set up an archive server from a live database, filtering on INSERT and UPDATE statements (DELETE statements will not be replicated to the archive server). I followed the official documentation, but when I set shared_preload_libraries = 'pglogical' in my configuration file (postgresql.conf), the server would not start. There is no pglogical file in the ./usr/pgsql-9.6/lib/ directory. How do I set up all the prerequisites so the server starts properly (with v9.5 and v9.6)? I am using PostgreSQL v9.5 and v9.6 for a POC on CentOS 7. Many thanks.
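shared_preload_libraries only names a library that must already exist on disk, so the missing file under ./usr/pgsql-9.6/lib/ suggests the pglogical package is simply not installed for that server version; it ships as a separate package, not with PostgreSQL itself. After installing it, a quick check before touching postgresql.conf:

```
-- If the package is installed, the extension is visible here even
-- before shared_preload_libraries is set and the server restarted.
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'pglogical';
```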
Mika (361 rep)
Mar 4, 2018, 04:09 PM • Last activity: May 26, 2025, 03:03 PM
0 votes
1 answer
1255 views
Default Postgres privileges not showing on \ddp
As far as I understand, there is a set of default Postgres public privileges. I created a user and granted him a privilege using:
CREATE USER foo WITH PASSWORD 'foo';
\connect bar;
GRANT SELECT ON ALL TABLES IN SCHEMA public to foo;
and I can connect to the DB with no issues. Then I thought I must have a connection privilege in the default privilege set, since I didn't grant CONNECT on the database explicitly. When I run \ddp I get 0 rows, and as far as I understood, that is the way of checking default privileges per [this page](https://www.postgresql.org/docs/9.0/sql-alterdefaultprivileges.html) . Can someone help me understand how connecting to the DB is permitted when I haven't explicitly allowed it?
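CONNECT is governed by the database-level ACL rather than by default privileges; \ddp lists only entries created with ALTER DEFAULT PRIVILEGES, so 0 rows is expected. When a database's ACL is NULL, the built-in default applies, which grants CONNECT and TEMPORARY to PUBLIC. A quick way to inspect it:

```
-- A NULL datacl means the hard-wired default: CONNECT and TEMP for PUBLIC.
SELECT datname, datacl
FROM pg_database
WHERE datname = 'bar';
```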
Dusan Gligoric (51 rep)
Apr 16, 2020, 01:43 PM • Last activity: May 3, 2025, 06:09 PM
0 votes
1 answer
5986 views
PostgreSQL: permission denied for schema for a user created from the master user
I'm trying to write a function that will eventually rotate users. Currently I have the code below, which works and creates a new user using the prior user; the new user ultimately has the same rights as the master user, i.e., it can do everything.

```
select mysch.dblink('dbname=mydb user=themasteruser password=abc123 connect_timeout=200000 host=localhost',
    'CREATE USER newuname WITH PASSWORD ''pass1'' CREATEDB CREATEROLE;GRANT rds_superuser TO newuname;');

DROP SERVER IF EXISTS fs_link_b CASCADE;

CREATE SERVER fs_link_b FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'mydb', connect_timeout '200000');

CREATE USER MAPPING FOR newuname SERVER fs_link_b
    OPTIONS ("user" 'newuname', password 'pass1');

ALTER SERVER fs_link_b OWNER TO newuname;
```

However, when I connect with the newly created user and try to run the query below:

```
select * from mysch.dblink('link_b', 'select usename from PG_USER limit 1;') as t(uu text);
```

I get this error:

> SQL Error: ERROR: permission denied for schema mysch

If I created the user based on a master user, shouldn't it have all the rights of the master user, as created above? Or do I still need to issue a bunch of individual GRANTs?
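On RDS, rds_superuser is a powerful role but not a true superuser, so ordinary schema and object privileges still apply to the new user. A hedged sketch of the grants that would typically be missing (run as the owner of mysch or the master user):

```
-- Allow the new user to resolve names in the schema and call dblink there.
GRANT USAGE ON SCHEMA mysch TO newuname;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA mysch TO newuname;
```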
user1013388 (111 rep)
Apr 1, 2020, 03:55 AM • Last activity: Apr 16, 2025, 01:06 AM
0 votes
3 answers
346 views
Postgres Specific table access
We have a requirement in our Postgres database: we want to give a particular set of users permissions on specific tables. We also have Airflow in our environment, which syncs tables. Sometimes new columns are added to a table, so we have to drop and recreate it, at which point the users' access to that specific table is gone. Access to a specific table is granted through GRANT. Can you suggest a way to give specific table access that remains if the table is dropped and recreated?
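One way to make access survive a drop-and-recreate is to grant to a group role and declare default privileges for the role that recreates the tables, so each new table picks up the grant automatically. A sketch with hypothetical role names:

```
-- "airflow_user" (the role that recreates tables), "readers" and
-- "some_end_user" are hypothetical names; adjust to your environment.
CREATE ROLE readers;
GRANT readers TO some_end_user;

ALTER DEFAULT PRIVILEGES FOR ROLE airflow_user IN SCHEMA public
    GRANT SELECT ON TABLES TO readers;
```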
Jatin Bisht (3 rep)
Dec 3, 2020, 06:42 AM • Last activity: Jan 25, 2025, 12:04 PM
0 votes
1 answer
705 views
How to calculate the quantity of WAL in a DB without archiving?
I would like to find out the quantity of WAL files used by a PostgreSQL database in the last 24 hours. I was assigned to set up log shipping, but before I do, I want to know how many WAL files are produced in 24 hours, in order to predict how much space I need on external storage. Unfortunately pgmetrics doesn't have this information. Do you know how to get it?

```
WAL Files:
WAL Archiving? no
WAL Files:     5
+--------------------+--------------+
| Setting            | Value        |
+--------------------+--------------+
| wal_level          | minimal      |
| archive_timeout    | 0            |
| wal_compression    | off          |
| max_wal_size       | 64 (1.0 GiB) |
| min_wal_size       | 5 (80 MiB)   |
| checkpoint_timeout | 300          |
| full_page_writes   | on           |
| wal_keep_segments  | 0            |
+--------------------+--------------+
```
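On 9.x the WAL insert position can be sampled and diffed, which measures WAL volume directly instead of counting recycled segment files. Note that wal_level = minimal produces less WAL than the archive/replica level that log shipping requires, so it is worth re-measuring after changing it. A sketch (the LSN literals are hypothetical placeholders for values captured 24 hours apart):

```
-- Capture now, and again 24 hours later:
SELECT pg_current_xlog_location();

-- Then compute the bytes of WAL generated between the two samples:
SELECT pg_size_pretty(
    pg_xlog_location_diff('2A/DEADBEEF', '29/12345678')  -- hypothetical LSNs
);
```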
KUE (25 rep)
May 29, 2019, 12:22 PM • Last activity: Dec 30, 2024, 12:00 PM
2 votes
1 answer
498 views
How do I insert a record and put the ID into an existing record in another table?
I want to insert a record in one table and save the resulting ID in a record I already have in another table. Which sounds easy. Simplistic example:

```
create table aaa (id serial primary key, foo int);
create table foos (id serial primary key, foo int, aaa_id int);
```

Imagine foos has rows which I want to insert into aaa, but I want a record of aaa.id stored in foos.aaa_id. Here's some code that definitely does not work:

```
with rows as (
    insert into aaa (foo)
    select foo from foos
    returning foos.id as foos_id, aaa.id as aaa_id
)
update foos
set aaa_id = (select aaa_id from rows where foos_id = foos.id);
```

I can't find a way of getting the RETURNING clause to return anything other than records from the table it inserted into. Surely there must be a simple way to do this?
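RETURNING can only reference columns of the rows actually inserted, so the usual workaround is to return a source value that identifies the row and join back on it. A sketch, assuming foo values are unique within foos (if they are not, this mapping is ambiguous):

```
WITH ins AS (
    INSERT INTO aaa (foo)
    SELECT foo FROM foos
    RETURNING id, foo
)
UPDATE foos f
SET aaa_id = ins.id
FROM ins
WHERE f.foo = ins.foo;
```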
Andy Jones (123 rep)
Jul 8, 2016, 11:18 AM • Last activity: Dec 21, 2024, 07:37 AM
2 votes
2 answers
487 views
How to select blocks with sequential data and aggregate the ids
I have the following table:

```
id(int) startDate(timestamp)          endDate(timestamp)            plan_id(int) planned(bool) machine(int)
--------------------------------------------------------------------------------------------------------------
2005    '2019-01-16 08:29:24.872736'  '2019-01-16 08:30:23.529706'  34           true          6
2004    '2019-01-16 08:19:28.011148'  '2019-01-16 08:29:22.680828'  34           true          6
2003    '2019-01-16 08:18:27.074312'  '2019-01-16 08:19:25.753475'  34           true          6
2002    '2019-01-16 08:08:30.206288'  '2019-01-16 08:18:24.856308'  34           true          6
2001    '2019-01-16 08:07:29.163124'  '2019-01-16 08:08:27.949013'  34           true          6
2000    '2019-01-16 07:59:03.221309'  '2019-01-16 08:00:14.654391'  null         false         7
1999    '2019-01-16 08:00:00.986367'  '2019-01-16 08:00:03.221309'  null         false         6
1998    '2019-01-16 07:57:30.711044'  '2019-01-16 07:59:58.778444'  null         false         6
1997    '2019-01-16 07:56:32.466508'  '2019-01-16 07:57:28.489287'  null         false         6
1996    '2019-01-16 07:50:06.887349'  '2019-01-16 07:56:30.237725'  null         false         6
1995    '2019-01-16 07:46:34.327582'  '2019-01-16 07:50:04.619592'  33           true          6
1994    '2019-01-16 07:45:33.813483'  '2019-01-16 07:46:32.014849'  33           true          6
1993    '2019-01-16 07:24:39.267365'  '2019-01-16 07:39:23.786911'  null         false         6
1992    '2019-01-16 07:23:39.646218'  '2019-01-16 07:24:37.093414'  null         false         6
1991    '2019-01-16 07:13:41.166337'  '2019-01-16 07:23:37.403375'  null         false         6
1990    '2019-01-16 07:12:39.961234'  '2019-01-16 07:13:38.907838'  null         false         6
1989    '2019-01-16 07:10:46.984236'  '2019-01-16 07:12:37.647108'  null         false         6
1988    '2019-01-15 17:05:59.832834'  '2019-01-15 17:08:21.603931'  31           true          6
1987    '2019-01-15 17:04:59.567046'  '2019-01-15 17:05:57.565188'  31           true          6
1986    '2019-01-15 17:00:01.411266'  '2019-01-15 17:10:57.255158'  31           true          7
```

I have to select the IDs of the blocks of unplanned records for a specific machine. I have been trying window functions, but unfortunately I couldn't work out the logic. The problem here is that, since we have different machines, we cannot rely on sequential IDs, only on the fact that the endDate of one row in a sequence is very close to the next row's startDate (it is OK to set a tolerance constant, e.g. 3 seconds). I would like a query whose result is: the min startDate, the max endDate, and the IDs of each block. For this sample with machine = 6, it would be:

```
blockStartDate                blockEndDate                  ids
-------------------------------------------------------------------------------
"2019-01-16 07:50:06.887349"  "2019-01-16 08:00:03.221309"  [1999,1998,1997,1996]
"2019-01-16 07:10:46.984236"  "2019-01-16 07:39:23.786911"  [1989,1990,1991,1992,1993]
```

Note that the answer in this case has sequential IDs, but this is not always the case. I am working on providing real data where 2 machines are producing data at the same time and the IDs become useless.
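This is the classic gaps-and-islands shape: flag a row as starting a new block when the gap to the previous row's endDate exceeds the tolerance, turn the flags into block numbers with a running sum, then aggregate. A sketch, with "readings" as a hypothetical name for the table above:

```
SELECT min(startDate)                   AS blockStartDate,
       max(endDate)                     AS blockEndDate,
       array_agg(id ORDER BY startDate) AS ids
FROM (
    SELECT *,
           sum(new_block) OVER (ORDER BY startDate) AS block_no
    FROM (
        SELECT *,
               CASE
                   WHEN lag(endDate) OVER (ORDER BY startDate) IS NULL
                     OR startDate - lag(endDate) OVER (ORDER BY startDate)
                        > interval '3 seconds'
                   THEN 1 ELSE 0
               END AS new_block
        FROM readings
        WHERE machine = 6
          AND NOT planned
    ) flagged
) numbered
GROUP BY block_no
ORDER BY blockStartDate;
```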
Attyla Fellipe (23 rep)
Jan 17, 2019, 02:00 PM • Last activity: Nov 14, 2024, 01:44 AM
3 votes
2 answers
2618 views
How do I set the isolation level for a SELECT statement?
Using PostgreSQL 9.5 I would like to make sure that when I run a SELECT statement I get consistent results, i.e. if the table is modified while the SELECT is running, I want to get the state that was valid when the SELECT started. Basically, it's like: Take a snapshot of a table, and don't allow updates to the table while the snapshot is being taken, and then deliver the snapshot, while updates are allowed again. If I understood isolation levels correctly, REPEATABLE READ is what I'm looking for, isn't it? Now my question is: How can I run a SELECT with REPEATABLE READ isolation level? Or am I missing something, and my approach is wrong? How would I solve this?
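Worth noting first: in PostgreSQL a single statement always runs against one snapshot, even under the default READ COMMITTED level, so one SELECT already gets a consistent view without blocking writers. REPEATABLE READ matters when several statements must share that snapshot; a sketch of setting it per transaction:

```
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM my_table;   -- "my_table" is hypothetical; every statement
SELECT * FROM my_table;   -- in this transaction sees the same snapshot
COMMIT;
```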
Golo Roden (422 rep)
Feb 4, 2016, 11:44 AM • Last activity: Aug 18, 2024, 02:04 PM
1 votes
1 answer
1509 views
How to call lo_unlink(lo) in batches
In **Postgres 9.5** I am trying to reproduce the Postgres tool vacuumlo to get rid of unreferenced large objects and free disk space. References are stored in **text**-type columns by Hibernate, so using the tool as-is is not an option. Looking at the vacuumlo sources (vacuumlo.c) I created a *vacuum_l* table like so:

```
CREATE TABLE vacuum_l AS
SELECT oid AS lo, false AS deleted
FROM pg_largeobject_metadata;
```

and proceeded to delete all rows from the *vacuum_l* table whose OIDs are referenced by user tables in my database. Now it is safe to call *lo_unlink(lo)* for all items remaining in the *vacuum_l* table. Unfortunately the table is very big and, as reported in the vacuumlo.c source code:

> We don't want to run each delete as an individual transaction, because the commit overhead would be high. However, since 9.0 the backend will acquire a lock per deleted LO, so deleting too many LOs per transaction risks running out of room in the shared-memory lock table. Accordingly, we delete up to transaction_limit LOs per transaction.

So far I tried to unlink all OIDs with this SQL function:

```
CREATE OR REPLACE FUNCTION unlinkOrphanedLobs() RETURNS void AS
$BODY$
DECLARE
    count_variable int;
BEGIN
    LOOP
        SELECT COUNT(*) INTO count_variable FROM vacuum_l WHERE deleted = false;
        EXIT WHEN count_variable ...
```

but it fails with:

> ERROR: out of shared memory
>
> HINT: You might need to increase max_locks_per_transaction.

Why is clearly explained by the code comments quoted above: this function won't close the transaction at the end of each loop iteration. What would be the correct way to do it? Thanks for any help.
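In 9.5 a plpgsql function cannot commit mid-execution, so the transaction boundary has to live in the client: a sketch of a function that unlinks one bounded batch per call, with the caller invoking it repeatedly (each call in its own transaction, e.g. under autocommit) until it returns 0:

```
CREATE OR REPLACE FUNCTION unlink_orphaned_lobs_batch(batch_size int DEFAULT 1000)
RETURNS int AS
$BODY$
DECLARE
    r record;
    n int := 0;
BEGIN
    -- At most batch_size large objects are unlinked, so at most that
    -- many lock-table entries are taken per transaction.
    FOR r IN SELECT lo FROM vacuum_l WHERE NOT deleted LIMIT batch_size LOOP
        PERFORM lo_unlink(r.lo);
        UPDATE vacuum_l SET deleted = true WHERE lo = r.lo;
        n := n + 1;
    END LOOP;
    RETURN n;  -- caller repeats "SELECT unlink_orphaned_lobs_batch();" until 0
END;
$BODY$ LANGUAGE plpgsql;
```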
Luky (121 rep)
Mar 22, 2021, 04:43 PM • Last activity: Jun 2, 2024, 09:07 PM
5 votes
2 answers
30010 views
How do I select arrays that are not empty?
Why is this so tricky? What is the token set to, that it isn't equal to null nor an empty string?

```
SELECT lexemes FROM ts_debug('This is a title') WHERE alias = 'asciiword';
 lexemes
---------
 {}
 {}
 {}
 {titl}
(4 rows)
```

OK, so I want to get rid of {}:

```
SELECT lexemes FROM ts_debug('This is a title')
WHERE alias = 'asciiword'
  AND lexemes <> '{}'
  AND lexemes <> ARRAY[]::text[]
  AND lexemes IS NOT NULL
  AND lexemes <> ARRAY[' ']
  AND lexemes <> ARRAY[null]::text[];
```

**I knew most of these wouldn't work**, but I'm totally confused why `<> '{}'` wouldn't work, nor `<> ARRAY[]::text[]`. How do I filter this out?
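Testing the array's length sidesteps the comparison question entirely; a sketch using cardinality (available since 9.4):

```
SELECT lexemes
FROM ts_debug('This is a title')
WHERE alias = 'asciiword'
  AND lexemes IS NOT NULL
  AND cardinality(lexemes) > 0;  -- array_length(lexemes, 1) >= 1 also works
```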
Evan Carroll (65502 rep)
Sep 23, 2017, 04:11 AM • Last activity: May 29, 2024, 01:32 PM
15 votes
1 answer
30515 views
Changing foreign key to ON DELETE CASCADE with least amount of impact
I have an existing foreign key that has ON DELETE NO ACTION defined. I need to change this foreign key to ON DELETE CASCADE. I can do this within a transaction:

```
begin;
alter table posts drop constraint posts_blog_id_fkey;
alter table posts add constraint posts_blog_id_fkey
    foreign key (blog_id) references blogs (id)
    on update no action on delete cascade;
commit;
```

The problem is that the posts table is large (4 million rows), which means that validating the foreign key can take a non-trivial amount of time (I've tested this with a copy of the database). Dropping/adding the foreign key [acquires an ACCESS EXCLUSIVE lock](https://www.postgresql.org/docs/9.5/static/sql-altertable.html#AEN72289) on posts. So adding the foreign key blocks *all* access to the posts table for a decent amount of time, because the lock is held while constraint validation occurs. I need to perform an online migration (I don't have a dedicated downtime window). I know that I can perform **2 transactions** to help with the check taking a long time:

```
begin;
alter table posts drop constraint posts_blog_id_fkey;
alter table posts add constraint posts_blog_id_fkey
    foreign key (blog_id) references blogs (id)
    on update no action on delete cascade
    not valid;
commit;

begin;
alter table posts validate constraint posts_blog_id_fkey;
commit;
```

The advantage of this approach is that the ACCESS EXCLUSIVE lock is held for a very short time for dropping/adding the constraint; validating the constraint then takes only [a SHARE UPDATE EXCLUSIVE lock on posts and a ROW SHARE lock on blogs](https://www.postgresql.org/docs/9.5/static/sql-altertable.html) , since I'm on Postgres 9.5. Are there downsides to this? I know that adding NOT VALID to the constraint means that *existing* data is not validated, but any rows inserted/updated before the VALIDATE CONSTRAINT *will* be checked. Because the foreign key is dropped/added in the same transaction, is there any possibility of creating inconsistent data?
TheCloudlessSky (337 rep)
Oct 27, 2017, 02:02 PM • Last activity: May 16, 2024, 08:16 PM
10 votes
4 answers
27667 views
PostgreSQL extremely slow count (with index, simple query)
I need to run these simple queries on a table with millions of rows:

```
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;
```

The count result for both queries, for project 123, is about 5M. I have an index in place on (project_id, created_at), and also another index on (project_id, trashed_at):

```
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
```
The problem is that both queries are extremely slow and take about 17s each. These are the results of EXPLAIN ANALYZE:

```
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
         Recheck Cond: (project_id = 123)
         Rows Removed by Index Recheck: 23746378
         Heap Blocks: exact=131205 lossy=1480411
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
               Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
```

```
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
         Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
         Rows Removed by Index Recheck: 23746273
         Heap Blocks: exact=131144 lossy=1480409
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
               Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
```
What is the problem? What can I do to improve the performance (i.e. count in a few seconds)?
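The plans show mostly *lossy* bitmap heap blocks (1.48M of them), meaning work_mem is too small to hold an exact bitmap, so every row on those pages is rechecked. Two cheap experiments, as a sketch (the work_mem value is an assumption to tune):

```
-- 1) Give this session enough memory for an exact bitmap:
SET work_mem = '256MB';

-- 2) Refresh the visibility map so an index-only scan becomes viable:
VACUUM (VERBOSE, ANALYZE) subscriptions;

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
```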
collimarco (653 rep)
Aug 22, 2019, 11:54 AM • Last activity: Apr 10, 2024, 01:12 PM