
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
2658 views
Why doesn't the pg_wal directory get cleaned up during logical replication?
My main question: why is the pg_wal directory growing so much? I have two PostgreSQL instances. The first was set up as the master:
wal_level: "logical"
max_wal_senders: 5
max_replication_slots: 10
I also enabled archiving for pg_wal:
archive_mode: on
archive_command: 'gzip  /.../data/pg_wal_archive/%f'
After that, a PUBLICATION was created for some of my tables (not ALL TABLES). For the replica instance I use the default configuration and only created a SUBSCRIPTION to the master. I have two questions:
1. I ran a script on the master that loads ~10 GB of data into a table that is **not included** in the PUBLICATION, and I can see that the size of my pg_wal directory increases anyway. Why?
2. Why isn't the pg_wal directory being cleaned up? Is this normal? It looks like I'm keeping data to restore a table that is not used in the replication - how can I delete this data?
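For reference, a minimal diagnostic sketch (assuming PostgreSQL 10 and that the goal is to see what is pinning WAL): WAL records every change in the cluster regardless of what the publication filters, and segments are only removed once every replication slot has confirmed them and archive_command has succeeded, so the slots and the archiver status are the first things to check.

SELECT slot_name,
       active,
       restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;

-- if archiving fails, segments also pile up waiting to be archived
SELECT archived_count, failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;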
vintik (1 rep)
Aug 12, 2019, 11:46 AM • Last activity: Aug 2, 2025, 11:04 AM
2 votes
2 answers
3240 views
Optimizing a recursive CTE or replacing it with a temporary table
I have a recursive query like this:
with recursive PCte(id) as 
(
    select p1.id from Product p1 where p1.parent_id is null and p1.serial_number in
    ('123', '124','125', 'n')
    union all
    select p2.id from Product p2 inner join PCte cte on p2.parent_id = cte.id
) 
select * 
from Product p 
left join Product psub on psub.parent_id = p.id 
where p.id in (select c.id from PCte c)
This query performs poorly when there is a large number of children. Are there any possible ways to optimize it? If it is possible to replace the recursive CTE with a temporary table, I would like to see an example. Thanks a lot.
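A rough sketch of the temporary-table variant, under the assumption that the table and column names above are the real ones: the recursion is unrolled into a loop that keeps adding the next level of children until nothing new is found.

CREATE TEMP TABLE pcte AS
SELECT p1.id
FROM Product p1
WHERE p1.parent_id IS NULL
  AND p1.serial_number IN ('123', '124', '125', 'n');

ALTER TABLE pcte ADD PRIMARY KEY (id);

DO $$
DECLARE
    inserted bigint;
BEGIN
    LOOP
        -- add children of everything collected so far; duplicates are skipped
        INSERT INTO pcte (id)
        SELECT p2.id
        FROM Product p2
        JOIN pcte cte ON p2.parent_id = cte.id
        ON CONFLICT (id) DO NOTHING;

        GET DIAGNOSTICS inserted = ROW_COUNT;
        EXIT WHEN inserted = 0;
    END LOOP;
END;
$$;

ANALYZE pcte;

SELECT *
FROM Product p
LEFT JOIN Product psub ON psub.parent_id = p.id
WHERE p.id IN (SELECT c.id FROM pcte c);

Whether this beats the recursive CTE depends mostly on whether the planner gets better row estimates for the temp table; the ANALYZE after the loop is there for that reason.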
Itan Reimbergh (21 rep)
Sep 28, 2021, 01:10 PM • Last activity: Aug 1, 2025, 02:06 AM
1 vote
1 answer
8731 views
postgresql: Invalid data directory | Can't open PID file /var/run/postgresql/10-main.pid (yet?) after start: No such file or directory
This error shows up after running sudo systemctl status postgresql@10-main.service:

postgresql@10-main.service - PostgreSQL Cluster 10-main
   Loaded: loaded (/lib/systemd/system/postgresql@.service; indirect; vendor preset: enabled)
   Active: failed (Result: protocol) since Tue 2020-12-01 14:21:40 UTC; 2s ago
  Process: 1603 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 10-main start (code=exited, sta
Dec 01 14:21:40 ubuntu-s-1vcpu-2gb-sgp1-01 systemd: Starting PostgreSQL Cluster 10-main...
Dec 01 14:21:40 ubuntu-s-1vcpu-2gb-sgp1-01 postgresql@10-main: Error: Invalid data directory
Dec 01 14:21:40 ubuntu-s-1vcpu-2gb-sgp1-01 systemd: postgresql@10-main.service: Can't open PID file /va
Dec 01 14:21:40 ubuntu-s-1vcpu-2gb-sgp1-01 systemd: postgresql@10-main.service: Failed with result 'pro
Dec 01 14:21:40 ubuntu-s-1vcpu-2gb-sgp1-01 systemd: Failed to start PostgreSQL Cluster 10-main.

I have tried all the solutions out there. I can't even take a backup with pg_dump. I get the following error when running pg_dump database_name > database_backup.bak:

pg_dump: [archiver (db)] connection to database "database_name" failed: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Here is the listing of permissions for /var/lib/postgresql/10/main:

-rwx------ 1 postgres postgres 3 Apr 28 2020 PG_VERSION
drwx------ 7 postgres postgres 4096 Nov 9 23:36 base
drwx------ 2 postgres postgres 4096 Nov 29 19:28 global
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_commit_ts
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_dynshmem
drwx------ 4 postgres postgres 4096 Nov 29 20:18 pg_logical
drwx------ 4 postgres postgres 4096 Apr 28 2020 pg_multixact
drwx------ 2 postgres postgres 4096 Nov 29 19:27 pg_notify
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_replslot
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_serial
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_snapshots
drwx------ 2 postgres postgres 4096 Nov 29 20:18 pg_stat
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_stat_tmp
drwx------ 2 postgres postgres 4096 Sep 2 00:07 pg_subtrans
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_tblspc
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_twophase
drwx------ 3 postgres postgres 4096 Nov 28 01:33 pg_wal
drwx------ 2 postgres postgres 4096 Apr 28 2020 pg_xact
-rwx------ 1 postgres postgres 88 Apr 28 2020 postgresql.auto.conf
-rwx------ 1 postgres postgres 95 Nov 29 19:27 postmaster.opts

And here is the output after running sudo -u postgres psql:

psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Please assist me in fixing this. Any kind of help will be appreciated. Thanks in advance.
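A hedged first-pass checklist, assuming the stock Debian/Ubuntu packaging where the cluster configuration lives under /etc/postgresql/10/main: "Invalid data directory" from pg_ctlcluster usually means the configured path is missing, not owned by postgres, or not mode 700, so confirming those basics narrows things down.

# which data directory does the cluster expect? (Debian/Ubuntu default config path assumed)
grep data_directory /etc/postgresql/10/main/postgresql.conf

# the directory must exist, be owned by postgres:postgres and be mode 700
sudo ls -ld /var/lib/postgresql/10/main
sudo chown -R postgres:postgres /var/lib/postgresql/10/main
sudo chmod 700 /var/lib/postgresql/10/main

# try starting the cluster again and read the startup log for the real error
sudo pg_ctlcluster 10 main start
sudo tail -n 50 /var/log/postgresql/postgresql-10-main.log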
Farid Chowdhury (119 rep)
Dec 1, 2020, 04:10 PM • Last activity: Jul 27, 2025, 02:04 AM
0 votes
1 answer
2529 views
Getting "unknown variable postgresqlini" when updating PostgreSQL
I am getting an error when attempting to upgrade a PostgreSQL installation. The full error is:
>A data directory is registered in ***unknown variable postgresqlini*** but is neither empty or a recognized data directory

I've already had a look at [this issue](https://dba.stackexchange.com/questions/224062/install-postgresql-in-unattended-error), which seems to be the only other related one I can find, but it did not help me. As you can see, my paths are surrounded by quotation marks. Here is the full command that is run (split across multiple lines for readability):
"C:\ProgramData\Package Cache\44103BBE02C9C99CB868F4767B0CC3B5D50B6C09\postgresql-10.11-1-windows-x64.exe"
--unattendedmodeui minimal
--mode unattended
--servicename RM_Postgresql
--servicepassword xyz
--superpassword xyz
--prefix "C:\Program Files\\company\postgres"
--datadir "C:\Program Files\\company\data"
--disable-components stackbuilder
This command is part of an installer which launches the PostgreSQL installer. I've been running the same command for nearly 2 years without any issues (using different versions of PostgreSQL on different computers). For some reason it now seems to fail on one of our machines, and I am unsure why. The currently installed version of PostgreSQL on the computer where it is failing is 10.6-1, and I am attempting to update to 10.11-1, if that makes any difference. This now seems to cause issues with the previous installation as well:
initializeDatabase fail on opening postgres msg=could not connect to server: Connection refused (0x0000274D/10061)
	Is the server running on host "localhost" (::1) and accepting
	TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x0000274D/10061)
	Is the server running on host "localhost" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
QPSQL: Unable to connect
ChrisMM (119 rep)
Feb 18, 2020, 12:30 PM • Last activity: Jul 21, 2025, 05:06 PM
1 vote
1 answer
157 views
SELECT operation with four joins results in bad performance
I'm having problems with a slow query. The purpose of the query is to get the IDs of doers for job X who haven't exceeded the specified limits on realizations and who are also within range of the possible job places.

**Resources**

- Query:

SELECT DISTINCT doers.id
FROM doers
JOIN doer_locations dl ON dl.doer_id = doers.id
JOIN job_places jp ON (jp.lat - 0.3147625620715557) < dl.lat
                  AND (jp.lng - 0.5001626620527362) < dl.lng
LEFT JOIN job_realizations jr ON jr.job_place_id = jp.id AND jr.status IN (1, 2, 3, 4)
LEFT JOIN job_realizations jrpd ON jrpd.job_place_id = jp.id AND jrpd.doer_id = doers.id AND jrpd.status IN (1, 2, 3, 4)
WHERE (jp.job_id = 1 AND doers.id IS NOT NULL)
GROUP BY doers.id, jp.id
HAVING COUNT(DISTINCT jr.id) < jp.realizations_per_place
   AND COUNT(DISTINCT jrpd.id) < jp.realizations_per_place_per_doer

- [Depesz explain](https://explain.depesz.com/s/jMHP)
- [Raw explain analyze](https://pastebin.com/aY4L33iY)
- [Simplified Schema](https://www.db-fiddle.com/f/r2LFERkLEoo9fKYeTKQ8S6/1)

**Consideration**

I'm not sure I'm reading the explain output correctly, but the query seems to lose performance especially where it calculates things on the fly; the HAVING COUNT(DISTINCT ...) also seems pretty expensive.

**Additional information**

The type of both the lat and lng columns is float.
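One direction worth sketching, purely as an untested assumption built on the query above (the comparison operators on the job_places join are taken to be <): pre-aggregate the realization counts once per place, and count the per-doer realizations in a LATERAL subquery, so the outer query no longer multiplies rows and then needs COUNT(DISTINCT ...) to undo it.

WITH per_place AS (
    SELECT job_place_id, count(*) AS total
    FROM job_realizations
    WHERE status IN (1, 2, 3, 4)
    GROUP BY job_place_id
)
SELECT DISTINCT d.id
FROM doers d
JOIN doer_locations dl ON dl.doer_id = d.id
JOIN job_places jp ON (jp.lat - 0.3147625620715557) < dl.lat
                  AND (jp.lng - 0.5001626620527362) < dl.lng
LEFT JOIN per_place pp ON pp.job_place_id = jp.id
LEFT JOIN LATERAL (
    -- realizations for this place by this specific doer
    SELECT count(*) AS total
    FROM job_realizations jr
    WHERE jr.job_place_id = jp.id
      AND jr.doer_id = d.id
      AND jr.status IN (1, 2, 3, 4)
) pd ON true
WHERE jp.job_id = 1
  AND coalesce(pp.total, 0) < jp.realizations_per_place
  AND coalesce(pd.total, 0) < jp.realizations_per_place_per_doer;

An index on job_realizations (job_place_id, doer_id, status) would support the lateral lookup.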
mist (29 rep)
Oct 1, 2018, 11:22 AM • Last activity: Jul 11, 2025, 05:06 AM
3 votes
1 answer
441 views
Installing the PostgreSQL parray_gin extension on macOS: include not found
I am trying to install parray_gin in PostgreSQL 10 on macOS, but I can't get it to build. I'm running pgxn install parray_gin. This first gave me a missing stdio.h error, so I installed the Xcode command line tools; now it's complaining about:

/Library/PostgreSQL/10/include/postgresql/server/utils/pg_locale.h:19:10: fatal error: 'unicode/ucol.h' file not found

After some googling it appeared I needed to install an extra library, which I did through brew install icu4c. This printed:

This formula is keg-only, which means it was not symlinked into /usr/local,
because macOS provides libicucore.dylib (but nothing else).

If you need to have this software first in your PATH run:
  echo 'export PATH="/usr/local/opt/icu4c/bin:$PATH"' >> ~/.bash_profile
  echo 'export PATH="/usr/local/opt/icu4c/sbin:$PATH"' >> ~/.bash_profile

For compilers to find this software you may need to set:
    LDFLAGS:  -L/usr/local/opt/icu4c/lib
    CPPFLAGS: -I/usr/local/opt/icu4c/include

But it didn't change a thing, and I have no idea how to pass /usr/local/opt/icu4c/include to pgxn.
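One possible route, sketched here as an assumption rather than a verified fix: skip pgxn's one-step install, fetch the extension, and hand the extra include path to the PGXS Makefile directly (PG_CPPFLAGS is appended to the compiler flags by PGXS; the icu4c path is the one Homebrew printed above).

# download and unpack the distribution instead of installing it in one step
pgxn download parray_gin
unzip parray_gin-*.zip
cd parray_gin-*

# build against the Homebrew icu4c headers, then install
make PG_CPPFLAGS="-I/usr/local/opt/icu4c/include" \
     PG_CONFIG=/Library/PostgreSQL/10/bin/pg_config
# if linking also fails, the library path Homebrew printed (-L/usr/local/opt/icu4c/lib) may need to be added as well
sudo make install PG_CONFIG=/Library/PostgreSQL/10/bin/pg_config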
Joris (131 rep)
Feb 28, 2018, 08:16 AM • Last activity: Jun 5, 2025, 08:05 PM
0 votes
1 answer
237 views
Actual used and free space in WAL segments
I am new to the PostgreSQL world. I have experience administering Microsoft SQL Server and Oracle databases. In SQL Server, we are able to find how much space is actually used in the pre-allocated log file. I wonder if we can do the same thing in PostgreSQL. How can we identify which WAL segments are actually full and which WAL segments are empty? I am currently using PostgreSQL version 10.
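A small sketch of how this is usually approached in PostgreSQL 10 (an illustration, not a definitive method): WAL is written strictly sequentially, so everything up to the current insert location is used, and files in pg_wal whose names sort after the current segment are recycled or pre-allocated segments that have not yet been filled with new WAL.

-- current write position and the segment file it falls into
SELECT pg_current_wal_lsn() AS current_lsn,
       pg_walfile_name(pg_current_wal_lsn()) AS current_segment;

-- list the segment files in pg_wal; names sorting after current_segment are reused/pre-allocated
SELECT name, size, modification
FROM pg_ls_waldir()
ORDER BY name;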
Rauf Asadov (1313 rep)
Jun 11, 2021, 08:44 AM • Last activity: May 31, 2025, 10:03 PM
1 vote
1 answer
249 views
PostgreSQL streaming replication slow on macOS
I am using PostgreSQL 10.1 on a Mac, on which I am trying to set up streaming replication. I configured both master and slave to be on the same machine. I find the streaming replication lag to be higher than expected on the Mac. The same test runs on a Linux Ubuntu 16.04 machine without much lag.

I have the following insert script:

for i in $(seq 1 1 1000)
do
   bin/psql postgres -p 8999 -c "Insert into $1 select tz, $i * 127361::bigint, $i::real, random()*12696::bigint from generate_series('01-01-2018'::timestamptz, '02-01-2018'::timestamptz, '30 sec'::interval)tz;"
   echo $i
done

The lag is measured using the following queries:

SELECT pg_last_wal_receive_lsn() - pg_last_wal_replay_lsn();
SELECT (extract(epoch FROM now()) - extract(epoch FROM pg_last_xact_replay_timestamp()))::int;

However, the observation is very unexpected. The lag keeps increasing from the moment the transactions start on the master:

Slave localhost_9001: 12680304 1
Slave localhost_9001: 12354168 1
Slave localhost_9001: 16086800 1
...
Slave localhost_9001: 3697460920 121
Slave localhost_9001: 3689335376 122
Slave localhost_9001: 3685571296 122
...
Slave localhost_9001: 312752632 190
Slave localhost_9001: 308177496 190
Slave localhost_9001: 303548984 190
...
Slave localhost_9001: 22810280 199
Slave localhost_9001: 8255144 199
Slave localhost_9001: 4214440 199
Slave localhost_9001: 0 0

It took around 4.5 minutes for a single client inserting into a single table to complete on the master, and another 4 minutes for the slave to catch up. Note that NO simultaneous selects are run other than the script that measures the lag.

I understand that replay in PostgreSQL is pretty simple, like "move a particular block to a location", but I am not sure about this behavior.

I have the following other configuration settings:

checkpoint_timeout = 5min
max_wal_size = 1GB
min_wal_size = 80MB

When I run the same tests with the same configuration on a Linux Ubuntu 16.04 machine, the lag is perfectly reasonable. Am I missing anything?

UPDATE:

Mac hardware configuration: 16 GB RAM, 512 GB SSD (NVMe), Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Linux hardware configuration: 16 GB RAM, 512 GB SSD (NVMe), Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
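For completeness, a hedged alternative measurement taken on the master side (column names as in PostgreSQL 10, where pg_stat_replication exposes replay_lsn; older naming is replay_location):

SELECT application_name,
       client_addr,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;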
krithikaGopalakrishnan (181 rep)
Dec 17, 2018, 02:30 PM • Last activity: May 24, 2025, 08:05 PM
0 votes
2 answers
324 views
Postgres file increment size
I'm COPYing 300 tables / 400 GB to PostgreSQL 10 on a Windows 2016 server, using a few threads to speed things up, but because the table files are extended in 64 kB units (or something thereabouts) they get fragmented on disk, which hurts read performance. Ideally I would like to make sure that files are expanded in larger increments (e.g. 8 MB); an alternative would be to make the initial size equal to the maximum segment size (1 GB). Is there any way to accomplish this?

And I know a single-threaded copy would solve the issue at hand, but I would still have the same problem after the copying, as tables grow continually and still only in small, fragmented chunks.

EDIT: As a "proof" of sorts, I created a simple table:

CREATE TABLE testtable (
    id bigint primary key,
    somestring varchar(4000)
);

DO $$
BEGIN
    FOR i IN 1..2000 LOOP
        INSERT INTO testtable VALUES (i, repeat('Jeg kan en sang der kan drive dig til vanvid.', 40));
    END LOOP;
END;
$$;

This produces a table of 4096 kB, which contig.exe reports to be in 25 fragments (in this instance), or 160 kB on average. For 4 MB this is not an issue; for 400 GB it is.
mikkel (238 rep)
Oct 23, 2017, 01:13 PM • Last activity: May 11, 2025, 11:04 PM
2 votes
2 answers
1133 views
How to extract the grants on default roles of PostgreSQL 10: pg_read_all_settings, pg_read_all_stats, pg_monitor and pg_signal_backend?
I have a Postgres instance of version 10 running on AWS RDS. I have a user there with default roles granted to it. I want to create the same user on my sandbox where the instance is of version 9.4. Thus, I want to know the grants that are included in these default roles of version 10.
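As far as I know, what these default roles allow is mostly hard-wired into the server rather than stored as ordinary GRANTs, so there is nothing to dump and replay on 9.4; the closest equivalent has to be rebuilt by hand there. What can be extracted is which of them a user actually holds - a sketch, with 'myuser' standing in for the user in question:

SELECT r.rolname AS default_role
FROM pg_auth_members m
JOIN pg_roles r ON r.oid = m.roleid
JOIN pg_roles u ON u.oid = m.member
WHERE u.rolname = 'myuser'
  AND r.rolname IN ('pg_read_all_settings', 'pg_read_all_stats',
                    'pg_monitor', 'pg_signal_backend');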
pricha (143 rep)
Jul 18, 2019, 08:32 AM • Last activity: May 4, 2025, 04:11 AM
2 votes
1 answer
1115 views
Any workarounds for lack of primary key on partitioned tables?
I am currently looking at implementing declarative partitioning with PostgreSQL 10. However, one sticking point is that we need a foreign key between two such partitioned tables, which is not supported. Is there any workaround, or can anyone suggest an alternative way of doing this? Or should we use table inheritance instead? Thanks.
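One commonly suggested workaround is to approximate the foreign key with a trigger; below is only a hedged sketch (table names parent_tbl/child_tbl and column parent_id are made up for illustration). Note that on PostgreSQL 10 row-level triggers cannot be attached to the partitioned parent itself, so the trigger has to be created on each partition, and this only guards inserts/updates on the referencing side - deletes on the referenced side would need a companion trigger.

CREATE OR REPLACE FUNCTION check_parent_exists() RETURNS trigger AS $$
BEGIN
    -- reject rows whose parent_id has no matching row in parent_tbl
    IF NOT EXISTS (SELECT 1 FROM parent_tbl p WHERE p.id = NEW.parent_id) THEN
        RAISE EXCEPTION 'parent_tbl.id = % does not exist', NEW.parent_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- repeat for every partition of the referencing table
CREATE TRIGGER child_fk_check
    BEFORE INSERT OR UPDATE ON child_tbl_part_2018_07
    FOR EACH ROW EXECUTE PROCEDURE check_parent_exists();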
J. Gallagher (29 rep)
Jul 12, 2018, 09:17 AM • Last activity: Apr 26, 2025, 03:09 PM
1 vote
1 answer
1047 views
Duplicate rows with primary key constraint in PostgreSQL 10.6
I have a table that's had CONSTRAINT table_pkey PRIMARY KEY (id) since setup. However, I recently found that I have around 100 duplicate rows (I ran select count(*) from (select count(*) from table group by id having count(*) > 1) as t1, as per this old Stack post). However, I'm running Postgres 10.6, which should be well past that bug. Any ideas?
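In case it helps to see the usual recovery path sketched out (this assumes the typical cause, a corrupted primary key index, and "table" stands in for the real table name; rehearse on a copy first):

-- 1. inspect the duplicates and where the physical rows live
SELECT id, count(*) AS copies, array_agg(ctid) AS row_locations
FROM "table"
GROUP BY id
HAVING count(*) > 1;

-- 2. keep one physical row per id and delete the others
DELETE FROM "table" a
USING "table" b
WHERE a.id = b.id
  AND a.ctid > b.ctid;

-- 3. rebuild the primary key index so uniqueness is enforced again
REINDEX INDEX table_pkey;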
mckennab (111 rep)
Jul 3, 2019, 11:33 PM • Last activity: Apr 18, 2025, 04:05 AM
0 votes
1 answer
485 views
Is renaming schemas more efficient / faster than drop/create in a single transaction
Given the constraint (from the customer side) of having only one *database*, how can I most effectively deploy new versions of data into a production-facing application? The new data is packaged into a (-Fc) dump file. This data is supposed to *replace* the existing data, so restoring it must be done by first dropping the existing tables and then restoring the new ones.

The naive approach of just importing the dump file with pg_restore and --single-transaction will lock the whole database, which will cause the application dependent on that data to halt/suspend queries.

My intended approach is to utilize schemas. Basically, the application will have its connection string set to use mydb.public. The import loads all new data into mydb.next. Then, rename mydb.public to mydb.old, rename mydb.next to mydb.public, and drop mydb.old.

Would this approach result in a shorter downtime than importing directly? I am not sure how renames of schemas work internally in Postgres, but from my surface-level understanding it should be a more efficient approach.
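A minimal sketch of the swap described above (schema names as in the question; an illustration, not a drop-in deployment script). ALTER SCHEMA ... RENAME TO is a catalog-only change, so the critical section is just two quick statements, though each still needs a brief exclusive lock and will wait behind sessions holding conflicting locks.

BEGIN;
ALTER SCHEMA public RENAME TO old;
ALTER SCHEMA next   RENAME TO public;
COMMIT;

-- the heavy part happens outside the critical path
DROP SCHEMA old CASCADE;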
F.P (207 rep)
Aug 26, 2021, 05:48 AM • Last activity: Mar 14, 2025, 09:07 AM
0 votes
1 answer
809 views
Fastest way to check if record is related to a particular user in postgres
While implementing a sharelikes table I added a procedure to add a new record for a like. I wanted to add a check before the insert that the item being liked does not belong to the user who submitted the like request. Given the function parameters Pshareid and Puserid, I used the following conditional in the procedure:

SELECT INTO checkuser true
FROM ks.shares
WHERE id = Pshareid AND userid = Puserid
LIMIT 1;

IF checkuser THEN
    -- Cannot like your own share
ELSE
    -- INSERT the new record
END IF;

In the SELECT described above the record will almost always not exist. Is that more efficient than a SELECT on the share id to return the record (which at that point very likely exists) and then checking the userid of the returned record?

SELECT userid INTO checkuser
FROM ks.shares
WHERE id = Pshareid
LIMIT 1;

IF checkuser = Puserid THEN
    -- Cannot like your own share
ELSE
    -- INSERT the new record
END IF;
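For comparison, a hedged sketch of folding the check into the insert itself, so only one statement runs either way (the likes table name and columns, ks.sharelikes(shareid, userid), are assumptions):

INSERT INTO ks.sharelikes (shareid, userid)
SELECT Pshareid, Puserid
WHERE NOT EXISTS (
    SELECT 1
    FROM ks.shares s
    WHERE s.id = Pshareid
      AND s.userid = Puserid
);
-- inside PL/pgSQL, FOUND (or GET DIAGNOSTICS ... ROW_COUNT) tells you whether the like was rejected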
Russell Ormes (193 rep)
May 29, 2018, 08:39 AM • Last activity: Mar 13, 2025, 11:03 AM
4 votes
1 answer
1418 views
Why is restart_lsn not updated, even on a checkpoint, in PostgreSQL?
> A logical slot will emit each change just once in normal operation. The current position of each slot is persisted only at checkpoint, so in the case of a crash the slot may return to an earlier LSN, which will then cause recent changes to be resent when the server restarts. Logical decoding clients are responsible for avoiding ill effects from handling the same message more than once. Clients may wish to record the last LSN they saw when decoding and skip over any repeated data or (when using the replication protocol) request that decoding start from that LSN rather than letting the server determine the start point.

Quoted from [here](https://www.postgresql.org/docs/10/logicaldecoding-explanation.html).

I'm implementing logical (streaming) replication. Thus far things are looking good, but I'm unable to understand why the restart_lsn entry is not changed. I.e., I do this:

SELECT * FROM pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn
-----------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------
 pgx_test | wal2json | logical | 12641 | postgres | t | 16045 | | 8544916 | 2/B1BC5330

Then I insert 2-3 records and check:

SELECT * FROM pg_stat_replication;

 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+-----------------+------------------+-------------+-----------------+-------------+----------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 16315 | 1601109 | pgx_replication | | 127.0.0.1 | | 50112 | 2019-03-11 17:33:18.150179+05:30 | | streaming | 2/B1BCA0F8 | | | | 0 | async

The sent_location gets updated, but not restart_lsn, even though I ran a CHECKPOINT command and had set checkpoint_timeout = 1min; the restart_lsn still does not change:

SELECT * FROM pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn
-----------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------
 pgx_test | wal2json | logical | 12641 | postgres | t | 16315 | | 8544916 | 2/B1BC5330
(1 row)

Any reason why this happens?
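A hedged check that may help narrow this down (it assumes the decoding client is expected to send standby status updates): for a logical slot, restart_lsn can only move forward once the client acknowledges what it has flushed, so comparing it with confirmed_flush_lsn shows whether those acknowledgements are arriving at all.

SELECT slot_name,
       restart_lsn,
       confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS wal_retained_bytes
FROM pg_replication_slots
WHERE slot_name = 'pgx_test';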
Noobie (165 rep)
Mar 11, 2019, 12:24 PM • Last activity: Mar 8, 2025, 05:03 AM
0 votes
1 answer
460 views
In AWS Aurora's performance insights. What qualifies as "Other" and how do I investigate it?
I have a disproportionate number of queries that are marked as "Other" in Performance Insights, with no additional details. AWS doesn't seem to provide any details about what this means, and asking on the forum doesn't seem to work. Is there a way to get details about what falls into "Other"?

[screenshot of the Performance Insights dashboard]
Kit Sunde (363 rep)
Apr 1, 2020, 05:22 AM • Last activity: Feb 25, 2025, 11:01 AM
0 votes
1 answer
1758 views
COPY command only works as command line argument, not as part of script
No rows are being copied when I try to copy a CSV from stdin. To be more specific, this *works when writing the command on the cli, but not when read as part of a file*. Why is reading SQL commands from a file different from executing them directly as an argument for -c?
cat extract.csv  | psql postgres -f copy-user.sql
Timing is on.
Line style is unicode.
Border style is 2.
Null display is "[NULL]".
Expanded display is used automatically.
COPY 0
Time: 41.685 ms

$ cat extract.csv  | psql postgres -c "$(cat copy-user.sql)"
Timing is on.
Line style is unicode.
Border style is 2.
Null display is "[NULL]".
Expanded display is used automatically.
COPY 10
Time: 31.685 ms
**copy-user.sql**
copy to_update from stdin (delimiter ';', format csv);
**extract.csv**
bfb92e29-1d2c-45c4-b9ab-357a3ac7ad13;test@test90239023783457843.com;x
aeccc3ea-cc1f-43ef-99ff-e389d5d63b22;tester@testerkjnaefgjnwerg.no;x
9cec13ae-c880-4371-9b1c-dd201f5cf233;bloblo@gmail.com;x
aeada2bc-a362-4f3e-80f2-06a717206802;veaat@gmail.com;x
fb85ddd8-7d17-4d41-8bc3-213b1e469506;navnnavnesen@foo.com;x
528e1f2e-1baa-483b-bc8c-85f993014696;kklk@hotmail.com;x
dbc8a9c1-56cf-4589-8b2c-cf1a2e0832ed;ghiiii@hotmail.com;x
fbf23553-baa2-410a-8f96-32b5c4deb0c7;lala@lala.no;x
e22ec0de-06f9-428a-aa3e-171c38f9a1f7;x2aa@gmail.com;x
8e8d0f73-8eb7-43b4-8019-b79042731b97;mail@mail.com;x
**table definition for to_update**
create table to_update(id text, email text, text char);
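As far as I understand psql's behaviour (hedged - worth verifying against the psql documentation for \copy), when the script is supplied with -f, COPY ... FROM STDIN reads its data from the same source the commands come from, i.e. the script file, which ends immediately, hence COPY 0; with -c the commands come from the argument string, so standard input is still the pipe. One workaround is to have the script read psql's own standard input explicitly via pstdin:

-- copy-user.sql (sketch): pstdin always refers to psql's standard input,
-- regardless of whether commands come from -f, -c or an interactive session
\copy to_update from pstdin with (delimiter ';', format csv)

Usage stays the same: cat extract.csv | psql postgres -f copy-user.sql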
oligofren (165 rep)
Oct 15, 2019, 08:44 PM • Last activity: Feb 5, 2025, 02:07 AM
0 votes
0 answers
38 views
Why does query planning time increase on a standby replica with increase in dead tuples but not on the primary?
We are using Postgres 10 with standby replication (read replica in recovery mode). For the same query, we observe significantly higher planning time on the replica compared to the primary. Execution time remains similar. The query is like so
SELECT *  
FROM emp, sessions  
WHERE emp.id = ''  
  AND emp.active = 't'  
  AND sessions.emp_id = emp.id;
- sessions table: ~250M records
- emp table: ~100M records
- Both primary and replica are on similar VMs (same RAM, HDD, processor, free space, etc.).

Observations:

- High number of dead tuples for the sessions table (~75k dead tuples added per day), constantly increasing.
- As dead tuples increase, planning time on the replica grows, but it remains constant on the primary.
- Running VACUUM (on the primary) temporarily fixes the issue, normalizing planning time on the replica. (We cannot run VACUUM on the replica, as it is in read-only/recovery mode.)
- Running ANALYZE on the primary does not help. The statistics seem to be the same.
- Dead tuples are created frequently because a maintenance job deletes records from the sessions table every 10 seconds. We're working on fixing this.
- Replication lag is minimal and doesn't appear to be a factor (the records are definitely present on the replica).

Actions we've taken so far:

- Running autovacuum more aggressively so that the dead tuple count does not climb as high.
- Fixing the DB maintenance job.

**The question remains: why does the same number of dead tuples not affect the primary's planning time but impact the replica's planning time?**

**Primary**
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.26..17.30 rows=1 width=451) (actual time=0.054..0.055 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Index Scan using emp_pkey on emp a  (cost=0.69..8.71 rows=1 width=138) (actual time=0.036..0.037 rows=1 loops=1)
         Index Cond: (id = '123456'::text)
         Filter: active
         Buffers: shared hit=6
   ->  Index Scan using sessions_pkey on sessions g  (cost=0.57..8.59 rows=1 width=313) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: (id = a.id)
         Buffers: shared hit=5
 Planning time: 23.652 ms
 Execution time: 0.096 ms
**Replica**
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.26..17.30 rows=1 width=451) (actual time=0.062..0.064 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Index Scan using emp_pkey on emp a  (cost=0.69..8.71 rows=1 width=138) (actual time=0.045..0.046 rows=1 loops=1)
         Index Cond: (id = '123456'::text)
         Filter: active
         Buffers: shared hit=6
   ->  Index Scan using sessions_pkey on sessions g  (cost=0.57..8.59 rows=1 width=313) (actual time=0.014..0.014 rows=1 loops=1)
         Index Cond: (id = a.id)
         Buffers: shared hit=5
 Planning time: 854.801 ms
 Execution time: 0.102 ms
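To keep an eye on the dead-tuple build-up that correlates with the replica's planning time, a simple monitoring query on the primary could look like this (a sketch; it only reports the statistics, it does not explain the planning-time difference itself):

SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('emp', 'sessions');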
Aftab (1 rep)
Jan 23, 2025, 03:21 PM
2 votes
1 answer
1109 views
Arbitrary queries on n:m relationship, including "all" and "any"
I'm using Postgres >= 9.6. I have tasks, tags and task_tags tables, for a typical n:m relationship between tasks and tags.

I'd like to be able to programmatically build queries against the tasks table that support conditions on the actual fields of tasks, but also on the tags (tag names) of a task. Queries on the task fields themselves are straightforward. Queries on the tags for a "does it have tag A?" are also straightforward. What I am struggling with is coming up with a select/query structure that would also allow me to match things like "does it have tag A and tag B?".

The best I've come up with is a lateral join on a subquery with an array aggregation, and then using array matching functions, e.g.:

SELECT DISTINCT ON (tasks.id) tasks.*
FROM tasks,
LATERAL (SELECT array_agg(tags.name) AS tags
         FROM task_tags
         INNER JOIN tags ON task_tags.tag_id = tags.id
         WHERE task_tags.task_id = tasks.id
         GROUP BY task_tags.task_id) tt
WHERE tt.tags @> array['tag1'::varchar, 'tag3'::varchar];

That way, it should be possible to programmatically build a WHERE clause (using tasks.* and tt.tags) satisfying all of the conditions of the user-provided "query". However, I'm not sure if this is the best way of doing it - thoughts? Is this query at all efficient? Is there any index I could create that would improve it?

Similarly, is there any way at all of making it work with wildcards against the tag names? Normal array matching wouldn't allow that, and solutions I've seen suggest using unnest (or, well, not using arrays in the first place), but then I'd lose the ability to say "it needs to have both tagA and tagB". Is there any other way of building a query on these relationships that would allow that kind of "both tagA and tagB" matching?
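One alternative shape that keeps the "must have ALL of these" semantics while allowing wildcards is to emit one EXISTS per required pattern and AND them together (OR for "any of"). This is only a sketch built on the table names above; whether it beats the array approach depends on the data, and it pairs naturally with an index on task_tags (task_id, tag_id) plus a trigram index on tags.name for the wildcard matches.

SELECT t.*
FROM tasks t
WHERE EXISTS (SELECT 1
              FROM task_tags tt
              JOIN tags g ON g.id = tt.tag_id
              WHERE tt.task_id = t.id
                AND g.name ILIKE 'tagA%')
  AND EXISTS (SELECT 1
              FROM task_tags tt
              JOIN tags g ON g.id = tt.tag_id
              WHERE tt.task_id = t.id
                AND g.name ILIKE 'tagB%');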
Alex Hornung (123 rep)
Dec 31, 2017, 09:34 AM • Last activity: Jul 22, 2024, 12:50 AM
14 votes
4 answers
6100 views
Postgres jsonb vs composite type performance differences
What considerations are involved in choosing between a jsonb column and a composite type column of the same structure? For example, consider a column like that used in the Postgres documentation: CREATE TYPE inventory_item AS ( name text, supplier_id integer, price numeric ); What are the tradeoffs involved between this approach vs a jsonb column mirroring this structure? For example, I suspect that the composite type won't require storing the key names for every record, whereas the jsonb type would require this.
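A quick way to make the storage difference concrete - just an illustrative probe, not a full benchmark - is to compare the datum sizes directly; the composite value stores only the field values, while the jsonb value also carries the key names:

SELECT pg_column_size(ROW('fuzzy dice', 42, 1.99)::inventory_item) AS composite_bytes,
       pg_column_size('{"name": "fuzzy dice", "supplier_id": 42, "price": 1.99}'::jsonb) AS jsonb_bytes;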
Metropolis (243 rep)
Jan 17, 2019, 04:07 PM • Last activity: Jun 17, 2024, 08:18 PM