Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
209
views
PostgreSQL: permission denied to COPY to or from an external program, even after Grant pg_execute_server_program
Running
Copy test(id, name) From Program 'zcat /tmp/test.csv.gz' CSV Header;
fails with:
SQL Error [42501]: ERROR: permission denied to COPY to or from an external program
Detail: Only roles with privileges of the "pg_execute_server_program" role may COPY to or from an external program.
even though it's been granted to the current user:
SELECT current_user;
-- report_mgr
SELECT rolname FROM pg_roles WHERE
pg_has_role(current_user, oid, 'member');
|rolname |
|-------------------------|
|pg_execute_server_program|
|report_mgr |
Running on the official postgres:16 Docker image (PostgreSQL 16.2 (Debian 16.2-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit).
The query works fine with the postgres user, both in psql and SQLModel/SQLAlchemy, but I haven't been able to get it to work for report_mgr. The file is readable, though that's irrelevant as the command is never executed. The user was created like this:
CREATE OR REPLACE FUNCTION grant_on_schemas(
role text,
schemas text[],
privilege text default 'select')
RETURNS SETOF text AS $$
DECLARE
query text;
schema text;
BEGIN
query := format('GRANT CONNECT ON DATABASE %I TO %s;', (SELECT current_database()), role);
-- Add query to result table as a new row, then execute
Return Next query;
Execute query;
FOREACH schema IN ARRAY schemas LOOP
If privilege = 'select' Then
-- Grant select
query := format('GRANT USAGE ON SCHEMA %I TO %s;', schema, role);
Return Next query;
Execute query;
query := format('GRANT SELECT ON ALL TABLES IN SCHEMA %I TO %s;', schema, role);
Return Next query;
Execute query;
query := format('ALTER DEFAULT PRIVILEGES IN SCHEMA %I GRANT SELECT ON TABLES TO %s;', schema, role);
Return Next query;
Execute query;
Elsif privilege = 'all' Then
-- Grant all
query := format('GRANT USAGE ON SCHEMA %I TO %s;', schema, role);
Return Next query;
Execute query;
query := format('GRANT ALL ON ALL TABLES IN SCHEMA %I TO %s;', schema, role);
Return Next query;
Execute query;
-- Apply them to new tables/views created by this admin account
query := format('ALTER DEFAULT PRIVILEGES IN SCHEMA %I GRANT ALL ON TABLES TO %s;', schema, role);
Return Next query;
Execute query;
-- Other permissions: functions, procedures, routines, sequences
Else
Return Next format('Error: privilege "%s" not found', privilege);
End If;
END LOOP;
Return;
END;
$$ LANGUAGE plpgsql;
Create Role report_mgr login password 'qwert';
Select grant_on_schemas('report_mgr', array['public'], 'all');
Grant pg_execute_server_program To report_mgr;
Any ideas will be appreciated.
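A hedged diagnostic sketch that may help narrow this down (not an answer): in PostgreSQL 16 a role grant can be made WITH INHERIT FALSE, in which case pg_has_role reports membership but not usage, and the member has to SET ROLE before the privilege applies:
-- 'member' = listed as a member; 'usage' = actually inherits the role's privileges
SELECT pg_has_role('report_mgr', 'pg_execute_server_program', 'member') AS is_member,
       pg_has_role('report_mgr', 'pg_execute_server_program', 'usage')  AS inherits_privileges;
-- If inherits_privileges is false, one of these may apply:
-- GRANT pg_execute_server_program TO report_mgr WITH INHERIT TRUE;
-- SET ROLE pg_execute_server_program;  -- before running the COPY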
Chema
(131 rep)
Jun 12, 2024, 11:03 AM
• Last activity: Jun 24, 2025, 01:06 AM
0
votes
1
answers
209
views
Copy Data from Two different databases with FK relation
So the situation is as follows:
I have a **production DB** called **DB1** which has huge data, around 250 GB.
I have a **Test DB** called **DB2** on my local machine which has the same schema as the production DB, but only the first two tables are partially filled (**UUT_RESULT**, **STEP_RESULT**).
What I want:
Copy only the related data from the production DB table (**PROP_RESULT**) to my local DB table.
Very important note: I could copy the entire table; however, it's huge. That's why I only want to copy the rows from the **PROP_RESULT** table in the production DB that relate to the data already existing in my local DB.
Note: Only UUT_RESULT and STEP_RESULT are partially filled in the local DB; the other tables are empty and I need to populate them, which is why I am posting this.
The following image will make it easier to understand:

Nano
(139 rep)
Jul 17, 2019, 01:51 PM
• Last activity: Jun 18, 2025, 03:02 PM
3
votes
1
answers
279
views
Is it possible to load data into Postgres using COPY, where the data includes a SELECT statement?
(I think the answer is probably "no", but I haven't managed to find anything where anyone tried to do this. Which maybe is a sign in itself).
So, I have a parent and child table that I'm generating data for in a script. The PK on the *parent table* is an autosequence; the FK on the *child table* into the parent table is that autosequence PK, which isn't populated until the parent table's data is loaded.
When I load (much smaller) datasets as SQL in PGAdmin, I can match the records together by using another column as a lookup (eg, I have a column *Reference* on the *parent table* which I fill with a unique pre-defined value for each record, effectively as a surrogate key).
I load the parent table through COPY, and that works fine. However, in the child table, what I want to be able to do is include
(select parent_id from parent_table where reference = predefined value)
as the value for the Parent ID field.
The problem I'm trying to address is that, at the point the child table's values are created, I don't know what the FK values need to be, because the parent table's values haven't been loaded yet.
To be honest, this probably isn't even a good way to go about this - I have 2 million records in the parent table and 3-4 million in the child, so I don't really want to be making 4 million one-record calls into another million-record table, but I can't see any obvious way around this without splitting the data generation so that the child table data is generated after the parent table has been loaded.
Better ideas gratefully received!
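For what it's worth, a common pattern here — sketched with illustrative column names — is to COPY the child rows into a staging table that still carries the parent's *Reference* value, then resolve the FK with a single set-based join instead of one subquery per row:
CREATE TABLE child_staging (reference text, payload text);
\copy child_staging FROM 'child.csv' CSV HEADER
INSERT INTO child_table (parent_id, payload)
SELECT p.parent_id, s.payload
FROM child_staging s
JOIN parent_table p ON p.reference = s.reference;
With an index on parent_table(reference), this is one join over the whole set rather than millions of individual lookups.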
khafka
(79 rep)
Nov 1, 2019, 02:05 PM
• Last activity: May 16, 2025, 10:01 PM
-1
votes
1
answers
281
views
vertica/postgres copy text to binary column
Until now I have used Oracle to store XML files in a BLOB column.
Is there any option in Vertica to copy an XML file into a binary column?
I saw there is a varbinary data type, but I haven't succeeded in loading the file into a single row in that column.
If someone is familiar with this process in PostgreSQL, that might also be helpful (please share your knowledge).
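For the PostgreSQL half of the question only, a minimal sketch (assuming the file is readable by the server and the role may execute pg_read_binary_file, which is superuser-only by default); the Vertica side is not covered by this sketch:
CREATE TABLE xml_store (id serial PRIMARY KEY, doc bytea);
INSERT INTO xml_store (doc)
SELECT pg_read_binary_file('/path/to/file.xml');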
NoamiA
(101 rep)
Mar 5, 2020, 10:49 AM
• Last activity: May 15, 2025, 08:05 AM
0
votes
2
answers
465
views
COPY Command spawning multiple processes
I have a lengthy data pipeline that imports a CSV, creates a sequence of dependent views and then exports the results as a CSV using COPY. This process had been working fine, completing in around 30-60 seconds. Suddenly, this process has started running much slower when it reaches the COPY command. Running SELECT usename, state, query FROM pg_stat_activity; shows three identical, active COPY commands under my username. The process does eventually complete but now takes up to 20 minutes or more. Other than changing some initial sub-setting, the data has remained unchanged. No one else is using this database, but there are other users on the cluster.
Has anyone encountered this behavior before? Does anyone know what might cause a sudden slowdown in a COPY operation?
Postgres 11.6, RHEL 7
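A hedged diagnostic sketch: while the slow COPY is running, the wait columns in pg_stat_activity usually show whether the backends are waiting on locks, I/O, or the client:
SELECT pid, state, wait_event_type, wait_event, query_start, query
FROM pg_stat_activity
WHERE query ILIKE 'copy%';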
Matt
(291 rep)
Feb 19, 2020, 08:45 PM
• Last activity: May 4, 2025, 01:05 AM
0
votes
1
answers
951
views
Why doesn't the $1 syntax work for all queries?
I tried to do this:
COPY postgres_log FROM $1 WITH csv;
And with the file path sent separately, like I do for all normal queries (SELECT/UPDATE/INSERT/DELETE). However, it gives:
ERROR: syntax error at or near "$1"
Um... So does that mean that I'm forced to send it a scary string like this?
COPY postgres_log FROM '/full/path/to/logfileblablabla.csv' WITH csv;
If so, why? Why don't all query types support the parameterized queries interface? Isn't this like *asking* to enable SQL query injection vulnerabilities?
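A minimal workaround sketch (not an explanation of why utility statements don't take bound parameters): build the statement server-side with format() and %L, which quotes the path as a literal, and run it dynamically:
DO $$
BEGIN
    EXECUTE format('COPY postgres_log FROM %L WITH (FORMAT csv)', '/full/path/to/logfile.csv');
END
$$;
Alternatively, COPY ... FROM STDIN lets the client stream the file contents instead of passing a server-side path at all.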
user215888
Sep 19, 2020, 10:58 PM
• Last activity: Apr 23, 2025, 12:02 PM
0
votes
2
answers
843
views
PostgreSQL exits the shell when importing large data
I'm having this totally weird scenario: We have an archiving server based on PostgreSQL inside a Scientific Linux virtual server. I want to migrate the data (20+ GB) to a physical Scientific Linux machine, so I used
\copy (select * from table) to '/path/to/file'
This ran successfully, and the data is in the output file.
On the physical machine I try to import the data using
\copy table from '/path/to/file'
When I try this command in a terminal emulator, it exits after a few minutes. I also tried it on the console itself after login, but after a few minutes it exits, too. In both cases there is no data imported.
Could it be that the server is somehow overloaded? Can we overcome this and import the entire file?
Edit: I increased the RAM to 3 GB; the crash just took a bit longer to appear.
Edit 2: There is a log file under /var/lib/pgsql/data/pg_log; its last few lines are:
Context: COPY sample, line 77324116
Statement: COPY sample from STDIN
ERROR: Unexpected EOF on client connection
Context: COPY sample, line 77324116
Statement: COPY sample from STDIN
Log: Could not send data to client: Broken pipe
Statement: COPY sample from STDIN
ERROR: Unexpected EOF on client connection
Any help is really appreciated!
3bdalla
(229 rep)
Sep 22, 2016, 07:01 AM
• Last activity: Apr 13, 2025, 08:04 PM
1
votes
3
answers
176
views
ERROR: invalid byte sequence for encoding "UTF8": 0xdc 0x36
When running a \copy (either pgadmin or aws_s3.table_import_from_s3) of a 1.6GB file into an AWS Aurora Postgres-compatible database, I'm getting the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0xdc 0x36
CONTEXT: COPY staging, line 99779: "L24000403170365 ACTIVEZONE LLC ..."
EDIT: Here's what I could pull for table definition (but let me know if you want more):
| column_name | data_type | character_maximum_length | is_nullable | column_default |
| ----------- | --------- | ------------------------ | ----------- | -------------- |
| raw | text | [null] | YES | [null] |
EDIT: I also tried to change the column to bytea, with no effect.
The source is supposed to be ASCII, but I get the same error with explicit encodings like utf8, latin1, win1251, and win1252.
EDIT: As requested in a reply, here's more information about the import commands. In pgadmin4, I'm right-click importing into the table which shows the following under the covers:
--command " "\\copy public.staging (\"raw\") FROM 'C:/data.txt' DELIMITER '|' ENCODING 'UTF8';""
I also use pgadmin4 to trigger the s3 table import by calling the query:
SELECT aws_s3.table_import_from_s3(
'staging',
'',
'(DELIMITER ''|'')',
aws_commons.create_s3_uri('data', 'data.txt', 'us-east-1')
);
Under the covers, table_import_from_s3 calls the command:
copy staging from '/rdsdbdata/extensions/aws_s3/{{internal filename}}' with (DELIMITER '|')
The answer to similar questions is to clean up the source data so I pulled up python and tried to find the offending character. I couldn't find any evidence of an unusual character at or around the referenced line. For the sake of argument, I believe the following will scan the entire file (and you can see the results inline):
>>> def charinfile(filename, bytechar):
... with open(filename, 'rb') as file:
... byteline = file.readline()
... while byteline: # readline returns empty string at EOF
... if byteline.find(bytechar) != -1:
... print("found!")
... return byteline
... byteline = file.readline()
... else:
... print("not found")
...
>>> charinfile(filename, b'\xdc')
not found
>>> charinfile(filename, b'\xdc36')
not found
>>> charinfile(filename, b'6') # make sure the code is working
found!
I've also tried versions where I use strings instead of bytes with the same results. I can confirm that there are no blank lines before EOF (have used line counters to verify that I reach ~1m rows).
What am I missing?
claytond
(123 rep)
Mar 12, 2025, 06:42 PM
• Last activity: Mar 24, 2025, 05:13 PM
0
votes
2
answers
2251
views
How to quote arrays in postgres copy statement to insert into JSON field
I have some data that looks like this:
id,bo,c,url
1,"some text here", 22, "[{'id': 'NfKA3', 'u': 'https://somewebsite.com '}]"
2, "more text", 83, "[{'id': 'W3nAl', 'u': 'https://anotherwebsite.com '}]"
3, "even more text", 14, "[{'id': 'CyrMj', 'u': 'https://yetanotherwebsite.com '}]"
I'm trying to insert this data into a table that looks like this:
CREATE TABLE myTable(
id integer,
body varchar,
count varchar,
url JSON
);
Using this command:
\copy mytable FROM 'myData.csv' WITH DELIMITER ',' csv header;
However, I'm having trouble with the single quotes in the url field of the csv. I'm getting errors like this:
ERROR: invalid input syntax for type json
DETAIL: Token "'" is invalid.
CONTEXT: JSON data, line 1: [{'...
Can someone show me the correct way to structure my \COPY command so as to be able to load the url field in the csv as a JSON?
If necessary I can do some pre-processing of the file at the command line.
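One option, sketched here with the column names from the table above, is to land the file in a text staging column and convert the single-quoted quasi-JSON afterwards (note this naive replace() would also mangle apostrophes inside values):
CREATE TABLE mytable_staging (id integer, body varchar, count varchar, url text);
\copy mytable_staging FROM 'myData.csv' CSV HEADER
INSERT INTO mytable (id, body, count, url)
SELECT id, body, count, replace(url, '''', '"')::json
FROM mytable_staging;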
Matt
(291 rep)
May 7, 2021, 07:11 PM
• Last activity: Mar 6, 2025, 05:04 PM
1
votes
2
answers
6620
views
unterminated CSV quoted field when to import csv data file into postgresql
Create a table :
CREATE TABLE num (
id serial NOT NULL ,
adsh VARCHAR(20) NOT NULL,
tag VARCHAR(256) NOT NULL,
version VARCHAR(20) NOT NULL,
coreg VARCHAR(256),
ddate date NOT NULL,
qtrs DECIMAL(8) NOT NULL,
uom VARCHAR(20),
value DECIMAL(28,4),
footnote VARCHAR(512)
);
I want to import the sample.txt data into the num table:
adsh tag version coreg ddate qtrs uom value footnote
0001213900-20-033598 DueToAsiyaCommunicationsSapiDeC.v.Current 0001213900-20-033598 20191231 0 USD
0001213900-20-033598 DueToDinarZuzLLC 0001213900-20-033598 20200630 0 USD 178000.0000 Due to the April 6, 2020 180 days Loan Agreement with the Company to borrow up to $250 at an annual interest rate of nine percent (9.0%) ("the second "Dinar Zuz Note").
0001213900-20-033598 DueToNextCalaCurrent 0001213900-20-033598 20181231 0 USD -14000.0000
0001213900-20-033598 DueToNextCalaCurrent 0001213900-20-033598 20191231 0 USD
The command to import the data in sample.txt into the table:
COPY num(adsh,tag,version,coreg,ddate,qtrs,uom,value,footnote)
FROM 'sample.txt'
DELIMITER E'\t'
CSV HEADER;
It encounters an error:
ERROR: unterminated CSV quoted field
CONTEXT: COPY num, line 6: "0001213900-20-033598 DueToDinarZuzLLC 0001213900-20-033598 20200630 0 USD 178000.0000 Due to the Ap..."
Please fix it without editing the original data.
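One option that leaves the file untouched, sketched under the assumption that a backspace byte never occurs in the data: keep CSV mode but point QUOTE at that unused byte, so the stray double quotes in the footnote are treated as ordinary characters:
COPY num (adsh, tag, version, coreg, ddate, qtrs, uom, value, footnote)
FROM 'sample.txt'
WITH (FORMAT csv, HEADER, DELIMITER E'\t', QUOTE E'\b');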
showkey
(386 rep)
Jan 27, 2022, 11:30 AM
• Last activity: Oct 21, 2024, 04:02 PM
1
votes
2
answers
4886
views
Copy MySQL User but change HostName
I am migrating to a Docker MariaDB setup and all my users were created with "localhost" as the host. From what I read and tested, ironically '%' excludes localhost. I want to copy my existing users, adding a second entry for each that uses '%' as the hostname. Any suggestions?
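A sketch of the usual manual route (MariaDB 10.2+; 'myuser' and the grants below are placeholders): dump the definition and grants of the localhost account, then replay them with the host changed:
SHOW CREATE USER 'myuser'@'localhost';
SHOW GRANTS FOR 'myuser'@'localhost';
-- re-run the output with the host swapped, e.g.:
CREATE USER 'myuser'@'%' IDENTIFIED BY 'same_password_here';
GRANT SELECT, INSERT ON mydb.* TO 'myuser'@'%';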
FreeSoftwareServers
(248 rep)
May 5, 2020, 04:15 AM
• Last activity: Aug 25, 2024, 05:57 PM
0
votes
0
answers
34
views
PostgreSQL COPY network-bound in benchmark scenarios - no difference between binary and string format?
I am benchmarking the write performance of PostgreSQL (+ TimescaleDB) at very high loads for my particular use case. I won't go into details about the setup, requirements, table columns etc, but the scale for our particular table/use case is about 20 billion rows/month.
For insertion, I'm using the COPY command, as that seems to be the best way to achieve fast insertion into Postgres.
I have a pretty good server and manage to achieve about 400k rows/sec insertion rate into the database, which does come close to fulfilling what we need, but...
Upon further investigation, monitoring the resources of the server, nothing really seemed to be the bottleneck on the server itself, and that's when I found out the whole thing is network-bound. After investigation, some simple math brings the network overhead up to 3000 bytes per row (!).
This is highly inefficient, but we are copying strings after all, so it's understandable.
However, there is another way to do the COPY command: FROM STDIN WITH BINARY. This should be more efficient, but I find myself having basically the same overhead per row. This is something I do not understand. Is this expected, or am I doing something wrong on my side? Here are some excerpts from the Python code I am using to benchmark binary copy:
This is the part that does the insertion itself:
@retry
def flush(self) -> Tuple[int, timedelta]:
    written_rows = 0
    time_taken = timedelta()
    if self.line_count > 0:
        self.buffer.write(struct.pack("!h", -1))
        self.buffer.seek(0)
        cur = self.conn.cursor()
        t_start = datetime.now()
        copy_command = f"copy {self.table_name} from stdin with binary"
        try:
            cur.copy_expert(copy_command, self.buffer)
        except Exception:
            logging.exception("Error occurred while writing the buffer")
        written_rows = cur.rowcount
        self.conn.commit()
        t_end = datetime.now()
        time_taken = t_end - t_start
        logging.info(f"Writing {written_rows} rows into {self.table_name} took {t_end - t_start}")
        self.buffer = BytesIO()
        self.buffer.write(struct.pack('!11sii', b'PGCOPY\n\377\r\n\0', 0, 0))
        self.line_count = 0
    return (written_rows, time_taken)
This is the part that does the formatting of what goes into the buffer:
class BinaryCopyTimescaleInsertionFormatter(BaseTimescaleInsertionFormatter):
    # Zip: (column position, format, size)
    metadata_row_format_part_1 = list(zip(('q', 'q', 'q', '?', '?'),
                                          ( 8,   8,   8,   1,   1 )))
    metadata_row_format_part_2 = list(zip(('i',),
                                          ( 4,)))
    event_row_format = list(zip(('q', 'q', 'd', 'q', 'i', 'i', 'i'),
                                ( 8,   8,   8,   8,   4,   4,   4 )))

    def encode_string(self, string_to_encode):
        return struct.pack('!I', len(string_to_encode)) + string_to_encode.encode("utf-8")

    def sanitize_and_encode_string(self, string_to_encode):
        return self.encode_string(sanitize_string(string_to_encode))

    def encode_null(self):
        return struct.pack('!i', -1)

    def pack_col(self, col, row_format, val):
        # row_format[col] is a (struct format char, byte size) pair
        return struct.pack('!i' + row_format[col][0], row_format[col][1], val)

    def format_metadata_line(self, metadata):
        binary_data = bytearray(struct.pack('!h', 11))  # no_columns
        binary_data.extend(self.pack_col(0, self.metadata_row_format_part_1, metadata.element_id))
        binary_data.extend(self.pack_col(1, self.metadata_row_format_part_1, 1))
        binary_data.extend(self.pack_col(2, self.metadata_row_format_part_1, metadata.type))
        binary_data.extend(self.pack_col(3, self.metadata_row_format_part_1, True))
        binary_data.extend(self.pack_col(4, self.metadata_row_format_part_1, False))
        binary_data.extend(self.sanitize_and_encode_string(metadata.element_name))
        binary_data.extend(self.sanitize_and_encode_string(metadata.element_name))
        binary_data.extend(self.pack_col(0, self.metadata_row_format_part_2, metadata.dpt_id))
        binary_data.extend(self.sanitize_and_encode_string(metadata.unit))
        binary_data.extend(self.sanitize_and_encode_string(metadata.alias))
        binary_data.extend(self.sanitize_and_encode_string(metadata.comment))
        return (bytes(binary_data))

    def _format_event_line(self, event):
        if DpeType(event.type) == DpeType.STRING or DpeType(event.type) == DpeType.DYN_STRING:
            el_num = None
            el_str = event.value
        else:
            el_num = event.value
            el_str = None
        binary_data = bytearray(struct.pack('!h', 8))  # no_columns
        binary_data.extend(self.pack_col(0, self.event_row_format, event.element_id))
        binary_data.extend(self.pack_col(1, self.event_row_format, int(event.ts // 1000)))
        binary_data.extend(self.pack_col(2, self.event_row_format, el_num) if el_num is not None else self.encode_null())
        binary_data.extend(self.pack_col(3, self.event_row_format, event.status))
        binary_data.extend(self.pack_col(4, self.event_row_format, event.manager))
        binary_data.extend(self.pack_col(5, self.event_row_format, event.user))
        binary_data.extend(self.encode_null())
        binary_data.extend(self.sanitize_and_encode_string(el_str) if el_str is not None else self.encode_null())
        return bytes(binary_data), None

    def _format_dyn_event_line(self, event):
        pass
user129186
(101 rep)
Aug 21, 2024, 08:07 AM
1
votes
3
answers
7069
views
How to import multiple CSV files to a postgres table using pgadmin or other method?
I'm looking to import a folder of over 100+ CSV files into one table on PostgreSQL using pgAdmin. I have the following script, but is there a way to change the file path to cover all the CSVs?
COPY schema.table FROM 'C:\Documents\Data' DELIMITER ',' CSV HEADER;
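If the server runs on Linux/macOS and the role may COPY FROM PROGRAM (superuser or pg_execute_server_program), one sketch is to let a shell command concatenate the files and drop each file's header line; the path and glob here are illustrative, and on Windows the program string would need an equivalent shell command instead:
COPY schema.table
FROM PROGRAM 'awk ''FNR > 1'' /data/csv/*.csv'
WITH (FORMAT csv);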
Chris Jenner
(151 rep)
May 21, 2020, 11:03 AM
• Last activity: Jun 17, 2024, 07:22 PM
0
votes
0
answers
186
views
PostgreSQL - Bulk COPY slower with unlogged tables than logged tables
We have an ETL process that starts with loading data into PostgreSQL 16.2 databases (Azure Database for PostgreSQL flexible). We load 100,000 records at a time using COPY FROM STDIN. This is staging data that doesn't need to be recovered in case of a crash. Because of this, we decided to give unlogged tables a try to reduce costs and increase performance.
My issue is that with all other things being equal, simply adding 'UNLOGGED' to the create table statements increased load times by 30-35% in testing (~21 minutes up to ~28 minutes in one instance, ~31 minutes up to ~41 minutes in another). This doesn't make much sense given what I've read about unlogged tables. Indexes are only created after the data has been loaded into the tables. Overall IOPS to disk and bandwidth consumption are both down when using UNLOGGED. My write throughput (bytes/sec) is much lower with unlogged tables, but I'm not sure if that's a result of having fewer writes in the first place due to not hitting the WAL.
Everything I find online simply talks about "use unlogged tables for faster writes", but there is essentially zero information about what could cause using unlogged tables to result in slower writes. Are there any recommendations for optimizing the performance of bulk loading into unlogged tables? I realize I haven't provided a ton of specifics, but given the lack of information on what can cause this decrease in performance, I don't really know what specific information would be helpful at this point.
Happy to provide any addition information that I can. Any guidance is greatly appreciated!
Jensenator
(1 rep)
Jun 3, 2024, 01:10 PM
2
votes
1
answers
402
views
Should I delete index before bulk-loading data with COPY?
My pet project has one table which is totally rewritten once a day. This project uses a PostgreSQL database. I decided to use COPY to insert data from 4 CSV files (about 3 million rows).
At the same time there may be some queries against that table from clients. This table has a composite index. I use COPY in a transaction.
- Should the index be deleted before COPY and then recreated?
- Or is the index updated automatically after the new data is inserted?
- Maybe there is a better solution than the one I have chosen?
Queries need the index.
I don't understand whether the index will be rebuilt automatically or whether it will need to be rebuilt manually. If manually, then maybe delete it first and then create it again after the COPY. But I see that in my case I can't delete the index because there are concurrent queries.
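One pattern that sidesteps the drop-the-index question entirely, sketched with placeholder names since the table is rebuilt from scratch each day: load into a fresh table, index it, then swap names in one short transaction so readers keep using the old table until the swap:
CREATE TABLE mytable_new (LIKE mytable INCLUDING DEFAULTS);
\copy mytable_new FROM 'part1.csv' CSV
\copy mytable_new FROM 'part2.csv' CSV
CREATE INDEX ON mytable_new (col_a, col_b);  -- the composite index
ANALYZE mytable_new;
BEGIN;
ALTER TABLE mytable RENAME TO mytable_old;
ALTER TABLE mytable_new RENAME TO mytable;
COMMIT;
DROP TABLE mytable_old;
Views, foreign keys, or prepared statements that reference the table complicate the swap, so treat this only as a sketch of the idea.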
Rash-77
(21 rep)
May 10, 2024, 10:35 AM
• Last activity: May 11, 2024, 09:14 AM
0
votes
1
answers
93
views
AWS RDS DB Migration
I have a Postgres DB on an AWS RDS instance that we want to migrate to an on-site server. The problem we are having is migrating the data and the time it is taking.
I have tried AWS migration and it is taking an hour to do 100 MB; just imagine the time for an 80 GB DB.
If I do a backup and compress it, I can get the DB down to 47 GB, but AWS DataSync then only transfers it at 1 MB a second, so it is going to take quite a while.
What other suggestions are there for me to either get this DB synced faster or get the data out of AWS faster?
Any suggestions from experience?
Elene Roos
(9 rep)
May 5, 2024, 06:36 PM
• Last activity: May 10, 2024, 09:42 AM
1
votes
2
answers
4377
views
postgresql copy comand with date in filename
I have a query and want the output in a .csv file with today's date and time in the name (I am using PostgreSQL v12).
COPY
(SELECT * FROM mytable)
TO 'C:\Users\me\mylocation\filename_date_time.csv' CSV HEADER
where date_time is YYYY-MM-DD-h_m_s (or similar).
I have seen some solutions to this using bash. For various reasons I am on a Windows machine with no bash.
I would like to do this using psql please.
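A psql-only sketch using \gexec, which executes each value returned by the query as a SQL statement; note this is server-side COPY, so the server process writes the file and needs superuser or pg_write_server_files, which is usually fine when client and server are the same Windows machine:
SELECT format(
    'COPY (SELECT * FROM mytable) TO %L WITH (FORMAT csv, HEADER)',
    'C:/Users/me/mylocation/filename_' || to_char(now(), 'YYYY-MM-DD-HH24_MI_SS') || '.csv'
) \gexec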
we need a Mat. Stat.
(111 rep)
Jan 19, 2021, 11:54 PM
• Last activity: Feb 23, 2024, 07:04 AM
3
votes
1
answers
560
views
Creating a Trigger in MySQL that has a JOIN
**PROBLEM:**
I'm trying to create a TRIGGER that will copy records from 2 different tables (**users**, **addresses**) in database A to a single table (**users**) in database B.
Basically, I'm trying to consolidate data into a single table in a different DB.
I have the following TRIGGER:
CREATE TRIGGER copy_db1_user_to_db2_users
AFTER INSERT ON user
FOR EACH ROW
BEGIN
INSERT INTO database2.users(user_id, user_email, user_firstname, user_lastname)
VALUES (NEW.user_id, NEW.email, NEW.first_name, NEW.last_name);
END
The above works fine; where I bend my brain is getting the other chunk of data from the addresses table. I've created a diagram of what I'm trying to achieve, hopefully it makes sense. Oh, it also needs to make sure that the **user_id** doesn't already exist in the target table, which it shouldn't, as the **user_id** field in the source table is AUTO INCREMENT.
My other question is: given that the 2 source tables get written to at slightly different times, I'm not sure where to fire this TRIGGER. If I fire it AFTER INSERT on the users table, will the addresses table have been written to yet? Not sure - it's not my code (it's a WordPress application I'm drilling into), so any advice as to where to put this TRIGGER would be great.
Suggestions or assistance would be greatly appreciated.

Tony Stephens
(33 rep)
Jan 16, 2024, 02:59 AM
• Last activity: Jan 17, 2024, 11:20 AM
0
votes
1
answers
47
views
Proper way of copying larger data from another Azure database?
Being the 3rd party that is allowed to read from the Azure SQL (in the cloud), we need to simulate access to the data as in the older version of the ERP that used a plain old MS-SQL server. Only a subset of the data is to be "mirrored" to our application database (on the same Azure SQL server). Some of the pieces are already solved; however, the custom mirroring is affected by everything described below. So, some introduction to the situation...
**The source tables through source views:** The new version of the ERP keeps the custom fields in extra tables. To simulate the older structure, we were allowed to create views that join the base and extra tables. Each view was given the name that the older version of the ERP used on the plain old SQL server.
For example, the older table was named [Sales Invoice Header]; the new implementation uses [Sales Invoice Header$base] and [Sales Invoice Header$extension]. We have created the view named [Sales Invoice Header XXX] (the XXX is a suffix that marks it as part of our solution, so the developers of the ERP are not surprised where it came from).
**The external tables:** In Azure SQL, to access data from another database, one has to create an *external table* (and some other things) as a "key hole" to see the data in the other database. So, in our database, we have created external tables with the same names as the views in the other database.
For example, we have created the external table (in our database) named [Sales Invoice Header XXX].
**The helper views:** In our database, we further create helper views that wrap the external tables. They are given the same names as the tables in the older ERP, and they rename some of the fields to be closer to the older solution, because we have a lot of existing SQL code written against the older ERP.
For example, we now have the [Sales Invoice Header] view.
**Trying to separate the solutions:** Access through the above "key hole" is slow. We also want to separate the ERP from our application to minimize the performance impact on the ERP. Because of that, we copy the needed data from the ERP into tables in our database, using "data pumps" (user stored procedures) called by a separate application (launched through the Windows scheduler).
**Copying the data to our tables:** There is basically no problem with small tables like code lists. However, we also need to copy larger tables whose contents change over time. Think of [Sales Invoice Header] and the even bigger [Sales Invoice Lines]. In that case, the dedicated data pumps are scheduled to run at night. Only recent history (about two years) is copied; even so, it is still a lot of data, and we would like to avoid blocking the ERP -- even during the night hours.
**Splitting the rows into batches:** Learning how to do this properly, I have found various solutions that basically use a WHILE loop to copy the rows in portions (releasing the source after each portion to give others a chance); see the sketch after the list of observed problems below. We did something similar with the plain old SQL solution (it was much easier).
Currently, I have studied [Copying Large Tables Efficiently In SQL Server, Last Updated on May 16, 2022 by Erik Darling](https://erikdarling.com/copying-rowstore-tables/) . However, Azure SQL adds more questions.
**No access to the Azure SQL tools...** As our access rights to the Azure SQL database are limited, we have no access to the tools. For example, we cannot see the execution plans. Moreover, Azure SQL requires other (new) skills that we do not have. Because of that, I am asking here...
**Any articles on proper ways of copying between Azure SQL databases? Can you point me to some? Or can you summarize some of your own experience?**
**Some of the observed problems:** When trying to use the classic approach to the situation, we can observe, for example:
- The original table has the clustered index. The extension table also has the clustered index -- the same PRIMARY KEY. However, the index is "lost" when you JOIN the tables in the view. I am aware that classic views can have an index. Anyway, I have no personal experience with that, and I do not know how it would work through another external table (the "key hole") and another view (to rename the fields).
- The ORDER BY seems to be very slow (no surprise) when applied in the context of the target database.
- Probably because of that, some of the "tricks" do not work well.
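For the batching question specifically, a T-SQL sketch of keyset batching over the external table (the local table and column names here are placeholders; the real clustered key of the base table would drive the WHERE/ORDER BY):
DECLARE @last bigint = 0, @rows int = 1;
WHILE @rows > 0
BEGIN
    INSERT INTO dbo.SalesInvoiceHeader_local (InvoiceKey, InvoiceNo, PostingDate)
    SELECT TOP (50000) InvoiceKey, InvoiceNo, PostingDate
    FROM dbo.[Sales Invoice Header XXX]          -- the external table
    WHERE InvoiceKey > @last
    ORDER BY InvoiceKey;

    SET @rows = @@ROWCOUNT;
    IF @rows > 0
        SELECT @last = MAX(InvoiceKey) FROM dbo.SalesInvoiceHeader_local;
END;
Whether the remote side can satisfy the WHERE/ORDER BY efficiently through the external table and the view is exactly the open question, so this is a starting point rather than a recommendation.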
pepr
(143 rep)
Dec 20, 2023, 05:33 PM
• Last activity: Dec 21, 2023, 09:23 AM
5
votes
3
answers
17956
views
copy command in postgresql to append data
I am executing a copy command from a function:
execute 'copy (select * from tableName) to ''/tmp/result.txt'''
It works fine. But tableName is dynamic and there will be more tables (iterating in a loop), and result.txt ends up holding only the last iteration's (table's) data.
I could do it, if I had to, via the command line with \o or >> by appending to a file. But I have to use the function.
I read http://shuber.io/reading-from-the-filesystem-with-postgres/ . It suggests a solution using temporary tables. Can I do something similar, but without temp tables? (I must avoid archive logs). Will creating a temp table create archive logs?
Is there any way to do appending with the copy command in a function?
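One hedged possibility, since plain COPY TO a file always overwrites the target: hand the rows to a shell command that appends instead (this needs superuser or pg_execute_server_program, and tablename below stands for whatever loop variable the function already uses):
execute format(
    'copy (select * from %I) to program %L',
    tablename,
    'cat >> /tmp/result.txt'
);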
RBB
(815 rep)
Sep 15, 2016, 05:18 AM
• Last activity: Oct 17, 2023, 11:18 AM
Showing page 1 of 20 total questions