
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

55 votes
2 answers
89230 views
Optimizing bulk update performance in PostgreSQL
Using PG 9.1 on Ubuntu 12.04. It currently takes up to 24h for us to run a large set of UPDATE statements on a database, which are of the form: UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE id = constid (We're just overwriting fields of objects identified by ID.) The values come from an external data source (not already in the DB in a table). The tables have handfuls of indices each and no foreign key constraints. No COMMIT is made till the end.

It takes 2h to import a pg_dump of the entire DB. This seems like a baseline we should reasonably target. Short of producing a custom program that somehow reconstructs a data set for PostgreSQL to re-import, is there anything we can do to bring the bulk UPDATE performance closer to that of the import? (This is an area that we believe log-structured merge trees handle well, but we're wondering if there's anything we can do within PostgreSQL.) Some ideas:

- dropping all non-ID indices and rebuilding afterward?
- increasing checkpoint_segments, but does this actually help sustained long-term throughput?
- using the techniques mentioned here? (Load new data as a table, then "merge in" old data where ID is not found in new data; a sketch follows below)

Basically there's a bunch of things to try and we're not sure what the most effective are or if we're overlooking other things. We'll be spending the next few days experimenting, but we thought we'd ask here as well. I do have concurrent load on the table but it's read-only.
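One variant of the staging-table idea, sketched under assumed names (target_table, the staging column list and the file path are all placeholders): load the incoming values with COPY and apply them in a single joined UPDATE so each row is touched once.

BEGIN;
CREATE TEMP TABLE staging (id int PRIMARY KEY, field1 text, field2 text) ON COMMIT DROP;
-- server-side COPY shown; use \copy from psql if the file lives on the client
COPY staging FROM '/path/to/new_values.csv' WITH (FORMAT csv);
UPDATE target_table t
SET    field1 = s.field1,
       field2 = s.field2
FROM   staging s
WHERE  t.id = s.id;
COMMIT;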
xyzzyrz (671 rep)
Apr 27, 2013, 12:20 AM • Last activity: Sep 3, 2024, 07:51 PM
0 votes
2 answers
789 views
MySQL update a record on duplicate key update (merge)
I have a table products with the following schema:
CREATE TABLE products (
  id bigint unsigned NOT NULL AUTO_INCREMENT,
  user_id bigint unsigned NOT NULL,
  article_id bigint unsigned NOT NULL,
  price_cents int unsigned NOT NULL,
  quantity smallint NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY idx_products_unique (user_id,article_id,price_cents),
  KEY fk_products_article (article_id),
  CONSTRAINT fk_products_article FOREIGN KEY (article_id) REFERENCES articles (id) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT fk_products_user FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB
Now I can do INSERT with this query:
INSERT INTO products (user_id,article_id,price_cents,quantity) 
VALUES (1,1,200,1) 
ON DUPLICATE KEY UPDATE price_cents=VALUES(price_cents),quantity=quantity+VALUES(quantity)
So now I have 1 product (ID 1) with quantity 1 and price 200. Now I insert 2 more products with:
INSERT INTO products (user_id,article_id,price_cents,quantity) 
VALUES (1,1,200,1),(1,1,199,1) 
ON DUPLICATE KEY UPDATE price_cents=VALUES(price_cents),quantity=quantity+VALUES(quantity)
Now I have 2 products, one (ID 1) with quantity 2 and price 200 and the other (ID 2) with quantity 1 and price 199. Good. The problem comes now: I want to update the product with price 199 and set a new price to 200. What I do is:
INSERT INTO products (id,user_id,article_id,price_cents,quantity) 
VALUES (2,1,1,200,1) 
ON DUPLICATE KEY UPDATE price_cents=VALUES(price_cents),quantity=quantity+VALUES(quantity)
and what I would like is a single product with id 1, price 200, and quantity 3, but I get Number: 0x426, Message: "Duplicate entry '1-1-200' for key 'products.idx_products_unique'" because MySQL does not delete the product with ID 2. Is there a way to achieve this in MySQL (keep in mind that I want to perform these operations in bulk)?
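ON DUPLICATE KEY UPDATE only updates the single row it collides with; it never merges or deletes rows, so when the updated values collide with a second unique key (row 1 here) the statement fails. A sketch of one workaround for this example; the explicit DELETE and the surrounding transaction are additions, not part of the original statements:

START TRANSACTION;
-- remove the row being re-priced (it held quantity 1)
DELETE FROM products WHERE id = 2;
-- re-insert it at the new price; the unique key now collides with id 1, which absorbs the quantity
INSERT INTO products (user_id,article_id,price_cents,quantity)
VALUES (1,1,200,1)
ON DUPLICATE KEY UPDATE quantity=quantity+VALUES(quantity);
COMMIT;

For bulk operations the same delete-then-upsert pattern can be applied per batch.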
Pioz (101 rep)
Mar 26, 2021, 10:42 AM • Last activity: Feb 21, 2024, 03:02 PM
8 votes
2 answers
8589 views
Should I disable autovacuum on a table while I do a bulk update?
I need to perform a simple update on all rows in a table. The table has 40-50 million rows. Dropping indexes and constraints during the UPDATE results in a massive performance improvement. But what about autovacuum? Can autovacuum start a VACUUM or ANALYZE in the middle of an UPDATE? If so, it would be useless work that would eat up machine resources. I could disable it on the table prior to the UPDATE and then re-enable it afterwards:

ALTER TABLE my_table SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);
-- Drop constraints, drop indexes, and disable unnecessary triggers
UPDATE my_table SET ....;
-- Restore constraints, indexes, and triggers
ALTER TABLE my_table SET (autovacuum_enabled = true, toast.autovacuum_enabled = true);

Does this even work if I don't commit after the first ALTER? Also, if I disable it during the UPDATE, will it trigger *after* the update, or will it ignore those updates because it was disabled during them? (My suspicion is that it will run, but I'd rather be sure.) I'm using PG 9.3 right now, but should be upgrading soon. So any mention of changes in newer versions is appreciated.
jpmc26 (1652 rep)
Jan 18, 2017, 07:52 PM • Last activity: Jun 8, 2023, 07:32 PM
0 votes
1 answer
465 views
PostgreSQL: does the WAL group commit affect synchronous replication performance?
I'm curious whether PostgreSQL's group commit mechanism (controlled by the commit_delay and commit_siblings parameters) only affects disk fsync calls, or whether it also allows multiple in-flight transactions to be batched in WAL synchronous replication. From [here](https://wiki.postgresql.org/wiki/Group_commit) it should affect synchronous replication ("The proposed implementation this page describes is **heavily based on the existing synchronous replication implementation**."). But [here](https://www.postgresql.org/docs/current/wal-configuration.html) it says "No sleep will occur **if fsync is not enabled**", so it seems that the group commit mechanism only reduces local fsyncs and does not support batched confirmation of remote synchronous WAL replication?

UPDATE: My understanding: in synchronous (strongly consistent) replication, each commit on the local node waits until the remote standby node returns an ack before continuing. So my question is: during this wait for the ack (at least one RTT, round-trip time):

**[A]** will transactions on other clients' concurrent connections continue to be replicated and committed to the standby node **without blocking**?

**[B]** or does each transaction have to **wait** for the previous transaction to finish before initiating a commit to the remote standby node (a stop-and-wait protocol)?

As you can see, behavior [A] is much like group commit, and [B] is like having no group commit optimization. Coupled with the description on the PostgreSQL wiki ("The group commit is heavily based on the existing synchronous replication implementation."), that's why I associate it with group commit.
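For reference, the two group commit settings are ordinary configuration parameters and can be experimented with at runtime; the values below are arbitrary examples, not recommendations:

ALTER SYSTEM SET commit_delay = 1000;   -- microseconds the flushing backend waits so siblings can piggyback
ALTER SYSTEM SET commit_siblings = 5;   -- only delay when at least this many other transactions are open
SELECT pg_reload_conf();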
ASBai (103 rep)
Sep 27, 2022, 10:48 PM • Last activity: Sep 28, 2022, 09:47 PM
1 vote
1 answer
638 views
How do I bulk export data from an RDBMS without killing performance?
I'm building an app that lets the user export their data. When they click the Export button, we need to do a SELECT * WHERE user_id =, save it to a CSV file, and download it. There may be hundreds of thousands of entries. If the user edits their data while the export is in progress, the export can include the edits, or not. Either is fine, as long as it's a consistent snapshot taken at any point in time during the export. The question is how do I do it without killing performance for all other queries? I don't mind if the export takes longer. Can I tell the database to somehow run this query at a low priority? Assume a MySQL or Postgres database. The only solution I know is to set up a read replica just for exports. Exports can slow down the replica, but won't slow down the primary, thus ensuring fast performance for other queries. Is there a better solution to this problem?
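A minimal sketch for the PostgreSQL case, with table and column names as placeholders: a REPEATABLE READ, read-only transaction supplies the single point-in-time snapshot, and COPY streams the rows out as CSV without taking row locks:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;
COPY (SELECT * FROM entries WHERE user_id = 42)
  TO STDOUT WITH (FORMAT csv, HEADER);   -- or \copy from psql on the client side
COMMIT;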
Kartick Vaddadi (113 rep)
Aug 17, 2022, 06:04 AM • Last activity: Aug 19, 2022, 05:45 AM
5 votes
2 answers
2592 views
postgres_fdw slower then copy + scp + copy (~12 times)
- foreign server 9.2
- local server 9.5
- table is 10GB
- data transfer performed on the same network interface the foreign server uses
- no indexes set on the data
- old way:
  1. copy to - 2:36
  2. scp - 08:17
  3. copy from - 10:11
- postgres_fdw:
  1. by the time the old way finished, it had done 800MB of insert into .. select * from foreign_table

Did I miss something in the config (meaning I can improve it), or is postgres_fdw just not meant for bulk loads (meaning I can't improve it)? (I use it to reconcile small amounts of data and it works fine. The idea of an insert ... select from the FDW instead of running bash commands looked so sweet.) I tried psql to the remote server from the local server and \copy table - six minutes - faster than over ssh. The fetch_size option, not available prior to 9.6, can be mocked up with dblink_fetch(CURSOR, fetch_size) - see my answer below.
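For reference, a sketch of the two batching options touched on above; the server name, connection string, table and column definitions are all placeholders:

-- On 9.6+ the postgres_fdw fetch_size option can simply be raised:
ALTER SERVER legacy_92 OPTIONS (ADD fetch_size '10000');

-- On older releases, dblink can fetch through a cursor in large batches:
SELECT dblink_connect('src', 'host=foreignhost dbname=prod user=loader');
SELECT dblink_open('src', 'cur', 'SELECT id, payload FROM big_table');
-- repeat this INSERT until dblink_fetch returns zero rows
INSERT INTO local_copy
SELECT * FROM dblink_fetch('src', 'cur', 100000) AS t(id int, payload text);
SELECT dblink_close('src', 'cur');
SELECT dblink_disconnect('src');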
Vao Tsun (1263 rep)
Feb 23, 2017, 04:11 PM • Last activity: May 30, 2022, 02:50 PM
45 votes
2 answers
118579 views
Most efficient way of bulk deleting rows from postgres
I'm wondering what the most efficient way would be to delete large numbers of rows from PostgreSQL. This process would be part of a recurring daily task that bulk imports data (a delta of insertions + deletions) into a table. There could be thousands, potentially millions, of rows to delete. I have a file of primary keys, one per line. The two options I was thinking of were along the lines of the below, but I don't know/understand enough of the internals of PostgreSQL to make an informed decision about which would be best.

- Execute a DELETE query for each row in the file, with a simple WHERE on the primary key (or group the deletes in batches of n using an IN() clause).
- Import the primary keys into a temporary table using the COPY command and then delete from the main table using a join (a sketch follows below).

Any suggestions will be much appreciated!
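A minimal sketch of the second option; the file path, table name and key type are assumptions:

BEGIN;
CREATE TEMP TABLE doomed_ids (id bigint PRIMARY KEY) ON COMMIT DROP;
COPY doomed_ids (id) FROM '/path/to/keys.txt';   -- or \copy if the file is on the client
DELETE FROM main_table m
USING  doomed_ids d
WHERE  m.id = d.id;
COMMIT;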
tarnfeld (639 rep)
Feb 16, 2013, 05:43 PM • Last activity: May 26, 2022, 08:16 PM
7 votes
2 answers
4790 views
Bulk Data Loading and Transaction Log
I'm currently working on a project which bulk imports data from flat files (csv), about 18 different files, each linking to a specific table through a stored procedure. I followed the steps advised in the Data Loading Performance guide. The database is in the BULK_LOGGED recovery model to minimize logging. When executing the stored procedure below on a file containing 600000 rows I get an error:

> Msg 9002, Level 17, State 4, Procedure SP_Import__DeclarationClearanceHistory_FromCSV, Line 34
> The transaction log for database is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases

(For testing purposes I do a full backup before starting the import.) Looking at log_reuse_wait_desc I see the following:

> log_reuse_wait_desc **CHECKPOINT**

All other files get imported successfully. Any input in solving this would be welcomed.

PROCEDURE [dbo].[SP_Import_DeclarationClearanceHistory_FromCSV]
    @FilePath [nvarchar](1000)
AS
BEGIN
    -- Creating a temporary table for importing the data from the csv file.
    DBCC TRACEON(610)

    CREATE TABLE #DeclarationClearanceHistory
        (
          [ItemID] [int] IDENTITY(1, 1) NOT NULL ,
          [CMSDeclarationID] [bigint] NOT NULL ,
          [StatusCode] [nvarchar](10) NOT NULL ,
          [SubStatus] [nvarchar](10) NULL ,
          [DepartmentCode] [nvarchar](10) NULL ,
          [StartDate] [datetime] NULL ,
          [EndDate] [datetime] NULL ,
          PRIMARY KEY CLUSTERED ( [ItemID] ASC )
            WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                   ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON ) ON [PRIMARY]
        )
    ON [PRIMARY]

    -- Inserting everything from the csv into the temporary table using BULK INSERT.
    EXEC ('BULK INSERT #DeclarationClearanceHistory
           FROM ''' + @FilePath + '''
           WITH ( FIELDTERMINATOR = '''',
                  ROWTERMINATOR = ''\n'',
                  FIRSTROW = 2,
                  KEEPIDENTITY,
                  CODEPAGE = ''ACP'',
                  ORDER = ''ITEMID ASC'' );') ;

    -- Using a MERGE statement, insert the record if not present and update it if it exists.
    MERGE dbo.DeclarationClearanceHistory AS TargetTable -- Table being inserted into or updated.
    USING #DeclarationClearanceHistory AS SourceTable    -- Records from the temporary table (records from the csv file).
    ON ( TargetTable.ItemID = SourceTable.ItemID )       -- Condition deciding which records are already present.
    WHEN NOT MATCHED BY TARGET
        THEN INSERT ( ItemID ,
                      CMSDeclarationID ,
                      StatusCode ,
                      SubStatus ,
                      DepartmentCode ,
                      StartDate ,
                      EndDate )
             VALUES ( SourceTable.ItemID ,
                      SourceTable.CMSDeclarationID ,
                      SourceTable.StatusCode ,
                      SourceTable.SubStatus ,
                      SourceTable.DepartmentCode ,
                      SourceTable.StartDate ,
                      SourceTable.EndDate )
    WHEN MATCHED -- If matched then UPDATE.
        THEN UPDATE SET TargetTable.ItemID = SourceTable.ItemID ,
                        TargetTable.CMSDeclarationID = SourceTable.CMSDeclarationID ,
                        TargetTable.StatusCode = SourceTable.StatusCode ,
                        TargetTable.SubStatus = SourceTable.SubStatus ,
                        TargetTable.DepartmentCode = SourceTable.DepartmentCode ,
                        TargetTable.StartDate = SourceTable.StartDate ,
                        TargetTable.EndDate = SourceTable.EndDate ;

    DBCC TRACEOFF(610)
END
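A few diagnostic statements that go with the error above; the database name is a placeholder:

SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM   sys.databases
WHERE  name = 'ImportDb';      -- confirm the recovery model and why the log cannot be truncated

DBCC SQLPERF(LOGSPACE);        -- log size and percent used for every database

CHECKPOINT;                    -- run in the import database if the wait really is CHECKPOINT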
Raymond (229 rep)
Dec 19, 2012, 01:58 PM • Last activity: Jan 20, 2022, 09:01 AM
7 votes
2 answers
6202 views
Index before or after bulk load using load infile?
I have a database with over 1B rows and two columns that are indexed (in addition to the PK). Is it better to have the indexes pre-defined in the table before the LOAD DATA INFILE, or better to index after the data has been loaded? A couple of notes regarding data size and system:

- System is Linux w/ 8 cores and 32GB memory (currently maxed out unless I move to new HW).
- The DB is 1B rows, about 150GB of raw data.
- The database is MyISAM and is mainly read-only after it's loaded.
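A common pattern for a one-off MyISAM load is to disable the secondary indexes, load, and rebuild them in one pass; a sketch with placeholder names:

ALTER TABLE big_table DISABLE KEYS;          -- MyISAM: suspends maintenance of non-unique indexes
LOAD DATA INFILE '/data/big_table.csv'
  INTO TABLE big_table
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
ALTER TABLE big_table ENABLE KEYS;           -- rebuilds the disabled indexes by sorting

Note that DISABLE KEYS does not affect the primary key or unique indexes, which are still maintained during the load.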
qman777 (71 rep)
Mar 24, 2013, 10:44 PM • Last activity: Nov 25, 2020, 07:19 AM
16 votes
3 answers
34478 views
Create a table from a CSV file with headers
I'm seeking a way to generate a new MySQL table based solely on the contents of a specified CSV. The CSV files I'll be using have the following properties:

- "|" delimited.
- First row specifies the column names (headers), also "|" delimited.
- Column names & order are not fixed.
- The number of columns is not fixed.
- Files are large (1 million rows / 50 columns).

In Excel this is all rather simple, however with MySQL it does not appear to be (no luck with Google). Any suggestions on what I should be looking at?
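Generating the CREATE TABLE itself from the header line generally needs a small script outside MySQL; once the table exists, the load step is straightforward. A sketch with placeholder file and table names:

LOAD DATA INFILE '/data/export.csv'
  INTO TABLE imported_data
  FIELDS TERMINATED BY '|'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;   -- skip the header row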
user58602 (161 rep)
Feb 14, 2015, 08:37 PM • Last activity: Sep 30, 2020, 08:32 PM
0 votes
0 answers
236 views
Creating a Format File For Bulk import
So when I try running this command:

BCP -SMSSQLSERVER01.[Internal_Checks].[Jan_Flat] format out ^
-fC:\Desktop\exported data\Jan_FlatFormat.fmt -c -T -UUGmn91 -SMSSQLSERVER01 -PChan

I am getting the error "A valid table name is required for in, out, or format options." I want to create a format file to bulk upload data. What am I doing wrong?
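For comparison, the documented shape of a format-file export is bcp <table> format nul -f <format file> ..., with the table name first and nul as the data file. A hedged sketch; the dbo schema, the quoting, and the use of SQL authentication instead of -T are assumptions:

bcp Internal_Checks.dbo.Jan_Flat format nul -c ^
    -f "C:\Desktop\exported data\Jan_FlatFormat.fmt" ^
    -S MSSQLSERVER01 -U UGmn91 -P Chan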
RAJAT (1 rep)
Jun 17, 2020, 10:24 AM • Last activity: Jun 17, 2020, 04:01 PM
2 votes
2 answers
1329 views
SSMS - Openrowset maximum file size
Apologies if this has been answered elsewhere but I couldn't find anything here or online. Does OPENROWSET have a file size limit? I'm trying to write a stored procedure, part of which checks if certain characters exist within the file; if they do I have to skip to the next section of the stored procedure. So far I've simply bulk inserted the entire file into a one-column table, then did:

IF(SELECT COUNT(*) FROM #fulltable WHERE fulltable LIKE '%}%')>0 GOTO NEXTSECTION
IF(SELECT COUNT(*) FROM #fulltable WHERE fulltable LIKE '%~%')>0 GOTO NEXTSECTION
IF(SELECT COUNT(*) FROM #fulltable WHERE fulltable LIKE '%#%')>0 GOTO NEXTSECTION

It's reliable but very slow when dealing with large files (sometimes over 10GB). I'm thinking the below query would be quicker for large files:

DECLARE @FILE NVARCHAR(MAX)
select @FILE = BULKCOLUMN from (
    select * from openrowset(BULK N'filpath', single_clob) [a]
)a
IF(SELECT IIF(@FILE LIKE '%{%',1,0)) = 1 GOTO NEXTSECTION
IF(SELECT IIF(@FILE LIKE '%}%',1,0)) = 1 GOTO NEXTSECTION
IF(SELECT IIF(@FILE LIKE '%~%',1,0)) = 1 GOTO NEXTSECTION
IF(SELECT IIF(@FILE LIKE '%@%',1,0)) = 1 GOTO NEXTSECTION
IF(SELECT IIF(@FILE LIKE '%£%',1,0)) = 1 GOTO NEXTSECTION

But as I say, BULK INSERT is reliable and I'd hate to risk the file being truncated when using OPENROWSET if there is some sort of file size limit. Any advice on the matter would be appreciated.
Anonymous (155 rep)
Oct 15, 2019, 12:01 PM • Last activity: Feb 25, 2020, 09:11 PM
4 votes
1 answer
1511 views
Select and Update MAC Addresses to Same Format
I have a MySQL database with a macaddress column; this contains thousands of MAC addresses in various formats: uppercase, lowercase and with colons. See below for an example:

+-------------------+
| macaddress        |
+-------------------+
| 90CCAADD3341      |
| 90:3f:ff:11:22:33 |
| 33:44:aa:bb:34:6a |
| 8801abcd3231      |
+-------------------+

Is there any way I can bulk update all rows in this column so they are in the same format? I'd like them all to be uppercase with no colons, see below:

+--------------+
| macaddress   |
+--------------+
| 90CCAADD3341 |
| 903FFF112233 |
| 3344AABB346A |
| 8801ABCD3231 |
+--------------+

How can I achieve this?
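A minimal sketch of the bulk update; the table name is a placeholder:

UPDATE devices
SET macaddress = UPPER(REPLACE(macaddress, ':', ''));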
TheOrdinaryGeek (177 rep)
Oct 31, 2019, 10:11 AM • Last activity: Oct 31, 2019, 10:26 AM
0 votes
2 answers
322 views
Can't see process progress, machine getting slow trying to import 500 million records
Working environment: PostgreSQL 12.0, Ubuntu 18.10, 4GB memory, i5-3230M CPU @ 2.60GHz × 4, 64-bit OS.

I'm trying to import a .csv file with more than 500 million records into a PostgreSQL db table, using my PC. (We're testing on my PC until it works; after that we will do it on a server.) My company is migrating from Firebird to Postgres, so this should be a one-time task. After that, we expect to insert ~200k records each day.

The table is created with (SQL generated from the original Firebird database):
CREATE TABLE REGDATA
	 (
	   	CODIGO integer NOT NULL,
	   	DTAREG timestamp NOT NULL,
	   	PERIOD integer NOT NULL,
	   	FLDCODIGO integer,
	   	REGVALUE double precision,
	   	CLICODIGO integer,
	   	SITCODIGO integer,
	   	CONSTRAINT PK_REGDATA PRIMARY KEY (CODIGO)
	 );
The .csv is generated using FBExport 1.9:

./fbexport -Sc -D /opt/firebird/bin/measures.fdb -H localhost -U user -P password -F /home/dani/Documents/raw_regdata.out -Q "SELECT * FROM REGDATA"

I convert raw_regdata.out to UTF-8 to work with the Postgres encoding, getting rid of some characters (ã, º, etc.):

iconv -c -t utf8 /home/dani/Documents/raw_regdata.out > /home/dani/Documents/utf8_regdata.out

I've followed the Postgres guidelines on how to import large data sets, like:

- Dropping all indexes;
- Increasing maintenance_work_mem (maintenance_work_mem = 512MB)
- Increasing max_wal_size (max_wal_size = 4GB)
- Setting wal_level = minimal
- Setting max_wal_senders = 0

After connecting to the Postgres cluster, I use COPY to import the data:

COPY REGDATA(CODIGO,DTAREG,PERIOD,FLDCODIGO,REGVALUE,CLICODIGO,SITCODIGO) from '/home/dani/Documents/utf8_regdata.out' DELIMITER ',' CSV HEADER;

And here comes the problem: I can't see whether the command is **progressing** or not, my PC gets very **slow** after some time and it just **freezes**.

1st attempt: let it run for 2-3 hours, then killed the process (Ctrl+C in the terminal). From the logs I see that it was progressing (line 131158327):
2019-10-16 10:28:05.657 -03  postgres@measures ERROR:  canceling statement due to user request
2019-10-16 10:28:05.657 -03  postgres@measures CONTEXT:  COPY regdata, line 131158327: ""178865944","13.03.2015 12:10:00","600","22439","358.60000000000002","9","37""
2019-10-16 10:28:05.657 -03  postgres@measures STATEMENT:  COPY REGDATA(CODIGO,DTAREG,PERIOD,FLDCODIGO,REGVALUE,CLICODIGO,SITCODIGO) from '/home/dani/Documents/utf8_regdata.out' DELIMITER ',' CSV HEADER;
But when I try to select something I get nothing; I think that since COPY only commits after the whole process finishes, it just undoes everything and I end up with an empty table.

2nd attempt: let it run 24 hours (even if my PC is totally locked), then kill the process again, hoping to see more progress in the logs, but to my surprise I don't see any similar logs, only the same 3 warnings over and over again:
2019-10-16 17:42:31.061 -03  LOG:  using stale statistics instead of current ones because stats collector is not responding
.
.
.
2019-10-17 06:10:31.423 -03  WARNING:  worker took too long to start; canceled
2019-10-17 06:57:19.150 -03  WARNING:  autovacuum worker started without a worker entry
2019-10-17 08:04:47.445 -03  LOG:  starting PostgreSQL 12.0 (Ubuntu 12.0-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
2019-10-17 08:04:47.622 -03  LOG:  listening on IPv4 address "127.0.0.1", port 5412
2019-10-17 08:04:48.048 -03  LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5412"
2019-10-17 08:04:53.879 -03  LOG:  database system was interrupted; last known up at 2019-10-16 15:47:25 -03
2019-10-17 08:05:10.887 -03  LOG:  database system was not properly shut down; automatic recovery in progress
2019-10-17 08:05:11.534 -03  LOG:  redo starts at 14/B8254908
2019-10-17 08:05:11.847 -03  LOG:  invalid record length at 14/B8260848: wanted 24, got 0
2019-10-17 08:05:11.847 -03  LOG:  redo done at 14/B82607D0
2019-10-17 08:05:16.417 -03  LOG:  database system is ready to accept connections
Is there a way of monitoring the progress (which line it is currently working on), and is it normal for the machine to become unresponsive while the process is active?

Edit: Looking at the original Firebird database, I can see the primary key is not sorted (screenshot omitted).
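A sketch of what can be checked from a second psql session while the COPY runs; the table name comes from the question:

-- the relation's files grow as COPY writes rows, even before the transaction commits
SELECT pg_size_pretty(pg_table_size('regdata'));

-- what the loading backend is doing and what it is waiting on
SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM   pg_stat_activity
WHERE  query ILIKE 'copy%';

-- a dedicated progress view (pg_stat_progress_copy) only exists from PostgreSQL 14 onward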
dwenzel (63 rep)
Oct 17, 2019, 02:38 PM • Last activity: Oct 17, 2019, 04:20 PM
3 votes
1 answer
2006 views
Mongodb - Will Bulk.find.remove() lock the collection
I'm planning to run Bulk.find.remove() on a very big collection. (1.2TB). Will it block inserting or updating other records (which are not a part of the above find)?
TheDataGuy (1986 rep)
Oct 9, 2019, 12:33 AM • Last activity: Oct 9, 2019, 08:19 AM
0 votes
0 answers
635 views
Import folder with Multiple CSV's into SSMS w/ out SSIS
I am wondering what's the best and most effective way to import CSVs stored in a local folder into a DB from within SSMS. Folder path: C:\Users\ngutierrez\Desktop\Open_Air_Tables. The folder contains about 30 csv files. Each time these CSVs are updated I need the DB to reflect the changes.
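Without SSIS, the usual fallback is one BULK INSERT per file, written out by hand or generated dynamically. A sketch for a single file; the table name and CSV layout are assumptions:

BULK INSERT dbo.OpenAir_Table1
FROM 'C:\Users\ngutierrez\Desktop\Open_Air_Tables\Table1.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);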
NicolasGutierrezToD (13 rep)
Aug 12, 2019, 08:26 PM • Last activity: Aug 13, 2019, 06:49 AM
4 votes
2 answers
4006 views
How do you copy a table from MySqlServer_A to MySqlServer_B?
I have a daily task that collects data from 3 MySql Servers and then pushes that aggregate data back out to each of the servers, deleting the previous day's data. All servers are MySql. I would like to put this into a stored proc. I have been searching for a way to connect from one MySql Server to another, run a query, next server - rinse and repeat. Besides not finding any examples of a stored procedure that connects to another MySql server, my immediate impression is that this is not really acceptable, nor intended. I can select into an outfile, which creates a CSV. Is there any way to force an Xml document instead? I don't want to have to read into a datatable just to get Xml. Of course, Xml won't work with the Bulkload method, will it? What are my realistic options? One concern about selecting into an outfile is the use of weird characters: ' , \ \\ & @ / // etc. How will those characters affect the Bulkload process?
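On the outfile concern: explicit quoting and escaping options keep awkward characters intact on the round trip. A sketch with placeholder table and file names:

SELECT * FROM aggregate_results
INTO OUTFILE '/tmp/aggregate.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\n';

-- on each target server, load it back with matching options
LOAD DATA INFILE '/tmp/aggregate.csv'
  INTO TABLE aggregate_results
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\n';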
IAbstract (231 rep)
May 31, 2011, 03:43 PM • Last activity: Mar 8, 2018, 07:02 PM
4 votes
2 answers
7343 views
Mysql load from infile stuck waiting on hard drive
I've got a Windows 7 64-bit machine that I'm using for some load testing of a MySQL db. My program uses SQLAlchemy to connect and run several LOAD DATA INFILE statements on said database. These bulk loads all happen within a single transaction, all keys are disabled beforehand, and each csv file is only a few megabytes large. The problem I've run into is that the test machine gets IO bound. It has enough RAM available (12G) to hold the entire transaction in memory and do a single flush out the other end. As far as I understand the manual, the InnoDB tables shouldn't touch the hard drive until it flushes the dirty pages at transaction completion. The total data to be loaded is about 1G, spread across the different tables. It ends up taking 37 minutes to load it all. Here are my current test settings for perusal. I'd also be happy to report the results from SHOW ENGINE INNODB STATUS or similar queries if necessary. To recap, I need to know if 37 minutes is a fast insert speed for this data size, and what I can do to increase the insert speed.

**Edit:** Whoops! I forgot some important info.

- MySQL version 5.5
- Server has 12G total RAM
- Total rows inserted ~2,597,240
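A few server-side settings commonly checked for this symptom; the values shown are examples, not recommendations:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- relax per-commit log flushing for the duration of a load (dynamic in 5.5, requires SUPER)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;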
Spencer Rathbun (161 rep)
Jul 13, 2012, 07:42 PM • Last activity: Sep 4, 2017, 03:57 AM
24 votes
3 answers
40373 views
Why is 'LOAD DATA INFILE' faster than normal INSERT statements?
I've read an article that mentioned we can achieve **60,000 inserts per second** by using the LOAD DATA INFILE statement, which reads from csv files and inserts the data into a database. Why should it differ from normal inserts?

**EDIT:** I reduced the round trips by calling just one INSERT statement:

INSERT INTO tblname VALUES (NULL,2,'some text here0'),(NULL,2,'some text here1')
,(NULL,2,'some text here2'),(NULL,2,'some text here3')
.....,(NULL,2,'some text here3000');

What about this?
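For comparison, the same rows expressed as a LOAD DATA INFILE; the file path and column names are assumptions:

LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE tblname
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, category_id, body);   -- \N in the id column lets AUTO_INCREMENT assign values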
Alireza (3676 rep)
Apr 20, 2012, 10:35 AM • Last activity: Aug 30, 2017, 06:49 AM
1 vote
0 answers
1121 views
BULK INSERT ERRORFILE Not Created
We have a stored procedure on SQL Server v12.0.5203.0. Within that procedure we have a dynamic SQL statement that performs BULK INSERT. We are trying to capture misformatted rows and continue processing using ERRORFILE. The problem is that the ERRORFILE does not get created or updated. The SQL Server service is running as account NT Service\MSSQL$SQLEXPRESSSERVER. The directory where we are trying to write the ERRORFILE is on the C: drive on the same physical box and grants account MSSQL$SQLEXPRESSSERVER full control. Is there some additional flag or permission that needs to be changed to allow the creation of the ERRORFILE? The error is:

> Msg 245, Level 16, State 1, Line 6
> Conversion failed when converting the nvarchar value '""' to data type int.

Shouldn't it log this row into the ERRORFILE and keep processing? This is a snippet from the stored proc:

SET @paramDefinition = N'@inputFileName as nvarchar(500), @formatFileName as nvarchar(500), @conEdBcpErrorLogs as nvarchar(500), @firstRow as Int';

SET @sql2 = N'BULK INSERT [emioutagesbcp]
    FROM "' + @inputFileName + '"
    WITH ( FIRSTROW = ' + Convert(VARCHAR(10), @firstRow) + ',
           FORMATFILE = ''' + @formatFileName + ''',
           ERRORFILE = ''' + @conEdBcpErrorLogs + ''',
           FIELDTERMINATOR = ''|'',
           ROWTERMINATOR = ''\r\n'',
           MAXERRORS = 100)';

--PRINT @sql2

EXECUTE sp_executesql @sql2, @paramDefinition,
    @inputFileName = @inputFileName,
    @formatFileName = @formatFileName,
    @conEdBcpErrorLogs = @conEdBcpErrorLogs,
    @firstRow = @firstRow;
G Temme (11 rep)
Jul 18, 2017, 06:32 PM • Last activity: Jul 18, 2017, 07:52 PM
Showing page 1 of 20 total questions