
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

2 votes
1 answer
478 views
Bulk load data and provide row-by-row feedback
We want to allow our users to import CSV files containing many thousands of records to "pre-load" a particular table in our system. It's not strictly a one-time operation; it may be performed multiple times as they get feedback, tweak the CSV, and re-import. I understand that this is not a lot of data, relatively speaking. But we want the operation to be atomic (this is a multi-user system; other users may also be adding or modifying data) and we want to provide per-row feedback (like "Row 123 is a duplicate of item 'abc'"). I know that statements like COPY or tools like pg_bulkload can do the work very quickly, but this is user-interactive via our web app and we _require_ row-by-row feedback. Our naïve approach was to run sequential INSERT statements and to catch exceptions. Even though we're re-using a single connection, this makes a separate call and uses a separate transaction for every row. This is taking _way_ too long.
for(var i = 0; i < rows.Count; i++)
{
    try
    {
        connection.Execute("INSERT INTO table1 VALUES (...)");
        // format a "successful" response
    }
    catch (PostgresException e)
    {
        // format a "failed" response based on exception details... 
    }
}
We tried creating a single INSERT statement with multiple VALUES, but this does not provide row-by-row feedback and it stops on the 1st error (I found no useful way to use ON CONFLICT here). We also tried sharing a transaction and running sequential statements but the transaction is "aborted" on the 1st error.
using(var tx = connection.BeginTransaction())
{
    for(var i = 0; i < rows.Count; i++)
    {
        try
        {
            connection.Execute("INSERT INTO table1 VALUES (...)", tx);
            // format a "successful" response
        }
        catch (PostgresException e)
        {
            // format a response based on the type of failure... 
        }
    }
}
I'm beginning to think that we need to run this entire operation _within_ the database in SQL. We have only one constraint right now, a NOT NULL UNIQUE "name" column. So the statuses would be something like...

* "OK": the row inserted fine (return the new PRIMARY KEY "id" column)
* "Conflict": the name in this row is a duplicate of an existing row, either a row seen earlier in the CSV or one already in the table (return the duplicate name)
* "Bad Request": the name in this row is missing, empty, or all whitespace

We also need to return the index (in insert order, I guess) in order to approximate the line number in the original CSV file. I'm thinking we need to INSERT into a temporary table and use something like row_number(). Given a CSV like this (name + some example columns, just to make the point):
name,region,code,set
A1,A,100,1
A2,A,101,1
A1,A,102,1
,B,200,2
C1,C,300,3
I would need responses like this:

index | id     | name   | status
------|--------|--------|-------
1     | 100    | A1     | OK
2     | 101    | A2     | OK
3     | [null] | A1     | Conflict
4     | [null] | [null] | Bad Request
5     | 102    | C1     | OK

Am I on the right path? Is this a workable solution or is there some other technique that would work better? Does it make sense to work the queries out by status? Like, rather than try to process one record at a time, can I try to find all of the bad requests 1st, then all of the duplicates, then all of the new rows (and insert them)? I'm fine with this response, we can always reorganize them in the application code.

index | id     | name   | status
------|--------|--------|-------
4     | [null] | [null] | Bad Request
3     | [null] | A1     | Conflict
1     | 100    | A1     | OK
2     | 101    | A2     | OK
5     | 102    | C1     | OK

Believe me, I'm asking around about the requirement, but it seems stuck for now; they really want to provide the user with specific row-by-row feedback. If I insert into a temporary staging table, can I run 3 set-based operations? One for bad requests, one for duplicates, and one for new records?
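For what it's worth, here is a minimal hedged sketch of exactly that shape in PostgreSQL. The temp table, the target table name items, and its id/name columns are illustrative assumptions rather than the real schema, and the duplicate check against the live table is still subject to concurrent-insert races unless hardened with ON CONFLICT:

CREATE TEMPORARY TABLE staging (
    idx    bigserial PRIMARY KEY,   -- insertion order approximates the CSV line number
    name   text,
    region text,
    code   int,
    "set"  int                      -- quoted because SET is a keyword
);

-- COPY staging (name, region, code, "set") FROM STDIN WITH (FORMAT csv, HEADER);

WITH bad AS (                        -- pass 1: missing/empty/whitespace names
    SELECT idx, NULL::int AS id, name, 'Bad Request' AS status
    FROM   staging
    WHERE  name IS NULL OR btrim(name) = ''
), dup AS (                          -- pass 2: duplicates of the table or of an earlier CSV row
    SELECT s.idx, NULL::int AS id, s.name, 'Conflict' AS status
    FROM   staging AS s
    WHERE  s.name IS NOT NULL AND btrim(s.name) <> ''
      AND (EXISTS (SELECT 1 FROM items  AS i  WHERE i.name  = s.name)
        OR EXISTS (SELECT 1 FROM staging AS s2 WHERE s2.name = s.name AND s2.idx < s.idx))
), good AS (
    SELECT *
    FROM   staging AS s
    WHERE  s.idx NOT IN (SELECT idx FROM bad)
      AND  s.idx NOT IN (SELECT idx FROM dup)
), ins AS (                          -- pass 3: insert what is left, capturing the new ids
    INSERT INTO items (name, region, code, "set")
    SELECT name, region, code, "set" FROM good
    RETURNING id, name
)
SELECT idx, id, name, status FROM bad
UNION ALL
SELECT idx, id, name, status FROM dup
UNION ALL
SELECT g.idx, i.id, g.name, 'OK' FROM ins AS i JOIN good AS g ON g.name = i.name
ORDER BY idx;

If a single statement gets unwieldy, the same three passes can just as well be run as three separate statements against the staging table inside one transaction.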
Anthony Mastrean (121 rep)
Jul 9, 2018, 05:32 PM • Last activity: Aug 1, 2025, 05:02 AM
0 votes
1 answer
1114 views
BULK INSERT does not fail when file contains commas instead of semicolons when FIRSTROW > 1
I encountered an issue where I received a CSV file that was supposed to be delimited by semicolons (;) but was delimited by commas (,). BULK INSERT called by sqlcmd did not fail, but did not insert anything either. I know that calling xp_cmdshell is not best practice, but please don't comment on this. After investigation I found that it only fails (as expected) when FIRSTROW = 1, but I need a header inside the file. Table:
CREATE TABLE [dbo].[test_table](
	[id] [int] NULL,
	[title] [varchar](10) NULL,
	[val] [int] NULL
)
Format file:
12.0
3
1       SQLCHAR              0       4       ";"    1     "id"             ""
2       SQLCHAR              0       10      ";"    2     "title"          SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR              0       4       "\r\n"     3     "val"            ""
Data file:
1,first,0
2,second,2
Bulk insert:
DECLARE @cmd VARCHAR(4000);
SET @cmd = 'sqlcmd -b -S  -Q "set nocount on; set dateformat dmy; bulk insert [test_db].[dbo].[test_insert] from ''C:\temp\test_table.csv'' with ( DATAFILETYPE = ''char'', TABLOCK, MAXERRORS = 1000, FIELDTERMINATOR = '';'', ROWTERMINATOR = ''\r\n'', BATCHSIZE = 100000, FORMATFILE = ''C:\temp\test_table.txt'', FIRSTROW = 2 );"';
EXEC xp_cmdshell @cmd;
owl (310 rep)
Dec 4, 2020, 03:03 PM • Last activity: Jul 13, 2025, 09:05 AM
1 vote
1 answer
3433 views
Bulk Inserts in Postgres
This is with respect to a data migration activity where the historical data from the client database is migrated to the vendor Postgres database. There will be millions of transactions that need to be migrated as a Big Bang approach. In the Oracle database, I used to use the below template of code for migration:

create or replace PROCEDURE PRC_TEST AS
DECLARE
    CURSOR CUR IS SELECT ID, NAME FROM test;
    TYPE test_typ IS TABLE OF CUR%ROWTYPE INDEX BY PLS_INTEGER;
    test_tbl test_typ;
BEGIN
    OPEN CUR;
    LOOP
        FETCH cur BULK COLLECT INTO test_tbl LIMIT 1000;
        DBMS_OUTPUT.PUT_LINE(test_tbl.COUNT);
        FORALL I IN 1..test_tbl.COUNT --SAVE EXCEPTIONS
            INSERT INTO test1(ID, NAME) VALUES ( test_tbl(I).id, test_tbl(I).name );
        FORALL I IN 1..test_tbl.COUNT
            UPDATE test1 SET name = name||test_tbl(I).NAME WHERE id = test_tbl(I).id;
        DBMS_OUTPUT.PUT_LINE('AFTER'|| test_tbl.COUNT);
        EXIT WHEN CUR%NOTFOUND;
    END LOOP;
    commit;
    close cur;
EXCEPTION
    WHEN OTHERS THEN
        FOR I IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
            dbms_output.put_line('error'||sqlerrm);
        END LOOP;
END;
End PRC_TEST;

Is there a plpgsql equivalent of this code? What approach should be used in Postgres while handling such a migration activity? Please provide some best practices to be followed for better performance and for handling/storing the error records in Postgres. Thanks..!
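There is no direct FORALL/BULK COLLECT equivalent in PL/pgSQL; the usual fast path is a single set-based INSERT ... SELECT, with a per-row exception handler only if individual bad records must be captured. A minimal hedged sketch, assuming hypothetical tables test(id, name), test1(id, name) and an error-log table test_errors:

CREATE OR REPLACE PROCEDURE prc_test()
LANGUAGE plpgsql
AS $$
DECLARE
    rec record;
BEGIN
    -- Fast path: one set-based statement replaces the whole FETCH/FORALL loop.
    -- INSERT INTO test1 (id, name) SELECT id, name FROM test;

    -- If bad rows must be logged individually (the SAVE EXCEPTIONS use case),
    -- trap errors per row; each exception block opens a subtransaction,
    -- so this path is much slower than the set-based one.
    FOR rec IN SELECT id, name FROM test LOOP
        BEGIN
            INSERT INTO test1 (id, name) VALUES (rec.id, rec.name);
        EXCEPTION WHEN OTHERS THEN
            INSERT INTO test_errors (id, name, errmsg)
            VALUES (rec.id, rec.name, SQLERRM);
        END;
    END LOOP;
END;
$$;

-- CALL prc_test();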
cpb (11 rep)
Jul 28, 2020, 09:07 AM • Last activity: May 5, 2025, 07:00 AM
0 votes
1 answer
671 views
Using OUTPUT INSERTED.id to UPDATE (i.e. not insert new record) existing rows
1.) @MULTIPLE_RESULTS_TABLEVAR has fields (x, y, OrdersTableID) and values: [a,b,Null], [c,d,Null], [e,f,Null]

2.) The goal is to bulk insert @MULTIPLE_RESULTS_TABLEVAR data into an OrdersTable having fields (id, x, y), with each ORDERS_TABLE.id (aka identity) returned to ***update*** @MULTIPLE_RESULTS_TABLEVAR so that the values become: [a,b,1], [c,d,2], [e,f,3]

3.) But using OUTPUT INSERTED.id INTO @MULTIPLE_RESULTS_TABLEVAR ***adds*** new rows to @MULTIPLE_RESULTS_TABLEVAR, yielding values: [a,b,Null], [a,b,Null], [a,b,Null], [NULL,NULL,1], [NULL,NULL,2], [NULL,NULL,3]

4.) I can't find a documented option or non-kludgy strategy to ***UPDATE*** the existing rows. Specifically, I don't want to trust a ( LAST_INSERT_@@SCOPE_IDENTITY - count(MULTIPLE_RESULTS_TABLEVAR.id) ) calculation while echoing to a new #temptable, and a CURSOR/LOOP to INSERT then UPDATE with @@SCOPE_IDENTITY seems to defeat the whole purpose of **OUTPUT INSERTED**.
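One technique that sidesteps the limitation: unlike INSERT, the OUTPUT clause of MERGE may reference source columns, so each new identity can be correlated back to the originating row. A hedged sketch, with names taken loosely from the question and the assumption that (x, y) uniquely identifies a source row:

DECLARE @MULTIPLE_RESULTS_TABLEVAR TABLE (x varchar(10), y varchar(10), OrdersTableID int NULL);
DECLARE @map TABLE (x varchar(10), y varchar(10), id int);

INSERT INTO @MULTIPLE_RESULTS_TABLEVAR (x, y) VALUES ('a','b'), ('c','d'), ('e','f');

MERGE dbo.OrdersTable AS tgt
USING (SELECT x, y FROM @MULTIPLE_RESULTS_TABLEVAR) AS src
   ON 1 = 0                                   -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (x, y) VALUES (src.x, src.y)
OUTPUT src.x, src.y, inserted.id INTO @map (x, y, id);  -- source columns are allowed here

UPDATE m
SET    m.OrdersTableID = p.id
FROM   @MULTIPLE_RESULTS_TABLEVAR AS m
JOIN   @map AS p ON p.x = m.x AND p.y = m.y;  -- assumes (x, y) identifies a row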
M S (21 rep)
Sep 12, 2023, 12:14 AM • Last activity: Apr 16, 2025, 10:03 AM
0 votes
2 answers
1695 views
Hints on bulk loading 3.6 billion rows to InnoDB on Aurora MySQL
everyone! I've been struggling for a week to bulk load 3.6 billion rows into an InnoDB table on Aurora MySQL 5.6.10a. This table has one FK to a "Main" table and has 12 columns. The first 1.4 billion were loaded overnight, but right now my insert rate is dropping quickly. I **disabled** `unique_check` and […] but let […] **on**. I split the file into 506 files of around 3.84GB each (7,000,000 rows each) and I'm using `LOAD DATA FROM S3` to load them into the table. Any hints to improve this task? Thank you very much!

**Additional details**

All other tables in my schema use InnoDB as the engine and they work fine, since they are much smaller than this one. Is it a good idea to change only this table to MyISAM? What would be the implications of doing so? My files are ordered by PK and the PK is a `BIGINT`.

CREATE TABLE Movement (
  idMovement bigint(20) NOT NULL AUTO_INCREMENT,
  idLawSuit bigint(20) NOT NULL,
  content mediumtext NOT NULL,
  movementDate datetime NOT NULL,
  captureDate datetime NOT NULL,
  isReportContent tinyint(4) DEFAULT NULL,
  isDocument tinyint(4) DEFAULT NULL,
  contentInS3 tinyint(4) DEFAULT NULL,
  contentS3Url text,
  uniqueConcatId varchar(255) NOT NULL,
  captureOrder bigint(20) DEFAULT NULL,
  movementExtraInfo text,
  PRIMARY KEY (idMovement),
  KEY idLawSuit10 (idLawSuit),
  CONSTRAINT idLawSuit10 FOREIGN KEY (idLawSuit) REFERENCES LawSuit (idLawSuit) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1470000001 DEFAULT CHARSET=utf8

These are my InnoDB parameters:

innodb_adaptive_flushing ON innodb_adaptive_flushing_lwm 10 innodb_adaptive_hash_index OFF innodb_adaptive_max_sleep_delay 150000 innodb_additional_mem_pool_size 8388608 innodb_api_bk_commit_interval 5 innodb_api_disable_rowlock OFF innodb_api_enable_binlog OFF innodb_api_enable_mdl OFF innodb_api_trx_level 0 innodb_aurora_enable_auto_akp OFF innodb_autoextend_increment 64 innodb_autoinc_lock_mode 2 innodb_buffer_pool_dump_at_shutdown OFF innodb_buffer_pool_dump_now OFF innodb_buffer_pool_dump_pct 100 innodb_buffer_pool_filename ib_buffer_pool innodb_buffer_pool_instances 8 innodb_buffer_pool_load_abort OFF innodb_buffer_pool_load_at_startup OFF innodb_buffer_pool_load_now OFF innodb_buffer_pool_size 96223625216 innodb_change_buffer_max_size 25 innodb_change_buffering none innodb_checksum_algorithm none innodb_checksums OFF innodb_cmp_per_index_enabled OFF innodb_commit_concurrency 0 innodb_compression_failure_threshold_pct 5 innodb_compression_level 6 innodb_compression_pad_pct_max 50 innodb_concurrency_tickets 5000 innodb_data_file_path ibdata1:12M:autoextend innodb_data_home_dir innodb_disable_sort_file_cache OFF innodb_doublewrite OFF innodb_fast_shutdown 1 innodb_file_format Antelope innodb_file_format_check ON innodb_file_format_max Antelope innodb_file_per_table ON innodb_flush_log_at_timeout 1 innodb_flush_log_at_trx_commit 1 innodb_flush_method O_DIRECT innodb_flush_neighbors 1 innodb_flushing_avg_loops 30 innodb_force_load_corrupted OFF innodb_force_recovery 0 innodb_ft_aux_table innodb_ft_cache_size 8000000 innodb_ft_enable_diag_print OFF innodb_ft_enable_stopword ON innodb_ft_max_token_size 84 innodb_ft_min_token_size 3 innodb_ft_num_word_optimize 2000 innodb_ft_result_cache_limit 2000000000 innodb_ft_server_stopword_table innodb_ft_sort_pll_degree 2 innodb_ft_total_cache_size 640000000 innodb_ft_user_stopword_table innodb_io_capacity 200 innodb_io_capacity_max 2000 innodb_large_prefix OFF innodb_lock_wait_timeout 50 innodb_locks_unsafe_for_binlog OFF innodb_log_buffer_size 8388608 innodb_log_file_size 50331648 innodb_log_files_in_group 2 innodb_log_group_home_dir ./ innodb_lru_scan_depth 1024 innodb_max_dirty_pages_pct 75 innodb_max_dirty_pages_pct_lwm 0 innodb_max_purge_lag 0 innodb_max_purge_lag_delay 0 innodb_mirrored_log_groups 1 innodb_monitor_disable innodb_monitor_enable innodb_monitor_reset innodb_monitor_reset_all innodb_old_blocks_pct 37 innodb_old_blocks_time 1000 innodb_online_alter_log_max_size 134217728 innodb_open_files 6000 innodb_optimize_fulltext_only OFF innodb_page_size 16384 innodb_print_all_deadlocks OFF innodb_purge_batch_size 900 innodb_purge_threads 3 innodb_random_read_ahead OFF innodb_read_ahead_threshold 56 innodb_read_io_threads 32 innodb_read_only OFF innodb_replication_delay 0 innodb_rollback_on_timeout OFF innodb_rollback_segments 128
innodb_shared_buffer_pool_uses_huge_pages ON innodb_sort_buffer_size 1048576 innodb_spin_wait_delay 6 innodb_stats_auto_recalc ON innodb_stats_method nulls_equal innodb_stats_on_metadata OFF innodb_stats_persistent ON innodb_stats_persistent_sample_pages 20 innodb_stats_sample_pages 8 innodb_stats_transient_sample_pages 8 innodb_strict_mode OFF innodb_support_xa ON innodb_sync_array_size 1 innodb_sync_spin_loops 30 innodb_table_locks ON innodb_thread_concurrency 0 innodb_thread_sleep_delay 10000 innodb_undo_directory . innodb_undo_logs 128 innodb_undo_tablespaces 0 innodb_use_native_aio OFF innodb_use_sys_malloc ON innodb_version 1.2.10 innodb_write_io_threads 4
Yago Carvalho (3 rep)
Aug 13, 2019, 02:59 PM • Last activity: Apr 15, 2025, 07:07 AM
1 vote
0 answers
44 views
How to copy a large oracle table into another without tmp issues
I have a table "FOO". I have created a table "BAR" such as:

CREATE TABLE BAR
PARTITION BY RANGE (BAR_ID) INTERVAL (1)
( PARTITION p0 VALUES LESS THAN (1) )
AS SELECT * FROM (SELECT * FROM FOO) WHERE 0=1;

ALTER TABLE BAR ADD CONSTRAINT BAR_PK PRIMARY KEY (BAR_ID);

I wanted to do:

INSERT INTO BAR SELECT /*+ APPEND */ * FROM FOO;

and later drop FOO and rename BAR to FOO. But during the INSERT INTO, I got:

ORA-01652: unable to extend temp segment by 128 in tablespace TMP

How can I copy FOO into BAR without exhausting the TMP tablespace? What is the best practice?
Calimero (111 rep)
Apr 10, 2025, 11:28 AM
1 vote
0 answers
94 views
SQL Server: "cannot bulk load" error during delete
I am having a very *interesting* error using SQL Server 2022 and 2017. I am currently implementing my jurisdiction's data retention policies, which means doing a lot of deletes. I have run into a strange situation where a single, specific delete statement is causing an error that is leaving me stuck. Because of the complexity of the code, and the fact that I tracked the error down to a single delete statement, and the fact that the error still happens when I do the simplest possible (bulk) delete, I won't be posting much code. I do hope that the reader will suggest strategies for approaching my problem. Unfortunately, I don't have very much to work with. The error message doesn't tell me the relevant entities, so I am stuck guessing. And because there is an error, I don't get to see the execution plan for the delete that fails.

-------------------

In short, I have a table called dbo.Person. It has SYSTEM_VERSIONING set to ON, and the corresponding history table is called dbo.PersonHistory. dbo.PersonHistory, incidentally, has an index on it. dbo.Person also has a couple of indexes on it. In the course of investigating my real issue, I simplified my buggy delete statement down to:

delete from dbo.Person;

and I get the error:

> Cannot bulk load. The bulk data stream was incorrectly specified as sorted or the data violates a uniqueness constraint imposed by the target table. Sort order incorrect for the following two rows: primary key of first row: (2025-03-18 23:20:18.5852411, 2021-07-02 18:25:01.9834390, redacted ID, 0), primary key of second row: (2025-03-18 23:20:18.5852411, 2021-07-02 03:31:22.7446737, redacted ID, 1)

It is not clear to me what the "redacted ID" is supposed to correspond to, since the error does not tell me what the target table is. But I examined all the indexes and similar on both the main and history tables and *guess* that these values "correspond" to values stored in the index on the history table, which would actually be a PersonId, the PK on the dbo.Person table. *If* this is true, my guess is that the delete is having to make changes to the history table's index and that change is breaking a constraint. But if I search for a dbo.Person row with that redacted ID, I find a single row. There are no rows in the history table for that person. Does anybody have any suggestions to investigate? The error is so uninformative, and the lack of an actual execution plan makes it hard to debug the issue. Thank you!

---------------------

It appears there are two indexes on the history table:

CREATE NONCLUSTERED INDEX [IX_PersonHistory_ID_PERIOD_COLUMNS_I] ON [dbo].[PersonHistory]([SysEndTime] ASC, [SysStartTime] ASC, [PersonId] ASC) WITH (DATA_COMPRESSION = PAGE);
GO
CREATE CLUSTERED COLUMNSTORE INDEX [CSIX_PersonHistory] ON [dbo].[PersonHistory];

---------------------------------------

I dropped both indexes on the history table (in my dev database) and the error occurred again.
nomen (113 rep)
Mar 19, 2025, 04:43 PM • Last activity: Mar 19, 2025, 09:33 PM
0 votes
1 answer
601 views
SELECT Every Parent Table Record and INSET Multiple Record in Child Table Against the Parent_Id
Suppose I have two tables named company and screen_savers, as given below:

**Company:**
id (pk auto inc)
name (varchar)

**ScreenSavers:**
id (pk auto inc)
name (varchar)
path (varchar)
company_id (int) # This is the pk of Company

Now I want to select each company and add 3 records, which means every single company's pk will be saved as the fk in the screensaver table for the given number of users. How can I do this by writing something like:

SELECT id AS c_id From company
INSERT INTO screen_savers VALUES
( name='Screen Saver-I', path='screen_save_i_path', company_id=c_id ),
( name='Screen Saver-II', path='screen_save_ii_path', company_id=c_id ),
( name='Screen Saver-III', path='screen_save_iii_path', company_id=c_id )

I don't know whether the above will work, but I wanted to give some idea of what I am trying to do. Can someone please let me know the solution? I am using PostgreSQL.
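A hedged sketch of one way to do this in PostgreSQL: a single INSERT ... SELECT that cross joins every company id with the three fixed screensaver rows (table and column names follow the question):

INSERT INTO screen_savers (name, path, company_id)
SELECT v.name, v.path, c.id
FROM   company AS c
CROSS JOIN (VALUES
    ('Screen Saver-I',   'screen_save_i_path'),
    ('Screen Saver-II',  'screen_save_ii_path'),
    ('Screen Saver-III', 'screen_save_iii_path')
) AS v(name, path);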
Khuram (129 rep)
May 27, 2019, 01:27 PM • Last activity: Mar 16, 2025, 08:04 AM
1 vote
2 answers
349 views
Is it safe to use lastval/currval for multiple inserts?
Let's say I have these tables:
```sql
CREATE TABLE buildings (
	id_building serial4 NOT NULL,
	street	text NULL,
	CONSTRAINT buildings_pkey PRIMARY KEY (id_building)
);
```

```sql
CREATE TABLE families (
	id_family serial4 NOT NULL,
	id_building int NOT NULL,
	name	text NULL,
	CONSTRAINT families_pkey PRIMARY KEY (id_family),
	CONSTRAINT families_id_building_fkey FOREIGN KEY (id_building) REFERENCES buildings (id_building)
);
```

```sql
CREATE TABLE people (
	id_person serial4 NOT NULL,
	id_family int NOT NULL,
	name	text NULL,
	CONSTRAINT people_pkey PRIMARY KEY (id_person),
	CONSTRAINT people_id_family_fkey FOREIGN KEY (id_family) REFERENCES families (id_family)
);
```

So that in a building you can have 0 or many families, and in every family you can have 1 or more people. I need to make one query to insert into these connected tables without knowing the number of rows that are inserted. I know this method well (https://stackoverflow.com/a/41595442) but I think the syntax would become quite complex rather fast. So I'm exploring this approach (https://stackoverflow.com/a/41596194), which would be simpler to read and write. But is it robust and safe? My database is busy with many simultaneous inserts, even on those tables, from different sources and code paths. I don't want to have inconsistencies in the reference keys. The query would be something like this:
```sql
BEGIN;
INSERT INTO buildings (street) VALUES ('St Michael''s Rd');
INSERT INTO families (id_building, name) VALUES (currval('buildings_id_seq'), 'White');
INSERT INTO people (id_family, name) VALUES (currval('families_id_seq'), 'John');
INSERT INTO people (id_family, name) VALUES (currval('families_id_seq'), 'Jenny');
INSERT INTO people (id_family, name) VALUES (currval('families_id_seq'), 'Pepy');
INSERT INTO families (id_building, name) VALUES (currval('buildings_id_seq'), 'Brown');
INSERT INTO people (id_family, name) VALUES (currval('families_id_seq'), 'Conny');

INSERT INTO buildings (street) VALUES ('Crossford St');

INSERT INTO buildings (street) VALUES ('Stockwell Park Cres');
INSERT INTO families (id_building, name) VALUES (currval('buildings_id_seq'), 'Smith');
INSERT INTO people (id_family, name) VALUES (currval('families_id_seq'), 'John');

END;
```

And I expect an output like this:
```sql
SELECT *
FROM buildings
-- 1, St Michael''s Rd
-- 2, Crossford St
-- 3, Stockwell Park Cres

SELECT *
FROM families
-- 1, 1, White
-- 2, 1, Brown
-- 3, 3, Smith

SELECT *
FROM people
-- 1, 1, John
-- 2, 1, Jenny
-- 3, 1, Pepy
-- 4, 2, Conny
-- 5, 3, John
```
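currval() is per-session, so concurrent sessions cannot see each other's values; still, a hedged alternative sketch using data-modifying CTEs keeps the generated keys in explicit scope instead of relying on sequence state (shown for a single building/family chain only):

```sql
WITH b AS (
    INSERT INTO buildings (street) VALUES ('St Michael''s Rd')
    RETURNING id_building
), f AS (
    INSERT INTO families (id_building, name)
    SELECT id_building, 'White' FROM b
    RETURNING id_family
)
INSERT INTO people (id_family, name)
SELECT f.id_family, p.name
FROM f
CROSS JOIN (VALUES ('John'), ('Jenny'), ('Pepy')) AS p(name);
```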
Sotis (328 rep)
Mar 21, 2023, 10:15 AM • Last activity: Mar 2, 2025, 10:02 AM
0 votes
0 answers
86 views
SSIS ODBC Destination to MariaDB is slow – How to enable bulk inserts?
I'm trying to copy data from SQL Server to MariaDB using SSIS.

- I've configured a DSN using the official MariaDB ODBC connector.
- My SSIS data flow consists of a simple source query (retrieving ~5,000 records) and an ODBC Destination.
- However, the write performance is extremely slow. After enabling MariaDB's general log, I noticed that SSIS is generating one INSERT statement per row, significantly slowing down the process.

I tested an external third-party component, which was much faster. From the logs, it appears to use a bulk insert approach, like this:
INSERT INTO tablename (col_1, col_2, col_3, col_4) 
VALUES 
    (7529766, NULL, 5, '2024-01-01 03:17:17'),
    (7529767, NULL, 5, '2024-01-01 03:17:29'),
    (7529768, NULL, 5, '2024-01-01 03:17:32')...
I’d like to achieve similar performance without relying on paid third-party components. Is there a way to configure SSIS's ODBC Destination to perform batch inserts instead of row-by-row inserts? Are there any MariaDB ODBC driver settings that can improve performance?
Mattia Nocerino (512 rep)
Feb 26, 2025, 09:04 AM • Last activity: Feb 28, 2025, 01:21 PM
1 vote
1 answer
5981 views
How to insert multiple values and handle conflict at the same time
I have a table where the primary key is a combination of two columns. When inserting multiple values at a time (batching) conflicts might happen since in the list of values passed there might be duplicates. I am adding an ON CONFLICT statement to the insert query hoping that the last row being inserted is the final value. When this query runs, I get the error:
SQL Error : ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
  Hint: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Below is a contrived example of the table and the query
CREATE TABLE users (
	user text NOT NULL,
	user_email text NOT NULL,
	is_active bool NOT NULL DEFAULT false,
	is_admin bool NOT NULL DEFAULT false,
	PRIMARY KEY (user, user_email)
);
An example query is as follows:
insert into users (user, user_email, is_active, is_admin)
values('user1','user1@example.com','true','false'),('user1','user1@example.com','false','false')
on conflict (user, user_email)
do update set is_active  = excluded.is_active, is_admin  = excluded.is_admin ;
The reason I suspect this error is being thrown, correct me if I am mistaken, is that the two values being inserted both conflict and basically the ON CONFLICT statement can't handle them at the same time. My question is how to best handle this scenario: how can I modify this so as to keep batching but, on conflict, apply the last statement in the list of values (no race condition)? Thanks in advance
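One hedged way to keep the batch is to deduplicate the VALUES list before ON CONFLICT ever sees it, keeping the last occurrence of each key. A sketch; the ord column is an added ordinality marker for "last one wins" and is not part of the table, and "user" is quoted because USER is a keyword:

INSERT INTO users ("user", user_email, is_active, is_admin)
SELECT "user", user_email, is_active, is_admin
FROM (
    SELECT DISTINCT ON ("user", user_email)
           "user", user_email, is_active, is_admin, ord
    FROM (VALUES
        (1, 'user1', 'user1@example.com', true,  false),
        (2, 'user1', 'user1@example.com', false, false)
    ) AS v(ord, "user", user_email, is_active, is_admin)
    ORDER BY "user", user_email, ord DESC      -- later rows in the list win
) AS dedup
ON CONFLICT ("user", user_email)
DO UPDATE SET is_active = excluded.is_active,
              is_admin  = excluded.is_admin;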
Fouad (113 rep)
Oct 29, 2022, 01:39 PM • Last activity: Feb 17, 2025, 11:01 AM
0 votes
1 answer
946 views
Parallel insert and delete into staging tables
I work on a small system that receives quite a lot of data from multiple sources (hundreds of them). Depending on a few factors I can get anywhere from a few rows to a couple of thousand (in fact the smallest message contains a single row and the largest contains at most five thousand, but if more data needs to be loaded then I get it in a few batches). Currently data is processed by a few identical services. Each service loads data into one of five staging tables and executes a procedure that does the right thing - each staging table has a different procedure associated with it, but it always boils down to changing unique identifiers (e.g. UUIDs) to proper database identifiers, inserting data into the destination table and removing data from the staging table. All procedures have this form:

INSERT INTO TARGET (A, B, C)
SELECT T1.ID, B, C
FROM STAGING_TABLE
JOIN T1 ON A = T1.ID

DELETE FROM STAGING_TABLE

It is possible for all services to work on a single staging (and destination) table in parallel. Currently this is done using snapshot isolation, but it is painfully obvious that for one reason or another we are losing data. What I mean by that is that there are messages that are properly processed by the services but all information from them is lost - we don't see the records in the database. I can't prove that snapshot isolation is responsible, but such incidents started to happen after snapshot isolation was introduced - which in turn was introduced after services loading data in parallel were introduced. Databases are currently far from my main field of expertise and I don't know why it happens, but it seems that snapshot isolation is the main culprit.

My question is: what is the lowest isolation level that can support this scenario? Is there a better way to do it? I'm not fully aware what transaction isolation level was used earlier (when data was loaded by a single service), but we never observed data loss. I tried (blindly) using "serializable" and "repeatable read", but "serializable" results in dropped messages due to deadlocks, and "repeatable read", while it seems to do the right thing (no data loss), also degrades performance to the level of serial writing.

EDIT: Is it viable to load data using snapshot isolation, insert into a temporary table (or table variable), THEN switch to some very permissive isolation level, insert data from the temporary table into the target table, revert to snapshot and delete data from the staging table? If I read this correctly: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/ms173763(v=sql.90) it should be possible, but I don't understand yet whether any of this would have any effect in the discussed case - the target table is not read, only written in this scenario, and I *think* this means that the write won't be any "more parallel" than under snapshot isolation. But maybe I'm wrong?

Note that we can't wait and, for example, load data from multiple sources into a single staging table and then move it into the target table. We're not aiming for real time, but we need data inserted ASAP.
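A hedged sketch of a shape that removes the window between the INSERT and the DELETE entirely: each service consumes its staging rows with DELETE ... OUTPUT in one atomic statement and then transforms only what it captured, so a row can never be deleted without having been copied (the #moved temp table and the int column types are illustrative assumptions):

BEGIN TRANSACTION;

CREATE TABLE #moved (A int, B int, C int);   -- column types are assumptions

-- The same statement that removes the rows also captures them.
DELETE FROM STAGING_TABLE
OUTPUT deleted.A, deleted.B, deleted.C INTO #moved (A, B, C);

INSERT INTO TARGET (A, B, C)
SELECT T1.ID, m.B, m.C
FROM #moved AS m
JOIN T1 ON m.A = T1.ID;

COMMIT TRANSACTION;

DROP TABLE #moved;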
Jędrzej Dudkiewicz (173 rep)
Jul 8, 2019, 05:17 PM • Last activity: Feb 12, 2025, 10:38 AM
0 votes
1 answer
1285 views
Why does LOAD DATA INFILE not insert any data?
I have this project I am working on; below is a sample of the text files I need to load into my database.

`routes.txt`:

"route_id","origin","destination","distance"
1,"ABE","ATL",692.000
2,"ABE","DTW",425.000
3,"ABE","ORD",655.000

`flights.txt`:

"carrier","flight_number","route_id","departure_time","arrival_time"
"AA",43,1051,"1100","1438"
"AA",43,1182,"1523","1730"
"AA",44,3477,"0710","1527"
"AA",45,1921,"1830","2152"

Here are the tables I created:

create table flights(
    route_id INT NOT NULL,
    carrier char(2) NOT NULL,
    flight_number int NOT NULL,
    departure_time date NOT NULL,
    arrival_time date NOT NULL,
    primary key (route_id),
    index(route_id)
)

create table routes(
    route_id INT NOT NULL,
    origin char(3) NOT NULL,
    destination char(3) NOT NULL,
    distance decimal(8,3) NOT NULL,
    primary key (route_id),
    index(route_id)
)

Here are the commands I executed to load the data:

LOAD DATA infile '/var/lib/mysql-files/routes.txt'
INTO TABLE routes
fields terminated by ','
lines terminated BY '\n\r'
IGNORE 1 LINES
(route_id,origin,destination,distance);

Query OK, 0 rows affected (0.00 sec)
Records: 0  Deleted: 0  Skipped: 0  Warnings: 0

From the result above, you can see the data are not loaded. What could be the reason?
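A hedged guess at the cause plus a corrected statement: '\n\r' is not a real line ending (Unix files end lines with '\n', Windows files with '\r\n'), so MySQL never sees a complete line; matching the file's actual terminator, and allowing for the quoted fields, should load the rows:

LOAD DATA INFILE '/var/lib/mysql-files/routes.txt'
INTO TABLE routes
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'          -- or '\r\n' if the file has Windows line endings
IGNORE 1 LINES
(route_id, origin, destination, distance);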
mic255 (1 rep)
Mar 3, 2017, 04:02 PM • Last activity: Jan 10, 2025, 02:09 PM
1 vote
1 answer
570 views
Open Row insert- SQL
When we run an SSIS package to import data from Excel, we get an initial error:
The requested OLE DB provider Microsoft.Jet.OLEDB.4.0 is not registered.
If the 64-bit driver is not installed, run the package in 32-bit mode.
We were able to fix this by executing the package in 32-bit mode using the option in the wizard. But when we run the following T-SQL:
SELECT * 
INTO #temp13
FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',   
                    'Data Source=XXXXXXXX;Extended Properties=Excel 8.0')...[Sheet1$];
We get the below error
The 32-bit OLE DB provider "Microsoft.Jet.OLEDB.4.0" 
cannot be loaded in-process on a 64-bit SQL Server. [SQLSTATE 42000] (Error 7438).
The step failed.
Is there any option to run this Transact-SQL in 32-bit mode?
Lakshmi R (119 rep)
Feb 14, 2023, 02:12 PM • Last activity: Dec 26, 2024, 01:02 PM
0 votes
0 answers
49 views
Inserting range of IP's into a sql table
I have a SQL table with two fields, TermID and IPAddress. I want to insert the range 10.100.08.01-10.100.08.254, and each IP will have an assigned TermID, for example TERM1-TERM254. I will greatly appreciate any guidance as this is not my area of expertise.
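A hedged sketch, assuming SQL Server and a hypothetical table dbo.Terminals(TermID, IPAddress): a recursive CTE generates the numbers 1 through 254 and both column values are built from each number:

WITH n AS (
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 254
)
INSERT INTO dbo.Terminals (TermID, IPAddress)
SELECT CONCAT('TERM', i),
       CONCAT('10.100.08.', FORMAT(i, '00'))   -- zero-pads 1-9 to match 10.100.08.01
FROM n
OPTION (MAXRECURSION 254);                     -- the default limit of 100 is too low here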
SQL_NoExpert (1117 rep)
Dec 23, 2024, 01:46 PM
0 votes
0 answers
42 views
Does sorting reduce MySQL deadlock likelihood of INSERT/UPDATE batches?
We know that MySQL INSERT/ON DUPLICATE KEY UPDATE statements create gap locks which may lead to deadlocks between multiple threads. If the INSERT/UPDATE is done in batches (either in a separate transaction for each batch or all batches in the same transaction), do we reduce the likelihood of deadlocks by sorting the data first? Here's an example where that seems to be the case, but I don't understand why.

- https://github.com/fleetdm/fleet/issues/1146#issuecomment-865134315
- CREATE TABLE for the above example:
CREATE TABLE label_membership (
  created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  label_id int(10) unsigned NOT NULL,
  host_id int(10) unsigned NOT NULL,
  PRIMARY KEY (host_id,label_id),
  KEY idx_lm_label_id (label_id),
  CONSTRAINT fk_lm_host_id FOREIGN KEY (host_id) REFERENCES hosts (id) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT fk_lm_label_id FOREIGN KEY (label_id) REFERENCES labels (id) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
- Full schema: https://github.com/fleetdm/fleet/blob/8b4c6a1dd7d2683b9e7e6ebe22f372f0a419e6d8/server/datastore/mysql/schema.sql#L266
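For what it's worth, a hedged illustration of the usual explanation: when every session inserts its batch in the same key order, lock acquisition order is consistent across sessions, so one session simply waits for the other instead of the two acquiring interleaved locks and deadlocking:

-- Batch sorted by the primary key (host_id, label_id) before sending;
-- the example ids assume matching rows exist in hosts and labels.
INSERT INTO label_membership (host_id, label_id)
VALUES (1, 10), (1, 20), (2, 10)
ON DUPLICATE KEY UPDATE updated_at = CURRENT_TIMESTAMP;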
Victor L (97 rep)
Dec 3, 2024, 09:51 PM • Last activity: Dec 6, 2024, 08:38 PM
-1 votes
1 answer
89 views
SQL Bulk Insert Saves extra character
I use this code for the bulk insert:

BULK INSERT [Table]
FROM 'D:\phones.txt'
WITH
(
    ROWTERMINATOR = '0x0a'
)

My data has only one column of 10 characters, for example:

| 1234567890 |
| 1234567891 |
| 1234567892 |

After saving to the database, running the following code shows a greater data length:

SELECT TOP (1) [Phone], LEN([Phone]) AS Phonelength
FROM [Table]

Result:

| Phone      | Phonelength |
| ---------- | ----------- |
| 1234567891 | 11          |

The extra character added to the end of the field is a newline. Why is the newline added?
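A hedged sketch of the likely fix: the file appears to have Windows CR+LF line endings, so terminating rows on 0x0a alone leaves the carriage return (0x0d) attached as an 11th character; using the two-byte terminator strips it:

BULK INSERT [Table]
FROM 'D:\phones.txt'
WITH
(
    ROWTERMINATOR = '0x0d0a'   -- CR+LF instead of LF only
)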
ioxoi (9 rep)
Apr 12, 2024, 08:04 PM • Last activity: Apr 13, 2024, 08:44 AM
9 votes
5 answers
9069 views
Copying a table (and all of its data) from one server to another?
I have a massive table, let's say 500,000 rows. I want to copy it (schema and data) from one server to another. This is not an upsert or any kind of update; it's a one-off straight copy and paste. What are the idiomatic approaches to this? I've tried:

- Restoring a backup from one server on to another. This is impractical, because SQL Server notoriously cannot restore tables from a backup; it can only restore databases. And my database is huge!
- Using SSMS to script the table's data as a sequence of INSERT statements. This is impractical, because the inserts have to be done row by agonising row. I suspect that this also does awful things to the transaction log, but nobody has attacked me for this yet (I'm running such a script right now, and it's going to take hours).
J. Mini (1225 rep)
Nov 23, 2023, 07:00 PM • Last activity: Apr 9, 2024, 07:49 PM
0 votes
1 answer
200 views
Catching errors in bulk insert
I am currently adding a new feature to my ASP.NET website, the goal is to upload a CSV file and have it processed by a SQL Server stored procedure. To perform the bulk insert, I am using this command in my procedure:
SET @SQLBulk = '
        BEGIN TRANSACTION
        BULK INSERT '+ @Prod_Table +
        ' FROM ''' + @CSVFilePath + '''
        WITH
        (
            ERRORFILE = '''+ @ErrorFile +''',
            CHECK_CONSTRAINTS,
            FIELDTERMINATOR = '';'',
            ROWTERMINATOR = ''\n'',
            FIRSTROW = 2,
            FORMAT = ''CSV'',
            MAXERRORS = 10
        )
        COMMIT TRANSACTION'
         
    -- Execute the dynamic query
    BEGIN TRY
        PRINT CAST(@SQLBulk as ntext)
        EXEC sp_executesql @SQLBulk
    END TRY
    BEGIN CATCH
    ROLLBACK TRANSACTION
 
    IF ERROR_NUMBER() IN (100, 7330) -- Syntax error
        BEGIN
            SET @ErrorMessage = 'ERREUR:: Le fichier est mal formaté, il ne contient pas le bon nombre de champs et/ou Mauvais type de données'
        END
    IF ERROR_NUMBER() IN (547) -- Referential integrity constraint violation
        BEGIN
            SET @ErrorMessage = 'ERREUR:: ...'
        END
    IF ERROR_NUMBER() IN (515) -- Foreign key constraint violation
        BEGIN
            SET @ErrorMessage = 'ERREUR:: ...'
        END
    IF ERROR_NUMBER() IN (2627, 2601) -- Unique constraint violation
        BEGIN
            SET @ErrorMessage = 'ERREUR::...' 
        END
    RAISERROR (@ErrorMessage, 15, 1)
 
    END CATCH
 
END
Now I want to log errors related to constraints (unique, foreign key, etc.). I've seen that the ERRORFILE option allows me to store errors in log files, but unfortunately, this only concerns errors related to the file itself (wrong data type, etc.) and not data integrity. Specifically, my problem is that when catching an error with bulk insert, I have no way of knowing which line caused the error, and from the moment there is a single bad entry, everything is blocked thereafter. This means that if my file contains 1000 entries with 50 incorrect ones, I have to re-upload the file at least 50 times to be able to insert everything. What I want is to go through the entire file, insert everything if there are no errors, and if there are errors, log them (in a table or file, whichever is convenient). Thank you for your help!
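A hedged sketch of the pattern usually used for this requirement: bulk load into a constraint-free staging table first, log every offending row with set-based checks, and insert only the clean rows. All object names below are illustrative, not the real schema:

-- 1) Load the raw file into a staging table that has no constraints.
BULK INSERT dbo.Staging_Import
FROM 'C:\uploads\file.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ';', ROWTERMINATOR = '\n');

-- 2) Log rows that would violate the unique key of the real table.
INSERT INTO dbo.Import_Errors (Col1, Col2, Reason)
SELECT s.Col1, s.Col2, 'Duplicate key'
FROM   dbo.Staging_Import AS s
JOIN   dbo.TargetTable    AS t ON t.Col1 = s.Col1;

-- 3) Insert only the rows that passed every check.
INSERT INTO dbo.TargetTable (Col1, Col2)
SELECT s.Col1, s.Col2
FROM   dbo.Staging_Import AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable AS t WHERE t.Col1 = s.Col1);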
Helmi KAMEL (1 rep)
Apr 8, 2024, 09:42 AM
1 vote
1 answer
109 views
mysql bulk insert with constraints
I created a table like this:

CREATE TABLE Leaderboard(
    userId bigint not null,
    matchId bigint not null,
    score mediumint not null,
    country CHAR(10),
    tournamentId int not null
)

I am creating matches of 5 people, all from 5 distinct countries ("UK", "USA", "SPAIN", "GERMANY", "FRANCE"). I want to insert about a million entries into this table with the following requirements. A match will be formed like this:

usergermany | userUSA | userSPAIN | userUK | userFrance

so a match will have 5 users, all from distinct countries. A tournament will have many matches, and a user can only play in one match in a tournament. So an example table will look like this:

userid   matchid  score   country  tournamentid
...
988654   3877543  random  USA      177
388654   3877543  random  GERMANY  177
433432   3877543  random  FRANCE   177
776212   3877543  random  UK       177
1632987  3877543  random  SPAIN    177
2113242  3877544  random  SPAIN    177
2918974  3877544  random  USA      177
111738   3877544  random  UK       177
1772342  3877544  random  FRANCE   177
1343243  3877544  random  GERMANY  177
123131   3877545  random  UK       178
1231414  3877545  random  FRANCE   178
2858348  3877545  random  GERMANY  178
1122432  3877545  random  USA      178
2923434  3877545  random  SPAIN    178
...

A userId cannot exist in a tournament twice. A country cannot exist in a match twice (match groups are formed from distinct countries). Also, since 5 players compete in a match, a matchid will only appear 5 times in the table. I want to insert a million entries into this table, with userid chosen randomly from 1 to 3 million under the above constraints: 10 thousand matches for each tournament, so there will be 100 tournaments starting from 1. 100 tournaments, 10000 matches each, make up 1 million rows.

**--UPDATE--**

create table seq_data as
with recursive tmp(x) as (
    select 1
    union all
    select x+1 from tmp limit 3000000
)
select * from tmp;

CREATE TABLE if not exists Leaderboard(
    userId bigint not null,
    matchId bigint not null,
    score mediumint not null,
    country CHAR(10),
    tournamentId int not null
);

DELIMITER //
CREATE PROCEDURE InsertLeaderboardData()
BEGIN
    DECLARE tournamentloop INT DEFAULT 0;
    DECLARE matchloop INT DEFAULT 0;
    DECLARE groupLoop INT DEFAULT 0;
    DECLARE matchidKey INT DEFAULT 1;

    WHILE tournamentloop < 1 DO
        SET tournamentloop = tournamentloop + 1;
        SET matchloop = 0;
        WHILE matchloop < 10000 DO
            SET matchidKey = matchidKey + 1;
            SET matchloop = matchloop + 1;
            SET groupLoop = 0;
            WHILE groupLoop < 5 DO
                SET groupLoop = groupLoop + 1;
                INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId)
                VALUES (1, matchidKey, FLOOR(RAND() * 1000) + 1,
                        ELT(groupLoop, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"),
                        tournamentloop);
                SELECT matchidKey;
            END WHILE;
        END WHILE;
    END WHILE;
END;
//
DELIMITER ;

CALL InsertLeaderboardData();
DROP PROCEDURE InsertLeaderboardData;

This runs, but slowly, and I want one more feature. For the INSERT INTO's userId I set the value to 1 for now, to fill in properly later. What I want to do is this: I created sequenced numbers in the table seq_data, running from 1 up to 3 million. For each WHILE tournamentloop < 100 DO scope, I want to get 50 thousand DISTINCT numbers from the seq_data table so I can give the users an id. The point is that no userid can appear twice in a tournament, so for each tournament scope I need to retrieve random ids which are distinct and assign them to rows. I already created the table of sequenced numbers; I don't know what the best approach is from here.
I tried turning this statement:

INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId)
VALUES (useridKey, matchidKey, FLOOR(RAND() * 1000) + 1,
        ELT(groupLoop, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"), tournamentloop);

into this:

DECLARE tournamentloop INT DEFAULT 0;
DECLARE matchloop INT DEFAULT 0;
DECLARE groupLoop INT DEFAULT 0;
DECLARE matchidKey INT DEFAULT 1;
DECLARE useridKey INT DEFAULT 1;

PREPARE insertStmt FROM 'INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId) VALUES (useridKey, matchidKey, FLOOR(RAND() * 1000) + 1, ELT(groupLoop, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"), tournamentloop);';
EXECUTE insertStmt;

but I get the error: Unknown column 'useridKey' in 'field list'
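Two hedged sketches for the remaining pieces (the @variable and table names are illustrative). PREPARE only resolves ? placeholders and session @variables, not DECLAREd locals, which is exactly what the "Unknown column" error is saying; and a per-tournament set of 50,000 distinct user ids can be drawn from seq_data with ORDER BY RAND():

-- Parameterize the statement instead of embedding local variable names.
PREPARE insertStmt FROM
  'INSERT INTO Leaderboard (userId, matchId, score, country, tournamentId)
   VALUES (?, ?, FLOOR(RAND() * 1000) + 1,
           ELT(?, "SPAIN", "FRANCE", "UK", "USA", "GERMANY"), ?)';
SET @uid = 1, @mid = 1, @grp = 1, @trn = 1;
EXECUTE insertStmt USING @uid, @mid, @grp, @trn;
DEALLOCATE PREPARE insertStmt;

-- Per tournament: draw 50,000 distinct random ids once, then hand them out.
DROP TEMPORARY TABLE IF EXISTS tournament_users;
CREATE TEMPORARY TABLE tournament_users AS
SELECT x AS userId
FROM   seq_data
ORDER  BY RAND()
LIMIT  50000;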
umarkaa (47 rep)
Apr 4, 2024, 07:21 AM • Last activity: Apr 5, 2024, 10:56 AM
Showing page 1 of 20 total questions