Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0 votes, 0 answers, 4 views
Data duplication after Apache IoTDB pipe restart: how to maintain the existing sync progress?
When using Apache IoTDB's pipe feature to sync data to another Apache IoTDB instance, I encountered duplicate data transmission. The scenario is as follows:
I configured a Pipe from root.source to a remote IoTDB instance (root.target):
CREATE PIPESINK remote_sink AS IoTDB ('ip:6667', 'username', 'password');
CREATE PIPE source_to_target TO remote_sink FROM (SOURCE 'root.source.we4rwe.**') INTO (TARGET 'root.target.faknlv93.**');
START PIPE source_to_target;
The initial sync works fine, but when I manually restart the pipe (e.g., STOP PIPE source_to_target followed by START PIPE source_to_target), some historical data (e.g., already synced root.source.d1.sensor data) is retransmitted, causing duplicates on the target. The pipe status (SHOW PIPES) shows status=RUNNING with no error logs.
Is this the expected behavior of IoTDB Pipe? How can I avoid duplicate data transmission during a pipe restart? Are additional configurations (e.g., sync_progress or other parameters) required?
Hester Tso
(101 rep)
Aug 1, 2025, 08:12 AM
0 votes, 1 answer, 169 views
How to remove these duplicate records in an Oracle Database 12c?
I have these two tables:
messages(id primary key, message_date, created_at, ...)
user_messages(user_id, message_id references messages(id))
I have some duplicate rows in messages:
select user_id, message_date, count(*)
from messages inner join user_messages
on messages.id = user_messages.message_id
group by user_id, message_date;
user_id | message_date | count(*)
1 | 2019-01-01 | 2
1 | 2019-02-01 | 3
1 | 2019-03-01 | 2
How can I remove such duplicates, retaining only one of them, for example the one whose created_at (not message_date) is the earliest?
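One common approach (a sketch, not an authoritative fix; it stages the doomed ids in a scratch table because the foreign key from user_messages forces the deletes to happen in order) is to rank each (user_id, message_date) group with ROW_NUMBER() and discard everything except the earliest created_at:
```sql
-- 1) Collect the ids of the duplicate copies to discard (rn = 1 is kept)
CREATE TABLE doomed_ids AS
SELECT id
FROM (
    SELECT m.id,
           ROW_NUMBER() OVER (
               PARTITION BY um.user_id, m.message_date
               ORDER BY m.created_at
           ) AS rn
    FROM messages m
    JOIN user_messages um ON um.message_id = m.id
)
WHERE rn > 1;

-- 2) Remove the referencing rows first (the FK blocks the reverse order)
DELETE FROM user_messages WHERE message_id IN (SELECT id FROM doomed_ids);

-- 3) Remove the duplicate messages themselves, then drop the scratch table
DELETE FROM messages WHERE id IN (SELECT id FROM doomed_ids);
DROP TABLE doomed_ids;
```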
orly
(21 rep)
Nov 8, 2019, 03:24 AM
• Last activity: Jul 4, 2025, 04:06 AM
0 votes, 1 answer, 212 views
Merging nearly-duplicate rows from different databases in the same table
I am sure that many people before me have faced the need to merge nearly-duplicate rows from several databases. However, I have not been able to find a simple yet general solution. I provided a data sample in Excel here: https://disk.yandex.ru/d/UtuENQ4LXHmOmQ It shows in yellow the 4 rows that I have now, and in green the one row that I want to make.
The new row should take its data from only one of the 4 databases in some columns, and merge data from all (or some) databases in other columns. I will delete the original rows after the merge. I spent all morning trying to figure out which functions to use. I have INSERT INTO, VALUES, WHERE, but something else is missing. Could someone please direct me to a possible solution?
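Without access to the spreadsheet, only the general shape can be sketched (the names my_table, record_key, col_a, col_b, and source_db are all hypothetical): build the merged row with one aggregate SELECT over the near-duplicates, insert it, then delete the originals.
```sql
-- Build and insert the merged row for one group of near-duplicates.
-- MAX(CASE ...) picks the value from one preferred source database;
-- plain MAX() merges "whichever row has a value"; adjust per column.
INSERT INTO my_table (record_key, col_a, col_b, source_db)
SELECT record_key,
       MAX(CASE WHEN source_db = 'db1' THEN col_a END),  -- from one database only
       MAX(col_b),                                       -- any non-null value wins
       'merged'
FROM my_table
WHERE record_key = 42
GROUP BY record_key;

-- Then delete the original rows of that group
DELETE FROM my_table
WHERE record_key = 42
  AND source_db <> 'merged';
```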
YW T
(11 rep)
Jan 24, 2022, 03:36 PM
• Last activity: Jun 12, 2025, 08:07 PM
2 votes, 1 answer, 192 views
Blocked listener service
It seems that I cannot connect via TNS alias to my auxiliary. I know I could bypass this by either connecting locally to the auxiliary (e.g. auxiliary /) or by adding a (UR=A) parameter. The DB is in NOMOUNT.
I want to know why, because I have explicitly created two static listener registrations, to use for Data Guard and for the duplicate, but I have the following issue.
Here, first, is the configuration.
listener.ora
[oracle@server-salzburg admin]$ cat /u01/app/19/grid/network/admin/listener.ora
#Backup file is /u01/app/oracle/crsdata/server-salzburg/output/listener.ora.bak.server-salzburg.oracle line added by Agent
# listener.ora Network Configuration File: /u01/app/19/grid/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=graz_DGMGRL)
(SID_NAME=graz)
(ORACLE_HOME=/u01/app/oracle/product/19/dbhome_1)
)
(SID_DESC=
(GLOBAL_DBNAME=salzburg_DGMGRL)
(SID_NAME=salzburg)
(ORACLE_HOME=/u01/app/oracle/product/19/dbhome_1)
)
)
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_LISTENER=ON # line added by Agent
tnsnames.ora
[oracle@server-salzburg admin]$ cat tnsnames.ora
SALZBURG =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = salzburg)
)
)
graz_DGMGRL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-graz)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = graz)
)
)
salzburg_DGMGRL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = salzburg)
)
)
listener output
[oracle@server-salzburg admin]$ lsnrctl status
LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 09-JUN-2025 20:27:36
Copyright (c) 1991, 2019, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date 09-JUN-2025 20:24:03
Uptime 0 days 0 hr. 3 min. 33 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/19/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/server-salzburg/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=server-salzburg)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DG_DATA" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DG_FRA" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "graz_DGMGRL" has 1 instance(s).
Instance "graz", status UNKNOWN, has 1 handler(s) for this service...
Service "salzburg" has 1 instance(s).
Instance "salzburg", status BLOCKED, has 1 handler(s) for this service...
Service "salzburg_DGMGRL" has 1 instance(s).
Instance "salzburg", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
rman error:
[oracle@server-salzburg admin]$ rman target sys/*******@graz_DGMGRL auxiliary sys/******@salzburg_DGMGRL
Recovery Manager: Release 19.0.0.0.0 - Production on Mon Jun 9 20:28:00 2025
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
connected to target database: GRAZ (DBID=3282927897)
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04006: error from auxiliary database: ORA-12528: TNS:listener: all appropriate instances are blocking new connections
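Worth noting from the configuration above: the salzburg_DGMGRL alias connects to SERVICE_NAME = salzburg, which is the dynamically registered service of the NOMOUNT instance and is therefore BLOCKED, rather than to the statically registered salzburg_DGMGRL service (status UNKNOWN, which the listener hands off regardless of instance state). A sketch of the tnsnames.ora change, with (UR=A) included as the fallback the asker already mentioned:
```
salzburg_DGMGRL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = salzburg_DGMGRL)
      (UR = A)
    )
  )
```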
paganini
(138 rep)
Jun 9, 2025, 06:28 PM
• Last activity: Jun 9, 2025, 08:41 PM
1 vote, 1 answer, 2207 views
Duplicate key error index MongoDB
I have a rocket.chat server connected to LDAP.
When one of the users tries logging in to their account, he gets logged in to someone else's account; it's probably because of the indexes. My logs look like this:
> I20161126-12:07:28.992(1) rocketchat_ldap rocketchat_ldap.js:341 LDAP ➔ Search.error Search by id 31353036 returned 2 records
> I20161126-12:07:28.999(1) rocketchat_ldap rocketchat_ldap.js:780
>
> LDAPSync ➔ error { [MongoError: E11000 duplicate key error index:
> rocketchat.users.$emails.address_1 dup key: { : "some@one.com" }]
> name: 'MongoError', message: 'E11000 duplicate key error index:
> rocketchat.users.$emails.address_1 dup key: { : "some@one.com" }',
> driver: true, index: 0, code: 11000, errmsg: 'E11000 duplicate
> key error index: rocketchat.users.$emails.address_1 dup key: { :
> "some@one.com" }' }
Can I rebuild those indexes, or do something else, to resolve this problem?
Simonell
(11 rep)
Nov 26, 2016, 11:18 AM
• Last activity: Jun 2, 2025, 10:04 AM
2 votes, 1 answer, 44 views
"Duplicate entry" error when upgrading MariaDB 10.6 to 11.4
I was trying to upgrade my local development environment (Ubuntu 22.04 WSL) from MariaDB 10.6 to 11.4.
After updating the APT repo file, I ran apt update, which updated the server version.
However, upon running mariadb-upgrade, I see this error:
Phase 1/8: Checking and upgrading mysql database
Processing databases
mysql
mysql.column_stats OK
mysql.columns_priv OK
mysql.db OK
mysql.event OK
mysql.func OK
mysql.global_priv OK
mysql.gtid_slave_pos OK
mysql.help_category OK
mysql.help_keyword OK
mysql.help_relation OK
mysql.help_topic OK
mysql.index_stats OK
mysql.innodb_index_stats OK
mysql.innodb_table_stats OK
mysql.plugin OK
mysql.proc OK
mysql.procs_priv OK
mysql.proxies_priv OK
mysql.roles_mapping OK
mysql.servers OK
mysql.spider_link_failed_log OK
mysql.spider_link_mon_servers OK
mysql.spider_table_crd OK
mysql.spider_table_position_for_recovery OK
mysql.spider_table_sts OK
mysql.spider_tables OK
mysql.spider_xa OK
mysql.spider_xa_failed_log OK
mysql.spider_xa_member OK
mysql.table_stats OK
mysql.tables_priv OK
mysql.time_zone OK
mysql.time_zone_leap_second OK
mysql.time_zone_name OK
mysql.time_zone_transition OK
mysql.time_zone_transition_type OK
mysql.transaction_registry OK
Phase 2/8: Installing used storage engines... Skipped
Phase 3/8: Running 'mysql_fix_privilege_tables'
ERROR 1062 (23000) at line 30: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 256: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 304: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 331: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 343: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
FATAL ERROR: Upgrade failed
I do not understand how it is possible that there are duplicate functions, nor how to solve the problem.
Since this is just a development environment, I simply tried dropping the database, but the error persists.
Maybe the new version performs some checks that the old one didn't; but how can I check for similar inconsistencies before running mariadb-upgrade, in order to get a smooth update?
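The duplicate key db_626suite-get_98_53_metadata-PROCEDURE suggests that the mysql.proc catalog itself carries two rows for the same routine (its primary key is db, name, type), which would also explain why dropping the user database doesn't help. A hedged pre-flight check:
```sql
-- List stored routines that appear more than once in the mysql.proc catalog;
-- any hit here will break mysql_fix_privilege_tables during mariadb-upgrade
SELECT db, name, type, COUNT(*) AS copies
FROM mysql.proc
GROUP BY db, name, type
HAVING COUNT(*) > 1;
```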
Matteo Tassinari
(187 rep)
May 6, 2025, 02:13 PM
• Last activity: May 6, 2025, 02:55 PM
3 votes, 2 answers, 198 views
Same leading key column in unique constraint and nonclustered index - duplicate
These are two indexes on the table with the same leading key column.
First is a unique constraint with additional keys:
ALTER TABLE dbo.MyTable ADD CONSTRAINT UQ_MyTable_FirstColumn_SecondColumn_ThirdColumn
UNIQUE NONCLUSTERED
(
FirstColumn ASC,
SecondColumn ASC,
ThirdColumn ASC
)
And the second is a single-key nonclustered rowstore index:
CREATE NONCLUSTERED INDEX IX_MyTable_FirstColumn ON dbo.MyTable
(
FirstColumn ASC
)
Despite the benefit that the NC index is smaller, I would consider it a **duplicate**. Am I missing something? None of the "de-dupe" checker scripts I use agree with me, so I am confused.
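To compare the two definitions side by side, a convenience sketch against the standard catalog views:
```sql
-- List the key columns of every index on dbo.MyTable, in key order
SELECT i.name         AS index_name,
       i.is_unique,
       ic.key_ordinal,
       c.name         AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.MyTable')
  AND ic.key_ordinal > 0        -- key columns only, not INCLUDEs
ORDER BY i.name, ic.key_ordinal;
```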
jericzech
(977 rep)
May 5, 2025, 07:51 AM
• Last activity: May 5, 2025, 06:52 PM
1 vote, 1 answer, 1047 views
Duplicate rows with primary key constraint in PostgreSQL 10.6
I have a table that's had
CONSTRAINT table_pkey PRIMARY KEY (id)
since setup.
However, I recently found that I have around 100 duplicate rows (I ran select count(*) from (select count(*) from table group by id having count(*) > 1) as t1 as per this old Stack post). However, I'm running Postgres 10.6, which should be well past that bug.
Any ideas?
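To inspect the damage, a sketch (the table name my_table is a stand-in): list the duplicated ids with their physical locations, while disabling index scans so the possibly corrupt unique index can't mask what is really in the heap:
```sql
-- Force reads from the heap, not the (suspect) primary key index
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- Show each physical copy (ctid) of every duplicated id
SELECT id, ctid
FROM my_table
WHERE id IN (SELECT id FROM my_table GROUP BY id HAVING count(*) > 1)
ORDER BY id;
```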
mckennab
(111 rep)
Jul 3, 2019, 11:33 PM
• Last activity: Apr 18, 2025, 04:05 AM
1 vote, 2 answers, 1641 views
MySQL: Create a view with multiple self-joins without duplicates in the result
Just to clarify: I can't change the table structure, so please leave out the "you should change your tables to this and that" answers, thank you.
So I have a table **entities_attributes_values** where an entity has many attributes, each with a value; basically, imagine 3 fields:
- entity_id
- entity_attributes_id
- value
Because every entity attribute and its value sits on its own row, getting several values at once is not so easy. I was thinking of multiple self-joins, and because this query will be very common I created a view, built with this query:
SELECT
    L1.entity_id,
    L1.value AS 'company_id',
    L2.value AS 'entity_name',
    P.value AS 'person_name',
    L4.value AS 'establishment_id',
    L5.value AS 'department_id'
FROM entities_attributes_values L1
LEFT JOIN entities_attributes_values L2
    ON L1.entity_id = L2.entity_id AND L2.entity_attributes_id = 1
LEFT JOIN entities_attributes_values L3
    ON L1.entity_id = L3.entity_id AND L3.entity_attributes_id = 3
LEFT JOIN persons_attributes_values P
    ON L3.value = P.core_persons_id AND P.core_persons_attributes_id = 4
LEFT JOIN entities_attributes_values L4
    ON L1.entity_id = L4.entity_id AND L4.entity_attributes_id = 12
LEFT JOIN entities_attributes_values L5
    ON L1.entity_id = L5.entity_id AND L5.entity_attributes_id = 13
WHERE L1.entity_attributes_id = 2
So this works, but I have one problem: I get "duplicate" values. They are not true duplicates, but **the point is that in my view I want every entity to be only one row with all its attribute values**; instead I get this:

So as you can see, the first three results are not good for me; I only need the fourth one, where I have all my data about one entity.
Thank you in advance for any help!
Mr. Sam
(111 rep)
Jun 24, 2015, 10:08 AM
• Last activity: Apr 17, 2025, 09:07 PM
7 votes, 3 answers, 5973 views
When should the IGNORE_DUP_KEY option be used on an index?
Someone said it's preferable to craft your queries to avoid duplicate key exceptions, but I'm not convinced that's more performant than just setting
IGNORE_DUP_KEY = ON
for the index.
My goal is to ensure a row or set of rows exists for one or more users, before attempting to update those rows. I do this, so that when I attempt to update the row with an update statement like the one below, and no rows are affected, it's because the [Count]
portion of predicate wasn't satisfied, as opposed to the row not existing at all (i.e. the [ID]
portion of the predicate not being satisfied):
UPDATE [Inventory]
SET [Count] = [Count] + 1
WHERE [ID] = 3
AND ([Count] + 1) <= @MaxInventory
I could run EXISTS(SELECT 1 FROM [Inventory] WHERE [ID] = 3)
to check for that single row, and only insert the row if it does not exist. That simply avoids unnecessary inserts. The insert, if necessary, would still have to contend with concurrent transactions, so duplicate key exceptions can still occur.
I'm curious whether it's more performant to just turn on IGNORE_DUP_KEY
in this scenario, rather than allowing the error to be thrown and caught. Specifically, I'm curious if it's as-fast or possibly even faster than running an exists check, to just attempt to insert the record and let it ignore duplicate keys.
This becomes even more important, when I'm checking for and initializing multiple records at once. For example, if I need to ensure records for thousands of users exist in a single update statement, the logic would be much simpler if I just ran that insert statement up-front, letting it ignore duplicate keys. Avoiding duplicates would be more complex, because I'd have to first query the table for which records don't exist, then attempt to add just those records (again, ignoring duplicate keys). Just inserting may be faster, even if all the records exist.
I could meet it halfway and check whether *any* of the records are missing, such as with a left join or a COUNT
comparison, but why bother if the insert ignoring duplicate keys is just faster?
**Is it a good idea to use IGNORE_DUP_KEY and just attempt inserts instead of bothering with checking for row existence ahead of time? If not, why?**
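For reference, IGNORE_DUP_KEY is an option of the unique index itself; a minimal sketch (the Inventory definition here is hypothetical) showing the duplicate insert being downgraded from an error to a warning:
```sql
-- Hypothetical table matching the UPDATE above
CREATE TABLE dbo.Inventory
(
    ID      int NOT NULL,
    [Count] int NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX UX_Inventory_ID
    ON dbo.Inventory (ID)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO dbo.Inventory (ID, [Count]) VALUES (3, 0); -- inserted
INSERT INTO dbo.Inventory (ID, [Count]) VALUES (3, 0); -- discarded with "Duplicate key was ignored.", no exception
```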
Triynko
(615 rep)
Sep 5, 2019, 05:48 PM
• Last activity: Mar 16, 2025, 07:11 PM
0 votes, 1 answer, 53 views
Removing chronological duplicates from history table in SQL
I have a history table, I want to create another view to pick relevant records from this history. Not all records are relevant. I'm after the record changes for certain columns, for this case I care only about Plate and Status, and their changes over time, otherwise they are considered duplicates and should be not be showing in the new query output.
I want to check the chronological changes per PLATE value (doing a ROW_NUMBER() OVER (PARTITION BY PLATE)); the rows should be compared in sysloadDatetime order, and if the current row and the previous row have the same relevant column values, the current row should be considered a duplicate and removed from the output of the query.
In the example, there are 2 plates that have records: rows with a duplicated Status value that are not chronologically next to each other should stay, as statuses can go back and forth over time. But statuses that are next to each other chronologically should be considered duplicates.
example:
CREATE TABLE history (
plate_number VARCHAR(20),
status VARCHAR(20),
comments VARCHAR(100),
sysloadDatetime DATETIME
);
-- Insert sample data for two plate numbers with duplicate rows chronologically
INSERT INTO history (plate_number, status, comments, sysloadDatetime) VALUES
('ABC123', 'Active', 'Initial entry', '2024-02-20 08:00:00'),
('ABC123', 'Active', 'Checked at checkpoint A', '2024-02-20 08:15:00'),
('ABC123', 'Active', 'Checked at checkpoint B', '2024-02-20 08:30:00'),
('ABC123', 'Inactive', 'Temporarily deactivated for maintenance', '2024-02-20 09:00:00'),
('ABC123', 'Inactive', 'Awaiting reactivation', '2024-02-20 09:05:00'),
('ABC123', 'Active', 'Reactivated after maintenance', '2024-02-20 10:00:00'),
('ABC123', 'Active', 'Final check completed', '2024-02-20 10:30:00'),
('XYZ789', 'Active', 'Initial entry', '2024-02-21 08:00:00'),
('XYZ789', 'Active', 'Checked at checkpoint A', '2024-02-21 08:15:00'),
('XYZ789', 'Inactive', 'Temporarily deactivated for maintenance', '2024-02-21 08:30:00'),
('XYZ789', 'Inactive', 'Awaiting reactivation', '2024-02-21 09:00:00'),
('XYZ789', 'Active', 'Reactivated after maintenance', '2024-02-21 09:30:00'),
('XYZ789', 'Active', 'Final check completed', '2024-02-21 10:00:00');
-- Query: Select all records ordered chronologically
SELECT * FROM history
ORDER BY sysloadDatetime;
History table output:
plate_number | status | comments | sysloadDatetime
-------------|----------|------------------------------------------|---------------------
ABC123 | Active | Initial entry | 2024-02-20 08:00:00
ABC123 | Active | Checked at checkpoint A | 2024-02-20 08:15:00
ABC123 | Active | Checked at checkpoint B | 2024-02-20 08:30:00
ABC123 | Inactive | Temporarily deactivated for maintenance | 2024-02-20 09:00:00
ABC123 | Inactive | Awaiting reactivation | 2024-02-20 09:05:00
ABC123 | Active | Reactivated after maintenance | 2024-02-20 10:00:00
ABC123 | Active | Final check completed | 2024-02-20 10:30:00
XYZ789 | Active | Initial entry | 2024-02-21 08:00:00
XYZ789 | Active | Checked at checkpoint A | 2024-02-21 08:15:00
XYZ789 | Inactive | Temporarily deactivated for maintenance | 2024-02-21 08:30:00
XYZ789 | Inactive | Awaiting reactivation | 2024-02-21 09:00:00
XYZ789 | Active | Reactivated after maintenance | 2024-02-21 09:30:00
XYZ789 | Active | Final check completed | 2024-02-21 10:00:00
Desired output:
plate_number | status | comments | sysloadDatetime
-------------|----------|------------------------------------------|---------------------
ABC123 | Active | Initial entry | 2024-02-20 08:00:00
ABC123 | Inactive | Temporarily deactivated for maintenance | 2024-02-20 09:00:00
ABC123 | Active | Reactivated after maintenance | 2024-02-20 10:00:00
XYZ789 | Active | Initial entry | 2024-02-21 08:00:00
XYZ789 | Inactive | Temporarily deactivated for maintenance | 2024-02-21 08:30:00
XYZ789 | Active | Reactivated after maintenance | 2024-02-21 09:30:00
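One standard shape for this (a sketch using LAG, available in SQL Server 2012+, PostgreSQL, and most modern engines) keeps a row only when its status differs from the previous row for the same plate:
```sql
-- Keep the first row of each consecutive run of identical status per plate
SELECT plate_number, status, comments, sysloadDatetime
FROM (
    SELECT h.*,
           LAG(status) OVER (PARTITION BY plate_number
                             ORDER BY sysloadDatetime) AS prev_status
    FROM history AS h
) AS x
WHERE prev_status IS NULL       -- first row seen for the plate
   OR prev_status <> status     -- status changed since the previous row
ORDER BY plate_number, sysloadDatetime;
```
Against the sample data this returns exactly the six rows of the desired output.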
nrod88
(1 rep)
Feb 21, 2025, 05:53 AM
• Last activity: Feb 23, 2025, 01:26 PM
0 votes, 2 answers, 907 views
How to join a parent table to two child tables without duplication?
parent table:
| id | city_name | zip_code |
| -- | ------------- | -------- |
| 1 | new york city | 10001 |
| 2 | los angeles | 90001 |
| 3 | chicago | 60007 |
Customer, child table:
| id | customer_name | city_id | address |
| -- | ------------- | ------- | ------------------ |
| 1 | alice | 1 | 428 Hartford Lane |
| 2 | betty | 1 | 249 George Street |
| 3 | cath | 3 | 12112 S Front Ave |
| 4 | daph | 3 | 8619 S Wolcott Ave |
Restaurant, child table:
| id | address | city_id |
|----|-----------------------| --------|
| 1 | 256 Roehampton St. | 1 |
| 2 | 920 Cherry Camp Road | 3 |
| 3 | 7609 Mckinley Ave | 2 |
| 4 | 5723 Morgan Ave | 2 |
When using the following query:
select c.id, c.city_name, c.zip_code, r.id, r.address, r.city_id, cus.id, cus.customer_name, cus.city_id, cus.address
from restaurant r
left join city c
on r.city_id = c.id
right join customer cus
on c.id = cus.city_id where c.id = 1;
There will be a duplicate restaurant address entry. How do I ensure that the restaurants and customers are distinct?
| id | city_name | zip_code | id | address | city_id | id | customer_name | city_id | address |
| -- | ------------- | -------- | -- | ------------- | -------- | -- | ------------- | ------- | ------------------ |
| 1 | new york city | 10001 | 1 | 256 Roehampton St. | 1 | 1 | alice | 1 | 428 Hartford Lane |
| 1 | new york city | 10001 | 1 | 256 Roehampton St. | 1 | 2 | betty | 1 | 249 George Street |
**Edit:** To add some context, I would like to return a JSON object of a city which contains its fields as well as a list of customers and a list of restaurants belonging to that particular city.
Is it more efficient to make three separate queries to populate that object, i.e.
select * from city where id = 1;
select * from customer where city_id = 1;
select * from restaurant where city_id = 1;
or perform one query with two joins, with the expected result as the following, where there are no duplicate restaurants or customers
| id | city_name | zip_code | id | address | city_id | id | customer_name | city_id | address |
| -- | ------------- | -------- | -- | ------------- | -------- | -- | ------------- | ------- | ------------------ |
| 1 | new york city | 10001 | 1 | 256 Roehampton St. | 1 | 1 | alice | 1 | 428 Hartford Lane |
| 1 | new york city | 10001 | | | | 2 | betty | 1 | 249 George Street |
Or can I just use a distinct keyword to remove duplicates?
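If the target database is PostgreSQL (an assumption; the question doesn't name one, and MySQL/SQL Server have JSON_ARRAYAGG/FOR JSON equivalents), the whole object can be built in one round trip without row multiplication, because each child list is aggregated independently:
```sql
-- One row per city; the two child tables never join each other,
-- so no duplicate restaurant or customer entries appear
SELECT c.id,
       c.city_name,
       c.zip_code,
       (SELECT json_agg(cus) FROM customer   cus WHERE cus.city_id = c.id) AS customers,
       (SELECT json_agg(r)   FROM restaurant r   WHERE r.city_id   = c.id) AS restaurants
FROM city c
WHERE c.id = 1;
```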
12johnny
(3 rep)
Feb 9, 2021, 12:20 AM
• Last activity: Jan 25, 2025, 01:03 PM
0 votes, 2 answers, 6216 views
How to delete records without a primary key in a standard way (i.e. not using ctid)
# Use case
We have a log table without a PK with two columns (date, event);
You inserted a row. Then you want to undo the inserted record.
Is there a way to delete that row (without deleting other rows with the same data), that doesn't use postgres only capabilities?
I want a solution that works in another database (it doesn't need to be totally standard, it can be only in just one database: SqLite, Oracle, MySql or SQLServer).
## Example:
create table the_log(
date date,
event_id integer
);
insert into the_log(date, event_id) values ('2019-09-21',1),('2019-09-21',1);
select * from the_log;
My attempts:
delete from the_log where row_number() over ()=1;
delete from the_log limit 1;
with the_log_2 as (select *, row_number() over () as version from prueba_sin_clave)
delete from the_log_2 where version=1;
I suppose the answer is **No**. I want to know if I am wrong, or in what documentation I can read that I am right.
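For the record, some of the named engines do offer a non-ctid way, though none of it is standard SQL. MySQL, for instance, accepts a row limit directly on DELETE (SQLite does too, but only when compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT):
```sql
-- MySQL: delete exactly one of the matching rows (non-standard LIMIT clause)
DELETE FROM the_log
WHERE date = '2019-09-21' AND event_id = 1
LIMIT 1;
```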
Emilio Platzer
(465 rep)
Sep 21, 2018, 06:31 PM
• Last activity: Nov 22, 2024, 11:04 PM
9 votes, 2 answers, 14880 views
How do I remove duplicate records in a join table in PostgreSQL?
I have a table that has a schema like this:
create_table "questions_tags", :id => false, :force => true do |t|
t.integer "question_id"
t.integer "tag_id"
end
add_index "questions_tags", ["question_id"], :name => "index_questions_tags_on_question_id"
add_index "questions_tags", ["tag_id"], :name => "index_questions_tags_on_tag_id"
I would like to remove records that are duplicates, i.e. they have both the same tag_id and question_id as another record.
What does the SQL look like for that?
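Since this is PostgreSQL, one well-known pattern (it relies on the PostgreSQL-specific ctid system column, which this table needs because it has no id) compares physical row addresses so that exactly one copy of each (question_id, tag_id) pair survives:
```sql
-- Delete every copy except the one with the lowest physical address
DELETE FROM questions_tags a
USING questions_tags b
WHERE a.ctid < b.ctid
  AND a.question_id = b.question_id
  AND a.tag_id = b.tag_id;
```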
marcamillion
(205 rep)
Mar 13, 2013, 12:49 AM
• Last activity: Oct 20, 2024, 11:42 PM
0 votes, 1 answer, 40 views
Breaking Down Transaction Table
My transaction table consists of tranxid, custid, itemid, quantity, unit_price, payment_mode, driverid.
When I perform add operations, I notice that customer 1 can buy one or two more items, and so on; there is some duplication of the custid, itemid, driverid, and payment_mode values. How can I resolve this duplication?
Can I use a one-to-many/many-to-one relationship to split the transaction table into two tables?
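What is described is the usual header/line split: facts that occur once per transaction (customer, driver, payment mode) go in a header table, and each purchased item becomes one line row. A sketch with assumed column types:
```sql
-- Header: one row per transaction
CREATE TABLE transactions (
    tranxid      INT PRIMARY KEY,
    custid       INT NOT NULL,
    driverid     INT NOT NULL,
    payment_mode VARCHAR(20) NOT NULL
);

-- Lines: one row per item in a transaction (many lines to one header)
CREATE TABLE transaction_items (
    tranxid    INT NOT NULL REFERENCES transactions (tranxid),
    itemid     INT NOT NULL,
    quantity   INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (tranxid, itemid)
);
```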
Wei Xiang Ng
(1 rep)
Sep 6, 2024, 12:12 AM
• Last activity: Sep 6, 2024, 07:50 AM
5 votes, 3 answers, 353 views
Prevent inserting duplicate entries under READ COMMITTED Isolation
We have a SQL Server table called
dbo.Dim_Target
. Multiple threads try to INSERT
new data into this table in parallel. Each parallel thread first checks for matching data and then only inserts new data (non-matching), using LEFT OUTER JOIN
.
Isolation Level is READ COMMITTED
.
### Problem
On rare occasions, duplicate data is inserted into this table by two different threads a couple of milliseconds apart. According to this post, this issue is expected under the RC isolation level.
### Things I tried
- Different isolation levels (RC, Repeatable Read, etc.) could not prevent this problem.
- Serializable isolation prevents duplicates, but also causes a lot of deadlock issues.
- A unique index is not possible on this large column.
### Question
Is there an approach to prevent duplicates from multiple threads?
### Sample code to recreate the scenario
#### 1. Preparation
Create target table & two STG tables and load dummy data
DROP TABLE IF EXISTS dbo.Dim_Target
GO
CREATE TABLE dbo.Dim_Target
( ID INT NOT NULL IDENTITY(1,1)
, Code VARCHAR(3000) NOT NULL
, Created_Date DATETIME NOT NULL DEFAULT(GETDATE()))
GO
INSERT INTO dbo.Dim_Target(Code) SELECT NEWID() -- some dummy data
GO 10000
--
-- STG table 1
--
DROP TABLE IF EXISTS dbo.STG_Test1
GO
CREATE TABLE dbo.STG_Test1
( ID INT NOT NULL IDENTITY(1,1)
, Code VARCHAR(3000)
)
GO
INSERT INTO dbo.STG_Test1 (Code) SELECT NEWID() -- some dummy data
GO 10000
INSERT INTO dbo.STG_Test1 (Code) SELECT 'abc1' -- this is the duplicate entry that matches STG2 table
GO
--
-- STG table 2
--
DROP TABLE IF EXISTS dbo.STG_Test2
GO
CREATE TABLE dbo.STG_Test2
( ID INT NOT NULL IDENTITY(1,1)
, Code VARCHAR(3000)
)
GO
INSERT INTO dbo.STG_Test2 (Code) SELECT NEWID() -- some dummy data
GO 10000
INSERT INTO dbo.STG_Test2 (Code) SELECT 'abc1' -- this is the duplicate entry that matches STG1 table
GO
--
-- Sample query to check for duplicates
--
SELECT Code, COUNT(*)
FROM dbo.Dim_Target
GROUP BY Code
HAVING COUNT(*) > 1
#### 2.Thread 1: Try to insert from first parallel thread
Run this below script from one SSMS window. Make sure the WAITFOR
time is set the same in both windows.
WAITFOR TIME '16:04:00'
BEGIN TRANSACTION
INSERT INTO dbo.Dim_Target (Code)
SELECT T.Code
FROM dbo.STG_Test1 AS T
LEFT OUTER JOIN dbo.Dim_Target AS D
ON D.Code = T.Code
WHERE D.Code IS NULL
COMMIT
#### 3. Thread 2: Try to insert from second parallel thread
Run this below script from second SSMS window. Make sure the WAITFOR
time is set the same in both windows.
WAITFOR TIME '16:04:00'
BEGIN TRANSACTION
INSERT INTO dbo.Dim_Target (Code)
SELECT T.Code
FROM dbo.STG_Test2 AS T
LEFT OUTER JOIN dbo.Dim_Target AS D
ON D.Code = T.Code
WHERE D.Code IS NULL
COMMIT
#### NOTES
- When I run this script, the duplicates do **not** occur every time. They happen once every two or 3 attempts, but they DO happen. The Created_Date
column shows that duplicate entries were created just a couple of milliseconds apart.
- We have 10 threads running in parallel loading all the Dimension tables, so the duplicates are occurring more frequently.
---
With a temporary solution working, I want to take the time to re-think this and make it scalable. All parts of the ETL are processed with multiple threads, except this one step, which is single-threaded (for now). Any ideas would be welcome.
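For completeness, the usual fix that keeps READ COMMITTED as the session default is to take key-range locks on just the existence check, via table hints; a sketch applied to the thread-1 script (it assumes an index on Code, without which the range locks degrade to whole-table locking and simply serialize the loads):
```sql
BEGIN TRANSACTION;

-- UPDLOCK + SERIALIZABLE (a.k.a. HOLDLOCK) holds key-range locks on the probed
-- keys until commit, so a concurrent thread blocks instead of also passing the
-- existence check and inserting the same Code
INSERT INTO dbo.Dim_Target (Code)
SELECT T.Code
FROM dbo.STG_Test1 AS T
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Dim_Target AS D WITH (UPDLOCK, SERIALIZABLE)
    WHERE D.Code = T.Code
);

COMMIT;
```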
ToC
(727 rep)
Aug 14, 2024, 05:24 PM
• Last activity: Aug 17, 2024, 02:09 PM
0 votes, 1 answer, 117 views
UNIQUE CONSTRAINT on two columns, but duplicates appeared: how to track down and fix?
We have a table for users:
CREATE UNIQUE INDEX uk_users_login_socnet ON public.users USING btree (login, socnet)
We had maintenance on our server, during which it was backed up and replicated to a different server using pg_basebackup (completed by our sysadmins, reportedly without errors or issues).
Since then we have errors in our logs like following:
ERROR: duplicate key value violates unique constraint "uk_users_login_socnet"
DETAIL: Key (login, socnet)=(user@example.com, 1) already exists. (UPDATE users SET extra = COALESCE(extra, '') || hstore('udid', '') || hstore('udid_last_update', '1721646485') WHERE id = 1234567;)
In the meantime, the following query returns no results:
SELECT u1.login, u1.socnet
FROM users u1
INNER JOIN users u2 ON u1.login = u2.login AND u1.socnet = u2.socnet
while its EXPLAIN ANALYZE is:
Hash Join (cost=2817129.95..6386837.44 rows=9847200 width=18) (actual time=13654.755..31681.796 rows=15584155 loops=1)
Hash Cond: (((u1.login)::text = (u2.login)::text) AND (u1.socnet = u2.socnet))
-> Seq Scan on users u1 (cost=0.00..2490077.18 rows=15678918 width=18) (actual time=0.026..8806.271 rows=15582973 loops=1)
-> Hash (cost=2490077.18..2490077.18 rows=15678918 width=18) (actual time=13653.093..13653.093 rows=15582973 loops=1)
Buckets: 524288 Batches: 32 Memory Usage: 28510kB
-> Seq Scan on users u2 (cost=0.00..2490077.18 rows=15678918 width=18) (actual time=0.186..10887.626 rows=15582973 loops=1)
Planning time: 1.749 ms
Execution time: 32166.739 ms
But this query:
SELECT t1.login, t1.socnet, t2.login, t2.socnet
FROM users AS t1
LEFT JOIN (SELECT id, login, socnet FROM users) AS t2
ON t2.login = t1.login AND t1.socnet = t2.socnet
WHERE t1.id != t2.id
shows a list of duplicated rows with the same login and socnet, which makes no sense to me. Here's the respective EXPLAIN ANALYZE:
Hash Join (cost=2817129.95..6411455.44 rows=9847199 width=36) (actual time=17015.349..33466.957 rows=1182 loops=1)
Hash Cond: (((t1.login)::text = (users.login)::text) AND (t1.socnet = users.socnet))
Join Filter: (t1.id <> users.id)
Rows Removed by Join Filter: 15583110
-> Seq Scan on users t1 (cost=0.00..2490077.18 rows=15678918 width=22) (actual time=0.034..9902.685 rows=15583110 loops=1)
-> Hash (cost=2490077.18..2490077.18 rows=15678918 width=22) (actual time=14344.722..14344.722 rows=15583110 loops=1)
Buckets: 524288 Batches: 32 Memory Usage: 30951kB
-> Seq Scan on users (cost=0.00..2490077.18 rows=15678918 width=22) (actual time=0.024..11382.363 rows=15583110 loops=1)
Planning time: 1.764 ms
Execution time: 33467.260 ms
The PostgreSQL server version is 9.6.
My questions are:
- What is happening? Why do I have duplicate data within a UNIQUE constraint?
- Why do I have different results for INNER JOIN and LEFT JOIN?
- What might have happened to the DB structure, and how can I debug it? Where should I look for hints on what's broken?
- How can I fix this to a consistent state?
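To enumerate the offending rows directly from the heap, a sketch; array_agg(ctid) records the physical location of each copy, and a REINDEX of uk_users_login_socnet will keep failing until those extra copies are removed:
```sql
-- Group on the table data itself; any group with more than one row
-- violates the unique index, whatever the index currently claims
SELECT login, socnet, count(*) AS copies, array_agg(ctid) AS row_locations
FROM users
GROUP BY login, socnet
HAVING count(*) > 1;
```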
Eduard Sukharev
(129 rep)
Jul 22, 2024, 11:49 AM
• Last activity: Jul 22, 2024, 01:28 PM
-2 votes, 1 answer, 1543 views
How to remove a duplicate primary key violation in a table in SQL Server?
I am encountering a customer error: they are unable to print due to a primary key violation caused by a duplicate. Below is where I went to check the DB column in the table and saw this as the possible reason. What is the solution to prevent this?

thrue2023
(3 rep)
Apr 9, 2024, 11:41 AM
• Last activity: Apr 9, 2024, 02:10 PM
1 vote, 1 answer, 194 views
Query to find when a specific current value changed
I have 3 tables of data (all contain more columns, but these are the ones I need from each):
t1: table1
| workID1 | h1Amt | h1Dt | status |
|:-------|:-----|:----|:-------|
| 101 |73.00 |12/1/2023| A |
| 124 |0.00 |10/30/2023| D |
| 251 |0.00 |3/1/2023| A |
| 255 |24.00 |11/30/2023| A |
| 256 |0.00 |12/1/2023| A |
t2: table2
| workID2 | h2Amt | h2Dt | type1 |
|:-------|:------|:----|:-----|
| 101 | 0073 |202312| 2 |
| 101 | 0000 |202308| 13 |
| 101 | 0000 |202307| 1 |
| 124 | 0000 |202310| 3 |
| 124 | 0000 |202306| 2 |
| 124 | 0000 |202304| 13|
| 124 | 0079 |202301| 3 |
| 251 | 0000 |202303| 2 |
| 251 | 1154 |202301| 3 |
| 251 | 0000 |202212| 3 |
| 255 | 0024 |202311| 2 |
| 255 | 0000 |202212| 3 |
| 255 | 0000 |202210| 13 |
| 255 | 0175 |202209| 3 |
| 256 | 0000 |202312| 1 |
| 256 | 0000 |202309| 13|
| 256 | 0000 |202307| 3 |
| 256 | 0583 |202305| 2 |
t3: table3
| workID3 | h3Date | type2 | h3Dt |
|:-------|:----|:-----|:-------|
| 101 |12/1/2023| 2 | 202312 |
| 101 |8/13/2023| 13 | 202308 |
| 101 |7/1/2023 | 1 | 202307 |
| 124 |10/1/2023| 3 | 202310 |
| 124 |6/1/2023| 2 | 202306 |
| 124 |4/11/2023| 13| 202304 |
| 124 |1/1/2023| 3 | 202301 |
| 251 |3/1/2023| 2 | 202303 |
| 251 |1/1/2023| 3 | 202301 |
| 251 |12/1/2022| 3 | 202212 |
| 255 |11/1/2023| 2 | 202311 |
| 255 |12/1/2022| 3 | 202212 |
| 255 |10/7/2022| 13 | 202210 |
| 255 |9/1/2022| 3 | 202209 |
| 256 |12/1/2023| 1 | 202312 |
| 256 |9/5/2023| 13| 202309 |
| 256 |7/1/2023| 3 | 202307 |
| 256 |5/1/2023| 2 | 202305 |
In t1, there is only one row for each workID, which is the current transaction for that workID. As data gets added, all 3 tables get updated, with t1 only showing the current transaction.
In t2 and t3, there are multiple rows because it shows historical transactions for each workID.
The h1Amt and h2Amt columns show the same cost, but formatted differently. Additionally, the h1Dt, h2Dt, and h3Dt all reflect the same date, just formatted differently.
I need to find a specific current value and when that value last changed. In my case, I am searching for an amount of 0.00 or 0000 and when it was last changed to this value. Also, the status (t1) has to be "A" and type (t2 or t3) has to be 1, 2, or 3.
Here is the output I'm looking for:
| workID | amount | date | type |
|:-------|:--------|:-----|:-----|
| 251 |0.00 or 0000|202303 or 3/1/2023| 2 |
| 256 |0.00 or 0000|202307 or 7/1/2023| 3 |
This is what I came up with so far, but it is returning amounts that are not currently 0000 (0.00) or not showing workID's that have a current 0000 (0.00).
select c.workID3
,b.curAmt
,b.curStart
,c.type2
from table3 c
inner join table1 t on t.workID1 = c.workID3
inner join
(
select b.workID2
,max(
case
when h1Amt = 0.00 and b.h2Dt = d.h3dt then h1Amt
end) as curAmt
,min(
case
when h1Amt = 0.00 and b.h2Dt = d.h3Dt then d.h3Date
end) as curStart
from table2 b
inner join table3 d on d.workID3 = b.workID2
inner join table1 on workID1 = b.workID2
where not exists
(
select 1
from table2 a
where a.workID2 = b.workID2
and a.h2Amt <> b.h2Amt
and a.h2Dt > b.h2Dt
)
and t.h1Amt = 0.00
and t.status = 'A'
and d.type2 in ('1','2','3')
group by b.workID2
) b
on b.workID2 = c.workID3
and b.curStart = c.h3Dt
order by c.workID3
I haven't used table3, mostly because table2 contains h2Amt that is needed. I don't think it would be smart for me to use h1Amt from table1 since it only shows what is current. At the very least I need to use table1 because of the status column. If I did use table3, table1 would be a requirement since it's the only table with a status column, but I think I would run into the issue of finding the historical amount since it's not recorded in either table1 or table3. Also, I've run into issues of getting duplicate rows in other queries that I've tried, to include using all 3 tables.
The above is the closest query I've seen after doing a lot of searching for similar situations.
Hopefully I've given enough detail to find a solution. Thanks in advance!
Edit: in table3, there are 2 date columns, h3Dt and h3Date. Table3 has been updated. I also updated the query that I currently have. It returns most correctly, but still has some that returns the wrong date. Also, as I was looking at the data, table2 type1 has some data that is incorrect vs table 3 type 2, so I decided to use table 3 as well in the above query as it is more accurate. The only issue left now is how I can get the correct data for all rows.
Here is some sample data:
create table table1 (
workID char(3),
h1Amt char(10),
h1Dt varchar(255),
status char(1) );
insert into table1 (
workID, h1Amt, h1Dt, status )
values ('101','73.00','12/1/2023','A'),
('124','0.00','10/30/2023','D'),
('251','0.00','3/1/2023','A'),
('255','24.00','11/30/2023','A'),
('256','0.00','12/1/2023','A');
create table table2 (
workID2 char(3),
h2Amt char(4),
h2Dt char(4),
type1 char(2) );
insert into table2 (
workID2, h2Amt, h2Dt, type1 )
values ('101','0073','202312','2'),
('101','0000','202308','13'),
('101','0000','202307','1'),
('124','0000','202310','3'),
('124','0000','202306','2'),
('124','0000','202304','13'),
('124','0079','202301','3'),
('251','0000','202303','2'),
('251','1154','202301','2'),
('251','0000','202212','3'),
('255','0024','202311','2'),
('255','0000','202212','3'),
('255','0000','202210','13'),
('255','0175','202209','3'),
('256','0000','202312','1'),
('256','0000','202309','13'),
('256','0000','202307','3'),
('256','0583','202305','2');
create table table3 (
workID3 char(3),
h3Date varchar(10),
type2 char(2),
h3Dt char(4) );
insert into table3 (
workID3, h3Date, type2, h3Dt )
values ('101','12/1/2023','2','202312'),
('101','8/13/2023','13','202308'),
('101','7/1/2023','1','202307'),
('124','10/1/2023','3','202310'),
('124','6/1/2023','2','202306'),
('124','4/11/2023','13','202304'),
('124','1/1/2023','3','202301'),
('251','3/1/2023','2','202303'),
('251','1/1/2023','3','202301'),
('251','12/1/2022','3','202212'),
('255','11/1/2023','2','202311'),
('255','12/1/2022','3','202212'),
('255','10/7/2022','13','202210'),
('255','9/1/2022','3','202209'),
('256','12/1/2023','1','202312'),
('256','9/5/2023','13','202309'),
('256','7/1/2023','3','202307'),
('256','5/1/2023','2','202305');
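A possible restructuring (a sketch, using the column names from the sample DDL): LAG marks the rows where the amount actually changed, then only the most recent change per workID is kept, filtered to those that landed on 0000 with status 'A' and an allowed type:
```sql
WITH changes AS (
    -- every row plus the amount on the chronologically previous row
    SELECT workID2, h2Amt, h2Dt, type1,
           LAG(h2Amt) OVER (PARTITION BY workID2 ORDER BY h2Dt) AS prev_amt
    FROM table2
),
last_change AS (
    -- keep only the rows where the amount changed; rank them newest-first
    SELECT c.workID2, c.h2Amt, c.h2Dt, c.type1,
           ROW_NUMBER() OVER (PARTITION BY c.workID2 ORDER BY c.h2Dt DESC) AS rn
    FROM changes c
    WHERE c.prev_amt IS NULL OR c.prev_amt <> c.h2Amt
)
SELECT lc.workID2 AS workID,
       lc.h2Amt   AS amount,
       lc.h2Dt    AS change_date,
       lc.type1   AS type
FROM last_change lc
JOIN table1 t ON t.workID = lc.workID2
WHERE lc.rn = 1                -- the most recent change per workID...
  AND lc.h2Amt = '0000'        -- ...and it changed to zero
  AND t.status = 'A'
  AND lc.type1 IN ('1','2','3');
```
Against the sample data this returns (251, 0000, 202303, 2) and (256, 0000, 202307, 3), matching the desired output.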
tkmagnet
(75 rep)
Jan 25, 2024, 10:18 PM
• Last activity: Mar 19, 2024, 06:57 PM
2 votes, 1 answer, 1053 views
DISTINCT gives duplicate values
Can it be considered a bug? This query gives duplicate values despite DISTINCT:
select distinct '1'
from
(
select *
from dual
connect by level <= 10
)
order by dbms_random.value
fetch first 10 rows only
I understand that it creates an additional column with random values and applies DISTINCT over both columns, but I would still expect to get unique values.
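A common workaround is to let DISTINCT finish inside a derived table, so that dbms_random.value is evaluated over the already-distinct rows instead of being folded into the DISTINCT keys:
```sql
select val
from
(
    select distinct '1' as val
    from dual
    connect by level <= 10
)
order by dbms_random.value
fetch first 10 rows only;
```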
Andy DB Analyst
(110 rep)
Jun 2, 2023, 02:47 PM
• Last activity: Mar 13, 2024, 03:01 PM