Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0 votes, 0 answers, 4 views
Data duplication after Apache IoTDB pipe restart: how to maintain the existing sync progress?
When using Apache IoTDB's pipe feature to sync data to another Apache IoTDB instance, I encountered duplicate data transmission. The scenario is as follows:
I configured a Pipe from root.source to a remote IoTDB instance (root.target):
CREATE PIPESINK remote_sink AS IoTDB ('ip:6667', 'username', 'password');
CREATE PIPE source_to_target TO remote_sink FROM (SOURCE 'root.source.we4rwe.**') INTO (TARGET 'root.target.faknlv93.**');
START PIPE source_to_target;
The initial sync works fine, but when I manually restart the pipe (e.g., STOP PIPE source_to_target followed by START PIPE source_to_target), some historical data (e.g., already synced root.source.d1.sensor data) is retransmitted, causing duplicates on the target. The pipe status (SHOW PIPES) shows status=RUNNING with no error logs.
Is this the expected behavior of IoTDB Pipe? How can I avoid duplicate data transmission during a pipe restart? Are additional configurations (e.g., sync_progress or other parameters) required?
Hester Tso
(101 rep)
Aug 1, 2025, 08:12 AM
0 votes, 1 answer, 169 views
How to remove these duplicate records in an Oracle Database 12c?
I have these two tables:
messages(id primary key, message_date, created_at, ...)
user_messages(user_id, message_id references messages(id))
I have some duplicate rows in messages:
select user_id, message_date, count(*)
from messages inner join user_messages
on messages.id = user_messages.message_id
group by user_id, message_date;
user_id | message_date | count(*)
1 | 2019-01-01 | 2
1 | 2019-02-01 | 3
1 | 2019-03-01 | 2
How can I remove such duplicates, retaining only one of them, for example the one whose created_at (not message_date) is the earliest?
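One common approach (a sketch, not an authoritative fix; it stages the doomed ids in a scratch table because the foreign key from user_messages forces the deletes to happen in order) is to rank each (user_id, message_date) group with ROW_NUMBER() and discard everything except the earliest created_at:
```sql
-- 1) Collect the ids of the duplicate copies to discard (rn = 1 is kept)
CREATE TABLE doomed_ids AS
SELECT id
FROM (
    SELECT m.id,
           ROW_NUMBER() OVER (
               PARTITION BY um.user_id, m.message_date
               ORDER BY m.created_at
           ) AS rn
    FROM messages m
    JOIN user_messages um ON um.message_id = m.id
)
WHERE rn > 1;

-- 2) Remove the referencing rows first (the FK blocks the reverse order)
DELETE FROM user_messages WHERE message_id IN (SELECT id FROM doomed_ids);

-- 3) Remove the duplicate messages themselves, then drop the scratch table
DELETE FROM messages WHERE id IN (SELECT id FROM doomed_ids);
DROP TABLE doomed_ids;
```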
orly
(21 rep)
Nov 8, 2019, 03:24 AM
• Last activity: Jul 4, 2025, 04:06 AM
0 votes, 1 answer, 212 views
Merging nearly-duplicate rows from different databases in the same table
I am sure that many people before me have faced the need to merge nearly-duplicate rows from several databases. However, I have not been able to find a simple yet general solution. I provided a data sample in Excel here: https://disk.yandex.ru/d/UtuENQ4LXHmOmQ It shows in yellow the 4 rows that I have now, and in green the one row that I want to make.
The new row should take its data from only one of the 4 databases in some columns, and merge data from all (or some) databases in other columns. I will delete the original rows after the merge. I spent all morning trying to figure out which functions to use. I have INSERT INTO, VALUES, WHERE, but something else is missing. Could someone please direct me to a possible solution?
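Without access to the spreadsheet, only the general shape can be sketched (the names my_table, record_key, col_a, col_b, and source_db are all hypothetical): build the merged row with one aggregate SELECT over the near-duplicates, insert it, then delete the originals.
```sql
-- Build and insert the merged row for one group of near-duplicates.
-- MAX(CASE ...) picks the value from one preferred source database;
-- plain MAX() merges "whichever row has a value"; adjust per column.
INSERT INTO my_table (record_key, col_a, col_b, source_db)
SELECT record_key,
       MAX(CASE WHEN source_db = 'db1' THEN col_a END),  -- from one database only
       MAX(col_b),                                       -- any non-null value wins
       'merged'
FROM my_table
WHERE record_key = 42
GROUP BY record_key;

-- Then delete the original rows of that group
DELETE FROM my_table
WHERE record_key = 42
  AND source_db <> 'merged';
```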
YW T
(11 rep)
Jan 24, 2022, 03:36 PM
• Last activity: Jun 12, 2025, 08:07 PM
2 votes, 1 answer, 192 views
Blocked listener service
It seems that I cannot connect via TNS alias to my auxiliary. I know I could bypass this by either connecting locally to the auxiliary (e.g. auxiliary /) or by adding a (UR=A) parameter. The DB is in NOMOUNT.
I want to know why, because I have explicitly created two static listener registrations, to use for Data Guard and for the duplicate, but I have the following issue.
Here, first, is the configuration.
listener.ora
[oracle@server-salzburg admin]$ cat /u01/app/19/grid/network/admin/listener.ora
#Backup file is /u01/app/oracle/crsdata/server-salzburg/output/listener.ora.bak.server-salzburg.oracle line added by Agent
# listener.ora Network Configuration File: /u01/app/19/grid/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=graz_DGMGRL)
(SID_NAME=graz)
(ORACLE_HOME=/u01/app/oracle/product/19/dbhome_1)
)
(SID_DESC=
(GLOBAL_DBNAME=salzburg_DGMGRL)
(SID_NAME=salzburg)
(ORACLE_HOME=/u01/app/oracle/product/19/dbhome_1)
)
)
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER=ON # line added by Agent
VALID_NODE_CHECKING_REGISTRATION_LISTENER=ON # line added by Agent
tnsnames.ora
[oracle@server-salzburg admin]$ cat tnsnames.ora
SALZBURG =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = salzburg)
)
)
graz_DGMGRL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-graz)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = graz)
)
)
salzburg_DGMGRL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = salzburg)
)
)
listener output
[oracle@server-salzburg admin]$ lsnrctl status
LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 09-JUN-2025 20:27:36
Copyright (c) 1991, 2019, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date 09-JUN-2025 20:24:03
Uptime 0 days 0 hr. 3 min. 33 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/19/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/server-salzburg/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=server-salzburg)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DG_DATA" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "+ASM_DG_FRA" has 1 instance(s).
Instance "+ASM", status READY, has 1 handler(s) for this service...
Service "graz_DGMGRL" has 1 instance(s).
Instance "graz", status UNKNOWN, has 1 handler(s) for this service...
Service "salzburg" has 1 instance(s).
Instance "salzburg", status BLOCKED, has 1 handler(s) for this service...
Service "salzburg_DGMGRL" has 1 instance(s).
Instance "salzburg", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
rman error:
[oracle@server-salzburg admin]$ rman target sys/*******@graz_DGMGRL auxiliary sys/******@salzburg_DGMGRL
Recovery Manager: Release 19.0.0.0.0 - Production on Mon Jun 9 20:28:00 2025
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
connected to target database: GRAZ (DBID=3282927897)
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04006: error from auxiliary database: ORA-12528: TNS:listener: all appropriate instances are blocking new connections
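Worth noting from the configuration above: the salzburg_DGMGRL alias connects to SERVICE_NAME = salzburg, which is the dynamically registered service of the NOMOUNT instance and is therefore BLOCKED, rather than to the statically registered salzburg_DGMGRL service (status UNKNOWN, which the listener hands off regardless of instance state). A sketch of the tnsnames.ora change, with (UR=A) included as the fallback the asker already mentioned:
```
salzburg_DGMGRL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server-salzburg)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = salzburg_DGMGRL)
      (UR = A)
    )
  )
```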
paganini
(138 rep)
Jun 9, 2025, 06:28 PM
• Last activity: Jun 9, 2025, 08:41 PM
1 vote, 1 answer, 2207 views
Duplicate key error index MongoDB
I have a rocket.chat server connected to LDAP.
When one of the users tries logging in to their account, he gets logged in to someone else's account; it's probably because of the indexes. My logs look like this:
> I20161126-12:07:28.992(1) rocketchat_ldap rocketchat_ldap.js:341 LDAP ➔ Search.error Search by id 31353036 returned 2 records
> I20161126-12:07:28.999(1) rocketchat_ldap rocketchat_ldap.js:780
>
> LDAPSync ➔ error { [MongoError: E11000 duplicate key error index:
> rocketchat.users.$emails.address_1 dup key: { : "some@one.com" }]
> name: 'MongoError', message: 'E11000 duplicate key error index:
> rocketchat.users.$emails.address_1 dup key: { : "some@one.com" }',
> driver: true, index: 0, code: 11000, errmsg: 'E11000 duplicate
> key error index: rocketchat.users.$emails.address_1 dup key: { :
> "some@one.com" }' }
Can I rebuild those indexes, or do something else, to resolve this problem?
Simonell
(11 rep)
Nov 26, 2016, 11:18 AM
• Last activity: Jun 2, 2025, 10:04 AM
2 votes, 1 answer, 44 views
"Duplicate entry" error when upgrading MariaDB 10.6 to 11.4
I was trying to upgrade my local development environment (Ubuntu 22.04 WSL) from MariaDB 10.6 to 11.4.
After updating the APT repo file, I ran apt update, which updated the server version.
However, upon running mariadb-upgrade, I see this error:
Phase 1/8: Checking and upgrading mysql database
Processing databases
mysql
mysql.column_stats OK
mysql.columns_priv OK
mysql.db OK
mysql.event OK
mysql.func OK
mysql.global_priv OK
mysql.gtid_slave_pos OK
mysql.help_category OK
mysql.help_keyword OK
mysql.help_relation OK
mysql.help_topic OK
mysql.index_stats OK
mysql.innodb_index_stats OK
mysql.innodb_table_stats OK
mysql.plugin OK
mysql.proc OK
mysql.procs_priv OK
mysql.proxies_priv OK
mysql.roles_mapping OK
mysql.servers OK
mysql.spider_link_failed_log OK
mysql.spider_link_mon_servers OK
mysql.spider_table_crd OK
mysql.spider_table_position_for_recovery OK
mysql.spider_table_sts OK
mysql.spider_tables OK
mysql.spider_xa OK
mysql.spider_xa_failed_log OK
mysql.spider_xa_member OK
mysql.table_stats OK
mysql.tables_priv OK
mysql.time_zone OK
mysql.time_zone_leap_second OK
mysql.time_zone_name OK
mysql.time_zone_transition OK
mysql.time_zone_transition_type OK
mysql.transaction_registry OK
Phase 2/8: Installing used storage engines... Skipped
Phase 3/8: Running 'mysql_fix_privilege_tables'
ERROR 1062 (23000) at line 30: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 256: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 304: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 331: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
ERROR 1062 (23000) at line 343: Duplicate entry 'db_626suite-get_98_53_metadata-PROCEDURE' for key 'PRIMARY'
FATAL ERROR: Upgrade failed
I do not understand how it is possible that there are duplicate functions, nor how to solve the problem.
Since this is just a development environment, I simply tried dropping the database, but the error persists.
Maybe the new version performs some checks that the old one didn't; but how can I check for similar inconsistencies before running mariadb-upgrade, in order to get a smooth update?
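The duplicate key db_626suite-get_98_53_metadata-PROCEDURE suggests that the mysql.proc catalog itself carries two rows for the same routine (its primary key is db, name, type), which would also explain why dropping the user database doesn't help. A hedged pre-flight check:
```sql
-- List stored routines that appear more than once in the mysql.proc catalog;
-- any hit here will break mysql_fix_privilege_tables during mariadb-upgrade
SELECT db, name, type, COUNT(*) AS copies
FROM mysql.proc
GROUP BY db, name, type
HAVING COUNT(*) > 1;
```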
Matteo Tassinari
(187 rep)
May 6, 2025, 02:13 PM
• Last activity: May 6, 2025, 02:55 PM
3 votes, 2 answers, 198 views
Same leading key column in unique constraint and nonclustered index - duplicate
These are two indexes on the table with the same leading key column.
First is a unique constraint with additional keys:
ALTER TABLE dbo.MyTable ADD CONSTRAINT UQ_MyTable_FirstColumn_SecondColumn_ThirdColumn
UNIQUE NONCLUSTERED
(
FirstColumn ASC,
SecondColumn ASC,
ThirdColumn ASC
)
And the second is a single-key nonclustered rowstore index:
CREATE NONCLUSTERED INDEX IX_MyTable_FirstColumn ON dbo.MyTable
(
FirstColumn ASC
)
Despite the benefit that the NC index is smaller, I would consider it a **duplicate**. Am I missing something? None of the "de-dupe" checker scripts I use agree with me, so I am confused.
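To compare the two definitions side by side, a convenience sketch against the standard catalog views:
```sql
-- List the key columns of every index on dbo.MyTable, in key order
SELECT i.name         AS index_name,
       i.is_unique,
       ic.key_ordinal,
       c.name         AS column_name
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.MyTable')
  AND ic.key_ordinal > 0        -- key columns only, not INCLUDEs
ORDER BY i.name, ic.key_ordinal;
```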
jericzech
(977 rep)
May 5, 2025, 07:51 AM
• Last activity: May 5, 2025, 06:52 PM
1 vote, 1 answer, 1047 views
Duplicate rows with primary key constraint in PostgreSQL 10.6
I have a table that's had
CONSTRAINT table_pkey PRIMARY KEY (id)
since setup.
However, I recently found that I have around 100 duplicate rows (I ran select count(*) from (select count(*) from table group by id having count(*) > 1) as t1 as per this old Stack post). However, I'm running Postgres 10.6, which should be well past that bug.
Any ideas?
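To inspect the damage, a sketch (the table name my_table is a stand-in): list the duplicated ids with their physical locations, while disabling index scans so the possibly corrupt unique index can't mask what is really in the heap:
```sql
-- Force reads from the heap, not the (suspect) primary key index
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- Show each physical copy (ctid) of every duplicated id
SELECT id, ctid
FROM my_table
WHERE id IN (SELECT id FROM my_table GROUP BY id HAVING count(*) > 1)
ORDER BY id;
```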
mckennab
(111 rep)
Jul 3, 2019, 11:33 PM
• Last activity: Apr 18, 2025, 04:05 AM
1 vote, 2 answers, 1641 views
MySQL: Create a view with multiple self-joins without duplicates in the result
Just to clarify: I can't change the table structure, so please leave out the "you should change your tables to this and that" answers, thank you.
So I have a table **entities_attributes_values** where an entity has many attributes, each with a value; basically, imagine 3 fields:
- entity_id
- entity_attributes_id
- value
Because every entity attribute and its value sits on its own row, getting several values at once is not so easy. I was thinking of multiple self-joins, and because this query will be very common I created a view, built with this query:
SELECT
    L1.entity_id,
    L1.value AS 'company_id',
    L2.value AS 'entity_name',
    P.value AS 'person_name',
    L4.value AS 'establishment_id',
    L5.value AS 'department_id'
FROM entities_attributes_values L1
LEFT JOIN entities_attributes_values L2
    ON L1.entity_id = L2.entity_id AND L2.entity_attributes_id = 1
LEFT JOIN entities_attributes_values L3
    ON L1.entity_id = L3.entity_id AND L3.entity_attributes_id = 3
LEFT JOIN persons_attributes_values P
    ON L3.value = P.core_persons_id AND P.core_persons_attributes_id = 4
LEFT JOIN entities_attributes_values L4
    ON L1.entity_id = L4.entity_id AND L4.entity_attributes_id = 12
LEFT JOIN entities_attributes_values L5
    ON L1.entity_id = L5.entity_id AND L5.entity_attributes_id = 13
WHERE L1.entity_attributes_id = 2
So this works, but I have one problem: I get "duplicate" values. They are not true duplicates, but **the point is that in my view I want every entity to be only one row with all its attribute values**; instead I get this:

So as you can see, the first three results are not good for me; I only need the fourth one, where I have all my data about one entity.
Thank you in advance for any help!
Mr. Sam
(111 rep)
Jun 24, 2015, 10:08 AM
• Last activity: Apr 17, 2025, 09:07 PM
7 votes, 3 answers, 5973 views
When should the IGNORE_DUP_KEY option be used on an index?
Someone said it's preferable to craft your queries to avoid duplicate key exceptions, but I'm not convinced that's more performant than just setting
IGNORE_DUP_KEY = ON
for the index.
My goal is to ensure a row or set of rows exists for one or more users, before attempting to update those rows. I do this, so that when I attempt to update the row with an update statement like the one below, and no rows are affected, it's because the [Count]
portion of predicate wasn't satisfied, as opposed to the row not existing at all (i.e. the [ID]
portion of the predicate not being satisfied):
UPDATE [Inventory]
SET [Count] = [Count] + 1
WHERE [ID] = 3
AND ([Count] + 1) <= @MaxInventory
I could run EXISTS(SELECT 1 FROM [Inventory] WHERE [ID] = 3)
to check for that single row, and only insert the row if it does not exist. That simply avoids unnecessary inserts. The insert, if necessary, would still have to contend with concurrent transactions, so duplicate key exceptions can still occur.
I'm curious whether it's more performant to just turn on IGNORE_DUP_KEY
in this scenario, rather than allowing the error to be thrown and caught. Specifically, I'm curious if it's as-fast or possibly even faster than running an exists check, to just attempt to insert the record and let it ignore duplicate keys.
This becomes even more important, when I'm checking for and initializing multiple records at once. For example, if I need to ensure records for thousands of users exist in a single update statement, the logic would be much simpler if I just ran that insert statement up-front, letting it ignore duplicate keys. Avoiding duplicates would be more complex, because I'd have to first query the table for which records don't exist, then attempt to add just those records (again, ignoring duplicate keys). Just inserting may be faster, even if all the records exist.
I could meet it halfway and check whether *any* of the records are missing, such as with a left join or a COUNT
comparison, but why bother if the insert ignoring duplicate keys is just faster?
**Is it a good idea to use IGNORE_DUP_KEY and just attempt inserts instead of bothering with checking for row existence ahead of time? If not, why?**
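For reference, IGNORE_DUP_KEY is an option of the unique index itself; a minimal sketch (the Inventory definition here is hypothetical) showing the duplicate insert being downgraded from an error to a warning:
```sql
-- Hypothetical table matching the UPDATE above
CREATE TABLE dbo.Inventory
(
    ID      int NOT NULL,
    [Count] int NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX UX_Inventory_ID
    ON dbo.Inventory (ID)
    WITH (IGNORE_DUP_KEY = ON);

INSERT INTO dbo.Inventory (ID, [Count]) VALUES (3, 0); -- inserted
INSERT INTO dbo.Inventory (ID, [Count]) VALUES (3, 0); -- discarded with "Duplicate key was ignored.", no exception
```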
Triynko
(615 rep)
Sep 5, 2019, 05:48 PM
• Last activity: Mar 16, 2025, 07:11 PM
0 votes, 1 answer, 53 views
Removing chronological duplicates from history table in SQL
I have a history table, I want to create another view to pick relevant records from this history. Not all records are relevant. I'm after the record changes for certain columns, for this case I care only about Plate and Status, and their changes over time, otherwise they are considered duplicates and should be not be showing in the new query output.
I want to check the chronological changes per PLATE value (doing a ROW_NUMBER() OVER (PARTITION BY PLATE)); the rows should be compared in sysloadDatetime order, and if the current row and the previous row have the same relevant column values, the current row should be considered a duplicate and removed from the output of the query.
In the example, there are 2 plates that have records: rows with a duplicated Status value that are not chronologically next to each other should stay, as statuses can go back and forth over time. But statuses that are next to each other chronologically should be considered duplicates.
example:
CREATE TABLE history (
plate_number VARCHAR(20),
status VARCHAR(20),
comments VARCHAR(100),
sysloadDatetime DATETIME
);
-- Insert sample data for two plate numbers with duplicate rows chronologically
INSERT INTO history (plate_number, status, comments, sysloadDatetime) VALUES
('ABC123', 'Active', 'Initial entry', '2024-02-20 08:00:00'),
('ABC123', 'Active', 'Checked at checkpoint A', '2024-02-20 08:15:00'),
('ABC123', 'Active', 'Checked at checkpoint B', '2024-02-20 08:30:00'),
('ABC123', 'Inactive', 'Temporarily deactivated for maintenance', '2024-02-20 09:00:00'),
('ABC123', 'Inactive', 'Awaiting reactivation', '2024-02-20 09:05:00'),
('ABC123', 'Active', 'Reactivated after maintenance', '2024-02-20 10:00:00'),
('ABC123', 'Active', 'Final check completed', '2024-02-20 10:30:00'),
('XYZ789', 'Active', 'Initial entry', '2024-02-21 08:00:00'),
('XYZ789', 'Active', 'Checked at checkpoint A', '2024-02-21 08:15:00'),
('XYZ789', 'Inactive', 'Temporarily deactivated for maintenance', '2024-02-21 08:30:00'),
('XYZ789', 'Inactive', 'Awaiting reactivation', '2024-02-21 09:00:00'),
('XYZ789', 'Active', 'Reactivated after maintenance', '2024-02-21 09:30:00'),
('XYZ789', 'Active', 'Final check completed', '2024-02-21 10:00:00');
-- Query: Select all records ordered chronologically
SELECT * FROM history
ORDER BY sysloadDatetime;
History table output:
plate_number | status | comments | sysloadDatetime
-------------|----------|------------------------------------------|---------------------
ABC123 | Active | Initial entry | 2024-02-20 08:00:00
ABC123 | Active | Checked at checkpoint A | 2024-02-20 08:15:00
ABC123 | Active | Checked at checkpoint B | 2024-02-20 08:30:00
ABC123 | Inactive | Temporarily deactivated for maintenance | 2024-02-20 09:00:00
ABC123 | Inactive | Awaiting reactivation | 2024-02-20 09:05:00
ABC123 | Active | Reactivated after maintenance | 2024-02-20 10:00:00
ABC123 | Active | Final check completed | 2024-02-20 10:30:00
XYZ789 | Active | Initial entry | 2024-02-21 08:00:00
XYZ789 | Active | Checked at checkpoint A | 2024-02-21 08:15:00
XYZ789 | Inactive | Temporarily deactivated for maintenance | 2024-02-21 08:30:00
XYZ789 | Inactive | Awaiting reactivation | 2024-02-21 09:00:00
XYZ789 | Active | Reactivated after maintenance | 2024-02-21 09:30:00
XYZ789 | Active | Final check completed | 2024-02-21 10:00:00
Desired output:
plate_number | status | comments | sysloadDatetime
-------------|----------|------------------------------------------|---------------------
ABC123 | Active | Initial entry | 2024-02-20 08:00:00
ABC123 | Inactive | Temporarily deactivated for maintenance | 2024-02-20 09:00:00
ABC123 | Active | Reactivated after maintenance | 2024-02-20 10:00:00
XYZ789 | Active | Initial entry | 2024-02-21 08:00:00
XYZ789 | Inactive | Temporarily deactivated for maintenance | 2024-02-21 08:30:00
XYZ789 | Active | Reactivated after maintenance | 2024-02-21 09:30:00
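One standard shape for this (a sketch using LAG, available in SQL Server 2012+, PostgreSQL, and most modern engines) keeps a row only when its status differs from the previous row for the same plate:
```sql
-- Keep the first row of each consecutive run of identical status per plate
SELECT plate_number, status, comments, sysloadDatetime
FROM (
    SELECT h.*,
           LAG(status) OVER (PARTITION BY plate_number
                             ORDER BY sysloadDatetime) AS prev_status
    FROM history AS h
) AS x
WHERE prev_status IS NULL       -- first row seen for the plate
   OR prev_status <> status     -- status changed since the previous row
ORDER BY plate_number, sysloadDatetime;
```
Against the sample data this returns exactly the six rows of the desired output.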
nrod88
(1 rep)
Feb 21, 2025, 05:53 AM
• Last activity: Feb 23, 2025, 01:26 PM
0 votes, 2 answers, 907 views
How to join a parent table to two child tables without duplication?
parent table:
| id | city_name | zip_code |
| -- | ------------- | -------- |
| 1 | new york city | 10001 |
| 2 | los angeles | 90001 |
| 3 | chicago | 60007 |
Customer, child table:
| id | customer_name | city_id | address |
| -- | ------------- | ------- | ------------------ |
| 1 | alice | 1 | 428 Hartford Lane |
| 2 | betty | 1 | 249 George Street |
| 3 | cath | 3 | 12112 S Front Ave |
| 4 | daph | 3 | 8619 S Wolcott Ave |
Restaurant, child table:
| id | address | city_id |
|----|-----------------------| --------|
| 1 | 256 Roehampton St. | 1 |
| 2 | 920 Cherry Camp Road | 3 |
| 3 | 7609 Mckinley Ave | 2 |
| 4 | 5723 Morgan Ave | 2 |
When using the following query:
select c.id, c.city_name, c.zip_code, r.id, r.address, r.city_id, cus.id, cus.customer_name, cus.city_id, cus.address
from restaurant r
left join city c
on r.city_id = c.id
right join customer cus
on c.id = cus.city_id where c.id = 1;
There will be a duplicate restaurant address entry. How do I ensure that the restaurants and customers are distinct?
| id | city_name | zip_code | id | address | city_id | id | customer_name | city_id | address |
| -- | ------------- | -------- | -- | ------------- | -------- | -- | ------------- | ------- | ------------------ |
| 1 | new york city | 10001 | 1 | 256 Roehampton St. | 1 | 1 | alice | 1 | 428 Hartford Lane |
| 1 | new york city | 10001 | 1 | 256 Roehampton St. | 1 | 2 | betty | 1 | 249 George Street |
**Edit:** To add some context, I would like to return a JSON object of a city which contains its fields as well as a list of customers and a list of restaurants belonging to that particular city.
Is it more efficient to make three separate queries to populate that object, i.e.
select * from city where id = 1;
select * from customer where city_id = 1;
select * from restaurant where city_id = 1;
or perform one query with two joins, with the expected result as the following, where there are no duplicate restaurants or customers
| id | city_name | zip_code | id | address | city_id | id | customer_name | city_id | address |
| -- | ------------- | -------- | -- | ------------- | -------- | -- | ------------- | ------- | ------------------ |
| 1 | new york city | 10001 | 1 | 256 Roehampton St. | 1 | 1 | alice | 1 | 428 Hartford Lane |
| 1 | new york city | 10001 | | | | 2 | betty | 1 | 249 George Street |
Or can I just use a distinct keyword to remove duplicates?
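If the target database is PostgreSQL (an assumption; the question doesn't name one, and MySQL/SQL Server have JSON_ARRAYAGG/FOR JSON equivalents), the whole object can be built in one round trip without row multiplication, because each child list is aggregated independently:
```sql
-- One row per city; the two child tables never join each other,
-- so no duplicate restaurant or customer entries appear
SELECT c.id,
       c.city_name,
       c.zip_code,
       (SELECT json_agg(cus) FROM customer   cus WHERE cus.city_id = c.id) AS customers,
       (SELECT json_agg(r)   FROM restaurant r   WHERE r.city_id   = c.id) AS restaurants
FROM city c
WHERE c.id = 1;
```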
12johnny
(3 rep)
Feb 9, 2021, 12:20 AM
• Last activity: Jan 25, 2025, 01:03 PM
0 votes, 2 answers, 6216 views
How to delete records without a primary key in a standard way (i.e. not using ctid)
# Use case
We have a log table without a PK with two columns (date, event);
You inserted a row. Then you want to undo the inserted record.
Is there a way to delete that row (without deleting other rows with the same data), that doesn't use postgres only capabilities?
I want a solution that works in another database (it doesn't need to be totally standard, it can be only in just one database: SqLite, Oracle, MySql or SQLServer).
## Example:
create table the_log(
date date,
event_id integer
);
insert into the_log(date, event_id) values ('2019-09-21',1),('2019-09-21',1);
select * from the_log;
My attempts:
delete from the_log where row_number() over ()=1;
delete from the_log limit 1;
with the_log_2 as (select *, row_number() over () as version from prueba_sin_clave)
delete from the_log_2 where version=1;
I suppose the answer is **No**. I want to know if I am wrong, or in what documentation I can read that I am right.
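For the record, some of the named engines do offer a non-ctid way, though none of it is standard SQL. MySQL, for instance, accepts a row limit directly on DELETE (SQLite does too, but only when compiled with SQLITE_ENABLE_UPDATE_DELETE_LIMIT):
```sql
-- MySQL: delete exactly one of the matching rows (non-standard LIMIT clause)
DELETE FROM the_log
WHERE date = '2019-09-21' AND event_id = 1
LIMIT 1;
```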
Emilio Platzer
(465 rep)
Sep 21, 2018, 06:31 PM
• Last activity: Nov 22, 2024, 11:04 PM
9 votes, 2 answers, 14880 views
How do I remove duplicate records in a join table in PostgreSQL?
I have a table that has a schema like this:
create_table "questions_tags", :id => false, :force => true do |t|
t.integer "question_id"
t.integer "tag_id"
end
add_index "questions_tags", ["question_id"], :name => "index_questions_tags_on_question_id"
add_index "questions_tags", ["tag_id"], :name => "index_questions_tags_on_tag_id"
I would like to remove records that are duplicates, i.e. they have both the same tag_id and question_id as another record.
What does the SQL look like for that?
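Since this is PostgreSQL, one well-known pattern (it relies on the PostgreSQL-specific ctid system column, which this table needs because it has no id) compares physical row addresses so that exactly one copy of each (question_id, tag_id) pair survives:
```sql
-- Delete every copy except the one with the lowest physical address
DELETE FROM questions_tags a
USING questions_tags b
WHERE a.ctid < b.ctid
  AND a.question_id = b.question_id
  AND a.tag_id = b.tag_id;
```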
marcamillion
(205 rep)
Mar 13, 2013, 12:49 AM
• Last activity: Oct 20, 2024, 11:42 PM
0 votes, 1 answer, 40 views
Breaking Down Transaction Table
My transaction table consists of tranxid, custid, itemid, quantity, unit_price, payment_mode, driverid.
When I perform add operations, I notice that customer 1 can buy one or two more items, and so on; there is some duplication of the custid, itemid, driverid, and payment_mode values. How can I resolve this duplication?
Can I use a one-to-many/many-to-one relationship to split the transaction table into two tables?
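What is described is the usual header/line split: facts that occur once per transaction (customer, driver, payment mode) go in a header table, and each purchased item becomes one line row. A sketch with assumed column types:
```sql
-- Header: one row per transaction
CREATE TABLE transactions (
    tranxid      INT PRIMARY KEY,
    custid       INT NOT NULL,
    driverid     INT NOT NULL,
    payment_mode VARCHAR(20) NOT NULL
);

-- Lines: one row per item in a transaction (many lines to one header)
CREATE TABLE transaction_items (
    tranxid    INT NOT NULL REFERENCES transactions (tranxid),
    itemid     INT NOT NULL,
    quantity   INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (tranxid, itemid)
);
```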
Wei Xiang Ng
(1 rep)
Sep 6, 2024, 12:12 AM
• Last activity: Sep 6, 2024, 07:50 AM
5 votes, 3 answers, 353 views
Prevent inserting duplicate entries under READ COMMITTED Isolation
We have a SQL Server table called
dbo.Dim_Target
. Multiple threads try to INSERT
new data into this table in parallel. Each parallel thread first checks for matching data and then only inserts new data (non-matching), using LEFT OUTER JOIN
.
Isolation Level is READ COMMITTED
.
### Problem
On rare occasions, duplicate data is inserted into this table by two different threads a couple of milliseconds apart. According to this post, this issue is expected under the RC isolation level.
### Things I tried
- Different isolation levels (RC, Repeatable Read, etc.) could not prevent this problem.
- Serializable isolation prevents duplicates, but also causes a lot of deadlock issues.
- A unique index is not possible on this large column.
### Question
Is there an approach to prevent duplicates from multiple threads?
### Sample code to recreate the scenario
#### 1. Preparation
Create target table & two STG tables and load dummy data
DROP TABLE IF EXISTS dbo.Dim_Target
GO
CREATE TABLE dbo.Dim_Target
( ID INT NOT NULL IDENTITY(1,1)
, Code VARCHAR(3000) NOT NULL
, Created_Date DATETIME NOT NULL DEFAULT(GETDATE()))
GO
INSERT INTO dbo.Dim_Target(Code) SELECT NEWID() -- some dummy data
GO 10000
--
-- STG table 1
--
DROP TABLE IF EXISTS dbo.STG_Test1
GO
CREATE TABLE dbo.STG_Test1
( ID INT NOT NULL IDENTITY(1,1)
, Code VARCHAR(3000)
)
GO
INSERT INTO dbo.STG_Test1 (Code) SELECT NEWID() -- some dummy data
GO 10000
INSERT INTO dbo.STG_Test1 (Code) SELECT 'abc1' -- this is the duplicate entry that matches STG2 table
GO
--
-- STG table 2
--
DROP TABLE IF EXISTS dbo.STG_Test2
GO
CREATE TABLE dbo.STG_Test2
( ID INT NOT NULL IDENTITY(1,1)
, Code VARCHAR(3000)
)
GO
INSERT INTO dbo.STG_Test2 (Code) SELECT NEWID() -- some dummy data
GO 10000
INSERT INTO dbo.STG_Test2 (Code) SELECT 'abc1' -- this is the duplicate entry that matches STG1 table
GO
--
-- Sample query to check for duplicates
--
SELECT Code, COUNT(*)
FROM dbo.Dim_Target
GROUP BY Code
HAVING COUNT(*) > 1
#### 2.Thread 1: Try to insert from first parallel thread
Run this below script from one SSMS window. Make sure the WAITFOR
time is set the same in both windows.
WAITFOR TIME '16:04:00'
BEGIN TRANSACTION
INSERT INTO dbo.Dim_Target (Code)
SELECT T.Code
FROM dbo.STG_Test1 AS T
LEFT OUTER JOIN dbo.Dim_Target AS D
ON D.Code = T.Code
WHERE D.Code IS NULL
COMMIT
#### 3. Thread 2: Try to insert from second parallel thread
Run this below script from second SSMS window. Make sure the WAITFOR
time is set the same in both windows.
WAITFOR TIME '16:04:00'
BEGIN TRANSACTION
INSERT INTO dbo.Dim_Target (Code)
SELECT T.Code
FROM dbo.STG_Test2 AS T
LEFT OUTER JOIN dbo.Dim_Target AS D
ON D.Code = T.Code
WHERE D.Code IS NULL
COMMIT
#### NOTES
- When I run this script, the duplicates do **not** occur every time. They happen once every two or 3 attempts, but they DO happen. The Created_Date
column shows that duplicate entries were created just a couple of milliseconds apart.
- We have 10 threads running in parallel loading all the Dimension tables, so the duplicates are occurring more frequently.
---
With a temporary solution working, I want to take the time to re-think this and make it scalable. All parts of the ETL are processed with multiple threads, except this one step, which is single-threaded (for now). Any ideas would be welcome.
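For completeness, the usual fix that keeps READ COMMITTED as the session default is to take key-range locks on just the existence check, via table hints; a sketch applied to the thread-1 script (it assumes an index on Code, without which the range locks degrade to whole-table locking and simply serialize the loads):
```sql
BEGIN TRANSACTION;

-- UPDLOCK + SERIALIZABLE (a.k.a. HOLDLOCK) holds key-range locks on the probed
-- keys until commit, so a concurrent thread blocks instead of also passing the
-- existence check and inserting the same Code
INSERT INTO dbo.Dim_Target (Code)
SELECT T.Code
FROM dbo.STG_Test1 AS T
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Dim_Target AS D WITH (UPDLOCK, SERIALIZABLE)
    WHERE D.Code = T.Code
);

COMMIT;
```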
ToC
(727 rep)
Aug 14, 2024, 05:24 PM
• Last activity: Aug 17, 2024, 02:09 PM
0 votes, 1 answer, 117 views
UNIQUE CONSTRAINT on two columns, but duplicates appeared: how to track down and fix?
We have a table for users:
CREATE UNIQUE INDEX uk_users_login_socnet ON public.users USING btree (login, socnet)
We had maintenance on our server, during which it was backed up and replicated to a different server using pg_basebackup (completed by our sysadmins, reportedly without errors or issues).
Since then we have errors in our logs like following:
ERROR: duplicate key value violates unique constraint "uk_users_login_socnet"
DETAIL: Key (login, socnet)=(user@example.com, 1) already exists. (UPDATE users SET extra = COALESCE(extra, '') || hstore('udid', '') || hstore('udid_last_update', '1721646485') WHERE id = 1234567;)
In the meantime, the following query returns no results:
SELECT u1.login, u1.socnet
FROM users u1
INNER JOIN users u2 ON u1.login = u2.login AND u1.socnet = u2.socnet
while its EXPLAIN ANALYZE is:
Hash Join (cost=2817129.95..6386837.44 rows=9847200 width=18) (actual time=13654.755..31681.796 rows=15584155 loops=1)
Hash Cond: (((u1.login)::text = (u2.login)::text) AND (u1.socnet = u2.socnet))
-> Seq Scan on users u1 (cost=0.00..2490077.18 rows=15678918 width=18) (actual time=0.026..8806.271 rows=15582973 loops=1)
-> Hash (cost=2490077.18..2490077.18 rows=15678918 width=18) (actual time=13653.093..13653.093 rows=15582973 loops=1)
Buckets: 524288 Batches: 32 Memory Usage: 28510kB
-> Seq Scan on users u2 (cost=0.00..2490077.18 rows=15678918 width=18) (actual time=0.186..10887.626 rows=15582973 loops=1)
Planning time: 1.749 ms
Execution time: 32166.739 ms
But this query:
SELECT t1.login, t1.socnet, t2.login, t2.socnet
FROM users AS t1
LEFT JOIN (SELECT id, login, socnet FROM users) AS t2
ON t2.login = t1.login AND t1.socnet = t2.socnet
WHERE t1.id != t2.id
shows a list of duplicated rows with the same login and socnet, which makes no sense to me. Here's the respective EXPLAIN ANALYZE:
Hash Join (cost=2817129.95..6411455.44 rows=9847199 width=36) (actual time=17015.349..33466.957 rows=1182 loops=1)
Hash Cond: (((t1.login)::text = (users.login)::text) AND (t1.socnet = users.socnet))
Join Filter: (t1.id <> users.id)
Rows Removed by Join Filter: 15583110
-> Seq Scan on users t1 (cost=0.00..2490077.18 rows=15678918 width=22) (actual time=0.034..9902.685 rows=15583110 loops=1)
-> Hash (cost=2490077.18..2490077.18 rows=15678918 width=22) (actual time=14344.722..14344.722 rows=15583110 loops=1)
Buckets: 524288 Batches: 32 Memory Usage: 30951kB
-> Seq Scan on users (cost=0.00..2490077.18 rows=15678918 width=22) (actual time=0.024..11382.363 rows=15583110 loops=1)
Planning time: 1.764 ms
Execution time: 33467.260 ms
The PostgreSQL server version is 9.6.
My questions are:
- What is happening? Why do I have duplicate data within a UNIQUE constraint?
- Why do I have different results for INNER JOIN and LEFT JOIN?
- What might have happened to the DB structure, and how can I debug it? Where should I look for hints on what's broken?
- How can I fix this to a consistent state?
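To enumerate the offending rows directly from the heap, a sketch; array_agg(ctid) records the physical location of each copy, and a REINDEX of uk_users_login_socnet will keep failing until those extra copies are removed:
```sql
-- Group on the table data itself; any group with more than one row
-- violates the unique index, whatever the index currently claims
SELECT login, socnet, count(*) AS copies, array_agg(ctid) AS row_locations
FROM users
GROUP BY login, socnet
HAVING count(*) > 1;
```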
Eduard Sukharev
(129 rep)
Jul 22, 2024, 11:49 AM
• Last activity: Jul 22, 2024, 01:28 PM
-2 votes, 1 answer, 1543 views
How to remove a duplicate primary key violation in a table in SQL Server?
I am encountering a customer error: they are unable to print due to a primary key violation caused by a duplicate. Below is where I went to check the DB column in the table and saw this as the possible reason. What is the solution to prevent this?

thrue2023
(3 rep)
Apr 9, 2024, 11:41 AM
• Last activity: Apr 9, 2024, 02:10 PM
1 vote, 1 answer, 194 views
Query to find when a specific current value changed
I have 3 tables of data (all contain more columns, but these are the ones I need from each):
t1: table1
| workID1 | h1Amt | h1Dt | status |
|:-------|:-----|:----|:-------|
| 101 |73.00 |12/1/2023| A |
| 124 |0.00 |10/30/2023| D |
| 251 |0.00 |3/1/2023| A |
| 255 |24.00 |11/30/2023| A |
| 256 |0.00 |12/1/2023| A |
t2: table2
| workID2 | h2Amt | h2Dt | type1 |
|:-------|:------|:----|:-----|
| 101 | 0073 |202312| 2 |
| 101 | 0000 |202308| 13 |
| 101 | 0000 |202307| 1 |
| 124 | 0000 |202310| 3 |
| 124 | 0000 |202306| 2 |
| 124 | 0000 |202304| 13|
| 124 | 0079 |202301| 3 |
| 251 | 0000 |202303| 2 |
| 251 | 1154 |202301| 3 |
| 251 | 0000 |202212| 3 |
| 255 | 0024 |202311| 2 |
| 255 | 0000 |202212| 3 |
| 255 | 0000 |202210| 13 |
| 255 | 0175 |202209| 3 |
| 256 | 0000 |202312| 1 |
| 256 | 0000 |202309| 13|
| 256 | 0000 |202307| 3 |
| 256 | 0583 |202305| 2 |
t3: table3
| workID3 | h3Date | type2 | h3Dt |
|:-------|:----|:-----|:-------|
| 101 |12/1/2023| 2 | 202312 |
| 101 |8/13/2023| 13 | 202308 |
| 101 |7/1/2023 | 1 | 202307 |
| 124 |10/1/2023| 3 | 202310 |
| 124 |6/1/2023| 2 | 202306 |
| 124 |4/11/2023| 13| 202304 |
| 124 |1/1/2023| 3 | 202301 |
| 251 |3/1/2023| 2 | 202303 |
| 251 |1/1/2023| 3 | 202301 |
| 251 |12/1/2022| 3 | 202212 |
| 255 |11/1/2023| 2 | 202311 |
| 255 |12/1/2022| 3 | 202212 |
| 255 |10/7/2022| 13 | 202210 |
| 255 |9/1/2022| 3 | 202209 |
| 256 |12/1/2023| 1 | 202312 |
| 256 |9/5/2023| 13| 202309 |
| 256 |7/1/2023| 3 | 202307 |
| 256 |5/1/2023| 2 | 202305 |
In t1, there is only one row for each workID, which is the current transaction for that workID. As data gets added, all 3 tables get updated, with t1 only showing the current transaction.
In t2 and t3, there are multiple rows because it shows historical transactions for each workID.
The h1Amt and h2Amt columns show the same cost, but formatted differently. Additionally, the h1Dt, h2Dt, and h3Dt all reflect the same date, just formatted differently.
I need to find a specific current value and when that value last changed. In my case, I am searching for an amount of 0.00 or 0000 and when it was last changed to this value. Also, the status (t1) has to be "A" and type (t2 or t3) has to be 1, 2, or 3.
Here is the output I'm looking for:
| workID | amount | date | type |
|:-------|:--------|:-----|:-----|
| 251 |0.00 or 0000|202303 or 3/1/2023| 2 |
| 256 |0.00 or 0000|202307 or 7/1/2023| 3 |
This is what I came up with so far, but it is returning amounts that are not currently 0000 (0.00) or not showing workID's that have a current 0000 (0.00).
select c.workID3
,b.curAmt
,b.curStart
,c.type2
from table3 c
inner join table1 t on t.workID1 = c.workID3
inner join
(
select b.workID2
,max(
case
when h1Amt = 0.00 and b.h2Dt = d.h3dt then h1Amt
end) as curAmt
,min(
case
when h1Amt = 0.00 and b.h2Dt = d.h3Dt then d.h3Date
end) as curStart
from table2 b
inner join table3 d on d.workID3 = b.workID2
inner join table1 on workID1 = b.workID2
where not exists
(
select 1
from table2 a
where a.workID2 = b.workID2
and a.h2Amt <> b.h2Amt
and a.h2Dt > b.h2Dt
)
and t.h1Amt = 0.00
and t.status = 'A'
and d.type2 in ('1','2','3')
group by b.workID2
) b
on b.workID2 = c.workID3
and b.curStart = c.h3Dt
order by c.workID3
I haven't used table3, mostly because table2 contains h2Amt that is needed. I don't think it would be smart for me to use h1Amt from table1 since it only shows what is current. At the very least I need to use table1 because of the status column. If I did use table3, table1 would be a requirement since it's the only table with a status column, but I think I would run into the issue of finding the historical amount since it's not recorded in either table1 or table3. Also, I've run into issues of getting duplicate rows in other queries that I've tried, to include using all 3 tables.
The above is the closest query I've seen after doing a lot of searching for similar situations.
Hopefully I've given enough detail to find a solution. Thanks in advance!
Edit: in table3, there are 2 date columns, h3Dt and h3Date. Table3 has been updated. I also updated the query that I currently have. It returns most correctly, but still has some that returns the wrong date. Also, as I was looking at the data, table2 type1 has some data that is incorrect vs table 3 type 2, so I decided to use table 3 as well in the above query as it is more accurate. The only issue left now is how I can get the correct data for all rows.
Here is some sample data:
create table table1 (
workID char(3),
h1Amt char(10),
h1Dt varchar(255),
status char(1) );
insert into table1 (
workID, h1Amt, h1Dt, status )
values ('101','73.00','12/1/2023','A'),
('124','0.00','10/30/2023','D'),
('251','0.00','3/1/2023','A'),
('255','24.00','11/30/2023','A'),
('256','0.00','12/1/2023','A');
create table table2 (
workID2 char(3),
h2Amt char(4),
h2Dt char(4),
type1 char(2) );
insert into table2 (
workID2, h2Amt, h2Dt, type1 )
values ('101','0073','202312','2'),
('101','0000','202308','13'),
('101','0000','202307','1'),
('124','0000','202310','3'),
('124','0000','202306','2'),
('124','0000','202304','13'),
('124','0079','202301','3'),
('251','0000','202303','2'),
('251','1154','202301','2'),
('251','0000','202212','3'),
('255','0024','202311','2'),
('255','0000','202212','3'),
('255','0000','202210','13'),
('255','0175','202209','3'),
('256','0000','202312','1'),
('256','0000','202309','13'),
('256','0000','202307','3'),
('256','0583','202305','2');
create table table3 (
workID3 char(3),
h3Date varchar(10),
type2 char(2),
h3Dt char(4) );
insert into table3 (
workID3, h3Date, type2, h3Dt )
values ('101','12/1/2023','2','202312'),
('101','8/13/2023','13','202308'),
('101','7/1/2023','1','202307'),
('124','10/1/2023','3','202310'),
('124','6/1/2023','2','202306'),
('124','4/11/2023','13','202304'),
('124','1/1/2023','3','202301'),
('251','3/1/2023','2','202303'),
('251','1/1/2023','3','202301'),
('251','12/1/2022','3','202212'),
('255','11/1/2023','2','202311'),
('255','12/1/2022','3','202212'),
('255','10/7/2022','13','202210'),
('255','9/1/2022','3','202209'),
('256','12/1/2023','1','202312'),
('256','9/5/2023','13','202309'),
('256','7/1/2023','3','202307'),
('256','5/1/2023','2','202305');
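A possible restructuring (a sketch, using the column names from the sample DDL): LAG marks the rows where the amount actually changed, then only the most recent change per workID is kept, filtered to those that landed on 0000 with status 'A' and an allowed type:
```sql
WITH changes AS (
    -- every row plus the amount on the chronologically previous row
    SELECT workID2, h2Amt, h2Dt, type1,
           LAG(h2Amt) OVER (PARTITION BY workID2 ORDER BY h2Dt) AS prev_amt
    FROM table2
),
last_change AS (
    -- keep only the rows where the amount changed; rank them newest-first
    SELECT c.workID2, c.h2Amt, c.h2Dt, c.type1,
           ROW_NUMBER() OVER (PARTITION BY c.workID2 ORDER BY c.h2Dt DESC) AS rn
    FROM changes c
    WHERE c.prev_amt IS NULL OR c.prev_amt <> c.h2Amt
)
SELECT lc.workID2 AS workID,
       lc.h2Amt   AS amount,
       lc.h2Dt    AS change_date,
       lc.type1   AS type
FROM last_change lc
JOIN table1 t ON t.workID = lc.workID2
WHERE lc.rn = 1                -- the most recent change per workID...
  AND lc.h2Amt = '0000'        -- ...and it changed to zero
  AND t.status = 'A'
  AND lc.type1 IN ('1','2','3');
```
Against the sample data this returns (251, 0000, 202303, 2) and (256, 0000, 202307, 3), matching the desired output.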
tkmagnet
(75 rep)
Jan 25, 2024, 10:18 PM
• Last activity: Mar 19, 2024, 06:57 PM
2 votes, 1 answer, 1053 views
DISTINCT gives duplicate values
Can it be considered a bug? This query gives duplicate values despite DISTINCT:
select distinct '1'
from
(
select *
from dual
connect by level <= 10
)
order by dbms_random.value
fetch first 10 rows only
I understand that it creates an additional column with random values and applies DISTINCT over both columns, but I would still expect to get unique values.
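A common workaround is to let DISTINCT finish inside a derived table, so that dbms_random.value is evaluated over the already-distinct rows instead of being folded into the DISTINCT keys:
```sql
select val
from
(
    select distinct '1' as val
    from dual
    connect by level <= 10
)
order by dbms_random.value
fetch first 10 rows only;
```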
Andy DB Analyst
(110 rep)
Jun 2, 2023, 02:47 PM
• Last activity: Mar 13, 2024, 03:01 PM