
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
2 answers
143 views
How to keep only the rows with a timestamp difference greater than a given interval
I have a Postgres table with timestamps as entries. From this table, I would like to derive a new one in which no consecutive entries have a timestamp difference shorter than 400 milliseconds. In the case of the screenshot in the original post, from the first 10 rows I would keep only [1,5,9]. I tried with joins, but I realised I would need the updated table before calculating the ON clause of later rows, because I would need to know which rows had already been deleted. Edit: I tried the following join to at least get an idea of the tokens I would like to delete:
select distinct on (s.token) s.token as token1, s.timestamp as tm1, s2.token as token2, s2.timestamp as tm2
	from temporal.samples s
	join temporal.samples s2
	on s2.timestamp > s.timestamp + interval '400000 microseconds'
This gives the result shown in the original post's second screenshot. There, I can see that the next token after the first one that satisfies the condition is the 5th, so I would like to delete 2, 3 and 4. The next one delayed by at least 400 ms from the 5th is the 9th, so I would like to delete 6, 7 and 8. Thanks in advance
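A sketch of one way to express the requirement, assuming the temporal.samples layout used above: a recursive CTE that keeps the first row and then repeatedly jumps to the next row at least 400 ms after the last *kept* row. Plain window functions can't do this, because each keep/drop decision depends on earlier decisions.

```sql
-- Keep row 1, then always the next row >= 400 ms after the last kept row.
WITH RECURSIVE ordered AS (
    SELECT token, "timestamp",
           row_number() OVER (ORDER BY "timestamp") AS rn
    FROM temporal.samples
), kept AS (
    SELECT token, "timestamp", rn FROM ordered WHERE rn = 1
    UNION ALL
    SELECT nxt.token, nxt."timestamp", nxt.rn
    FROM kept k
    CROSS JOIN LATERAL (
        SELECT o.token, o."timestamp", o.rn
        FROM ordered o
        WHERE o.rn > k.rn
          AND o."timestamp" >= k."timestamp" + interval '400 milliseconds'
        ORDER BY o.rn
        LIMIT 1
    ) nxt
)
SELECT token, "timestamp" FROM kept ORDER BY rn;
```

The DELETE can then target every token not returned by this query.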
eddie_jung (11 rep)
Jun 10, 2023, 10:56 AM • Last activity: Jul 20, 2025, 08:04 PM
0 votes
1 answer
153 views
PostgreSQL: Best practices for deleting a huge volume of data in a replicated environment
We have a prod database (PostgreSQL 13) with a hot standby (physical replication) as well as logical replication to a different subscriber server. We now need to purge a few years' worth of records from some tables, each holding roughly 3-4M records per month. All of these tables have a timestamp field as well as a PK and quite a few indexes. Considering the WAL rate and possible replication lag, what is the best way to perform the deletes in such an environment with minimal impact on client queries?
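The usual pattern here is to delete in small batches, so each transaction emits a modest burst of WAL and the standby and subscriber can catch up between batches. A sketch, with the table name, PK, and timestamp column assumed:

```sql
-- Run repeatedly (cron or a driver loop with a short sleep) until it reports DELETE 0.
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM events
    WHERE created_at < now() - interval '3 years'
    ORDER BY id
    LIMIT 5000
);
```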
goodfella (595 rep)
Jul 25, 2024, 09:41 AM • Last activity: Jul 15, 2025, 01:06 AM
0 votes
2 answers
187 views
Best way to delete a large set of rows given the values to NOT delete
Coming from a Rails background, I have a copy of a large production database from which I only need the records belonging to 4 of the 2000+ companies. I'm trying to delete all the rows except the ones belonging to the 4, and I know the way I have it is not optimal.

    DELETE FROM appointments
    WHERE appointments.company_id NOT IN (6, 753, 785, 1611)

Another example is when I have to delete records from a table where the company_id is on an associated table:

    DELETE FROM mappings
    WHERE mappings.id IN (
        SELECT mappings.id
        FROM mappings
        INNER JOIN codes ON codes.remote_id = mappings.code_remote_id
        WHERE codes.company_id NOT IN (6, 753, 785, 1611)
    )
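When the overwhelming majority of a table is going away, copying the keepers tends to beat deleting the rest. A sketch using the question's table name; note that CREATE TABLE AS copies no indexes, constraints, triggers, or foreign keys, so those must be recreated afterwards:

```sql
BEGIN;
CREATE TABLE appointments_keep AS
SELECT * FROM appointments
WHERE company_id IN (6, 753, 785, 1611);

DROP TABLE appointments;
ALTER TABLE appointments_keep RENAME TO appointments;
-- Recreate indexes, constraints, and foreign keys here.
COMMIT;
```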
Tonys (131 rep)
Oct 1, 2016, 03:39 AM • Last activity: Jun 30, 2025, 02:05 PM
3 votes
3 answers
125 views
Is it bad practice to just attempt to delete rows that may be referenced by other tables, and skip them if they fail?
As a scheduled cleanup action, I'm looking to delete orphaned rows in a certain table that can be referenced by various other tables. Now I could go with a difficult-to-implement reference-counting system which makes tons of other cascaded deletes much harder, or I could check ALL the possible tables for references pointing at the data. But I can also just occasionally try to delete the rows and skip the ones where this fails because of a foreign key constraint violation. Is there anything wrong with this practice? I'm not too worried about performance; if that becomes an issue I can just add a date column recording when deletion was last attempted. Ideally I'd avoid defining functions for it, but some pseudo-code (I know nothing of PostgreSQL functions, so bear with me for any syntax errors) could quite easily do it:
```
CREATE OR REPLACE FUNCTION try_delete_or_update(p_id INT) RETURNS VOID AS $$
BEGIN
    -- Try the delete; it fails if any table still references the row.
    DELETE FROM public.file
    WHERE id = p_id
      AND last_delete_attempt < now() - interval '1 month';
EXCEPTION WHEN foreign_key_violation THEN
    -- Still referenced: record the attempt so the next sweep skips it for a month.
    UPDATE public.file
    SET last_delete_attempt = now()
    WHERE id = p_id;
END;
$$ LANGUAGE plpgsql;
```
Sebastiaan van den Broek (206 rep)
Jun 26, 2025, 05:40 AM • Last activity: Jun 30, 2025, 12:24 PM
3 votes
1 answers
626 views
Deadlock on DELETE-DELETE for parallel executions of the same stored procedure
I have a stored procedure which is run in parallel by two processes. Code for the sp:

    DECLARE @inputXML XML
    DECLARE @tempTable TABLE (afid INT, ccid INT)

    INSERT INTO @tempTable
    SELECT ref.value('@afid', 'INT'), ref.value('@ccid', 'INT')
    FROM @inputXML.nodes('/Ilist/aidA/afid') R ( ref )

    DELETE FROM StylinInfoTemp
    FROM @tempTable AS temp
    WHERE temp.afid = StylinInfoTemp.afid
      AND temp.ccid = StylinInfoTemp.ccid

This is the schema for StylinInfoTemp:

    cisid    int
    afid     int
    ccid     int
    entity   int
    dtime    int
    guid     uniqueidentifier
    addFlags int

An index is present on (cisid, afid, ccid). The schema for @tempTable is (afid INT, ccid INT); it is populated from the input received by the stored proc and can never have duplicate entries. StylinInfoTemp may have duplicate (afid, ccid) values but is unique on (cisid, afid, ccid); the original post includes a screenshot showing duplicate afid values. The stored procedure's @tempTable, however, will always have unique afid values.

When this stored proc executes, I am getting a deadlock on the same page (screenshot in the original post). I am having trouble understanding how exactly this leads to a deadlock, and why the same page id appears for both processes. Can someone help me understand why the deadlock occurs, why the page id is the same, how two processes can hold and request the same page id at the same time, and how I can resolve this?

Deadlock XML: https://codebeautify.org/xmlviewer/y228f34a3

Query plan:

    DELETE FROM StylinInfoTemp FROM @tempTable AS temp
    WHERE temp.afid = StylinInfoTemp.afid AND temp.ccid = StylinInfoTemp.ccid

    |--Clustered Index Delete(OBJECT:([Magneto].[dbo].[StylinInfoTemp].[StylinInfoTemp_cis_afid_ccid_PK]))
       |--Nested Loops(Left Semi Join, WHERE:(@tempTable.[afid] as [temp].[afid]=[Magneto].[dbo].[StylinInfoTemp].[afId] AND @temp.[ccid] as [temp].[ccid]=[Magneto].[dbo].[StylinInfoTemp].[ccid]))
          |--Clustered Index Scan(OBJECT:([Magneto].[dbo].[StylinInfoTemp].[StylinInfoTemp_cis_afid_ccidd_PK]))
          |--Table Scan(OBJECT:(@tempTable AS [temp]))
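Given the Clustered Index Scan in that plan, one plausible reading (an assumption, not a verified diagnosis): each session scans the whole clustered index taking exclusive locks as it deletes, so two concurrent runs can each hold a page the other still needs, and the same page id shows up on both sides of the deadlock. A sketch of the usual mitigation, giving the DELETE a seek path so each session only locks its own rows:

```sql
CREATE NONCLUSTERED INDEX IX_StylinInfoTemp_afid_ccid
    ON dbo.StylinInfoTemp (afid, ccid);
```

If the optimizer then seeks on (afid, ccid) instead of scanning, the two sessions' lock footprints no longer overlap unless their inputs genuinely do.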
Himanshuman (197 rep)
Jul 14, 2022, 07:20 AM • Last activity: May 4, 2025, 10:03 PM
2 votes
2 answers
65 views
MariaDB deadlock with transactional outbox pattern
I have a deadlock in 10.6.21-MariaDB-ubu2004. Here's my schema:
CREATE TABLE INT_CHANNEL_MESSAGE
(
    MESSAGE_ID       CHAR(36)     NOT NULL,
    GROUP_KEY        CHAR(36)     NOT NULL,
    CREATED_DATE     BIGINT       NOT NULL,
    MESSAGE_PRIORITY BIGINT,
    MESSAGE_SEQUENCE BIGINT       NOT NULL AUTO_INCREMENT UNIQUE,
    MESSAGE_BYTES    BLOB,
    REGION           VARCHAR(100) NOT NULL,
    PRIMARY KEY (REGION, GROUP_KEY, CREATED_DATE, MESSAGE_SEQUENCE)
) ENGINE = InnoDB;

CREATE INDEX INT_CHANNEL_MSG_DELETE_IDX ON INT_CHANNEL_MESSAGE (REGION, GROUP_KEY, MESSAGE_ID);
Initially, I have a single row in the table. Let's assume the values for each row are unique, only REGION and GROUP_KEY are fixed. Transaction #1 inserts two rows using separate INSERT statements:
INSERT into INT_CHANNEL_MESSAGE(
	MESSAGE_ID,
	GROUP_KEY,
	REGION,
	CREATED_DATE,
	MESSAGE_PRIORITY,
	MESSAGE_BYTES)
values (?, ?, ?, ?, ?, ?)
The first insert is executed, then the transaction hangs. The isolation level for Transaction #1 is REPEATABLE_READ (although I tried changing it to READ_COMMITTED). Transaction #2 starts immediately after the first insert of Transaction #1 is executed (it is triggered by the application); its isolation level is set to READ_COMMITTED. The initial row is selected for update, then the transaction hangs on the DELETE call:
SELECT INT_CHANNEL_MESSAGE.MESSAGE_ID, INT_CHANNEL_MESSAGE.MESSAGE_BYTES
	from INT_CHANNEL_MESSAGE
	where INT_CHANNEL_MESSAGE.GROUP_KEY = ? and INT_CHANNEL_MESSAGE.REGION = ?
order by CREATED_DATE, MESSAGE_SEQUENCE LIMIT 1 FOR UPDATE SKIP LOCKED

DELETE from INT_CHANNEL_MESSAGE where MESSAGE_ID=? and GROUP_KEY=? and REGION=?
SHOW ENGINE INNODB STATUS output:
=====================================
2025-04-04 16:06:44 0x7fc6241b3700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 21 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 2376 srv_idle
srv_master_thread log flush and writes: 2376
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 1646
Purge done for trx's n:o < 1646 undo n:o < 0 state: running
History list length 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 1643, ACTIVE 31 sec
2 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MariaDB thread id 76, OS thread handle 140489014748928, query id 8535 172.21.0.1 nbs 
---TRANSACTION 1640, ACTIVE 31 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
MariaDB thread id 75, OS thread handle 140488995890944, query id 8537 172.21.0.1 nbs Updating
SET STATEMENT SQL_SELECT_LIMIT=1 FOR DELETE from INT_CHANNEL_MESSAGE where MESSAGE_ID='ce0ce618-2430-0b4c-727b-7250e5388f15' and GROUP_KEY='cb18446f-633c-3a46-b5ac-95ab539126d1' and REGION='DEFAULT'
------- TRX HAS BEEN WAITING 31293172 us FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 55 page no 3 n bits 320 index PRIMARY of table nbs_biometric.INT_CHANNEL_MESSAGE trx id 1640 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
 0: len 7; hex 44454641554c54; asc DEFAULT;;
 1: len 30; hex 63623138343436662d363333632d336134362d623561632d393561623533; asc cb18446f-633c-3a46-b5ac-95ab53; (total 36 bytes);
 2: len 8; hex 80000196018d82b2; asc         ;;
 3: len 8; hex 8000000000000012; asc         ;;
 4: len 6; hex 00000000066b; asc      k;;
 5: len 7; hex bf000001410110; asc     A  ;;
 6: len 30; hex 63623533326436662d343362352d393164352d636561612d623965616434; asc cb532d6f-43b5-91d5-ceaa-b9ead4; (total 36 bytes);
 7: SQL NULL;
 8: len 30; hex aced0005737200346f72672e737072696e676672616d65776f726b2e6d65; asc     sr 4org.springframework.me; (total 1252 bytes);

------------------
---TRANSACTION (0x7fc6388d7180), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
166 OS file reads, 332 OS file writes, 594 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 891776
Log flushed up to   891776
Pages flushed up to 42676
Last checkpoint at  42664
0 pending log flushes, 0 pending chkp writes
334 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 167772160
Dictionary memory allocated 931712
Buffer pool size   8112
Free buffers       7349
Database pages     763
Old database pages 261
Modified db pages  621
Percent of dirty pages(LRU & free pages): 7.654
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 152, created 611, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 763, unzip_LRU len: 0
I/O sum:cur, unzip sum:cur
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 63, updated 0, deleted 18, read 1182
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
The deadlock does not happen when Transaction #1 inserts a single row. How can I prevent it? I'm attaching application logs for wider context:
2025-04-04 18:06:13,426 [tx1]: TX START
2025-04-04 18:06:13,426 [tx1]: Executing prepared SQL update
2025-04-04 18:06:13,426 [tx1]: Executing prepared SQL statement [INSERT into INT_CHANNEL_MESSAGE(
	MESSAGE_ID,
	GROUP_KEY,
	REGION,
	CREATED_DATE,
	MESSAGE_PRIORITY,
	MESSAGE_BYTES)
values (?, ?, ?, ?, ?, ?)
]
2025-04-04 18:06:13,427 [tx2]: Executing prepared SQL query
2025-04-04 18:06:13,427 [tx2]: Executing prepared SQL statement [	SELECT INT_CHANNEL_MESSAGE.MESSAGE_ID, INT_CHANNEL_MESSAGE.MESSAGE_BYTES
	from INT_CHANNEL_MESSAGE
	where INT_CHANNEL_MESSAGE.GROUP_KEY = ? and INT_CHANNEL_MESSAGE.REGION = ?
order by CREATED_DATE, MESSAGE_SEQUENCE LIMIT 1 FOR UPDATE SKIP LOCKED]
2025-04-04 18:06:13,429 [tx2]: Executing prepared SQL update
2025-04-04 18:06:13,429 [tx2]: Executing prepared SQL statement [DELETE from INT_CHANNEL_MESSAGE where MESSAGE_ID=? and GROUP_KEY=? and REGION=?]
2025-04-04 18:07:03,430  [tx2]: Error: 1205-HY000: Lock wait timeout exceeded; try restarting transaction
2025-04-04 18:07:03,432 [tx2]: Extracted SQL state class 'HY' from value 'HY000'
2025-04-04 18:07:03,434 [tx1]: Executing prepared SQL update
2025-04-04 18:07:03,434 [tx1]: Executing prepared SQL statement [INSERT into INT_CHANNEL_MESSAGE(
	MESSAGE_ID,
	GROUP_KEY,
	REGION,
	CREATED_DATE,
	MESSAGE_PRIORITY,
	MESSAGE_BYTES)
values (?, ?, ?, ?, ?, ?)
]
2025-04-04 18:07:03,435 [tx2]: Resetting isolation level of JDBC Connection [HikariProxyConnection@1449683964 wrapping org.mariadb.jdbc.Connection@5b1420f9] to 4
2025-04-04 18:07:03,436  [tx1]: TX END
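One hypothesis to test (an assumption based on the waits above, not a confirmed diagnosis): the DELETE identifies its row via the secondary index INT_CHANNEL_MSG_DELETE_IDX and then has to lock the PRIMARY KEY record, while the concurrent inserts hold locks on that same clustered index. Deleting by the full primary key that the polling SELECT already locked keeps both statements on one index path:

```sql
-- Sketch: return the whole primary key from the polling query ...
SELECT MESSAGE_ID, MESSAGE_BYTES, CREATED_DATE, MESSAGE_SEQUENCE
FROM INT_CHANNEL_MESSAGE
WHERE GROUP_KEY = ? AND REGION = ?
ORDER BY CREATED_DATE, MESSAGE_SEQUENCE
LIMIT 1 FOR UPDATE SKIP LOCKED;

-- ... and delete by it instead of by MESSAGE_ID:
DELETE FROM INT_CHANNEL_MESSAGE
WHERE REGION = ? AND GROUP_KEY = ?
  AND CREATED_DATE = ? AND MESSAGE_SEQUENCE = ?;
```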
Tomek Stankowski (41 rep)
Apr 4, 2025, 05:07 PM • Last activity: Apr 8, 2025, 02:57 PM
1 vote
0 answers
94 views
SQL Server: "cannot bulk load" error during delete
I am having a very *interesting* error using SQL Server 2022 and 2017. I am currently implementing my jurisdiction's data retention policies, which means doing a lot of deletes. I have run into a strange situation where a single, specific delete statement is causing an error that is leaving me stuck. Because of the complexity of the code, the fact that I tracked the error down to a single delete statement, and the fact that the error still happens when I do the simplest possible (bulk) delete, I won't be posting much code. I do hope that the reader will suggest strategies for approaching my problem. Unfortunately, I don't have very much to work with: the error message doesn't tell me the relevant entities, so I am stuck guessing, and because there is an error, I don't get to see the execution plan for the delete that fails.

In short, I have a table called dbo.Person. It has SYSTEM_VERSIONING set to ON, and the corresponding history table is called dbo.PersonHistory. dbo.PersonHistory, incidentally, has an index on it, and dbo.Person also has a couple of indexes. In the course of investigating my real issue, I simplified my buggy delete statement down to:

    delete from dbo.Person;

and I get the error:

> Cannot bulk load. The bulk data stream was incorrectly specified as sorted or the data violates a uniqueness constraint imposed by the target table. Sort order incorrect for the following two rows: primary key of first row: (2025-03-18 23:20:18.5852411, 2021-07-02 18:25:01.9834390, redacted ID, 0), primary key of second row: (2025-03-18 23:20:18.5852411, 2021-07-02 03:31:22.7446737, redacted ID, 1)

It is not clear to me what the "redacted ID" is supposed to correspond to, since the error does not tell me what the target table is. But I examined all the indexes and similar on both the main and history tables and *guess* that these values correspond to values stored in the index on the history table, which would actually be a PersonId, the PK of the dbo.Person table. *If* this is true, my guess is that the delete has to make changes to the history table's index, and that change is breaking a constraint. But if I search for a dbo.Person row with that redacted ID, I find a single row, and there are no rows in the history table for that person.

Does anybody have any suggestions for investigation? The error is so uninformative, and the lack of an actual execution plan makes the issue hard to debug. Thank you!

It appears there are two indexes on the history table:

    CREATE NONCLUSTERED INDEX [IX_PersonHistory_ID_PERIOD_COLUMNS_I]
        ON [dbo].[PersonHistory]([SysEndTime] ASC, [SysStartTime] ASC, [PersonId] ASC)
        WITH (DATA_COMPRESSION = PAGE);
    GO

    CREATE CLUSTERED COLUMNSTORE INDEX [CSIX_PersonHistory]
        ON [dbo].[PersonHistory];

I dropped both indexes on the history table (in my dev database) and the error occurred again.
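Since the message never names its target, one low-risk first step is to enumerate every index key on both tables, in key order, and compare against the quoted tuple shape (timestamp, timestamp, ID, 0/1), which resembles the nonclustered history index keys plus a uniquifier. A sketch against the standard catalog views:

```sql
SELECT o.name AS table_name, i.name AS index_name, i.type_desc,
       ic.key_ordinal, c.name AS column_name
FROM sys.indexes AS i
JOIN sys.objects AS o ON o.object_id = i.object_id
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE o.name IN ('Person', 'PersonHistory')
ORDER BY o.name, i.index_id, ic.key_ordinal;
```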
nomen (113 rep)
Mar 19, 2025, 04:43 PM • Last activity: Mar 19, 2025, 09:33 PM
-1 votes
1 answer
74 views
SQL Server does not reduce the size of MDF and LDF files even though the file shrink process has been carried out
I am a SQL Server 2017 user. I deleted a table that had too many records, but after the delete the size of the MDF and LDF files did not decrease; it increased. I have shrunk the files, but the file size remains the same. What should I do to change the file size? I want the MDF file size to return to normal, in line with the records that remain.
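For reference, shrinking works per file and takes the *logical* file names, which often differ from the .mdf/.ldf file names. A sketch, with database and logical file names assumed:

```sql
USE MyDb;
-- Look up the logical names and current sizes first (size is in 8 KB pages):
SELECT name, size / 128 AS size_mb FROM sys.database_files;

DBCC SHRINKFILE (MyDb_data, 1024);  -- shrink the data file toward 1024 MB
DBCC SHRINKFILE (MyDb_log, 256);    -- the log only shrinks past inactive VLFs;
                                    -- under FULL recovery, back up the log first
```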
Rafardhan Athalla (1 rep)
Mar 14, 2025, 09:55 AM • Last activity: Mar 14, 2025, 08:57 PM
0 votes
1 answer
78 views
SQL Server: How to implement 3-way relationship with cascades
This is the schema I'm working with (DBML, which can be recreated in https://dbdiagram.io/d):

    Table University {
      ID integer [primary key]
    }

    Table Professor {
      ID integer [primary key]
      UniversityID integer [primary key, ref: > University.ID]
    }

    Table Class {
      ID integer [primary key]
      UniversityID integer [primary key, ref: > University.ID]
      ProfessorID integer [null]
    }

    Ref: Class.(ProfessorID, UniversityID) - Professor.(ID, UniversityID)

My goals:

- A University can have multiple Classes and Professors.
- A Class can have one or no Professors.
- Deleting a University deletes all of its Professors and Classes.
- **Deleting a Professor sets Class.ProfessorID to NULL.**

The last goal is causing us problems. SQL Server prevents adding another cascading foreign key constraint due to multiple cascade paths, and a trigger can't work because:

1) the delete query is rejected due to the foreign key constraint on Professor, so a FOR DELETE trigger will never run

2) SQL Server cannot create an INSTEAD OF DELETE trigger because of the cascading relationship to University

How can I achieve this?
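One way around the multiple-cascade-paths restriction, sketched below, is to drop the declarative SET NULL and do it manually inside one transaction (here as a stored procedure) while keeping the plain foreign key for integrity:

```sql
CREATE PROCEDURE dbo.DeleteProfessor
    @ProfessorID int,
    @UniversityID int
AS
BEGIN
    SET XACT_ABORT ON;
    BEGIN TRANSACTION;

    -- Manual equivalent of ON DELETE SET NULL:
    UPDATE dbo.Class
    SET ProfessorID = NULL
    WHERE ProfessorID = @ProfessorID AND UniversityID = @UniversityID;

    DELETE FROM dbo.Professor
    WHERE ID = @ProfessorID AND UniversityID = @UniversityID;

    COMMIT;
END;
```

Deleting a University can still cascade through both tables as before.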
Carl Schmidt (3 rep)
Jan 23, 2025, 12:48 AM • Last activity: Jan 23, 2025, 10:42 AM
10 votes
3 answers
25011 views
Delete rows older than x days without locking table
We have a quite big MS SQL database with millions of rows. I created a simple script to delete rows older than 1 month, but it seems to lock the table and creates trouble for the application. The table has an indexed "ID" PK and also a "date" column, which I will use for this task. What is the best way to do this without causing the lock? I was considering partitioning, but I'm not sure it's the best way to go. Thanks in advance.
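The standard low-impact pattern is a batched delete: each small batch holds its row and page locks only briefly and stays under the roughly-5000-lock threshold at which SQL Server escalates to a table lock. A sketch, with the table name assumed and the "date" column from the question:

```sql
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (4000) FROM dbo.BigTable
    WHERE [date] < DATEADD(month, -1, GETDATE());
    SET @rows = @@ROWCOUNT;
END;
```

An index on [date] keeps each batch a seek instead of a scan.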
DataViking (101 rep)
Feb 4, 2021, 08:48 AM • Last activity: Jan 16, 2025, 08:42 PM
2 votes
2 answers
1518 views
How to delete a row with lots of foreign key references
Table B has several thousand records that reference table A with a foreign key. The application sends a request to delete a row from table A. It's important that the synchronous operation be immediate and not at risk of timeouts. If a cascade delete is used, and thus several thousand records are deleted as well, could the deletion take a long time (i.e. not be immediate)? Assuming it could lead to a long deletion operation, what could be done as an alternative? I've thought about marking the record as deleted and performing the actual deletion in some background process (where the referencing rows would be deleted before the parent), but this approach feels rather error-prone, since all of the existing queries would need to reference this new markedAsDeleted column. Any ideas?
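A sketch of the mark-then-purge idea from the question (T-SQL syntax; table and column names assumed). The synchronous part touches one row; a background job works through the children in batches, children before parent:

```sql
ALTER TABLE table_a ADD deleted_at datetime NULL;

-- Request path: immediate, single-row.
UPDATE table_a SET deleted_at = GETDATE() WHERE id = 123;

-- Background job, repeated until no rows remain:
DELETE TOP (5000) b
FROM table_b AS b
INNER JOIN table_a AS a ON a.id = b.a_id
WHERE a.deleted_at IS NOT NULL;

DELETE FROM table_a
WHERE deleted_at IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.a_id = table_a.id);
```

A filtered view (or adding `deleted_at IS NULL` to the hot queries) limits how invasive the new column is.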
CuriousGeorge (21 rep)
Jun 30, 2019, 03:06 PM • Last activity: Jan 5, 2025, 11:04 PM
3 votes
1 answer
489 views
How do nonclustered columnstore indexes in SQL Server handle linked updates and deletes with rowstore tables (either heap or clustered index tables)?
I have read several articles and blog posts describing that columnstore indexes store all fields redundantly, including the row IDs from the underlying rowstore table (either the RID for heaps or the clustered index keys for clustered tables). Some articles mention that within a columnstore, each data row gets its own unique identifier internal to the columnstore index, and a delete bitmap or delete buffer uses this identifier (rowgroup ID and row number within the rowgroup) to mark specific rows as deleted. However, these sources do not provide detailed explanations of how the columnstore index correlates an updated or deleted row in the rowstore table to the corresponding row in the columnstore index. If the process truly relies on the rowstore-based identifiers (like the RID or clustered key), it seems impractical given that the columnstore index stores its own structure of data.

References reviewed:

- https://aboutsqlserver.com/2019/10/10/sql-server-2016-2017-critical-columnstore-index-bug-has-been-fixed/
- https://aboutsqlserver.com/2017/11/28/splitting-and-merging-non-empty-partitions-in-columnstore-indexes/
- https://sqlserverfast.com/epr/table-update/
- https://aboutsqlserver.com/2013/10/15/sql-server-storage-engine-data-pages-and-data-rows/
- https://www.mssqltips.com/sqlservertip/7759/sql-server-columnstore-indexes-tuple-mover-background-process/
- https://www.sqlshack.com/sql-server-2017-columnstore-in-place-updates/
- http://www.nikoport.com/columnstore/

Could someone explain in detail the actual mechanism SQL Server uses to tie a row in the columnstore index to the corresponding row in the rowstore table for the purpose of correct and consistent updates and deletes, and how these operations are performed behind the scenes? I've tried using the DBCC PAGE command to inspect the data pages, but the only concrete information I've found is that the delete bitmap stores a unique 9-byte identifier for each row. I still don't understand how these identifiers map back to the corresponding row in the original rowstore table, or how the internal linking mechanism works for updates and deletes. Could someone provide a detailed explanation of this process?
Steven (31 rep)
Dec 10, 2024, 02:25 PM • Last activity: Dec 12, 2024, 11:44 AM
0 votes
1 answer
37 views
How to handle the deletion of records with "shared ownership"?
Consider the following scenario: in a trading card game, a user may offer to trade their cards. The offer can be public (any user can accept; fk_receiver_id is NULL on insert) or private (only one user can accept; fk_receiver_id is set on insert). So we have a junction table, TradeOffer (diagram in the original post). I want to allow users to delete their accounts. I would be happy to cascade-delete most things, but the trades have two users associated, so I can't just delete everything. I could make fk_offerer_id nullable and use ON DELETE SET NULL, but I don't want to allow NULL values on insert. Should I make fk_offerer_id nullable anyway? Is there a better solution? The behavior I'm looking for when deleting is:

- If fk_receiver_id is set, set fk_offerer_id to NULL or another value that signals "deleted user"
- If fk_receiver_id is NULL (trade is pending/cancelled), delete the record

Is there a way to handle this in the db? (Ideally without triggers.)
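A sketch of one trigger-free arrangement (PostgreSQL syntax; table names assumed): keep fk_offerer_id nullable with ON DELETE SET NULL for the completed-trade case, and handle the pending-offer case explicitly in the account-deletion transaction:

```sql
-- Account deletion, assuming trade_offer.fk_offerer_id REFERENCES app_user(id)
-- ON DELETE SET NULL:
BEGIN;
-- Pending/cancelled offers (no receiver yet) simply disappear with the account.
DELETE FROM trade_offer
WHERE fk_offerer_id = 42 AND fk_receiver_id IS NULL;

-- Completed trades survive; the FK then sets fk_offerer_id to NULL.
DELETE FROM app_user WHERE id = 42;
COMMIT;
```

The database can't force fk_offerer_id to be NOT NULL only at insert time, but the application's INSERT path can simply never supply NULL.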
Fullaccess (1 rep)
Nov 27, 2024, 10:33 PM • Last activity: Nov 28, 2024, 01:39 PM
0 votes
2 answers
6216 views
How to delete records without a primary key in a standard way (i.e. not using ctid)
# Case of use

We have a log table without a PK, with two columns (date, event). You inserted a row; then you want to undo the insert. Is there a way to delete that row (without deleting other rows with the same data) that doesn't use Postgres-only capabilities? I want a solution that works in another database (it doesn't need to be totally standard; working in just one other database is enough: SQLite, Oracle, MySQL or SQL Server).

## Example:

    create table the_log(
      date date,
      event_id integer
    );

    insert into the_log(date, event_id) values ('2019-09-21',1),('2019-09-21',1);

    select * from the_log;

My attempts:

    delete from the_log where row_number() over ()=1;

    delete from the_log limit 1;

    with the_log_2 as (select *, row_number() over () as version from the_log)
    delete from the_log_2 where version=1;

I suppose that the answer is **No**. I want to know if I am wrong, or in what documentation I can read that I am right.
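For this exact case (exact duplicates, no PK) there is a fully standard workaround, sketched here: delete every copy and re-insert one fewer.

```sql
BEGIN;
DELETE FROM the_log WHERE date = DATE '2019-09-21' AND event_id = 1;  -- removes both copies
INSERT INTO the_log (date, event_id) VALUES (DATE '2019-09-21', 1);   -- puts one back
COMMIT;
```

It rewrites all n duplicates in order to remove one, but it uses nothing beyond SQL-standard DML.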
Emilio Platzer (465 rep)
Sep 21, 2018, 06:31 PM • Last activity: Nov 22, 2024, 11:04 PM
5 votes
1 answer
892 views
Query to delete records with lower eff_date in a large table with 400 million records
I have a table with the structure below:

    create table TEST_TAB
    (
      activity_type CHAR(1),
      tracking_code NUMBER,
      eff_date      DATE
    )

Sample data for this table:

    insert into TEST_TAB (activity_type, tracking_code, eff_date) values ('A', 1, to_date('01-11-2020', 'dd-mm-yyyy'));
    insert into TEST_TAB (activity_type, tracking_code, eff_date) values ('A', 1, to_date('02-01-2024', 'dd-mm-yyyy'));
    insert into TEST_TAB (activity_type, tracking_code, eff_date) values ('B', 2, to_date('01-08-2023', 'dd-mm-yyyy'));
    insert into TEST_TAB (activity_type, tracking_code, eff_date) values ('B', 2, to_date('02-08-2023', 'dd-mm-yyyy'));
    insert into TEST_TAB (activity_type, tracking_code, eff_date) values ('B', 2, to_date('03-08-2023', 'dd-mm-yyyy'));

This is just sample data; the real table holds nearly 400 million records. What I need to do is that for each group of (activity_type, tracking_code), I need to keep the record with the highest eff_date and delete the rest. So for activity_type = 'A' and tracking_code = 1, I need to keep the one with eff_date = 02-01-2024 and delete the other one. What I have for now is the query below:

    delete from test_tab
    where rowid in (select rid
                    from (select rowid as rid,
                                 row_number() over(partition by activity_type, tracking_code order by eff_date desc) as row_num
                          from test_tab)
                    where row_num > 1)

However this seems very slow. Could you suggest any better solution? The original table is partitioned on eff_date and has an index on the other two columns. Another point is that there might be more than a year's gap between the eff_dates of the records in a single group. Thanks in advance
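With most of 400M rows being deleted, a keep-and-swap rebuild is often faster than any DELETE. A sketch in Oracle syntax with the question's names (indexes, partitioning and grants must be recreated on the new table):

```sql
CREATE TABLE test_tab_keep AS
SELECT activity_type, tracking_code, eff_date
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY activity_type, tracking_code
                                ORDER BY eff_date DESC) AS rn
      FROM test_tab t)
WHERE rn = 1;
-- Then: DROP TABLE test_tab; RENAME test_tab_keep TO test_tab;
```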
Pantea (1510 rep)
Nov 11, 2024, 01:10 PM • Last activity: Nov 16, 2024, 09:37 AM
0 votes
0 answers
34 views
How to split up this multi-table transaction's CRUD operations?
I currently have a process that does the following:

    START TRANSACTION
      DELETE FROM Foo WHERE id=123;      -- id is the PK
      DELETE FROM Bar WHERE foo_id=123;  -- multirow delete
      INSERT INTO FooBar ...             -- the row we deleted
      -- application-level logic
    END TRANSACTION

I want to find a way to not do it all in one blocking transaction. I was thinking of adding columns to indicate whether a row has been deleted and running another process for cleanup, but I was wondering if there is another way that perhaps MySQL supports.
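A sketch of the flag-then-cleanup idea from the question (the `deleted` column, and the assumption that FooBar mirrors Foo's shape, are mine): the request path becomes one short single-row transaction, and a background job finishes the work in separate short transactions.

```sql
ALTER TABLE Foo ADD COLUMN deleted TINYINT NOT NULL DEFAULT 0;

-- Request path:
UPDATE Foo SET deleted = 1 WHERE id = 123;

-- Background cleanup, each statement its own short transaction:
INSERT INTO FooBar SELECT * FROM Foo WHERE id = 123 AND deleted = 1;
DELETE FROM Bar WHERE foo_id = 123;
DELETE FROM Foo WHERE id = 123 AND deleted = 1;
```

Readers must then filter on `deleted = 0`, which is the main cost of this design.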
Jim (123 rep)
Oct 21, 2024, 06:46 PM
16 votes
2 answers
8426 views
Is the system column "ctid" legitimate for identifying rows to delete?
I have a table with hundreds of millions of rows that I need to delete data from. The existing indexes are not efficient for the delete itself. I can, however, use the existing indexes to find the rows to delete by using their ctid values:

    DELETE FROM calendar_event
    WHERE ctid IN (SELECT ctid FROM calendar_event WHERE user_id = 5 LIMIT 100 FOR UPDATE)

What are the risks of relying on the ctid in this case? My worst-case scenario is deleting the wrong row.
ruelloehr (261 rep)
Aug 8, 2018, 05:54 PM • Last activity: Oct 20, 2024, 11:31 PM
37 votes
2 answers
42945 views
DELETE rows which are not referenced in another table
I have two tables in a PostgreSQL 9.3 database: table link_reply has a foreign key named which_group pointing to table link_group. I want to delete all rows from link_group for which no related row in link_reply exists. Sounds basic enough, but I've been struggling with it. Will it be something simple like this (not working)?

    DELETE FROM link_group WHERE link_reply = NULL;
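The anti-join the question is reaching for is usually written with NOT EXISTS; `= NULL` can never match, because comparisons with NULL evaluate to unknown rather than true. A sketch (the name of link_group's PK column is an assumption):

```sql
DELETE FROM link_group lg
WHERE NOT EXISTS (
    SELECT 1
    FROM link_reply lr
    WHERE lr.which_group = lg.id
);
```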
Hassan Baig (2079 rep)
Apr 3, 2016, 02:15 PM • Last activity: Sep 25, 2024, 01:48 AM
47 votes
5 answers
63552 views
I need to run VACUUM FULL with no available disk space
I have one table that is taking up close to 90% of the disk space on our server. I have decided to drop a few columns to free up space, but I need to return the space to the OS. The problem, though, is that I'm not sure what will happen if I run VACUUM FULL when there is not enough free space to make a copy of the table. I understand that VACUUM FULL should not generally be used, but I figured it was the best option in this scenario. Any ideas would be appreciated. I'm using PostgreSQL 9.0.6.
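For sizing the risk: VACUUM FULL writes a complete new copy of the table and rebuilds its indexes before dropping the old files, so roughly the post-vacuum size must be free on disk. A sketch for checking the current footprint (table name assumed):

```sql
SELECT pg_size_pretty(pg_relation_size('big_table'))       AS table_only,
       pg_size_pretty(pg_total_relation_size('big_table')) AS with_indexes_and_toast;
```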
Justin Rhyne (573 rep)
Apr 25, 2012, 08:31 PM • Last activity: Aug 9, 2024, 10:09 AM
0 votes
1 answer
106 views
PostgreSQL: Approach for aggregating records and upsert vs delete and insert
We have a file processing service that extracts some device configuration information from daily files and stores it in PostgreSQL tables. We have two layers of tables:

- The loading tables, which contain records for every single day
- The compiled tables, which store aggregated records based on a hash and date_from to show a timeline of configuration changes. The hash includes all the fields except the primary key.

Following is a sample of the tables:

    CREATE TABLE public.logger_main_config_loading (
        id text NOT NULL,
        hash text NOT NULL,
        logger_oem_id text NOT NULL,
        logger_model_id text,
        logger_serial_number text NOT NULL,
        logger_id text,
        logger_name text,
        date_from timestamp without time zone NOT NULL,
        date_to timestamp without time zone,
        encryption_pin_or_key text,
        enclosure_lock_details text,
        data_transfer_details text,
        offset_from_utc_hrs numeric,
        sampling_rate_sec integer,
        averaging_period_minutes integer,
        clock_is_auto_synced boolean,
        notes text,
        update_at timestamp with time zone DEFAULT CURRENT_TIMESTAMP,
        updated_by uuid,
        measurement_location_uuid uuid NOT NULL,
        logger_firmware_version text
    );

This table has the following primary key:

    ALTER TABLE ONLY public.logger_main_config_loading
        ADD CONSTRAINT logger_main_config_loading_pkey
        PRIMARY KEY (measurement_location_uuid, id, hash, date_from);

and the following indexes:

    CREATE INDEX idx_logger_main_config_loading
        ON public.logger_main_config_column_name_loading
        USING btree (measurement_location_uuid, logger_main_config_loading_id, logger_main_config_loading_hash, logger_main_config_loading_date_from);

    CREATE INDEX idx_main_config_loading_measurement_location_uuid
        ON public.logger_main_config_loading
        USING btree (measurement_location_uuid);

The logger_main_config_compiled table has exactly the same structure, but stores records aggregated by id, hash and date_from. Here is a sample query that we use to get aggregated records from the loading table and insert them into the compiled table:

    SELECT log_main_load_1.*, log_main_load_2.date_to AS compiled_date_to
    FROM logger_main_config_loading AS log_main_load_1
    INNER JOIN (
        SELECT id, hash, measurement_location_uuid,
               min(date_from) AS date_from,
               max(date_to) AS date_to
        FROM logger_main_config_loading
        WHERE measurement_location_uuid = %(measurement_location_uuid)s
        GROUP BY id, hash, measurement_location_uuid
    ) AS log_main_load_2
        ON log_main_load_1.id = log_main_load_2.id
        AND log_main_load_1.hash = log_main_load_2.hash
        AND log_main_load_1.measurement_location_uuid = log_main_load_2.measurement_location_uuid
        AND log_main_load_1.date_from = log_main_load_2.date_from
    WHERE log_main_load_1.measurement_location_uuid = %(measurement_location_uuid)s;

We are planning to do away with these queries, do the aggregation purely in Python code, and insert the records into the compiled table, because some scenarios are not handled. For example, if there are already records in the loading table from a particular date_from, and a file before that date is processed (a new date_from earlier than the current min(date_from) in the table), the compiled table ends up with a new record, since the different date_from yields a different primary key; what we actually need is for the existing record's date_from to be updated when the hash has not changed. Hence, I have the following questions:

- What is the cleanest approach to achieve this? Is it a recommended practice to delete the compiled table's records each time and re-insert them after aggregating them in code for each measurement_location_uuid? I do understand that indexes get updated each time records are deleted; however, this table may not even reach a million records.
- Based on the table structure, are there any other performance considerations, or indexes that need to be added or removed?
- We expect idempotency in this implementation: even if a file is added again, there should be no effect on the compiled table. In this case delete-and-rewrite may be unnecessary.

I would really appreciate some suggestions so that this can be implemented in a less complex and more efficient manner. Thank you in advance.
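On the idempotency question, one sketch (assumptions: the compiled table's key is changed to the natural identity, with a unique index on (measurement_location_uuid, id, hash), and the other NOT NULL columns are omitted here for brevity): an upsert lets an earlier-dated file simply widen the existing timeline row instead of creating a duplicate.

```sql
INSERT INTO logger_main_config_compiled AS c
       (measurement_location_uuid, id, hash, date_from, date_to)
VALUES (%(measurement_location_uuid)s, %(id)s, %(hash)s, %(date_from)s, %(date_to)s)
ON CONFLICT (measurement_location_uuid, id, hash)
DO UPDATE SET date_from = LEAST(c.date_from, EXCLUDED.date_from),
              date_to   = GREATEST(c.date_to, EXCLUDED.date_to);
```

Re-processing the same file then changes nothing, which gives the idempotency you want without delete-and-rewrite.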
ShwetaJ (153 rep)
Jul 30, 2024, 02:07 PM • Last activity: Aug 6, 2024, 08:49 PM