Query to delete records with lower eff_date in a large table with 400 million records
I have a table with below structure:
create table TEST_TAB
(
  activity_type CHAR(1),
  tracking_code NUMBER,
  eff_date      DATE
)
Sample data for this table:
insert into TEST_TAB (activity_type, tracking_code, eff_date)
values ('A', 1, to_date('01-11-2020', 'dd-mm-yyyy'));
insert into TEST_TAB (activity_type, tracking_code, eff_date)
values ('A', 1, to_date('02-01-2024', 'dd-mm-yyyy'));
insert into TEST_TAB (activity_type, tracking_code, eff_date)
values ('B', 2, to_date('01-08-2023', 'dd-mm-yyyy'));
insert into TEST_TAB (activity_type, tracking_code, eff_date)
values ('B', 2, to_date('02-08-2023', 'dd-mm-yyyy'));
insert into TEST_TAB (activity_type, tracking_code, eff_date)
values ('B', 2, to_date('03-08-2023', 'dd-mm-yyyy'));
This is just sample data; the real table holds nearly 400 million records. For each (activity_type, tracking_code) group, I need to keep the record with the highest eff_date and delete the rest. For example, for activity_type = 'A' and tracking_code = 1, I need to keep the row with eff_date = 02-01-2024 (dd-mm-yyyy) and delete the other one.
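To make the expected outcome concrete, the rows that should remain can be listed with a plain aggregate. This is only a sketch to illustrate the desired end state for the sample data, not part of the delete itself; it assumes nothing beyond the TEST_TAB definition above.

```sql
-- Preview of the rows that should survive the cleanup:
-- the latest eff_date for each (activity_type, tracking_code) group.
select activity_type,
       tracking_code,
       max(eff_date) as eff_date
  from test_tab
 group by activity_type, tracking_code
 order by activity_type, tracking_code;
```

For the five sample rows this leaves one row per group: ('A', 1) with 02-01-2024 and ('B', 2) with 03-08-2023 (both dd-mm-yyyy); the other three rows are the ones to delete.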
What I have so far is the query below:
delete from test_tab
 where rowid in (select rid
                   from (select rowid as rid,
                                row_number() over (partition by activity_type, tracking_code
                                                   order by eff_date desc) as row_num
                           from test_tab)
                  where row_num > 1)
However, this seems very slow. Could you suggest a better solution?
The original table is partitioned on eff_date and has an index on the other two columns.
Also note that the eff_date values of records within a single group may be more than one year apart.
Thanks in advance
Asked by Pantea
(1510 rep)
Nov 11, 2024, 01:10 PM
Last activity: Nov 16, 2024, 09:37 AM