I have an INNODB table that's > 93 million rows. A lot of the data is considered "temp" data and is governed by an "is_active" flag of 1/0. When a user updates a form, new data is written with an "is_active=1" and the previous active records are updated to "is_active=0".
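To make the write path concrete, it looks roughly like this (table name `tblData`, the `form_id` column, and the literal values are placeholders, not our real schema):

```sql
-- Sketch of the "new row active, old rows deactivated" pattern
START TRANSACTION;

-- Flip the previously active records off
UPDATE tblData
   SET is_active = 0
 WHERE form_id = 123
   AND is_active = 1;

-- Write the new submission as the active record
INSERT INTO tblData (form_id, a, b, c, is_active)
VALUES (123, 'x', 'y', 'z', 1);

COMMIT;
```

Over time this leaves the table mostly full of `is_active = 0` history rows, which is the "temp" data we wanted to move out.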
We wanted to move the active data to a new table to clean things up, so we ran a statement like:
INSERT INTO tblNew (a, b, c)
SELECT a, b, c FROM tblOld WHERE is_active=1
This ran overnight, and in the morning I noticed a bunch of processes backed up in SHOW PROCESSLIST, so I issued a KILL on the process ID. That started a ROLLBACK and brought the server down for another 10 hours... on a production box, of course.
I've been reading a lot about how to try to repair, etc., and have been doing that all day, but I'm wondering: is there any option I could have added to avoid the need for a rollback on failure? Or is there a strategy to commit or flush every X number of rows, etc.?
I was trying this...
INSERT INTO tblNew (a, b, c)
SELECT a, b, c FROM tblOld WHERE is_active=1 AND pkID > 0 AND pkID < 1000000
Where pkID is the primary key. I ran it in batches of 550k–1M rows, raising the PK range each run. There's an index on the PK and on is_active, yet each run got slower, from about 30 seconds to over 5 minutes by the time I was in the 20M range. Any idea why each run would take longer when it's the same number of rows of work?
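For reference, the batched pattern I was running looked roughly like this (the 1M step size is just what I happened to use; with autocommit on, each statement commits on its own, so a KILL only rolls back the current batch rather than the whole copy):

```sql
-- Batch 1
INSERT INTO tblNew (a, b, c)
SELECT a, b, c FROM tblOld
 WHERE is_active = 1 AND pkID >= 0       AND pkID < 1000000;

-- Batch 2
INSERT INTO tblNew (a, b, c)
SELECT a, b, c FROM tblOld
 WHERE is_active = 1 AND pkID >= 1000000 AND pkID < 2000000;

-- ...and so on, bumping the pkID range by 1M each run.
```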
So, in summary, two questions:
1) Can I do something to keep a huge rollback from happening if I stop the process?
2) Why did inserting the same number of items based on PK and indexed column take progressively longer per run?
Asked by Don
(103 rep)
May 15, 2015, 10:18 PM
Last activity: Apr 27, 2024, 06:14 PM