Sample Header Ad - 728x90

pt-archiver lost rows when cancelled in the middle and reinitiated

0 votes
0 answers
52 views
Lost some 430 rows from the original table. I was archiving 1 year old data (1st Apr 2022 to 31st Mar 2023) from one of my servers, (Server version: 5.7.44-log MySQL Community Server (GPL)) using pt-archiver (percona-toolkit-3.0.13-1.el7.x86_64). Because the server choked, because of less resources and the Linux machine was not responding and swapping a lot, I had to cancel pt-archiver in the middle after a lot of attempts to take a shell session. I waited for good 15 minutes or so and then had to kill the mysqld process, shutdown the EC2 machine, increase the hardware and then reinitiate the pt-archiver tool. The mysql server had innodb_flush_log_at_trx_commit=1 sync_binlog = 1 I was thinking that above config would save my data even if I cancel pt-archiver in the middle, but it doesn't seem so. SO before I initiated the archival, I had taken a count of rows in the said range and after archival, I had checked the count and it seems 430 rows are missing. Following are the command I used mysql --login-path=archiver --skip-column-names --silent SOURCE_DB -e "select count(*) from SOURCE_DB.SOURCE_TABLE where crt_tm between date_format('2022-04-01 00:00:00','%Y-%m-%d 00:00:00') and date_format('2023-03-31 23:59:59','%Y-%m-%d 00:00:00')" &>> pt-archiver-LOGFILE.log Here it showed 21713301 rows pt-archiver --source h=SOURCE_HOST,P=PORT,u=USER,p=PASS,D=SOURCE_DB,t=SOURCE_TABLE,i=idx_crt_tm --dest h=DEST_HOST,P=PORT,u=USER,p=PASS,D=ARCHIVE_DB,t=ARCHIVE_TABLE --where "crt_tm in (select crt_tm from (select crt_tm from SOURCE_TABLE where crt_tm between date_format('2022-04-01 00:00:00','%Y-%m-%d 00:00:00') and date_format('2023-03-31 23:59:59','%Y-%m-%d 00:00:00')) AS temp) " --charset=UTF8 --sleep 5 --progress 10000 --limit=5000 --txn-size 5000 --replace --bulk-insert --bulk-delete --statistics --purge &>> ${LOGFILE} **When I checked the SOURCE_TABLE, I have zero rows in that date range. and the final count of rows in the ARCHIVE_TABLE is 21712871.** Is it possible to lose data in such way? Is there any way to get the data back? **And yes once the row is inserted, in the database it is never updated or deleted, meaning that the data being archived does never get and updates or deletes on it. Did I miss any important option in the pt-archiver command?** Attached is an excerpt of the log file. 2024-10-22+20:22:04: Table SOURCE_TABLE ... Archival from 01-Apr-2022_00-00-00 till 31-Mar-2023_23-59-59 started. Archiving table from SOURCE_DB.SOURCE_TABLE ...... to 172.21.90.94 ARCHIVE_DB.ARCHIVE_TABLE. 2024-10-22 20:22:04 ------------------------------------------------------------------------------------------------ Follwoing are the total number of rows being archived. 21713301 ------------------------------------------------------------------------------------------------ First Run ------------------------------------------------------------------------------------------------ at /bin/pt-archiver line 4907. Exiting on SIGINT. # A software update is available: TIME ELAPSED COUNT 2024-10-22T20:22:14 0 0 2024-10-22T20:23:04 50 10000 2024-10-22T20:24:48 154 20000 ... ... ... ... 2024-10-23T00:21:10 14335 1360000 2024-10-23T00:21:10 14335 1360000 Started at 2024-10-22T20:22:14, ended at 2024-10-23T00:23:33 Source: A=UTF8,D=SOURCE_DB,P=3306,h=SOURCE_HOST,i=idx_crt_tm,p=...,t=SOURCE_TABLE,u=pt_archiver Dest: A=UTF8,D=SOURCE_DB_archive,P=3306,h=DEST_HOST,i=idx_crt_tm,p=...,t=ARCHIVE_TABLE,u=pt_archiver SELECT 1365000 INSERT 1360000 DELETE 1360000 Action Count Time Pct bulk_deleting 272 12986.3633 89.69 sleep 272 1360.7520 9.40 bulk_inserting 272 56.9958 0.39 select 273 36.9763 0.26 commit 546 5.5169 0.04 print_bulkfile 1360000 -3.0093 -0.02 other 0 35.8642 0.25 2024-10-23+00:23:34: Table SOURCE_DB.SOURCE_TABLE ... Archival from 01-Apr-2022_00-00-00 till 31-Mar-2023_23-59-59 End. ------------------------------------------------------------------------------------------------ Second Run ------------------------------------------------------------------------------------------------ 2024-10-23+00:43:28: Table SOURCE_TABLE ... Archival from 01-Apr-2022_00-00-00 till 31-Mar-2023_23-59-59 started. Archiving table from SOURCE_DB.SOURCE_TABLE ...... to 172.21.90.94 ARCHIVE_DB.ARCHIVE_TABLE. 2024-10-23 00:43:28 ------------------------------------------------------------------------------------------------ Follwoing are the total number of rows being archived. 20353301 ------------------------------------------------------------------------------------------------ TIME ELAPSED COUNT 2024-10-23T00:43:40 0 0 2024-10-23T00:45:00 80 10000 2024-10-23T00:46:50 190 20000 2024-10-23T00:48:22 282 30000 ... ... ... ... 2024-10-24T05:43:38 104397 20340000 2024-10-24T05:43:49 104408 20350000 2024-10-24T05:43:54 104414 20353301 Started at 2024-10-23T00:43:40, ended at 2024-10-24T05:43:59 Source: A=UTF8,D=SOURCE_DB,P=3306,h=SOURCE_HOST,i=idx_crt_tm,p=...,t=SOURCE_TABLE,u=pt_archiver Dest: A=UTF8,D=ARCHIVE_DB,P=3306,h=DEST_HOST,i=idx_crt_tm,p=...,t=ARCHIVE_TABLE,u=pt_archiver SELECT 20353301 INSERT 20353301 DELETE 20353301 Action Count Time Pct bulk_deleting 4071 83296.8183 79.77 sleep 4071 20355.7030 19.49 select 4072 157.4209 0.15 bulk_inserting 4071 143.8715 0.14 commit 8142 46.2508 0.04 print_bulkfile 20353301 -17.9404 -0.02 other 0 437.3239 0.42 2024-10-24+05:43:59: Table SOURCE_DB.SOURCE_TABLE ... Archival from 01-Apr-2022_00-00-00 till 31-Mar-2023_23-59-59 End. ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------
Asked by Avinash Pawar (216 rep)
Oct 30, 2024, 03:03 PM
Last activity: Oct 31, 2024, 01:35 AM