
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
139 views
SQL Server - SQL Agent Alerts - Alert 824 warning about old problems
I have alerts enabled on a SQL Server 2017 instance where corruption was encountered and repaired over a year ago. I have also confirmed that there is no new corruption in any database on this server. When the SQL Server is restarted, it sends out an error 824 alert message that states the following:

> DATE/TIME: 1/17/2019 7:54:22 PM
>
> DESCRIPTION: SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xe660dd36; actual: 0xcd481907). It occurred during a read of page (1:386939) in database ID 7 at offset 0x000000bcef6000 in file 'C:\DATA\database.mdf'. Additional messages in the SQL Server error log or operating system error log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The issue here is that the date/time is from 11 months ago. What I am looking for is details on what the alert for error 824 is checking, or what needs to be purged from 11 months ago to prevent this false-positive alert from being sent out.

* I have checked msdb.dbo.suspect_pages; there are no entries in that table.
* All SQL Server logs have rotated out in that time period, so it is not something in the logs.
* I have run CHECKDB a couple of times to confirm there is no new corruption.

Any suggestions?
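The checks described above can be scripted in a couple of statements; a minimal sketch (run on the instance that raised the alert):

```sql
-- Pages SQL Server has flagged as suspect (empty here, per the question)
SELECT * FROM msdb.dbo.suspect_pages;

-- Review how each SQL Agent alert is defined and when it last fired
EXEC msdb.dbo.sp_help_alert;
```

The `last_occurrence_date` column in the `sp_help_alert` output shows when SQL Agent last believes the condition fired, which is useful for confirming whether the alert is firing on a stale event rather than a new one.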
Steve Stedman (228 rep)
Jan 15, 2020, 09:14 PM • Last activity: Jul 21, 2025, 04:06 AM
0 votes
1 answer
1522 views
End of file reached. Error 38 in SQL. Passes verification checks
Having issues restoring a database in SSMS: error 38. The databases are uploaded to Google via their Drive API; when I download one on my side it has the error. If I transfer it through TeamViewer from the client PC I still get the same error. It is only happening for two clients; the rest are fine. I have tried backing up with the FORMAT option and WITH CHECKSUM; both succeed, and the backup verifies. I have run RESTORE HEADERONLY and get proper data, except that the physical-name path has a lowercase 'c' for the drive on the problem databases. I have read this could be a corrupt database, but I don't understand how the checksum and verify functions would pass if so. Any insight would really help. This is the backup command used in C#:

"BACKUP DATABASE " + database + " TO DISK = '" + database + "_" + dateTime.ToString("yyyy_MM_dd_HH") + "_CloudBackup.bak' WITH NAME = 'database', INIT, CHECKSUM"

EDIT: Running CHECKDB on the database produced no errors. I have tried updating the physical names of the .mdf and .ldf; it still does not work. Taking a backup from SSMS produces a valid backup file, so it has something to do with the services I wrote.

EDIT 2: I am restoring through the SSMS GUI. I have also tried

RESTORE DATABASE db_name FROM 'backup_path'

Running

RESTORE VERIFYONLY FROM DISK = 'backup_path'

on both computers (mine and the server PC) yields "The backup set on file 1 is valid."

RESTORE FILELISTONLY FROM DISK = 'backup_path'

has only been tested on my PC; it returns the paths to the mdf and ldf with no errors. Both PCs have SQL Server 2012 (SP1).
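The verification steps mentioned in the edits can be run as one batch; a sketch (the path is a placeholder, not the asker's actual file):

```sql
-- Check the backup header and the embedded file list
RESTORE HEADERONLY   FROM DISK = N'C:\backup\db_CloudBackup.bak';
RESTORE FILELISTONLY FROM DISK = N'C:\backup\db_CloudBackup.bak';

-- Verify restorability, re-checking the page checksums written by WITH CHECKSUM
RESTORE VERIFYONLY FROM DISK = N'C:\backup\db_CloudBackup.bak' WITH CHECKSUM;
```

Note that `RESTORE VERIFYONLY` validates the backup media and checksums but does not run the full consistency checks that `DBCC CHECKDB` does on a restored copy, so a backup can verify cleanly and still fail to restore if the file was damaged in transit in a way checksums don't cover.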
Austen Swanson (1 rep)
Jul 23, 2019, 08:49 PM • Last activity: Jul 20, 2025, 08:02 AM
-2 votes
0 answers
63 views
Can Transact-SQL snapshot backups be used to recover from database corruption?
The most typical corruption recovery scenario involves using a [full database backup](https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/create-a-full-database-backup-sql-server?view=sql-server-ver17) from before corruption struck and every [transaction log backup](https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/back-up-a-transaction-log-sql-server?view=sql-server-ver17) taken between then and now. I have recently discovered [Transact-SQL snapshot backups](https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/create-a-transact-sql-snapshot-backup?view=sql-server-ver17). Like full backups, you can replay transaction logs on top of them, as Microsoft's Transact-SQL snapshot backups flowchart shows. Does this mean that Transact-SQL snapshot backups can be used to recover from database corruption, just as full database backups can?
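For reference, the Transact-SQL snapshot backup sequence in SQL Server 2022 looks roughly like this; the database name and paths are placeholders, and the storage snapshot itself is taken outside SQL Server:

```sql
-- Freeze write I/O so the storage-level snapshot is consistent
ALTER DATABASE MyDb SET SUSPEND_FOR_SNAPSHOT_BACKUP = ON;

-- (take the storage/VM snapshot here, outside SQL Server)

-- Record the backup metadata; this also releases the suspension
BACKUP DATABASE MyDb
TO DISK = N'C:\backup\MyDb.bkm'
WITH METADATA_ONLY;

-- Subsequent log backups can be replayed on top of the snapshot
BACKUP LOG MyDb TO DISK = N'C:\backup\MyDb_log.trn';
```

Because the metadata-only backup anchors the log chain just like a full backup does, the corruption-recovery question above comes down to whether the underlying storage snapshot predates the corruption.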
J. Mini (1225 rep)
Jul 11, 2025, 08:13 PM
0 votes
2 answers
224 views
Oracle: Corrupt datafile will not turn back online?
I need immediate assistance on an issue Oracle Support is taking way too long to respond to. We have a corrupt datafile in one of our tablespaces. I attempted to take the datafile offline and then perform a restore and recover from the standby, but the recover returned an error saying a sequence was missing or not applied. Now I can't bring the datafile back online, and all operations on the schema are halted.
RMAN> recover datafile 88;
...
...
...
archived log thread=1 sequence=94742
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 09/08/2021 22:05:54
RMAN-06055: could not find archived log with sequence 94742 for thread 1

RMAN> alter database datafile 88 online;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of sql statement command at 09/08/2021 22:07:49
ORA-01113: file 88 needs media recovery
ORA-01110: data file 88: '/ora03/oracle/oradata/REPPRD01/AFM_DATA05.DBF'
Any tips? I am desperately trying anything. **EDIT:** Yes, I tried `recover datafile 88 until sequence 94742`.
**EDIT2:** The data guard configuration:
DGMGRL> show configuration

Configuration - REPPRD01_DR

  Protection Mode: MaxPerformance
  Members:
  REPPRD01 - Primary database
    Error: ORA-16724: cannot resolve gap for one or more standby databases

    rep01    - Physical standby database
      Error: ORA-16766: Redo Apply is stopped

Fast-Start Failover: DISABLED

Configuration Status:
ERROR   (status updated 21 seconds ago)
The standby configuration:
RMAN> show all;

RMAN configuration parameters for database with db_unique_name REP01 are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
CONFIGURE DEVICE TYPE DISK PARALLELISM 1 BACKUP TYPE TO BACKUPSET; # default
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE COMPRESSION ALGORITHM 'BASIC' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE ; # default
CONFIGURE RMAN OUTPUT TO KEEP FOR 7 DAYS; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/oracle/product/12.1.0.2/dbs/snapcf_REP01.f'; # default
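One avenue sometimes used when the primary is missing archived logs that the standby still has (available in 12c and later) is restoring and recovering the datafile directly over the network from the standby. A sketch, where `standby_tns` is a placeholder for the standby's connect descriptor:

```
RMAN> CONNECT TARGET /
RMAN> RESTORE DATAFILE 88 FROM SERVICE standby_tns;
RMAN> RECOVER DATAFILE 88;
RMAN> ALTER DATABASE DATAFILE 88 ONLINE;
```

This sidesteps the missing local archived log, since the restored image from the standby is already recovered past the gap; it assumes the standby's copy of the datafile is itself clean.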
Phillip (11 rep)
Sep 9, 2021, 12:59 PM • Last activity: Jun 6, 2025, 04:06 PM
2 votes
1 answer
766 views
DBCC Check DB fails due to operating system error 665
We have a SQL Server 2017 Standard instance where CHECKDB has recently been failing with operating system error 665. The fragmentation level of the disk drive is 0%, as it is automatically defragmented weekly. The database is a 24/7 OLTP database. The only way CHECKDB completes is when we turn off the application. Online solutions suggest:

1) Increase the size of the disk drive. I have 800 GB free on the drive, which holds a database with 550 GB used.
2) Defragment the data drive. It is already defragmented automatically every week.
3) Break the database into smaller files. Currently not possible due to downtime.
4) Place the database files on a ReFS volume, which does not have the same ATTRIBUTE_LIST_ENTRY limits that NTFS presents. You must reformat the current NTFS volume as ReFS. Again, not possible due to downtime.

Currently CHECKDB runs with the PHYSICAL_ONLY option using Ola Hallengren's script. My short-term proposal is to run CHECKDB on another server, but I am running out of ideas for curing the problem without any downtime.
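The short-term proposal of offloading the check can be sketched as follows (database, logical file names, and paths are placeholders):

```sql
-- On the secondary server: restore the latest full backup under a new name
RESTORE DATABASE MyDb_Check
FROM DISK = N'D:\backup\MyDb.bak'
WITH MOVE N'MyDb_Data' TO N'D:\check\MyDb_Check.mdf',
     MOVE N'MyDb_Log'  TO N'D:\check\MyDb_Check.ldf',
     RECOVERY;

-- Run the full (not PHYSICAL_ONLY) check there, off the production box
DBCC CHECKDB (N'MyDb_Check') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

A side benefit of this pattern is that it also verifies the backup actually restores, which PHYSICAL_ONLY on the live database cannot tell you.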
SQL_NoExpert (1117 rep)
Apr 22, 2021, 03:02 PM • Last activity: May 12, 2025, 02:02 AM
7 votes
2 answers
642 views
Microsoft SQL Server | Investigate Root Cause of DBCC CHECKDB error | (SQLSTATE 42000)(Error 8992 | Catalog Msg 3853)
---

Details:

- I am running Microsoft SQL Server 2022:

> Microsoft SQL Server 2022 (RTM-GDR) (KB5046861) - 16.0.1135.2 (X64) Oct 18 2024 15:31:58 Copyright (C) 2022 Microsoft Corporation Standard Edition (64-bit) on Windows Server 2022 Standard 10.0 (Build 20348: )

- I run DBCC CHECKDB on a user database (the one that raised this error) every Saturday.
- This first happened on one on-prem Windows Server running a SQL Server installation with a predefined set of configs.
- A month later, I set up an additional on-prem Windows Server with the same SQL Server installation and the same predefined configs. DBCC CHECKDB ran fine for the first two weeks, then failed.
- The error output after a DBCC CHECKDB run on the user database:
[SQLSTATE 01000] (Message 50000)  Command: DBCC CHECKDB ([]) WITH ALL_ERRORMSGS   
    [SQLSTATE 01000] (Message 50000)  Check Catalog Msg 3853, State 1: Attribute (objid=317307915,indexid=1) of row (class=0,objid=317307915,indexid=1,rowsetnum=1) in sys.sysrowsetrefs does not have a matching row (object_id=317307915,index_id=1) in sys.indexes.  
    ...  
    
    CHECKDB found 0 allocation errors and 12 consistency errors not associated with any single object
There are 11 more objects with the same error; I just didn't print them all here.

- Both servers have several of these errors, with the same message but different object_ids.
- Both servers use the same type of hardware, server model, hardware provider, disk provider, and disk model.
- Both servers restored this user database from a database in an Azure SQL Managed Instance.

A small overview of the errors returned by DBCC CHECKDB:

| Date | Errors | Notes |
| -----| -------| ------|
| See *Notes | 12 errors | From 08/01/2025 to today (07/05/2025) |
| 08/01/2025 | 12 errors | We tried REPAIR_ALLOW_DATA_LOSS here |
| 04/01/2025 | 12 errors | |
| 28/12/2024 | 12 errors | |
| 21/12/2024 | 12 errors | |
| 14/12/2024 | 12 errors | |
| 07/12/2024 | 7 errors | |
| 30/11/2024 | 7 errors | |
| 23/11/2024 | 7 errors | |
| 16/11/2024 | 7 errors | |
| 09/11/2024 | 7 errors | |
| 02/11/2024 | 7 errors | |
| 26/10/2024 | 3 errors | First appearance |
| 19/10/2024 | No errors | |
| 17/10/2024 | No errors | |
| 16/10/2024 | No errors | |
| 12/10/2024 | No errors | |
| 05/10/2024 | No errors | |

Troubleshooting steps already taken:

- First, worth noting: no clean backups are available.
- Second, worth noting: we have not seen this impact the performance, functionality, or integrity of the "user-created" objects (tables, views, constraints, procedures, indexes).
- DBCC CHECKDB has been run against the user database on the Azure SQL Managed Instance as well; no DBCC CHECKDB errors there.
- Performed several selects against sys.sysrowsetrefs, sys.sysrowsets, sys.partitions, sys.objects, sys.sql_modules, sys.stats, sys.indexes, sys.allocation_units, and sys.system_internals_partition_columns to try to retrieve info by object_id (and/or rowsetid based on object_id from sys.sysrowsetrefs). None of the queries returned rows for that objid/rowsetid except sys.sysrowsetrefs; the table that DBCC found does not have a matching row in sys.indexes, for instance. It's like this is a ghost object: a dangling reference to an object that is deleted or non-existent, and I can't retrieve any info about what this object was, why and how it is now gone, or where on disk/page/partition it was stored.
- I tried running sp_clean_db_free_space manually in the hope that it would eliminate the ghost records DBCC CHECKDB was reporting, but nothing happened to the records.
- Tried DBCC CHECKDB ([], REPAIR): no errors repaired, still getting the same errors.
- Tried DBCC CHECKDB ([], REPAIR_ALLOW_DATA_LOSS): no errors repaired, still getting the same errors. (After running this, I restored the full backup taken just before I ran the command, to rule out that REPAIR_ALLOW_DATA_LOSS could have made things worse.)
- Ran the hardware provider's disk check and hardware check, to rule out an I/O issue. No events registered, and no errors or failures reported.

---

I really feel like I've tried everything here, but I'm eager to find the root cause, since it has happened on two servers now, not just one.
Also, a solution for how to fix this going forward would be highly appreciated!
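The dangling references described above can be listed in a single query; a sketch using the column names that appear in the Msg 3853 text (sys.sysrowsetrefs is a hidden system base table, so this requires the dedicated administrator connection, ADMIN:servername):

```sql
-- Rows in sys.sysrowsetrefs with no matching row in sys.indexes
SELECT r.class, r.objid, r.indexid, r.rowsetnum
FROM sys.sysrowsetrefs AS r
LEFT JOIN sys.indexes AS i
       ON i.object_id = r.objid
      AND i.index_id  = r.indexid
WHERE i.object_id IS NULL;
```

This only enumerates the same orphans CHECKDB already reports; directly editing system base tables is unsupported, so the query is for diagnosis, not repair.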
Lucas Bjørndal (93 rep)
May 7, 2025, 09:09 AM • Last activity: May 9, 2025, 03:03 PM
0 votes
1 answer
470 views
mysql table tables_priv index is corrupted and the best possible way to repair it
I am using MySQL with XAMPP. For some time now I keep getting a "server unavailable" error, and upon digging deeper I noticed that my mysql.tables_priv table's index is corrupted (Aria storage engine). I have tried running aria_chk and myisamchk to repair it, but to no avail. I have also tried mysql_upgrade, but again this didn't help. The only thing I can think of now is to manually remove the tables_priv.MAD file from the xampp\mysql\data\mysql folder, create an empty one, and run REPAIR TABLE tables_priv. Can you kindly advise whether that is the right and safe approach to take? If you have any other suggestion, please do mention it.
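Before deleting files by hand, the server-side equivalents of the command-line tools are worth trying, since the server then does the rebuild under its own locks; a sketch:

```sql
-- Diagnose, then attempt an in-server rebuild of the Aria table
CHECK TABLE mysql.tables_priv;
REPAIR TABLE mysql.tables_priv;

-- Reload the in-memory privilege caches afterwards
FLUSH PRIVILEGES;
```

Note that the .MAD file holds the table's data and .MAI the index, so replacing .MAD with an empty file would discard the per-table grants themselves, not just the corrupt index.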
Girish Agarwal (1 rep)
Aug 20, 2023, 11:38 PM • Last activity: May 6, 2025, 11:02 AM
1 vote
1 answer
961 views
Mysql service stops with [ERROR] InnoDB: CORRUPT LOG RECORD FOUND
On my Plesk Obsidian 18.0.42 server running CentOS Linux release 7.9.2009 with MySQL 10.2.43-MariaDB, the database suddenly stopped working. The MySQL service never comes up on a start request. On checking the logs I found the messages below, saying "CORRUPT LOG RECORD FOUND". I tried starting MySQL with innodb_force_recovery from 1 to 6; every value other than 6 failed. With 6 I can start the service in recovery mode. I could see from the logs that many tables in the four major databases, including the Plesk database, were corrupted. The two biggest were Moodle databases with critical data of 7 GB and 2 GB respectively.

2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Completed initialization of buffer pool
2022-04-13 8:37:27 140464863237888 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Highest supported file format is Barracuda.
2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Starting crash recovery from checkpoint LSN=158872609427
2022-04-13 8:37:27 140465444636864 [ERROR] InnoDB: ############### CORRUPT LOG RECORD FOUND ##################
2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Log record type 65, page 158872609792:140721128394336. Log parsing proceeded successfully up to 158872609427. Previous log record type 128, is multi 0 Recv offset 0, prev 0
2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Hex dump starting 0 bytes before and ending 100 bytes after the corrupted record: len 100; hex 4152330034003500360029a8eb3e00380001800880068007800080018001800180018008ffffffffffffffffffffffff80018000800080008000800080008000ffffffffffffffff8000800080008000ffff8008800880088008800080008008ffff7fff; asc AR3 4 5 6 ) > 8 ;
2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Set innodb_force_recovery to ignore this error.
2022-04-13 8:37:27 140465444636864 [Warning] InnoDB: Log scan aborted at LSN 158872673280
2022-04-13 8:37:27 140465444636864 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2022-04-13 8:37:27 140465444636864 [Note] InnoDB: Starting shutdown...
2022-04-13 8:37:27 140465444636864 [ERROR] Plugin 'InnoDB' init function returned error.
2022-04-13 8:37:27 140465444636864 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

I followed the Plesk KB doc https://support.plesk.com/hc/en-us/...s-for-the-MySQL-databases-on-Plesk-for-Linux- to resolve the case. My point is that I had to proceed with the risky step of removing the MySQL data directory. Even the Plesk db was in the corrupted list, but I managed to recover it using the daily backups; I had to restore everything, including the psa db, from backups. It was a stressful four hours putting everything back to normal. This is the second time I am facing this issue on this Plesk server. Can someone let me know why this is happening? How can we prevent it? Is there a method to monitor for it?
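For reference, the recovery setting mentioned above lives in my.cnf; a sketch (values of 4 and above put InnoDB in read-only mode and can cause further damage, so the usual drill is to start with the lowest value that lets mysqld come up, dump everything, then remove the setting):

```ini
[mysqld]
# Salvage mode: start with 1 and raise only as far as needed; 6 is last resort
innodb_force_recovery = 6
```

Once the server starts, mysqldump the databases immediately, then restore them into a freshly initialized data directory with the option removed.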
nisamudeen97 (210 rep)
Apr 13, 2022, 06:24 PM • Last activity: May 2, 2025, 05:06 PM
0 votes
2 answers
348 views
MS SQL corruption repair manually investigate pages
I have a corrupt MS SQL db. We do have a good backup from before the issue we had, and the underlying problem is now fixed. However, we are left with a corrupt database containing 2 weeks of good data, and the issue is that we can't merge data from a corrupt database into a good backup.

- I cannot delete or rename a corrupted table (to re-create it).

Any pointers on how I can fix this further, or any out-of-the-box ideas for getting good data from a corrupt database into a good old backup? Much of the repair was done using:
select name, user_access_desc from sys.databases where name='mydatabase'
DBCC CHECKDB ('mydatabase', REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS, NO_INFOMSGS
> On the 1st run, the text output of the repairs is some 52 MB.
> On the 2nd run, the output is down to 17 MB.
> On the 3rd run, it is down to 317 KB.
> The 4th run does not improve.

So I've now got 421 consistency errors that I'm looking to patch up manually:

> Msg 8937, Sev 16, State 1, Line 1 : Table error: Object ID 268580045, index ID 1, partition ID 72057594062307328, alloc unit ID 72057594095075328 (type In-row data). B-tree page (1:21289893) has two parent nodes (1:21112771), slot 32 and (1:21283144), slot 458. [SQLSTATE 42000]

and

> Msg 8980, Sev 16, State 1, Line 1 : Table error: Object ID 268580045, index ID 1, partition ID 72057594062307328, alloc unit ID 72057594095075328 (type In-row data). Index node page (1:21112771), slot 46 refers to child page (1:21213185) and previous child (1:21213184), but they were not encountered. [SQLSTATE 42000]

Both repeated hundreds of times. I can see the affected table by using
SELECT * FROM sys.objects WHERE object_id = 268580045;
But there I'm getting stuck.
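The individual pages named in those errors can be inspected directly; a sketch for the first page cited in Msg 8937 (output style 3 dumps the page header plus row details):

```sql
-- Route DBCC output to the client session instead of the error log
DBCC TRACEON (3604);

-- (database, file id, page id, output style)
DBCC PAGE ('mydatabase', 1, 21289893, 3);
```

`DBCC PAGE` is undocumented but widely used for exactly this kind of investigation; the `m_objId` and allocation-unit fields in its header output let you map a damaged page back to a table and index, and the row dump shows which data is still readable for manual salvage.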
user125942 (9 rep)
Mar 13, 2023, 12:08 PM • Last activity: Apr 27, 2025, 05:03 AM
3 votes
1 answer
136 views
500 million is becoming a horrible negative number on replicas
I need some assistance understanding what looks like a corrupted value in replicas. Here's the scenario: 1 primary database, 8 read replicas. Database is MySQL, deployed with Amazon RDS. There is a single cell of data we are aware of that has the wrong value, only on read replicas. On the primary it's 500000000, on replicas it's -14592094872. Here's the column definition:
amount bigint NOT NULL
Here's some additional information: - SELECT VERSION(); returns 8.0.40 on all of these. - SHOW VARIABLES LIKE 'binlog_format'; shows MIXED on the primary, and ROW on replicas. - show replica status doesn't seem to show any issues. - show create table ... shows the column has the same type on both the primary and replicas. I ran select hex(amount) ... to get these values, in case they're helpful: - 1DCD6500 (correct primary value) - FFFFFFFC9A3E4D68 (incorrect replica value)
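For what it's worth, the two hex dumps decode to exactly the stated values, which means the replica's stored bytes genuinely differ from the primary's rather than this being a signedness or display issue. A sketch of the decoding, assuming MySQL's usual treatment of hex literals in numeric context:

```sql
-- Reinterpret each byte pattern as a signed 64-bit integer
SELECT CAST(0x1DCD6500 AS SIGNED)         AS primary_amount,  -- 500000000
       CAST(0xFFFFFFFC9A3E4D68 AS SIGNED) AS replica_amount;  -- expected -14592094872
```

Since the bit patterns differ, the divergence points at replicated data drift (e.g. a statement applied differently under MIXED-format binlogging, or a non-deterministic update) rather than a column-type or casting mismatch.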
KernelDeimos (89 rep)
Apr 23, 2025, 09:22 PM • Last activity: Apr 26, 2025, 03:06 AM
0 votes
1 answer
272 views
SQL Server fix deleted mdf
I have a database whose indexes are in separate filegroups on a separate drive (an SSD). The SSD containing the indexes broke down; the actual table data is safe, but because of the SSD failure the database is in Recovery Pending mode. I want to get rid of the filegroups that were on the SSD, create a new filegroup elsewhere, and then re-create my indexes on it, but I can't do anything while the database is in Recovery Pending mode. I have tried these solutions:

- Emergency mode, take the filegroup offline, bring the database online.
- Partial restore from a full backup (my backup is a month old), which did not work; the database stays in Restoring mode.
- Copy the deleted mdf from the backup back into its location, but the database still stays in Recovery Pending mode.
- Detach the database and attach its current filegroups plus the deleted filegroups from the backup, which did not work because the deleted filegroups were from another database and were not accepted by SQL Server.

I don't know what to do anymore. I have two options:

1. Take the deleted filegroups offline, create new filegroups, and create my indexes with new names on them. This is not a permanent fix, because the database is left with dead filegroups, but it would get me back online.
2. Write a script to create a new database, new filegroups, and new tables; insert the current data into them; and re-create the stored procedures, triggers, indexes, and so on. This is difficult to write.

Any suggestions would be appreciated.
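Option 1 above can be expressed roughly as follows (the logical file and filegroup names are placeholders):

```sql
-- Take the lost index file offline permanently; an OFFLINE file
-- cannot be brought back, so this is a one-way step
ALTER DATABASE MyDb MODIFY FILE (NAME = N'IndexFile1', OFFLINE);

-- Add a replacement filegroup on surviving storage
ALTER DATABASE MyDb ADD FILEGROUP IndexesNew;
ALTER DATABASE MyDb ADD FILE
    (NAME = N'IndexFileNew1', FILENAME = N'D:\data\MyDb_idx1.ndf')
    TO FILEGROUP IndexesNew;

-- Then re-create each nonclustered index ON IndexesNew
```

Since the lost files held only nonclustered indexes, re-creating them from the intact table data loses nothing; the dead filegroup entries remain in the catalog, which is the cosmetic downside the question mentions.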
Amirali Samiei (103 rep)
Jul 23, 2022, 10:11 AM • Last activity: Mar 16, 2025, 04:52 AM
-3 votes
2 answers
318 views
cant attach mdf file because of suspect mode
Please, I need help: I can't attach an MDF database. The last time it was marked suspect, I detached it, and now I can't attach it again. I receive this message:

> An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
> E:\خاص بالبرنامج وممنوع الحذف\database\Sadiek.mdf is not a primary database file. (Microsoft SQL Server, Error: 5171)
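For reference, an attach attempt that rebuilds the missing log looks like this; it will still fail with error 5171 if the file genuinely is not a primary (.mdf) file or its header page is damaged:

```sql
CREATE DATABASE Sadiek
ON (FILENAME = N'E:\خاص بالبرنامج وممنوع الحذف\database\Sadiek.mdf')
FOR ATTACH_REBUILD_LOG;
```

Error 5171 on a file that really is the .mdf usually indicates a damaged file header (the detach of a suspect database may have left it inconsistent), in which case restoring from backup is the supported path.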
Mahmoud Mahmad (11 rep)
Aug 13, 2022, 06:34 PM • Last activity: Mar 15, 2025, 05:12 PM
3 votes
1 answer
207 views
How to recreate corrupt innodb tables with innodb_file_per_table?
MySQL refuses to start up and I was only able to start in read-only mode using innodb_force_recovery = 6. Every answer I've found on this topic suggests making a backup, deleting ibdata and ib_logfiles and then letting them get recreated: - https://serverfault.com/questions/592793/mysql-crashed-and-wont-start-up - https://dba.stackexchange.com/questions/23361/innodb-force-recovery-when-innodb-corruption - https://stackoverflow.com/questions/41997197/mysql-data-recovery-with-innodb In my case since innodb-file-per-table is enabled, I'm wondering if I also need to delete files like /var/lib/mysql/MY_DATABASE/table.ibd? Also, as redo logging is disabled, there is no logfile to remove. The specific error I got is this: > [InnoDB] Server was killed when InnoDB redo logging was disabled. Data files could be corrupt. You can try to restart the database with innodb_force_recovery=6 At this point I've been able to start in read-only mode and I've made a backup of all the tables I need. What specifically should I delete? Or is there a better approach to completely rebuilding these tables? I'm considering simply uninstalling mysql completely and purging the whole directory but looking for feedback here if there is a simpler approach.
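With innodb_file_per_table enabled, each table's data lives in its own .ibd file, so a per-table rebuild after taking the dump can be sketched as follows (the database and table names are placeholders):

```sql
-- For each salvaged table: drop the corrupt copy; with file-per-table,
-- DROP TABLE removes the table's own .ibd file as well
DROP TABLE IF EXISTS MY_DATABASE.my_table;

-- Then reload the table from the dump taken in forced-recovery mode,
-- e.g. from the shell:  mysql MY_DATABASE < my_table_backup.sql
```

This avoids deleting .ibd files from the filesystem by hand, which leaves the data dictionary out of sync; letting the server drop and re-create the tables keeps the dictionary and the files consistent.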
You Old Fool (790 rep)
Mar 14, 2025, 01:16 PM • Last activity: Mar 15, 2025, 01:09 AM
0 votes
0 answers
14 views
Commit Log Corruption & Cluster Resilience in K8ssandra on GKE
I'm Eran Betzalel from ExposeBox. We're migrating our big data infrastructure from Hadoop/HDFS/HBase VMs to a modern stack using GKE, GCS, and K8ssandra. While the move is exciting, we're currently facing a critical issue that I hope to get some insights on. **Issue Overview:** 1. **Commit Log Corruption:** * Our 8-node Cassandra cluster is experiencing frequent commit log corruptions. The latest occurred on two Cassandra nodes. Despite only two nodes showing the error, the entire cluster is affected, leading to a complete halt. * The error message points to a bad header in one of the commit log files, suggesting a possible incomplete flush or other disk-related issues. 2. **Kubernetes Node Failure:** * We detected that a Kubernetes node went down around the time the issue occurred. I'm curious how this event might be contributing to the corruption and what steps can be taken to shield Cassandra from such disruptions. 3. **Reliability Concerns:** * It’s puzzling why corruption on just two nodes cascades to affect the whole cluster. * I’m looking for recommendations on enhancing cluster resilience. Are there specific configurations or best practices to ensure that issues on individual nodes don’t compromise the entire cluster? **Questions for the Community:** * **Root Cause:** * How can commit log corruption on only two nodes cause a complete cluster halt? * **Resilience Strategies:** * What are the best practices for configuring K8ssandra to handle node failures and unexpected Kubernetes disruptions? * Are there specific settings or architectural changes that can help prevent such commit log issues from propagating cluster-wide? * **Kubernetes Integration:** * Given that a K8s node failure was detected around the time of the error, how can we make Cassandra more resilient in a dynamic, containerized environment? Below is the stack trace for your reference:
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 8105604 of commit log /opt/cassandra/data/commitlog/CommitLog-8-1740071674398.log, with bad position but valid CRC
CommitLogReplayer.java:536 - Ignoring commit log replay error likely due to incomplete flush to disk
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:865)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:727)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:345)
    at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:208)
    at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:229)
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:205)
    at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:208)
    at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:229)
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:205)
    at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
    at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:233)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:97)
    at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:124)
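How the cluster reacts to the corruption described above is governed by cassandra.yaml; a sketch of the relevant knobs (defaults and option names should be checked against your Cassandra version):

```yaml
# How a node reacts to commit log corruption during replay/startup
commit_failure_policy: stop        # die | stop | stop_commit | ignore

# How a node reacts to data-disk failures at runtime
disk_failure_policy: stop          # die | stop_paranoid | stop | best_effort | ignore
```

With `stop`, an affected node shuts down transports but stays up for inspection; whether the rest of the cluster keeps serving then depends on replication factor and consistency level, which is the place to look for why two bad nodes halted everything.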
Eran Betzalel (101 rep)
Feb 23, 2025, 02:41 PM
0 votes
1 answers
97 views
How can I know that it is safe to failover a Basic Availability Group?
Today, I learned the hard way that [you cannot corruption check a Basic Availability Group's secondary](https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/basic-availability-groups-always-on-availability-groups?view=sql-server-ver16#limitations). Given this limitation,...
Today, I learned the hard way that [you cannot run corruption checks on a Basic Availability Group's secondary](https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/basic-availability-groups-always-on-availability-groups?view=sql-server-ver16#limitations). Given this limitation, how can I know that it is safe to fail over a BAG? For all I know, the secondary could be corrupt.
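Short of an integrity check, you can at least confirm that every database on the secondary is fully synchronized before failing over. A sketch (this verifies replication health, not the physical integrity of the secondary's pages):

```sql
-- Run on the primary: every database should report SYNCHRONIZED / HEALTHY
-- before a no-data-loss failover of a synchronous-commit (Basic) AG
SELECT ag.name                    AS ag_name,
       DB_NAME(drs.database_id)   AS database_name,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_groups AS ag
  ON ag.group_id = drs.group_id;
```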
J. Mini (1225 rep)
Feb 4, 2025, 07:41 PM • Last activity: Feb 12, 2025, 10:03 PM
8 votes
1 answers
1497 views
RDS MYSQL instance rebooted and won't load INNODB table
So my production db in RDS apparently either crashed or was rebooted during a version update, and upon being restarted mysql couldn't load one of the innodb tables: as evidenced by the following error in the logs. 140324 16:22:23 [ERROR] Cannot find or open table dbname/table_name from the internal...
So my production db in RDS apparently either crashed or was rebooted during a version update, and upon being restarted MySQL couldn't load one of the InnoDB tables, as evidenced by the following error in the logs:

    140324 16:22:23 [ERROR] Cannot find or open table dbname/table_name from the internal
    data dictionary of InnoDB though the .frm file for the table exists. Maybe you have
    deleted and recreated InnoDB data files but have forgotten to delete the corresponding
    .frm files of InnoDB tables, or you have moved .frm files to another database? or, the
    table contains indexes that this version of the engine doesn't support. See
    http://dev.mysql.com/doc/refman/5.5/en/innodb-troubleshooting.html how you can resolve
    the problem.

The problem is that no one noticed for three days that the table was missing, so my automated snapshots are useless (I only retain them that long): all of the snapshots have the same problem. During my attempts to fix things, I have had indications from the db that the .frm and .ibd files are all in place. SHOW TABLES lists the table (so the .frm is there), and I get the error below if (on a copy) I drop the table and try to recreate it:

    140324 16:46:05 InnoDB: Error creating file './dbname/table_name.ibd'.
    140324 16:46:05 InnoDB: Operating system error number 17 in a file operation.
    InnoDB: Error number 17 means 'File exists'.
    InnoDB: Some operating system error numbers are described at
    InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
    InnoDB: The file already exists though the corresponding table did not
    InnoDB: exist in the InnoDB data dictionary. Have you moved InnoDB
    InnoDB: .ibd files around without using the SQL commands
    InnoDB: DISCARD TABLESPACE and IMPORT TABLESPACE, or did
    InnoDB: mysqld crash in the middle of CREATE TABLE? You can
    InnoDB: resolve the problem by removing the file './dbname/table_name.ibd'
    InnoDB: under the 'datadir' of MySQL.
I suspect there was some issue with an index before the instance was restarted, or, if it did indeed crash, perhaps that caused the corruption. I managed to recreate the data (probably) from logs, though it's only a guess, not a sure thing. So I'm asking what I should do at this point. I'd love to find a solution that lets me get that table back with the data it had, but if I have to delete and recreate it, that's also OK. Obviously I can't touch any MySQL data files, as it's an RDS instance. The MySQL version is 5.5.27.
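To see which layer disagrees about the table's existence, a small diagnostic sketch (`dbname` and `table_name` are the placeholders from the error text):

```sql
-- Does the .frm-based listing see the table?
SHOW TABLES FROM dbname LIKE 'table_name';

-- Does the server's dictionary view see it, and under which engine?
SELECT table_name, engine
FROM information_schema.TABLES
WHERE table_schema = 'dbname'
  AND table_name   = 'table_name';
```

If the first query lists the table but the second returns no row (or an error), that matches the "orphaned .frm / stray .ibd" scenario the InnoDB message describes.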
Camden Narzt (203 rep)
Mar 24, 2014, 05:44 PM • Last activity: Feb 8, 2025, 10:06 PM
3 votes
3 answers
387 views
Are Basic Availability Groups a reliable HA solution without integrity checks on the secondary replica?
We have a few Basic Availability groups in production and I've been reading about the [limitations][1] of them. I'm concerned about the following limitations: > No backups on secondary replica. > > No integrity checks on secondary replicas. Suppose that we haven't failed over in months. In our case,...
We have a few Basic Availability Groups in production and I've been reading about the limitations of them. I'm concerned about the following limitations: > No backups on secondary replica. > > No integrity checks on secondary replicas. Suppose that we haven't failed over in months. In our case, that seems to imply that we haven't run DBCC CHECKDB for a long time on any of the databases on the secondary. There could have been a storage corruption issue months ago that we still don't know about. If a disaster occurs on the primary replica and we fail over to the secondary, we might end up with the production application pointing at corrupt data. Would it be considered a best practice to perform one of the following on a fixed schedule? 1. Perform a planned manual failover to switch the primary and secondary, and leave the former secondary in the primary role until the next planned failover. 2. Take a database snapshot of the secondary replica and run DBCC CHECKDB against that. Or am I overthinking the risks here?
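If your configuration permits database snapshots on the secondary (readable secondaries in a full Availability Group do; Basic AG secondaries generally do not), option 2 would look roughly like this (database names and paths hypothetical):

```sql
-- Run on the secondary replica: snapshot, check, drop
CREATE DATABASE MyDb_CheckSnap ON
    (NAME = MyDb_Data, FILENAME = 'D:\Snapshots\MyDb_CheckSnap.ss')
AS SNAPSHOT OF MyDb;

DBCC CHECKDB (MyDb_CheckSnap) WITH NO_INFOMSGS, ALL_ERRORMSGS;

DROP DATABASE MyDb_CheckSnap;
```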
Joe Obbish (32976 rep)
Oct 26, 2021, 04:21 PM • Last activity: Feb 6, 2025, 12:30 PM
3 votes
2 answers
573 views
Why is running database integrity checks in parallel beneficial if you are following best practice on Enterprise Edition?
On Enterprise Edition and Development edition, database integrity checks (e.g. `DBCC CHECKDB`) can run in parallel. According to best practice, backups should be tested regularly. So if I'm following best practice, I would be regularly restoring my production Enterprise Edition box's backups on to a...
On Enterprise Edition and Developer Edition, database integrity checks (e.g. DBCC CHECKDB) can run in parallel. According to best practice, backups should be tested regularly. So if I'm following best practice, I would be regularly restoring my production Enterprise Edition box's backups onto a non-production Developer Edition box and running CHECKDB there. This means that **I would never actually be running database integrity checks on my production Enterprise Edition box**. So, why is running database integrity checks in parallel beneficial if you are following best practice on Enterprise Edition? The exception to this is the system databases, but they're so tiny that running CHECKDB in parallel is unlikely to matter.
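For what it's worth, where checks do run on an Enterprise box, the degree of parallelism can be capped explicitly (SQL Server 2014 SP2 and later; database name hypothetical):

```sql
-- Cap CHECKDB at 4 CPUs regardless of the instance-wide
-- 'max degree of parallelism' setting it would otherwise follow
DBCC CHECKDB (MyDb) WITH MAXDOP = 4, NO_INFOMSGS;
```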
J. Mini (1225 rep)
Jan 15, 2025, 07:15 AM • Last activity: Feb 6, 2025, 11:00 AM
0 votes
2 answers
1106 views
Error 23 on transaction log backup, full backup works fine
I have a database with the database files on the E: drive, and logs files on the D: drive. I'm getting the following error when trying to run a transaction log backup: > Read on "D:\LOG\AOA_Log.ldf" failed: 23(Data error (cyclic redundancy check).) I'm also getting a bad block message in the Windows...
I have a database with the data files on the E: drive and log files on the D: drive. I'm getting the following error when trying to run a transaction log backup: > Read on "D:\LOG\AOA_Log.ldf" failed: 23(Data error (cyclic redundancy check).) I'm also getting a bad block message in the Windows event log. A full database backup works fine. Can I switch the database to the simple recovery model and shrink the log file? Can I run chkdsk on the D: drive (which requires a dismount) without messing up the database? The server is a VMware guest with the data residing in a disk array.
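The workaround being asked about would look like this (a sketch; the database and logical log file names are assumed from the error message, and switching recovery models breaks the log backup chain, so take a full backup immediately afterwards):

```sql
ALTER DATABASE AOA SET RECOVERY SIMPLE;
DBCC SHRINKFILE (AOA_Log, 1024);   -- logical log file name; target size in MB
ALTER DATABASE AOA SET RECOVERY FULL;
BACKUP DATABASE AOA TO DISK = N'E:\Backups\AOA_full.bak';
```

Note the shrink does not repair the bad block itself: if the damaged sectors fall within the portion of the log that remains in use, the CRC error can persist.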
Ken A (1 rep)
Sep 11, 2019, 02:05 PM • Last activity: Feb 5, 2025, 03:01 AM
0 votes
2 answers
70 views
corrupted system files, how to recover databases for a fresh install?
newbie here. Our system files got corrupted and we no longer can access the control panel or phpmyadmin, but we do have access to SSH and FTP, and to my understanding, you can backup databases locally using the `mysqldump` command, however somehow MySQL's files were also corrupted and we no longer c...
Newbie here. Our system files got corrupted and we can no longer access the control panel or phpMyAdmin, but we do have access to SSH and FTP. To my understanding, you can back up databases locally using the mysqldump command; however, MySQL's files were also corrupted and we can no longer use mysqldump. I contacted my provider and they said: "MySQL databases are typically stored in /var/lib/mysql/. You can copy the whole mysql directory and place it into its new location". When I searched for this folder, I did find it under www/server/mysql/, but with no databases inside. I did, however, find a folder under www/server/data/ containing what look like my tables, with .MYD, .MYI and .sdi files. Is it odd that they are in a completely separate folder from MySQL, or is that by design? Can I download these files and upload them into a completely fresh install on a new host to get my data back? Any tips or advice are appreciated! We are on Ubuntu 22.04 with aaPanel, in case that helps.
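The .MYD/.MYI files are MyISAM data and index files, and .sdi files are MySQL 8.0 serialized dictionary information, so a file-level copy can work for those tables. Assuming MySQL 8.0 on the new host (schema and file names here are hypothetical), after copying the files into the new server's datadir and fixing ownership, each MyISAM table can be registered like this:

```sql
-- Sketch: the .MYD/.MYI/.sdi files have been copied into the new
-- server's datadir under the schema directory (e.g. /var/lib/mysql/mydb/)
-- and chowned to mysql:mysql
CREATE DATABASE IF NOT EXISTS mydb;
IMPORT TABLE FROM '/var/lib/mysql/mydb/mytable_1234.sdi';
```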
Bob (1 rep)
Oct 2, 2024, 08:35 AM • Last activity: Jan 27, 2025, 08:32 AM
Showing page 1 of 20 total questions