Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
2
votes
1
answers
5416
views
How to restore database from /var/lib/mysql
I have a database folder from a crashed server called gain_wordpress. I'm trying to put that database on a new server but I'm not able to do it successfully. I don't have a .sql dump, all I have is the database folder from a past crashed server with ibdata and the 2 log files.
I copied the gain_wordpress folder to /var/lib/mysql on the new server, but when I run `show tables` on the database I get the following error:
ERROR 1018 (HY000): Can't read dir of './gain_wordpress/' (errno: 13)
I also changed the permissions on the folder and ran `chown mysql:mysql gain_wordpress -R`, but it still didn't solve the problem.
What is the best way to recover gain_wordpress? Do I need to replace any of the ibdata files or the log files? I have the corresponding `.frm`, `.MYD`, and `.MYI` files for each table. Please advise.
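A minimal recovery sketch, assuming the tables are MyISAM (since .MYD/.MYI files exist) and a standard data directory layout; the database name comes from the question, while source paths and the SELinux step are assumptions to adapt:

```bash
# Stop MySQL before touching the data directory
sudo systemctl stop mysql

# Copy the recovered database folder into the new server's data directory
sudo cp -a /path/to/recovered/gain_wordpress /var/lib/mysql/

# errno 13 is "permission denied": fix ownership, and on SELinux systems the file context too
sudo chown -R mysql:mysql /var/lib/mysql/gain_wordpress
sudo restorecon -R /var/lib/mysql/gain_wordpress 2>/dev/null || true

sudo systemctl start mysql

# Check and repair the MyISAM tables once the server can see them
mysqlcheck --repair --databases gain_wordpress -u root -p
```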
unixmiah
(156 rep)
Mar 27, 2015, 02:58 PM
• Last activity: Jul 29, 2025, 10:10 PM
0
votes
2
answers
2804
views
SQL1273N - Missing log file "S0000000.LOG" in database partition "0"
I'm getting this error when trying to connect to a DB2 Database. This error started after the OS restarted.
db2 => connect to cm
SQL1273N An operation that reads the logs in database "CM" cannot proceed
due to a missing log file "S0000000.LOG" in database partition "0" and log
stream "0".
Any help is welcome.
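A small diagnostic sketch from the DB2 command line, assuming the instance owner runs it; the database name CM comes from the question, the log path is a hypothetical placeholder, and whether a restart helps depends on whether the log files are really missing:

```bash
# Where does the database expect its active logs? ("Path to log files" in the output)
db2 get db cfg for CM | grep -i log

# Check whether S0000000.LOG actually exists in that directory
ls -l /path/to/active/log/path    # hypothetical path; substitute the value reported above

# If the database is merely inconsistent after the OS restart, a restart may bring it back;
# if the log files are genuinely gone, restoring from a backup is usually what remains
db2 restart database CM
```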
PAULO CRISTIANO KLEIN
(1 rep)
Jan 2, 2023, 05:26 PM
• Last activity: Jul 23, 2025, 02:03 AM
0
votes
2
answers
618
views
SQL server alwayson 3 Node Primary Down
I am using a three-node AlwaysOn Availability Group on SQL Server 2012 Enterprise Edition.
Two nodes are in the DC, with a file share witness quorum also in the DC, and a third node is to be set up in DR.
I need to perform a DR switchover with the whole DC down and the DR database up for applications.
Can this solution work, or do I need to use log shipping for the third node instead of an AlwaysOn AG node?
Please help with this.
Thanks
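A rough sketch of what a forced DR switchover typically involves when the entire primary data center (including the witness) is down; the server and AG names are placeholders, and a forced failover accepts possible data loss:

```bash
# On the DR node: start the cluster service without quorum (run from an elevated prompt)
net start clussvc /forceQuorum

# Then force the availability group over to the local DR replica
sqlcmd -S DRNODE -Q "ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;"
```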
mukki
(11 rep)
Jun 25, 2020, 07:20 AM
• Last activity: Jul 12, 2025, 11:03 PM
0
votes
1
answers
165
views
Sql server replication for HA and DR
1.) According to Brent Ozar's HA and DR chart, for a Failover Cluster Instance the SQL Server minimum version supported and the minimum edition required are both given as 'any'.
Does that mean it works with SQL Server Express, or am I misinterpreting the meaning of 'any'?
According to Microsoft it doesn't work for SQL Server Express.
2.) The same chart also says that SQL Server replication can fail over automatically. Here is what I found. How does automatic failover happen in the case of replication, if it is possible?
IT Researcher143
(105 rep)
Jan 15, 2022, 04:49 AM
• Last activity: Jul 8, 2025, 01:02 AM
7
votes
1
answers
13258
views
Restore mongoDB by --repair and WiredTiger
We accidentally deleted the directory with `rm -rf /data/db`, which was our **MongoDB path**, and thanks to **extundelete** we recovered it and got the directory `/data/db` back. Here are the files in the directory; they were generated under MongoDB **version 3.4**.

Folder `diagnostic.data`:

Folder `journal`:

# Step 1: try to run mongod as normal
**a)** We ran `mongod --port 27017 --dbpath /data/db --bind_ip_all` and `mongo`, expecting the user-defined database `wecaXX` to be there. **But it did not show up.**
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
In Robo3T:

**b)** Then I tried to run `mongod --port 27017 --dbpath /data/db --bind_ip_all --repair`. The result was:
2019-03-25T14:10:02.170+0800 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] MongoDB starting : pid=23018 port=27017 dbpath=/data/db 64-bit host=iZj6c0pipuxk17pb7pbaw0Z
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] db version v4.0.7
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] git version: 1b82c812a9c0bbf6dc79d5400de9ea99e6ffa025
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] allocator: tcmalloc
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] modules: none
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] build environment:
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] distmod: ubuntu1604
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] distarch: x86_64
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] target_arch: x86_64
2019-03-25T14:10:02.191+0800 I CONTROL [initandlisten] options: { net: { bindIpAll: true, port: 27017 }, repair: true, storage: { dbPath: "/data/db" } }
2019-03-25T14:10:02.191+0800 W STORAGE [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2019-03-25T14:10:02.194+0800 I STORAGE [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-03-25T14:10:02.194+0800 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2019-03-25T14:10:02.194+0800 I STORAGE [initandlisten]
2019-03-25T14:10:02.194+0800 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-03-25T14:10:02.194+0800 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-03-25T14:10:02.194+0800 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=256M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2019-03-25T14:10:02.818+0800 E STORAGE [initandlisten] WiredTiger error (17) [1553494202:818725][23018:0x7f6119074a40], connection: __posix_open_file, 715: /data/db/WiredTiger.wt: handle-open: open: File exists Raw: [1553494202:818725][23018:0x7f6119074a40], connection: __posix_open_file, 715: /data/db/WiredTiger.wt: handle-open: open: File exists
2019-03-25T14:10:02.818+0800 I STORAGE [initandlisten] WiredTiger message unexpected file WiredTiger.wt found, renamed to WiredTiger.wt.1
2019-03-25T14:10:03.832+0800 I STORAGE [initandlisten] WiredTiger message [1553494203:832267][23018:0x7f6119074a40], txn-recover: Main recovery loop: starting at 4/11366912 to 5/256
2019-03-25T14:10:03.832+0800 I STORAGE [initandlisten] WiredTiger message [1553494203:832674][23018:0x7f6119074a40], txn-recover: Recovering log 4 through 5
2019-03-25T14:10:03.898+0800 I STORAGE [initandlisten] WiredTiger message [1553494203:898252][23018:0x7f6119074a40], txn-recover: Recovering log 5 through 5
2019-03-25T14:10:03.964+0800 I STORAGE [initandlisten] WiredTiger message [1553494203:964880][23018:0x7f6119074a40], txn-recover: Set global recovery timestamp: 0
2019-03-25T14:10:03.998+0800 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2019-03-25T14:10:03.999+0800 E STORAGE [initandlisten] WiredTiger error (17) [1553494203:999855][23018:0x7f6119074a40], WT_SESSION.create: __posix_open_file, 715: /data/db/_mdb_catalog.wt: handle-open: open: File exists Raw: [1553494203:999855][23018:0x7f6119074a40], WT_SESSION.create: __posix_open_file, 715: /data/db/_mdb_catalog.wt: handle-open: open: File exists
2019-03-25T14:10:04.000+0800 I STORAGE [initandlisten] WiredTiger message unexpected file _mdb_catalog.wt found, renamed to _mdb_catalog.wt.1
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten]
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten]
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten]
2019-03-25T14:10:04.015+0800 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 3824 processes, 65535 files. Number of processes should be at least 32767.5 : 0.5 times number of files.
2019-03-25T14:10:04.020+0800 I STORAGE [initandlisten] createCollection: admin.system.version with provided UUID: 47d8713d-ac61-4081-83bf-60209ad60a7d
2019-03-25T14:10:04.034+0800 W ASIO [initandlisten] No TransportLayer configured during NetworkInterface startup
2019-03-25T14:10:04.036+0800 I COMMAND [initandlisten] setting featureCompatibilityVersion to 4.0
2019-03-25T14:10:04.036+0800 I STORAGE [initandlisten] repairDatabase admin
2019-03-25T14:10:04.037+0800 I STORAGE [initandlisten] Repairing collection admin.system.version
2019-03-25T14:10:04.040+0800 I STORAGE [initandlisten] Verify succeeded on uri table:collection-0-4352287918877967674. Not salvaging.
2019-03-25T14:10:04.048+0800 I INDEX [initandlisten] build index on: admin.system.version properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "admin.system.version" }
2019-03-25T14:10:04.048+0800 I INDEX [initandlisten] building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-03-25T14:10:04.055+0800 I STORAGE [initandlisten] finished checking dbs
2019-03-25T14:10:04.055+0800 I STORAGE [initandlisten] WiredTigerKVEngine shutting down
2019-03-25T14:10:04.056+0800 I STORAGE [initandlisten] Shutting down session sweeper thread
2019-03-25T14:10:04.057+0800 I STORAGE [initandlisten] Finished shutting down session sweeper thread
2019-03-25T14:10:04.140+0800 I STORAGE [initandlisten] shutdown: removing fs lock...
2019-03-25T14:10:04.140+0800 I CONTROL [initandlisten] now exiting
2019-03-25T14:10:04.140+0800 I CONTROL [initandlisten] shutting down with code:0
**c)** After repairing, I re-ran `mongod --port 27017 --dbpath /data/db --bind_ip_all`, and it still showed **nothing** (the same result as step a)).
# Step 2: Use the WiredTiger tool
Since that didn't work as I expected, I started to look for other tools or approaches that might help me.
Here is the link Recovering a WiredTiger collection from a corrupt MongoDB installation, which introduces how to use WiredTiger to recover collections, and I decided to give it a try.
I created a folder `db_backup` and put all my files into it. Then I created a folder `wiredtiger-3.0.0` under `db_backup` and installed `wiredtiger` in it:

**a) To salvage the collection**
root@iZj6c0pipuxk17pb7pbaw0Z:/data/db_backup/# cd ./wiredtiger-3.0.0
root@iZj6c0pipuxk17pb7pbaw0Z:/data/db_backup/wiredtiger-3.0.0# ./wt -v -h ../db_for_wt -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R salvage collection-23--360946650994865928.wt
WT_SESSION.salvage 100
Based on the article above, this means that we have successfully salvaged this collection.
**b) To dump the collection**
# Error occurred:
root@iZj6c0pipuxk17pb7pbaw0Z:/data/db_backup/wiredtiger-3.0.0# ./wt -v -h ../../db_backup -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R dump -f ../collection.dump collection-23--360946650994865928
lt-wt: cursor open(table:collection-23--360946650994865928) failed: No such file or directory
Note that the command above was indeed run without the `.wt` extension.
I have checked my directory argument and found no fault. In the picture, the file `collection-23--360946650994865928.wt` **is right here**.

The snapshot of `collection-23--360946650994865928.wt`, which has just been salvaged. We can see some English and Chinese characters in it, and that data truly comes from one of the collections of database `wecaXX`.

# Questions:
**1)** Now I'm stuck on dumping the collection. Does anyone know what's wrong with that?
**2)** `collection-23--360946650994865928.wt` contains our most important data. If we cannot restore the whole database, extracting the data from that file would still be very useful. However, if we copy and paste manually, not all the characters are well formed; they still need to be decrypted. Does anyone know how to do that?
**3)** Is it possible that we did not recover all the files (or the entire body of all the files) under `/data/db`?
**Any comment or answer will be appreciated!!**
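For reference, a hedged sketch of the salvage-then-dump sequence with a single, consistent WiredTiger home directory (the two commands above use different `-h` paths, which is one plausible reason the dump cursor could not find the table); paths and the collection name follow the question and should be adapted:

```bash
WT_HOME=/data/db_backup    # directory that actually contains collection-23--360946650994865928.wt
cd /data/db_backup/wiredtiger-3.0.0

# Salvage and dump against the same home directory
./wt -v -h "$WT_HOME" -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R salvage collection-23--360946650994865928.wt
./wt -v -h "$WT_HOME" -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R dump -f ../collection.dump collection-23--360946650994865928
```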
国贵棠
(71 rep)
Mar 25, 2019, 10:10 AM
• Last activity: Jun 29, 2025, 11:03 PM
0
votes
1
answers
808
views
Safe method to rename a crashed myisam table
I have a large MyISAM table which has crashed. Repairing the table will take some time. The table is only INSERTed to and SELECTed from, never updated. To allow the application to continue working, albeit with reduced capability, I thought of:
- renaming the crashed table
- creating a new table with the original name
- switching processing back on
- repairing the backup table
- switching off processing
- merging the repaired and new data
- switching on processing
The other steps in this process do not pose any risk due to the nature of the application.
Is it safe to rename a crashed MyISAM table? How?
I believe that I can't simply do ALTER TABLE ... RENAME ..., as this always does a row-by-row copy into a new table.
Apparently Peter Zaitsev uses a "tiny script which moves out all MyISAM tables out of MySQL database directory" but doesn't seem to give details of what this script does (presumably it stops the database first?).
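A minimal sketch of the two rename routes, assuming a plain MyISAM table; database and table names are placeholders, and whether either route behaves safely on a table already marked as crashed should be verified on a copy first:

```bash
# Route 1: SQL-level rename. For MyISAM, RENAME TABLE is a file-level rename,
# not a row-by-row copy; CREATE TABLE ... LIKE only reads the table definition.
mysql -e "RENAME TABLE mydb.big_table TO mydb.big_table_crashed;"
mysql -e "CREATE TABLE mydb.big_table LIKE mydb.big_table_crashed;"

# Route 2: filesystem-level rename with mysqld stopped (presumably what the
# "tiny script" mentioned above does).
sudo systemctl stop mysql
cd /var/lib/mysql/mydb
for ext in frm MYD MYI; do sudo mv "big_table.$ext" "big_table_crashed.$ext"; done
sudo systemctl start mysql
```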
symcbean
(339 rep)
Sep 8, 2016, 09:51 AM
• Last activity: Jun 10, 2025, 08:04 AM
0
votes
2
answers
463
views
Failing over to DR site Availability group
I have a 4 node cluster SQL 2012 availability group database with one witness server at the primary site. 3 nodes are in the primary site and one node in the DR site.
Due to maintenance at the primary site I need to fail over to the DR site and make it the primary.
Here is my question: if I fail over to the DR site and turn off all the nodes one by one gracefully, including the witness server, will Dynamic Quorum take over and keep the cluster and the availability group alive? Or will the Windows Server Failover Cluster go down and the AG become inaccessible, so that I have to bring the cluster service up with force quorum?
SQL_NoExpert
(1117 rep)
Apr 4, 2022, 01:58 PM
• Last activity: May 3, 2025, 06:06 AM
0
votes
1
answers
1589
views
Mysqld won't start although lower_case_table_names is 0 and matches the data dictionary
I copied a mysql 8.0.26 database from a Linux server to my Windows box but mysqld won't start. Either it's this (with lower_case_table_names=1):
[ERROR] [MY-011087] [Server] Different lower_case_table_names settings for server ('1') and data dictionary ('0').
[ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
[ERROR] [MY-010119] [Server] Aborting
Or this (with lower_case_table_names=0):
[ERROR] [MY-010158] [Server] The server option 'lower_case_table_names' is configured to use case sensitive table names but the data directory is on a case-insensitive file system which is an unsupported combination. Please consider either using a case sensitive file system for your data directory or switching to a case-insensitive table name mode.
[ERROR] [MY-010119] [Server] Aborting
Is this because the Windows filesystem is case-insensitive? How can I run the database on Windows? Or is this another issue? The database crashed on Linux and I am trying to repair it on Windows.
Cybercartel
(111 rep)
Jun 9, 2023, 03:46 PM
• Last activity: Apr 19, 2025, 07:00 AM
1
votes
1
answers
2451
views
How to recover Postgresq data from $PGDATA/base files after system crash
I was working on my Ubuntu machine and left it on to grab food. Upon returning, I found it had shut itself down. It was not a power failure, because I have a home backup system and the lights never go off in my house.
When I turned it on, it went straight to BusyBox because, apparently, the `fsck` tool had somehow moved the entire contents of the root partition and everything else to **lost+found** on my laptop's primary drive, which is an SSD.
So I backed up the lost+found directory to an external HDD and installed Ubuntu 22, from Ubuntu 20. I did not lose personal data because I had my /home directory on my secondary HDD. However, everything else was on the same partition on the SSD.
So, after perusing the lost+found files, I was able to extract all the files from /lib/postgresql/. Now, because PostgreSQL uses OIDs, unlike MySQL which uses names, I had to figure out the databases based on the information on this website: https://www.postgresql.fastware.com/blog/how-postgresql-maps-your-tables-into-physical-files
For reference, I was able to recover MySQL tables since they simply use table names. With PostgreSQL however, I'm not sure if just copying the **$PGDATA/base/** files will work given the information mentioned on that website.
Is it possible to extract the data (through some tool) from the base files or to re-import them back into an active cluster/instance?
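A small sketch of how the OID-to-name mapping mentioned above can be inspected on a running cluster (standard catalog queries; the database and table names are placeholders), which helps match recovered `base/<db_oid>/<relfilenode>` files to databases and tables:

```bash
# Database OIDs correspond to the directory names under $PGDATA/base
psql -c "SELECT oid, datname FROM pg_database;"

# For a given table, the file path relative to $PGDATA
psql -d mydb -c "SELECT pg_relation_filepath('my_table');"
```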
TheRealChx101
(121 rep)
Jul 31, 2022, 12:20 AM
• Last activity: Mar 22, 2025, 01:00 PM
-1
votes
1
answers
3696
views
The header for file 'xxx.mdf' is not a valid database file header. The FILE SIZE property is incorrect
Updated question:
The issue occurred when the SQL Data and Log Disk were unknowingly detached from the VM server.
After realizing this, the disks were reattached, but the detachment and reattachment at the VM level caused the database to enter a recovery pending state.
Subsequently, the emergency mode and recovery with data loss commands were executed. However, after this process, some records were found to be completely removed at the page level from the data file.
To recover the lost data, we attempted to restore from a backup taken before running the recovery with data loss command.
However, repairing and attaching the specific data file resulted in file header and other issues.
We are facing an issue after unknowingly detaching a disk from the server:
> Msg 5172, Level 16, State 15, Line 3
> The header for file 'E:\DATABASE\xxxx.mdf' is not a valid database file header. The FILE SIZE property is incorrect.
>
> Msg 1802, Level 16, State 7, Line 3
> CREATE DATABASE failed. Some file names listed could not be created. Check related errors.
>
> FileMgr::StartLogFiles: Operating system error 2 (the system cannot find the file specified.) occurred while creating or opening file 'D:\DATABASE\xxxx.LDF'. Diagnose and correct the operating system error, and retry the operation.
>
> Operating system error 38 (Reached the end of the file.) on file "D:\DATABASE\xxxx.LDF" during SQLServerLogMgr::ReadFileHdr.
We don’t have a recent backup—only a full backup from the previous day.
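For reference, a hedged sketch of the emergency-mode and repair-with-data-loss sequence referred to above (standard T-SQL run through sqlcmd; the server and database names are placeholders, and REPAIR_ALLOW_DATA_LOSS can discard pages, which matches the data loss observed):

```bash
sqlcmd -S SERVER -Q "ALTER DATABASE [xxxx] SET EMERGENCY;"
sqlcmd -S SERVER -Q "ALTER DATABASE [xxxx] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;"
sqlcmd -S SERVER -Q "DBCC CHECKDB (N'xxxx', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;"
sqlcmd -S SERVER -Q "ALTER DATABASE [xxxx] SET MULTI_USER;"
```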
Selva
(25 rep)
Mar 11, 2025, 04:58 PM
• Last activity: Mar 12, 2025, 10:17 PM
1
votes
1
answers
117
views
Replay standby WAL to point in time without creating new timeline
We have a PostgreSQL 13 replication cluster where one of the standbys is set up with `recovery_min_apply_delay` = "x hr", which could be useful in the case of data corruption.
Consider a scenario where an accidental delete happened on the *master server* but has not yet been applied to the *standby server*. In this case, I am trying to extract the data from the standby server by removing the *WAL replay delay* and adding `recovery_target_time` set to a point before the delete happened.
Up to this point I am successful, but once I am done pulling data from the standby server to the master (I am thinking of postgres_fdw), I need to resume replication. However, when I run `pg_wal_replay_resume()`, it creates a new timeline and the standby is no longer in the replication cluster.
Is there any way to replay WAL other than using `recovery_target_time`?
Giving an example of what I am trying to achieve,
*Let's say recovery_min_apply_delay=2hr , an accidental delete happened at 03:50 AM, at this time transactions committed at standby is transactions that happened on or before 01:50 AM. If I wait another 2hr and at 05:49 AM if I am able to pause standby WAL replay I will be able see a snapshot of data which is "right before" DELETE. Then I will use postgres_fdw to pull data from standby.*
How do I achieve "right before" in a precise, systematic way, while still being able to resume WAL replay afterwards?
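A minimal sketch of the pause/inspect/resume controls available on a PostgreSQL 13 standby, which avoids `recovery_target_time` entirely; the timestamps and the FDW pull are placeholders for the workflow described above:

```bash
# On the delayed standby: stop WAL replay at the current point
psql -c "SELECT pg_wal_replay_pause();"
psql -c "SELECT pg_is_wal_replay_paused();"

# Check how far replay has actually progressed (should be before the DELETE)
psql -c "SELECT pg_last_xact_replay_timestamp();"

# ... pull the needed rows from the standby (e.g. via postgres_fdw from the master) ...

# Resume replay; since no recovery target is set, no new timeline is selected
psql -c "SELECT pg_wal_replay_resume();"
```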
goodfella
(595 rep)
Feb 17, 2025, 04:47 AM
• Last activity: Feb 18, 2025, 09:56 AM
1
votes
2
answers
42
views
Multi subnet failover for Azure PaaS failover groups
Should the `multisubnetfailover=True` parameter be added to connection strings to Azure failover groups?
AlexP012
(53 rep)
Jan 20, 2025, 12:32 PM
• Last activity: Jan 20, 2025, 11:51 PM
0
votes
2
answers
911
views
Connect two Percona XtraDB Clusters in two data centers
I have been trying to figure this out the past couple of weeks. We used to have a five-host Percona cluster at different providers. When it started acting up I read that it is a bad idea to have synchronous replication across data centers. Now we have a three-node cluster in one data center and it is stable.
I am trying to figure out the best way to have data center redundancy.
I read that you can have two individual clusters, one in each data center, with an async link between them. Upon further reading, though, it seems that this is typically for a DR scenario, so the slave cluster is not utilized, and while the failover is easy, the failback is a bit involved.
I am very new to DB clustering, so I am hoping for guidance. The database clients will be using HAProxy and will only point to the hosts in one datacenter.
I am hoping that I can split the clients half to each DC and have the async link replicate the data across. How would you guys do this? What is everyone using these days?
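A rough sketch of the asynchronous link typically configured between two such clusters, assuming GTID-based replication and one designated replica node per cluster; hostnames and credentials are placeholders:

```bash
# On the designated node of the DR cluster, point async replication at one node
# of the primary cluster (GTID auto-positioning, so the source can be changed later)
mysql -e "CHANGE MASTER TO
            MASTER_HOST='dc1-node1.example.com',
            MASTER_USER='repl',
            MASTER_PASSWORD='secret',
            MASTER_AUTO_POSITION=1;
          START SLAVE;"

# Check the link
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Running|Seconds_Behind"
```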
Les
(11 rep)
Jun 24, 2021, 06:39 AM
• Last activity: Jan 12, 2025, 03:10 AM
1
votes
2
answers
4643
views
Always On Availability Group Failover Issue
I'm testing a PowerShell Disaster Recovery script and trying to fail over to an off-site AG DR (asynch) replica with this command
ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS
On several clusters this works fine but on a couple of other ones it usually fails (but not always) with the error:
> Failed to move a Windows Server Failover Clustering (WSFC) group to
> the local node (Error code 5023). The WSFC service may not be running
> or may not be accessible in its current state, or the specified
> cluster group or node handle is invalid. For information about this
> error code, see "System Error Codes" in the Windows Development
> documentation.
> Failed to designate the local availability replica of
> availability group 'MyAG' as the primary replica. The operation
> encountered SQL Server error 41018 and has been terminated. Check the
> preceding error and the SQL Server error log for more details about
> the error and corrective actions.
We have to force the quorum simulating the situation where the async DR replica can't communicate with the other replicas: https://learn.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/force-a-wsfc-cluster-to-start-without-a-quorum?view=sql-server-2017
A comparison of the clusters that work vs. those that don't, using `Get-Cluster` and `Get-ClusterGroup`, doesn't reveal any major differences.
This failover has worked successfully on some clusters but fails 99% of the time on two of the clusters. It has worked on these two clusters, but rarely. One of the common issues I found from a Google search is permissions for `NT Authority\System` (see https://dataginger.com/2014/10/28/sql-server-failed-to-bring-availability-group-availability-group-name-online/), which I confirmed it has.
I ran a trace and see the same errors and no other clues in the trace.
If nothing turns up I may have to open a Microsoft ticket to engage their support.
Any thoughts?
Thanks for your help!
Mike Petrak
(436 rep)
Mar 30, 2019, 08:52 PM
• Last activity: Jan 8, 2025, 12:08 AM
0
votes
0
answers
80
views
PostgreSQL: Is there any way to further optimise ASYNC replication towards less data loss during failover?
We are looking to see if there are any further optimisations that can be done to PostgreSQL (13) `ASYNC replication` to bring `data loss` as near to zero as possible when a `failover` is triggered during a disaster. We don't want to change to `synchronous replication`, as in our testing the `TPS` is really low compared with the `ASYNC` method.
We see an average `write lag` of 200-300 KB, and at peak traffic up to 20 MB. Are there any optimisations we can do at the `WAL` level? Any other area to tune?
**UPDATE**
Could `wal_compression` help optimise this?
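A small monitoring sketch (standard PostgreSQL 13 views and settings) for quantifying where the asynchronous lag accumulates before deciding what to tune; the connection details are placeholders:

```bash
# On the primary: per-standby lag broken down into sent/write/flush/replay
psql -c "SELECT application_name, sent_lsn, write_lsn, flush_lsn, replay_lsn,
                write_lag, flush_lag, replay_lag
         FROM pg_stat_replication;"

# Settings that commonly influence how quickly WAL is generated and shipped
psql -c "SELECT name, setting FROM pg_settings
         WHERE name IN ('wal_compression', 'wal_writer_delay', 'max_wal_senders', 'wal_sender_timeout');"
```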
goodfella
(595 rep)
Oct 4, 2024, 09:16 AM
• Last activity: Oct 8, 2024, 03:46 AM
2
votes
1
answers
88
views
SQL Server Database I/O Consistency Error Disaster Recovery
I have a corrupted SQL Server database that is inaccessible by any means. It has got to the point that simple commands, like USE MyDatabase, won't execute.
The error I get is: SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:371; actual 0:0). It occurred during a read of page (1:371) in database ID 5 at offset 0x000000002e6000 in file.
The database got into this situation because it is a corrupted .MDF file saved from a dead HD that is no longer accessible. So the .MDF contains the most recent data the user wants. Yes, the user was able to recover some full backups, but they are very old.
So what I have is: an old, good full backup in a .BAK file, plus the corrupted, up-to-date .MDF file and its .LDF file, which is in an unknown state since I don't know how to check it.
The database could only be attached by doing the 'workaround' of restoring it from the old full backup .BAK file, setting it OFFLINE, and replacing the successfully restored .MDF file with the corrupted one. However, the database always gets stuck in EMERGENCY or SUSPECT mode and cannot proceed further. I can't even use CHECKDB, since any ALTER DATABASE command results in the described error.
I already tried to export the data using both the bcp command and, manually, an SSMS task, but that also failed. I also used some free trial versions of renowned .MDF repair tools, which were indeed able to capture good data, but the user is still skeptical about using them.
I have also tried to follow the instructions described here, but with no success, since I can't get the database into single-user mode, only EMERGENCY mode. I also tried to start the server via SQLcmd and proceed from there, but that failed as well.
Is there anything left that can be done? Can I fix the current state of those failed pages via HEX, even if they are gone afterwards? Is there any other export route I could try? Have I tried everything needed to restore successfully?
**UPDATE:**
Since I had no option left, I did something that **is widely not recommended**: I restored the good backup in order to create a good .MDF file. I opened the good .MDF file with a HEX editor, copied the faulty page addresses, and wrote them into the bad .MDF file. It was my day: by **incredible luck** I was able to bring the database ONLINE and execute ALTER commands, which, even with some errors, put the database in a state where I could run CHECKDB. After that, I could execute some queries against key tables and retrieve some important data!!! The data, of course, had many holes in it, and it was really painful to pick out the good sectors. This bad experience turned into a good one. I will not post that as a solution, as it is a really messy one. Thanks everyone who helped me here and on Stack Overflow.
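For anyone in a similar spot, a hedged sketch of how an individual suspect page can be inspected before attempting the kind of manual page surgery described above (DBCC PAGE is undocumented but widely used; database ID 5 and page 1:371 come from the error message, the server name is a placeholder):

```bash
# Send DBCC PAGE output to the client instead of the error log, then dump the page
sqlcmd -S SERVER -Q "DBCC TRACEON(3604); DBCC PAGE(5, 1, 371, 3) WITH TABLERESULTS;"
```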
joaocarlosib
(23 rep)
Sep 4, 2024, 02:35 PM
• Last activity: Sep 5, 2024, 01:15 PM
0
votes
1
answers
32
views
What is a viable low-cost DR option for a large cluster?
We have a Cassandra cluster running on GKE with a 32-CPU node pool and SSD disks. The current cluster size is nearly 1 PB, with each node utilizing an average of 5 TB of its 10 TB allocated SSD disk. The cluster comprises 200 nodes, each with 10 TB disks, for a total of 2 PB allocated.
Given this cluster size, the maintenance costs are substantial. How can we achieve low-cost disaster recovery for such a large cluster?
One option I am considering is creating a new data center in a different region with a replication factor of 1 (RF1). While this is not recommended, it would at least reduce the cluster size by a factor of three.
Any suggestions would be greatly appreciated.
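If the RF1 data-center route is explored, the change is a keyspace-level replication setting followed by a rebuild on the new DC's nodes; a sketch with placeholder keyspace and data-center names, with the usual caveat that RF1 gives no redundancy within that DC:

```bash
# Add the DR data center with replication factor 1 for the keyspace
cqlsh -e "ALTER KEYSPACE my_keyspace
          WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_primary': 3, 'dc_dr': 1};"

# On each node in the new DR data center, stream the existing data in
nodetool rebuild -- dc_primary
```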
Sai
(39 rep)
Jul 12, 2024, 07:47 PM
• Last activity: Jul 26, 2024, 09:59 AM
0
votes
0
answers
65
views
Always on availability group removing nodes
We have a SQL 2014 enterprise with 4 node availability group database with 2 Synchronous commit , 1 read only Asynchronous for reporting and 1 DR asynchronous.
Due to a space shortage we need to remove some nodes. Since the reports point to the asynchronous reporting server, can we remove 2 nodes (1 synchronous replica and 1 DR) and put the reporting server in synchronous mode, keeping it read-only?
The cluster will then have 2 nodes both in synchronous mode with 1 disk witness for the quorum.
This setup will thus have no disaster recovery server, as it is temporary.
I would appreciate it if someone could advise whether I can have this setup, or whether there is a better workaround.
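A sketch of the T-SQL such a reshaping typically involves, run on the primary; the AG and replica names are placeholders and the statements should be verified against the actual topology first:

```bash
# Remove the two replicas that are being decommissioned
sqlcmd -S PRIMARYNODE -Q "ALTER AVAILABILITY GROUP [MyAG] REMOVE REPLICA ON N'SYNC2';"
sqlcmd -S PRIMARYNODE -Q "ALTER AVAILABILITY GROUP [MyAG] REMOVE REPLICA ON N'DRNODE';"

# Switch the reporting replica to synchronous commit, keeping it readable as a secondary
sqlcmd -S PRIMARYNODE -Q "ALTER AVAILABILITY GROUP [MyAG] MODIFY REPLICA ON N'REPORTNODE'
    WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);"
sqlcmd -S PRIMARYNODE -Q "ALTER AVAILABILITY GROUP [MyAG] MODIFY REPLICA ON N'REPORTNODE'
    WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));"
```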
SQL_NoExpert
(1117 rep)
Apr 23, 2024, 12:08 PM
• Last activity: May 3, 2024, 11:23 AM
0
votes
0
answers
49
views
Data loss and data redundancy when forcing failover to the secondary replica in the DR data center in SQL Server 2016
We are using SQL Server 2016 Enterprise Edition and planning to have 2 datacenters: a primary and a remote disaster recovery (DR) datacenter. We have 2 replicas (primary replica and secondary replica) in the primary datacenter, running under synchronous-commit mode with automatic failover to provide high availability. The third replica (secondary replica) is in the disaster recovery datacenter and runs under asynchronous-commit mode to provide datacenter-level disaster recovery. I have the questions below with respect to data integrity in case of a forced manual failover when the whole primary datacenter is completely down.
Scenario 1:
Say we have a table named Table1 with 4 records: Record 1, 2, 3 and 4, where Record4 was committed on Table1 on the primary replica, but the whole server went down before it was committed to the secondary datacenter. How do we sync this data once the primary datacenter is up again? What is the process we need to follow, and is there any automated way to do this?
Scenario 2:
Say I make the DR replica the primary, and since there is no Record4 on the DR replica it starts inserting a Record4 with different data. Even though the Id is the same for Record4, the data for Record4 in the primary datacenter and the secondary datacenter is different. How do we sync it, given that a lot of other tables depend on Record4? I am thinking of reseeding DR to some number, say 100, and then making DR the primary. Is there any automated or better alternative way to do this?
What is your experience with the above scenarios? Can you please provide some suggestions?
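Not a full answer, but a sketch of how the potential loss window can at least be measured before forcing failover, using a standard Availability Group DMV; the server name is a placeholder:

```bash
sqlcmd -S DRNODE -Q "SELECT ar.replica_server_name, drs.database_id,
       drs.last_hardened_lsn, drs.last_redone_lsn, drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON ar.replica_id = drs.replica_id;"
```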
Manoj Jeganathan
(1 rep)
Apr 1, 2024, 09:09 AM
• Last activity: May 3, 2024, 11:21 AM
0
votes
0
answers
63
views
Migrate cold database (freebsd) to mariaDb on ubuntu
I have a little bit of a pickle:
I have a FreeBSD-based jail that ran Nextcloud with a MySQL database.
The machine running the jail is broken. I still have the jail with the data.
Now I was wondering if I can get the previous database over to an Ubuntu VM with the current version of MariaDB (I don't know what version was running in the FreeBSD jail).
Can anyone tell me how to do this?
I tried copying the DB files over, but either MySQL wouldn't start at all or no tables were available in the DB.
Any help is appreciated. I'd assume at least some other files are required to get that back up and running.
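A hedged sketch of the usual approach when only the raw data directory survives; it assumes the jail's full MySQL datadir (including ibdata1 and the ib_logfile* files, not just the per-database folder) was preserved, and all paths and versions are placeholders to adapt:

```bash
# 1. Copy the *entire* data directory from the jail, not just the database folder
cp -a /path/to/jail/var/db/mysql /var/lib/mysql-recovered
chown -R mysql:mysql /var/lib/mysql-recovered

# 2. Start a temporary mysqld against it (add innodb_force_recovery=1..6 if it refuses to start)
mysqld --user=mysql --datadir=/var/lib/mysql-recovered --socket=/tmp/recover.sock --port=3307 &

# 3. Dump logically, then import into the target MariaDB
mysqldump --socket=/tmp/recover.sock --all-databases > all.sql
mysql < all.sql
```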
best regards
chas0r
(1 rep)
Feb 29, 2024, 09:27 PM
Showing page 1 of 20 total questions