Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
3
votes
1
answers
425
views
SQL Server 2014 Availability Group failed
I have 3 servers on VMware Workstation:
1. Win 2012 R2 as AD DC - cluster service.
2. Win 2012 R2 as SQL Server primary replica.
3. Win 2012 R2 as SQL Server secondary replica (automatic failover).
My availability group fails in this scenario: when I disconnect the primary replica from the network, the cluster fails to initiate a failover to the secondary replica. When I try a manual failover from the secondary replica instance, I get this error:
However, this is not the case when I stop the primary replica from Cluster Manager, because then it performs an automatic failover.
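When the primary is cut off from the network, the cluster can lose quorum and the AG cannot fail over automatically; in that state the documented last resort is a forced failover on the secondary. A hedged T-SQL sketch, where the AG name [MyAG] is a placeholder (forced failover allows data loss, so it should only be used when the primary is genuinely unrecoverable):

```sql
-- Run on the secondary replica's instance; [MyAG] is a placeholder name.
-- WARNING: forced failover can lose data that had not yet been sent
-- to the secondary.
ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```

If the underlying WSFC has itself lost quorum, the cluster must first be forced online (forced quorum) before this command can succeed.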

danarj
(169 rep)
Aug 16, 2015, 06:58 AM
• Last activity: Jul 31, 2025, 12:03 PM
0
votes
1
answers
160
views
MySQL Master Slave Asynchronous GTID Replication Failover/Switchover over ProxySQL with little downtime
**This is a production setup and I have to perform a failover from Master A to new potential master B.**
**MySQL Version is 5.7 and CentOS 7 is the OS.**
I have a simple master slave setup with A as Master and B,C,D,E,F as async slaves. ProxySQL Cluster is also configured.
Master A (writer in ProxySQL) and Slaves B and C (readers in ProxySQL) are added to ProxySQL.
On all the slaves **log_slave_updates** is **ON** and **log_bin** is also **ON**.
On **Master A** **read_only=0**, and **Slaves B,C,D,E,F** have **read_only=1**.
This is a **GTID**-based replication with **master_auto_position = 0**.
**A --> B,C,D,E,F** (this is the current replication topology)
Now, I know that for failover I have to choose either slave B or slave C, because they are on the proxy. I choose B as my new potential master. So I have to move all the slaves from the current Master A to the new Master B, and the replication topology will then look like this:
**B --> A,C,D,E,F** (proposed new topology)
**My question is how do I move the slaves from Master A to new potential Master B?**
**How should I use the coordinates from new Master B? What procedure should I follow?**
Once I move the slaves, I guess failover will be easy: I just tweak the **read_only** flag on the old and new masters, and the proxy will take care of the connections.
I would really appreciate it if a detailed procedure with an explanation could be provided.
I have gone through the following links but couldn't understand the process:
https://stackoverflow.com/questions/43960577/mysql-failover-how-to-choose-slave-as-new-master
https://serverfault.com/questions/303669/how-to-actually-use-mysql-slave-as-soon-the-master-is-failover-or-got-burnt
https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-switch.html
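For context, one common GTID-based approach (a sketch under assumptions, not a verified production runbook: the hostname, replication user, and password below are placeholders) is to repoint each slave with auto-positioning, which works here because `log_bin` and `log_slave_updates` are enabled on B:

```sql
-- Run on each of C, D, E, F (and later on the demoted A):
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'B',            -- new master
  MASTER_USER = 'repl',         -- placeholder replication user
  MASTER_PASSWORD = 'secret',   -- placeholder
  MASTER_AUTO_POSITION = 1;     -- GTID auto-positioning finds the start point
START SLAVE;
```

Note that the setup described uses `master_auto_position = 0`; switching to auto-positioning avoids having to read file/position coordinates from B at all.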
Avinash Pawar
(216 rep)
Aug 3, 2023, 07:15 PM
• Last activity: Jul 24, 2025, 01:00 PM
0
votes
1
answers
52
views
ADO.NET client occasionally attempts connection to SQL Server mirror partner even when principal is online
We're running SQL Server 2016 Standard Edition with database mirroring in synchronous mode and automatic failover enabled. Our .NET applications use connection strings with Failover Partner.
Occasionally (1–4× per day), some applications try to connect to the mirror server, even though the principal is online and fully healthy. We see login failures on the mirror (Error 4060 – "Cannot open database ...") but no issues or alerts on the principal side (Redgate, Zabbix, SQL logs, etc.). This affects different databases (we have dozens mirrored) and different apps or users – it’s not tied to one specific service.
Extended Events show only mirror-side errors, with various client hosts and application names.
PerfMon once showed a spike in TCP hard connects around the same time, but nothing consistent. Based on online research such as [Database mirroring login failure attempts on mirror server](https://serverfault.com/questions/236163) , we suspect it may be caused by things like expired connections in the pool, network hiccups, SQL Browser issues, or connection resets (e.g. firewall/NAT dropping idle sockets). When this happens, the .NET SqlClient might believe the primary is unreachable and trigger an immediate failover attempt.
So far this seems to be expected behavior, but it causes false-positive alerts in our monitoring tools.
We're trying to confirm:
1) Is this behavior expected from ADO.NET when Failover Partner is used?
2) Is there any way to reduce/suppress the probing of the mirror?
3) Are there best practices for detecting/logging/tuning this (e.g., connection settings, retry logic)?
4) Should we just tune our monitoring and ignore these errors as “normal noise”?
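One low-effort way to separate real role changes from client-side probing (a sketch, not an official diagnostic) is to log the mirroring role over time on both partners:

```sql
-- If mirroring_role_desc stays PRINCIPAL on the expected server across
-- the incident window, the mirror-side login failures are client probing
-- rather than actual failovers.
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```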
Jan
(1 rep)
Jul 17, 2025, 02:57 PM
• Last activity: Jul 18, 2025, 01:27 AM
0
votes
3
answers
161
views
Need to configure Availability Groups between three different SQL failover instance
We have a total of 6 nodes and 3 failover clusters (each with two nodes).
- `Cluster1.TestAD.com` (Node1, Node2) - SQL Failover Instance (SQL1) - 1 TB shared storage
- `Cluster2.TestAD.com` (Node3, Node4) - SQL Failover Instance (SQL2) - 1 TB shared storage
- `Cluster3.TestAD.com` (Node5, Node6) - SQL Failover Instance (SQL3) - 1 TB shared storage
Now we want to create an availability group setup between the above three SQL instances.
We plan to create a new cluster (`cluster4.TestAD.com`) with all six nodes and create availability groups with all three instances.
SQL Edition: SQL Server 2012 R2 Enterprise
Is it possible? Any other recommendations?
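For what it's worth, FCIs can host AG replicas once all six nodes are members of a single WSFC, with the caveat that AG failover between FCI-hosted replicas is manual only (each FCI still handles its own local failover). A hedged sketch using the instance names above, with the database name and endpoint values as placeholders:

```sql
-- Sketch only: run on SQL1 after mirroring endpoints exist on all
-- three instances; [MyDb] and the URLs are placeholders.
CREATE AVAILABILITY GROUP [AG1]
FOR DATABASE [MyDb]
REPLICA ON
  N'SQL1' WITH (ENDPOINT_URL = N'TCP://SQL1.TestAD.com:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL),   -- FCI replicas: manual only
  N'SQL2' WITH (ENDPOINT_URL = N'TCP://SQL2.TestAD.com:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL),
  N'SQL3' WITH (ENDPOINT_URL = N'TCP://SQL3.TestAD.com:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL);
```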
adkalavadia
(26 rep)
Aug 24, 2016, 09:45 AM
• Last activity: Jul 16, 2025, 09:04 PM
0
votes
1
answers
179
views
MariaDB Galera and Geo redundancy
I am trying to simulate 2 clusters in different datacenters/different locations, using VMWare VSXi and 6 VMs: 3 are located on DC1 and other 3 on DC2. All nodes are RHEL 9.
Both DCs communicate through the same VLAN and this works fine.
I have installed MariaDB 10.5.22 + Galera.
On DC1, the 3 nodes are all data nodes; node #1 was bootstrapped using the galera_new_cluster script.
On DC2, 2 nodes are data nodes and I added 1 arbitrator.
When starting nodes, all 6 are connected on the same cluster, sharing data, and this seems to be working fine.
The problem that I find is when I simulate a network outage: I modify the VLAN ID on DC2 and both DCs lose connectivity.
Looking at each node's logs, they effectively lose connectivity, but:
- all nodes (3 from DC1 and 2 from DC2) are unavailable for queries
- Arbitrator keeps retrying to connect to nodes, ex.:
Oct 7 20:08:16 ire-lab-se3 garb-systemd: 2024-10-07 20:08:16.562 INFO: (dd79df53-972e, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.20.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 358971429 cwnd: 1 last_queued_since: 359271583299206 last_delivered_since: 359271583299206 send_queue_length: 0 send_queue_bytes: 0
I was expecting that:
- DC1 would remain available and synced, after all, the bootstrap was on Node#1 and there are 3 nodes available on this network
- Garbd arbitrator would handle the remaining 2 nodes in DC2, allowing applications on DC2 to use the current data
I have tried multiple test scenarios, and this is the only one that is failing (network outage) and I can't seem to be able to solve the issue.
The configuration file on each Galera data node is similar to this (only the node name and ID differ):
#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql/
#
# this is read by the standalone daemon and embedded servers
[server]
log_error=/var/log/mariadb/mariadb.err
log_warnings=9
#default_time_zone='UTC'
table_definition_cache=4096
table_open_cache=4096
#innodb_read_only_compressed=OFF # only for 10.6 and later
# this is only for the mysqld standalone daemon
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mysqld/mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid
# disabling symlinks is recommend to prevent assorted security risks
symbolic_links=0
#enable binary logging
log_bin=/var/log/mariadb/mariadb-bin
log_bin_index=/var/log/mariadb/mariadb-bin.index
#enable relay log files
relay_log=/var/log/mariadb/relay-bin
relay_log_index=/var/log/mariadb/relay-bin.index
log_slave_updates=1
performance_schema=ON
interactive_timeout=180
wait_timeout=180
max_connections=500
#
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_on=ON
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
# this is only for embedded server
[embedded]
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
# This group is only read by MariaDB-10.5 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.5]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
wsrep_on=ON
query_cache_size=0
query_cache_type=0
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
# below parameter should not include Arbitrators, only MariaDB nodes
wsrep_cluster_address="gcomm://10.20.0.1,10.20.0.2,10.20.0.3,10.20.0.4,10.20.0.5"
wsrep_cluster_name='galera_cluster_LAB'
wsrep_node_address='10.20.0.1'
wsrep_node_name='galera_LAB_1'
wsrep_sst_method='rsync'
#'mariabackup'
#wsrep_sst_auth=backupuser;backupuser
server_id=2
wsrep_provider_options='gmcast.segment=1;gcache.size=2G'
#wsrep_sst_donor="galera_1"
and the Gardb Arbitrator is set as:
# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.
# A comma-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES="10.20.0.4:4567 10.20.0.5:4567"
# Galera cluster name, should be the same as on the rest of the nodes.
GALERA_GROUP="galera_cluster_LAB"
# Optional Galera internal options string (e.g. SSL settings)
# see https://galeracluster.com/library/documentation/galera-parameters.html
GALERA_OPTIONS=""
# Log file for garbd. Optional, by default logs to syslog
# FOR SOME REASON IT DOESN'T WORK; RETURNS AN ERROR PROBABLY RELATED TO PERMISSIONS
# LOG_FILE="/var/log/mariadb/garbd.log"
# Where to persist necessary data
# WORK_DIR=""
So I think this is a pretty standard setup.
I googled a lot, but could not find anything that makes this work as expected. The requirements are that, in production, both DCs should communicate freely; if a network outage happens, application servers in both DCs should continue working with their local MariaDB Galera nodes (querying and inserting new rows, if required); and once the network is up again, all nodes should sync data with each other (new data from DC1 exported to DC2, new data from DC2 exported to DC1, and so on).
Am I missing something here? Could anyone help?
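One possible explanation, inferred from the configuration shown rather than confirmed: the cluster has 6 votes (5 data nodes plus garbd), so a clean inter-DC split is 3 votes against 3, and neither side keeps a majority, which would leave every node non-primary exactly as observed. A hedged my.cnf sketch that weights DC1 so it survives such a split:

```ini
# Sketch, set on the DC1 nodes only (segment value as in the question):
# with pc.weight=2 on the three DC1 nodes, DC1 holds 3*2=6 votes against
# DC2's 2+1=3, so DC1 keeps quorum during a clean inter-DC split.
wsrep_provider_options='gmcast.segment=1;gcache.size=2G;pc.weight=2'
```

Note that the stated requirement (both DCs continuing to accept writes while partitioned, then merging afterwards) is split-brain; Galera deliberately refuses this, so only one side can remain writable during an outage.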
Thanks
Sergio
scharrua
(11 rep)
Oct 7, 2024, 09:54 PM
• Last activity: Jul 4, 2025, 08:08 AM
0
votes
1
answers
570
views
Promoting MySQL replica to primary during failover
The MySQL primary fails and I'm trying to promote the replica to primary.
Following the official documentation, I've already:
1. disabled the failed primary server on all load balancers
2. waited for the primary server to complete all its transactions:
Source has sent all binlog to replica; waiting for more updates
3. set the primary server as super R/O
4. dumped all InnoDB databases from the failed primary server as a precaution
5. stopped the IO_THREAD of the replica
6. waited for the replica to complete all its transactions:
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
7. stopped the SQL_THREAD of the replica
8. reset the GTID execution history and the binary log of the replica
However, there are discrepancies between the failed primary and the new primary servers:
- **failed primary server status**:
+---------------+----------+--------------+------------------+---------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+---------------------------------------------+
| binlog.000073 | 197 | | | c219365c-bd04-11ed-b1e2-525400003001:1-4620 |
+---------------+----------+--------------+------------------+---------------------------------------------+
- **new primary server status**:
+---------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000001 | 157 | | | |
+---------------+----------+--------------+------------------+-------------------+
We can see that the positions are different and that the new primary has no Executed_Gtid_Set.
Either I missed something, or the doc is incomplete.
Any suggestions?
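One point worth noting: step 8's `RESET MASTER` wipes the binary logs and the GTID execution history, so an empty `Executed_Gtid_Set` on the new primary is the expected result of that step rather than a missed one. A sketch of a pre-reset sanity check, using the GTID set shown above for the failed primary:

```sql
-- Run on the replica BEFORE step 8: returns 1 when the replica has
-- executed everything the failed primary did.
SELECT GTID_SUBSET(
  'c219365c-bd04-11ed-b1e2-525400003001:1-4620',
  @@GLOBAL.gtid_executed) AS caught_up;
```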
jean-christophe manciot
(121 rep)
Apr 26, 2023, 02:10 PM
• Last activity: Jun 28, 2025, 11:05 PM
0
votes
1
answers
188
views
How to configure Oracle traffic director for full site failover to have high availability?
I have two server systems, each having both an Apache Tomcat server (with the deployed web app) and an Oracle 11g database server. The databases on the two servers are synchronised by Oracle Data Guard for high availability. The Tomcat web server runs individually on each system with no load balancing. One server acts as the primary and the other as the redundant server.
Usually, for any failure on the primary server, I have to manually change the IP/domain name to access the other server.
I have come across Oracle Traffic Director, which may solve my issue of switching any client request to the redundant server in case of a failure of the primary. Since I am new to Oracle Traffic Director, can anyone guide me on configuring OTD for full site failover?
Rishi Gandhi
(1 rep)
Dec 30, 2020, 08:47 AM
• Last activity: Jun 22, 2025, 05:05 AM
0
votes
1
answers
221
views
Is it possible to switch over an InnoDB cluster instance as primary?
If I am in topologyMode Single-Primary, it's possible to switchover the primary instance between machines in the cluster? I did not find anything in the documentation. Ie: in my lab I have: topology": { "mysql0:3306": { "address": "mysql0:3306", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas"...
If I am in topologyMode Single-Primary, is it possible to switch over the primary instance between machines in the cluster? I did not find anything in the documentation.
I.e., in my lab I have:
topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE",
"version": "5.7.42"
},
"mysql1:3306": {
"address": "mysql1:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE",
"version": "5.7.42"
How to switch
mysql1
as primary?
**Update:**
There seems to be the command `cluster.setPrimaryInstance()`, but it did not work in MySQL 5.7, only in 8.0. So, is there no way to do it in MySQL 5.7?
Astora
(841 rep)
Jun 28, 2023, 12:52 PM
• Last activity: Jun 10, 2025, 10:02 AM
0
votes
1
answers
33
views
Mysql ubuntu solution (failover, failback)
I am a newbie SQL DBA and I am working on a MySQL server running on the Ubuntu operating system.
I need to solve the following problems. Is there a way to solve all the following problems with a single program, tool, etc.? And how are organizations addressing the following issues?
1. **Automate** master-to-slave replication
2. **Automate** master-to-slave failover (when the master goes down, a slave automatically becomes the new master)
3. **Automate** slave-to-master failback (when the old master comes back up, it is automatically reintegrated)
4. **Automate** master and slave data consistency (in the end, the master and all slaves hold the same data)
I have been searching the internet and asking ChatGPT, and they suggest the following programs, but none of them is an all-in-one solution:
1. Orchestrator
2. InnoDB
3. Group Replication
4. Galera Cluster
Thanks for your answers.
Mavis May
(3 rep)
Jun 5, 2025, 08:03 AM
• Last activity: Jun 5, 2025, 01:43 PM
0
votes
2
answers
337
views
AD DNS manager records duplication with SQL always on failover cluster
We are managing our SQL Always On listener records in 2 Active Directory servers (they sync with each other), and we came across an issue whose cause we couldn't pinpoint. We are testing MSSQL failovers on our Windows DB servers, and after each failover we check the DNS manager to see that the IP for the relevant listener changed (our DB servers are on multiple subnets).
In some cases (once every few failovers) we see that the listener record duplicates, meaning we have 2 records for the same listener pointing to different DB servers, and not 1 pointing to the primary DB as it should be.
We have checked the availability group parameters which might affect this:
- PublishPTRRecords is set to true for all availability groups and for the cluster.
- HostRecordTTL is set to 15 seconds.
- RegisterAllProvidersIP is set to 0 to assure that only a single IP will be available for each listener.
I would like to ask your opinion regarding possible causes for this issue, or a way to monitor the DNS record change process which might allow me to debug.
Thanks

Yishai
(1 rep)
Mar 26, 2024, 08:13 AM
• Last activity: May 18, 2025, 12:03 PM
1
votes
2
answers
277
views
Automatic MYSQL failover
We have a WordPress environment; however, this is not a WordPress question. Our scenario is that front ends connect to a MySQL backend (MySQL VM1), which is replicated to a backup VM (MySQL VM2). The replication is master-master.
What we're looking for is a MySQL HA cluster with automatic failover where, if VM1 dies or is unresponsive, connections fail over to VM2. Anything you'd recommend?
joebegborg07
(179 rep)
Mar 8, 2017, 10:11 AM
• Last activity: May 15, 2025, 05:26 AM
0
votes
1
answers
5495
views
SQL Server AlwaysOn - Increase the WSFC failover threshold
I'm trying to simulate failover in my SQL Server 2017 AlwaysOn setup by stopping and starting the SQL Server service.
It worked 2 times, but after that the failover is not happening; it shows an error like:
>Clustered role 'sqlag' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.
No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
I tried to change the AlwaysOn group resource restart timeout values like this, but the failover is still not happening. Can someone help me with where to increase this failover threshold?
I even tried the max allowed values, but I get an error like this:
>Cluster resource 'sqlag' of type 'SQL Server Availability Group' in clustered role 'sqlag' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

TheDataGuy
(1986 rep)
Sep 20, 2021, 07:20 AM
• Last activity: May 8, 2025, 12:03 AM
1
votes
1
answers
383
views
SQL availability group failover latency experienced is 35 seconds, which is 10 seconds higher than older SQL Mirror failover
**Encountered problem:**
Our use case has 3 x Server 2022 in a topology of a domain controller plus a two-node cluster. This is a SQL Always On AG architecture. The cluster hosts the SQL availability group roles. This is a replacement for the MS-deprecated SQL Mirroring architecture.
While the older Mirror setup had a hard (physical power or network cable pulled) failover time of 10 seconds or less, the newer Always On (AO-AG) system takes 21 seconds.
**solutions tried:**
Raw database-only failover (with a simple, threaded client app) takes 22 seconds with all configurations set to optimal, as mentioned briefly below:
The same client application, when run in the Mirror architecture on the same hardware and databases, shows a less-than-10-second failover to the new principal DB.
- All cluster Role properties and Role Resource properties are set to minimize latency. (image)
- Network is a lab based dedicated setup with no other resources utilizing it.
- RegisterAllProvidersIP and HostRecordTTL have been changed from 1 and 1200 to 0 and 10, respectively.
- SQL listener and IP resource roles in use - which is connected to clients before and after failover.
- Manual failover occurs within 5 seconds.
- Auto failover (pulling net or power cable) takes around 22 seconds for HA AO-AG setup. SQL Mirror setup for same takes around 10 seconds.
- MultiSubnetFailover=True and False also has been tried.
Did Microsoft make SQL AO-AG slower than the legacy SQL Mirror?
**How do I speed up the failover time?**
thanks
(https://i.sstatic.net/9YO7D.png) Role Resource properties - these have been changed in many permutations
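One knob that is at least exposed in T-SQL is the replica session timeout, which controls how quickly replicas declare a dead partner (hard failures are also gated by WSFC heartbeat settings and the AG lease, so this alone may not close the whole gap). A hedged sketch with placeholder AG and replica names:

```sql
-- Default SESSION_TIMEOUT is 10 seconds; the minimum is 5.
-- Very low values risk spurious failovers on brief network blips.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SRV1' WITH (SESSION_TIMEOUT = 5);
```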
AllYourBaseAreBelongToUs
(11 rep)
Mar 7, 2024, 05:35 PM
• Last activity: Apr 26, 2025, 06:01 PM
4
votes
1
answers
15329
views
DATABASE lacks a quorum of nodes for high availability
Just after successfully setting up `alwayson` (availability group) and testing the failover, I get the error message below.
What I have is a quorum for a 3-node AlwaysOn Availability Group; my Windows Server 2016 cluster has a quorum disk set up.
There is a related question here: Failover Cluster Instance work without quorum.
What does that message mean?
TITLE: Microsoft SQL Server Management Studio
------------------------------
Failed to resume data movement in database 'APCORE', which resides on the availability replica 'MY_SERVER' in the availability group 'AG_TS'. (Microsoft.SqlServer.Smo)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=14.0.17289.0+((SSMS_Rel_17_4).181117-0805)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476
------------------------------
ADDITIONAL INFORMATION:
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
------------------------------
Unable to access database 'APCORE' because it lacks a quorum of nodes for high availability. Try the operation again later. (Microsoft SQL Server, Error: 988)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=14.00.1000&EvtSrc=MSSQLServer&EvtID=988&LinkId=20476
------------------------------
BUTTONS:
OK
------------------------------


Marcello Miorelli
(17274 rep)
May 8, 2019, 08:10 AM
• Last activity: Apr 18, 2025, 05:07 AM
5
votes
1
answers
6421
views
Postgres Streaming Replication: How to re-sync data in master (with standby) after failover
I am absolutely new to Postgres 9.1 replication mechanisms.
I have managed to set up streaming replication between a master and a standby (both have x number of records). At the point when the master server fails, the standby takes over via the trigger-file mechanism and accumulates additional data (say it now has x+y records).
Now when the original master server comes back up, it still has x records (the standby is now the new master). Is there a way to fetch only the delta, i.e. the newly added y records, from the standby and restart as master?
Or do I always have to take an entire base backup?
Nandini
(51 rep)
May 5, 2015, 08:30 AM
• Last activity: Apr 17, 2025, 03:03 AM
5
votes
1
answers
524
views
Mysqlfailover command - No slave is listed in health status
I have successfully created replication using `GTID_MODE`. It works perfectly. Now I need to set up the automatic failover feature. I have run the following command:
mysqlfailover --master=root:abc@10.24.184.12:3306 --discover-slaves-login=root:abc
I have got the following results. No slave is listed.
MySQL Replication Failover Utility
Failover Mode = auto Next Interval = Tue May
Master Information
------------------
Binary Log File Position Binlog_Do_DB Binlog
mysql-bin.000016 9568
GTID Executed Set
8fe8b710-cd34-11e4-824d-fa163e52e544:1-1143
Replication Health Status
0 Rows Found.
Q-quit R-refresh H-health G-GTID Lists U-UUIDs U
But when I execute the `mysqlrplcheck` and `mysqlrplshow` commands, the slave is listed.
Is this normal?
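One common cause (an assumption here, not something the question confirms) is that the slaves have not registered themselves with the master, which `--discover-slaves-login` depends on:

```sql
-- On the master: a slave appears here only if it set report_host.
-- If this returns no rows, add report_host (and optionally report_port)
-- to each slave's my.cnf and restart the slave.
SHOW SLAVE HOSTS;
```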
Aneesha
(51 rep)
May 11, 2015, 11:48 AM
• Last activity: Apr 14, 2025, 11:06 AM
1
votes
3
answers
1337
views
Manually specifying candidate masters in Maxscale
Looking over the documentation, I haven't quite been able to find whether it's possible to manually set the candidate-master status of a server in MaxScale. The reason I am looking for this is that some of the slaves hooked up to the master/slave architecture have inferior hardware compared to the others, and we don't want them to become the master. According to the documentation, it is possible to set the status of a server as master or slave, but not to specify that we don't want it to become a master.
Is there a setting to do this in Maxscale? Possibly in maxadmin or a .cnf file or something?
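In later MaxScale releases (the `mariadbmon` monitor, which postdates the maxadmin era this question seems to be from), there is a dedicated parameter for this. A hedged maxscale.cnf sketch with placeholder server names and credentials:

```ini
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxuser
password=maxpwd
# server3 has weaker hardware: never promote it during failover/switchover
servers_no_promotion=server3
```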
Thanks.
Bif Stone
(11 rep)
Mar 1, 2016, 02:58 PM
• Last activity: Apr 13, 2025, 07:03 AM
1
votes
1
answers
335
views
Data loss prevention when switching to secondary during failover
During the failover test in Azure SQL Database from primary to secondary, I see that every time there is a loss of 3 seconds worth data during the switch. I want to minimise this data loss.
The documentation states:
> To protect critical transactions from data loss, an application developer can call the `sp_wait_for_database_copy_sync` stored procedure immediately after committing the transaction.
How can we achieve this point?
Does this statement mean we need to add a SQL script that is automatically triggered on each commit? Has anyone tried this option?
Is there any other way to optimise the data loss?
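(Editor's note: per the Azure SQL documentation, `sp_wait_for_database_copy_sync` is not a trigger; the application calls it on the primary, in the same session, immediately after the commit, and the call blocks until the committed transaction has been transmitted to the geo-secondary. A hedged sketch, with server and database names as placeholders and an illustrative transaction:)

```sql
-- Run on the primary, immediately after committing the critical transaction.
BEGIN TRANSACTION;
    UPDATE dbo.Orders SET Status = 'Paid' WHERE OrderId = 42;  -- illustrative work
COMMIT TRANSACTION;

-- Blocks until the committed log has reached the geo-secondary.
EXEC sys.sp_wait_for_database_copy_sync
    @target_server   = N'mysecondaryserver',  -- placeholder secondary server
    @target_database = N'MyDatabase';         -- placeholder database name
```

Because the call blocks until replication catches up, it adds latency to every guarded commit, so it is typically reserved for the most critical transactions rather than applied globally.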
Sajit Gangadharan
(111 rep)
Aug 22, 2022, 11:49 AM
• Last activity: Mar 23, 2025, 01:32 AM
2
votes
1
answers
818
views
DB_CHAINING lost on planned Availability Group failover for SQL Server 2019 cumulative update
I'm looking for ideas on how to determine a plausible root cause for DB_CHAINING problems encountered when we recently performed a planned AG failover.
We're using SQL Server 2019 Standard Edition with "Basic" AGs (meaning that every database is in its own AG). In "Basic" AGs the Secondary Replicas are not readable, only one Primary and one Secondary are possible per AG, and the commit mode must be synchronous. Due to licensing (and cross-database queries), all AGs always have their Primary Replicas on the same Instance at any one time.
The Windows cluster is two Windows Server 2019 nodes, with a File Share Witness on a third server.
The purpose of our planned failover was to apply SQL Server 2019 CU10 (from CU8) to both servers; this went without issue. Patch SRV2, fail over to SRV2, patch SRV1.
The planned failover of all AGs was fine, no problems at all.
However, the IIS application immediately started failing with permissions problems due to DB_CHAINING being lost. Altering the databases to enable DB_CHAINING addressed the issues. Patching the second server proceeded without incident.
'cross db ownership chaining' is (and has always been) disabled at the server level on both servers, as per the MS recommendation.
I'm aware that restore or attach can affect this property (please see this article and the comments), and that DB_CHAINING can only be changed on the Primary Replica.
The configuration of the servers, databases and AGs, etc. predates my time. I know DB_CHAINING can compromise security, it wasn't my choice, it's an old application.
To pre-empt some potential suggestions, this was not an issue with Login passwords, SIDs, Server Role membership, database Users, Roles or permissions (anything User/Role/permission related is stored in the databases and would not have been affected by an AG failover). This was not an orphaned database User issue.
This was not a problem with client connections to a 'stale' Listener address/IP or similar. Both servers are on the same subnet and all clients reconnected via their Listener network names without any issues. The problem was when the clients successfully connected, they could not see tables, or execute procedures, etc.
As soon as DB_CHAINING was enabled, everything worked as expected - no other changes were needed.
I've confirmed on a test environment that failing over an AG has no effect on DB_CHAINING, it neither enables it nor disables it, on either Replica. This implies that the option was not enabled before I failed-over.
I would normally accept this and ensure that checking, and if necessary, enabling this option is added to the failover plan for the future, however about six weeks ago, we experienced an unplanned automatic failover caused by a switch reboot at the hosting company. On this occasion, all AGs failed-over and the application continued to function without any issues or intervention.
I've reviewed the ERRORLOG files and I have a complete history of when DB_CHAINING was enabled on which databases since the two servers were installed, I can see that no-one re-enabled DB_CHAINING after the unplanned failover. Furthermore, no-one has ever disabled DB_CHAINING on any database on either server.
Has anyone experienced a SQL Server update removing DB_CHAINING from User databases? I've reviewed the patching (setup) logs and can find no mention of anything like this being recorded. The closest I can find is someone reporting chaining issues sometime after applying SQL Server 2016 SP2. But that answer suggests changing the application behaviour to work around it, which I'm not able to do.
is_db_chaining_on currently returns 1 for all (necessary) databases on both servers, which is good evidence that patching doesn't alter this option; otherwise databases on the most recently patched (currently passive) server might report 0.
I cannot explain how an unplanned failover from SRV2 to SRV1 was OK, but a planned failback to SRV2 from SRV1 six weeks later had DB_CHAINING issues (when there is no evidence in the logs that anyone or anything disabled the option). It seems that either the update silently removed the property, or the property was not set correctly on SRV2 before the planned failover, in which case how was the application functioning (for months) before the unplanned failover six weeks ago?
Any ideas where I can look, or what I can test for, to identify a plausible root cause would be appreciated.
**EDIT:** thanks for your response AMTwo, but no, there have been no backup/restore or detach/attach events. The databases were created from native backups on both SRV1 and SRV2 in early Dec-2020; since then, there have been no restores of any of the databases in question. I'm the only DBA and there's no-one else who could or would have done this, and msdb.dbo.restore_history is empty. We also have Redgate SQL Monitor coverage over this pair of servers, which would have generated a lot of alerts if anything like a restore had been attempted that broke the AG replication.
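(Editor's note: for a future failover runbook, the check-and-fix described above can be scripted; the database name below is a placeholder:)

```sql
-- Audit: list databases where cross-database ownership chaining is off
SELECT name, is_db_chaining_on
FROM sys.databases
WHERE is_db_chaining_on = 0;

-- Fix: re-enable per database. Must be run on the current Primary Replica.
ALTER DATABASE [MyAppDb] SET DB_CHAINING ON;  -- placeholder database name
```

Running the audit query on both replicas after each failover (and after each patching cycle) would at least pinpoint when the option changes, even if the root cause remains unclear.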
Rob Dalzell
(856 rep)
May 17, 2021, 11:37 AM
• Last activity: Mar 2, 2025, 08:01 AM
0
votes
2
answers
436
views
Automatic Failover for Basic Availability Group
This will be my first project with a Basic Availability Group, so I thought I would check before implementing.
As per the Microsoft documentation below, we will need Software Assurance for failover servers used for disaster recovery / high availability.
https://learn.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-version-15?view=sql-server-ver15
I would greatly appreciate it if someone could clarify this, as I need the ability to perform automatic/manual failover without purchasing Software Assurance.
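(Editor's note: on the failover mechanics rather than the licensing: Basic AGs do support synchronous commit with automatic failover, and a manual failover is issued from the secondary replica. A hedged sketch with a placeholder AG name:)

```sql
-- Run on the SECONDARY replica of a Basic AG that is in
-- synchronous-commit mode and fully synchronized:
ALTER AVAILABILITY GROUP [BAG_MyDb] FAILOVER;  -- placeholder AG name
```

Automatic failover additionally requires the replica to be configured with FAILOVER_MODE = AUTOMATIC (plus a healthy WSFC quorum); whether the passive node needs Software Assurance is a separate licensing question from whether the failover itself works.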
SQL_NoExpert
(1117 rep)
Jul 30, 2021, 02:17 PM
• Last activity: Feb 25, 2025, 11:01 PM