Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
3
votes
1
answers
425
views
SQL Server 2014 Availability Group failed
I have 3 servers on VMware Workstation:
1. Win 2012 R2 as AD DC - cluster service.
2. Win 2012 R2 as SQL Server primary replica.
3. Win 2012 R2 as SQL Server secondary replica (automatic failover).
My availability group fails in this scenario: when I disconnect the primary replica from the network, the cluster fails to initiate a failover to the secondary replica. When I try a manual failover from the secondary replica instance, I get this error:
However, this is not the case when I stop the primary replica from Cluster Manager, because then it performs an automatic failover.
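When the primary is cut off from the network, the cluster can lose quorum and the AG cannot fail over automatically; in that state the documented last resort is a forced failover on the secondary. A hedged T-SQL sketch, where the AG name [MyAG] is a placeholder (forced failover allows data loss, so it should only be used when the primary is genuinely unrecoverable):

```sql
-- Run on the secondary replica's instance; [MyAG] is a placeholder name.
-- WARNING: forced failover can lose data that had not yet been sent
-- to the secondary.
ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
```

If the underlying WSFC has itself lost quorum, the cluster must first be forced online (forced quorum) before this command can succeed.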

danarj
(169 rep)
Aug 16, 2015, 06:58 AM
• Last activity: Jul 31, 2025, 12:03 PM
0
votes
1
answers
160
views
MySQL Master Slave Asynchronous GTID Replication Failover/Switchover over ProxySQL with little downtime
**This is a production setup and I have to perform a failover from Master A to new potential master B.**
**MySQL Version is 5.7 and CentOS 7 is the OS.**
I have a simple master slave setup with A as Master and B,C,D,E,F as async slaves. ProxySQL Cluster is also configured.
Master A (writer in ProxySQL) and Slaves B and C (readers in ProxySQL) are added to ProxySQL.
On all the slaves **log_slave_updates** is **ON** and **log_bin** is also **ON**.
On **Master A** **read_only=0**, and **Slaves B,C,D,E,F** have **read_only=1**.
This is a **GTID**-based replication with **master_auto_position = 0**.
**A --> B,C,D,E,F** (this is the current replication topology)
Now, I know that for failover I have to choose either slave B or slave C, because they are on the proxy. I choose B as my new potential master. So I have to move all the slaves from the current Master A to the new Master B, and the replication topology will then look like this:
**B --> A,C,D,E,F** (proposed new topology)
**My question is how do I move the slaves from Master A to new potential Master B?**
**How should I use the coordinates from new Master B? What procedure should I follow?**
Once I move the slaves, I guess failover will be easy: I just tweak the **read_only** flag on the old and new masters, and the proxy will take care of the connections.
I would really appreciate it if a detailed procedure with an explanation could be provided.
I have gone through the following links but couldn't understand the process:
https://stackoverflow.com/questions/43960577/mysql-failover-how-to-choose-slave-as-new-master
https://serverfault.com/questions/303669/how-to-actually-use-mysql-slave-as-soon-the-master-is-failover-or-got-burnt
https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-switch.html
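For context, one common GTID-based approach (a sketch under assumptions, not a verified production runbook: the hostname, replication user, and password below are placeholders) is to repoint each slave with auto-positioning, which works here because `log_bin` and `log_slave_updates` are enabled on B:

```sql
-- Run on each of C, D, E, F (and later on the demoted A):
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'B',            -- new master
  MASTER_USER = 'repl',         -- placeholder replication user
  MASTER_PASSWORD = 'secret',   -- placeholder
  MASTER_AUTO_POSITION = 1;     -- GTID auto-positioning finds the start point
START SLAVE;
```

Note that the setup described uses `master_auto_position = 0`; switching to auto-positioning avoids having to read file/position coordinates from B at all.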
Avinash Pawar
(216 rep)
Aug 3, 2023, 07:15 PM
• Last activity: Jul 24, 2025, 01:00 PM
0
votes
1
answers
52
views
ADO.NET client occasionally attempts connection to SQL Server mirror partner even when principal is online
We're running SQL Server 2016 Standard Edition with database mirroring in synchronous mode and automatic failover enabled. Our .NET applications use connection strings with Failover Partner.
Occasionally (1–4× per day), some applications try to connect to the mirror server, even though the principal is online and fully healthy. We see login failures on the mirror (Error 4060 – "Cannot open database ...") but no issues or alerts on the principal side (Redgate, Zabbix, SQL logs, etc.). This affects different databases (we have dozens mirrored) and different apps or users – it’s not tied to one specific service.
Extended Events show only mirror-side errors, with various client hosts and application names.
PerfMon once showed a spike in TCP hard connects around the same time, but nothing consistent. Based on online research such as [Database mirroring login failure attempts on mirror server](https://serverfault.com/questions/236163) , we suspect it may be caused by things like expired connections in the pool, network hiccups, SQL Browser issues, or connection resets (e.g. firewall/NAT dropping idle sockets). When this happens, the .NET SqlClient might believe the primary is unreachable and trigger an immediate failover attempt.
So far this seems to be expected behavior, but it causes false-positive alerts in our monitoring tools.
We're trying to confirm:
1) Is this behavior expected from ADO.NET when Failover Partner is used?
2) Is there any way to reduce/suppress the probing of the mirror?
3) Are there best practices for detecting/logging/tuning this (e.g., connection settings, retry logic)?
4) Should we just tune our monitoring and ignore these errors as “normal noise”?
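One low-effort way to separate real role changes from client-side probing (a sketch, not an official diagnostic) is to log the mirroring role over time on both partners:

```sql
-- If mirroring_role_desc stays PRINCIPAL on the expected server across
-- the incident window, the mirror-side login failures are client probing
-- rather than actual failovers.
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```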
Jan
(1 rep)
Jul 17, 2025, 02:57 PM
• Last activity: Jul 18, 2025, 01:27 AM
0
votes
3
answers
161
views
Need to configure Availability Groups between three different SQL failover instance
We have a total of 6 nodes and 3 failover clusters (each with two nodes).
- `Cluster1.TestAD.com` (Node1, Node2) - SQL Failover Instance (SQL1) - 1 TB shared storage
- `Cluster2.TestAD.com` (Node3, Node4) - SQL Failover Instance (SQL2) - 1 TB shared storage
- `Cluster3.TestAD.com` (Node5, Node6) - SQL Failover Instance (SQL3) - 1 TB shared storage
Now we want to create an availability group setup between the above three SQL instances.
We plan to create a new cluster (`cluster4.TestAD.com`) with all six nodes and create availability groups with all three instances.
SQL Edition: SQL Server 2012 R2 Enterprise
Is it possible? Any other recommendations?
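For what it's worth, FCIs can host AG replicas once all six nodes are members of a single WSFC, with the caveat that AG failover between FCI-hosted replicas is manual only (each FCI still handles its own local failover). A hedged sketch using the instance names above, with the database name and endpoint values as placeholders:

```sql
-- Sketch only: run on SQL1 after mirroring endpoints exist on all
-- three instances; [MyDb] and the URLs are placeholders.
CREATE AVAILABILITY GROUP [AG1]
FOR DATABASE [MyDb]
REPLICA ON
  N'SQL1' WITH (ENDPOINT_URL = N'TCP://SQL1.TestAD.com:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL),   -- FCI replicas: manual only
  N'SQL2' WITH (ENDPOINT_URL = N'TCP://SQL2.TestAD.com:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL),
  N'SQL3' WITH (ENDPOINT_URL = N'TCP://SQL3.TestAD.com:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL);
```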
adkalavadia
(26 rep)
Aug 24, 2016, 09:45 AM
• Last activity: Jul 16, 2025, 09:04 PM
0
votes
1
answers
179
views
MariaDB Galera and Geo redundancy
I am trying to simulate 2 clusters in different datacenters/different locations, using VMWare VSXi and 6 VMs: 3 are located on DC1 and other 3 on DC2. All nodes are RHEL 9.
Both DCs communicate through the same VLAN and this works fine.
I have installed MariaDB 10.5.22 + Galera.
On DC1, the 3 nodes are all data nodes; node #1 was bootstrapped using the galera_new_cluster script.
On DC2, 2 nodes are data nodes and I added 1 arbitrator.
When starting nodes, all 6 are connected on the same cluster, sharing data, and this seems to be working fine.
The problem that I find is when I simulate a network outage: I modify the VLAN ID on DC2 and both DCs lose connectivity.
Looking at each node's logs, they effectively lose connectivity, but:
- all nodes (3 from DC1 and 2 from DC2) are unavailable for queries
- Arbitrator keeps retrying to connect to nodes, ex.:
Oct 7 20:08:16 ire-lab-se3 garb-systemd: 2024-10-07 20:08:16.562 INFO: (dd79df53-972e, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://10.20.0.1:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 358971429 cwnd: 1 last_queued_since: 359271583299206 last_delivered_since: 359271583299206 send_queue_length: 0 send_queue_bytes: 0
I was expecting that:
- DC1 would remain available and synced, after all, the bootstrap was on Node#1 and there are 3 nodes available on this network
- Garbd arbitrator would handle the remaining 2 nodes in DC2, allowing applications on DC2 to use the current data
I have tried multiple test scenarios, and this is the only one that is failing (network outage) and I can't seem to be able to solve the issue.
The configuration file on each Galera data node is similar to this (only the node name and ID differ):
#
# These groups are read by MariaDB server.
# Use it for options that only the server (but not clients) should see
#
# See the examples of server my.cnf files in /usr/share/mysql/
#
# this is read by the standalone daemon and embedded servers
[server]
log_error=/var/log/mariadb/mariadb.err
log_warnings=9
#default_time_zone='UTC'
table_definition_cache=4096
table_open_cache=4096
#innodb_read_only_compressed=OFF # only for 10.6 and later
# this is only for the mysqld standalone daemon
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mysqld/mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid
# disabling symlinks is recommend to prevent assorted security risks
symbolic_links=0
#enable binary logging
log_bin=/var/log/mariadb/mariadb-bin
log_bin_index=/var/log/mariadb/mariadb-bin.index
#enable relay log files
relay_log=/var/log/mariadb/relay-bin
relay_log_index=/var/log/mariadb/relay-bin.index
log_slave_updates=1
performance_schema=ON
interactive_timeout=180
wait_timeout=180
max_connections=500
#
# * Galera-related settings
#
[galera]
# Mandatory settings
#wsrep_on=ON
#wsrep_provider=
#wsrep_cluster_address=
#binlog_format=row
#default_storage_engine=InnoDB
#innodb_autoinc_lock_mode=2
#
# Allow server to accept connections on all interfaces.
#
#bind-address=0.0.0.0
#
# Optional setting
#wsrep_slave_threads=1
#innodb_flush_log_at_trx_commit=0
# this is only for embedded server
[embedded]
# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
# This group is only read by MariaDB-10.5 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.5]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
wsrep_on=ON
query_cache_size=0
query_cache_type=0
innodb_log_file_size=100M
innodb_file_per_table
innodb_flush_log_at_trx_commit=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
# below parameter should not include Arbitrators, only MariaDB nodes
wsrep_cluster_address="gcomm://10.20.0.1,10.20.0.2,10.20.0.3,10.20.0.4,10.20.0.5"
wsrep_cluster_name='galera_cluster_LAB'
wsrep_node_address='10.20.0.1'
wsrep_node_name='galera_LAB_1'
wsrep_sst_method='rsync'
#'mariabackup'
#wsrep_sst_auth=backupuser;backupuser
server_id=2
wsrep_provider_options='gmcast.segment=1;gcache.size=2G'
#wsrep_sst_donor="galera_1"
and the Gardb Arbitrator is set as:
# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.
# A comma-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES="10.20.0.4:4567 10.20.0.5:4567"
# Galera cluster name, should be the same as on the rest of the nodes.
GALERA_GROUP="galera_cluster_LAB"
# Optional Galera internal options string (e.g. SSL settings)
# see https://galeracluster.com/library/documentation/galera-parameters.html
GALERA_OPTIONS=""
# Log file for garbd. Optional, by default logs to syslog
# FOR SOME REASON IT DOESN'T WORK; RETURNS AN ERROR PROBABLY RELATED TO PERMISSIONS
# LOG_FILE="/var/log/mariadb/garbd.log"
# Where to persist necessary data
# WORK_DIR=""
So I think this is a pretty standard setup.
I googled a lot, but could not find anything that makes this work as expected. The requirements are that, in production, both DCs should communicate freely; if a network outage happens, application servers in both DCs should continue working with their local MariaDB Galera nodes (querying and inserting new rows, if required); and once the network is up again, all nodes should sync data with each other (new data from DC1 exported to DC2, new data from DC2 exported to DC1, and so on).
Am I missing something here? Could anyone help?
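One possible explanation, inferred from the configuration shown rather than confirmed: the cluster has 6 votes (5 data nodes plus garbd), so a clean inter-DC split is 3 votes against 3, and neither side keeps a majority, which would leave every node non-primary exactly as observed. A hedged my.cnf sketch that weights DC1 so it survives such a split:

```ini
# Sketch, set on the DC1 nodes only (segment value as in the question):
# with pc.weight=2 on the three DC1 nodes, DC1 holds 3*2=6 votes against
# DC2's 2+1=3, so DC1 keeps quorum during a clean inter-DC split.
wsrep_provider_options='gmcast.segment=1;gcache.size=2G;pc.weight=2'
```

Note that the stated requirement (both DCs continuing to accept writes while partitioned, then merging afterwards) is split-brain; Galera deliberately refuses this, so only one side can remain writable during an outage.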
Thanks
Sergio
scharrua
(11 rep)
Oct 7, 2024, 09:54 PM
• Last activity: Jul 4, 2025, 08:08 AM
0
votes
1
answers
570
views
Promoting MySQL replica to primary during failover
The MySQL primary fails and I'm trying to promote the replica to primary.
Following the official documentation, I've already:
1. disabled the failed primary server on all load balancers
2. waited for the primary server to complete all its transactions:
Source has sent all binlog to replica; waiting for more updates
3. set the primary server as super R/O
4. dumped all InnoDB databases from the failed primary server as a precaution
5. stopped the IO_THREAD of the replica
6. waited for the replica to complete all its transactions:
Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates
7. stopped the SQL_THREAD of the replica
8. reset the GTID execution history and the binary log of the replica
However, there are discrepancies between the failed primary and the new primary servers:
- **failed primary server status**:
+---------------+----------+--------------+------------------+---------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+---------------------------------------------+
| binlog.000073 | 197 | | | c219365c-bd04-11ed-b1e2-525400003001:1-4620 |
+---------------+----------+--------------+------------------+---------------------------------------------+
- **new primary server status**:
+---------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000001 | 157 | | | |
+---------------+----------+--------------+------------------+-------------------+
We can see that the positions are different and that the new primary has no Executed_Gtid_Set.
Either I missed something, or the doc is incomplete.
Any suggestions?
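One point worth noting: step 8's `RESET MASTER` wipes the binary logs and the GTID execution history, so an empty `Executed_Gtid_Set` on the new primary is the expected result of that step rather than a missed one. A sketch of a pre-reset sanity check, using the GTID set shown above for the failed primary:

```sql
-- Run on the replica BEFORE step 8: returns 1 when the replica has
-- executed everything the failed primary did.
SELECT GTID_SUBSET(
  'c219365c-bd04-11ed-b1e2-525400003001:1-4620',
  @@GLOBAL.gtid_executed) AS caught_up;
```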
jean-christophe manciot
(121 rep)
Apr 26, 2023, 02:10 PM
• Last activity: Jun 28, 2025, 11:05 PM
0
votes
1
answers
188
views
How to configure Oracle traffic director for full site failover to have high availability?
I have two server systems, each having both an Apache Tomcat server (with the deployed web app) and an Oracle 11g database server. The databases on the two servers are synchronised by Oracle Data Guard for high availability. The Tomcat web server runs individually on each system with no load balancing. One server acts as the primary and the other as the redundant server.
Usually, for any failure on the primary server, I have to manually change the IP/domain name to access the other server.
I have come across Oracle Traffic Director, which may solve my issue of switching any client request to the redundant server in case of a failure of the primary. Since I am new to Oracle Traffic Director, can anyone guide me on configuring OTD for full site failover?
Rishi Gandhi
(1 rep)
Dec 30, 2020, 08:47 AM
• Last activity: Jun 22, 2025, 05:05 AM
0
votes
1
answers
221
views
Is it possible to switch over an InnoDB cluster instance as primary?
If I am in topologyMode Single-Primary, it's possible to switchover the primary instance between machines in the cluster? I did not find anything in the documentation. Ie: in my lab I have: topology": { "mysql0:3306": { "address": "mysql0:3306", "memberRole": "PRIMARY", "mode": "R/W", "readReplicas"...
If I am in topologyMode Single-Primary, is it possible to switch over the primary instance between machines in the cluster? I did not find anything in the documentation.
I.e., in my lab I have:
topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"memberRole": "PRIMARY",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE",
"version": "5.7.42"
},
"mysql1:3306": {
"address": "mysql1:3306",
"memberRole": "SECONDARY",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE",
"version": "5.7.42"
How to switch
mysql1
as primary?
**Update:**
There seems to be the command `cluster.setPrimaryInstance()`, but it did not work in MySQL 5.7, only in 8.0. So, is there no way to do it in MySQL 5.7?
Astora
(841 rep)
Jun 28, 2023, 12:52 PM
• Last activity: Jun 10, 2025, 10:02 AM
0
votes
1
answers
33
views
Mysql ubuntu solution (failover, failback)
I am a newbie SQL DBA and I am working on a MySQL server running on the Ubuntu operating system.
I need to solve the following problems. Is there a way to solve all the following problems with a single program, tool, etc.? And how are organizations addressing the following issues?
1. **Automate** master-to-slave replication
2. **Automate** master-to-slave failover (when the master goes down, a slave automatically becomes the new master)
3. **Automate** slave-to-master failback (when the old master comes back up, it is automatically reintegrated)
4. **Automate** master and slave data consistency (in the end, the master and all slaves hold the same data)
I have been searching the internet and asking ChatGPT, and they suggest the following programs, but none of them is an all-in-one solution:
1. Orchestrator
2. InnoDB
3. Group Replication
4. Galera Cluster
Thanks for your answers.
Mavis May
(3 rep)
Jun 5, 2025, 08:03 AM
• Last activity: Jun 5, 2025, 01:43 PM
0
votes
2
answers
337
views
AD DNS manager records duplication with SQL always on failover cluster
We are managing our SQL Always On listener records in 2 Active Directory servers (they sync with each other), and we came across an issue whose cause we couldn't pinpoint. We are testing MSSQL failovers on our Windows DB servers, and after each failover we check the DNS manager to see that the IP for the relevant listener changed (our DB servers are on multiple subnets).
In some cases (once every few failovers) we see that the listener record duplicates, meaning we have 2 records for the same listener pointing to different DB servers, and not 1 pointing to the primary DB as it should be.
We have checked the availability group parameters which might affect this:
- PublishPTRRecords is set to true for all availability groups and for the cluster.
- HostRecordTTL is set to 15 seconds.
- RegisterAllProvidersIP is set to 0 to assure that only a single IP will be available for each listener.
I would like to ask your opinion regarding possible causes for this issue, or a way to monitor the DNS record change process which might allow me to debug.
Thanks

Yishai
(1 rep)
Mar 26, 2024, 08:13 AM
• Last activity: May 18, 2025, 12:03 PM
1
votes
2
answers
277
views
Automatic MYSQL failover
We have a WordPress environment; however, this is not a WordPress question. Our scenario is that front ends connect to a MySQL backend (MySQL VM1), which is replicated to a backup VM (MySQL VM2). The replication is master-master.
What we're looking for is a MySQL HA cluster with automatic failover where, if VM1 dies or is unresponsive, connections fail over to VM2. Anything you'd recommend?
joebegborg07
(179 rep)
Mar 8, 2017, 10:11 AM
• Last activity: May 15, 2025, 05:26 AM
0
votes
1
answers
5495
views
SQL Server AlwaysOn - Increase the WSFC failover threshold
I'm trying to simulate failover in my SQL Server 2017 AlwaysOn setup by stopping and starting the SQL Server service.
It worked 2 times, but after that the failover is not happening; it shows an error like:
>Clustered role 'sqlag' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.
No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
I tried to change the AlwaysOn group resource restart timeout values like this, but the failover is still not happening. Can someone help me with where to increase this failover threshold?
I even tried the max allowed values, but I get an error like this:
>Cluster resource 'sqlag' of type 'SQL Server Availability Group' in clustered role 'sqlag' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

TheDataGuy
(1986 rep)
Sep 20, 2021, 07:20 AM
• Last activity: May 8, 2025, 12:03 AM
1
votes
1
answers
383
views
SQL availability group failover latency experienced is 35 seconds, which is 10 seconds higher than older SQL Mirror failover
**Encountered problem:**
Our use case has 3 x Server 2022 in a topology of a domain controller plus a two-node cluster. This is a SQL Always On AG architecture. The cluster hosts the SQL availability group roles. This is a replacement for the MS-deprecated SQL Mirroring architecture.
While the older Mirror setup had a hard (physical power or network cable pulled) failover time of 10 seconds or less, the newer Always On (AO-AG) system takes 21 seconds.
**solutions tried:**
Raw database-only failover (with a simple, threaded client app) takes 22 seconds with all configurations set to optimal, as mentioned briefly below:
The same client application, when run in the Mirror architecture on the same hardware and databases, shows a less-than-10-second failover to the new principal DB.
- All cluster Role properties and Role Resource properties are set to minimize latency. (image)
- Network is a lab based dedicated setup with no other resources utilizing it.
- RegisterAllProvidersIP and HostRecordTTL have been changed from 1 and 1200 to 0 and 10, respectively.
- SQL listener and IP resource roles in use - which is connected to clients before and after failover.
- Manual failover occurs within 5 seconds.
- Auto failover (pulling net or power cable) takes around 22 seconds for HA AO-AG setup. SQL Mirror setup for same takes around 10 seconds.
- MultiSubnetFailover=True and False also has been tried.
Did Microsoft make SQL AO-AG slower than the legacy SQL Mirror?
**How do I speed up the failover time?**
thanks
(https://i.sstatic.net/9YO7D.png) Role Resource properties - these have been changed in many permutations
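One knob that is at least exposed in T-SQL is the replica session timeout, which controls how quickly replicas declare a dead partner (hard failures are also gated by WSFC heartbeat settings and the AG lease, so this alone may not close the whole gap). A hedged sketch with placeholder AG and replica names:

```sql
-- Default SESSION_TIMEOUT is 10 seconds; the minimum is 5.
-- Very low values risk spurious failovers on brief network blips.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SRV1' WITH (SESSION_TIMEOUT = 5);
```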
AllYourBaseAreBelongToUs
(11 rep)
Mar 7, 2024, 05:35 PM
• Last activity: Apr 26, 2025, 06:01 PM
4
votes
1
answers
15329
views
DATABASE lacks a quorum of nodes for high availability
Just after successfully setting up `alwayson` (availability group) and testing the failover, I get the error message below.
What I have is a quorum for a 3-node AlwaysOn Availability Group; my Windows Server 2016 cluster has a quorum disk set up.
There is a related question here: Failover Cluster Instance work without quorum.
What does that message mean?
TITLE: Microsoft SQL Server Management Studio
------------------------------
Failed to resume data movement in database 'APCORE', which resides on the availability replica 'MY_SERVER' in the availability group 'AG_TS'. (Microsoft.SqlServer.Smo)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=14.0.17289.0+((SSMS_Rel_17_4).181117-0805)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476
------------------------------
ADDITIONAL INFORMATION:
An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)
------------------------------
Unable to access database 'APCORE' because it lacks a quorum of nodes for high availability. Try the operation again later. (Microsoft SQL Server, Error: 988)
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=14.00.1000&EvtSrc=MSSQLServer&EvtID=988&LinkId=20476
------------------------------
BUTTONS:
OK
------------------------------


Marcello Miorelli
(17274 rep)
May 8, 2019, 08:10 AM
• Last activity: Apr 18, 2025, 05:07 AM
5
votes
1
answers
6421
views
Postgres Streaming Replication: How to re-sync data in master (with standby) after failover
I am absolutely new to Postgres 9.1 replication mechanisms.
I have managed to set up streaming replication between a master and a standby (both have x number of records). At the point when the master server fails, the standby takes over via the trigger-file mechanism and accumulates additional data (say it now has x+y records).
Now when the original master server comes back up, it still has x records (the standby is now the new master). Is there a way to fetch only the delta, i.e. the newly added y records, from the standby and restart as master?
Or do I always have to take an entire base backup?
Nandini
(51 rep)
May 5, 2015, 08:30 AM
• Last activity: Apr 17, 2025, 03:03 AM
5
votes
1
answers
524
views
Mysqlfailover command - No slave is listed in health status
I have successfully created replication using `GTID_MODE`. It works perfectly. Now I need to set up the automatic failover feature. I have run the following command:
mysqlfailover --master=root:abc@10.24.184.12:3306 --discover-slaves-login=root:abc
I have got the following results. No slave is listed.
MySQL Replication Failover Utility
Failover Mode = auto Next Interval = Tue May
Master Information
------------------
Binary Log File Position Binlog_Do_DB Binlog
mysql-bin.000016 9568
GTID Executed Set
8fe8b710-cd34-11e4-824d-fa163e52e544:1-1143
Replication Health Status
0 Rows Found.
Q-quit R-refresh H-health G-GTID Lists U-UUIDs U
But when I execute the `mysqlrplcheck` and `mysqlrplshow` commands, the slave is listed.
Is this normal?
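One common cause (an assumption here, not something the question confirms) is that the slaves have not registered themselves with the master, which `--discover-slaves-login` depends on:

```sql
-- On the master: a slave appears here only if it set report_host.
-- If this returns no rows, add report_host (and optionally report_port)
-- to each slave's my.cnf and restart the slave.
SHOW SLAVE HOSTS;
```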
Aneesha
(51 rep)
May 11, 2015, 11:48 AM
• Last activity: Apr 14, 2025, 11:06 AM
1
votes
3
answers
1337
views
Manually specifying candidate masters in Maxscale
Looking over the documentation, I haven't quite been able to find whether it's possible to manually set the candidate-master status of a server in MaxScale. The reason I am looking for this is that some of the slaves hooked up to the master/slave architecture have inferior hardware compared to the others, and we don't want them to become the master. According to the documentation, it is possible to set the status of a server as master or slave, but not to specify that we don't want it to become a master.
Is there a setting to do this in Maxscale? Possibly in maxadmin or a .cnf file or something?
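In later MaxScale releases (the `mariadbmon` monitor, which postdates the maxadmin era this question seems to be from), there is a dedicated parameter for this. A hedged maxscale.cnf sketch with placeholder server names and credentials:

```ini
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3
user=maxuser
password=maxpwd
# server3 has weaker hardware: never promote it during failover/switchover
servers_no_promotion=server3
```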
Thanks.
Bif Stone
(11 rep)
Mar 1, 2016, 02:58 PM
• Last activity: Apr 13, 2025, 07:03 AM
1
votes
1
answers
335
views
Data loss prevention when switching to secondary during failover
During the failover test in Azure SQL Database from primary to secondary, I see that every time there is a loss of 3 seconds worth data during the switch. I want to minimise this data loss.
The documentation states:
> To protect critical transactions from data loss, an application developer can call the `sp_wait_for_database_copy_sync` stored procedure immediately after committing the transaction.
How can we achieve this point?
Does this statement mean we need to add a SQL script that is automatically triggered on each commit? Has anyone tried this option?
Is there any other way to optimise the data loss?
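(Editor's note: per the Azure SQL documentation, `sp_wait_for_database_copy_sync` is not a trigger; the application calls it on the primary, in the same session, immediately after the commit, and the call blocks until the committed transaction has been transmitted to the geo-secondary. A hedged sketch, with server and database names as placeholders and an illustrative transaction:)

```sql
-- Run on the primary, immediately after committing the critical transaction.
BEGIN TRANSACTION;
    UPDATE dbo.Orders SET Status = 'Paid' WHERE OrderId = 42;  -- illustrative work
COMMIT TRANSACTION;

-- Blocks until the committed log has reached the geo-secondary.
EXEC sys.sp_wait_for_database_copy_sync
    @target_server   = N'mysecondaryserver',  -- placeholder secondary server
    @target_database = N'MyDatabase';         -- placeholder database name
```

Because the call blocks until replication catches up, it adds latency to every guarded commit, so it is typically reserved for the most critical transactions rather than applied globally.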
Sajit Gangadharan
(111 rep)
Aug 22, 2022, 11:49 AM
• Last activity: Mar 23, 2025, 01:32 AM
2
votes
1
answers
818
views
DB_CHAINING lost on planned Availability Group failover for SQL Server 2019 cumulative update
I'm looking for ideas on how to determine a plausible root cause for DB_CHAINING problems encountered when we recently performed a planned AG failover.
We're using SQL Server 2019 Standard Edition with "Basic" AGs (meaning that every database is in its own AG). In "Basic" AGs the Secondary Replicas are not readable, only one Primary and one Secondary are possible per AG, and the commit mode must be synchronous. Due to licensing (and cross-database queries), all AGs always have their Primary Replicas on the same Instance at any one time.
The Windows cluster is two Windows Server 2019 nodes, with a File Share Witness on a third server.
The purpose of our planned failover was to apply SQL Server 2019 CU10 (from CU8) to both servers; this went without issue. Patch SRV2, fail over to SRV2, patch SRV1.
The planned failover of all AGs was fine, no problems at all.
However, the IIS application immediately started failing with permissions problems due to DB_CHAINING being lost. Altering the databases to enable DB_CHAINING addressed the issues. Patching the second server proceeded without incident.
'cross db ownership chaining' is (and has always been) disabled at the server level on both servers, as per the MS recommendation.
I'm aware that restore or attach can affect this property (please see this article and the comments), and that DB_CHAINING can only be changed on the Primary Replica.
The configuration of the servers, databases and AGs, etc. predates my time. I know DB_CHAINING can compromise security, it wasn't my choice, it's an old application.
To pre-empt some potential suggestions, this was not an issue with Login passwords, SIDs, Server Role membership, database Users, Roles or permissions (anything User/Role/permission related is stored in the databases and would not have been affected by an AG failover). This was not an orphaned database User issue.
This was not a problem with client connections to a 'stale' Listener address/IP or similar. Both servers are on the same subnet and all clients reconnected via their Listener network names without any issues. The problem was when the clients successfully connected, they could not see tables, or execute procedures, etc.
As soon as DB_CHAINING was enabled, everything worked as expected - no other changes were needed.
I've confirmed on a test environment that failing over an AG has no effect on DB_CHAINING, it neither enables it nor disables it, on either Replica. This implies that the option was not enabled before I failed-over.
I would normally accept this and ensure that checking, and if necessary, enabling this option is added to the failover plan for the future, however about six weeks ago, we experienced an unplanned automatic failover caused by a switch reboot at the hosting company. On this occasion, all AGs failed-over and the application continued to function without any issues or intervention.
I've reviewed the ERRORLOG files and I have a complete history of when DB_CHAINING was enabled on which databases since the two servers were installed, I can see that no-one re-enabled DB_CHAINING after the unplanned failover. Furthermore, no-one has ever disabled DB_CHAINING on any database on either server.
Has anyone experienced a SQL Server update removing DB_CHAINING from User databases? I've reviewed the patching (setup) logs and can find no mention of anything like this being recorded. The closest I can find is someone reporting chaining issues sometime after applying SQL Server 2016 SP2. But that answer suggests changing the application behaviour to work around it, which I'm not able to do.
is_db_chaining_on currently returns 1 for all (necessary) databases on both servers, which is good evidence that patching doesn't alter this option; otherwise databases on the most recently patched (currently passive) server might report 0.
I cannot explain how an unplanned failover from SRV2 to SRV1 was OK, but a planned failback to SRV2 from SRV1 six weeks later had DB_CHAINING issues (when there is no evidence in the logs that anyone or anything disabled the option). It seems that either the update silently removed the property, or the property was not set correctly on SRV2 before the planned failover, in which case how was the application functioning (for months) before the unplanned failover six weeks ago?
Any ideas where I can look, or what I can test for, to identify a plausible root cause would be appreciated.
**EDIT:** thanks for your response AMTwo, but no, there have been no backup/restore or detach/attach events. The databases were created from native backups on both SRV1 and SRV2 in early Dec-2020; since then, there have been no restores of any of the databases in question. I'm the only DBA and there's no-one else who could or would have done this, and msdb.dbo.restore_history is empty. We also have Redgate SQL Monitor coverage over this pair of servers, which would have generated a lot of alerts if anything like a restore had been attempted that broke the AG replication.
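(Editor's note: for a future failover runbook, the check-and-fix described above can be scripted; the database name below is a placeholder:)

```sql
-- Audit: list databases where cross-database ownership chaining is off
SELECT name, is_db_chaining_on
FROM sys.databases
WHERE is_db_chaining_on = 0;

-- Fix: re-enable per database. Must be run on the current Primary Replica.
ALTER DATABASE [MyAppDb] SET DB_CHAINING ON;  -- placeholder database name
```

Running the audit query on both replicas after each failover (and after each patching cycle) would at least pinpoint when the option changes, even if the root cause remains unclear.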
Rob Dalzell
(856 rep)
May 17, 2021, 11:37 AM
• Last activity: Mar 2, 2025, 08:01 AM
0
votes
2
answers
436
views
Automatic Failover for Basic Availability Group
This will be my first project with a Basic Availability Group, so I thought I would check before implementing.
As per the Microsoft documentation below, we will need Software Assurance for failover servers used for disaster recovery / high availability.
https://learn.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-version-15?view=sql-server-ver15
I would greatly appreciate it if someone could clarify this, as I need the ability to perform automatic/manual failover without purchasing Software Assurance.
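(Editor's note: on the failover mechanics rather than the licensing: Basic AGs do support synchronous commit with automatic failover, and a manual failover is issued from the secondary replica. A hedged sketch with a placeholder AG name:)

```sql
-- Run on the SECONDARY replica of a Basic AG that is in
-- synchronous-commit mode and fully synchronized:
ALTER AVAILABILITY GROUP [BAG_MyDb] FAILOVER;  -- placeholder AG name
```

Automatic failover additionally requires the replica to be configured with FAILOVER_MODE = AUTOMATIC (plus a healthy WSFC quorum); whether the passive node needs Software Assurance is a separate licensing question from whether the failover itself works.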
SQL_NoExpert
(1117 rep)
Jul 30, 2021, 02:17 PM
• Last activity: Feb 25, 2025, 11:01 PM