
Intermittent failover of my SQL Server resources on Windows Server 2016

0 votes
1 answer
1784 views
I have 2 Windows Server 2016 VMs running on VMware ESXi, 6.7.0, 17700523, with VMDKs as the SQL disks. They host a SQL Server 2017 Always On cluster. Everything points to an issue with the network configuration, but for the time being we are stuck without a solution. Has anyone come across a similar issue, where the resources fail over at random?

SQL Server:

- **First machine: SQLDB01, 10.20.20.30**
- **Second machine: SQLDB02, 10.20.20.31**
- **AG name: SQLDBAG**
- **File share witness host: 10.20.20.40**

We use VMXNET3 NICs.

Cluster events in Failover Cluster Management:

- [FTI][Follower] Ignoring duplicate connection: route to remote node found
- [CHANNEL 10.20.20.30:~62034~] graceful close, status (of previous failure, may not indicate problem) (0)
- [NETFTAPI] Signaled NetftRemoteUnreachable event, local address 10.20.20.31:3343 remote address 10.20.20.30:3343
- [DCM] Force disconnect failed on DisconnectSmbInstance::CSV, status (c000000d)
- [PULLER SQLDB01] ReadObject failed with GracefulClose(1226)' because of 'channel to remote endpoint fe80::a1b3:e30a:c6a:a379%9:~54878~ is closed'
- [QUORUM] Node 2: One off quorum (2)
- [DCM] UpdateClusDiskMembership: ctl 300224 nodeSet (2), status 87
- [RCM] Moving orphaned group Cluster Group from downed node SQLDB01 to node SQLDB02.
- [RES] SQL Server Availability Group : [hadrag] Lease Thread terminated

Operational log:

- Microsoft Failover Cluster Virtual Adapter (NetFT) has missed more than 40 percent of consecutive heartbeats.

--------------------------------------------------------------------------

UPDATED 10/30/2021

1) Does the backup network support heartbeat too? No. There is also no relationship between the backup NIC and the failover clustering configuration. I have already checked "Do not allow cluster network communication on this network" for the BACKUP NIC.

2) How many (virtual) NICs? We have 2 NICs (LAN and BACKUP).

3) Are the VMs on the same host? No, they are on different ESX hosts.

There are no intensive security scans or vMotions. I am only backing up the boot disk (C volume image backup).

--------------------------------------------------------------------------

Timeframes in which these events occur:

- 10/27/2021, 1:00:44 AM Task: Create virtual machine snapshot
- 10/27/2021, 1:14:21 AM Backup successful
- 10/27/2021, 1:14:21 AM Task: Remove snapshot
- 10/27/2021, 1:15:38 AM Virtual machine SQLDB01 disks consolidated successfully

Cluster events:

- 10/28/2021 1:14:22 AM --->> Microsoft Failover Cluster Virtual Adapter (NetFT) has missed more than 40 percent of consecutive heartbeats.
- 10/28/2021 1:14:28 AM --->> Cluster has lost the UDP connection from local endpoint 10.20.20.30:~3343~ connected to remote endpoint 10.20.20.31:~3343~.
- 10/28/2021 1:15:35 AM [CHANNEL 10.20.20.31:~3343~]/recv: Failed to retrieve the results of overlapped I/O: 10054

--------------------------------------------------------------------

SQLDB02 events:

I assume there is a conflict between the Veeam replication job and the NetBackup daily incremental backup job, after which I get the disk consolidation message. It does not happen every time, though.

- 10/28/2021, 1:00:32 AM Task: Create virtual machine snapshot (NETBACKUP)
- 10/28/2021, 1:00:49 AM User logged event: Source: Veeam Backup Action: Job "SQLDB02_Replication" Operation: Started Status
- 10/28/2021, 1:00:58 AM Task: Create virtual machine snapshot (VEEAM)
- 10/28/2021, 1:14:17 AM NetBackup: Backup successful for SQLDB02
- 10/28/2021, 1:14:18 AM Task: Remove snapshot

WARNING:

- 10/28/2021, 1:15:35 AM Virtual machine SQLDB02 disks consolidation is needed on ESX_IP (NETBACKUP)
- 10/28/2021, 1:15:35 AM Virtual machine SQLDB02 disks consolidation failed on ESX_IP (NETBACKUP)
- 10/28/2021, 1:16:53 AM NetBackup: Consolidate disk failed for SQLDB02.
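To make the timing relationship easier to see, here is a minimal Python sketch that computes the gaps between the 10/28/2021 snapshot removal on SQLDB02 and the cluster events that follow. The timestamps are copied from the entries in this post. The dictionary keys and helper names (`events`, `parse`, `seconds_between`) are made up for illustration, and the sketch assumes the vCenter task log and the Windows event logs use the same clock and time zone, which I have not verified.

```python
from datetime import datetime

# Timestamp layout used in the post, e.g. "10/28/2021 1:14:22 AM".
FMT = "%m/%d/%Y %I:%M:%S %p"

def parse(ts: str) -> datetime:
    # Some entries carry a comma after the date; drop it before parsing.
    return datetime.strptime(ts.replace(",", ""), FMT)

# Timestamps copied from the 10/28/2021 entries in the post.
# The key names are hypothetical labels, not names from any log.
events = {
    "snapshot_removed": "10/28/2021, 1:14:18 AM",
    "heartbeats_missed": "10/28/2021 1:14:22 AM",
    "udp_connection_lost": "10/28/2021 1:14:28 AM",
    "consolidation_failed": "10/28/2021, 1:15:35 AM",
    "channel_recv_failed": "10/28/2021 1:15:35 AM",
}

def seconds_between(first: str, second: str) -> float:
    # Positive result means `second` happened after `first`.
    return (parse(events[second]) - parse(events[first])).total_seconds()

heartbeat_gap = seconds_between("snapshot_removed", "heartbeats_missed")
udp_gap = seconds_between("snapshot_removed", "udp_connection_lost")
recv_gap = seconds_between("consolidation_failed", "channel_recv_failed")

print(f"Remove snapshot -> missed heartbeats: {heartbeat_gap:.0f} s")
print(f"Remove snapshot -> UDP connection lost: {udp_gap:.0f} s")
print(f"Consolidation failed -> channel recv failure: {recv_gap:.0f} s")
```

If the clocks really are aligned, the heartbeat warning comes a few seconds after the snapshot removal starts, and the channel failure is logged in the same second as the failed consolidation. That is only a correlation from a single night, not proof of the cause.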
Asked by Cell-o (1106 rep)
Oct 29, 2021, 11:12 AM
Last activity: Oct 30, 2021, 06:42 AM