
PCS stonith (fencing) will kill a two-node cluster if the first node is down

2 votes
1 answer
3242 views
I have configured a two-node physical server cluster (HP ProLiant DL560 Gen8) using pcs (corosync/pacemaker/pcsd), and I have also configured fencing on it using fence_ilo4. The weird thing is that if one node goes down (by "down" I mean powered off), the second node dies as well: fencing kills it, leaving both servers offline. How do I correct this behavior?

What I have tried is adding "wait_for_all: 0" and "expected_votes: 1" to the quorum section of /etc/corosync/corosync.conf, but the surviving node still gets killed. At some point maintenance will have to be performed on one of these servers and it will have to be shut down, and I don't want the other node to go down when that happens.

Here are some outputs:

    [root@kvm_aquila-02 ~]# pcs quorum status
    Quorum information
    ------------------
    Date:             Fri Jun 28 09:07:18 2019
    Quorum provider:  corosync_votequorum
    Nodes:            2
    Node ID:          2
    Ring ID:          1/284
    Quorate:          Yes

    Votequorum information
    ----------------------
    Expected votes:   2
    Highest expected: 2
    Total votes:      2
    Quorum:           1
    Flags:            2Node Quorate

    Membership information
    ----------------------
        Nodeid      Votes    Qdevice Name
             1          1         NR kvm_aquila-01
             2          1         NR kvm_aquila-02 (local)

    [root@kvm_aquila-02 ~]# pcs config show
    Cluster Name: kvm_aquila

    Corosync Nodes:
     kvm_aquila-01 kvm_aquila-02
    Pacemaker Nodes:
     kvm_aquila-01 kvm_aquila-02

    Resources:
     Clone: dlm-clone
      Meta Attrs: interleave=true ordered=true
      Resource: dlm (class=ocf provider=pacemaker type=controld)
       Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
                   start interval=0s timeout=90 (dlm-start-interval-0s)
                   stop interval=0s timeout=100 (dlm-stop-interval-0s)
     Clone: clvmd-clone
      Meta Attrs: interleave=true ordered=true
      Resource: clvmd (class=ocf provider=heartbeat type=clvm)
       Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
                   start interval=0s timeout=90s (clvmd-start-interval-0s)
                   stop interval=0s timeout=90s (clvmd-stop-interval-0s)
     Group: test_VPS
      Resource: test (class=ocf provider=heartbeat type=VirtualDomain)
       Attributes: config=/shared/xml/test.xml hypervisor=qemu:///system migration_transport=ssh
       Meta Attrs: allow-migrate=true is-managed=true priority=100 target-role=Started
       Utilization: cpu=4 hv_memory=4096
       Operations: migrate_from interval=0 timeout=120s (test-migrate_from-interval-0)
                   migrate_to interval=0 timeout=120 (test-migrate_to-interval-0)
                   monitor interval=10 timeout=30 (test-monitor-interval-10)
                   start interval=0s timeout=300s (test-start-interval-0s)
                   stop interval=0s timeout=300s (test-stop-interval-0s)

    Stonith Devices:
     Resource: kvm_aquila-01 (class=stonith type=fence_ilo4)
      Attributes: ipaddr=10.0.4.39 login=fencing passwd=0ToleranciJa pcmk_host_list="kvm_aquila-01 kvm_aquila-02"
      Operations: monitor interval=60s (kvm_aquila-01-monitor-interval-60s)
     Resource: kvm_aquila-02 (class=stonith type=fence_ilo4)
      Attributes: ipaddr=10.0.4.49 login=fencing passwd=0ToleranciJa pcmk_host_list="kvm_aquila-01 kvm_aquila-02"
      Operations: monitor interval=60s (kvm_aquila-02-monitor-interval-60s)
    Fencing Levels:

    Location Constraints:
    Ordering Constraints:
      start dlm-clone then start clvmd-clone (kind:Mandatory)
    Colocation Constraints:
      clvmd-clone with dlm-clone (score:INFINITY)
    Ticket Constraints:

    Alerts:
     No alerts defined

    Resources Defaults:
     No defaults set
    Operations Defaults:
     No defaults set

    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: kvm_aquila
     dc-version: 1.1.19-8.el7_6.4-c3c624ea3d
     have-watchdog: false
     last-lrm-refresh: 1561619537
     no-quorum-policy: ignore
     stonith-enabled: true

    Quorum:
      Options:
        wait_for_all: 0

    [root@kvm_aquila-02 ~]# pcs cluster status
    Cluster Status:
     Stack: corosync
     Current DC: kvm_aquila-02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
     Last updated: Fri Jun 28 09:14:11 2019
     Last change: Thu Jun 27 16:23:44 2019 by root via cibadmin on kvm_aquila-01
     2 nodes configured
     7 resources configured

    PCSD Status:
      kvm_aquila-02: Online
      kvm_aquila-01: Online

    [root@kvm_aquila-02 ~]# pcs status
    Cluster name: kvm_aquila
    Stack: corosync
    Current DC: kvm_aquila-02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
    Last updated: Fri Jun 28 09:14:31 2019
    Last change: Thu Jun 27 16:23:44 2019 by root via cibadmin on kvm_aquila-01

    2 nodes configured
    7 resources configured

    Online: [ kvm_aquila-01 kvm_aquila-02 ]

    Full list of resources:

     kvm_aquila-01    (stonith:fence_ilo4):    Started kvm_aquila-01
     kvm_aquila-02    (stonith:fence_ilo4):    Started kvm_aquila-02
     Clone Set: dlm-clone [dlm]
         Started: [ kvm_aquila-01 kvm_aquila-02 ]
     Clone Set: clvmd-clone [clvmd]
         Started: [ kvm_aquila-01 kvm_aquila-02 ]
     Resource Group: test_VPS
         test    (ocf::heartbeat:VirtualDomain):    Started kvm_aquila-01

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
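For reference, the quorum-section edit mentioned above would look roughly like this in /etc/corosync/corosync.conf (a sketch only: the provider line is taken from the pcs quorum status output, the two_node setting is an assumption inferred from the "2Node" flag shown there, and the last two lines are the values I added):

    quorum {
        provider: corosync_votequorum
        two_node: 1        # assumed: consistent with the "2Node" flag reported by pcs quorum status
        wait_for_all: 0    # added by me
        expected_votes: 1  # added by me, though pcs quorum status still reports "Expected votes: 2"
    }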
Asked by Marko Todoric (437 rep)
Jun 28, 2019, 07:14 AM
Last activity: Jun 28, 2019, 02:38 PM