
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
1 answer
2844 views
Apache resource failed to start in Pacemaker
I am using Pacemaker with Corosync to set up a basic Apache HA cluster with 3 nodes running CentOS 7. For some reason, I cannot get the Apache resource started in pcs. Cluster IP: 192.168.200.40

# pcs resource show ClusterIP
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=192.168.200.40
  Operations: monitor interval=20s (ClusterIP-monitor-interval-20s)
              start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

# pcs resource show WebServer
 Resource: WebServer (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
  Operations: monitor interval=1min (WebServer-monitor-interval-1min)
              start interval=0s timeout=40s (WebServer-start-interval-0s)
              stop interval=0s timeout=60s (WebServer-stop-interval-0s)

# pcs status
Cluster name:
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: server3.example.com (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Thu Jun 7 21:59:09 2018
Last change: Thu Jun 7 21:45:23 2018 by root via cibadmin on server1.example.com

3 nodes configured
2 resources configured

Online: [ server1.example.com server2.example.com server3.example.com ]

Full list of resources:
 ClusterIP (ocf::heartbeat:IPaddr2): Started server2.example.com
 WebServer (ocf::heartbeat:apache): Stopped

Failed Actions:
* WebServer_start_0 on server3.example.com 'unknown error' (1): call=49, status=Timed Out, exitreason='', last-rc-change='Thu Jun 7 21:46:03 2018', queued=0ms, exec=40002ms
* WebServer_start_0 on server1.example.com 'unknown error' (1): call=53, status=Timed Out, exitreason='', last-rc-change='Thu Jun 7 21:45:23 2018', queued=0ms, exec=40003ms
* WebServer_start_0 on server2.example.com 'unknown error' (1): call=47, status=Timed Out, exitreason='', last-rc-change='Thu Jun 7 21:46:43 2018', queued=1ms, exec=40002ms

Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled

The httpd instance is **enabled** and **running** on all three nodes. The cluster IP and the individual node IPs can all serve the web page, and the ClusterIP resource also fails over correctly. What could be going wrong with the Apache resource in this case? Thank you very much!

Update: Here is more information from the debug output. It seems Apache is unable to bind to the port, yet there is no error in the Apache log and systemctl status httpd is all green on every node. I can open the web page via the cluster IP and every node IP, and ClusterIP resource failover works fine, too. Any idea why the Apache resource doesn't work with Pacemaker?

# pcs resource debug-start WebServer --full
Operation start for WebServer (ocf:heartbeat:apache) failed: 'Timed Out' (2)
 > stderr: ERROR: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
 (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
 no listening sockets available, shutting down
 AH00015: Unable to open logs
 > stderr: INFO: apache not running
 > stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
 > stderr: INFO: apache not running
 > stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
 > stderr: INFO: apache not running
 > stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
 > stderr: INFO: apache not running
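For context, the debug output above shows the start timing out because port 80 is already taken: the distribution-managed httpd is enabled and running outside the cluster, so the ocf:heartbeat:apache agent cannot start its own instance. A minimal, hedged sketch of handing the daemon over to Pacemaker (run on every node; resource name taken from the question):

systemctl stop httpd             # free port 80 so the resource agent can bind
systemctl disable httpd          # let only the cluster start Apache from now on
pcs resource cleanup WebServer   # clear the recorded start failures so the cluster retries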
cody (67 rep)
Jun 8, 2018, 04:16 PM • Last activity: Jul 15, 2025, 02:03 AM
0 votes
1 answer
2303 views
Secondary DRBD node does not auto-start in Pacemaker+Corosync setup
I am trying to set up a 2-PC cluster with shared resources: ClusterIP, ClusterSamba, ClusterNFS, DRBD (cloned resource), and a DRBDFS. The beginning of the project followed the [Clusters from Scratch](https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Clusters_from_Scratch/index.html) guide. When everything in this guide is done, it works without problems.

So, I wanted to use parts of that guide and build my own setup: I created one shared IP (ClusterIP) that is automatically assigned to one node, and (here is where it gets tricky) on that node, I mount my /dev/drbd1 device to /exports and then share this mount through **SAMBA** and **NFS**.

When I start the cluster, all resources come up as they should, _but DRBD does not go up on the secondary node_ (Primary/Unknown). If I bring it up manually, it syncs and works. Also, when I stop the cluster (or forcibly reboot the first node), all resources transfer to the other node and everything works, _except DRBD on the other node goes into an Unknown state_.

### So now, here is the problem:

**Why does DRBD go down on the secondary node when I stop the cluster? Or why doesn't it start in the Secondary role on the secondary node?**

Sorry if my description is bad.

---

## Here are the commands I used
# apt install -y pacemaker pcs psmisc policycoreutils-python-utils drbd-utils samba nfs-kernel-server 
# systemctl start pcsd.service
# systemctl enable pcsd.service
# passwd hacluster
# pcs host auth alice bob
# pcs cluster setup myCluster alice bob --force
# pcs cluster start --all
# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# modprobe drbd
# echo drbd >/etc/modules-load.d/drbd.conf
# drbdadm create-md r0
# drbdadm up r0
# drbdadm primary r0 --force
# mkfs.ext4 /dev/drbd1
# systemctl disable smbd
# systemctl disable nfs-kernel-server.service 
# mkdir /exports
# vi /etc/samba/smb.conf 
# vi /etc/exports 
# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.1.1.30 cidr_netmask=24 op monitor interval=30s
# pcs resource defaults resource-stickiness=100
# pcs resource op defaults timeout=240s
# pcs resource create ClusterSamba lsb:smbd op monitor interval=60s
# pcs resource create ClusterNFS ocf:heartbeat:nfsserver op monitor interval=60s
# pcs resource create DRBD ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
# pcs resource promotable DRBD promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true
# pcs resource create DRBDFS Filesystem device="/dev/drbd1" directory="/exports" fstype="ext4"
# pcs constraint order ClusterIP then ClusterNFS
# pcs constraint order ClusterNFS then ClusterSamba
# pcs constraint order promote DRBD-clone then start DRBDFS
# pcs constraint order DRBDFS then ClusterNFS
# pcs constraint order ClusterIP then DRBD-clone
# pcs constraint colocation ClusterSamba with ClusterIP
# pcs constraint colocation add ClusterSamba with ClusterIP
# pcs constraint colocation add ClusterNFS with ClusterIP
# pcs constraint colocation add DRBDFS with DRBD-clone INFINITY with-rsc-role=Master
# pcs constraint colocation add DRBD-clone with ClusterIP
# pcs cluster stop --all && sleep 2 && pcs cluster start --all

---

## Configs and stats

### /etc/drbd.d/r0.res
resource r0 {
 device /dev/drbd1;
 disk /dev/sdb;
 meta-disk internal;
 net {
  allow-two-primaries;
 }
 on alice {
  address 10.1.1.31:7788;
 }
 on bob {
  address 10.1.1.32:7788;
 } 
}

---

### /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: myCluster
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256
}

nodelist {
    node {
        ring0_addr: alice
        name: alice
        nodeid: 1
    }

    node {
        ring0_addr: bob
        name: bob
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
}

---

### pcs status
Cluster name: myCluster
Stack: corosync
Current DC: alice (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Fri May 15 12:28:30 2020
Last change: Fri May 15 11:04:50 2020 by root via cibadmin on bob

2 nodes configured
6 resources configured

Online: [ alice bob ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started alice
 ClusterSamba   (lsb:smbd):     Started alice
 ClusterNFS     (ocf::heartbeat:nfsserver):     Started alice
 Clone Set: DRBD-clone [DRBD] (promotable)
 Masters: [ alice ]
 Stopped: [ bob ]
 DRBDFS (ocf::heartbeat:Filesystem):    Started alice

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

---

### pcs constraint --full
Location Constraints:

Ordering Constraints:
  start ClusterIP then start ClusterNFS (kind:Mandatory) (id:order-ClusterIP-ClusterNFS-mandatory)
  start ClusterNFS then start ClusterSamba (kind:Mandatory) (id:order-ClusterNFS-ClusterSamba-mandatory)
  promote DRBD-clone then start DRBDFS (kind:Mandatory) (id:order-DRBD-clone-DRBDFS-mandatory)
  start DRBDFS then start ClusterNFS (kind:Mandatory) (id:order-DRBDFS-ClusterNFS-mandatory)
  start ClusterIP then start DRBD-clone (kind:Mandatory) (id:order-ClusterIP-DRBD-clone-mandatory)
  start ClusterIP then promote DRBD-clone (kind:Mandatory) (id:order-ClusterIP-DRBD-clone-mandatory-1)

Colocation Constraints:
  ClusterSamba with ClusterIP (score:INFINITY) (id:colocation-ClusterSamba-ClusterIP-INFINITY)
  ClusterNFS with ClusterIP (score:INFINITY) (id:colocation-ClusterNFS-ClusterIP-INFINITY)
  DRBDFS with DRBD-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-DRBDFS-DRBD-clone-INFINITY)
  DRBD-clone with ClusterIP (score:INFINITY) (id:colocation-DRBD-clone-ClusterIP-INFINITY)

Ticket Constraints:

---

### /proc/drbd
version: 8.4.10 (api:1/proto:86-101)
srcversion: 983FCB77F30137D4E127B83 

 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:4 dw:8 dr:17 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4
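One thing worth noting in the constraints above: colocating the whole DRBD-clone with ClusterIP (score INFINITY) pins every clone instance to the node holding the IP, which would keep the second DRBD instance from ever starting. A hedged sketch of constraining only the promoted (Master) instance instead, using the constraint id from the pcs constraint --full output (pcs syntax varies slightly between releases):

pcs constraint remove colocation-DRBD-clone-ClusterIP-INFINITY
pcs constraint colocation add master DRBD-clone with ClusterIP INFINITY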
Miki (31 rep)
May 15, 2020, 11:12 AM • Last activity: Jun 19, 2025, 10:03 PM
2 votes
1 answer
2579 views
After failover Pacemaker moves resource back when node comes back
I'm using Pacemaker & Corosync for my cluster. When a node dies, Pacemaker moves my resources to another online node; everything is OK here. But when the dead node comes back, Pacemaker moves the resources back. I don't have any "location" line in my config, and I also tried the "unmove" command, but nothing changed. I failed somewhere and need to find the reason.

**crm configure sh**

node 1: DEV1
node 2: DEV2
primitive poolip IPaddr2 \
        params ip=10.1.60.33 nic=enp2s0f0 cidr_netmask=24 \
        meta migration-threshold=2 target-role=Started \
        op monitor interval=20 timeout=20 on-fail=restart
primitive gui systemd:gui \
        op monitor interval=20s \
        meta target-role=Started
primitive gui-ip IPaddr2 \
        params ip=10.1.60.35 nic=enp2s0f0 cidr_netmask=24 \
        meta migration-threshold=2 target-role=Started \
        op monitor interval=20 timeout=20 on-fail=restart
colocation cluster-gui inf: gui gui-ip
order gui-after-ip Mandatory: gui-ip gui
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.0-1-8cf3fe749e \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1545920437
rsc_defaults rsc-options: \
        migration-threshold=10 \
        resource-stickiness=100

**pcs resource defaults**

migration-threshold=10
resource-stickiness=100

**pcs resource show gui**

Resource: gui (class=systemd type=gui)
 Meta Attrs: target-role=Started
 Operations: monitor interval=20s (gui-monitor-20s)
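When resources drift back to a recovered node even though stickiness is set, the allocation scores usually reveal which constraint or default is winning; crm_simulate ships with Pacemaker and can print them from the live CIB. A small sketch (raising stickiness is shown only as an illustration):

crm_simulate -sL                                          # show allocation scores for the live cluster
crm configure rsc_defaults resource-stickiness=INFINITY   # keep resources where they are after failover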
Ozbit (439 rep)
Jan 2, 2019, 08:58 AM • Last activity: Jun 14, 2025, 09:07 PM
1 votes
1 answer
2618 views
Pacemaker Virtual IP cannot be routed outside of its network
I have a server cluster consisted of following setup: 2 Virtual Servers with 2 NIC's. eth0 (private network 10.0.0.0/16) and eth1 (public network 77.1.2.0/24 with gateway as 77.1.2.1) For HA-01 VPS i have Private IP on eth0 set as 10.0.0.1 For HA-02 VPS i have Private IP set on eth0 as 10.0.0.2 Pacemaker/Corosync Cluster has been established between private IP addresses and Virtual IP (77.1.2.4) defined as clone Resource (IPAddr2) so it can float between two nodes. pcs resource create VirtualIP1 ocf:heartbeat:IPaddr2 ip="77.1.2.4" cidr_netmask="24" nic="eth1" clusterip_hash="sourceip-sourceport" op start interval="0s" timeout="60s" op monitor interval="1s" timeout="20s" op stop interval="0s" timeout="60s" clone interleave=true ordered=true Problem is, i cannot reach that IP address from world. I noticed that there is a route missing, so i add the static route ip r add default via 77.1.2.1 dev eth1 But i still cannot ping google.com from those servers nor world can see them on that IP. I also tried adding IP addresses from same subnet on eth1 like this: HA-01 eth1: 77.1.2.2 HA-02 eth1: 77.1.2.3 Servers can be seen on those IPs by world but if i add VirtualIP resource i cannot reach them on Virtual IP address. I also tried adding a source ip in routing table ip r add default via 77.1.2.1 src 77.1.2.4 to no avail. I don't know what am i supposed to do to get this VirtualIP working. I can reach 77.1.2.4 (Virtual IP Address) from other servers on that network, but not outside that network. Firewall is established and high availability ports are passed via command firewall-cmd --add-service="high availability"; firewall-cmd --add-service="high availability" --permanent Is there anything here that i am missing? If i add that address (77.1.2.4 - Virtual IP) alone on the interface of only one of those servers, it will work.... So is there an issue with ARP table perhaps or maybe router blocking some traffic?
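One detail that often bites here: when IPaddr2 is cloned with clusterip_hash, the agent switches to the iptables CLUSTERIP mode, which answers ARP with a multicast MAC address; many upstream routers refuse to learn such an entry, so the address stays reachable only from hosts on the same L2 segment. A hedged sketch of the plain failover variant (same parameters as in the question, load-sharing given up):

pcs resource delete VirtualIP1
pcs resource create VirtualIP1 ocf:heartbeat:IPaddr2 ip=77.1.2.4 cidr_netmask=24 nic=eth1 op monitor interval=1s timeout=20s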
Marko Todoric (437 rep)
Jul 19, 2019, 02:54 PM • Last activity: Apr 15, 2025, 03:08 AM
1 votes
1 answer
25 views
Drbd promote only after stonith started
I want a DRBD-based two-node cluster to start resources in the following order:

1. on both nodes, start stonith:fence_ipmilan
2. on one node, promote drbd-clone
3. on the same node as the DRBD promotion, start all NFS server resources (IP, export, …)

But how do I tell Pacemaker to promote drbd-clone only after stonith:fence_ipmilan has started on each of the two nodes? I tried

pcs constraint order set ipmi-fence-memverge ipmi-fence-memverge2 action=start require-all=true sequential=false set ha-nfs-clone action=promote sequential=false require-all=false

but it seems that stonith:fence_ipmilan and the drbd-clone promotion still start simultaneously…

Anton
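A hedged alternative to the resource-set syntax, expressed as two plain ordering constraints (resource names taken from the question; whether ordering against stonith resources is honoured can depend on the Pacemaker version):

pcs constraint order start ipmi-fence-memverge then promote ha-nfs-clone
pcs constraint order start ipmi-fence-memverge2 then promote ha-nfs-clone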
Anton Gavriliuk (11 rep)
Feb 4, 2025, 03:03 PM • Last activity: Feb 4, 2025, 07:12 PM
1 votes
2 answers
8573 views
pcs stonith not working
I have 2 virtual CentOS 7 nodes, and root can log in passwordlessly between them. I have configured stonith like this, but the services are not coming up and fencing is not happening. I'm new to this; could someone help me rectify the issue?

[root@node1 cluster]# pcs stonith create nub1 fence_virt pcmk_host_list="node1"
[root@node1 cluster]# pcs stonith create nub2 fence_virt pcmk_host_list="node2"
[root@node1 cluster]# pcs stonith show
 nub1 (stonith:fence_virt): Stopped
 nub2 (stonith:fence_virt): Stopped
[root@node1 cluster]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node2 (version 1.1.15-11.el7_3.5-e174ec8) - partition with quorum
Last updated: Tue Jul 25 07:03:37 2017
Last change: Tue Jul 25 07:02:00 2017 by root via cibadmin on node1

2 nodes and 3 resources configured

Online: [ node1 node2 ]

Full list of resources:
 ClusterIP (ocf::heartbeat:IPaddr2): Started node1
 nub1 (stonith:fence_virt): Stopped
 nub2 (stonith:fence_virt): Stopped

Failed Actions:
* nub1_start_0 on node1 'unknown error' (1): call=56, status=Error, exitreason='none', last-rc-change='Tue Jul 25 07:01:34 2017', queued=0ms, exec=7006ms
* nub2_start_0 on node1 'unknown error' (1): call=58, status=Error, exitreason='none', last-rc-change='Tue Jul 25 07:01:42 2017', queued=0ms, exec=7009ms
* nub1_start_0 on node2 'unknown error' (1): call=54, status=Error, exitreason='none', last-rc-change='Tue Jul 25 07:01:26 2017', queued=0ms, exec=7010ms
* nub2_start_0 on node2 'unknown error' (1): call=60, status=Error, exitreason='none', last-rc-change='Tue Jul 25 07:01:34 2017', queued=0ms, exec=7013ms

Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled

[root@node1 cluster]# pcs stonith fence node2
Error: unable to fence 'node2'
Command failed: No route to host
[root@node1 cluster]# pcs stonith fence nub2
Error: unable to fence 'nub2'
Command failed: No such device
[root@node1 cluster]# ping node2
PING node2 (192.168.100.102) 56(84) bytes of data.
64 bytes from node2 (192.168.100.102): icmp_seq=1 ttl=64 time=0.247 ms
64 bytes from node2 (192.168.100.102): icmp_seq=2 ttl=64 time=0.304 ms
^C
--- node2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.247/0.275/0.304/0.032 ms
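fence_virt only works when fence_virtd is configured and listening on the virtualization host and the shared key (usually /etc/cluster/fence_xvm.key) is present on every guest; the "No route to host" error suggests the guests cannot reach that daemon at all. A quick sanity check from one of the nodes, assuming the default multicast setup:

fence_xvm -o list    # should print the VMs known to fence_virtd on the host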
Mohammed Ali (691 rep)
Jul 25, 2017, 11:10 AM • Last activity: Feb 10, 2024, 02:01 AM
-3 votes
1 answer
373 views
Resource Group for file share not starting
I have a client that is trying to configure a file share for a 2-node cluster. Running this command seems to fix it, but as soon as we switch over, it stops again. Any ideas?

[screenshot: pcs status output — https://i.sstatic.net/yv4Fs.png]
David Kranes (1 rep)
Dec 18, 2023, 02:35 PM • Last activity: Dec 28, 2023, 02:28 PM
0 votes
2 answers
2897 views
RHEL High-Availability Cluster using pcs, configuring service as a resource
I have a 2 node cluster on RHEL 6.9. Everything is configured except I'm having difficulty with an application launched via shell script that created into a service (in /etc/init.d/myApplication), which I'll just call "myApp". From that application, I did a pcs resource create myApp lsb:myApp op monitor interval=30s op start on-fail=standby. I am new to using this suite of software but it's for work. What I need is for this application to be launched on both nodes simultaneously as it has to be started manually so if the first node fails, it would need intervention if it were not already active on the passive node. I have two other services: -VirtIP (ocf:heartbeat:IPaddr2) for providing a service IP for the application server -Cron (lsb:crond) to synchronize the application files (we are not using shared storage) I have the VirtIP and Cron as dependents via colocation to myApp. I've tried master/slave as well as cloning but I must be missing something regarding their config. If I take the application offline, pacemaker does not detect the service has gone down and pcs status outputs that myApp is still running on the node (or nodes depending on my config). I'm also sometimes getting the issue that the service running the app is stopped by pacemaker on the passive node. Which is the way I need to configure this? I've gone through the RHEL documentation but I'm still stuck. How do I get pacemaker to initiate failover if myApp service goes down? I don't know why it's not detecting the service has stopped in some cases. EDIT: So for testing purposes, I removed the password requirement for starting/restarting and the service starts/restarts fine as expected and the colocation dependent resources stop/start as expected. But stopping the myApp service does not reflect as a stopped resource but simply stays at Started node1. Likewise, simulating a failover via putting node1 into standby simply stops all resources on node1.
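Two points that may help here, sketched with the names from the question: Pacemaker only notices a dead lsb: resource if the init script's status action returns LSB-compliant exit codes (0 when running, 3 when stopped), and a resource can be run on both nodes at once by cloning it rather than using master/slave.

/etc/init.d/myApplication status; echo $?   # should print 3 when the app is stopped, 0 when it is running
pcs resource clone myApp clone-max=2 clone-node-max=1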
Greg (187 rep)
Sep 29, 2017, 07:52 AM • Last activity: Sep 6, 2023, 09:56 PM
0 votes
1 answer
358 views
Prevent promotion on specific node in Pacemaker
I have a drbd + pacemaker cluster with three nodes, one being a quorum device only. I'm trying to configure the pacemaker resource so that the promotable drbd-resource should run on all three devices, but it should never be promoted on the quorum device. I've tried setting location constraints on the resource, but that results in pacemaker not starting the resource at all on the quorum device so drbd can't keep quorum on a failover. The desired state would be: - drbd resource is started on all three nodes - drbd resource is promotable - pacemaker will never promote the quorum device I can't find anything in the documentation, but what I'm envisioning would be a parameter like "don't promote on device X" that I have missed for the drbd resource.
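What this seems to call for is a rule-based location constraint that forbids only the promoted role on the quorum node while leaving the clone free to run there. A hedged sketch (the resource name DRBD-clone and node name qnode are placeholders; newer pcs/Pacemaker releases spell the role Promoted):

pcs constraint location DRBD-clone rule role=master score=-INFINITY '#uname' eq qnode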
comrain (3 rep)
Mar 13, 2023, 07:23 AM • Last activity: Mar 13, 2023, 06:55 PM
0 votes
1 answer
11525 views
DRBD - 'node1' not defined in your config (for this host) - Error when setting Primary
I am getting the following error when trying to set the Primary node for DRBD:

'node1' not defined in your config (for this host).

I know this is related to DNS/hostname/hosts and the clusterdb.res config. I know this because I originally got an error when trying to start clusterdb.res if node1 didn't resolve correctly. So what confuses me is that I can start clusterdb.res if I either use this command on the hosts (which I have used):

hostnamectl set-hostname $(uname -n | sed s/\\..*//)

to make the hostname resolve to node1 instead of node1.localdomain, or add node1.localdomain to the config; either works. But I have tried all combinations and can't seem to get this command to take:

drbdadm primary --force node1 && cat /proc/drbd

**My configs**

/etc/drbd.d/clusterdb.res

resource clusterdb{
protocol C;
meta-disk internal;
device /dev/drbd0;
startup {
 wfc-timeout 30;
 outdated-wfc-timeout 20;
 degr-wfc-timeout 30;
}
net {
 cram-hmac-alg sha1;
 shared-secret sync_disk;
}
syncer {
 rate 10M;
 al-extents 257;
 on-no-data-accessible io-error;
 verify-alg sha1;
}
on node1 {
 disk /dev/sda3;
 address 192.168.1.216:7788;
}
on node2 {
 disk /dev/sda3;
 address 192.168.1.217:7788;
}
}

/etc/hosts:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.216 node1
192.168.1.217 node2

/etc/hostname

node#

My full write up ATM (wip)

**Edits:**

[root@node1 ~]# hostname
node1
[root@node1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 node1
192.168.1.216 node1
192.168.1.217 node2
[root@node1 ~]#

Update: I have gotten this to work with LVM by following this guide exactly, so I think my issue actually lies with the following lines of code. For now I will stick with LVM since it works, unless somebody else really wants to work on this. (My working LVM writeup)

device /dev/drbd0;
or
device /dev/drbd0;

The reason I say this is that I used the same hosts/hostname/shortname/ip_addr but with LVM and it worked; maybe I missed something the first time, which I fixed in my new VM template (I started from scratch to build LVM).
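One thing that stands out from the error and the config above: drbdadm primary expects a DRBD resource name, not a host name, and the resource defined in clusterdb.res is called clusterdb. A hedged sketch of the promotion step with that name (assuming the metadata was already created and the resource is up):

drbdadm primary --force clusterdb && cat /proc/drbd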
FreeSoftwareServers (2682 rep)
May 1, 2016, 01:59 AM • Last activity: Mar 8, 2023, 02:50 AM
4 votes
1 answer
1660 views
Clustered NFS server reply ERR 24: Auth Bogus Credentials (seal broken)
I have 4 servers on the VirtualBox. Two of the servers are a CentOS 7 cluster with Pacemaker(corosync), and they have an NFSv4 server in Active/Passive mode. There are also 2 clients with CentOS 6, also using this NFS server. The problem does not always occur, but sometimes when I manually or automatically failover from the active NFS server cluster, both clients give the error: *Permission denied.* The tcpdump from the clients shows: [17:24:29.271467] IP client.example.net.34236755563 > server.example.net.nfs 112 getattr [|nfs] [17:24:29.271619] IP server.example.net.nfs > client.example.net.3423675563: reply ERR 24: Auth Bogus Credentials (seal broken) Until this problem is solved nothing is working: I have tried to transfer to NFSv3, tried different cluster configurations, tried a grace period for NFSv4 from 10 to 90 seconds, with no luck. Cluster configuration: node 1: storage1 node 2: storage2 primitive p_drbd_nfs ocf:linbit:drbd \ params drbd_resource=cgp \ op monitor interval=31s role=Master \ op monitor interval=29s role=Slave \ op start interval=0 timeout=240s \ op stop interval=0 timeout=120s primitive p_fs_home Filesystem \ params device="/dev/drbd0" directory="/mnt" fstype=xfs options="noatime,nobarrier" \ op monitor interval=10s \ meta is-managed=true primitive p_ip_nfs IPaddr2 \ params ip=192.168.56.100 cidr_netmask=24 \ op monitor interval=30s \ meta is-managed=true primitive p_nfs_exports exportfs \ params fsid=0 directory="/mnt" options="rw,async,no_wdelay,mountpoint,insecure,no_subtree_check,no_root_squash" clientspec="192.168.56.0/255.255.255.0" wait_for_leasetime_on_stop=true rmtab_backup=none \ op monitor interval=10s \ op stop interval=0 timeout=120s \ meta is-managed=true primitive p_nfsserver nfsserver \ params grace_time=90 proc_num=16 \ op monitor interval=30s \ meta is-managed=true primitive p_ping ocf:pacemaker:ping \ params host_list=192.168.56.1 multiplier=1000 attempts=1 timeout=3 name=p_ping \ op monitor interval=5 timeout=60 ms ms_drbd_nfs p_drbd_nfs \ meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true is-managed=true clone cl_p_ping p_ping \ meta is-managed=true target-role=Started location l_0 ms_drbd_nfs \ rule $role=Master -inf: not_defined p_ping or p_ping lte 0 colocation c_1 inf: p_fs_home ms_drbd_nfs:Master colocation c_2 inf: p_nfsserver p_fs_home colocation c_3 inf: p_nfs_exports p_nfsserver colocation c_4 inf: p_ip_nfs p_nfs_exports order o_1 inf: ms_drbd_nfs:promote p_fs_home:start order o_2 inf: p_fs_home p_nfsserver order o_3 inf: p_nfsserver p_nfs_exports order o_4 inf: p_nfs_exports p_ip_nfs property cib-bootstrap-options: \ dc-version=1.1.10-32.el7_0.1-368c726 \ cluster-infrastructure=corosync \ stonith-enabled=false \ no-quorum-policy=ignore \ last-lrm-refresh=1428329105 rsc_defaults rsc-options: \ resource-stickiness=200 Here is a string from the client fstab file: 192.168.56.100:/ /mnt nfs nfsvers=4,proto=tcp,rsize=32768,wsize=32768,hard,timeo=300,retrans=2,bg,actimeo=3,noatime,nodiratime 0 0
Max Karpenkov (41 rep)
Apr 9, 2015, 03:11 PM • Last activity: Jan 8, 2023, 02:02 AM
0 votes
1 answer
396 views
How to tell if a VG is clustered?
I have a CentOS 7 Pacemaker cluster with GFS2 filesystems mounted. I'm fairly certain that vgchange -cy vg_name was NOT run during setup. I tried running vgchange --test -cy vg_name and it tells me the volume group is already clustered. In Linux 6, service clvmd status will show whether the VG is clustered or not. However, on Linux 7 the pcs resource show clvmd output is quite different and I'm not sure what to look for.

pcs resource show clvmd
 Resource: clvmd (class=ocf provider=heartbeat type=clvm)
  Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
              start interval=0s timeout=90s (clvmd-start-interval-0s)
              stop interval=0s timeout=90s (clvmd-stop-interval-0s)

Would creating the filesystem resources have done the vgchange if needed? Is there anything else I can check?
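The clustered flag can also be read straight from the LVM metadata, which sidesteps the clvmd/pcs differences between releases; a small sketch (vg_name is a placeholder):

vgs -o vg_name,vg_attr vg_name          # a 'c' as the 6th attribute character means the VG is clustered
vgdisplay vg_name | grep -i clustered   # prints "Clustered  yes" for a clustered VG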
ex_submariner (1 rep)
Sep 22, 2022, 05:56 PM • Last activity: Oct 2, 2022, 02:25 AM
1 votes
1 answer
488 views
Where are libvirt's VM definitions "originals" stored, and how to sync them across multiple nodes?
Migrating from Xen's xm to Xen's xl under control of libvirt, I wonder: Where does libvirt store the "originals" of VM configurations? I found that my PVM configurations are stored in /etc/libvirt/libxl/, but when viewing such files I see a comment saying that the file has been created automatically and should not be edited ("use virsh edit ..."). I also found XML and JSON files in /var/lib/xen, named after the Domain ID and UUID of the VM.

As I'm configuring an HA cluster, I'd like to synchronize VM configurations across all cluster nodes (allowing live-migration). In the past, syncing /etc/xen/vm was enough, but for libvirt it seems to be much more complicated: sometimes I'll have to virsh define a VM from the XML file, virsh destroy seems to destroy not only the running VM but also its configuration, and virsh undefine also seems to remove the XML file in /etc/libvirt/libxl/. I don't know how to synchronize the configuration across the cluster nodes.

The major problem I see is this: after csync2-ing the XML files defining the VM configurations to the other cluster nodes, I see the changes in the /etc/libvirt/libxl/ files, which say "do not edit; use virsh edit instead". However, when I use virsh edit for one of those files, the contents I see in the editor are not what I see in the XML files located in /etc/libvirt/libxl/. Maybe re-phrase the question to: *If I update the XML files in /etc/libvirt/libxl/ (like via csync2), how can I ensure that libvirt uses the updated configurations?*

## Update

This question became more important after I had added a block device for paging using xl block-attach corresponding to the edited configuration. When the VM was live-migrated to another node in the cluster, the added disk was not transferred to the VM, so the VM froze when trying to access that disk. So obviously the configuration of the current machine was not used for live-migration, and the saved configuration in the XML files wasn't used either.
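For what it's worth, libvirt only reads those files when a domain is (re)defined, so the usual pattern is to treat virsh dumpxml output as the transportable copy and re-define it on the peer instead of syncing /etc/libvirt/libxl/ directly. A hedged sketch (the domain name myvm is a placeholder):

virsh dumpxml myvm > /tmp/myvm.xml    # export the current definition
# copy /tmp/myvm.xml to the other node (csync2, scp, ...)
virsh define /tmp/myvm.xml            # make libvirt pick up the updated definition there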
U. Windl (1715 rep)
Feb 17, 2021, 12:08 PM • Last activity: Sep 8, 2022, 09:52 AM
0 votes
1 answer
200 views
Convert puppet manifest config to hiera
I installed a Corosync/Pacemaker cluster via Puppet. Now I would like to keep my data in a Hiera file. How should I convert the cs_primitive section into a YAML file?

cs_primitive { 'nfsshare_fs':
  primitive_class => 'ocf',
  primitive_type  => 'Filesystem',
  provided_by     => 'heartbeat',
  parameters      => {
    'device'    => '/dev/disk/lvname',
    'directory' => '/share',
    'fstype'    => 'ext4'
  },
}->

I tried the code below, but it didn't work.

corosync::cs_primitive:
  'nfsshare_fs':
    primitive_class: 'ocf'
    primitive_type: 'Filesystem'
    provided_by: 'heartbeat'
    parameters:
      device: '/dev/disk/by-id/lvname'
      directory: '/share'
      fstype: 'ext4'

Thanks.
fortunate1357 (1 rep)
Apr 4, 2022, 06:21 PM • Last activity: Jul 14, 2022, 07:27 PM
1 votes
1 answer
1468 views
Debian 10 Pacemaker-Cluster: GFS2 Mount fails because of "Global lock failed: check that global lockspace is started."
I'm trying to setup a new Debian 10 cluster with three instances. My stack is based on pacemaker, corosync, dlm, and lvmlockd with a GFS2 volume. All servers have access to the GFS2 volume but I can't mount it with pacemaker or manually when using the GFS2 filesystem. I configured corosync and all three instances are online. I continued with dlm and lvm configuration. Here my configuration steps for LVM and pacemaker: LVM: sudo nano /etc/lvm/lvm.conf --> Set locking_type = 1 and use_lvmlockd = 1 Pacemaker Resources: sudo pcs -f stonith_cfg stonith create meatware meatware hostlist="firmwaredroid-swarm-1 firmwaredroid-swarm-2 firmwaredroid-swarm-3" op monitor interval=60s sudo pcs resource create dlm ocf:pacemaker:controld \ op start timeout=90s interval=0 \ op stop timeout=100s interval=0 sudo pcs resource create lvmlockd ocf:heartbeat:lvmlockd \ op start timeout=90s interval=0 \ op stop timeout=100s interval=0 sudo pcs resource group add base-group dlm lvmlockd sudo pcs resource clone base-group \ meta interleave=true ordered=true target-role=Started The pcs status shows that all resources are up and online. After the pacemaker configuration I tried to setup a shared Volume Group to add the Filesystem resource to pacemaker but all the commands fail with Global lock failed: check that global lockspace is started. sudo pvcreate /dev/vdb --> Global lock failed: check that global lockspace is started sudo vgcreate vgGFS2 /dev/vdb —shared --> Global lock failed: check that global lockspace is started I then tried to directly format the /dev/vdb with mkfs.gfs2 which works but seems to me a step in the wrong direction, because mounting the volume then always fails: sudo mkfs.gfs2 -p lock_dlm -t firmwaredroidcluster:gfsvolfs -j 3 /dev/gfs2share/lvGfs2Share sudo mount -v -t "gfs2" /dev/vdb ./swarm_file_mount/ mount: /home/debian/swarm_file_mount: mount(2) system call failed: Transport endpoint is not connected. I tried several configurations like starting lvmlockd -g dlm or debugging dlm with dlm_controld -d but I don't find any infos on how to do it. On the web I found some RedHat forums that discuss similar errors but do not provide any solutions due to a paywall. How can I start or initialise the global lock with dlm so that I can mount the GFS2 correctly on the pacemaker Debian cluster? Or in other words what's wrong with my dlm configuration? Thx for any help!
Me7e0r (11 rep)
Jun 23, 2021, 10:53 AM • Last activity: Jul 12, 2021, 03:37 PM
0 votes
0 answers
459 views
HA-Cluster / corosync / pacemaker: Active-Active cluster with service ip / service ip is not switching
How to configure crm to migrate the ServiceIP if one Service is failed? node 1: web01a \ attributes standby=off node 2: web01b \ attributes standby=off primitive Apache2 systemd:apache2 \ operations $id=Apache2-operations \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta primitive PHP-FPM systemd:php7.4-fpm \ operations $id=PHP-FPM-operations \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta primitive Redis systemd:redis-server \ operations $id=Redis-operations \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta primitive ServiceIP IPaddr2 \ params ip=1.2.3.4 \ operations $id=ServiceIP-operations \ op monitor interval=10 timeout=20 start-delay=0 \ op_params migration-threshold=1 \ meta primitive lsyncd systemd:lsyncd \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta target-role=Started group ActiveNode ServiceIP lsyncd group WebServer Apache2 PHP-FPM Redis clone cl_WS WebServer \ meta clone-max=2 notify=true interleave=true colocation col_cl_WS_ActiveNode 100: cl_WS ActiveNode property cib-bootstrap-options: \ have-watchdog=false \ dc-version=2.0.3-4b1f869f0f \ cluster-infrastructure=corosync \ cluster-name=debian \ stonith-enabled=false \ no-quorum-policy=ignore \ startup-fencing=false \ maintenance-mode=false \ last-lrm-refresh=1622628525 \ start-failure-is-fatal=true These services should always be started - Apache2 - PHP-FPM - Redis If one of these services is not running, the node is unhelthy. The **ServiceIP** and **lsyncd** should switch to an healthy node. When I killed the apache2 process, the IP is not switched.
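Reading the configuration above: the only link between the service clones and the floating pieces is the score-100 colocation of cl_WS with ActiveNode, so a failed Apache2 never forces ServiceIP away. A hedged sketch of reversing that dependency, binding the ActiveNode group to a healthy clone instance with a mandatory score (crm shell syntax as used in the question; combine it with a migration-threshold on the cloned services so a repeatedly failing service is banned from the node):

crm configure delete col_cl_WS_ActiveNode
crm configure colocation col_ActiveNode_with_WS inf: ActiveNode cl_WS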
FaxMax (726 rep)
Jun 2, 2021, 12:29 PM
1 votes
0 answers
143 views
Stop a pacemaker node when local shell script returns an error?
Is it possible to make Pacemaker stop a node when a local test script fails, and start the node again when the local test script returns true? This seems like a very simple problem, but as I can't find ANY way to do this within Pacemaker, I'm about to run the following shell script on all my nodes:

while true; do
    pcs status 2>/dev/null >/dev/null && node_running=true
    /is_node_healthy.sh && node_healthy=true
    [[ -v node_running ]] && ! [[ -v node_healthy ]] && pcs cluster stop
    [[ -v node_healthy ]] && ! [[ -v node_running ]] && pcs cluster start
    unset node_running node_healthy
    sleep 10
done

This does exactly what I want, but it looks like a very dirty hack in my eyes. Is there a more elegant way to get the same thing done by Pacemaker itself?

BTW: The overall task I want to solve seems quite simple: create an HA cluster that has a public IP address assigned to a vital host, where vitality can be checked with /is_node_healthy.sh
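Pacemaker has a built-in way to express "drain this node when a local check fails": node health attributes. Any node attribute whose name starts with #health contributes to node health, and the node-health-strategy property decides what happens when it goes red. A hedged sketch wiring the question's /is_node_healthy.sh into that mechanism (the attribute name is arbitrary):

pcs property set node-health-strategy=only-green   # move everything off a node that is not green

# run periodically on each node (cron or a systemd timer)
if /is_node_healthy.sh; then
    attrd_updater -n '#health-local' -U green
else
    attrd_updater -n '#health-local' -U red
fi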
psicolor (11 rep)
Feb 22, 2021, 11:54 AM
1 votes
1 answer
393 views
fence_virtualbox failed to reboot
I'm learning how to fence Pacemaker using fence_virtualbox from the [ClusterLabs] "Fence agent for VirtualBox" mailing-list post, but I can't get it working. When I try to run stonith_admin --reboot it fails. Currently, my setup is:

Node ID:     VM name:
orcllinux1   OL7
orcllinux2   OL7_2

I set it up using:

pcs stonith create fence_vbox fence_virtualbox pcmk_host_map="orcllinux1:OL7,orcllinux2:OL7_2" pcmk_host_list="orcllinux1,orcllinux2" pcmk_host_check=static_list ipaddr="192.168.57.1" login="root"

But stonith_admin --reboot results in this error:

[screenshot: stonith_admin error output]

I tried to use fence_virtualbox manually using:

fence_virtualbox -s 192.168.57.1 -p OL7 -o=reboot

and it succeeded. Is my stonith create syntax wrong? What's the right syntax if it's wrong?
Christophorus Reyhan (33 rep)
Jan 8, 2021, 11:16 AM • Last activity: Feb 16, 2021, 03:51 AM
2 votes
1 answer
5355 views
Pacemaker - Corosync - HA - Simple Custom Resource Testing - Status flapping - Started - Failed - Stopped - Started
I am testing with the OCF:Heartbeat:Dummy script; I want a very basic setup just to know it works and then build on that. The only information I could find was this blog post: https://raymii.org/s/tutorials/Corosync_Pacemaker_-_Execute_a_script_on_failover.html It has some typos but basically worked for me. The script currently just contains the following:

sudo nano /usr/local/bin/failover.sh && sudo chmod +x /usr/local/bin/failover.sh

#!/bin/sh
touch /tmp/testfailover.sh

Here is my setup:

cp /usr/lib/ocf/resource.d/heartbeat/Dummy /usr/lib/ocf/resource.d/heartbeat/FailOverScript
sudo nano /usr/lib/ocf/resource.d/heartbeat/FailOverScript

dummy_start() {
    dummy_monitor
    /usr/local/bin/failover.sh
    if [ $? = $OCF_SUCCESS ]; then
        return $OCF_SUCCESS
    fi
    touch ${OCF_RESKEY_state}
}

sed -i 's/Dummy/FailOverScript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript
sed -i 's/dummy/FailOverScript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript
pcs resource create FailOverScript ocf:heartbeat:FailOverScript op monitor interval="30"

The only testing I can really do:

[root@node2 ~]# /usr/lib/ocf/resource.d/heartbeat/FailOverScript start ; echo $?
DEBUG: default start : 0
0

ocf-tester doesn't seem to exist in the latest HA software suite, and I'm not really sure how to install it manually, but the script "half works". **The script doesn't need monitoring; it's supposed to be very basic, but it seems to be flapping and giving me the following error. Any ideas what to do?**

FailOverScript (ocf::heartbeat:FailOverScript): Started node2

Failed Actions:
* FailOverScript_monitor_30000 on node2 'not running' (7): call=24423, status=complete, exitreason='none', last-rc-change='Tue Aug 16 15:53:50 2016', queued=0ms, exec=9ms

**Example of what I want to do:**

- Cluster starts: script runs "start.sh"
- Cluster fails over to node2: on node1 the script runs "fail.sh", on node2 it runs "start.sh"
- and vice versa if it fails in the other direction.

Note: The script does work; I get /tmp/testfailover.sh. I even tried putting another script under dummy_stop to remove the file and that worked, but it just keeps flapping along, removing/adding the file and starting/failing/stopping/starting etc. Thanks for reading!
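On many distributions ocf-tester is still packaged in resource-agents (typically /usr/sbin/ocf-tester), and driving the agent by hand with the OCF environment set is the other common way to see exactly what the cluster sees. A hedged sketch against the agent from the question:

ocf-tester -n FailOverScript /usr/lib/ocf/resource.d/heartbeat/FailOverScript

# or run a single action the way the cluster would
OCF_ROOT=/usr/lib/ocf OCF_RESOURCE_INSTANCE=FailOverScript \
    /usr/lib/ocf/resource.d/heartbeat/FailOverScript monitor; echo $?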
FreeSoftwareServers (2682 rep)
Aug 16, 2016, 07:56 PM • Last activity: Dec 21, 2020, 06:56 AM
0 votes
1 answer
1741 views
Pacemaker Apache resource fails with "Failed to access httpd status page" after change to HTTPS
I get this error from pacemaker after i change apache from http to https. now my ocf::heartbeat:apache resource is not find status page. I generate SSL certificate separately for 3 servers. Everything was working fine when running on http but as soon as I added the (self-signed) SSL certificate pacemaker Apache (ocf::heartbeat:apache): Stopped And error shows Failed Actions: * Apache_start_0 on server3 'unknown error' (1): call=315, status=complete, exitreason='Failed to access httpd status page.', last-rc-change='Mon Sep 21 16:22:37 2020', queued=0ms, exec=3456ms * Apache_start_0 on server1 'unknown error' (1): call=59, status=complete, exitreason='Failed to access httpd status page.', last-rc-change='Mon Sep 21 16:22:41 2020', queued=0ms, exec=3421ms * Apache_start_0 on server2 'unknown error' (1): call=197, status=complete, exitreason='Failed to access httpd status page.', last-rc-change='Mon Sep 21 16:22:33 2020', queued=0ms, exec=3451ms /etc/apache2/sites-available/000-default.conf ServerAdmin webmaster@localhost DocumentRoot /var/www/html Redirect "/" "https://10.226.***.***/ " SetHandler server-status ServerAdmin webmaster@localhost DocumentRoot /var/www/html Redirect "/" "https://10.226.179.205/ " Order deny,allow Deny from all Allow from 127.0.0.1 *pcs resource debug-monitor --full Apache* Operation monitor for Apache (ocf:heartbeat:apache) returned 1 > stderr: + echo > stderr: + printenv > stderr: + sort > stderr: + env= > stderr: AONIX_LM_DIR=/home/TeleUSE/etc > stderr: BXwidgets=/home/BXwidgets > stderr: HA_logfacility=none > stderr: HOME=/root > stderr: LC_ALL=C > stderr: LOGNAME=root > stderr: MAIL=/var/mail/root > stderr: OCF_EXIT_REASON_PREFIX=ocf-exit-reason: > stderr: OCF_RA_VERSION_MAJOR=1 > stderr: OCF_RA_VERSION_MINOR=0 > stderr: OCF_RESKEY_CRM_meta_class=ocf > stderr: OCF_RESKEY_CRM_meta_id=Apache > stderr: OCF_RESKEY_CRM_meta_migration_threshold=5 > stderr: OCF_RESKEY_CRM_meta_provider=heartbeat > stderr: OCF_RESKEY_CRM_meta_resource_stickiness=10 > stderr: OCF_RESKEY_CRM_meta_type=apache > stderr: OCF_RESKEY_configfile=/etc/apache2/apache2.conf > stderr: OCF_RESKEY_statusurl=http://localhost/server-status > stderr: OCF_RESOURCE_INSTANCE=Apache > stderr: OCF_RESOURCE_PROVIDER=heartbeat > stderr: OCF_RESOURCE_TYPE=apache > stderr: OCF_ROOT=/usr/lib/ocf > stderr: OCF_TRACE_RA=1 > stderr: PATH=/root/.rbenv/shims:/root/.rbenv/bin:/root/.rbenv/shims:/root/.rbenv/bin:/usr/local/bin:/home/TeleUSE/bin:/home/xrt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb > stderr: PCMK_logfacility=none > stderr: PCMK_service=crm_resource > stderr: PWD=/root > stderr: RBENV_SHELL=bash > stderr: SHELL=/bin/bash > stderr: SHLVL=1 > stderr: SSH_CLIENT=10.12.116.46 63097 22 > stderr: SSH_CONNECTION=10.12.116.46 63097 10.226.179.205 22 > stderr: SSH_TTY=/dev/pts/0 > stderr: TERM=xterm > stderr: TeleUSE=/home/TeleUSE > stderr: USER=root > stderr: _=/usr/sbin/pcs > stderr: __OCF_TRC_DEST= > stderr: __OCF_TRC_MANAGE= > stderr: + ocf_is_true > stderr: + false > stderr: + . /usr/lib/ocf/lib/heartbeat/apache-conf.sh > stderr: + . 
/usr/lib/ocf/lib/heartbeat/http-mon.sh > stderr: + bind_address=127.0.0.1 > stderr: + curl_ipv6_opts= > stderr: + ocf_is_true > stderr: + false > stderr: + echo > stderr: + grep -qs :: > stderr: + WGETOPTS=-O- -q -L --no-proxy --bind-address=127.0.0.1 > stderr: + CURLOPTS=-o - -Ss -L --interface lo > stderr: + HA_VARRUNDIR=/var/run > stderr: + IBMHTTPD=/opt/IBMHTTPServer/bin/httpd > stderr: + HTTPDLIST=/sbin/httpd2 /usr/sbin/httpd2 /usr/sbin/apache2 /sbin/httpd /usr/sbin/httpd /usr/sbin/apache /opt/IBMHTTPServer/bin/httpd > stderr: + MPM=/usr/share/apache2/find_mpm > stderr: + [ -x /usr/share/apache2/find_mpm ] > stderr: + LOCALHOST=http://localhost > stderr: + HTTPDOPTS=-DSTATUS > stderr: + DEFAULT_IBMCONFIG=/opt/IBMHTTPServer/conf/httpd.conf > stderr: + DEFAULT_SUSECONFIG=/etc/apache2/httpd.conf > stderr: + DEFAULT_RHELCONFIG=/etc/httpd/conf/httpd.conf > stderr: + DEFAULT_DEBIANCONFIG=/etc/apache2/apache2.conf > stderr: + basename /usr/lib/ocf/resource.d/heartbeat/apache > stderr: + CMD=apache > stderr: + OCF_REQUIRED_PARAMS= > stderr: + OCF_REQUIRED_BINARIES= > stderr: + ocf_rarun monitor > stderr: + mk_action_func > stderr: + echo apache_monitor > stderr: + tr - _ > stderr: + ACTION_FUNC=apache_monitor > stderr: + validate_args > stderr: + is_function apache_monitor > stderr: + command -v apache_monitor > stderr: + test zapache_monitor = zapache_monitor > stderr: + simple_actions > stderr: + check_required_params > stderr: + local v > stderr: + run_function apache_getconfig > stderr: + is_function apache_getconfig > stderr: + command -v apache_getconfig > stderr: + test zapache_getconfig = zapache_getconfig > stderr: + apache_getconfig > stderr: + HTTPD= > stderr: + PORT= > stderr: + STATUSURL=http://localhost/server-status > stderr: + CONFIGFILE=/etc/apache2/apache2.conf > stderr: + OPTIONS= > stderr: + CLIENT= > stderr: + TESTREGEX= > stderr: + TESTURL= > stderr: + TESTREGEX10= > stderr: + TESTCONFFILE= > stderr: + TESTNAME= > stderr: + : /etc/apache2/envvars > stderr: + source_envfiles /etc/apache2/envvars > stderr: + [ -f /etc/apache2/envvars -a -r /etc/apache2/envvars ] > stderr: + . /etc/apache2/envvars > stderr: + unset HOME > stderr: + [ != ] > stderr: + SUFFIX= > stderr: + export APACHE_RUN_USER=www-data > stderr: + export APACHE_RUN_GROUP=www-data > stderr: + export APACHE_PID_FILE=/var/run/apache2/apache2.pid > stderr: + export APACHE_RUN_DIR=/var/run/apache2 > stderr: + export APACHE_LOCK_DIR=/var/lock/apache2 > stderr: + export APACHE_LOG_DIR=/var/log/apache2 > stderr: + export LANG=C > stderr: + export LANG > stderr: + [ X = X -o ! -f -o ! -x ] > stderr: + find_httpd_prog > stderr: + HTTPD= > stderr: + [ -f /sbin/httpd2 -a -x /sbin/httpd2 ] > stderr: + [ -f /usr/sbin/httpd2 -a -x /usr/sbin/httpd2 ] > stderr: + [ -f /usr/sbin/apache2 -a -x /usr/sbin/apache2 ] > stderr: + HTTPD=/usr/sbin/apache2 > stderr: + break > stderr: + [ X != X -a X/usr/sbin/apache2 != X ] > stderr: + detect_default_config > stderr: + [ -f /etc/apache2/httpd.conf ] > stderr: + [ -f /etc/apache2/apache2.conf ] > stderr: + echo /etc/apache2/apache2.conf > stderr: + DefaultConfig=/etc/apache2/apache2.conf > stderr: + CONFIGFILE=/etc/apache2/apache2.conf > stderr: + [ -n /usr/sbin/apache2 ] > stderr: + basename /usr/sbin/apache2 > stderr: + httpd_basename=apache2 > stderr: + GetParams /etc/apache2/apache2.conf > stderr: + ConfigFile=/etc/apache2/apache2.conf > stderr: + [ ! 
-f /etc/apache2/apache2.conf ] > stderr: + get_apache_params /etc/apache2/apache2.conf ServerRoot PidFile Port Listen > stderr: + configfile=/etc/apache2/apache2.conf > stderr: + shift 1 > stderr: + echo ServerRoot PidFile Port Listen > stderr: + sed s/ /,/g > stderr: + vars=ServerRoot,PidFile,Port,Listen > stderr: + apachecat /etc/apache2/apache2.conf > stderr: + awk -v vars=ServerRoot,PidFile,Port,Listen > stderr: BEGIN{ > stderr: split(vars,v,","); > stderr: for( i in v ) > stderr: vl[i]=tolower(v[i]); > stderr: } > stderr: { > stderr: for( i in v ) > stderr: if( tolower($1)==vl[i] ) { > stderr: print v[i]"="$2 > stderr: delete vl[i] > stderr: break > stderr: } > stderr: } > stderr: > stderr: + awk > stderr: function procline() { > stderr: split($0,a); > stderr: if( a~/^[Ii]nclude$/ ) { > stderr: includedir=a; > stderr: gsub("\"","",includedir); > stderr: procinclude(includedir); > stderr: } else { > stderr: if( a=="ServerRoot" ) { > stderr: rootdir=a; > stderr: gsub("\"","",rootdir); > stderr: } > stderr: print; > stderr: } > stderr: } > stderr: function printfile(infile, a) { > stderr: while( (getline 0 ) { > stderr: procline(); > stderr: } > stderr: close(infile); > stderr: } > stderr: function allfiles(dir, cmd,f) { > stderr: cmd="find -L "dir" -type f"; > stderr: while( ( cmd | getline f ) > 0 ) { > stderr: printfile(f); > stderr: } > stderr: close(cmd); > stderr: } > stderr: function listfiles(pattern, cmd,f) { > stderr: cmd="ls "pattern" 2>/dev/null"; > stderr: while( ( cmd | getline f ) > 0 ) { > stderr: printfile(f); > stderr: } > stderr: close(cmd); > stderr: } > stderr: function procinclude(spec) { > stderr: if( rootdir!="" && spec!~/^\// ) { > stderr: spec=rootdir"/"spec; > stderr: } > stderr: if( isdir(spec) ) { > stderr: allfiles(spec); # read all files in a directory (and subdirs) > stderr: } else { > stderr: listfiles(spec); # there could be jokers > stderr: } > stderr: } > stderr: function isdir(s) { > stderr: return !system("test -d \""s"\""); > stderr: } > stderr: { procline(); } > stderr: /etc/apache2/apache2.conf > stderr: + sed s/#.*//;s/[[:blank:]]*$//;s/^[[:blank:]]*// > stderr: + grep -v ^$ > stderr: + eval PidFile=${APACHE_PID_FILE} > stderr: + PidFile=/var/run/apache2/apache2.pid > stderr: + CheckPort > stderr: + ocf_is_decimal > stderr: + false > stderr: + CheckPort > stderr: + ocfError performing operation: Operation not permitted _is_decimal > stderr: + false > stderr: + CheckPort 80 > stderr: + ocf_is_decimal 80 > stderr: + true > stderr: + [ 80 -gt 0 ] > stderr: + PORT=80 > stderr: + break > stderr: + echo > stderr: + grep : > stderr: + Listen=localhost: > stderr: + [ Xhttp://localhost/server-status = X ] > stderr: + test /var/run/apache2/apache2.pid > stderr: + return 0 > stderr: + validate_env > stderr: + check_required_binaries > stderr: + local v > stderr: + is_function apache_validate_all > stderr: + command -v apache_validate_all > stderr: + test zapache_validate_all = zapache_validate_all > stderr: + local rc > stderr: + LSB_STATUS_STOPPED=3 > stderr: + apache_validate_all > stderr: + [ -z /usr/sbin/apache2 ] > stderr: + [ ! -x /usr/sbin/apache2 ] > stderr: + [ ! 
-f /etc/apache2/apache2.conf ] > stderr: + [ -n ] > stderr: + [ -n ] > stderr: + dirname /var/run/apache2/apache2.pid > stderr: + local a > stderr: + local b > stderr: + [ 1 = 1 ] > stderr: + a=/var/run/apache2/apache2.pid > stderr: + [ 1 ] > stderr: + b=/var/run/apache2/apache2.pid > stderr: + [ /var/run/apache2/apache2.pid = /var/run/apache2/apache2.pid ] > stderr: + break > stderr: + b=/var/run/apache2 > stderr: + [ -z /var/run/apache2 -o /var/run/apache2/apache2.pid = /var/run/apache2 ] > stderr: + echo /var/run/apache2 > stderr: + return 0 > stderr: + ocf_mkstatedir root 755 /var/run/apache2 > stderr: + local owner > stderr: + local perms > stderr: + local path > stderr: + owner=root > stderr: + perms=755 > stderr: + path=/var/run/apache2 > stderr: + test -d /var/run/apache2 > stderr: + return 0 > stderr: + return 0 > stderr: + rc=0 > stderr: + [ 0 -ne 0 ] > stderr: + ocf_is_probe > stderr: + [ monitor = monitor -a 0 = 0 ] > stderr: + run_probe > stderr: + is_function apache_probe > stderr: + command -v apache_probe > stderr: + test z = zapache_probe > stderr: + shift 1 > stderr: + apache_monitor > stderr: + silent_status > stderr: + local pid > stderr: + get_pid > stderr: + [ -f /var/run/apache2/apache2.pid ] > stderr: + cat /var/run/apache2/apache2.pid > stderr: + pid=17552 > stderr: + [ -n 17552 ] > stderr: + ProcessRunning 17552 > stderr: + local pid=17552 > stderr: + [ -d /proc -a -d /proc/1 ] > stderr: + [ -d /proc/17552 ] > stderr: + [ 0 -ne 0 ] > stderr: + findhttpclient > stderr: + [ x != x ] > stderr: + which wget > stderr: + echo wget > stderr: + ourhttpclient=wget > stderr: + [ -z wget ] > stderr: + ocf_check_level 10 > stderr: + local lvl prev > stderr: + lvl=0 > stderr: + prev=0 > stderr: + ocf_is_decimal 0 > stderr: + true > stderr: + [ 10 -eq 0 ] > stderr: + [ 10 -gt 0 ] > stderr: + lvl=0 > stderr: + break > stderr: + echo 0 > stderr: + apache_monitor_basic > stderr: + wget_func http://localhost/server-status > stderr: + auth= > stderr: + cl_opts=-O- -q -L --no-proxy --bind-address=127.0.0.1 > stderr: + [ x !=+ x ] > stderr: grep+ wget -Ei -O- -q > stderr: -L --no-proxy --bind-address=127.0.0.1 http://localhost/server-status > stderr: + attempt_index_monitor_request > stderr: + local indexpage= > stderr: + [ -n ] > stderr: + [ -n ] > stderr: + [ -n ] > stderr: + [ -n http://localhost/server-status ] > stderr: + return 1 > stderr: + [ 1 -eq 0 ] > stderr: + ocf_is_probe > stderr: + [ monitor = monitor -a 0 = 0 ] > stderr: + return 1 **pcs config** Resource: MasterVip (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=10.226.***.*** nic=lo cidr_netmask=32 iflabel=pgrepvip Meta Attrs: target-role=Started Operations: start interval=0s timeout=20s (MasterVip-start-interval-0s) stop interval=0s timeout=20s (MasterVip-stop-interval-0s) monitor interval=90s (MasterVip-monitor-interval-90s) Resource: Apache (class=ocf provider=heartbeat type=apache) Attributes: configfile=/etc/apache2/apache2.conf statusurl=http://localhost/server-status Operations: start interval=0s timeout=40s (Apache-start-interval-0s) stop interval=0s timeout=60s (Apache-stop-interval-0s) monitor interval=1min (Apache-monitor-interval-1min) I don't know how to fix this. if anyone knows please help me.
Karippery (1 rep)
Sep 21, 2020, 03:04 PM • Last activity: Sep 22, 2020, 11:36 AM