
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
2 answers
2793 views
How do I mount a disk on /var/log directory even if I have process writing on it?
I would like to mount a disk on /var/log. The thing is, there are some processes/services writing into it, such as openvpn or the system logs. Is there a way to mount the filesystem without having to restart the machine or stop the services? Many thanks
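A minimal sketch of the usual approach, assuming the new disk is /dev/sdb1 and that rsyslog and openvpn are the writers that need to reopen their logs (device, mountpoint and service names are assumptions, not taken from the question):

    # see which processes currently hold files open under /var/log
    lsof +D /var/log            # or: fuser -vm /var/log

    # stage the new disk elsewhere and copy the existing logs across
    mount /dev/sdb1 /mnt/newlog
    rsync -aX /var/log/ /mnt/newlog/

    # swap the mount in; writers keep appending to the old, now hidden,
    # files through their open descriptors until told to reopen them
    umount /mnt/newlog
    mount /dev/sdb1 /var/log
    systemctl kill -s HUP rsyslog.service
    killall -HUP openvpn

Without the final reopen step the services keep writing to the old files on the root filesystem, so no restart is needed, but a log-reopen (HUP or a logrotate run) is.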
LinuxEnthusiast (1 rep)
Aug 10, 2020, 10:10 AM • Last activity: Aug 1, 2025, 11:02 PM
0 votes
1 answer
2844 views
Apache resource failed to start in Pacemaker
I am using Pacemaker with Corosync to set up a basic Apache HA cluster with 3 nodes running CentOS7. For some reasons, I cannot get the apache resource started in pcs. Cluster IP: 192.168.200.40 # pcs resource show ClusterIP Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2) Attributes: cidr_netmask=24 ip=192.168.200.40 Operations: monitor interval=20s (ClusterIP-monitor-interval-20s) start interval=0s timeout=20s (ClusterIP-start-interval-0s) stop interval=0s timeout=20s (ClusterIP-stop-interval-0s) # pcs resource show WebServer Resource: WebServer (class=ocf provider=heartbeat type=apache) Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status Operations: monitor interval=1min (WebServer-monitor-interval-1min) start interval=0s timeout=40s (WebServer-start-interval-0s) stop interval=0s timeout=60s (WebServer-stop-interval-0s) # pcs status Cluster name: WARNING: corosync and pacemaker node names do not match (IPs used in setup?) Stack: corosync Current DC: server3.example.com (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum Last updated: Thu Jun 7 21:59:09 2018 Last change: Thu Jun 7 21:45:23 2018 by root via cibadmin on server1.example.com 3 nodes configured 2 resources configured Online: [ server1.example.com server2.example.com server3.example.com ] Full list of resources: ClusterIP (ocf::heartbeat:IPaddr2): Started server2.example.com WebServer (ocf::heartbeat:apache): Stopped Failed Actions: * WebServer_start_0 on server3.example.com 'unknown error' (1): call=49, status=Timed Out, exitreason='', last-rc-change='Thu Jun 7 21:46:03 2018', queued=0ms, exec=40002ms * WebServer_start_0 on server1.example.com 'unknown error' (1): call=53, status=Timed Out, exitreason='', last-rc-change='Thu Jun 7 21:45:23 2018', queued=0ms, exec=40003ms * WebServer_start_0 on server2.example.com 'unknown error' (1): call=47, status=Timed Out, exitreason='', last-rc-change='Thu Jun 7 21:46:43 2018', queued=1ms, exec=40002ms Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled The httpd instance is **enabled** and **running** on all three nodes. The cluster IP and individual node IPs are able to access the web page. The ClusterIP resource also works well for failover. What may go wrong for the apache resource in this case? Thank you very much! Update: Here is more information from the debug output. It seems the Apache is unable to bind to the port, but there is no error from the apache log, and systemctl status httpd gave all green on all nodes. I can open web pages via the cluster IP and each every node IP. The ClusterIP resource failover works fine, too. Any idea on why Apache resource doesn't work with pacemaker? # pcs resource debug-start WebServer --full Operation start for WebServer (ocf:heartbeat:apache) failed: 'Timed Out' (2) > stderr: ERROR: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80 (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80 no listening sockets available, shutting down AH00015: Unable to open logs > stderr: INFO: apache not running > stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up > stderr: INFO: apache not running > stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up > stderr: INFO: apache not running > stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up > stderr: INFO: apache not running
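The debug output ("Address already in use" on port 80) suggests httpd is already running under systemd on each node, so the OCF agent cannot start its own instance. A hedged sketch of the usual remedy, run on every node:

    # let Pacemaker, not systemd, own httpd
    systemctl disable --now httpd
    # clear the recorded start failures so the cluster tries again
    pcs resource cleanup WebServer

The ocf:heartbeat:apache agent expects to start and stop httpd itself; an enabled systemd unit already holding port 80 makes every cluster-initiated start time out exactly as shown.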
cody (67 rep)
Jun 8, 2018, 04:16 PM • Last activity: Jul 15, 2025, 02:03 AM
1 vote
1 answer
2618 views
Pacemaker Virtual IP cannot be routed outside of its network
I have a server cluster consisted of following setup: 2 Virtual Servers with 2 NIC's. eth0 (private network 10.0.0.0/16) and eth1 (public network 77.1.2.0/24 with gateway as 77.1.2.1) For HA-01 VPS i have Private IP on eth0 set as 10.0.0.1 For HA-02 VPS i have Private IP set on eth0 as 10.0.0.2 Pacemaker/Corosync Cluster has been established between private IP addresses and Virtual IP (77.1.2.4) defined as clone Resource (IPAddr2) so it can float between two nodes. pcs resource create VirtualIP1 ocf:heartbeat:IPaddr2 ip="77.1.2.4" cidr_netmask="24" nic="eth1" clusterip_hash="sourceip-sourceport" op start interval="0s" timeout="60s" op monitor interval="1s" timeout="20s" op stop interval="0s" timeout="60s" clone interleave=true ordered=true Problem is, i cannot reach that IP address from world. I noticed that there is a route missing, so i add the static route ip r add default via 77.1.2.1 dev eth1 But i still cannot ping google.com from those servers nor world can see them on that IP. I also tried adding IP addresses from same subnet on eth1 like this: HA-01 eth1: 77.1.2.2 HA-02 eth1: 77.1.2.3 Servers can be seen on those IPs by world but if i add VirtualIP resource i cannot reach them on Virtual IP address. I also tried adding a source ip in routing table ip r add default via 77.1.2.1 src 77.1.2.4 to no avail. I don't know what am i supposed to do to get this VirtualIP working. I can reach 77.1.2.4 (Virtual IP Address) from other servers on that network, but not outside that network. Firewall is established and high availability ports are passed via command firewall-cmd --add-service="high availability"; firewall-cmd --add-service="high availability" --permanent Is there anything here that i am missing? If i add that address (77.1.2.4 - Virtual IP) alone on the interface of only one of those servers, it will work.... So is there an issue with ARP table perhaps or maybe router blocking some traffic?
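One way to narrow this down is to watch ARP for the VIP from another machine in 77.1.2.0/24 (ideally the gateway) and to compare against a plain, non-cloned IPaddr2 resource; interface and resource names below are only illustrative:

    # who answers ARP for the VIP, and does the MAC look sane?
    arping -c 3 -I eth1 77.1.2.4
    ip neigh show 77.1.2.4

    # for comparison, a plain (non-cloned) VIP that runs on one node only
    pcs resource create TestVIP ocf:heartbeat:IPaddr2 \
        ip=77.1.2.4 cidr_netmask=24 nic=eth1 op monitor interval=10s

A cloned IPaddr2 with clusterip_hash uses the iptables CLUSTERIP mechanism and answers ARP with a multicast MAC, which many upstream routers refuse to learn; that would match the symptom of the VIP being reachable inside the subnet but not from outside.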
Marko Todoric (437 rep)
Jul 19, 2019, 02:54 PM • Last activity: Apr 15, 2025, 03:08 AM
0 votes
2 answers
2897 views
RHEL High-Availability Cluster using pcs, configuring service as a resource
I have a 2 node cluster on RHEL 6.9. Everything is configured except I'm having difficulty with an application launched via shell script that created into a service (in /etc/init.d/myApplication), which I'll just call "myApp". From that application, I did a pcs resource create myApp lsb:myApp op monitor interval=30s op start on-fail=standby. I am new to using this suite of software but it's for work. What I need is for this application to be launched on both nodes simultaneously as it has to be started manually so if the first node fails, it would need intervention if it were not already active on the passive node. I have two other services: -VirtIP (ocf:heartbeat:IPaddr2) for providing a service IP for the application server -Cron (lsb:crond) to synchronize the application files (we are not using shared storage) I have the VirtIP and Cron as dependents via colocation to myApp. I've tried master/slave as well as cloning but I must be missing something regarding their config. If I take the application offline, pacemaker does not detect the service has gone down and pcs status outputs that myApp is still running on the node (or nodes depending on my config). I'm also sometimes getting the issue that the service running the app is stopped by pacemaker on the passive node. Which is the way I need to configure this? I've gone through the RHEL documentation but I'm still stuck. How do I get pacemaker to initiate failover if myApp service goes down? I don't know why it's not detecting the service has stopped in some cases. EDIT: So for testing purposes, I removed the password requirement for starting/restarting and the service starts/restarts fine as expected and the colocation dependent resources stop/start as expected. But stopping the myApp service does not reflect as a stopped resource but simply stays at Started node1. Likewise, simulating a failover via putting node1 into standby simply stops all resources on node1.
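Pacemaker's monitor for an lsb: resource simply calls the init script's status action, so failure detection only works if that action is LSB-compliant: exit 0 while running, exit 3 when stopped. A hedged skeleton of the relevant part (the pgrep pattern is an assumption):

    #!/bin/sh
    # /etc/init.d/myApp -- the status handling Pacemaker relies on
    case "$1" in
      status)
        if pgrep -f myApp >/dev/null 2>&1; then
          echo "myApp is running"
          exit 0      # LSB: running
        else
          echo "myApp is stopped"
          exit 3      # LSB: not running -> lets Pacemaker recover
        fi
        ;;
    esac

To run the application on both nodes at once, a cloned resource (pcs resource clone myApp) is the usual shape rather than master/slave.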
Greg (187 rep)
Sep 29, 2017, 07:52 AM • Last activity: Sep 6, 2023, 09:56 PM
-1 votes
1 answer
717 views
IBM AIX - Method to identify Cluster or HA services
I would like to find out whether existing IBM AIX servers in different locations have clustering/HA features installed. Kindly let me know the steps to check. Thanks.
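A few commands that usually reveal whether PowerHA (HACMP) or a CAA cluster is configured on an AIX system; exact fileset names vary by version:

    # PowerHA / HACMP filesets installed?
    lslpp -l | grep -i cluster
    # is the PowerHA cluster manager subsystem defined and active?
    lssrc -ls clstrmgrES
    # CAA cluster membership (AIX 6.1 TL6 and later)
    lscluster -m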
Nick eric adelee (49 rep)
Dec 5, 2022, 09:01 AM • Last activity: Dec 5, 2022, 06:36 PM
0 votes
0 answers
137 views
Options for high-availability, high-throughput bonding in Linux
When trying to configure high-availability (HA) bonding in Linux that should also use the available bandwidth, I wonder what the options are. The solution should ensure HA and optimal throughput (when all links are up) in a (simplified) scenario like this: [Example Scenario for HA-Bonding] So for example host **H1** has two interfaces **1** and **2**, also denoted as **H1.1** and **H1.2**. Starting with a standard configuration like active-backup with miimon link monitoring, there are these problems: only one interface is used at a time, and if **S1.3** fails, both **H1.1** and **H1.2** will still see a valid link, but **H1.1** can no longer reach **H2**. So the first step was to use arp_ip_target for link monitoring to detect a possible inter-switch link (ISL) failure. But the problem remains that only one of the two host interfaces can be used at a time. So I tried to use balance-tlb instead of active-backup; however, it seems balance-tlb does not support arp_ip_target for link monitoring. So I wonder: is there a solution that provides both high availability in case of any link failure *and* high bandwidth? Final note: conceptually **S1** and **S3** would be connected, too (just as **S2** and **S4**), but for illustration the example is simple enough, I hope. Also, I can configure the hosts, but not the switches.
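For reference, a minimal sketch of the arp_ip_target variant described above, expressed with iproute2 (interface names and target addresses are placeholders):

    ip link add bond0 type bond mode active-backup \
        arp_interval 1000 arp_ip_target 192.0.2.10,192.0.2.11 arp_validate all
    ip link set eth0 down && ip link set eth0 master bond0
    ip link set eth1 down && ip link set eth1 master bond0
    ip link set bond0 up

The modes that also spread load (balance-tlb/alb, 802.3ad) support only miimon, so they cannot see an inter-switch-link failure; getting both switch-spanning HA and aggregate bandwidth generally requires switch cooperation (802.3ad on an MLAG/stacked switch pair).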
U. Windl (1715 rep)
Nov 13, 2020, 08:04 AM • Last activity: Sep 8, 2022, 09:57 AM
0 votes
0 answers
1408 views
High CPU utilization in system (sys) time
We are running Two Node cluster using Redhat Pacemaker running on RHEL 7. Last thursday (3/2/2022) i updated kernel to latest version. And on Friday at 3:49 First node rebooted(Reason unknow) and then rejoined but at time resources were running on Node2. Today i noticed that are is high cpu utilization and top command shows %Cpu(s): 2.9 us, 89.8 sy, 0.2 ni, 7.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st I dont know what process is using 89.8% of cpu top command =========== top - 07:10:55 up 4 days, 14:17, 2 users, load average: 8.08, 8.13, 7.98 Tasks: 483 total, 8 running, 475 sleeping, 0 stopped, 0 zombie %Cpu(s): 2.7 us, 89.7 sy, 0.2 ni, 7.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 39464316+total, 1881036 free, 21074576+used, 18201638+buff/cache KiB Swap: 93749248 total, 93749248 free, 0 used. 18109798+avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 183327 oracle 20 0 195.5g 56260 42096 R 99.3 0.0 1133:15 oracle_183327_s 183552 oracle 20 0 195.5g 58704 42032 R 99.3 0.0 5626:11 oracle_183552_s 183443 oracle 20 0 195.5g 54488 40728 R 98.8 0.0 1626:54 oracle_183443_s 183554 oracle 20 0 195.5g 57076 41912 R 98.6 0.0 5304:10 oracle_183554_s 183354 oracle 20 0 195.5g 47248 39176 R 97.8 0.0 4847:28 oracle_183354_s 183556 oracle 20 0 195.5g 60040 43456 R 97.8 0.0 2486:30 oracle_183556_s 104734 oracle 20 0 195.5g 48516 39564 R 97.1 0.0 1583:06 oracle_104734_s 142910 root 20 0 162524 2704 1588 R 27.5 0.0 1:43.11 top 4612 root 39 19 13172 9268 480 S 3.8 0.0 255:12.73 apps.plugin 3918 netdata 39 19 251412 137752 2760 S 1.0 0.0 92:52.38 netdata 175736 root 20 0 755216 74556 13932 S 1.0 0.0 64:27.23 guard_stap 183545 oracle 20 0 195.5g 61516 42944 S 1.0 0.0 50:46.33 oracle_183545_s 165271 oracle -2 0 195.4g 18936 15872 S 0.7 0.0 44:31.74 ora_vktm_ssys 183352 oracle 20 0 195.5g 45884 38572 S 0.7 0.0 35:28.20 oracle_183352_s 183550 oracle 20 0 195.5g 52640 42520 S 0.7 0.0 47:01.94 oracle_183550_s 189069 oracle 20 0 195.5g 58344 41844 S 0.7 0.0 38:45.02 oracle_189069_s 3695 root 20 0 916256 131244 18368 S 0.5 0.0 42:22.64 ds_agent 3721 root rt 0 196440 98180 70968 S 0.5 0.0 42:39.31 corosync 69846 oracle 20 0 195.5g 49116 39316 S 0.5 0.0 10:22.26 oracle_69846_ss 183350 oracle 20 0 195.5g 45672 38332 S 0.5 0.0 36:46.71 oracle_183350_s 183356 oracle 20 0 195.5g 45992 38452 S 0.5 0.0 36:24.67 oracle_183356_s 183787 oracle 20 0 195.5g 45428 37976 S 0.5 0.0 2:10.28 oracle_183787_s 198328 oracle 20 0 195.5g 52616 42012 S 0.5 0.0 38:30.80 oracle_198328_s 1471 root 20 0 0 0 0 S 0.2 0.0 0:14.07 MpxPeriodicCall 3822 root 20 0 138468 9392 5696 S 0.2 0.0 4:15.32 stonithd 3962 swiagent 20 0 2342756 14444 6624 S 0.2 0.0 5:07.94 swiagent 4607 netdata 39 19 161488 21948 4312 S 0.2 0.0 16:25.79 python 22089 oracle 20 0 195.4g 26088 21472 S 0.2 0.0 0:01.55 ora_m006_ssys 114147 oracle 20 0 195.4g 41528 34804 S 0.2 0.0 0:00.43 oracle_114147_s 117437 oracle 20 0 195.5g 45332 38108 S 0.2 0.0 5:33.05 oracle_117437_s 135186 root 20 0 3706316 163948 31820 S 0.2 0.0 18:33.37 ds_am 148697 netdata 39 19 1648 1008 616 S 0.2 0.0 0:00.20 bash 152754 root 20 0 477760 4984 3960 S 0.2 0.0 0:00.01 SolarWinds.ADM. 
165327 oracle 20 0 195.5g 81356 51292 S 0.2 0.0 1:49.24 ora_mmon_ssys 183783 oracle 20 0 195.5g 44960 37616 S 0.2 0.0 2:12.17 oracle_183783_s 1 root 20 0 191832 4924 2660 S 0.0 0.0 44:03.90 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:00.13 kthreadd 4 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H 6 root 20 0 0 0 0 S 0.0 0.0 0:26.87 ksoftirqd/0 7 root rt 0 0 0 0 S 0.0 0.0 0:03.91 migration/0 8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh 9 root 20 0 0 0 0 S 0.0 0.0 10:48.66 rcu_sched 10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 lru-add-drain 11 root rt 0 0 0 0 S 0.0 0.0 0:00.62 watchdog/0 This CPU utilization is increasing since Friday 9AM and was gradually increasing SAR command (Friday) ==================== sudo sar -u ALL -f /var/log/sa/sa04 Linux 3.10.0-1160.53.1.el7.x86_64 (prod-db2-node2) 02/04/2022 _x86_64_ (8 CPU) 12:00:01 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle 12:10:01 AM all 3.54 0.31 2.99 0.04 0.00 0.00 0.02 0.00 0.00 93.10 12:20:01 AM all 3.56 0.31 3.00 0.03 0.00 0.00 0.02 0.00 0.00 93.08 12:30:01 AM all 3.55 0.31 3.04 0.03 0.00 0.00 0.02 0.00 0.00 93.04 12:40:01 AM all 3.62 0.31 3.06 0.03 0.00 0.00 0.02 0.00 0.00 92.96 12:50:01 AM all 3.53 0.31 3.34 0.04 0.00 0.00 0.02 0.00 0.00 92.76 01:00:01 AM all 3.74 0.31 3.08 0.04 0.00 0.00 0.02 0.00 0.00 92.81 01:10:01 AM all 3.88 0.31 3.07 0.08 0.00 0.00 0.03 0.00 0.00 92.64 01:20:01 AM all 3.54 0.31 3.02 0.03 0.00 0.00 0.02 0.00 0.00 93.08 01:30:01 AM all 3.56 0.31 3.04 0.03 0.00 0.00 0.03 0.00 0.00 93.03 01:40:01 AM all 3.55 0.30 3.03 0.03 0.00 0.00 0.02 0.00 0.00 93.07 01:50:01 AM all 3.61 0.31 3.04 0.03 0.00 0.00 0.02 0.00 0.00 92.99 02:00:01 AM all 3.55 0.31 3.05 0.03 0.00 0.00 0.02 0.00 0.00 93.03 02:10:01 AM all 3.60 0.31 3.04 0.04 0.00 0.00 0.02 0.00 0.00 92.99 02:20:01 AM all 3.52 0.31 3.01 0.03 0.00 0.00 0.02 0.00 0.00 93.10 02:30:01 AM all 3.75 0.31 3.05 0.03 0.00 0.00 0.02 0.00 0.00 92.83 02:40:01 AM all 3.52 0.31 3.01 0.03 0.00 0.00 0.02 0.00 0.00 93.11 02:50:01 AM all 3.57 0.31 3.02 0.03 0.00 0.00 0.02 0.00 0.00 93.04 03:00:01 AM all 3.55 0.30 3.02 0.03 0.00 0.00 0.02 0.00 0.00 93.08 03:10:01 AM all 3.59 0.31 3.03 0.04 0.00 0.00 0.02 0.00 0.00 93.00 03:20:01 AM all 3.58 0.31 3.04 0.04 0.00 0.00 0.02 0.00 0.00 93.02 03:30:01 AM all 3.51 0.31 2.99 0.03 0.00 0.00 0.02 0.00 0.00 93.13 03:40:01 AM all 3.57 0.31 3.02 0.03 0.00 0.00 0.02 0.00 0.00 93.05 03:50:01 AM all 3.55 0.34 3.10 0.20 0.00 0.00 0.02 0.00 0.00 92.79 04:00:01 AM all 3.71 0.31 3.04 0.03 0.00 0.00 0.02 0.00 0.00 92.89 04:10:01 AM all 3.54 0.31 3.01 0.03 0.00 0.00 0.02 0.00 0.00 93.08 04:20:01 AM all 3.53 0.31 3.02 0.03 0.00 0.00 0.02 0.00 0.00 93.08 04:30:01 AM all 3.51 0.31 3.01 0.03 0.00 0.00 0.02 0.00 0.00 93.12 04:40:01 AM all 3.57 0.31 3.03 0.03 0.00 0.00 0.03 0.00 0.00 93.03 04:50:01 AM all 3.45 0.31 3.19 0.03 0.00 0.00 0.03 0.00 0.00 93.00 05:00:01 AM all 3.57 0.31 3.05 0.03 0.00 0.00 0.02 0.00 0.00 93.01 05:10:02 AM all 3.56 0.31 3.07 0.03 0.00 0.00 0.02 0.00 0.00 93.00 05:20:01 AM all 3.54 0.31 3.09 0.03 0.00 0.00 0.02 0.00 0.00 93.01 05:30:01 AM all 3.72 0.31 3.08 0.03 0.00 0.00 0.02 0.00 0.00 92.83 05:40:01 AM all 3.54 0.31 3.05 0.03 0.00 0.00 0.02 0.00 0.00 93.05 05:50:01 AM all 3.53 0.31 3.03 0.03 0.00 0.00 0.02 0.00 0.00 93.08 06:00:01 AM all 3.53 0.31 3.03 0.03 0.00 0.00 0.03 0.00 0.00 93.08 06:10:01 AM all 3.61 0.31 3.06 0.03 0.00 0.00 0.02 0.00 0.00 92.97 06:20:01 AM all 3.50 0.31 3.01 0.03 0.00 0.00 0.02 0.00 0.00 93.13 06:20:01 AM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle 06:30:01 AM all 3.58 0.31 3.09 0.03 
0.00 0.00 0.03 0.00 0.00 92.97 06:40:01 AM all 3.56 0.31 3.06 0.03 0.00 0.00 0.03 0.00 0.00 93.01 06:50:01 AM all 3.56 0.31 3.07 0.03 0.00 0.00 0.03 0.00 0.00 93.00 07:00:02 AM all 3.70 0.31 3.09 0.03 0.00 0.00 0.03 0.00 0.00 92.85 07:10:01 AM all 3.61 0.31 3.09 0.03 0.00 0.00 0.03 0.00 0.00 92.93 07:20:02 AM all 3.50 0.31 3.07 0.03 0.00 0.00 0.02 0.00 0.00 93.07 07:30:01 AM all 3.59 0.30 3.08 0.03 0.00 0.00 0.03 0.00 0.00 92.97 07:40:01 AM all 3.58 0.31 3.09 0.03 0.00 0.00 0.03 0.00 0.00 92.96 07:50:01 AM all 3.54 0.31 3.06 0.03 0.00 0.00 0.02 0.00 0.00 93.04 08:00:01 AM all 3.55 0.31 3.26 0.03 0.00 0.00 0.02 0.00 0.00 92.82 08:10:01 AM all 3.57 0.31 3.07 0.03 0.00 0.00 0.02 0.00 0.00 93.00 08:20:01 AM all 3.55 0.31 3.08 0.03 0.00 0.00 0.02 0.00 0.00 93.01 08:30:01 AM all 3.69 0.31 3.11 0.03 0.00 0.00 0.02 0.00 0.00 92.84 08:40:01 AM all 3.62 0.31 3.11 0.03 0.00 0.00 0.02 0.00 0.00 92.91 08:50:01 AM all 3.52 0.31 3.06 0.03 0.00 0.00 0.02 0.00 0.00 93.06 09:00:01 AM all 3.28 0.29 15.20 0.03 0.00 0.00 0.04 0.00 0.00 81.16 09:10:01 AM all 3.28 0.29 15.30 0.03 0.00 0.00 0.04 0.00 0.00 81.07 09:20:01 AM all 3.26 0.29 15.33 0.03 0.00 0.00 0.04 0.00 0.00 81.06 09:30:01 AM all 3.23 0.29 15.30 0.03 0.00 0.00 0.04 0.00 0.00 81.12 09:40:01 AM all 3.30 0.28 15.32 0.03 0.00 0.00 0.04 0.00 0.00 81.03 09:50:01 AM all 3.26 0.28 15.29 0.03 0.00 0.00 0.04 0.00 0.00 81.10 10:00:01 AM all 3.38 0.28 15.37 0.03 0.00 0.00 0.04 0.00 0.00 80.90 10:10:01 AM all 3.31 0.28 15.33 0.04 0.00 0.00 0.04 0.00 0.00 81.01 10:20:01 AM all 3.23 0.29 15.33 0.03 0.00 0.00 0.04 0.00 0.00 81.09 10:30:01 AM all 3.28 0.28 15.33 0.03 0.00 0.00 0.04 0.00 0.00 81.04 10:40:01 AM all 3.25 0.29 15.31 0.03 0.00 0.00 0.04 0.00 0.00 81.09 10:50:01 AM all 3.27 0.28 15.33 0.03 0.00 0.00 0.04 0.00 0.00 81.05 11:00:01 AM all 3.21 0.28 15.32 0.03 0.00 0.00 0.04 0.00 0.00 81.12 11:10:01 AM all 3.33 0.29 15.35 0.03 0.00 0.00 0.04 0.00 0.00 80.96 11:20:01 AM all 3.26 0.28 15.32 0.03 0.00 0.00 0.04 0.00 0.00 81.06 11:30:01 AM all 3.44 0.28 15.36 0.03 0.00 0.00 0.04 0.00 0.00 80.85 11:40:01 AM all 3.26 0.29 15.32 0.03 0.00 0.00 0.03 0.00 0.00 81.07 11:50:01 AM all 3.29 0.29 15.33 0.03 0.00 0.00 0.04 0.00 0.00 81.02 12:00:01 PM all 3.29 0.28 15.33 0.03 0.00 0.00 0.04 0.00 0.00 81.02 12:10:01 PM all 3.29 0.29 15.35 0.03 0.00 0.00 0.04 0.00 0.00 81.01 12:20:01 PM all 3.27 0.28 15.35 0.03 0.00 0.00 0.04 0.00 0.00 81.02 12:30:01 PM all 3.25 0.29 15.34 0.03 0.00 0.00 0.04 0.00 0.00 81.06 12:40:01 PM all 3.30 0.28 15.35 0.03 0.00 0.00 0.03 0.00 0.00 80.99 12:40:01 PM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle 12:50:01 PM all 3.25 0.28 15.34 0.03 0.00 0.00 0.04 0.00 0.00 81.06 01:00:01 PM all 3.46 0.29 15.40 0.03 0.00 0.00 0.04 0.00 0.00 80.79 01:10:01 PM all 3.25 0.29 15.34 0.03 0.00 0.00 0.04 0.00 0.00 81.05 01:20:01 PM all 3.30 0.28 15.38 0.03 0.00 0.00 0.04 0.00 0.00 80.98 01:30:01 PM all 3.26 0.28 15.36 0.04 0.00 0.00 0.04 0.00 0.00 81.03 01:40:01 PM all 3.61 0.29 15.41 0.18 0.00 0.00 0.04 0.00 0.00 80.47 01:50:01 PM all 3.24 0.28 15.38 0.03 0.00 0.00 0.04 0.00 0.00 81.03 02:00:01 PM all 3.29 0.28 15.39 0.03 0.00 0.00 0.04 0.00 0.00 80.97 02:10:01 PM all 3.30 0.28 15.38 0.04 0.00 0.00 0.04 0.00 0.00 80.96 02:20:01 PM all 3.14 0.28 20.19 0.03 0.00 0.00 0.04 0.00 0.00 76.32 02:30:02 PM all 3.22 0.28 27.71 0.03 0.00 0.00 0.04 0.00 0.00 68.73 02:40:01 PM all 3.00 0.28 27.66 0.03 0.00 0.00 0.04 0.00 0.00 68.99 02:50:01 PM all 3.06 0.28 27.65 0.03 0.00 0.00 0.04 0.00 0.00 68.94 03:00:01 PM all 3.00 0.28 27.68 0.03 0.00 0.00 0.03 0.00 
0.00 68.97 03:10:02 PM all 3.28 0.27 27.70 0.05 0.00 0.00 0.04 0.00 0.00 68.66 03:20:01 PM all 2.99 0.28 27.66 0.03 0.00 0.00 0.04 0.00 0.00 69.00 03:30:01 PM all 3.07 0.28 27.68 0.03 0.00 0.00 0.04 0.00 0.00 68.90 03:40:01 PM all 3.04 0.28 27.67 0.03 0.00 0.00 0.04 0.00 0.00 68.94 03:50:01 PM all 3.04 0.27 27.69 0.03 0.00 0.00 0.04 0.00 0.00 68.93 04:00:01 PM all 3.19 0.28 27.71 0.03 0.00 0.00 0.04 0.00 0.00 68.76 04:10:01 PM all 3.09 0.28 28.14 0.03 0.00 0.00 0.04 0.00 0.00 68.42 04:20:01 PM all 3.04 0.28 27.69 0.03 0.00 0.00 0.03 0.00 0.00 68.92 04:30:01 PM all 3.04 0.28 27.68 0.03 0.00 0.00 0.04 0.00 0.00 68.94 04:40:01 PM all 3.08 0.28 27.72 0.03 0.00 0.00 0.03 0.00 0.00 68.85 04:50:01 PM all 3.01 0.28 27.70 0.03 0.00 0.00 0.04 0.00 0.00 68.95 05:00:01 PM all 3.05 0.28 27.68 0.03 0.00 0.00 0.04 0.00 0.00 68.92 05:10:01 PM all 5.55 0.26 32.05 6.84 0.00 0.00 0.12 0.00 0.00 55.17 05:20:01 PM all 3.05 0.28 27.71 0.03 0.00 0.00 0.03 0.00 0.00 68.89 05:30:01 PM all 3.19 0.28 27.73 0.03 0.00 0.00 0.03 0.00 0.00 68.73 05:40:01 PM all 3.05 0.28 27.70 0.03 0.00 0.00 0.04 0.00 0.00 68.91 05:50:01 PM all 3.03 0.28 27.69 0.03 0.00 0.00 0.04 0.00 0.00 68.93 06:00:01 PM all 3.03 0.28 27.72 0.03 0.00 0.00 0.04 0.00 0.00 68.91 06:10:02 PM all 3.06 0.28 27.72 0.03 0.00 0.00 0.04 0.00 0.00 68.88 06:20:01 PM all 3.07 0.28 27.72 0.03 0.00 0.00 0.03 0.00 0.00 68.87 06:30:01 PM all 3.09 0.28 27.77 0.56 0.00 0.00 0.04 0.00 0.00 68.26 06:40:01 PM all 3.05 0.28 27.74 0.03 0.00 0.00 0.04 0.00 0.00 68.86 06:50:01 PM all 3.07 0.28 27.71 0.03 0.00 0.00 0.04 0.00 0.00 68.87 07:00:01 PM all 3.19 0.28 27.75 0.03 0.00 0.00 0.04 0.00 0.00 68.71 07:00:01 PM CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle 07:10:01 PM all 3.14 0.27 27.76 0.03 0.00 0.00 0.03 0.00 0.00 68.76 07:20:01 PM all 3.03 0.28 27.72 0.03 0.00 0.00 0.04 0.00 0.00 68.90 07:30:01 PM all 3.08 0.28 27.73 0.03 0.00 0.00 0.04 0.00 0.00 68.84 07:40:01 PM all 3.06 0.28 27.73 0.03 0.00 0.00 0.04 0.00 0.00 68.87 07:50:01 PM all 3.05 0.28 27.73 0.03 0.00 0.00 0.04 0.00 0.00 68.87 08:00:01 PM all 3.03 0.27 27.74 0.03 0.00 0.00 0.03 0.00 0.00 68.89 08:10:01 PM all 3.10 0.27 27.76 0.03 0.00 0.00 0.04 0.00 0.00 68.79 08:20:01 PM all 3.03 0.28 27.73 0.03 0.00 0.00 0.04 0.00 0.00 68.90 08:30:01 PM all 3.23 0.28 27.77 0.03 0.00 0.00 0.03 0.00 0.00 68.66 08:40:01 PM all 3.08 0.28 27.75 0.03 0.00 0.00 0.03 0.00 0.00 68.82 08:50:01 PM all 3.04 0.28 27.74 0.03 0.00 0.00 0.04 0.00 0.00 68.88 09:00:01 PM all 3.08 0.28 27.76 0.03 0.00 0.00 0.04 0.00 0.00 68.81 09:10:01 PM all 3.07 0.28 27.77 0.03 0.00 0.00 0.04 0.00 0.00 68.81 09:20:01 PM all 3.07 0.28 27.76 0.03 0.00 0.00 0.04 0.00 0.00 68.81 09:30:01 PM all 3.04 0.28 27.74 0.03 0.00 0.00 0.04 0.00 0.00 68.87 09:40:01 PM all 3.09 0.28 27.77 0.03 0.00 0.00 0.04 0.00 0.00 68.79 09:50:01 PM all 3.04 0.28 27.77 0.03 0.00 0.00 0.03 0.00 0.00 68.85 10:00:01 PM all 3.21 0.26 36.38 0.04 0.00 0.00 0.03 0.00 0.00 60.08 10:10:01 PM all 7.59 0.25 40.00 0.15 0.00 0.00 0.04 0.00 0.00 51.98 10:20:01 PM all 2.98 0.26 40.02 0.03 0.00 0.00 0.03 0.00 0.00 56.68 10:30:01 PM all 2.98 0.25 40.02 0.04 0.00 0.00 0.03 0.00 0.00 56.67 10:40:01 PM all 3.00 0.25 40.03 0.03 0.00 0.00 0.03 0.00 0.00 56.65 10:50:01 PM all 2.97 0.26 40.05 0.03 0.00 0.00 0.03 0.00 0.00 56.65 11:00:01 PM all 2.92 0.26 40.03 0.03 0.00 0.00 0.04 0.00 0.00 56.72 11:10:01 PM all 3.03 0.25 40.08 0.03 0.00 0.00 0.03 0.00 0.00 56.57 11:20:01 PM all 2.95 0.26 40.03 0.03 0.00 0.00 0.03 0.00 0.00 56.70 11:30:01 PM all 3.14 0.26 40.06 0.03 0.00 0.00 0.03 0.00 0.00 
56.47 11:40:01 PM all 2.97 0.26 40.05 0.03 0.00 0.00 0.03 0.00 0.00 56.67 11:50:01 PM all 2.99 0.26 40.06 0.03 0.00 0.00 0.03 0.00 0.00 56.63 Average: all 3.36 0.29 16.80 0.09 0.00 0.00 0.03 0.00 0.00 79.43 It started increase exactly at nine AM on Friday SAR command (Latest) ==================== sudo sar -u Linux 3.10.0-1160.53.1.el7.x86_64 (prod-db2-node2) 02/08/2022 _x86_64_ (8 CPU) 12:00:01 AM CPU %user %nice %system %iowait %steal %idle 12:10:02 AM all 1.54 0.21 88.00 0.02 0.00 10.23 12:20:01 AM all 1.50 0.22 87.99 0.01 0.00 10.28 12:30:01 AM all 1.47 0.21 87.97 0.01 0.00 10.34 12:40:01 AM all 1.48 0.22 87.98 0.01 0.00 10.31 12:50:01 AM all 1.47 0.21 88.00 0.01 0.00 10.30 01:00:01 AM all 1.70 0.22 87.98 0.01 0.00 10.10 01:10:01 AM all 1.93 0.21 87.94 0.02 0.00 9.90 01:20:01 AM all 1.50 0.22 88.00 0.01 0.00 10.27 01:30:02 AM all 1.51 0.21 87.97 0.01 0.00 10.29 01:40:01 AM all 1.51 0.21 87.95 0.01 0.00 10.32 01:50:01 AM all 1.46 0.21 87.96 0.02 0.00 10.35 02:00:02 AM all 1.49 0.22 87.95 0.01 0.00 10.32 02:10:02 AM all 1.53 0.22 87.93 0.01 0.00 10.31 02:20:02 AM all 1.44 0.22 87.95 0.01 0.00 10.38 02:30:01 AM all 1.70 0.21 87.94 0.02 0.00 10.13 02:40:01 AM all 1.44 0.21 87.95 0.02 0.00 10.38 02:50:01 AM all 1.47 0.21 87.97 0.01 0.00 10.34 03:00:02 AM all 1.43 0.21 87.94 0.01 0.00 10.40 03:10:01 AM all 1.50 0.21 87.96 0.01 0.00 10.31 03:20:01 AM all 1.51 0.23 87.97 0.01 0.00 10.28 03:30:02 AM all 1.48 0.21 87.93 0.01 0.00 10.36 03:40:02 AM all 1.47 0.22 87.94 0.02 0.00 10.35 03:50:01 AM all 1.44 0.22 87.95 0.01 0.00 10.38 04:00:01 AM all 1.64 0.21 87.94 0.02 0.00 10.19 04:10:01 AM all 1.52 0.22 87.92 0.02 0.00 10.33 04:20:01 AM all 1.45 0.22 87.92 0.02 0.00 10.40 04:30:02 AM all 1.43 0.21 87.95 0.02 0.00 10.39 04:40:02 AM all 1.48 0.22 87.95 0.02 0.00 10.33 04:50:01 AM all 1.41 0.22 87.97 0.02 0.00 10.39 05:00:01 AM all 1.48 0.22 87.94 0.02 0.00 10.35 05:10:01 AM all 1.53 0.21 87.95 0.02 0.00 10.29 05:20:01 AM all 1.45 0.22 87.96 0.01 0.00 10.36 05:30:02 AM all 1.65 0.21 87.92 0.01 0.00 10.20 05:40:01 AM all 1.49 0.21 87.94 0.01 0.00 10.35 05:50:01 AM all 1.43 0.21 87.95 0.01 0.00 10.40 06:00:01 AM all 1.47 0.21 87.93 0.01 0.00 10.38 06:10:01 AM all 1.50 0.22 87.94 0.01 0.00 10.34 06:20:01 AM all 1.44 0.22 87.96 0.01 0.00 10.38 06:30:01 AM all 1.47 0.21 87.93 0.01 0.00 10.37 06:40:01 AM all 1.43 0.21 87.94 0.01 0.00 10.40 06:50:01 AM all 1.44 0.22 87.94 0.01 0.00 10.39 07:00:01 AM all 1.75 0.22 88.04 0.01 0.00 9.98 07:10:01 AM all 2.27 0.21 88.86 0.01 0.00 8.65 Average: all 1.53 0.21 87.98 0.01 0.00 10.27 VMSTAT Command ============== vmstat 1 -w procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu-------- r b swpd free buff cache si so bi bo in cs us sy id wa st 7 0 0 4380564 827180 179070080 0 0 231 41 11 9 3 48 48 0 0 7 0 0 4364796 827180 179070080 0 0 0 160 9274 3727 1 88 10 0 0 7 0 0 4359876 827184 179070080 0 0 0 176 9180 3915 1 88 11 0 0 7 0 0 4359372 827184 179070096 0 0 1664 36 9201 3607 1 88 11 0 0 7 0 0 4351796 827184 179070112 0 0 6656 156 9392 4170 1 89 10 0 0 7 0 0 4361172 827184 179070096 0 0 1664 208 9352 4380 1 89 10 0 0 7 0 0 4360752 827184 179070096 0 0 0 48 9179 3496 0 88 12 0 0 7 0 0 4362452 827184 179070096 0 0 0 12 9281 4572 1 89 9 0 0 7 0 0 4363568 827184 179070096 0 0 0 124 9197 3497 0 88 12 0 0 8 0 0 4364952 827184 179070096 0 0 0 140 9189 3682 0 88 11 0 0 7 0 0 4364640 827184 179070096 0 0 0 88 9195 3556 0 88 12 0 0 I checked for the logs and there nothing i could find that could cause the high cpu utilization Now 
i know that TOP command is showing processes are related to Oracle DB. But Category shows that 89.8 in SystemSpace and not in UserSpace. Any advice on how to get what caused this spike Thanks
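Since the time is being spent in kernel space rather than in the Oracle processes themselves, profiling the kernel is the quickest way to see where it goes; a hedged sketch (perf and sysstat packages assumed to be installed):

    # sample kernel and user stacks live; look at the top kernel symbols
    perf top -g
    # or record for a minute and inspect offline
    perf record -a -g -- sleep 60 && perf report
    # per-process %system split
    pidstat -u 5 1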
OmiPenguin (4398 rep)
Feb 8, 2022, 04:52 AM • Last activity: Feb 9, 2022, 05:12 AM
0 votes
0 answers
459 views
HA-Cluster / corosync / pacemaker: Active-Active cluster with service ip / service ip is not switching
How to configure crm to migrate the ServiceIP if one Service is failed? node 1: web01a \ attributes standby=off node 2: web01b \ attributes standby=off primitive Apache2 systemd:apache2 \ operations $id=Apache2-operations \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta primitive PHP-FPM systemd:php7.4-fpm \ operations $id=PHP-FPM-operations \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta primitive Redis systemd:redis-server \ operations $id=Redis-operations \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta primitive ServiceIP IPaddr2 \ params ip=1.2.3.4 \ operations $id=ServiceIP-operations \ op monitor interval=10 timeout=20 start-delay=0 \ op_params migration-threshold=1 \ meta primitive lsyncd systemd:lsyncd \ op start interval=0 timeout=100 \ op stop interval=0 timeout=100 \ op monitor interval=15 timeout=100 start-delay=15 \ meta target-role=Started group ActiveNode ServiceIP lsyncd group WebServer Apache2 PHP-FPM Redis clone cl_WS WebServer \ meta clone-max=2 notify=true interleave=true colocation col_cl_WS_ActiveNode 100: cl_WS ActiveNode property cib-bootstrap-options: \ have-watchdog=false \ dc-version=2.0.3-4b1f869f0f \ cluster-infrastructure=corosync \ cluster-name=debian \ stonith-enabled=false \ no-quorum-policy=ignore \ startup-fencing=false \ maintenance-mode=false \ last-lrm-refresh=1622628525 \ start-failure-is-fatal=true These services should always be started - Apache2 - PHP-FPM - Redis If one of these services is not running, the node is unhelthy. The **ServiceIP** and **lsyncd** should switch to an healthy node. When I killed the apache2 process, the IP is not switched.
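With the configuration above the colocation score is only 100 (advisory), so a node that has lost Apache2 is not forced to give up ActiveNode. A hedged sketch of a mandatory constraint pair in crm shell syntax, using the resource names from the question:

    # place the ServiceIP group only where the WebServer clone is healthy
    crm configure colocation col_ActiveNode_with_WS inf: ActiveNode cl_WS
    crm configure order ord_WS_before_ActiveNode Mandatory: cl_WS ActiveNode

For a single Apache2 failure to push the group away, migration-threshold=1 also needs to be a meta attribute on the WebServer primitives; it is a resource meta attribute, so setting it under op_params on ServiceIP likely does not do what was intended.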
FaxMax (726 rep)
Jun 2, 2021, 12:29 PM
1 vote
0 answers
143 views
Stop a pacemaker node when local shell script returns an error?
Is it possible to make Pacemaker stop a node when a local test script fails, and start the node again once the script returns true? This seems like a very simple problem, but as I can't find ANY way to do this within Pacemaker, I'm about to run the following shell script on all my nodes:

    while true; do
        pcs status 2>/dev/null >/dev/null && node_running=true
        /is_node_healthy.sh && node_healthy=true
        [[ -v node_running ]] && ! [[ -v node_healthy ]] && pcs cluster stop
        [[ -v node_healthy ]] && ! [[ -v node_running ]] && pcs cluster start
        unset node_running node_healthy
        sleep 10
    done

This does exactly what I want, but it looks like a very dirty hack. Is there a more elegant way to get the same thing done by Pacemaker itself? BTW, the overall task I want to solve seems quite simple: create an HA cluster that has a public IP address assigned to a healthy host, where health can be checked with /is_node_healthy.sh
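Pacemaker has this built in via node-health attributes, which avoids the polling loop entirely; a hedged sketch (the attribute name '#health-custom' and the scheduling via cron are arbitrary choices):

    # run periodically (cron / systemd timer) on every node
    if /is_node_healthy.sh; then state=green; else state=red; fi
    attrd_updater --name '#health-custom' --update "$state"

    # once, cluster-wide: keep resources off any node reporting red
    pcs property set node-health-strategy=only-green

With only-green, a red health attribute acts like a -INFINITY location score for all resources on that node, which is effectively the "stop this node's resources" behaviour the loop emulates, without taking the node out of the cluster.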
psicolor (11 rep)
Feb 22, 2021, 11:54 AM
1 vote
1 answer
393 views
fence_virtualbox failed to reboot
I'm learning how to fence Pacemaker nodes using fence_virtualbox from [\[ClusterLabs\] Fence agent for VirtualBox][1], but I can't get it working. When I try to run stonith_admin --reboot it fails. Currently, my setup is:

    Node ID:     VM name:
    orcllinux1   OL7
    orcllinux2   OL7_2

I set it up using:

    pcs stonith create fence_vbox fence_virtualbox pcmk_host_map="orcllinux1:OL7,orcllinux2:OL7_2" pcmk_host_list="orcllinux1,orcllinux2" pcmk_host_check=static_list ipaddr="192.168.57.1" login="root"

But stonith_admin --reboot results in this error: [error screenshot] I tried to use fence_virtualbox manually using:

    fence_virtualbox -s 192.168.57.1 -p OL7 -o=reboot

and it succeeded. Is my stonith create syntax wrong? If so, what's the right syntax?
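Since the agent works when called by hand, a first thing to check is whether the cluster is passing parameter names the installed agent actually accepts; a hedged pair of commands:

    # list the exact parameter names the packaged agent advertises
    pcs stonith describe fence_virtualbox
    # ask the cluster (not the agent) to fence, with verbose logging
    stonith_admin --reboot orcllinux2 --verbose

Comparing the advertised parameter names against the ipaddr/login used in the pcs stonith create line should show whether the device definition itself is the problem or whether the failure happens later; the verbose stonith_admin output will say which node tried and why it gave up.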
Christophorus Reyhan (33 rep)
Jan 8, 2021, 11:16 AM • Last activity: Feb 16, 2021, 03:51 AM
2 votes
1 answer
5355 views
Pacemaker - Corosync - HA - Simple Custom Resource Testing - Status flapping - Started - Failed - Stopped - Started
I am testing using the OCF:Heartbeat:Dummy script and I want to make a very basic setup just to know it works and build on that. The only information I can find was this web blog here. https://raymii.org/s/tutorials/Corosync_Pacemaker_-_Execute_a_script_on_failover.html It has some typos but basically worked for me. The script currently just contains the following : sudo nano /usr/local/bin/failover.sh && sudo chmod +x /usr/local/bin/failover.sh #!/bin/sh touch /tmp/testfailover.sh Here is my setup : cp /usr/lib/ocf/resource.d/heartbeat/Dummy /usr/lib/ocf/resource.d/heartbeat/FailOverScript sudo nano /usr/lib/ocf/resource.d/heartbeat/FailOverScript dummy_start() { dummy_monitor /usr/local/bin/failover.sh if [ $? = $OCF_SUCCESS ]; then return $OCF_SUCCESS fi touch ${OCF_RESKEY_state} } sed -i 's/Dummy/FailOverScript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript sed -i 's/dummy/FailOverScript/g' /usr/lib/ocf/resource.d/heartbeat/FailOverScript pcs resource create FailOverScript ocf:heartbeat:FailOverScript op monitor interval="30" The only testing I can really do : [root@node2 ~]# /usr/lib/ocf/resource.d/heartbeat/FailOverScript start ; echo $? DEBUG: default start : 0 0 ocf-tester doesn't seem to exist in the latest HA Software Suite, not really sure how to manually install it, but the script "half works". **The script doesn't need monitoring, its supposed to be very basic, but it seems to be flapping and giving me the following error code. Any idea's what to do?** FailOverScript (ocf::heartbeat:FailOverScript): Started node2 Failed Actions: * FailOverScript_monitor_30000 on node2 'not running' (7): call= 24423, status=complete, exitreason='none', last-rc-change='Tue Aug 16 15:53:50 2016', queued=0ms, exec= 9ms **Example of what I want to do:** Cluster start Script runs "start.sh" Cluster fails over to node2. On node1 script runs "fail.sh" On node2 script runs "start.sh" and vis versa if it fails the other direction. Note: The script does work, I get /tmp/testfailover.sh. I even tried putting another script under dummy_stop to remove the file and that worked, but it just keeps flapping along removing/adding/removing/adding file and starting/failing/stoping/starting etc etc. Thanks for reading!
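A likely cause of the flapping: in the modified dummy_start above, $? is the exit status of failover.sh, so the function returns success before ever touching ${OCF_RESKEY_state}; the next monitor then finds no state file and reports "not running". A hedged sketch of the minimal shape such an agent needs (failover.sh is the script from the question; a complete agent must also answer meta-data with the agent XML):

    #!/bin/sh
    # minimal sketch: run a script on start, remember state for monitor
    : ${OCF_FUNCTIONS_DIR:=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    STATE="${HA_RSCTMP}/FailOverScript.state"

    case "$1" in
      start)
        /usr/local/bin/failover.sh || exit $OCF_ERR_GENERIC
        touch "$STATE"; exit $OCF_SUCCESS ;;
      stop)
        # a "fail.sh"-style script for the losing node could be run here
        rm -f "$STATE"; exit $OCF_SUCCESS ;;
      monitor)
        [ -f "$STATE" ] && exit $OCF_SUCCESS || exit $OCF_NOT_RUNNING ;;
      *)
        exit $OCF_ERR_UNIMPLEMENTED ;;
    esac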
FreeSoftwareServers (2682 rep)
Aug 16, 2016, 07:56 PM • Last activity: Dec 21, 2020, 06:56 AM
0 votes
1 answer
1741 views
Pacemaker apache resource is Failed to access httpd status page after change to HTTPS
I get this error from pacemaker after i change apache from http to https. now my ocf::heartbeat:apache resource is not find status page. I generate SSL certificate separately for 3 servers. Everything was working fine when running on http but as soon as I added the (self-signed) SSL certificate pacemaker Apache (ocf::heartbeat:apache): Stopped And error shows Failed Actions: * Apache_start_0 on server3 'unknown error' (1): call=315, status=complete, exitreason='Failed to access httpd status page.', last-rc-change='Mon Sep 21 16:22:37 2020', queued=0ms, exec=3456ms * Apache_start_0 on server1 'unknown error' (1): call=59, status=complete, exitreason='Failed to access httpd status page.', last-rc-change='Mon Sep 21 16:22:41 2020', queued=0ms, exec=3421ms * Apache_start_0 on server2 'unknown error' (1): call=197, status=complete, exitreason='Failed to access httpd status page.', last-rc-change='Mon Sep 21 16:22:33 2020', queued=0ms, exec=3451ms /etc/apache2/sites-available/000-default.conf ServerAdmin webmaster@localhost DocumentRoot /var/www/html Redirect "/" "https://10.226.***.***/ " SetHandler server-status ServerAdmin webmaster@localhost DocumentRoot /var/www/html Redirect "/" "https://10.226.179.205/ " Order deny,allow Deny from all Allow from 127.0.0.1 *pcs resource debug-monitor --full Apache* Operation monitor for Apache (ocf:heartbeat:apache) returned 1 > stderr: + echo > stderr: + printenv > stderr: + sort > stderr: + env= > stderr: AONIX_LM_DIR=/home/TeleUSE/etc > stderr: BXwidgets=/home/BXwidgets > stderr: HA_logfacility=none > stderr: HOME=/root > stderr: LC_ALL=C > stderr: LOGNAME=root > stderr: MAIL=/var/mail/root > stderr: OCF_EXIT_REASON_PREFIX=ocf-exit-reason: > stderr: OCF_RA_VERSION_MAJOR=1 > stderr: OCF_RA_VERSION_MINOR=0 > stderr: OCF_RESKEY_CRM_meta_class=ocf > stderr: OCF_RESKEY_CRM_meta_id=Apache > stderr: OCF_RESKEY_CRM_meta_migration_threshold=5 > stderr: OCF_RESKEY_CRM_meta_provider=heartbeat > stderr: OCF_RESKEY_CRM_meta_resource_stickiness=10 > stderr: OCF_RESKEY_CRM_meta_type=apache > stderr: OCF_RESKEY_configfile=/etc/apache2/apache2.conf > stderr: OCF_RESKEY_statusurl=http://localhost/server-status > stderr: OCF_RESOURCE_INSTANCE=Apache > stderr: OCF_RESOURCE_PROVIDER=heartbeat > stderr: OCF_RESOURCE_TYPE=apache > stderr: OCF_ROOT=/usr/lib/ocf > stderr: OCF_TRACE_RA=1 > stderr: PATH=/root/.rbenv/shims:/root/.rbenv/bin:/root/.rbenv/shims:/root/.rbenv/bin:/usr/local/bin:/home/TeleUSE/bin:/home/xrt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb > stderr: PCMK_logfacility=none > stderr: PCMK_service=crm_resource > stderr: PWD=/root > stderr: RBENV_SHELL=bash > stderr: SHELL=/bin/bash > stderr: SHLVL=1 > stderr: SSH_CLIENT=10.12.116.46 63097 22 > stderr: SSH_CONNECTION=10.12.116.46 63097 10.226.179.205 22 > stderr: SSH_TTY=/dev/pts/0 > stderr: TERM=xterm > stderr: TeleUSE=/home/TeleUSE > stderr: USER=root > stderr: _=/usr/sbin/pcs > stderr: __OCF_TRC_DEST= > stderr: __OCF_TRC_MANAGE= > stderr: + ocf_is_true > stderr: + false > stderr: + . /usr/lib/ocf/lib/heartbeat/apache-conf.sh > stderr: + . 
/usr/lib/ocf/lib/heartbeat/http-mon.sh > stderr: + bind_address=127.0.0.1 > stderr: + curl_ipv6_opts= > stderr: + ocf_is_true > stderr: + false > stderr: + echo > stderr: + grep -qs :: > stderr: + WGETOPTS=-O- -q -L --no-proxy --bind-address=127.0.0.1 > stderr: + CURLOPTS=-o - -Ss -L --interface lo > stderr: + HA_VARRUNDIR=/var/run > stderr: + IBMHTTPD=/opt/IBMHTTPServer/bin/httpd > stderr: + HTTPDLIST=/sbin/httpd2 /usr/sbin/httpd2 /usr/sbin/apache2 /sbin/httpd /usr/sbin/httpd /usr/sbin/apache /opt/IBMHTTPServer/bin/httpd > stderr: + MPM=/usr/share/apache2/find_mpm > stderr: + [ -x /usr/share/apache2/find_mpm ] > stderr: + LOCALHOST=http://localhost > stderr: + HTTPDOPTS=-DSTATUS > stderr: + DEFAULT_IBMCONFIG=/opt/IBMHTTPServer/conf/httpd.conf > stderr: + DEFAULT_SUSECONFIG=/etc/apache2/httpd.conf > stderr: + DEFAULT_RHELCONFIG=/etc/httpd/conf/httpd.conf > stderr: + DEFAULT_DEBIANCONFIG=/etc/apache2/apache2.conf > stderr: + basename /usr/lib/ocf/resource.d/heartbeat/apache > stderr: + CMD=apache > stderr: + OCF_REQUIRED_PARAMS= > stderr: + OCF_REQUIRED_BINARIES= > stderr: + ocf_rarun monitor > stderr: + mk_action_func > stderr: + echo apache_monitor > stderr: + tr - _ > stderr: + ACTION_FUNC=apache_monitor > stderr: + validate_args > stderr: + is_function apache_monitor > stderr: + command -v apache_monitor > stderr: + test zapache_monitor = zapache_monitor > stderr: + simple_actions > stderr: + check_required_params > stderr: + local v > stderr: + run_function apache_getconfig > stderr: + is_function apache_getconfig > stderr: + command -v apache_getconfig > stderr: + test zapache_getconfig = zapache_getconfig > stderr: + apache_getconfig > stderr: + HTTPD= > stderr: + PORT= > stderr: + STATUSURL=http://localhost/server-status > stderr: + CONFIGFILE=/etc/apache2/apache2.conf > stderr: + OPTIONS= > stderr: + CLIENT= > stderr: + TESTREGEX= > stderr: + TESTURL= > stderr: + TESTREGEX10= > stderr: + TESTCONFFILE= > stderr: + TESTNAME= > stderr: + : /etc/apache2/envvars > stderr: + source_envfiles /etc/apache2/envvars > stderr: + [ -f /etc/apache2/envvars -a -r /etc/apache2/envvars ] > stderr: + . /etc/apache2/envvars > stderr: + unset HOME > stderr: + [ != ] > stderr: + SUFFIX= > stderr: + export APACHE_RUN_USER=www-data > stderr: + export APACHE_RUN_GROUP=www-data > stderr: + export APACHE_PID_FILE=/var/run/apache2/apache2.pid > stderr: + export APACHE_RUN_DIR=/var/run/apache2 > stderr: + export APACHE_LOCK_DIR=/var/lock/apache2 > stderr: + export APACHE_LOG_DIR=/var/log/apache2 > stderr: + export LANG=C > stderr: + export LANG > stderr: + [ X = X -o ! -f -o ! -x ] > stderr: + find_httpd_prog > stderr: + HTTPD= > stderr: + [ -f /sbin/httpd2 -a -x /sbin/httpd2 ] > stderr: + [ -f /usr/sbin/httpd2 -a -x /usr/sbin/httpd2 ] > stderr: + [ -f /usr/sbin/apache2 -a -x /usr/sbin/apache2 ] > stderr: + HTTPD=/usr/sbin/apache2 > stderr: + break > stderr: + [ X != X -a X/usr/sbin/apache2 != X ] > stderr: + detect_default_config > stderr: + [ -f /etc/apache2/httpd.conf ] > stderr: + [ -f /etc/apache2/apache2.conf ] > stderr: + echo /etc/apache2/apache2.conf > stderr: + DefaultConfig=/etc/apache2/apache2.conf > stderr: + CONFIGFILE=/etc/apache2/apache2.conf > stderr: + [ -n /usr/sbin/apache2 ] > stderr: + basename /usr/sbin/apache2 > stderr: + httpd_basename=apache2 > stderr: + GetParams /etc/apache2/apache2.conf > stderr: + ConfigFile=/etc/apache2/apache2.conf > stderr: + [ ! 
-f /etc/apache2/apache2.conf ] > stderr: + get_apache_params /etc/apache2/apache2.conf ServerRoot PidFile Port Listen > stderr: + configfile=/etc/apache2/apache2.conf > stderr: + shift 1 > stderr: + echo ServerRoot PidFile Port Listen > stderr: + sed s/ /,/g > stderr: + vars=ServerRoot,PidFile,Port,Listen > stderr: + apachecat /etc/apache2/apache2.conf > stderr: + awk -v vars=ServerRoot,PidFile,Port,Listen > stderr: BEGIN{ > stderr: split(vars,v,","); > stderr: for( i in v ) > stderr: vl[i]=tolower(v[i]); > stderr: } > stderr: { > stderr: for( i in v ) > stderr: if( tolower($1)==vl[i] ) { > stderr: print v[i]"="$2 > stderr: delete vl[i] > stderr: break > stderr: } > stderr: } > stderr: > stderr: + awk > stderr: function procline() { > stderr: split($0,a); > stderr: if( a~/^[Ii]nclude$/ ) { > stderr: includedir=a; > stderr: gsub("\"","",includedir); > stderr: procinclude(includedir); > stderr: } else { > stderr: if( a=="ServerRoot" ) { > stderr: rootdir=a; > stderr: gsub("\"","",rootdir); > stderr: } > stderr: print; > stderr: } > stderr: } > stderr: function printfile(infile, a) { > stderr: while( (getline 0 ) { > stderr: procline(); > stderr: } > stderr: close(infile); > stderr: } > stderr: function allfiles(dir, cmd,f) { > stderr: cmd="find -L "dir" -type f"; > stderr: while( ( cmd | getline f ) > 0 ) { > stderr: printfile(f); > stderr: } > stderr: close(cmd); > stderr: } > stderr: function listfiles(pattern, cmd,f) { > stderr: cmd="ls "pattern" 2>/dev/null"; > stderr: while( ( cmd | getline f ) > 0 ) { > stderr: printfile(f); > stderr: } > stderr: close(cmd); > stderr: } > stderr: function procinclude(spec) { > stderr: if( rootdir!="" && spec!~/^\// ) { > stderr: spec=rootdir"/"spec; > stderr: } > stderr: if( isdir(spec) ) { > stderr: allfiles(spec); # read all files in a directory (and subdirs) > stderr: } else { > stderr: listfiles(spec); # there could be jokers > stderr: } > stderr: } > stderr: function isdir(s) { > stderr: return !system("test -d \""s"\""); > stderr: } > stderr: { procline(); } > stderr: /etc/apache2/apache2.conf > stderr: + sed s/#.*//;s/[[:blank:]]*$//;s/^[[:blank:]]*// > stderr: + grep -v ^$ > stderr: + eval PidFile=${APACHE_PID_FILE} > stderr: + PidFile=/var/run/apache2/apache2.pid > stderr: + CheckPort > stderr: + ocf_is_decimal > stderr: + false > stderr: + CheckPort > stderr: + ocfError performing operation: Operation not permitted _is_decimal > stderr: + false > stderr: + CheckPort 80 > stderr: + ocf_is_decimal 80 > stderr: + true > stderr: + [ 80 -gt 0 ] > stderr: + PORT=80 > stderr: + break > stderr: + echo > stderr: + grep : > stderr: + Listen=localhost: > stderr: + [ Xhttp://localhost/server-status = X ] > stderr: + test /var/run/apache2/apache2.pid > stderr: + return 0 > stderr: + validate_env > stderr: + check_required_binaries > stderr: + local v > stderr: + is_function apache_validate_all > stderr: + command -v apache_validate_all > stderr: + test zapache_validate_all = zapache_validate_all > stderr: + local rc > stderr: + LSB_STATUS_STOPPED=3 > stderr: + apache_validate_all > stderr: + [ -z /usr/sbin/apache2 ] > stderr: + [ ! -x /usr/sbin/apache2 ] > stderr: + [ ! 
-f /etc/apache2/apache2.conf ] > stderr: + [ -n ] > stderr: + [ -n ] > stderr: + dirname /var/run/apache2/apache2.pid > stderr: + local a > stderr: + local b > stderr: + [ 1 = 1 ] > stderr: + a=/var/run/apache2/apache2.pid > stderr: + [ 1 ] > stderr: + b=/var/run/apache2/apache2.pid > stderr: + [ /var/run/apache2/apache2.pid = /var/run/apache2/apache2.pid ] > stderr: + break > stderr: + b=/var/run/apache2 > stderr: + [ -z /var/run/apache2 -o /var/run/apache2/apache2.pid = /var/run/apache2 ] > stderr: + echo /var/run/apache2 > stderr: + return 0 > stderr: + ocf_mkstatedir root 755 /var/run/apache2 > stderr: + local owner > stderr: + local perms > stderr: + local path > stderr: + owner=root > stderr: + perms=755 > stderr: + path=/var/run/apache2 > stderr: + test -d /var/run/apache2 > stderr: + return 0 > stderr: + return 0 > stderr: + rc=0 > stderr: + [ 0 -ne 0 ] > stderr: + ocf_is_probe > stderr: + [ monitor = monitor -a 0 = 0 ] > stderr: + run_probe > stderr: + is_function apache_probe > stderr: + command -v apache_probe > stderr: + test z = zapache_probe > stderr: + shift 1 > stderr: + apache_monitor > stderr: + silent_status > stderr: + local pid > stderr: + get_pid > stderr: + [ -f /var/run/apache2/apache2.pid ] > stderr: + cat /var/run/apache2/apache2.pid > stderr: + pid=17552 > stderr: + [ -n 17552 ] > stderr: + ProcessRunning 17552 > stderr: + local pid=17552 > stderr: + [ -d /proc -a -d /proc/1 ] > stderr: + [ -d /proc/17552 ] > stderr: + [ 0 -ne 0 ] > stderr: + findhttpclient > stderr: + [ x != x ] > stderr: + which wget > stderr: + echo wget > stderr: + ourhttpclient=wget > stderr: + [ -z wget ] > stderr: + ocf_check_level 10 > stderr: + local lvl prev > stderr: + lvl=0 > stderr: + prev=0 > stderr: + ocf_is_decimal 0 > stderr: + true > stderr: + [ 10 -eq 0 ] > stderr: + [ 10 -gt 0 ] > stderr: + lvl=0 > stderr: + break > stderr: + echo 0 > stderr: + apache_monitor_basic > stderr: + wget_func http://localhost/server-status > stderr: + auth= > stderr: + cl_opts=-O- -q -L --no-proxy --bind-address=127.0.0.1 > stderr: + [ x !=+ x ] > stderr: grep+ wget -Ei -O- -q > stderr: -L --no-proxy --bind-address=127.0.0.1 http://localhost/server-status > stderr: + attempt_index_monitor_request > stderr: + local indexpage= > stderr: + [ -n ] > stderr: + [ -n ] > stderr: + [ -n ] > stderr: + [ -n http://localhost/server-status ] > stderr: + return 1 > stderr: + [ 1 -eq 0 ] > stderr: + ocf_is_probe > stderr: + [ monitor = monitor -a 0 = 0 ] > stderr: + return 1 **pcs config** Resource: MasterVip (class=ocf provider=heartbeat type=IPaddr2) Attributes: ip=10.226.***.*** nic=lo cidr_netmask=32 iflabel=pgrepvip Meta Attrs: target-role=Started Operations: start interval=0s timeout=20s (MasterVip-start-interval-0s) stop interval=0s timeout=20s (MasterVip-stop-interval-0s) monitor interval=90s (MasterVip-monitor-interval-90s) Resource: Apache (class=ocf provider=heartbeat type=apache) Attributes: configfile=/etc/apache2/apache2.conf statusurl=http://localhost/server-status Operations: start interval=0s timeout=40s (Apache-start-interval-0s) stop interval=0s timeout=60s (Apache-stop-interval-0s) monitor interval=1min (Apache-monitor-interval-1min) I don't know how to fix this. if anyone knows please help me.
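The Redirect "/" in the :80 vhost also redirects /server-status, so the agent's check of http://localhost/server-status gets bounced to the self-signed HTTPS site and fails. A hedged sketch of one fix: keep the status handler reachable over plain HTTP on localhost and exclude it from the redirect (mod_rewrite must be enabled, e.g. a2enmod rewrite; the masked IP is kept as in the question):

    <VirtualHost *:80>
        RewriteEngine On
        RewriteCond %{REQUEST_URI} !^/server-status
        RewriteRule ^ https://10.226.***.***%{REQUEST_URI} [R=301,L]

        <Location /server-status>
            SetHandler server-status
            Order deny,allow
            Deny from all
            Allow from 127.0.0.1
        </Location>
    </VirtualHost>

The statusurl=http://localhost/server-status in the resource definition can then stay as it is.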
Karippery (1 rep)
Sep 21, 2020, 03:04 PM • Last activity: Sep 22, 2020, 11:36 AM
1 vote
2 answers
4955 views
Keepalived not working?
I'm trying to create HA for HAProxy using keepalived on CentOS 8, here's what I have: Virtual IP: 10.10.10.14 HAProxy Server 1: 10.10.10.15 HAProxy Server 2: 10.10.10.18 and my keepalived configuration on **MASTER**: vrrp_script chk_haproxy { script "killall -0 haproxy" # check the haproxy process interval 2 # every 2 seconds weight 2 # add 2 points if OK } vrrp_instance VI_1 { interface ens190 state MASTER virtual_router_id 51 priority 101 virtual_ipaddress { 10.10.10.14 } track_script { chk_haproxy } } Keepalived config on **BACKUP**: vrrp_script chk_haproxy { script "killall -0 haproxy" # check the haproxy process interval 2 # every 2 seconds weight 2 # add 2 points if OK } vrrp_instance VI_1 { interface ens165 state BACKUP virtual_router_id 51 priority 100 virtual_ipaddress { 10.10.10.14 } track_script { chk_haproxy } } But every time I try to stop my HAProxy process it won't connect to the backup server. Instead it only works on the server with the recent start of keepalived. My ip -a command would return like this for **Master**: inet 10.10.10.15/24 brd 10.10.10.255 scope global noprefixroute ens190 inet 10.10.10.14/32 scope global ens190 For **Backup**: inet 10.10.10.18/24 brd 10.10.10.255 scope global noprefixroute ens165 inet 10.10.10.14/32 scope global ens165 Anything wrong? I have also set net.ipv4.ip_nonlocal_bind = 1 on my sysctl configuration. My logs only show the start and stop of the service?
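Two things worth ruling out when the VIP only follows whichever node started keepalived most recently, and both nodes show 10.10.10.14 at the same time as above: VRRP advertisements being dropped between the nodes, and the track script never running. A hedged sketch (interface names as in the question):

    # on both nodes: allow VRRP (IP protocol 112) through firewalld
    firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
    firewall-cmd --reload

    # watch advertisements while stopping haproxy on the MASTER
    tcpdump -ni ens190 vrrp

    # killall comes from psmisc and may be absent on a minimal CentOS 8
    dnf install -y psmisc

Both nodes holding the VIP simultaneously is the classic sign that they do not see each other's advertisements, which points at the firewall rather than at the vrrp_script.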
Gwynn (41 rep)
Jul 15, 2020, 05:35 AM • Last activity: Jul 15, 2020, 11:51 PM
2 votes
3 answers
1346 views
What is the best way to store a single counter persistently?
I have a simple bash script that increments a counter a few times per second, guaranteed less than 100 times per second. The script works fine, but I would like the counter to persist on machine crashes. What would be the best way to persist the counter on my SSD-only system? Should I just echo it out to /var// somewhere (i.e. store in a file) each time it updates? If so, is /var// the right place? Do I need to install a full database to keep track of this single value? Is there some cute little Linux feature built to do this effectively? To clarify, my problem isn't making sure that the counter is persistent between separate runs of the script, I have that solved already. My concern is in case the system unexpectedly and suddenly fails due to machine crash (I can therefore not rely on a trap in a shell script).
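A file under /var/lib is the conventional home for persistent state like this; the usual crash-safe pattern is write-to-temp, flush, then rename over the old file, since the rename is atomic on the same filesystem. A minimal sketch (the path is an assumption):

    counter_file=/var/lib/mycounter/count
    count=$(( $(cat "$counter_file" 2>/dev/null || echo 0) + 1 ))
    printf '%s\n' "$count" > "$counter_file.tmp"
    sync "$counter_file.tmp"                  # flush to disk (coreutils >= 8.24)
    mv "$counter_file.tmp" "$counter_file"    # atomic replace, same filesystem

After a crash you lose at most the increments since the last completed rename; at under 100 updates per second this is also gentle enough on an SSD, and the sync can be batched (say once per second) if write amplification becomes a concern.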
00prometheus (813 rep)
Jun 1, 2020, 05:49 PM • Last activity: Jun 3, 2020, 11:50 AM
0 votes
2 answers
1078 views
Linux Pacemaker: Resource showing as "unrunnable start (blocked)" has been created
We are using SLES 12 SP4. We observed a few things during today's testing. The steps were as follows: **Step 1**: When we trigger a kernel panic (on Node01) with the command **echo 'b' > /proc/sysrq-trigger** or **echo 'c' > /proc/sysrq-trigger** on the node where the resources are running, the cluster detects the change but is unable to start any resources (except SBD) on the other active node. **Step 2**: In the logs we find the following errors:
pengine:     info: LogActions:       Leave      stonith-sbd           (Started node02)
pengine:   notice: LogAction:      * Start      pri-javaiq            (node02 )   due to unrunnable nfs_filesystem start (blocked)
pengine:   notice: LogAction:      * Start      lb_health_probe       (node02 )   due to unrunnable nfs_filesystem start (blocked)
pengine:   notice: LogAction:      * Start      pri-ip_vip            (node02 )   due to unrunnable nfs_filesystem start (blocked)
pengine:   notice: LogAction:      * Start      nfs_filesystem        (node02 )   blocked
**Step 3**: But when we execute "init 6" on the node on which we created the kernel panic, surprisingly the resources on the other node start and run successfully.
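Resources usually show up as "unrunnable ... (blocked)" like this when the cluster is still waiting for fencing of the failed node to be confirmed; only once the STONITH completes (or the node returns cleanly, which is what the init 6 achieves) will dependent resources such as the NFS filesystem be started on the surviving node. A hedged sketch of where to look (the SBD device path is a placeholder):

    # was node01 actually fenced, and did the operation complete?
    stonith_admin --history node01 --verbose
    crm_mon -1 --show-detail

    # SBD clusters: is the shared device reachable and the watchdog armed?
    sbd -d /dev/disk/by-id/SBD_DEVICE list
    systemctl status sbd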
Ram Too (1 rep)
May 14, 2020, 02:25 PM • Last activity: May 15, 2020, 04:47 PM
0 votes
0 answers
35 views
high availability of file in Linux
I have a very specific scenario. I have a set of config files and I want to maintain them at two different paths, each path holding a copy of the same files. If one of the locations becomes unavailable for some reason, my process should still be able to access the files from the second location. How can I achieve this with something like a symbolic link that repoints the file path based on availability? Any thoughts or ideas are highly appreciated. Thanks.
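One lightweight way is to give the process a single stable path and swing a symbolic link to whichever copy is currently reachable; a small script run from cron or a watchdog can repoint it. A minimal sketch (all paths are placeholders):

    primary=/mnt/siteA/config
    fallback=/mnt/siteB/config
    stable=/etc/myapp/active        # the path the process actually opens

    if [ -r "$primary/app.conf" ]; then
        ln -sfn "$primary" "$stable"
    else
        ln -sfn "$fallback" "$stable"
    fi

ln -sfn replaces the link in place; for a strictly atomic swap, create a temporary link and move it over the old one with mv -T.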
Parthi (1 rep)
Apr 30, 2020, 06:06 AM
1 vote
0 answers
310 views
OpenLDAP Cluster
Trying to implement an OpenLDAP cluster, I have already managed to set up the two backend LDAP servers in mirror mode. The application (iRedMail) using the LDAP service runs on the same systems as the LDAP servers. This application needs the LDAP configuration in the legacy slapd.conf style and not in the CONFIG-DB (cn=config) way, so I added the mirroring parameters to the slapd.conf file. The file looks like this on the first backend node:
include     /etc/openldap/schema/core.schema
include     /etc/openldap/schema/corba.schema
include     /etc/openldap/schema/cosine.schema
include     /etc/openldap/schema/inetorgperson.schema
include     /etc/openldap/schema/nis.schema
include     /etc/openldap/schema/calentry.schema
include     /etc/openldap/schema/calresource.schema
include     /etc/openldap/schema/amavisd-new.schema
include     /etc/openldap/schema/iredmail.schema

pidfile     /var/run/openldap/slapd.pid
argsfile    /var/run/openldap/slapd.args

# The syncprov overlay
moduleload syncprov.la

disallow    bind_anon
require     LDAPv3
loglevel    0

access to attrs="userPassword,mailForwardingAddress,employeeNumber"
    by anonymous    auth
    by self         write
    by dn.exact="cn=vmail,dc=myCompany,dc=de"   read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users        none

access to attrs="cn,sn,gn,givenName,telephoneNumber"
    by anonymous    auth
    by self         write
    by dn.exact="cn=vmail,dc=myCompany,dc=de"   read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users        read

access to attrs="objectclass,domainName,mtaTransport,enabledService,domainSenderBccAddress,domainRecipientBccAddress,domainBackupMX,domainMaxQuotaSize,domainMaxUserNumber,domainPendingAliasName"
    by anonymous    auth
    by self         read
    by dn.exact="cn=vmail,dc=myCompany,dc=de"   read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users        read

access to attrs="domainAdmin,domainGlobalAdmin,domainSenderBccAddress,domainRecipientBccAddress"
    by anonymous    auth
    by self         read
    by dn.exact="cn=vmail,dc=myCompany,dc=de"   read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users        none

access to attrs="mail,accountStatus,domainStatus,userSenderBccAddress,userRecipientBccAddress,mailQuota,backupMailAddress,shadowAddress,memberOfGroup,member,uniqueMember,storageBaseDirectory,homeDirectory,mailMessageStore,mailingListID"
    by anonymous    auth
    by self         read
    by dn.exact="cn=vmail,dc=myCompany,dc=de"   read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users        read

access to dn="cn=vmail,dc=myCompany,dc=de"
    by anonymous                    auth
    by self                         write
    by users                        none

access to dn="cn=vmailadmin,dc=myCompany,dc=de"
    by anonymous                    auth
    by self                         write
    by users                        none

access to dn.regex="domainName=([^,]+),o=domains,dc=myCompany,dc=de$"
    by anonymous                    auth
    by self                         write
    by dn.exact="cn=vmail,dc=myCompany,dc=de"   read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by dn.regex="mail=[^,]+@$1,o=domainAdmins,dc=myCompany,dc=de$" write
    by dn.regex="mail=[^,]+@$1,ou=Users,domainName=$1,o=domains,dc=myCompany,dc=de$" read
    by users                        none

access to dn.subtree="o=domains,dc=myCompany,dc=de"
    by anonymous                    auth
    by self                         write
    by dn.exact="cn=vmail,dc=myCompany,dc=de"    read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users                        read

access to dn.subtree="o=domainAdmins,dc=myCompany,dc=de"
    by anonymous                    auth
    by self                         write
    by dn.exact="cn=vmail,dc=myCompany,dc=de"    read
    by dn.exact="cn=vmailadmin,dc=myCompany,dc=de"  write
    by users                        none

access to dn.regex="cn=[^,]+,dc=myCompany,dc=de"
    by anonymous                    auth
    by self                         write
    by users                        none

access to *
    by anonymous                    auth
    by self                         write
    by users                        read

database monitor
access to dn="cn=monitor"
    by dn.exact="cn=Manager,dc=myCompany,dc=de" read
    by dn.exact="cn=vmail,dc=myCompany,dc=de" read
    by * none

database    mdb
suffix      dc=myCompany,dc=de
directory   /var/lib/ldap/myCompany.de
rootdn      cn=Manager,dc=myCompany,dc=de
rootpw      {SSHA}V5/UQXm9SmzRGjKK2zAKB79eFSaysc2wG9tPIg==
sizelimit   unlimited
maxsize     2147483648
checkpoint  128 3
mode        0700

index objectclass,entryCSN,entryUUID                eq
index uidNumber,gidNumber,uid,memberUid,loginShell  eq,pres
index homeDirectory,mailMessageStore                eq,pres
index ou,cn,mail,surname,givenname,telephoneNumber,displayName  eq,pres,sub
index nisMapName,nisMapEntry                        eq,pres,sub
index shadowLastChange                              eq,pres
index member,uniqueMember eq,pres

index domainName,mtaTransport,accountStatus,enabledService,disabledService  eq,pres,sub
index domainAliasName    eq,pres,sub
index domainMaxUserNumber eq,pres
index domainAdmin,domainGlobalAdmin,domainBackupMX    eq,pres,sub
index domainSenderBccAddress,domainRecipientBccAddress  eq,pres,sub

index accessPolicy,hasMember,listAllowedUser,mailingListID   eq,pres,sub

index mailForwardingAddress,shadowAddress   eq,pres,sub
index backupMailAddress,memberOfGroup   eq,pres,sub
index userRecipientBccAddress,userSenderBccAddress  eq,pres,sub
index mobile,departmentNumber eq,pres,sub

#Mirror Mode
serverID    001

# Consumer
syncrepl rid=001 \
provider=ldap://rm2.myCompany.de \
bindmethod=simple \
binddn="cn=vmail,dc=myCompany,dc=de" \
credentials="gtV9FwILIcp8Zw8YtGeB1AC9GbGfti" \
searchbase="dc=myCompany,dc=de" \
attrs="*,+" \
type=refreshAndPersist \
interval=00:00:01:00 \
retry="60 +"
# Provider
overlay syncprov
syncprov-checkpoint 50 1
syncprov-sessionlog 50

mirrormode on
There are only two differences in the second node's config file:
[...]
#Mirror Mode
serverID    002
[...]

# Consumer
[...]
provider=ldap://rm2.myCompany.de \
[...]
As mentioned before, the mirroring works perfectly. Now I need a single connection address for the LDAP clients, i.e. web applications that use LDAP as their authentication mechanism. I read that an OpenLDAP proxy can be used for this purpose: the LDAP client (here: the web application) connects to the LDAP proxy, and the proxy retrieves the authentication data from the backend LDAP servers. I set up an OpenLDAP proxy; it uses CONFIG-DB, not the legacy style. The slapd.conf file looks like this:
include         /etc/openldap/schema/corba.schema
include         /etc/openldap/schema/core.schema
include         /etc/openldap/schema/cosine.schema
include         /etc/openldap/schema/duaconf.schema
include         /etc/openldap/schema/dyngroup.schema
include         /etc/openldap/schema/inetorgperson.schema
include         /etc/openldap/schema/java.schema
include         /etc/openldap/schema/misc.schema
include         /etc/openldap/schema/nis.schema
include         /etc/openldap/schema/openldap.schema
include         /etc/openldap/schema/ppolicy.schema
 
pidfile         /var/run/openldap/slapd.pid
argsfile        /var/run/openldap/slapd.args

modulepath  /usr/lib/openldap
modulepath  /usr/lib64/openldap
moduleload  back_ldap.la       
loglevel	0

database		ldap
readonly		yes            
protocol-version	3
rebind-as-user
uri			"ldap://rm1.myCompany.de:389"
suffix		        "dc=myCompany,dc=de"
uri                     "ldap://rm2.myCompany.de:389"
suffix		        "dc=myCompany,dc=de"
First issue: when I create the CONFIG-DB from this file using slaptest, the command fails, claiming:
5dc44107 /etc/openldap/slapd.conf: line 48: suffix already served by this backend!.
slaptest: bad configuration directory!
The slaptest command looks like this:
slaptest -f /etc/openldap/slapd.conf -F /etc/openldap/slapd.d/
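For what it's worth, slapd-ldap(5) allows several space-separated LDAP URLs in a single uri directive, and back-ldap then fails over to the next URL when the current one is unreachable. If the goal is redundancy between two mirrors serving the same tree (rather than gluing distinct subtrees together), a single proxy database with one suffix may therefore be enough; a minimal sketch, reusing the hostnames from above:
database            ldap
readonly            yes
protocol-version    3
rebind-as-user
suffix              "dc=myCompany,dc=de"
# back-ldap tries the URIs left to right and fails over when one is unreachable
uri                 "ldap://rm1.myCompany.de:389/ ldap://rm2.myCompany.de:389/"
With only one suffix directive, the “suffix already served by this backend” complaint should no longer appear.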
It is possible that I haven't completely understood the concept, because all the guides I found give each backend LDAP server its own suffix prefix, i.e. instead of:
uri			"ldap://rm1.myCompany.de:389"
suffix		        "dc=myCompany,dc=de"
uri                     "ldap://rm2.myCompany.de:389"
suffix		        "dc=myCompany,dc=de"
they use:
uri            "ldap://rm1.myCompany.de:389"
suffix		   "dc=ou1,dc=myCompany,dc=de"
uri            "ldap://rm2.myCompany.de:389"
suffix		   "dc=ou2,dc=myCompany,dc=de"
What I don't understand: on the backend servers there is no ou1 or ou2. How can they expect to find anything in the backend LDAPs if the DNs do not match? I temporarily commented out the second uri in order to check whether, apart from this issue, LDAP queries to the LDAP proxy succeed, but ran into the second issue. Second issue: if I run an ldapsearch directly against the two backend LDAP servers (one after the other), all of the LDAP users are enumerated. If I run the same ldapsearch against the LDAP proxy, only the user "vmail" is enumerated. I would expect the same users to be listed as in the direct query. This is the ldapsearch command:
ldapsearch -D "cn=vmail,dc=myCompany,dc=de" -w gtV9FwILIcp8Zw8YtGeB1AC9GbGfti -p 389 -h 192.168.0.92 -b "dc=myCompany,dc=de" -s sub "(objectclass=person)"
Did I miss something? Thank you for your consideration! Best regards, Florian
arminV (11 rep)
Nov 8, 2019, 10:45 AM
1 vote
1 answers
2580 views
Pacemaker: Primary node is rebooted and comes back as primary instead of standby
We are using pacemaker, corosync to automate failovers. We noticed one behaviour- when primary node is rebooted, the standby node takes over as primary - which is fine. When the node comes back online and services are started on it, it takes back the role of Primary. It should ideally start as stand...
We are using Pacemaker and Corosync to automate failovers. We noticed one behaviour: when the primary node is rebooted, the standby node takes over as primary, which is fine. But when the rebooted node comes back online and its services are started, it takes back the Primary role. It should ideally come back as standby. Are we missing any configuration? > pcs resource defaults O/p: resource-stickiness: INFINITY migration-threshold: 0 Stickiness is set to INFINITY. Please suggest. Config details below: ======================
[root@Node1 heartbeat]# pcs config show –l
Cluster Name: cluster1
Corosync Nodes:
 Node1 Node2
Pacemaker Nodes:
 Node1 Node2

Resources:
 Master: msPostgresql
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
   Attributes: master_ip=10.70.10.1 node_list="Node1 Node2" pgctl=/usr/pgsql-9.6/bin/pg_ctl pgdata=/var/lib/pgsql/9.6/data/ primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" psql=/usr/pgsql-9.6/bin/psql rep_mode=async restart_on_promote=true restore_command="cp /var/lib/pgsql/9.6/data/archivedir/%f %p"
   Meta Attrs: failure-timeout=60
   Operations: demote interval=0s on-fail=stop timeout=60s (pgsql-demote-interval-0s)
               methods interval=0s timeout=5s (pgsql-methods-interval-0s)
               monitor interval=4s on-fail=restart timeout=60s (pgsql-monitor-interval-4s)
               monitor interval=3s on-fail=restart role=Master timeout=60s (pgsql-monitor-interval-3s)
               notify interval=0s timeout=60s (pgsql-notify-interval-0s)
               promote interval=0s on-fail=restart timeout=60s (pgsql-promote-interval-0s)
               start interval=0s on-fail=restart timeout=60s (pgsql-start-interval-0s)
               stop interval=0s on-fail=block timeout=60s (pgsql-stop-interval-0s)
 Group: master-group
  Resource: vip-master (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.70.10.2
   Operations: monitor interval=10s on-fail=restart timeout=60s (vip-master-monitor-interval-10s)
               start interval=0s on-fail=restart timeout=60s (vip-master-start-interval-0s)
               stop interval=0s on-fail=block timeout=60s (vip-master-stop-interval-0s)
  Resource: vip-rep (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.70.10.1
   Meta Attrs: migration-threshold=0
   Operations: monitor interval=10s on-fail=restart timeout=60s (vip-rep-monitor-interval-10s)
               start interval=0s on-fail=stop timeout=60s (vip-rep-start-interval-0s)
               stop interval=0s on-fail=ignore timeout=60s (vip-rep-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote msPostgresql then start master-group (score:INFINITY) (non-symmetrical)
  demote msPostgresql then stop master-group (score:0) (non-symmetrical)
Colocation Constraints:
  master-group with msPostgresql (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: INFINITY
 migration-threshold: 0
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster1
 cluster-recheck-interval: 60
 dc-version: 1.1.19-8.el7-c3c624ea3d
 have-watchdog: false
 no-quorum-policy: ignore
 start-failure-is-fatal: false
 stonith-enabled: false
Node Attributes:
 Node1: pgsql-data-status=STREAMING|ASYNC
 Node2: pgsql-data-status=LATEST

Quorum:
  Options:
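As a hedged diagnostic aside: resource-stickiness adds weight to the node where a resource is currently active, but promotion of a master/slave resource is also driven by the per-node master preference scores that the pgsql agent derives from replication state, so those scores are worth inspecting. A quick way to dump them, assuming the standard Pacemaker CLI:
# Replay the scheduler against the live CIB and print allocation and promotion scores;
# the per-node pgsql master scores show why the returning node is promoted.
crm_simulate --live-check --show-scores

# One-shot status including node attributes such as pgsql-data-status.
crm_mon -A1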

Thanks !
User2019 (11 rep)
Sep 12, 2019, 09:30 AM • Last activity: Sep 16, 2019, 06:18 PM
1 vote
1 answers
1268 views
how to unexport an NFS share on a VCS HA cluster
**see imp update at bottom of orig. question. not sure how to unexport only the 'world' mountable share? I have a NFS server which had a share with world-mountable permissions. To make it mountable only by the clients on a subnet i added the share to /etc/exports, which was empty before. I am not su...
**See important update at the bottom of the original question — I am not sure how to unexport only the 'world'-mountable share.** I have an NFS server which had a share with world-mountable permissions. To make it mountable only by clients on one subnet, I added the share to /etc/exports, which was empty before. I am not sure how the folder was shared before. I put the entry in /etc/exports and exported again, but the world-mountable share is still shown as available. Before: [root@nfsServer ~]# exportfs -v /export/home <world>(rw,wdelay,no_root_squash,no_subtree_check) # ls -l /var/lib/nfs/xtab -rw-r--r-- 1 root root 0 Dec 15 2009 /var/lib/nfs/xtab # ls -l /proc/fs/nfs -r--r--r-- 1 root root 0 May 2 00:41 exports Change: added the following line to /etc/exports (which was empty before) /export/home 192.168.253.0/24(rw,wdelay,no_root_squash,no_subtree_check) then re-exported the folders: # exportfs -ra After: [root@nfsServer ~]# exportfs -v /export/home 192.168.253.0/24(rw,wdelay,no_root_squash,no_subtree_check) /export/home <world>(rw,wdelay,no_root_squash,no_subtree_check) # cat /etc/exports /export/home 192.168.253.0/24(rw,wdelay,no_root_squash,no_subtree_check) # ls -l /var/lib/nfs/xtab -rw-r--r-- 1 root root 0 Dec 15 2009 /var/lib/nfs/xtab # ls -l /proc/fs/nfs -r--r--r-- 1 root root 0 May 2 00:41 exports [root@nfsServer ~]# ls -ltr /proc/fs/nfsd total 0 -rw------- 1 root root 0 Mar 1 2017 versions -rw------- 1 root root 0 Mar 1 2017 threads -rw------- 1 root root 0 Mar 1 2017 portlist -rw------- 1 root root 0 Mar 1 2017 nfsv4recoverydir -rw------- 1 root root 0 Mar 1 2017 nfsv4leasetime -rw------- 1 root root 0 Mar 1 2017 filehandle -r--r--r-- 1 root root 0 Mar 1 2017 exports [root@nfsServer ~]# cd /proc/fs/nfsd [root@nfsServer nfsd]# cat exports # Version 1.1 # Path Client(Flags) # IPs /export/home *,192.168.253.0/24(rw,no_root_squash,sync,wdelay,no_subtree_check) # cat versions +2 +3 -4 Note that it has * added in front of the /etc/exports entry. I want to know where the "*" entry is coming from and how to get rid of it. All help is appreciated. System: Red Hat Enterprise Linux Server release 5.5 (Tikanga) 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux *IMPORTANT: sorry, I forgot to clarify that this is NFS running under VCS HA on Red Hat 5.5, so when I restart nfs I get an error: # service nfs stop Shutting down NFS mountd: [ OK ] Shutting down NFS daemon: [ OK ] Shutting down NFS quotas: [ OK ] Shutting down NFS services: [ OK ] # service nfs start Starting NFS services: [ OK ] Starting NFS quotas: [ OK ] Starting NFS daemon: [FAILED] # service nfs start Starting NFS services: [ OK ] Starting NFS quotas: [ OK ] Starting NFS daemon: [FAILED] but when you check... # service nfs status rpc.mountd (pid 24103) is running... nfsd (pid 24052 24051 24050 24049 24048 24047 24046 24045) is running... rpc.rquotad (pid 22872 20490 19133) is running... I figured out that this block in the VCS main.cf sets up the NFS share, but I am not sure how to add a subnet restriction to it: Share share_home ( Options = "rw, no_root_squash" PathName = "/export/home" ) Thanks. Raj
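A sketch of how the stray wildcard entry can be dropped at runtime with exportfs, assuming the world-open export now exists only in the kernel export table; note that the VCS Share resource in main.cf would still need its attributes adjusted (check whether your Share agent version supports a client/host specification), or the agent may re-export it world-open the next time the resource is brought online:
# remove only the world-open export; the subnet-restricted entry from /etc/exports stays
exportfs -v                       # confirm both entries are currently active
exportfs -u '*:/export/home'      # unexport the wildcard (world-mountable) entry
exportfs -v                       # verify that only the 192.168.253.0/24(...) entry remains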
Rajeev (256 rep)
May 2, 2018, 12:55 AM • Last activity: May 2, 2018, 05:04 PM
2 votes
0 answers
434 views
Setting up a kerberized HA NFS share
I'm trying to set up a kerberized NFS share from an HA cluster. I've successfully set up a krb-aware NFS share from a single server, I'm using a mostly identical configuration on the cluster. Exports file from working single server: /nfs *(rw,sec=krb5:krb5i:krb5p) Cluster resource configuration: # p...
I'm trying to set up a kerberized NFS share from an HA cluster. I've successfully set up a krb-aware NFS share from a single server, and I'm using a mostly identical configuration on the cluster. Exports file from working single server: /nfs *(rw,sec=krb5:krb5i:krb5p) Cluster resource configuration: # pcs resource show nfs-export1 Resource: nfs-export1 (class=ocf provider=heartbeat type=exportfs) Attributes: clientspec=10.1.0.0/255.255.255.0 directory=/nfsshare/exports/export1 fsid=1 options=rw,sec=krb5:krb5i:krb5p,sync,no_root_squash Operations: monitor interval=10 timeout=20 (nfs-export1-monitor-interval-10) start interval=0s timeout=40 (nfs-export1-start-interval-0s) stop interval=0s timeout=120 (nfs-export1-stop-interval-0s) Client showmount to working single server: # showmount -e ceserv Export list for ceserv: /nfs * Client showmount to floating cluster name: # showmount -e hafloat Export list for hafloat: /nfsshare/exports/export1 10.1.0.0/255.255.255.0 /nfsshare/exports 10.1.0.0/255.255.255.0 Contents of client /etc/fstab: ceserv:/nfs /mnt/nfs nfs4 sec=krb5i,rw,proto=tcp,port=2049 hafloat.ncphotography.lan:export1 /nfsmount nfs4 sec=krb5i,rw,proto=tcp,port=2049 Results of mount -av command: # mount -av mount.nfs4: timeout set for Mon Dec 4 20:57:14 2017 mount.nfs4: trying text-based options 'sec=krb5i,proto=tcp,port=2049,vers=4.1,addr=10.1.0.24,clientaddr=10.1.0.23' /mnt/nfs : successfully mounted mount.nfs4: timeout set for Mon Dec 4 20:57:14 2017 mount.nfs4: trying text-based options 'sec=krb5i,proto=tcp,port=2049,vers=4.1,addr=10.1.0.29,clientaddr=10.1.0.23' mount.nfs4: mount(2): Operation not permitted mount.nfs4: Operation not permitted All firewalls have been disabled. All names resolve correctly to IP addresses within the 10.1.0.0/24 network, and all IP addresses reverse-resolve to the correct hostname.
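One frequent cause of “mount.nfs4: Operation not permitted” in this kind of setup, offered only as a hedged guess, is that the server keytab on the cluster nodes holds nfs/ principals for the individual node names but none for the floating name the client actually mounts (hafloat.ncphotography.lan), so the GSS context cannot be established. A quick check on whichever node currently holds the floating IP:
# list the NFS service principals present in the default keytab;
# a principal matching the name the client mounts should be present
klist -k /etc/krb5.keytab | grep -i 'nfs/'
If only the per-node principals show up, adding an nfs/ principal for hafloat.ncphotography.lan to both nodes' keytabs (and making sure forward and reverse DNS for the floating address agree with that name) is worth trying.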
John (17381 rep)
Dec 4, 2017, 09:05 PM • Last activity: Mar 6, 2018, 03:25 PM
Showing page 1 of 20 total questions