
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

0 votes
1 answer
2303 views
Secondary DRBD node does not auto-start in Pacemaker+Corosync setup
I am trying to set up a 2-PC cluster with shared resources: ClusterIP, ClusterSamba, ClusterNFS, DRBD (cloned resource), and a DRBDFS. The beginning of the project followed the [Clusters from Scratch](https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Clusters_from_Scratch/index.html) guide. When everything in this guide is done, it works without problems.

So, I wanted to use parts of that guide and build my own setup: I created one shared IP (ClusterIP) that is automatically assigned to one node, and (here is where it gets tricky) on that node, I mount my /dev/drbd1 device to /exports and then share this mount through **SAMBA** and **NFS**.

When I start the cluster, all resources come up as they should, _but DRBD does not go up on the secondary node_ (Primary/Unknown). If I bring it up manually, it syncs and works. Also, when I stop the cluster (or forcibly reboot the first node), all resources transfer to the other node and everything works, _except DRBD on the other node goes into an Unknown state_.

### So now, here is the problem:

**Why does DRBD go down on the secondary node when I stop the cluster? Or why doesn't it start in the Secondary role on the secondary node?**

Sorry if my description is bad.

---

## Here are the commands I used
# apt install -y pacemaker pcs psmisc policycoreutils-python-utils drbd-utils samba nfs-kernel-server 
# systemctl start pcsd.service
# systemctl enable pcsd.service
# passwd hacluster
# pcs host auth alice bob
# pcs cluster setup myCluster alice bob --force
# pcs cluster start --all
# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
# modprobe drbd
# echo drbd >/etc/modules-load.d/drbd.conf
# drbdadm create-md r0
# drbdadm up r0
# drbdadm primary r0 --force
# mkfs.ext4 /dev/drbd1
# systemctl disable smbd
# systemctl disable nfs-kernel-server.service 
# mkdir /exports
# vi /etc/samba/smb.conf 
# vi /etc/exports 
# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=10.1.1.30 cidr_netmask=24 op monitor interval=30s
# pcs resource defaults resource-stickiness=100
# pcs resource op defaults timeout=240s
# pcs resource create ClusterSamba lsb:smbd op monitor interval=60s
# pcs resource create ClusterNFS ocf:heartbeat:nfsserver op monitor interval=60s
# pcs resource create DRBD ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
# pcs resource promotable DRBD promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 notify=true
# pcs resource create DRBDFS Filesystem device="/dev/drbd1" directory="/exports" fstype="ext4"
# pcs constraint order ClusterIP then ClusterNFS
# pcs constraint order ClusterNFS then ClusterSamba
# pcs constraint order promote DRBD-clone then start DRBDFS
# pcs constraint order DRBDFS then ClusterNFS
# pcs constraint order ClusterIP then DRBD-clone
# pcs constraint colocation ClusterSamba with ClusterIP
# pcs constraint colocation add ClusterSamba with ClusterIP
# pcs constraint colocation add ClusterNFS with ClusterIP
# pcs constraint colocation add DRBDFS with DRBD-clone INFINITY with-rsc-role=Master
# pcs constraint colocation add DRBD-clone with ClusterIP
# pcs cluster stop --all && sleep 2 && pcs cluster start --all
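One thing to note when reading the constraints further down: a mandatory colocation of the whole DRBD-clone with ClusterIP can confine every clone instance to the single node holding the IP, which matches the "Stopped: [ bob ]" state in the pcs status output below. For reference, a hedged sketch of the commonly used inversion (the IP follows the promoted instance instead), mirroring the pcs option syntax already used above; the constraint ID is taken from the pcs constraint output below and may differ on another system:

```
# Sketch only: drop the clone-follows-IP colocation and let the IP follow the Master instead
pcs constraint remove colocation-DRBD-clone-ClusterIP-INFINITY
pcs constraint colocation add ClusterIP with DRBD-clone INFINITY with-rsc-role=Master

# Allocation scores per node/instance, useful to confirm why an instance stays stopped
crm_simulate -sL | grep -i drbd
```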
---

## Configs and stats

### /etc/drbd.d/r0.res
resource r0 {
 device /dev/drbd1;
 disk /dev/sdb;
 meta-disk internal;
 net {
  allow-two-primaries;
 }
 on alice {
  address 10.1.1.31:7788;
 }
 on bob {
  address 10.1.1.32:7788;
 } 
}
---

### /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: myCluster
    transport: knet
    crypto_cipher: aes256
    crypto_hash: sha256
}

nodelist {
    node {
        ring0_addr: alice
        name: alice
        nodeid: 1
    }

    node {
        ring0_addr: bob
        name: bob
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
}
---

### pcs status
Cluster name: myCluster
Stack: corosync
Current DC: alice (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Fri May 15 12:28:30 2020
Last change: Fri May 15 11:04:50 2020 by root via cibadmin on bob

2 nodes configured
6 resources configured

Online: [ alice bob ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started alice
 ClusterSamba   (lsb:smbd):     Started alice
 ClusterNFS     (ocf::heartbeat:nfsserver):     Started alice
 Clone Set: DRBD-clone [DRBD] (promotable)
     Masters: [ alice ]
     Stopped: [ bob ]
 DRBDFS (ocf::heartbeat:Filesystem):    Started alice

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
---

### pcs constraint --full
Location Constraints:

Ordering Constraints:
  start ClusterIP then start ClusterNFS (kind:Mandatory) (id:order-ClusterIP-ClusterNFS-mandatory)
  start ClusterNFS then start ClusterSamba (kind:Mandatory) (id:order-ClusterNFS-ClusterSamba-mandatory)
  promote DRBD-clone then start DRBDFS (kind:Mandatory) (id:order-DRBD-clone-DRBDFS-mandatory)
  start DRBDFS then start ClusterNFS (kind:Mandatory) (id:order-DRBDFS-ClusterNFS-mandatory)
  start ClusterIP then start DRBD-clone (kind:Mandatory) (id:order-ClusterIP-DRBD-clone-mandatory)
  start ClusterIP then promote DRBD-clone (kind:Mandatory) (id:order-ClusterIP-DRBD-clone-mandatory-1)

Colocation Constraints:
  ClusterSamba with ClusterIP (score:INFINITY) (id:colocation-ClusterSamba-ClusterIP-INFINITY)
  ClusterNFS with ClusterIP (score:INFINITY) (id:colocation-ClusterNFS-ClusterIP-INFINITY)
  DRBDFS with DRBD-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-DRBDFS-DRBD-clone-INFINITY)
  DRBD-clone with ClusterIP (score:INFINITY) (id:colocation-DRBD-clone-ClusterIP-INFINITY)

Ticket Constraints:
---

### /proc/drbd
version: 8.4.10 (api:1/proto:86-101)
srcversion: 983FCB77F30137D4E127B83 

 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:4 dw:8 dr:17 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4
Miki (31 rep)
May 15, 2020, 11:12 AM • Last activity: Jun 19, 2025, 10:03 PM
1 vote
1 answer
25 views
Drbd promote only after stonith started
I want a DRBD-based two-node cluster to start resources in the following order:

1. on both nodes, start stonith:fence_ipmilan
2. on one node, promote drbd-clone
3. on the same node as the drbd promote, start all NFS server resources (ip, export, …)

But how do I tell Pacemaker to promote drbd-clone only after stonith:fence_ipmilan has started on each of the two nodes? I tried

pcs constraint order set ipmi-fence-memverge ipmi-fence-memverge2 action=start require-all=true sequential=false set ha-nfs-clone action=promote sequential=false require-all=false

but it seems that stonith:fence_ipmilan and the drbd-clone promote start simultaneously…

Anton
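For comparison, a hedged sketch (resource names are the ones from the question; plain order constraints assumed instead of a constraint set) of expressing "promote only after both fencing devices have started":

```
# Promote the DRBD clone only after each fencing device has started
pcs constraint order start ipmi-fence-memverge then promote ha-nfs-clone
pcs constraint order start ipmi-fence-memverge2 then promote ha-nfs-clone
```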
Anton Gavriliuk (11 rep)
Feb 4, 2025, 03:03 PM • Last activity: Feb 4, 2025, 07:12 PM
0 votes
1 answer
130 views
Not able to use drbd over lustre zfs
I am running the command below to format the NVMe drive for Lustre with backfstype set to zfs:
mkfs.lustre --mdt --reformat --mgsnode=mgsmaster@tcp --backfstype=zfs --fsname=lustre mdtpool/mdt --index=0 /dev/nvme1n1
After formatting the drive I am trying to set up DRBD using the resource config below:
resource "r0" {
protocol C; //Updated
device /dev/drbd0; //Updated
disk /dev/nvme1n1;
meta-disk internal;
options {
auto-promote no;
}
on "hostname1" {
node-id 0;
}
on "hostname2" {
node-id 1;
}
connection {
host "hostname1" address 10.40.40.1:7789;
host "hostname2" address 10.40.40.2:7789;
}
}
After this I run the command below to initialize DRBD:
drbdadm create-md r0
open(/dev/nvme1n1) failed: Device or resource busy
...
I found this post on the DRBD website where it's mentioned that ZFS doesn't hold the device open in the kernel the same way other filesystems or processes in Linux do: [DRBD Reference website for zfs](https://kb.linbit.com/using-zfs-over-drbd-with-pacemaker)

Need some help here. I tried first setting up DRBD on the drive and then formatting with mkfs.lustre, but then it's the same problem in reverse: mkfs.lustre wants the DRBD service to be down, and only after that can it format.

Update: the issue has been resolved. The steps for replication are as follows:
drbdadm create-md r0
drbdadm up r0
drbdadm primary r0 --force
mkfs.lustre --mdt --reformat --mgsnode=mgsmaster@tcp --backfstype=zfs --fsname=lustre mdtpool/mdt --index=0 /dev/drbd0
mount -t mdtpool/mdt /mnt/MDT
If we want to remount after reboot
drbdadm up r0
drbdadm primary r0 --force
zpool import -o cachefile=none mdtpool
mount -t mdtpool/mdt /mnt/MDT
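A small sketch of the remount-after-reboot sequence as a guarded script; the resource, pool and mount point are the ones from the question, while the role check and the explicit lustre filesystem type are added assumptions:

```
#!/bin/sh
set -e
drbdadm up r0
drbdadm primary --force r0
# Only import the pool once this node actually reports the Primary role
if drbdadm role r0 | grep -q '^Primary'; then
    zpool import -o cachefile=none mdtpool
    # the question's command omitted the filesystem type; lustre is assumed here
    mount -t lustre mdtpool/mdt /mnt/MDT
fi
```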
Neil Karania (3 rep)
Jun 11, 2024, 09:09 AM • Last activity: Jun 13, 2024, 08:56 AM
0 votes
1 answer
358 views
Prevent promotion on specific node in Pacemaker
I have a drbd + pacemaker cluster with three nodes, one being a quorum device only. I'm trying to configure the pacemaker resource so that the promotable drbd resource runs on all three nodes, but is never promoted on the quorum device. I've tried setting location constraints on the resource, but that results in pacemaker not starting the resource at all on the quorum device, so drbd can't keep quorum on a failover. The desired state would be:

- drbd resource is started on all three nodes
- drbd resource is promotable
- pacemaker will never promote the quorum device

I can't find anything in the documentation, but what I'm envisioning would be a parameter like "don't promote on device X" that I have missed for the drbd resource.
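For reference, a hedged sketch of one common way to express this: a location rule that applies only to the master (promoted) role, so the clone instance can still run, but never be promoted, on the quorum node. The node name qnode and resource name drbd-clone are placeholders, and the rule syntax is assumed for pcs with a promotable clone:

```
# Ban only the Master role on the quorum node; the instance itself may still start there
pcs constraint location drbd-clone rule role=master score=-INFINITY \#uname eq qnode
```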
comrain (3 rep)
Mar 13, 2023, 07:23 AM • Last activity: Mar 13, 2023, 06:55 PM
0 votes
1 answer
11525 views
DRBD - 'node1' not defined in your config (for this host) - Error when setting Primary
I am getting the following error when trying to set the Primary node for DRBD:

'node1' not defined in your config (for this host).

I know this is related to DNS/hostname/hosts and the config clusterdb.res. I know this because I originally got an error when trying to start clusterdb.res if node1 didn't resolve correctly. So what confuses me is that I can start clusterdb.res if I either use this command on the hosts

hostnamectl set-hostname $(uname -n | sed s/\\..*//)

to make the hostname resolve to node1 instead of node1.localdomain, or add node1.localdomain to the config; either works. But I have tried all combinations and can't seem to get this command to take:

drbdadm primary --force node1 && cat /proc/drbd

**My Configs**

/etc/drbd.d/clusterdb.res

resource clusterdb {
    protocol C;
    meta-disk internal;
    device /dev/drbd0;
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret sync_disk;
    }
    syncer {
        rate 10M;
        al-extents 257;
        on-no-data-accessible io-error;
        verify-alg sha1;
    }
    on node1 {
        disk /dev/sda3;
        address 192.168.1.216:7788;
    }
    on node2 {
        disk /dev/sda3;
        address 192.168.1.217:7788;
    }
}

/etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.216 node1
192.168.1.217 node2

/etc/hostname

node#

My full write up ATM (wip)

**Edits:**

[root@node1 ~]# hostname
node1
[root@node1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1   node1
192.168.1.216 node1
192.168.1.217 node2
[root@node1 ~]#

Update: I have gotten this to work with LVM following this guide exactly, so I think my issue actually lies with the following lines of code. But for now I think I will stick with LVM since it works, unless somebody else really wants to work on this. (My working LVM writeup)

device /dev/drbd0;

or

device /dev/drbd0;

The reason I say this is that I used the same hosts/hostname/shortname/ip_addr but with LVM and it worked, but maybe I missed something the first time that I fixed in my new VM template (I started from scratch to build LVM).
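For reference, drbdadm primary expects the resource name from the .res file rather than a host name, so with the config above the invocation would look like this (a sketch based on the configuration shown; --force is normally only used for the very first promotion):

```
# The argument is the resource name (clusterdb), not the node name
drbdadm primary --force clusterdb && cat /proc/drbd
drbdadm role clusterdb
```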
FreeSoftwareServers (2682 rep)
May 1, 2016, 01:59 AM • Last activity: Mar 8, 2023, 02:50 AM
1 vote
2 answers
40135 views
warning /dev/centos/root swap centos-root does not exist -- after configuring DRBD
I configured DRBD on a default CentOS 7.3 installation as follows: /dev/centos/home was taking all the space in sda2, so I reduced it and created /dev/centos/home (20% of the space) and /dev/centos/drbd (the remaining space, using lvcreate -l 100%VG -n drbd centos). The DRBD resource device is /dev/drbd0 and the disk is /dev/mapper/centos-drbd, formatted as ext4.

Now every time I reboot the system I get the errors:

Warning: /dev/centos/root does not exist
Warning: /dev/centos/swap does not exist
Warning: /dev/mapper/centos-root does not exist

From the dracut shell I run:

$ lvm vgscan
$ lvm vgchange -ay
$ exit

and the system boots fine. But it fails again at the next reboot. Any solution?

----------

UPDATE: Found the cause, the drbd device was causing the issue. I removed it from both servers and it fixed the 2nd server but not the 1st one. blkid still shows the wrong UUID and TYPE for /dev/sda2:

$ blkid
/dev/sda1: UUID="bdfa3672-b24b-41ec-88f8-d0f0a81057d1" TYPE="xfs"
/dev/sda2: UUID="d8d241f07976f3ce" TYPE="drbd"
/dev/mapper/centos-swap: UUID="3c8653bb-060a-4e46-8eaa-ce51637752ee" TYPE="swap"
/dev/mapper/centos-root: UUID="93941d8b-22e0-4ad7-8666-1ce8ba8d1109" TYPE="xfs"
/dev/mapper/centos-home: UUID="63c9a5ad-9b4b-4852-8e95-22b356d8729a" TYPE="xfs"
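For reference, a hedged sketch of how a stale DRBD signature on a partition is usually inspected and, if confirmed stale, cleared (wipefs is assumed to be available; clearing signatures is destructive, so only do it with DRBD removed and backups in place):

```
# List every filesystem/metadata signature found on the partition (read-only)
wipefs /dev/sda2

# Erase only the reported DRBD signature at the offset wipefs printed
# (destructive -- substitute the real offset from the listing above)
# wipefs --offset <offset> /dev/sda2
```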
bakasan (21 rep)
Feb 19, 2018, 12:15 AM • Last activity: Mar 2, 2023, 03:51 PM
0 votes
1 answer
398 views
DRBD on top of Ceph
Would it be possible to have DRBD running directly inside a Ceph pool? I have a backup machine with files stored directly on disk. The offsite backup machine has Ceph installed and configured on all the disks. I would like to have a second replica of the backup data on the offsite backup machine, but I'm a bit confused about which 'layers' DRBD and Ceph operate at. Would it be possible to create an RBD pool at the offsite backup machine and configure DRBD directly on that, or do I need to go the route where I run a virtual machine using Ceph and configure DRBD in the virtual machine as an abstraction layer?

Edit: The reason the (single-node) offsite backup machine is running Ceph is that it is mirroring the pools of a (multi-node) main Ceph cluster. In addition to the main Ceph cluster we have a backup server creating file backups of the machines running on the cluster. This is a simple RAID5 configuration where the data is stored. To have an extra copy of the backup data I also want to sync it, using DRBD so that I do not have a problem with small files, to the offsite backup machine. But as the disks of the backup machine are already configured as Ceph OSDs, I need to store it somehow in a Ceph pool.
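For reference, a hedged sketch of the layering the question asks about: an RBD image is created and mapped as a kernel block device, and that device is then used as the DRBD backing disk. Pool, image, host names and addresses below are placeholders:

```
# Create and map an RBD image; it appears as /dev/rbd0 (or similar)
rbd create --size 2T backup-pool/drbd-backing
rbd map backup-pool/drbd-backing

# Excerpt of a DRBD resource using the mapped device as its backing disk
resource backup {
    device    /dev/drbd0;
    disk      /dev/rbd0;
    meta-disk internal;
    on officebackup  { address 192.0.2.10:7789; }
    on offsitebackup { address 192.0.2.20:7789; }
}
```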
Mr. Diba (400 rep)
Feb 14, 2023, 09:50 AM • Last activity: Feb 15, 2023, 03:45 PM
1 votes
1 answer
639 views
DRBD service vs drbdadm
DRBD v9.17 (kernel v9.1.4)

I'm trying to understand the typical roles of drbd when run as a service vs manually with the drbdadm tool, which seems newer than some of the walkthroughs I'm seeing online. When should the service be used vs the drbdadm tool, and where does pacemaker fit in regarding control of drbd's failover?

I have created a resource manually using drbdadm and have it up, now showing UpToDate with the status command on both nodes. I need to figure out how to get that implemented in pacemaker for failover, and I just think I'm missing the big picture here - the addition of the resource in pacemaker seems a bit more complicated than that of a floating IP...

Thanks for reading!
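For reference, a hedged sketch of the usual division of labour: the drbd service stays disabled and a promotable ocf:linbit:drbd resource lets Pacemaker bring the resource up and choose the Primary. Resource names are placeholders; the pcs syntax assumed is the same style used elsewhere on this page:

```
# Pacemaker, not the drbd service, manages the resource
systemctl disable --now drbd

# Promotable clone wrapping the DRBD resource "r0"
pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=30s
pcs resource promotable drbd_r0 promoted-max=1 promoted-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true
```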
userthirtytwo (13 rep)
Nov 11, 2021, 01:13 PM • Last activity: Nov 12, 2021, 04:55 PM
0 votes
0 answers
548 views
Suddenly cannot ping a certain host even though it worked before, and it works vice versa
I cannot ping one certain host $B from only one certain host $A, even though it worked before.

* [A] ping $B --> destination host unreachable
* [B] ping $A works.
* [C] ping $B works.
* [A] traceroute $B result:
1. $B 3005.339 ms !H 3005.277 ms !H 3005.250 ms !H
Host $B has two Ethernet interfaces (eth0 by default, eth1 for drbd, with DEFROUTE=no for eth1) and is used as the primary node. eth0 and eth1 are in different subnets. Hosts A, B and C are all in the same subnet and have CentOS 7 installed.
y47999 (1 rep)
Oct 15, 2021, 07:34 AM • Last activity: Oct 15, 2021, 10:23 AM
3 votes
1 answer
9560 views
DRBD Failure: (127) Device minor not allocated
I use VMware Workstation to run two virtual machines with OpenVZ 2.6.32-042stab108.2 installed on top of CentOS 6.6. I have created another primary partition, /dev/sda4, to configure it as a drbd resource. I also created a filesystem on it. The second machine is actually created using the virtual disk of the first one, with changed hostname and eth0 IP address. The drbd configuration file is this:

global { usage-count no; }
common { syncer { rate 100M; } }
resource r0 {
    protocol C;
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "password";
    }
    on primary {
        device /dev/drbd0;
        disk /dev/sda4;
        address 192.168.18.10:7788;
        meta-disk internal;
    }
    on secondary {
        device /dev/drbd0;
        disk /dev/sda4;
        address 192.168.18.20:7788;
        meta-disk internal;
    }
}

After creating the resource with **drbdadm create-md r0**, when I enter **service drbd start**, I get:

Failure: (127) Device minor not allocated.

The output of **drbdadm dump all** might be helpful:

[root@primary ~]# drbdadm dump all
# /etc/drbd.conf
# resource r0 on primary: not ignored, not stacked
resource r0 {
    protocol C;
    on primary {
        device /dev/drbd0 minor 0;
        disk /dev/sda4;
        address ipv4 192.168.18.10:7788;
        meta-disk internal;
    }
    on secondary {
        device /dev/drbd0 minor 0;
        disk /dev/sda4;
        address ipv4 192.168.18.20:7788;
        meta-disk internal;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret danuts;
    }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
}

What is causing this error and how can it be mitigated? Thanks!
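For reference, a hedged sketch of the usual first checks when the init script reports "Device minor not allocated": confirming that the kernel module is loaded and that the resource has actually been brought up so the minor exists:

```
# Is the drbd kernel module present?
lsmod | grep drbd || modprobe drbd

# Bring the resource up (this allocates the minor) and inspect the state
drbdadm up r0
cat /proc/drbd
drbdadm dstate r0
```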
Tanatos Daniel (295 rep)
Jun 8, 2015, 11:52 PM • Last activity: Jan 16, 2021, 04:04 PM
0 votes
2 answers
4199 views
DRBD comes up after reboot with Connected Diskless/Diskless
After an unattended power loss I am facing a major issue: on every reboot **DRBD** comes up with **Connected Diskless/Diskless** status.

**Main problems:**

> - dump-md response: Found meta data is "unclean"
> - apply-al command terminated with exit code 20 with message open(/dev/nvme0n1p1) failed: Device or resource busy
> - the device in the drbd resource config cannot be opened exclusively.

**About the environment:**

This drbd resource is normally used as block storage for LVM, which is configured as (shared LVM) storage for a Proxmox VE 5.3-8 cluster. On top of the drbd block device an LVM is configured. As recommended, in the drbd host's LVM config the device (/dev/nvme0n1p1) below the drbd service is filtered out (/etc/lvm/lvm.conf shown below).

*The device under drbd is a PCIe NVMe device.* It has some extra properties shown by systemctl:

root@pmx0:~# systemctl list-units | grep nvme
sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1-nvme0n1p1.device loaded active plugged /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
sys-devices-pci0000:00-0000:00:01.1-0000:0c:00.0-nvme-nvme0-nvme0n1.device loaded active plugged /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1

Other storage devices, normal SAS disks, look a little different in the systemctl listing:

root@pmx0:~# systemctl list-units | grep sdb
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb1.device loaded active plugged PERC_H710 1
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb-sdb2.device loaded active plugged PERC_H710 2
sys-devices-pci0000:00-0000:00:01.0-0000:0b:00.0-host0-target0:2:1-0:2:1:0-block-sdb.device loaded active plugged PERC_H710

Listing the NVMe /sys/devices/.. with ls:

root@pmx0:~# ls /sys/devices/pci0000:00/0000:00:01.1/0000:0c:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
alignment_offset dev discard_alignment holders inflight partition power ro size start stat subsystem trace uevent

**Things that did NOT help:**

- rebooting again
- restarting the drbd service
- drbdadm detach/disconnect/attach/service restart
- the nfs-kernel-server service isn't configured on these drbd nodes (so I cannot unconfigure the nfs-server)

**After some investigation:**

> dump-md response: Found **meta data is "unclean"**, please apply-al first
> apply-al command terminated with exit code 20 with this message:
> open(/dev/nvme0n1p1) **failed: Device or resource busy**
>
> It seems that the **problem is that this device** (/dev/nvme0n1p1) used by my
> drbd resource config **cannot be opened exclusive**.

**Failing DRBD commands:**

root@pmx0:~# drbdadm attach r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
root@pmx0:~# drbdadm apply-al r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
root@pmx0:~# drbdadm dump-md r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Exclusive open failed. Do it anyways? [need to type 'yes' to confirm] yes
Found meta data is "unclean", please apply-al first
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal dump-md' terminated with exit code 255

**DRBD service status/commands:**

root@pmx0:~# drbd-overview
 0:r0/0 Connected Secondary/Secondary Diskless/Diskless
root@pmx0:~# drbdadm dstate r0
Diskless/Diskless
root@pmx0:~# drbdadm disconnect r0
root@pmx0:~# drbd-overview
 0:r0/0 . . .
root@pmx0:~# drbdadm detach r0
root@pmx0:~# drbd-overview
 0:r0/0 . . .

**Trying to reattach resource r0:**

root@pmx0:~# drbdadm attach r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20
root@pmx0:~# drbdadm apply-al r0
open(/dev/nvme0n1p1) failed: Device or resource busy
Operation canceled.
Command 'drbdmeta 0 v08 /dev/nvme0n1p1 internal apply-al' terminated with exit code 20

**lsof, fuser zero output:**

root@pmx0:~# lsof /dev/nvme0n1p1
root@pmx0:~# fuser /dev/nvme0n1p1
root@pmx0:~# fuser /dev/nvme0n1
root@pmx0:~# lsof /dev/nvme0n1

**Resource disk partition and LVM config:**

root@pmx0:~# fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.9 TiB, 2048408248320 bytes, 4000797360 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x59762e31
Device         Boot Start        End    Sectors  Size Id Type
/dev/nvme0n1p1       2048 3825207295 3825205248  1.8T 83 Linux
root@pmx0:~# pvs
  PV         VG  Fmt  Attr PSize   PFree
  /dev/sdb2  pve lvm2 a--  135.62g 16.00g
root@pmx0:~# vgs
  VG  #PV #LV #SN Attr   VSize   VFree
  pve   1   3   0 wz--n- 135.62g 16.00g
root@pmx0:~# lvs
  LV   VG  Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data pve twi-a-tz-- 75.87g             0.00   0.04
  root pve -wi-ao---- 33.75g
  swap pve -wi-ao----  8.00g
root@pmx0:~# vi /etc/lvm/lvm.conf
root@pmx0:~# cat /etc/lvm/lvm.conf | grep nvm
filter = [ "r|/dev/nvme0n1p1|", "a|/dev/sdb|", "a|sd.*|", "a|drbd.*|", "r|.*|" ]

**DRBD resource config:**

root@pmx0:~# cat /etc/drbd.d/r0.res
resource r0 {
    protocol C;
    startup {
        wfc-timeout 0;    # non-zero wfc-timeout can be dangerous (http://forum.proxmox.com/threads/3465-Is-it-safe-to-use-wfc-timeout-in-DRBD-configuration)
        degr-wfc-timeout 300;
        become-primary-on both;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "*********";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        #data-integrity-alg crc32c;    # has to be enabled only for test and disabled for production use (check man drbd.conf, section "NOTES ON DATA INTEGRITY")
    }
    on pmx0 {
        device /dev/drbd0;
        disk /dev/nvme0n1p1;
        address 10.0.20.15:7788;
        meta-disk internal;
    }
    on pmx1 {
        device /dev/drbd0;
        disk /dev/nvme0n1p1;
        address 10.0.20.16:7788;
        meta-disk internal;
    }
    disk {
        # no-disk-barrier and no-disk-flushes should be applied only to systems with non-volatile (battery backed) controller caches.
        # Follow links for more information:
        # http://www.drbd.org/users-guide-8.3/s-throughput-tuning.html#s-tune-disable-barriers
        # http://www.drbd.org/users-guide/s-throughput-tuning.html#s-tune-disable-barriers
        no-disk-barrier;
        no-disk-flushes;
    }
}

**OTHER NODE:**

root@pmx1:~# drbd-overview
 0:r0/0 Connected Secondary/Secondary Diskless/Diskless

and so on; every command response and configuration shows the same as on node pmx0 above...
**Debian and DRBD versions:** root@pmx0:~# uname -a Linux pmx0 4.15.18-10-pve #1 SMP PVE 4.15.18-32 (Sat, 19 Jan 2019 10:09:37 +0100) x86_64 GNU/Linux root@pmx0:~# cat /etc/debian_version 9.8 root@pmx0:~# dpkg --list| grep drbd ii drbd-utils 8.9.10-2 amd64 RAID 1 over TCP/IP for Linux (user utilities) root@pmx0:~# lsmod | grep drbd drbd 364544 1 lru_cache 16384 1 drbd libcrc32c 16384 2 dm_persistent_data,drbd root@pmx0:~# modinfo drbd filename: /lib/modules/4.15.18-10-pve/kernel/drivers/block/drbd/drbd.ko alias: block-major-147-* license: GPL version: 8.4.10 description: drbd - Distributed Replicated Block Device v8.4.10 author: Philipp Reisner , Lars Ellenberg srcversion: 9A7FB947BDAB6A2C83BA0D4 depends: lru_cache,libcrc32c retpoline: Y intree: Y name: drbd vermagic: 4.15.18-10-pve SMP mod_unload modversions parm: allow_oos:DONT USE! (bool) parm: disable_sendpage:bool parm: proc_details:int parm: minor_count:Approximate number of drbd devices (1-255) (uint) parm: usermode_helper:string **MOUNTS:** root@pmx0:~# cat /proc/mounts sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0 proc /proc proc rw,relatime 0 0 udev /dev devtmpfs rw,nosuid,relatime,size=24679656k,nr_inodes=6169914,mode=755 0 0 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=4940140k,mode=755 0 0 /dev/mapper/pve-root / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0 securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0 tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0 tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0 tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0 cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0 pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0 cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0 cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0 cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0 cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0 cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0 cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0 cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0 cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0 cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0 cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0 cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0 systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=39,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20879 0 0 debugfs /sys/kernel/debug debugfs rw,relatime 0 0 hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0 mqueue /dev/mqueue mqueue rw,relatime 0 0 sunrpc /run/rpc_pipefs rpc_pipefs rw,relatime 0 0 configfs /sys/kernel/config configfs rw,relatime 0 0 fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0 /dev/sda1 /mnt/intelSSD700G ext3 rw,relatime,errors=remount-ro,data=ordered 0 0 lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0 /dev/fuse /etc/pve fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0 10.0.0.15:/samba/shp 
/mnt/pve/bckNFS nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.0.15,mountvers=3,mountport=42772,mountproto=udp,local_lock=none,addr=10.0.0.15 0 0
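For reference, a hedged sketch of how to find out what is holding /dev/nvme0n1p1 open when lsof and fuser show nothing; block-layer holders such as device-mapper, multipath or LVM claims do not show up as processes:

```
# Kernel-level holders of the partition
ls -l /sys/class/block/nvme0n1p1/holders/
lsblk /dev/nvme0n1
dmsetup ls --tree
# If multipath has claimed the NVMe device it would show up here
multipath -ll 2>/dev/null
```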
Gabor Koltai (101 rep)
Mar 12, 2019, 02:40 PM • Last activity: Aug 6, 2020, 12:59 PM
0 votes
1 answer
4348 views
DRBD: "Couldn't mount device [/dev/drbd0] as /mydata" when failing over or rebooting node
I'm creating a cluster system using two ESXi hosts, with a CentOS 7 server on each. Going through, I created the filesystem, and it mounts on node1. When I perform a standby or reboot from node01 to node02 the failover works as it should. However, if I perform it from node02 back to node01 it returns a resource error about failing to mount the filesystem under /mbdata. I am receiving this message:

Failed Resource Actions:
* mb-drbdFS_start_0 on node01 'unknown error' (1): call=75, status=complete, exitreason='Couldn't mount device [/dev/drbd0] as /mbdata', last-rc-change='Thu May 7 16:09:25 2020', queued=1ms, exec=129ms

When I clean the resources, and node02 is online, it starts running again. I have googled to see why I am getting this error, but the only thing I can see is that the server is not notifying the new master that it is in fact the master (not slave). But I haven't found anything to help me to activate this. I have tried umount on both systems - but on node02 I usually get that it is not mounted. I have tried mounting the system on both (but then one is read-only and that defeats the purpose of the cluster controlling it). I was following a tutorial in the beginning but they didn't list having the error - they just said it kicks over to the new node, so I'm lost! The only difference is that I did not use /mnt as the destination, but my own directory - but I didn't think that would be the problem.

What I'm trying to have is:

- a fence on each ESXi host (physical server, to reboot its own VM)
- a DRBD storage so I can have shared storage
- a virtual IP for client access
- Apache to run the web server
- MariaDB for the SQL database
- run them on the same servers (colocation) and have the other as full standby

When it does run I have:

[root@node01 ~]# pcs status
Cluster name: mb_cluster
Stack: corosync
Current DC: node01 (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Fri May 8 09:46:59 2020
Last change: Fri May 8 09:22:59 2020 by hacluster via crmd on node01

2 nodes configured
8 resources configured

Online: [ node01, node02 ]

Full list of resources:

 mb-fence-01    (stonith:fence_vmware_soap):    Started node01
 mb-fence-02    (stonith:fence_vmware_soap):    Started node02
 Master/Slave Set: mb-clone [mb-data]
     Masters: [ node01 ]
     Slaves: [ node02 ]
 Resource Group: mb-group
     mb-drbdFS  (ocf::heartbeat:Filesystem):    Started node01
     mb-vip     (ocf::heartbeat:IPaddr2):       Started node01
     mb-web     (ocf::heartbeat:apache):        Started node01
     mb-sql     (ocf::heartbeat:mysql):         Started node01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

And the constraints:

[root@node01 ~]# pcs constraint list --full
Location Constraints:
  Resource: mb-fence-01
    Enabled on: node01 (score:INFINITY) (id:location-mb-fence-01-node01-INFINITY)
  Resource: mb-fence-02
    Enabled on: node02 (score:INFINITY) (id:location-mb-fence-02-node02-INFINITY)
Ordering Constraints:
  start mb-drbdFS then start mb-vip (kind:Mandatory) (id:order-mb-drbdFS-mb-vip-mandatory)
  start mb-vip then start mb-web (kind:Mandatory) (id:order-mb-vip-mb-web-mandatory)
  start mb-vip then start mb-sql (kind:Mandatory) (id:order-mb-vip-mb-sql-mandatory)
  promote mb-clone then start mb-drbdFS (kind:Mandatory) (id:order-mb-clone-mb-drbdFS-mandatory)
Colocation Constraints:
  mb-drbdFS with mb-clone (score:INFINITY) (with-rsc-role:Master) (id:colocation-mb-drbdFS-mb-clone-INFINITY)
  mb-vip with mb-drbdFS (score:INFINITY) (id:colocation-mb-vip-mb-drbdFS-INFINITY)
  mb-web with mb-vip (score:INFINITY) (id:colocation-mb-web-mb-vip-INFINITY)
  mb-sql with mb-vip (score:INFINITY) (id:colocation-mb-sql-mb-vip-INFINITY)
Ticket Constraints:
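For reference, a hedged sketch of the usual checks when a Filesystem resource fails to mount only after failing back - the DRBD state on the failing node and the kernel log from the failed mount attempt:

```
# On node01, right after the failed start:
cat /proc/drbd                        # is the local disk UpToDate and the peer connected?
drbdadm cstate all
dmesg | tail -n 20                    # the real mount error (fs state, split brain, ...)
pcs resource debug-start mb-drbdFS    # re-run the start operation with verbose output
```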
markb (143 rep)
May 7, 2020, 06:34 AM • Last activity: May 8, 2020, 12:16 AM
1 vote
0 answers
480 views
Failover using pcs causes DRBD disk to become unavailable on source server: "No such resource"
I am working on DRBD and PCS to run a 2-node cluster. With the config, the virtual_IP and DRBD disk work fine on the first node. Then I test the failover with "pcs cluster stop" on the master; the disk and virtual IP get properly migrated to the second node. However, on the first node the disk becomes unavailable:

drbdadm status
Error: cluster is not currently running on this node
opt_disk: No such resource
Command 'drbdsetup-84 status opt_disk' terminated with exit code 10

Configuration:

Cluster Name: cluster_zmbx1
Corosync Nodes:
 host_1 host_2
Pacemaker Nodes:
 host_1 host_2

Resources:
 Master: Z_Root
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: zroot (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=opt_disk
   Operations: demote interval=0s timeout=90 (zroot-demote-interval-0s)
               monitor interval=30s (zroot-monitor-interval-30s)
               notify interval=0s timeout=90 (zroot-notify-interval-0s)
               promote interval=0s timeout=90 (zroot-promote-interval-0s)
               reload interval=0s timeout=30 (zroot-reload-interval-0s)
               start interval=0s timeout=240 (zroot-start-interval-0s)
               stop interval=0s timeout=100 (zroot-stop-interval-0s)
 Resource: z_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/opt/ fstype=ext4 options=noatime
  Operations: monitor interval=20s timeout=40s (z_fs-monitor-interval-20s)
              notify interval=0s timeout=60s (z_fs-notify-interval-0s)
              start interval=0s timeout=60s (z_fs-start-interval-0s)
              stop interval=0s timeout=60s (z_fs-stop-interval-0s)
 Resource: MailIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=20 ip=10.64.200.21 nic=eth0
  Operations: monitor interval=10s (MailIP-monitor-interval-10s)
              start interval=0s timeout=20s (MailIP-start-interval-0s)
              stop interval=0s timeout=20s (MailIP-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote Z_Root then start z_fs (kind:Mandatory)
  start z_fs then start MailIP (kind:Mandatory)
Colocation Constraints:
  z_fs with Z_Root (score:INFINITY) (with-rsc-role:Master)
  MailIP with z_fs (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 200
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster_zmbx1
 dc-version: 1.1.19-8.el7_6.4-c3c624ea3d
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false

Quorum:
 Options:
  auto_tie_breaker: 0
  last_man_standing: 1
  wait_for_all: 1

Logs on source host when failover happens:

Jul 25 20:31:56 host_1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Jul 25 20:31:56 host_1 pacemakerd: notice: Caught 'Terminated' signal Jul 25 20:31:56 host_1 pacemakerd: notice: Shutting down Pacemaker Jul 25 20:31:56 host_1 pacemakerd: notice: Stopping crmd Jul 25 20:31:56 host_1 crmd: notice: Caught 'Terminated' signal Jul 25 20:31:56 host_1 crmd: notice: Shutting down cluster resource manager Jul 25 20:31:56 host_1 crmd: notice: State transition S_IDLE -> S_POLICY_ENGINE Jul 25 20:31:56 host_1 pengine: notice: On loss of CCM Quorum: Ignore Jul 25 20:31:56 host_1 pengine: notice: Scheduling Node host_1 for shutdown Jul 25 20:31:56 host_1 pengine: notice: * Shutdown host_1 Jul 25 20:31:56 host_1 pengine: notice: * Promote zroot:0 ( Slave -> Master host_2 ) Jul 25 20:31:56 host_1 pengine: notice: * Stop zroot:1 ( Master host_1 ) due to node availability Jul 25 20:31:56 host_1 pengine: notice: * Move z_fs ( host_1 -> host_2 ) Jul 25 20:31:56 host_1 pengine: notice: * Move MailIP ( host_1 -> host_2 ) Jul 25 20:31:56 host_1 pengine: notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-3930.bz2 Jul 25 20:31:56 host_1 crmd: notice: Initiating cancel operation zroot_monitor_30000 on host_2 Jul 25 20:31:56 host_1 crmd: notice: Initiating stop operation MailIP_stop_0 locally on host_1 Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_pre_notify_demote_0 on host_2 Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_pre_notify_demote_0 locally on host_1 Jul 25 20:31:56 host_1 crmd: notice: Result of notify operation for zroot on host_1: 0 (ok) Jul 25 20:31:56 host_1 IPaddr2(MailIP): INFO: IP status = ok, IP_CIP= Jul 25 20:31:56 host_1 crmd: notice: Result of stop operation for MailIP on host_1: 0 (ok) Jul 25 20:31:56 host_1 crmd: notice: Initiating stop operation z_fs_stop_0 locally on host_1 Jul 25 20:31:56 host_1 Filesystem(z_fs): INFO: Running stop for /dev/drbd0 on /opt Jul 25 20:31:56 host_1 Filesystem(z_fs): INFO: Trying to unmount /opt Jul 25 20:31:56 host_1 Filesystem(z_fs): INFO: unmounted /opt successfully Jul 25 20:31:56 host_1 crmd: notice: Result of stop operation for z_fs on host_1: 0 (ok) Jul 25 20:31:56 host_1 crmd: notice: Initiating demote operation zroot_demote_0 locally on host_1 Jul 25 20:31:56 host_1 kernel: block drbd0: role( Primary -> Secondary ) Jul 25 20:31:56 host_1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. 
Jul 25 20:31:56 host_1 crmd: notice: Result of demote operation for zroot on host_1: 0 (ok) Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_post_notify_demote_0 on host_2 Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_post_notify_demote_0 locally on host_1 Jul 25 20:31:56 host_1 crmd: notice: Result of notify operation for zroot on host_1: 0 (ok) Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_pre_notify_stop_0 on host_2 Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_pre_notify_stop_0 locally on host_1 Jul 25 20:31:56 host_1 crmd: notice: Result of notify operation for zroot on host_1: 0 (ok) Jul 25 20:31:56 host_1 crmd: notice: Initiating stop operation zroot_stop_0 locally on host_1 Jul 25 20:31:56 host_1 kernel: drbd opt_disk: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) Jul 25 20:31:56 host_1 kernel: drbd opt_disk: ack_receiver terminated Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_a_opt_disk Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Connection closed Jul 25 20:31:56 host_1 kernel: drbd opt_disk: conn( Disconnecting -> StandAlone ) Jul 25 20:31:56 host_1 kernel: drbd opt_disk: receiver terminated Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_r_opt_disk Jul 25 20:31:56 host_1 kernel: block drbd0: disk( UpToDate -> Failed ) Jul 25 20:31:56 host_1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map. Jul 25 20:31:56 host_1 kernel: block drbd0: disk( Failed -> Diskless ) Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_w_opt_disk Jul 25 20:31:56 host_1 crmd: notice: Transition aborted by deletion of nvpair[@id='status-1-master-zroot']: Transient attribute change Jul 25 20:31:56 host_1 crmd: notice: Result of stop operation for zroot on host_1: 0 (ok) Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_post_notify_stop_0 on host_2 Jul 25 20:31:56 host_1 crmd: notice: Transition 4 (Complete=25, Pending=0, Fired=0, Skipped=2, Incomplete=13, Source=/var/lib/pacemaker/pengine/pe-input-3930.bz2): Stopped Jul 25 20:31:56 host_1 pengine: notice: On loss of CCM Quorum: Ignore Jul 25 20:31:56 host_1 pengine: notice: Scheduling Node host_1 for shutdown Jul 25 20:31:56 host_1 pengine: notice: * Shutdown host_1 Jul 25 20:31:56 host_1 pengine: notice: * Promote zroot:0 ( Slave -> Master host_2 ) Jul 25 20:31:56 host_1 pengine: notice: * Start z_fs ( host_2 ) Jul 25 20:31:56 host_1 pengine: notice: * Start MailIP ( host_2 ) Jul 25 20:31:56 host_1 pengine: notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-3931.bz2 Jul 25 20:31:56 host_1 crmd: notice: Initiating notify operation zroot_pre_notify_promote_0 on host_2 Jul 25 20:31:57 host_1 crmd: notice: Initiating promote operation zroot_promote_0 on host_2 Jul 25 20:31:57 host_1 crmd: notice: Initiating notify operation zroot_post_notify_promote_0 on host_2 Jul 25 20:31:57 host_1 crmd: notice: Transition aborted by status-2-master-zroot doing modify master-zroot=10000: Transient attribute change Jul 25 20:31:57 host_1 crmd: notice: Transition 5 (Complete=10, Pending=0, Fired=0, Skipped=1, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-3931.bz2): Stopped
irfan (11 rep)
Jul 25, 2019, 07:10 PM
1 vote
1 answer
616 views
Why does udev concatenate two symlinks?
My distribution is SLES 12 SP 2. I am working with DRBD (distributed replicated block device) right now. Here is an example of a configuration for a drbd resource.

resource HA1dat {
    device /dev/drbd1;
    disk /dev/mqdat/HA1;
    meta-disk internal;
    on srv0 {
        address 192.168.174.10:7000;
        node-id 0;
    }
    on srv1 {
        address 192.168.174.11:7000;
        node-id 1;
    }
    on srv9 {
        address 192.168.174.19:7000;
        node-id 2;
    }
    connection-mesh {
        hosts srv0 srv1 srv9;
    }
}

After activating this resource as a drbd device, normally the following objects are created:

brw-rw---- 1 root disk 147, 1 Feb 13 19:41 /dev/drbd1
lrwxrwxrwx 1 root root 14 Feb 13 19:41 /dev/drbd/by-res/HA1dat/0 -> ../../../drbd1
lrwxrwxrwx 1 root root 14 Feb 13 19:41 /dev/drbd/by-disk/mqdat/HA1 -> ../../../drbd1

These are the block device itself and two symbolic links to it. This is the correct behaviour!

On some other servers (same distribution) the behaviour is different. The configuration of the drbd resource is very similar.

resource LN0N001Edat {
    device /dev/drbd1;
    disk /dev/data1vg/LN0N001E_lv;
    meta-disk internal;
    on sedcmmwd0030 {
        address xxx.yyy.zzz.83:7000;
        node-id 0;
    }
    on sedcmmwd0040 {
        address xxx.yyy.zzz.99:7000;
        node-id 1;
    }
    on sedcmmwd0050 {
        address xxx.yyy.zzz.100:7000;
        node-id 2;
    }
    connection-mesh {
        hosts sedcmmwd0030 sedcmmwd0040 sedcmmwd0050;
    }
}

After activating this resource, only the block device and one link are created:

brw-rw---- 1 root disk 147, 1 Mar 2 09:49 /dev/drbd1
lrwxrwxrwx 1 root root 23 Mar 2 09:49 /dev/drbd/by-res/LN0N001Edat/0_drbd/by-disk/data1vg/LN0N001E_lv -> ../../../../../../drbd1

This is the problem! This link is a concatenation of the two expected links. I have no idea why they are concatenated. The link is created by udev. The following output shows the differences in what udev does on these nodes.

Server with correct behaviour:

juser@srv0:~> udevadm info /dev/drbd1
P: /devices/virtual/block/drbd1
N: drbd1
S: drbd/by-disk/mqdat/HA1
S: drbd/by-res/HA1dat/0
E: DEVICE=drbd1
E: DEVLINKS=/dev/drbd/by-res/HA1dat/0 /dev/drbd/by-disk/mqdat/HA1
E: DEVNAME=/dev/drbd1
E: DEVPATH=/devices/virtual/block/drbd1
E: DEVTYPE=disk
E: MAJOR=147
E: MINOR=1
E: SUBSYSTEM=block
E: SYMLINK=drbd/by-res/HA1dat/0 drbd/by-disk/mqdat/HA1
E: TAGS=:systemd:
E: USEC_INITIALIZED=12263844870

Server with the problem:

root@sedcmmwd0030:/root : udevadm info /dev/drbd1
P: /devices/virtual/block/drbd1
N: drbd1
S: drbd/by-res/LN0N001Edat/0_drbd/by-disk/data1vg/LN0N001E_lv
E: DEVICE=drbd1
E: DEVLINKS=/dev/drbd/by-res/LN0N001Edat/0_drbd/by-disk/data1vg/LN0N001E_lv
E: DEVNAME=/dev/drbd1
E: DEVPATH=/devices/virtual/block/drbd1
E: DEVTYPE=disk
E: MAJOR=147
E: MINOR=1
E: SUBSYSTEM=block
E: SYMLINK=drbd/by-res/LN0N001Edat/0 drbd/by-disk/data1vg/LN0N001E_lv
E: TAGS=:systemd:
E: USEC_INITIALIZED=1212108486973

The SYMLINK lines are still both correct. The differences start with the DEVLINKS lines. The rules for drbd are the same in both cases:

cat /usr/lib/udev/rules.d/65-drbd.rules
# This file contains the rules to create named DRBD devices.

SUBSYSTEM!="block", GOTO="drbd_end"
KERNEL!="drbd*", GOTO="drbd_end"

IMPORT{program}="/sbin/drbdadm sh-udev minor-%m"

# Use symlink from the environment if available
ENV{SYMLINK}!="", SYMLINK="$env{SYMLINK}", GOTO="have_symlink"

# Legacy rules for older DRBD 8.3 & 8.4 when drbdadm sh-udev did not yet export SYMLINK
ENV{DISK}!="", SYMLINK+="drbd/by-disk/$env{DISK}"
ENV{RESOURCE}!="", SYMLINK+="drbd/by-res/$env{RESOURCE}"

LABEL="have_symlink"
ENV{DEVICE}=="drbd_?*", SYMLINK+="$env{DEVICE}"

LABEL="drbd_end"

Does anybody have an explanation for the creation of the erroneous link?
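For reference, a hedged sketch of how the rule evaluation can be traced on the affected node; this usually shows how the SYMLINK value exported by drbdadm sh-udev ends up in the concatenated DEVLINKS entry:

```
# Dry-run the udev rules for drbd1 and show every SYMLINK assignment
udevadm test /sys/devices/virtual/block/drbd1 2>&1 | grep -i symlink

# What drbdadm itself exports for this minor (the IMPORT{program} in 65-drbd.rules)
/sbin/drbdadm sh-udev minor-1
```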
user279269 (13 rep)
Mar 6, 2018, 11:19 PM • Last activity: Apr 17, 2019, 06:53 PM
2 votes
1 answer
596 views
correct method to corrupt super block in ext3 filesystem associated with drbd
I am trying to simulate filesystem superblock corruption. During this experiment I could not understand the difference between the two ways of corrupting the superblock below. Please help me understand the difference.

A DRBD device drbd1 is created on top of an LV (e.g. LV1), and a filesystem is created on top of the DRBD device:

**VG -> LV -> DRBD -> Ext3 FS**

> 1) dd if=/dev/zero of=/dev/VG1/LV1 count=1 bs=4096
>
> 2) dd if=/dev/zero of=/dev/drbd1 count=1 bs=4096

Is there any difference between the above two commands? My understanding is that we should not use command 1) to corrupt the filesystem if the filesystem is created on and associated with drbd.
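For reference, writes made directly to the backing LV bypass DRBD replication, which is generally why option 1) is discouraged. A hedged sketch of how the effect of such a test is usually verified and undone on an ext3 filesystem (the device path assumes the DRBD device from the question; backup superblock locations come from dumpe2fs):

```
# Locate the backup superblocks before corrupting anything
dumpe2fs /dev/drbd1 | grep -i superblock

# After overwriting the primary superblock, repair from a backup copy
# (32768 is a typical backup location for a 4k block size; use one reported above)
e2fsck -b 32768 /dev/drbd1
```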
sandeep nagendra (173 rep)
Nov 4, 2016, 07:18 AM • Last activity: Apr 17, 2019, 04:02 PM
0 votes
1 answer
1294 views
split-brain recovery with crm-fence-peer script
I implemented DRBD resource-level fencing with the crm-fence-peer.9.sh and crm-unfence-peer.9.sh scripts on both nodes. [screenshot]

Now, I have the following situation on my lab nodes:

- both nodes otrs1 and otrs2 are online
- resources are running on otrs1
- as per drbdadm status, otrs1 holds the primary role and otrs2 holds the secondary role

[screenshot]

Now when I reboot otrs1, on otrs2 I get the following message: [screenshot]

I can see that the resources are moved to otrs2: [screenshot]

I can see the location constraint created: [screenshot]

If the replication link becomes connected again and DRBD completes its synchronization process, then the constraint is removed. The cluster manager is now free to promote the resource. In fact the constraint is now removed: [screenshot]

But as soon as I disable the NIC on otrs2 (the currently active node) I can see that split brain occurred: [screenshot]

Obviously this is a split-brain scenario. Why is that so? Is it because

> In case of the crm-fence-peer script it is necessary that Pacemaker's
> communication stays available when DRBD's network link breaks.

Source: https://docs.linbit.com/docs/users-guide-9.0/#s-automatic-split-brain-recovery-configuration
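For reference, a hedged sketch of the DRBD configuration these scripts are normally wired into (resource-only fencing plus the fence/unfence handlers). The resource name is a placeholder, the script paths assume the stock drbd-utils locations, and the exact section for the fencing option differs between DRBD 8.4 (disk) and 9 (net), so check drbd.conf(5) for your version:

```
resource data {
    net {
        fencing resource-only;
    }
    handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.9.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }
    # device/disk/on sections as usual ...
}
```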
blabla_trace (385 rep)
Feb 21, 2019, 01:14 PM • Last activity: Feb 21, 2019, 05:22 PM
0 votes
1 answer
127 views
DRBD 9 internal metadata vs mkfs
Here are the steps I followed to create a two-node DRBD 9 setup:

1. Created the LVM volume
2. On each node created internal meta-data with drbdadm create-md 'resourcename'
3. On each node brought the resource up with drbdadm up 'resourcename'
4. Promoted one node with drbdadm primary --force 'resourcename' as the primary node
5. Made it secondary again with drbdadm secondary 'resourcename'
6. Formatted it: mkfs.ext4 /dev/drbd1

My question is, why does step 6 not wipe out the internal meta-data from step 2?
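For reference, a hedged sketch of why step 6 cannot touch the metadata: internal metadata lives at the end of the backing device, and /dev/drbd1 only exposes the remaining, slightly smaller usable area, so a filesystem created through the DRBD device never reaches it. Comparing the two sizes makes this visible (the backing device name is a placeholder):

```
# Size of the backing LV vs. the size DRBD exposes through /dev/drbd1;
# the difference is (roughly) the internal metadata at the end of the device
blockdev --getsize64 /dev/vg0/lv_drbd    # placeholder backing device
blockdev --getsize64 /dev/drbd1

# The metadata itself can be inspected (with the resource down) via:
drbdadm dump-md 'resourcename'
```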
blabla_trace (385 rep)
Feb 13, 2019, 07:25 PM • Last activity: Feb 19, 2019, 06:42 PM
1 vote
1 answer
741 views
drbd data: failed to create debugfs dentry
CentOS 7, kernel release 3.10.0-957.el7.x86_64, drbd 9.0.16-1

I have one resource configured. The drbd service is not enabled to start at boot time. I reboot the two nodes. On the first node I run systemctl start drbd and get

> "DRBD's startup script waits for the peer node(s) to appear.

On the second node, when I now run systemctl start drbd I get:

> drbd data: failed to create debugfs dentry

# EDIT #

When I run drbdadm up resource_name on both nodes with the drbd service disabled and not started, I get the two nodes in Secondary/UpToDate state, which is good.

# EDIT2 #

global_common.conf [screenshot]

data.res [screenshot]

All resources are started on otrs1. It has the Primary role but shows connecting for the remote node, and on the remote node I don't see the other node at all. [screenshot]

When I run pcs cluster stop --all and next run drbdadm up data on both nodes, all looks good. [screenshot]

Now when I mount /dev/drbd1 /opt/otrs on one of the nodes it gets auto-promoted to the primary role. [screenshot]

Now when I umount and bring the resource down on both nodes and re-run drbdadm status, I obviously get No currently configured DRBD found. Exactly the same happens when I run systemctl start drbd on both. On the first node the output appears to hang, but I guess it waits for the other node to start its services as well, right? [screenshot]

After a reboot, the cluster and resources start on node 1, but after putting it into standby mode, resources are not moved: [screenshot]

And here's what I see in journalctl -xe [screenshot]

# EDIT3 #

OK, that's odd. [screenshot]

I was loading the drbd kernel module at boot via /etc/modules-load.d/drbd.conf on both nodes but disabled it. I rebooted and to my surprise on one node it's loaded, but without drbd_transport_tcp. Is pacemaker loading the drbd kernel module? I can't imagine that. [screenshot]

Now when I systemctl disable pcsd; systemctl disable pacemaker; systemctl disable corosync on both nodes and reboot and do lsmod | grep drbd, it returns no results. I don't get it :(
blabla_trace (385 rep)
Feb 14, 2019, 08:54 AM • Last activity: Feb 19, 2019, 05:32 PM
0 votes
1 answer
393 views
What is the default role for DRBD nodes
I have upgraded a node in a two-node DRBD cluster as follows:

1. reinstall Debian
2. apt-get install drbd-utils
3. systemctl start drbd
4. obtain /etc/drbd.d/my_resource.res from backup
5. configure block device (disk) and IP address for DRBD resource
6. systemctl reload drbd

At this point DRBD starts to resync and the cluster becomes operational again. My question is this: what determines whether the node acts as DRBD primary or secondary at this point? In my case the opposite node was the primary, so it was important that the upgraded node started as secondary. Is this the default?
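For reference, a hedged note: DRBD brings a resource up in the Secondary role, and a node only becomes Primary through an explicit promotion by an administrator or a cluster manager (DRBD 9's auto-promote can also promote a node when something opens the device). A short sketch for checking and setting the role (resource name taken from the question's path):

```
# Role and connection state right after the reload
drbdadm role my_resource
drbdadm status my_resource      # DRBD 9; use cat /proc/drbd on 8.4

# Explicit promotion/demotion, normally left to the cluster manager
drbdadm primary my_resource
drbdadm secondary my_resource
```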
rookie099 (137 rep)
Jan 14, 2019, 09:52 AM • Last activity: Jan 14, 2019, 01:35 PM
3 votes
1 answer
4397 views
DRBD no output of cat /proc/drbd
I am syncing my new disk; output of drbdtop:

Resource: myres: (Overall danger score: 14)
 Local Disc(Primary):
  volume 0 (/dev/drbd0): UpToDate(normal disk state)
  volume 1 (/dev/drbd1): UpToDate(normal disk state)
  volume 2 (/dev/drbd2): Inconsistent(data is not accessible or usable until resync is complete)
 Connection to zfs.user.osdc2(Secondary): Connected(connected to zfs.user.osdc2)
  volume 0: UpToDate(normal disk state)
  volume 1: UpToDate(normal disk state)
  volume 2: Replication:SyncTarget(local volume is being synchronized with data from zfs.user.osdc2) 95.6% remaining UpToDate(normal disk state)

But there is no output in cat /proc/drbd about my resource, only:

version: 9.0.9-1 (api:2/proto:86-112)
GIT-hash: f7b979e7af01813e031aac579140237640c94569 build by mockbuild@, 2017-09-14 17:45:45
Transports (api:16): tcp (9.0.9-1)

Why is there no output? How should I resolve this issue?
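For reference, a hedged note: in DRBD 9 the per-resource details were removed from /proc/drbd, which now only shows the version and transport information seen above; the state is queried through the userspace tools instead, e.g.:

```
# Per-resource state in DRBD 9
drbdadm status myres
drbdsetup status --verbose --statistics myres
# Continuous event stream, useful for monitoring/scripting
drbdsetup events2 myres
```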
dorinand (698 rep)
May 2, 2018, 12:35 PM • Last activity: Nov 23, 2018, 08:36 PM
Showing page 1 of 20 total questions