
Unable to mount gfs2 file system on Debian Stretch, probable dlm mis-config?

2 votes
1 answer
2842 views
I am experimenting with gfs2 on Debian Stretch, and having some difficulties. I am a reasonably experienced Linux admin, but new to shared-disk and parallel file systems. My immediate project is to mount a gfs2-formatted, iscsi-exported device on multiple clients as a shared file system. For the moment I am not interested in HA or fencing, although that may become important later on.

The iscsi part is fine: I am able to log in to the target, format it as an xfs file system, mount it on multiple clients, and verify that it shows up with the same blkid on each of them.

For the gfs2 part, I am following the scheme from the Debian Stretch gfs2(5) man page, modified for my config and embellished slightly by various searches and so forth. The man page is here: https://manpages.debian.org/stretch/gfs2-utils/gfs2.5.en.html

The actual error: when I attempt to mount my gfs2 file system, the mount command returns

    mount: mount(2) failed: /mnt: No such file or directory

where /mnt is the desired mount point, which certainly does exist. (If you attempt to mount on a nonexistent mount point, the error is instead "mount: mount point /wrong does not exist".) Relatedly, at each mount attempt dmesg reports:

    gfs2: can't find protocol lock_dlm

I briefly went down the path of assuming the problem was that the Debian packages do not provide /sbin/mount.gfs2, and looked for that, but I think that was an incorrect guess.

I have a five-machine cluster (of Raspberry Pis, in case it matters) named, somewhat idiosyncratically, pio, pi, pj, pk, and pl. They all have fixed static IP addresses, and there is no domain. I have installed the Debian gfs2-utils, corosync, and dlm-controld packages.

For the corosync step, my corosync config is as follows (this is the one for pio, which is intended to be the master of the cluster):

    totem {
        version: 2
        cluster_name: rpitest
        token: 3000
        token_retransmits_before_loss_const: 10
        clear_node_high_bit: yes
        crypto_cipher: none
        crypto_hash: none
        nodeid: 17
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.0.17
            mcastport: 5405
            ttl: 1
        }
    }

    nodelist {
        node {
            ring0_addr: 192.168.0.17
            nodeid: 17
        }
        node {
            ring0_addr: 192.168.0.11
            nodeid: 1
        }
        node {
            ring0_addr: 192.168.0.12
            nodeid: 2
        }
        node {
            ring0_addr: 192.168.0.13
            nodeid: 3
        }
        node {
            ring0_addr: 192.168.0.14
            nodeid: 4
        }
    }

    logging {
        fileline: off
        to_stderr: no
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
            subsys: QUORUM
            debug: off
        }
    }

    quorum {
        provider: corosync_votequorum
        expected_votes: 5
    }

This file is present on all the nodes, with appropriate node-specific changes to the nodeid and bindnetaddr fields in the totem section.

corosync starts without error on all nodes, and all the nodes also have sane-looking output from corosync-quorumtool:

    root@pio:~# corosync-quorumtool
    Quorum information
    ------------------
    Date:             Sun Apr 22 11:04:13 2018
    Quorum provider:  corosync_votequorum
    Nodes:            5
    Node ID:          17
    Ring ID:          1/124
    Quorate:          Yes

    Votequorum information
    ----------------------
    Expected votes:   5
    Highest expected: 5
    Total votes:      5
    Quorum:           3
    Flags:            Quorate

    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 192.168.0.11
             2          1 192.168.0.12
             3          1 192.168.0.13
             4          1 192.168.0.14
            17          1 192.168.0.17 (local)
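
Since all of that looks healthy, I am assuming corosync itself is fine. For completeness, the cluster name and node list that corosync is actually advertising can also be read back out of its runtime cmap database; the key names below are what I would expect from a stock corosync 2.x install (Stretch ships 2.4), so treat this as a sketch rather than verified output:

    # Read back the cluster name and node list from corosync's runtime database.
    # Key names (totem.cluster_name, nodelist.node.*) assume stock corosync 2.x.
    corosync-cmapctl | grep -E 'totem\.cluster_name|nodelist\.node'
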
The dlm-controld package was installed, and /etc/dlm/dlm.conf was created with the following simple config. Again, I am skipping fencing for now. The dlm.conf file is the same on all the nodes:

    enable_fencing=0
    lockspace rpitest nodir=1
    master rpitest node=17

I am unclear on whether or not the DLM "lockspace" name is supposed to match the corosync cluster name; I see the same behavior either way.

The dlm-controld service starts without errors, and the output of "dlm_tool status" appears sane:

    root@pio:~# dlm_tool status
    cluster nodeid 17 quorate 1 ring seq 124 124
    daemon now 1367 fence_pid 0
    node 1 M add 31 rem 0 fail 0 fence 0 at 0 0
    node 2 M add 31 rem 0 fail 0 fence 0 at 0 0
    node 3 M add 31 rem 0 fail 0 fence 0 at 0 0
    node 4 M add 31 rem 0 fail 0 fence 0 at 0 0
    node 17 M add 7 rem 0 fail 0 fence 0 at 0 0

The gfs2 file system was created with:

    mkfs -t gfs2 -p lock_dlm -j 5 -t rpitest:one /path/to/device

Subsequent to this, "blkid /path/to/device" reports:

    /path/to/device: LABEL="rpitest:one" UUID= TYPE="gfs2"

It looks the same on all the iscsi clients.

At this point, I feel like I should be able to mount the gfs2 file system on any or all of the clients, but this is where I get the error above: the mount command reports "no such file or directory", and dmesg and syslog report "gfs2: can't find protocol lock_dlm".

There are several other gfs2 guides out there, but many of them seem to be RH/CentOS-specific, or written for cluster-management schemes other than corosync, like cman or pacemaker. Those aren't necessarily deal-breakers, but it is high-value to me to have this work on nearly-stock Debian Stretch. It also seems likely to me that this is a pretty simple dlm misconfiguration, but I can't seem to nail it down.

Additional clue: when I try to "join" a lockspace via "dlm_tool join ...", I get this dmesg output:

    dlm cluster name 'rpitest' is being used without an application provided cluster name

This happens independently of whether the lockspace I am joining is "rpitest" or not. It suggests that lockspace names and cluster names are indeed the same thing, but also that the dlm is evidently not aware of the corosync config?
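
Presumably the next thing to check is what cluster name dlm_controld itself picked up from corosync. Something along these lines should show it (a sketch of what I would look at next; dlm_tool dump prints dlm_controld's debug buffer, and dlm_tool ls lists the lockspaces it knows about):

    # What cluster name did dlm_controld read from corosync? (debug buffer)
    dlm_tool dump | grep -i cluster
    # Which lockspaces does dlm_controld currently know about?
    dlm_tool ls
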
Asked by Andrew Reid (53 rep)
Apr 22, 2018, 04:44 PM
Last activity: Apr 24, 2018, 06:09 AM