Hoping someone can provide help as to what I am doing wrong or missing:
- Have 3 test servers (hostnames node1..3) each with a Mellanox MT28908 family infiniband card, model CX653105A
- Have Mellanox HDR infiniband switch QM8700.
- On each server running RHEL 7.9, kernel 3.10.0-1160.49.1.el7.x86_64
- Was referencing the Red Hat Storage Administration Guide, section 8.6, *Configuring the NFS Server*.
- `systemctl disable firewalld; service firewalld stop`
- SELinux set to permissive in /etc/selinux/config.
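For reference, the relevant line in /etc/selinux/config on each node is just (permissive for now; see question 6 below about enforcing):

```
# /etc/selinux/config (same on node1..3)
SELINUX=permissive
```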
- Have MLNX_OFED_LINUX-5.4-3.1.0.0-rhel7.9-x86_64.iso (362 MB) downloaded from https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed and installed on node1, node2, and node3:
- `mount -o loop MLNX_OFED_LINUX-5.4-3.1.0.0-rhel7.9-x86_64.iso /mlx`
- run `/mlx/mlnxofedinstall`
- answered yes to all prompts; it uninstalls the existing RHEL OFED packages that it says are incompatible.
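After the install (and the reboot below), the driver stack can be sanity-checked with tools that ship with MLNX_OFED (a sketch; exact output will differ):

```
ofed_info -s      # prints the installed OFED version, e.g. MLNX_OFED_LINUX-5.4-3.1.0.0
ibstat            # HCA ports should show State: Active, Physical state: LinkUp
ibdev2netdev      # maps the mlx5 device to its netdev, e.g. mlx5_0 port 1 ==> ib0 (Up)
```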
- `chkconfig opensm on` and openibd.service enabled on all 3 nodes.
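In systemd terms that is roughly the following, plus a check that a subnet manager is actually answering on the fabric (sminfo ships with infiniband-diags; treat this as a sketch):

```
systemctl enable opensm     # subnet manager (question 2 below asks whether it belongs on every node)
systemctl enable openibd    # load the Mellanox/IB kernel modules at boot
sminfo                      # reports the LID/GUID/priority of the currently active subnet manager
```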
- rebooted
- configured the network for ib0 via the GUI (System Tools → Settings → Network).
- ib0 Transport Mode = DATAGRAM; MTU = automatic; IPv4 address set manually; IPv6 = ignore.
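On disk that GUI configuration should end up roughly like the following ifcfg file (a sketch of /etc/sysconfig/network-scripts/ifcfg-ib0 on node1; the /24 prefix is an assumption, and the address is node1's from the /etc/hosts entry below):

```
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.16.2.1
PREFIX=24              # assumed netmask
CONNECTED_MODE=no      # "no" = DATAGRAM transport mode (see question 1)
# MTU left unset ("automatic" in the GUI); the datagram-mode default is 2044
IPV6INIT=no
```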
- manually edited /etc/hosts to add the IP addresses of node1, node2, node3 of my 3 test servers: 172.16.2.{1,2,3}
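i.e. each node's /etc/hosts contains, in addition to the stock entries, something like:

```
172.16.2.1   node1
172.16.2.2   node2
172.16.2.3   node3
```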
- This infiniband network is the only network physically cabled between these 3 servers and switch.
- I can successfully ping and ssh or scp to each over the infiniband network.
- /etc/sysconfig/nfs is the same on all nodes; that file is listed below.
- on node1, /etc/exports contains `/scratch *(rw,async,no_root_squash)` and nfs.service is enabled.
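For completeness, activating the export amounts to nothing more exotic than (sketch; on RHEL 7 nfs.service is an alias for nfs-server.service, as far as I know):

```
exportfs -ra                  # re-read /etc/exports
systemctl enable nfs-server
systemctl start nfs-server
```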
- on node2 or node3: `mount node1:/scratch /scratch` works
- on node2 or node3: `mount -o rdma,port=20049 node1:/scratch /scratch` results in **mount.nfs: mount system call failed**
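In case it helps the diagnosis, my understanding of what the RDMA pieces should look like on each side is roughly this (a sketch from my reading, not something I am sure is complete):

```
# server (node1): NFS-server RDMA transport module and listener
modprobe svcrdma
cat /proc/fs/nfsd/portlist     # should include a line like: rdma 20049

# client (node2/node3): client-side RDMA transport, then the mount
modprobe xprtrdma
mount -o rdma,port=20049 node1:/scratch /scratch
```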
- There is an /etc/rdma/mlx4.conf file, which I have not modified.
- There is no /etc/rdma/rdma.conf, but there is an /etc/rdma/modules/rdma.conf.
- I did `ln -s /etc/rdma/modules/rdma.conf /etc/rdma/rdma.conf`; not sure if that is needed.
- I uncommented all the lines (modules) within; the syntax seems different, there is no "LOAD=". I can post this file if needed, but roughly it looks like the sketch below.
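From memory (so treat the exact contents as approximate), the rdma-core style file is just a list of kernel module names, one per line, whereas the old /etc/rdma/rdma.conf used LOAD= variables such as XPRTRDMA_LOAD=yes:

```
# /etc/rdma/modules/rdma.conf style: one kernel module name per line
# NFS over RDMA client support
xprtrdma
# NFS over RDMA server support
svcrdma
```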
- In /var/log/messages after trying `mount -o proto=rdma`, not sure if this is significant: `Request for unknown module key Mellanox Technologies signing key err -11`
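For reference, whether the RDMA NFS modules actually loaded despite that message can be checked with nothing more than lsmod/dmesg (sketch):

```
lsmod | grep -E 'rpcrdma|xprtrdma|svcrdma'   # RDMA NFS transport modules, if loaded
dmesg | grep -iE 'rdma|mlx5'                 # module-load, link and signing-key messages
```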
- I have tried setting the Transport Mode to **Connected** (System Tools → Settings → Network) on two nodes, but then the On/Off toggle flips to off and will not stay on, `ip a` shows the ib0 link up but with no IP address, and there is no network connectivity between the two.
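For what it is worth, I believe the command-line equivalent of that GUI toggle is something like the following (the connection name "ib0" is an assumption, and I have not confirmed this avoids the toggle problem):

```
# switch ib0 to connected transport mode and the larger MTU it allows
nmcli connection modify ib0 infiniband.transport-mode connected infiniband.mtu 65520
nmcli connection up ib0
```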
**Questions:**
1. To have NFS over RDMA, what should the Transport Mode be set to, DATAGRAM or CONNECTED?
2. For a *basic* **TCP** network over InfiniBand like the one I can get working, is what I described above correct? Did I miss anything or do anything wrong? Regarding opensm running on every server, is that correct?
3. Is `mount -o rdma node1:/scratch /scratch` all that is needed, if everything else is configured properly, to get NFS over RDMA working?
4. If/when NFS+RDMA is actually working, will a general TCP-type network still exist on my little network so that I can ssh or scp between the nodes?
5. `systemctl list-unit-files | grep nfs` shows 13 different nfs-xxx.service units. Do any of the others (blkmap, config, idmap, lock, mountd, rquotad, server) need to be enabled?
6. Will selinux=enforcing be a problem?
--
My /etc/sysconfig/nfs file on each server:
#LOCKDARG=
#RPCNFSDCOUNT=16
#NFSD_V4_GRACE=90
#NFSD_V4_LEASE=90
RPCMOUNTDOPTS=""
STATDARG=""
#STATD_HA_CALLOUT="/usr/local/bin/foo"
SMNOTIFYARGS=""
RPCIDMAPDARGS=""
RPCGSSDARGS=""
GSS_USE_PROXY="yes"
BLKMAPDARGS=""
RPCNFSDARGS="--rdma=20049"
STATD_PORT=4001
STATD_OUTGOING_PORT=4002
MOUNTD_PORT=4003
LOCKD_TCPPORT=4004
LOCKD_UDPPORT=4004
Asked by ron (8647 rep)
Dec 28, 2021, 05:34 PM
Last activity: Feb 22, 2022, 03:03 PM