
Infiniband RHEL7, NFS RDMA setup & help

0 votes
1 answer
1382 views
Hoping someone can provide help as to what I am doing wrong or missing:

- Have 3 test servers (hostnames node1..3), each with a Mellanox MT28908-family InfiniBand card, model CX653105A.
- Have a Mellanox HDR InfiniBand switch, QM8700.
- Each server runs RHEL 7.9, kernel 3.10.0-1160.49.1.el7.x86_64.
- Was referencing the Red Hat Storage Administration Guide, section 8.6, *Configuring the NFS Server*.
- `systemctl disable firewalld; service firewalld stop`
- SELinux set to permissive in /etc/selinux/config.
- Downloaded MLNX_OFED_LINUX-5.4-3.1.0.0-rhel7.9-x86_64.iso (362 MB) from https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed and installed it on node1, node2, and node3:
  - `mount -o loop MLNX_OFED_LINUX-5.4-3.1.0.0-rhel7.9-x86_64.iso /mlx`
  - ran `/mlx/mlnxofedinstall` and let it do its thing, answering yes to all prompts; it uninstalls the existing RHEL OFED packages that it says are incompatible.
- `chkconfig opensm on` and openibd.service enabled on all 3 nodes.
- Rebooted.
- Configured the network for ib0 via the GUI under System Tools - Settings - Network: ib0 Transport Mode = DATAGRAM; MTU = automatic; IPv4 address set manually; IPv6 = ignore.
- Manually edited /etc/hosts to add the IP addresses of node1, node2, and node3 of my 3 test servers: 172.16.2.{1,2,3}.
- This InfiniBand network is the only network physically cabled between these 3 servers and the switch.
- I can successfully ping, ssh, and scp to each node over the InfiniBand network.
- /etc/sysconfig/nfs is the same on all nodes; that file is listed below.
- On node1, /etc/exports contains `/scratch *(rw,async,no_root_squash)` and nfs.service is enabled.
- From node2 or node3: `mount node1:/scratch /scratch` works.
- From node2 or node3: `mount -o rdma,port=20049 node1:/scratch /scratch` results in **mount.nfs: mount system call failed**.
- There is an /etc/rdma/mlx4.conf file, which I have not modified.
- There is no /etc/rdma/rdma.conf, but there is an /etc/rdma/modules/rdma.conf. I did `ln -s /etc/rdma/modules/rdma.conf /etc/rdma/rdma.conf`; not sure if that is needed.
- I uncommented all the lines (services) within; the syntax seems different, there is no "LOAD="; I can post this file if needed.
- /var/log/messages after trying `mount -o proto=rdma` contains the following; not sure if this is significant: `Request for unknown module key Mellanox Technologies signing key err -11`
- I have tried setting the Transport Mode to **Connected** in System Tools - Settings - Network on two nodes, but then the Off/On toggle turns off and will not stay on; `ip a` shows the ib0 link up but with no IP address, and then there is no network connectivity between the two nodes.

**Questions:**

1. To have NFS over RDMA, what should the Transport Mode be set to: DATAGRAM or CONNECTED?
2. For a *basic* **TCP** network over InfiniBand like the one I can get working, is what I described above correct? Did I miss anything or do anything wrong? Regarding opensm running on every server: is that correct?
3. Is `mount -o rdma node1:/scratch /scratch` all that is needed, if everything else is configured properly, to get NFS over RDMA working?
4. If/when NFS over RDMA is actually working, will a general TCP-type network still exist on my little network so that I can ssh or scp between the nodes?
5. `systemctl list-unit-files | grep nfs` shows 13 different nfs-xxx services. Do any of the others (blkmap, config, idmap, lock, mountd, rquotad, server) need to be enabled?
6. Will selinux=enforcing be a problem?

---

My /etc/sysconfig/nfs file on each server:

```
#LOCKDARG=
#RPCNFSDCOUNT=16
#NFSD_V4_GRACE=90
#NFSD_V4_LEASE=90
RPCMOUNTDOPTS=""
STATDARG=""
#STATD_HA_CALLOUT="/usr/local/bin/foo"
SMNOTIFYARGS=""
RPCIDMAPDARGS=""
RPCGSSDARGS=""
GSS_USE_PROXY="yes"
BLKMAPDARGS=""
RPCNFSDARGS="--rdma=20049"
STATD_PORT=4001
STATD_OUTGOING_PORT=4002
MOUNTD_PORT=4003
LOCKD_TCPPORT=4004
LOCKD_UDPPORT=4004
```
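For context, the sequence below is a minimal sketch of the extra steps usually needed on top of the setup described above, assuming the NFSoRDMA kernel modules are actually installed (with MLNX_OFED, `mlnxofedinstall` may need the `--with-nfsrdma` option for them to be present, which could explain the mount failure). The `svcrdma`/`xprtrdma` module names and the `/proc/fs/nfsd/portlist` interface are the standard upstream kernel NFS/RDMA pieces, not something specific to my hardware:

```shell
# --- Server side (node1) ---
modprobe svcrdma                    # server-side NFS/RDMA transport module
systemctl restart nfs-server        # restart nfsd so RPCNFSDARGS="--rdma=20049" is read
cat /proc/fs/nfsd/portlist          # check whether an "rdma 20049" listener appears
# If the --rdma option did not take effect, the RDMA listener can be
# added manually while nfsd is running:
echo "rdma 20049" > /proc/fs/nfsd/portlist

# --- Client side (node2 or node3) ---
modprobe xprtrdma                   # client-side NFS/RDMA transport module
mount -o rdma,port=20049 node1:/scratch /scratch
mount | grep /scratch               # the mount options should show proto=rdma
```

If `modprobe svcrdma` (or `xprtrdma`) fails with a module-not-found or key/signature error, that points back at the OFED install rather than the NFS configuration.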
Asked by ron (8647 rep)
Dec 28, 2021, 05:34 PM
Last activity: Feb 22, 2022, 03:03 PM