
archlinux netboot diskless node/system, systemd on NFS (v4) fails, rpc.idmapd

5 votes
1 answer
1887 views
**updates: 5 (20171209)**

**updates: 5 (20171210)**

* `mount -t nfs4 [SERVER IP]:/archlinux /mnt` works.
* `ss -ntp | grep 2049` shows that the client establishes a connection to the server before systemd begins.
* Can the NFSv4 id mapper only be used with Kerberos?

# the problem

I am attempting to set up a diskless node/workstation/system. The OS (4.13.12-1-ARCH) is installed on the SERVER under /srv/archlinux. After a [successful netboot from GRUB to NFSv4](https://unix.stackexchange.com/questions/408477/archlinux-efi-netboot-kernel-ip-does-not-work-systemd-failed-to-start-switc), systemd begins but fails at multiple stages, for example:

* Failed to mount Kernel Configuration File System.
* Failed to mount Kernel Debug File System.
* Failed to mount Huge Pages File System
* Failed to start Load/Save Random Seed.
* Failed to mount /tmp.
* Failed to start Rebuild Journal Catalog.
* Then ends with Not tainted 4.13.12-1-ARCH #1...

Or,

* Failed to mount POSIX Message Queue File System.
* Failed to start Remount Root and Kernel File System.
* Failed to mount Huge Pages File System.
* Failed to mount Kernel Debug File System.
* Failed to mount Kernel Configuration File System.
* Then ends with Not tainted 4.13.12-1-ARCH #1...

I suspect the failures are caused by an incorrect configuration of NFSv4 or the local network.
## rpc.idmapd

/etc/idmapd.conf

    [General]
    Verbosity = 7
    Pipefs-Directory = /var/lib/nfs/rpc_pipefs
    Domain = localdomain

    [Mapping]
    Nobody-User = nobody
    Nobody-Group = nobody

    [Translation]
    Method = nsswitch

/etc/exports (printed using `# exportfs -v`)

    /srv            (rw,sync,wdelay,hide,no_subtree_check,fsid=0,sec=sys,no_root_squash,no_all_squash)
    /srv/archlinux  (rw,sync,wdelay,hide,no_subtree_check,sec=sys,no_root_squash,no_all_squash)

(Exposed to "world" for debugging purposes)

Running `rpc.idmapd -fvvv` on a separate tty during bootup logs the following:

    rpc.idmapd: libnfsidmap: using domain: localdomain
    rpc.idmapd: libnfsidmap: Realms list: 'LOCALDOMAIN'
    rpc.idmapd: libnfsidmap: processing 'Method' list
    rpc.idmapd: libnfsidmap: loaded plugin /usr/lib/libnfsidmap/nsswitch.so for method nsswitch
    rpc.idmapd: Expiration time is 600 seconds.
    rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
    rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel
    rpc.idmapd: nfsdcb: authbuf=* authtype=user
    rpc.idmapd: nfs4_uid_to_name: calling nsswitch->uid_to_name
    rpc.idmapd: nfs4_uid_to_name: nsswitch->uid_to_name returned 0
    rpc.idmapd: nfs4_uid_to_name: final return value is 0
    rpc.idmapd: Server : (user) id "0" -> name "root@localdomain"

If the export uses sec=sys, it continues like:

    rpc.idmapd: nfsdcb: authbuf=* authtype=user
    rpc.idmapd: nfs4_name_to_uid: calling nsswitch->name_to_uid
    rpc.idmapd: nss_getpwnam: name '0' domain 'localdomain': resulting localname '(null)'
    rpc.idmapd: nss_getpwnam: name '0' does not map into domain 'localdomain'
    rpc.idmapd: nfs4_name_to_uid: nsswitch->name_to_uid returned -22
    rpc.idmapd: nfs4_name_to_uid: final return value is -22
    rpc.idmapd: Server : (user) name "0" -> id "99"
    (stops here)

+(20171209) After making sure that the /etc/hostname for the CLIENT was set to client2 (duh), if the export uses sec=none **or** sec=sys, it continues like:

    rpc.idmapd: nfsdcb: authbuf=* authtype=group
    rpc.idmapd: nfs4_gid_to_name: calling nsswitch->gid_to_name
    rpc.idmapd: nfs4_gid_to_name: nsswitch->gid_to_name returned 0
    rpc.idmapd: nfs4_gid_to_name: final return value is 0
    rpc.idmapd: Server : (group) id "190" -> name "systemd-journal@localdomain"
    rpc.idmapd: nfsdcb: authbuf=* authtype=user
    rpc.idmapd: nfs4_name_to_uid: calling nsswitch->name_to_uid
    rpc.idmapd: nss_getpwnam: name '0' domain 'localdomain': resulting localname '(null)'
    rpc.idmapd: nss_getpwnam: name '0' does not map into domain 'localdomain'
    rpc.idmapd: nfs4_name_to_uid: nsswitch->name_to_uid returned -22
    rpc.idmapd: nfs4_name_to_uid: final return value is -22
    rpc.idmapd: Server : (user) name "0" -> id "99"
    (stops here)

If I instead change the method from nsswitch to static (https://unix.stackexchange.com/questions/286924/uid-mapping-in-nfs)

/etc/idmapd.conf

    ...
    [Translation]
    Method = static

    [Static]
    root@localdomain = root

then `rpc.idmapd -fvvv` on a separate tty during bootup logs the following:

    rpc.idmapd: libnfsidmap: using domain: localdomain
    rpc.idmapd: libnfsidmap: Realms list: 'LOCALDOMAIN'
    rpc.idmapd: libnfsidmap: processing 'Method' list
    rpc.idmapd: static_getpwnam: name 'root@localdomain' mapped to 'root'
    rpc.idmapd: static_getpwnam: group 'root@localdomain' mapped to ' root'
    rpc.idmapd: libnfsidmap: loaded plugin /usr/lib/libnfsidmap/static.so for method static
    rpc.idmapd: Expiration time is 600 seconds.
    rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
    rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel
    rpc.idmapd: nfsdcb: authbuf=* authtype=user
    rpc.idmapd: nfs4_uid_to_name: calling static->uid_to_name
    rpc.idmapd: nfs4_uid_to_name: static->uid_to_name returned 0
    rpc.idmapd: nfs4_uid_to_name: final return value is 0
    rpc.idmapd: Server : (user) id "0" -> name "root@localdomain"

If the export uses sec=sys, it continues like:

    rpc.idmapd: nfsdcb: authbuf=* authtype=user
    rpc.idmapd: nfs4_name_to_uid: calling static->name_to_uid
    rpc.idmapd: nfs4_name_to_uid: static->name_to_uid returned -2
    rpc.idmapd: nfs4_name_to_uid: final return value is -2
    rpc.idmapd: Server : (user) name "0" -> id "99"
    (stops here)

If the export uses sec=none, it continues like:

    rpc.idmapd: nfsdcb: authbuf=* authtype=group
    rpc.idmapd: nfs4_gid_to_name: calling static->gid_to_name
    rpc.idmapd: nfs4_gid_to_name: static->gid_to_name returned -2
    rpc.idmapd: nfs4_gid_to_name: final return value is -2
    rpc.idmapd: Server : (group) id "190" -> name "nobody"
    rpc.idmapd: nfsdcb: authbuf=* authtype=user
    rpc.idmapd: nfs4_name_to_uid: calling static->name_to_uid
    rpc.idmapd: nfs4_name_to_uid: static->name_to_uid returned -2
    rpc.idmapd: nfs4_name_to_uid: final return value is -2
    rpc.idmapd: Server : (user) name "0" -> id "99"
    (stops here)

Similar problems with the user ID mapping:

* [NFSv4 User Mapping](https://serverfault.com/questions/812813/nfsv4-user-mapping)
* [NFS user mapping](https://serverfault.com/questions/520276/nfs-user-mapping)
* [Mapping UID and GID of local user to the mounted NFS share](https://serverfault.com/questions/514118/mapping-uid-and-gid-of-local-user-to-the-mounted-nfs-share)
* And many many more...

These are often related to a switch from NFSv3 to NFSv4, and rarely about netboot.

# troubleshooting

* No firewall
* No Kerberos, LDAP, etc.
* No SELinux
* The user root exists on both SERVER and CLIENT, with the same password.
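The rpc.idmapd traces earlier in this post are long, so a small filter helps to see what actually fails. The following is only a sketch: it assumes the trace has been saved as plain text in the format shown above, and the function name `summarize` is made up for illustration. It decodes the negative return values with Python's `errno` table (-22 is EINVAL, -2 is ENOENT) and lists every owner string that reached the server without an `@domain` part, which is the case for the name "0" in the traces.

```python
import errno
import re

# Summary line printed after each lookup, e.g.
#   rpc.idmapd: Server : (user) name "0" -> id "99"
RESULT_RE = re.compile(r'\((user|group)\) (name|id) "([^"]*)" -> (name|id) "([^"]*)"')
# Return-value line, e.g. "nsswitch->name_to_uid returned -22"
RETURN_RE = re.compile(r'(\w+)->(\w+) returned (-?\d+)')


def summarize(log_text):
    """Return (failures, suspicious) for an rpc.idmapd -fvvv trace.

    failures:   (plugin, call, errno name) for every non-zero return value
    suspicious: (kind, owner string, mapped id) for names without '@domain'
    """
    failures = []
    suspicious = []
    for line in log_text.splitlines():
        ret = RETURN_RE.search(line)
        if ret and int(ret.group(3)) != 0:
            code = abs(int(ret.group(3)))
            failures.append(
                (ret.group(1), ret.group(2), errno.errorcode.get(code, str(code)))
            )
        res = RESULT_RE.search(line)
        # Only name -> id lookups carry an owner string sent by the client.
        if res and res.group(2) == "name" and "@" not in res.group(3):
            suspicious.append((res.group(1), res.group(3), res.group(5)))
    return failures, suspicious
```

Fed with the nsswitch trace, this would report one EINVAL from `nsswitch->name_to_uid` and one bare owner string "0" mapped to id "99"; it says nothing about why the client sent a bare number.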
## SERVER

All other relevant configuration files for NFSv4 that I could identify on the SERVER.

/etc/nsswitch.conf

    passwd: compat mymachines systemd
    group: compat mymachines systemd
    shadow: compat
    publickey: files
    hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname
    networks: files
    protocols: files
    services: files
    ethers: files
    rpc: files
    netgroup: files

/etc/nfs.conf (all settings commented out)

/etc/conf.d/nfs-common.conf (all settings commented out)

### network configuration

* [How to set the domain name on GNU/Linux?](https://serverfault.com/questions/490825/how-to-set-the-domain-name-on-gnu-linux)
* [Archlinux Wiki Network configuration: Set the hostname](https://wiki.archlinux.org/index.php/Network_configuration#Set_the_hostname)
* [Archlinux Wiki Network configuration: Local network hostname resolution](https://wiki.archlinux.org/index.php/Network_configuration#Local_network_hostname_resolution)

The SERVER hostname is server and it has 3 network devices (nd[1-3]). The default gateway is 192.168.0.1, reached via nd1.

/etc/hosts

    127.0.0.1       localhost.localdomain   localhost
    ::1             ip6.localhost           localhost
    192.168.0.101   nd1.localdomain         server servernd1
    192.168.1.101   nd2.localdomain         server servernd2
    192.168.2.101   nd3.localdomain         server servernd2
    192.168.1.102   client1.localdomain     client1
    192.168.2.102   client2.localdomain     client2

/etc/resolvconf.conf

    name_servers=192.168.0.1

Checks run on the SERVER:

    # hostname -f -> nd1.localdomain
    # hostname -i -> 192.168.0.101 192.168.1.101 192.168.2.101
    # getent hosts IP -> the corresponding line in /etc/hosts
    # getent ahosts HOSTNAME -> the corresponding line in /etc/hosts
    # ping -c 3 server.localdomain -> 0% packet loss
    # id -u root -> 0
    # id -un 0 -> root

Display the system's effective NFSv4 domain name on stdout.

    # nfsidmap -d -> localdomain

Display on stdout all keys currently in the keyring used to cache ID mapping results. These keys are visible only to the superuser.

    # nfsidmap -l -> nfsidmap: '.id_resolver' keyring was not found.
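As far as I understand idmapd.conf(5), the domain reported by `nfsidmap -d` is the `Domain` value from /etc/idmapd.conf and, when that is unset, the DNS domain part of the host's fully qualified name. The sketch below only approximates that rule so the two sources can be compared on SERVER and CLIENT; the helper names are hypothetical and the real libnfsidmap lookup may differ in detail.

```python
import configparser


def idmapd_domain(conf_text):
    """Return the Domain value from idmapd.conf text, or None if it is unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get("General", "Domain", fallback=None)


def dns_domain(fqdn):
    """Return everything after the first dot of a hostname, or None."""
    _host, _dot, domain = fqdn.partition(".")
    return domain or None


def effective_domain(conf_text, fqdn):
    """Assumed rule: the configured Domain wins, otherwise the DNS domain."""
    return idmapd_domain(conf_text) or dns_domain(fqdn)
```

With the configuration from this post, `effective_domain` gives `localdomain` for `nd1.localdomain` whether or not `Domain` is set, while a bare hostname such as `client2` with no `Domain` would give no domain at all.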
## CLIENT

/etc/hostname +(20171209)

    client2

/etc/hosts (exactly the same as the hosts file on the server)

/etc/resolvconf.conf

    name_servers=192.168.0.1

/etc/idmapd.conf (exactly the same as the idmapd.conf file on the server)

/etc/fstab

    # sec=sys or sec=none to correspond to server export settings.
    /dev/nfs  /      nfs       rw,hard,rsize=9151,sec=sys,clientaddr=192.168.2.102 0 0
    devtmpfs  /dev   devtmpfs  defaults
    proc      /proc  proc      defaults
    none      /run   tmpfs     defaults
    sys       /sys   sysfs     defaults
    run       /run   tmpfs     defaults
    tmp       /tmp   tmpfs     defaults

The fstab was defined by comparing the mounted directories on the server using findmnt -A.

## net_nfs4

* +(20171210) NFS version on SERVER and CLIENT: cat /proc/fs/nfsd/versions -> -2 +3 +4 +4.1 +4.2
* On SERVER and CLIENT [cat /sys/module/nfsd/parameters/nfs4_disable_idmapping -> N](https://wiki.archlinux.org/index.php/NFS#Ensure_NFSv4_idmapping_is_fully_enabled).
* On SERVER echo "options nfsd nfs4_disable_idmapping=0" > /etc/modprobe.d/nfsd.conf.
* On CLIENT /sys/module/nfs/parameters/nfs4_disable_idmapping does not exist, and I am not sure how to create it manually as /sys is read only.
* +(20171210) On CLIENT echo "options nfs nfs4_disable_idmapping=0" > /etc/modprobe.d/nfs.conf.

The CLIENT IP is 192.168.2.102/24. The CLIENT network device is connected to SERVER nd2 192.168.2.101/24 (hostname: servernd2). The network information during boot:

    :: running early hook [udev]
    starting version 235
    :: running hook [udev]
    :: Triggering uevents...
    :: running hook [net_nfs4]
    IP-Config: eth0 hardware address [CLIENT NETWORK DEVICE MAC] mtu 1500 DHCP hostname client2
    IP-Config: eth0 guessed broadcast address 192.168.2.255
    IP-Config: eth0 complete (from 192.168.0.101):
     address: 192.168.2.102 broadcast: 192.168.2.255 netmask: 255.255.255.0
     gateway: 192.168.2.101 dns0 : 192.168.0.1 dns1 : 0.0.0.0
     host : client2
     domain : localdomain
     rootserver: 192.168.0.101 rootpath: /srv/archlinux
     filename : /netboot/grub/i386-pc/core.0
    NFS-Mount: 192.168.2.101:/archlinux
    Waiting 10 seconds for device /dev/nfs ...

(systemd takes over from here)

## Why do the NFSv4 errors occur?

### Server : (group) id "190" -> name "nobody"

> With NFSv4, things change: users are mapped by username, and the mapping between user names and user IDs is handled by a process called "ID map daemon" (idmapd). In particular, NFSv4 clients and server should use the same domain for the mapping to work properly, otherwise requests will be mapped to the anonymous user/group.

-- [Trying out NFSv4 (on Linux and Solaris) -- March 15th, 2012 - 13:03 / bronto](https://syslog.me/2012/03/15/trying-out-nfsv4-on-linux-and-solaris/)

---

> In an ideal world, the user and group of the requesting client would determine the permissions of the data returned. We don't live in an ideal world. Two real-world problems intervene:
>
> 1. You might not trust the root user of a client with root access to the server's files.
> 1. The same username on client and server might have different numerical ID's
>
> Problem 1 is conceptually simple. John Q. Programmer is given a test machine for which he has root access. In no way does that mean that John Q. Programmer should be able to alter root owned files on the server. Therefore NFS offers root squashing, a feature that maps uid 0 (root) to the anonymous (nfsnobody) uid, which defaults to -2 (65534 on 16 bit numbers).

-- [NFS: Overview and Gotchas -- Copyright (C) 2003 by Steve Litt](http://www.troubleshooters.com/linux/nfs.htm#_Configure_the_NFS_Server)

### +(20171209) rpc.idmapd: nss_getpwnam: name '0' domain 'localdomain': resulting localname '(null)'

According to [Steve Dickson in a comment (2011-08-12 16:01:55 EDT) to a Red Hat Bugzilla – Bug 715430 report](https://bugzilla.redhat.com/show_bug.cgi?id=715430#c2)

> The [error] statement explains the problem. DNS on the local machine was not set up (or returning NULL) and the Domain= variable in /etc/idmapd.conf was not set.

### nss_getpwnam: name '0' does not map into domain

On the Debian Mailing Lists, in an [e-mail correspondence between Jonas Meurer and Christian Seiler (20150722) concerning "Kerberos-secured NFSv4"](https://lists.debian.org/debian-user/2015/07/msg00966.html), the error is explained in detail. My summary of the discussion:

When the NFS client sends

    nss_getpwnam: name '8' domain 'freesources.org': resulting localname '(null)'

> The NFS client sends just the uid converted to a string in some cases instead of the properly translated NFS username, which the server then rejects.

The client should send

    nss_getpwnam: name 'mail@freesources.org' domain 'freesources.org': resulting localname 'mail'

> Here you can see that the owner name that was transmitted by the NFS client was 'mail@freesources.org' (and not simply '8'), so that does contain an @; nss_getpwname can see that the domain name matches and just strips it, resulting in a user name 'mail', which it looks up in /etc/passwd, returns the user id (in this case, 8, because it's the same on client and server) and the server is perfectly happy.

> So why does the client send the wrong username?

> ... every once in a while, idmapping will fail, so the kernel will just send a number. But that number will cause the chown command to fail, since the server won't translate it back.
>
> Short answer: I have no idea.
>
> Longer answer: ...
If I understand the longer answer correctly, the problem could occur because the NFS client relies on the "kernel's key cache". For the NFS server this should never be a problem because the "kernel's key cache" is never used. Nonetheless,

> Since you are using just regular nsswitch via /etc/passwd, nss_getpwnam should *never* fail in your case, unless you do some weird stuff with /etc/passwd at the same time.

The answer also refers to an alternative to idmapd, nfsidmap, although after reading the man page I cannot quite understand how it would replace idmapd.

### +(20171209) nss_getpwnam: name 'root@domain.com' does not map into domain 'localdomain'

This error message does not seem to occur for me. I am nevertheless including the answer from [SUSE's support knowledgebase -- 10-DEC-13 Modified Date: 12-OCT-17 --](https://www.suse.com/support/kb/doc/?id=7014266) because of its description of the cause, and because the proposed remedy stands in contrast to the other discussions I found.

> NFSv4 handles user identities differently than NFSv3. In v3, an nfs client would simply pass a UID number in chown (and other requests) and the nfs server would accept that (even if the nfs server did not know of an account with that UID number). However, v4 was designed to pass identities in the form of `username@domain`. To function correctly, that normally requires idmapd (id mapping daemon) to be active at client and server, and for each to consider themselves part of the same id mapping domain.
>
> Chown failures or idmapd errors like the ones documented above are typically a result of either:
>
> 1. The username is known to the client but not known to the server, or
> 2. The idmapd domain name is set differently on the client than it is on the server.
>
> Therefore, this issue can be fixed by insuring that the nfs server and client are configured with the same idmapd domain name (/etc/idmapd.conf) and both have knowledge of the usernames / accounts in question.
>
> However, it is often not convenient to insure that both sides have the same user account knowledge, especially if the nfs server is a filer. The NFS community has recognized that this idmapd feature of NFSv4 is often more troublesome that it is worth, so there are steps and modifications coming into effect to allow the NFSv3 behavior to work even under NFSv4.

The proposed remedy is to disable idmapd:

    nfs.nfs4_disable_idmapping=1

## +(20171209) Wireshark

The Wireshark log is quite extensive, but it begins with something like:

    [IP CLIENT] -> [IP SERVER] NFS 226 V4 Call ACCESS FH: [HEX VALUE], [Check: RD LU MD XT DL]
    [IP SERVER] -> [IP CLIENT] NFS 238 V4 Reply (Call In 34) ACCESS, [Allowed: RD LU MD XT DL]
    [IP CLIENT] -> [IP SERVER] NFS 246 V4 Call LOOKUP DH: [HEX VALUE]/archlinux

where a similar pattern [A HEX VALUE]/[PATH] can be discerned for /sbin, /usr, /bin, /init, /lib, /systemd, /dev, /proc, /sys, /run, /, /lib64. When the CLIENT requests /ld-linux-x86-64.so.2, the first errors start to appear:

    [IP CLIENT] -> [IP SERVER] NFS 342 V4 Call OPEN DH: [HEX VALUE]/ld-linux-x86-64.so.2
    [SERVER IP] -> [CLIENT IP] NFS 166 V4 Reply (Call In 124) OPEN Status: NFS4ERR_SYMLINK

The pattern more or less repeats itself with more frequent errors, for example LOOKUP Status: and OPEN Status: reporting NFS4ERR_NOENT.
Interestingly, the first and only reference to user permissions appears at the very end of the log:

    [SERVER IP] -> [CLIENT IP] NFS 182 V4 Reply (Call In 9562) SETATTR Status: NFS4ERR_BADOWNER

## RFC

According to

* [RFC7530 (Network File System (NFS) Version 4 Protocol, 201503, PROPOSED STANDARD)](https://www.rfc-editor.org/rfc/rfc7530) -- Updated by [RFC7931](https://www.rfc-editor.org/rfc/rfc7931)
* [RFC5661 (Network File System (NFS) Version 4 Minor Version 1 Protocol, 201001, PROPOSED STANDARD)](https://www.rfc-editor.org/rfc/rfc5661) -- Updated by [RFC8178](https://www.rfc-editor.org/rfc/rfc8178)
* [RFC7862 (Network File System (NFS) Version 4 Minor Version 2 Protocol, 201611, PROPOSED STANDARD)](https://www.rfc-editor.org/rfc/rfc7862) -- Updated by [RFC8178](https://www.rfc-editor.org/rfc/rfc8178) -- which refers back to [RFC5661].

### NFS4ERR_BADOWNER (Error Code 10039)

> This error is returned when an owner or owner_group attribute value or the who field of an ACE within an ACL attribute value cannot be translated to a local representation.

The specifications discuss this in Section 5.9, *Interpreting owner and owner_group*; however, I am not sure what to cite as relevant.

### NFS4ERR_SYMLINK (Error Code 10029)

> The current filehandle designates a symbolic link when the current operation does not allow a symbolic link as the target.

### NFS4ERR_NOENT (Error Code 2)

> This indicates no such file or directory. The file system object referenced by the name specified does not exist.

The error could however be expected ...

> The current filehandle is assumed to refer to a regular directory or a named attribute directory. LOOKUPP assigns the filehandle for its parent directory to be the current filehandle. If there is no parent directory, an NFS4ERR_NOENT error must be returned. Therefore, NFS4ERR_NOENT will be returned by the server when the current filehandle is at the root or top of the server's file tree.
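To see how often each of these statuses occurs in a capture, the reply lines can be tallied. This is a rough sketch, assuming the packet list has been exported from Wireshark as text with Info lines shaped like the ones quoted in this post; `NFS4_ERRORS`, `STATUS_RE` and `tally` are illustrative names, and the numeric codes are the ones given in the headings above.

```python
import re
from collections import Counter

# Error codes as quoted from the RFC sections above.
NFS4_ERRORS = {
    "NFS4ERR_NOENT": 2,
    "NFS4ERR_SYMLINK": 10029,
    "NFS4ERR_BADOWNER": 10039,
}

# Matches e.g. "OPEN Status: NFS4ERR_SYMLINK" in a reply summary line.
STATUS_RE = re.compile(r"\b([A-Z_]+) Status: (NFS4ERR_\w+)")


def tally(summary_lines):
    """Count (operation, status) pairs over Wireshark summary lines."""
    counts = Counter()
    for line in summary_lines:
        match = STATUS_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

A count of one for `("SETATTR", "NFS4ERR_BADOWNER")` against many LOOKUP and OPEN failures would match the impression that the ownership error appears only once, at the very end.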
## +(20171210) mount -t nfs4 [SERVER IP]:/archlinux /mnt

On the client computer, using the Archlinux "LiveUSB", I was able to mount the network drive, download the latest kernel (4.14-4-1-ARCH) via the SERVER internet connection, and install archlinux on the [SERVER IP]/archlinux. During the install, rpc.idmapd -fvvv indicated a successful mapping of usernames, for example,

    rpc.idmapd: Server : (user) id "0" -> name "root@localdomain"
    rpc.idmapd: Server : (group) id "99" -> name "nobody@localdomain"
    ... -> name "tty@localdomain"
    ... -> name "systemd-journal-upload@localdomain"
    ... -> name rpc@localdomain
    ... -> name systemd-journal@localdomain
    ... -> name utmp@localdomain

The result of genfstab was also different:

Nevertheless, after reboot systemd failed again with the same failures as described at the beginning of the post.

## +(20171210) Is the remote directory on the server mounted to /new_root?

The mkinitcpio script uses the variable mount_handler to carry an assigned "mounting function", in this case nfs_mount_handler(), to which the "root path", /new_root, is passed as $1 at a later stage. I am trying to verify that the client has mounted [SERVER IP]:/archlinux to /new_root. On the server, I can only observe that the client has established a connection, but not whether the directory is mounted, or where.

    showmount -a server -> All mount points on server: (empty)
    ss -ntp | grep 2049 -> ESTAB 0 0 192.168.2.101:2049 192.168.2.102:809 (random port)

## +(20171210) NFS4, sec=sys and id mapper are incompatible?

> **Reading the doco, it looks like sec=sys and the id mapper can be used to correctly map uid/gid to name where the client and server have different mappings in /etc/passwd and /etc/group. This simply isn't true.**
>
> That's because with sec=sys the id mapper doesn't come into play in the authentication part of the nfs protocol, only the file attributes part. With sec=sys authentication, nfs just passes the client uid/gid which is used directly by the server. So permissions checks will be screwed if client and server uid and gid don't align. To confuse things further, when the client creates a new file it is the authentication credentials that are used, so the file gets created at the server with the client's uid/gid. After that nfs uses idmap to get the file attributes, so the uid/gid (which originally came from the client) gets mapped at the server, and you end up seeing the server's name for a client uid/gid. Borkage! On the other hand, if the file was originally created at the server, you will see the correct name at the client, even if the uid/gid differs. But permissions checking will still be broken.

-- [kimmie -- Posted: Wed Feb 20, 2013 3:14 am Post subject:](https://forums.gentoo.org/viewtopic-p-7250220.html?sid=f9d53191215294ce744797d1da1aee27#7250220) -- Emphasis in original
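The behaviour described in the quote can be illustrated with a toy model. This is not how the kernel or rpc.idmapd is implemented; it is a minimal sketch that assumes passwd tables are plain name-to-uid dictionaries, and both function names are invented for the example. Creation uses the raw uid from the sec=sys credentials, while the owner shown later goes through the name@domain translation on both ends.

```python
def create_file(client_user, client_passwd):
    """sec=sys: the server stores the numeric uid sent in the credentials."""
    return client_passwd[client_user]


def owner_shown_on_client(stored_uid, server_passwd, client_passwd,
                          domain="localdomain"):
    """Owner attribute path: server uid -> name@domain -> client local name."""
    server_names = {uid: name for name, uid in server_passwd.items()}
    name = server_names.get(stored_uid)
    if name is None:
        return "nobody"
    owner = f"{name}@{domain}"  # the string that travels in the attribute
    local, _at, owner_domain = owner.partition("@")
    if owner_domain == domain and local in client_passwd:
        return local
    return "nobody"


# Hypothetical accounts: uid 1000 is alice on the client but bob on the server.
client = {"root": 0, "alice": 1000}
server = {"root": 0, "bob": 1000, "alice": 1001}
stored = create_file("alice", client)
print(owner_shown_on_client(stored, server, client))
```

In this model a file created by alice is reported as owned by bob's name, which the client cannot resolve, whereas uid 0 maps to root on both sides. In my setup the account databases on SERVER and CLIENT are the same, so this mismatch alone should not explain the failures.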
Asked by user212827 (91 rep)
Dec 8, 2017, 04:21 PM
Last activity: Dec 27, 2017, 04:04 AM