Yesterday and today I spent hours figuring out why our NFS clients hung when opening any file on an NFSv4 mount. Listing directories etc. worked fine. I finally found the root cause by running tcpdump on the traffic with NFS debugging enabled on the nfsd component. The client seemed to call OP_SETCLIENTID in a loop, and each call returned error 10008 (NFS4ERR_DELAY). After looking up nfs4_make_rec_clidname I got a hunch that the MD5 crypto algorithm might not be available. We had done a kernel upgrade on the NFS server, during which the crypto MD5 module was apparently deselected because no other kernel component required it. It seems MD5 is required in fs/nfsd/nfs4recover.c, in the function nfs4_make_rec_clidname. I think config NFSD_V4 should select CRYPTO_MD5 to make sure MD5 support is present.
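To illustrate why MD5 matters here: as far as I can tell from fs/nfsd/nfs4recover.c, nfs4_make_rec_clidname hashes the client's name with MD5 and uses the hex form of the digest as the name of the per-client recovery directory. Below is a rough user-space sketch of just that hashing step in Python. The function name recovery_dir_name and the sample client string are made up for illustration; this is not the kernel code.

```python
import hashlib


def recovery_dir_name(client_name: bytes) -> str:
    """Return the 32-character hex MD5 digest of an opaque client name.

    Hypothetical helper: it mirrors only the "MD5, then hex-encode" idea,
    not the kernel's actual data structures or error handling.
    """
    return hashlib.md5(client_name).hexdigest()


if __name__ == "__main__":
    # The client string is invented for the example.
    print(recovery_dir_name(b"example-client-id"))
```

If the MD5 algorithm cannot be allocated, there is no name to use for the directory, which would fit the SETCLIENTID failures described above.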
Reassigning to Bruce since this appears to be an NFS server issue, not an NFS filesystem issue.
Thanks for the report. "A kernel upgrade on the nfs server during which apparently the crypto md5 module was deselected because no kernel component required it." From which kernel version to which kernel version? As of jlayton's 2216d449a97927cc105912e337d169cd4d4db548 "nfsd: get rid of cl_recdir field" (in 3.8) this should at least fail more gracefully in this case (the client will be allowed to continue, with an error written to the server's logs). But it would probably make sense to put the legacy tracker under its own config option and make that select md5.
Agreed. Eventually, we'll want to remove the legacy tracker altogether. As an interim step it would probably make sense to allow it to be disabled at compile time too.
For stable kernels though, it might make sense to add a 'select CRYPTO_MD5' to the 'config NFSD_V4' section. We can always remove that when we move the legacy tracker under its own Kconfig option.
The kernel was upgraded from 3.5.7 to 3.6.11. It seems during the upgrade we disabled CONFIG_RPCSEC_GSS_KRB5 which is the only thing depending on CRYPTO_MD5 (we're not using kerberos authentication).
I can confirm this; after I installed a 3.6.11 kernel with CONFIG_CRYPTO_MD5 disabled on my nfs server, all attempts to access files on nfs clients hung indefinitely. Installing a 3.6.11 kernel with CONFIG_CRYPTO_MD5=m fixed the issue.
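For anyone who wants to check a server's kernel config for this combination, here is a minimal sketch in Python. It assumes the usual .config text format ("CONFIG_FOO=y", or "# CONFIG_FOO is not set" for disabled symbols). The helper names config_value and nfsd_v4_lacks_md5 are my own, and the sample config text in the usage note is invented.

```python
import re


def config_value(config_text: str, symbol: str) -> str:
    """Return the value of CONFIG_<symbol>, or 'n' if it is unset or absent."""
    pattern = re.compile(r"^CONFIG_%s=(.*)$" % re.escape(symbol), re.MULTILINE)
    match = pattern.search(config_text)
    # Disabled symbols show up as "# CONFIG_FOO is not set" or not at all.
    return match.group(1).strip() if match else "n"


def nfsd_v4_lacks_md5(config_text: str) -> bool:
    """True when NFSD_V4 is enabled but CRYPTO_MD5 is neither 'y' nor 'm'."""
    return (config_value(config_text, "NFSD_V4") == "y"
            and config_value(config_text, "CRYPTO_MD5") == "n")


if __name__ == "__main__":
    sample = "CONFIG_NFSD=m\nCONFIG_NFSD_V4=y\n# CONFIG_CRYPTO_MD5 is not set\n"
    print(nfsd_v4_lacks_md5(sample))
```

In practice the text would come from the .config file of the kernel build in question.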
I ran into this bug on Linux 4.5.3. Without CONFIG_CRYPTO_MD5, I was seeing this in my kernel log at boot:
    Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
    NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    NFSD: starting 90-second grace period (net ffffffff81a718c0)
    NFSD: unable to generate recoverydir name (-2).
    NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
After I built and installed md5.ko and rebooted I now have only this in my kernel log:
    Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
    NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    NFSD: starting 90-second grace period (net ffffffff81a718c0)
This bug has been open for more than three years, and evidently the "legacy client ID tracker" still has not been separated, so I'll second the request that NFSD_V4 should select CRYPTO_MD5.
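The "(-2)" in that message is a negated errno value; errno 2 is ENOENT, which is presumably what the crypto layer reports when the md5 algorithm cannot be found. A small Python sketch that pulls the code out of a kernel log and names it (the function name explain_recdir_failure is hypothetical; the message text is the one quoted above):

```python
import errno
import re

# Matches the NFSD message quoted in this report and captures the error code.
RECDIR_FAILURE = re.compile(
    r"NFSD: unable to generate recoverydir name \((-?\d+)\)"
)


def explain_recdir_failure(log_text: str):
    """Return the symbolic errno name from the NFSD message, or None if absent."""
    match = RECDIR_FAILURE.search(log_text)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # the kernel logs the negated errno
    return errno.errorcode.get(code, "unknown errno %d" % code)


if __name__ == "__main__":
    line = "NFSD: unable to generate recoverydir name (-2)."
    print(explain_recdir_failure(line))
```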
Is /usr/sbin/nfsdcltrack not installed on your machine? I'm just wondering why the kernel is falling back to legacy recovery tracking at all here. With the newer recovery tracking you also get the benefit of having the grace period lifted early if there are only v4.1 clients present and they've all finished reclaiming.
(In reply to Jeff Layton from comment #8)
> Is /usr/sbin/nfsdcltrack not installed on your machine? I'm just wondering
> why the kernel is falling back to legacy recovery tracking at all here.
Indeed it is not. I'm running Gentoo and have disabled its "nfsdcld" ("Enable nfsdcld NFSv4 clientid tracking daemon") USE flag on the net-fs/nfs-utils package because I didn't know why I would want yet another daemon running on my system. NFSv4 mounts seem to work fine without it, or am I missing something? (This question is off-topic for this bug report, so if you reply, please reply by email.) I think I'll open a bug report on Gentoo's bug tracker to make net-fs/nfs-utils emit a build warning if USE="+nfsv4 -nfsdcld" and CONFIG_CRYPTO_MD5="n".
It's not a daemon actually. It's a usermode helper upcall program, which means that it gets exec'ed by the kernel as necessary. The main benefit is that if you have only v4.1 clients, then nfsdcltrack will lift the grace period early. No need to wait 90s before the server is reusable after rebooting it. In any case, you are correct that this is a bit offtopic. Adding select CRYPTO_MD5 to the Kconfig might make sense, but ISTR the FIPS folks getting their knickers in a twist over using md5 hashes (even when we're not using them in a cryptographic way, as is the case here). Feel free to propose a patch on the linux-nfs mailing list. If you're not comfortable doing that, I'll toss it onto my to-do pile, but it may be a while before I can get to it.
I opened a pull request to add a warning to Gentoo's net-fs/nfs-utils ebuild in case both CONFIG_CRYPTO_MD5="n" and USE="-nfsdcld". https://github.com/gentoo/gentoo/pull/1448 I have rebuilt nfs-utils with nfsdcltrack, and now I have working client tracking without md5.ko, so I hereby _retract_ my request that CONFIG_NFSD_V4 select CONFIG_CRYPTO_MD5. But I would suggest that the kernel emit a more helpful message than "unable to generate recoverydir name (-2)". It would be great if the kernel would additionally say *why*, such as: "MD5 crypto alg not available". It would also be great if the kernel would emit a one-time warning if it is unable to exec nfsdcltrack. These two warnings would have saved me some time and consternation.
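The condition behind that warning can be sketched in a few lines of Python. This is only an illustration of the logic, not what the ebuild or the kernel actually does: the function name legacy_tracking_at_risk is made up, and the default path is simply the /usr/sbin/nfsdcltrack location mentioned earlier in this thread.

```python
import os


def legacy_tracking_at_risk(md5_available: bool,
                            helper_path: str = "/usr/sbin/nfsdcltrack") -> bool:
    """True when neither the upcall helper nor MD5 is available.

    Hypothetical check: with no executable helper the kernel falls back to
    the legacy tracker, and the legacy tracker needs MD5.
    """
    helper_ok = os.path.isfile(helper_path) and os.access(helper_path, os.X_OK)
    return not helper_ok and not md5_available


if __name__ == "__main__":
    # md5_available would come from the kernel config; False is just a sample.
    if legacy_tracking_at_risk(md5_available=False):
        print("warning: no nfsdcltrack helper and no MD5; "
              "NFSv4 reboot recovery will not work")
```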
A better error is still a reasonable request, and I'm sure it would go in if somebody wants to make a patch, but I haven't heard this complaint in a while, so it's just not a priority.