Bug 52271

Summary: config NFSD_V4 should select MD5
Product: File System
Reporter: Mark (mark.sf.net)
Component: NFS
Assignee: bfields
Status: RESOLVED WILL_NOT_FIX
Severity: normal
CC: alan, jlayton, kernel, seraph, szg00000, trondmy
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.6.11
Subsystem:
Regression: No
Bisected commit-id:

Description Mark 2013-01-04 10:22:09 UTC
Yesterday and today I spent hours figuring out why our NFS clients hung when opening any file from an NFSv4 mount. Listing directories etc. worked fine.

Finally I figured out the root cause by capturing the traffic with tcpdump and enabling NFS debugging for nfsd. The client seemed to call OP_SETCLIENTID in a loop, and each call returned error 10008 (NFS4ERR_DELAY). After looking up nfs4_make_rec_clidname, I suspected that the MD5 crypto algorithm wasn't available.

The trigger was a kernel upgrade on the nfs server during which apparently the crypto md5 module was deselected because no kernel component required it.

It seems MD5 is required by nfs4_make_rec_clidname() in fs/nfsd/nfs4recover.c.
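For context, the legacy tracker names each client's directory under the recovery directory after an MD5 digest of the client identifier, written out as lowercase hex, which is why MD5 must be present. The sketch covers only the hex-encoding step and assumes the 16-byte digest has already been computed; `digest_to_hexdir()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MD5_DIGEST_LEN 16                     /* MD5 produces 16 bytes */
#define HEXDIR_LEN (2 * MD5_DIGEST_LEN + 1)   /* 32 hex chars + NUL */

/* Hypothetical helper: turn a precomputed MD5 digest into the
 * lowercase hex string used as a per-client directory name. */
static void digest_to_hexdir(const unsigned char digest[MD5_DIGEST_LEN],
                             char out[HEXDIR_LEN])
{
    static const char hex[] = "0123456789abcdef";

    assert(digest != NULL && out != NULL);
    for (size_t i = 0; i < MD5_DIGEST_LEN; i++) {
        out[2 * i]     = hex[digest[i] >> 4];    /* high nibble */
        out[2 * i + 1] = hex[digest[i] & 0x0f];  /* low nibble */
    }
    out[2 * MD5_DIGEST_LEN] = '\0';
}
```

Fed the MD5 digest of the empty string, the helper produces `d41d8cd98f00b204e9800998ecf8427e`. Without an MD5 implementation there is no digest to encode, so no directory name can be generated.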

I think config NFSD_V4 should select CRYPTO_MD5 to make sure MD5 support is present.
Comment 1 Trond Myklebust 2013-01-04 14:19:36 UTC
Reassigning to Bruce, since this appears to be an NFS server issue rather than an NFS filesystem (client) issue.
Comment 2 bfields 2013-01-04 17:32:44 UTC
Thanks for the report.

"A kernel upgrade on the nfs server during which apparently the crypto md5
module was deselected because no kernel component required it."

From which kernel version to which kernel version?

As of jlayton's 2216d449a97927cc105912e337d169cd4d4db548 "nfsd: get rid of cl_recdir field" (in 3.8), this should at least fail more gracefully in this case: the client will be allowed to continue, and an error will be logged on the server.

But it would probably make sense to put the legacy tracker under its own config option and have that option select CRYPTO_MD5.
Comment 3 Jeff Layton 2013-01-04 18:09:13 UTC
Agreed. Eventually, we'll want to remove the legacy tracker altogether. As an interim step it would probably make sense to allow it to be disabled at compile time too.
Comment 4 Jeff Layton 2013-01-04 18:13:39 UTC
For stable kernels though, it might make sense to add a 'select CRYPTO_MD5' to the 'config NFSD_V4' section.

We can always remove that when we move the legacy tracker under its own Kconfig option.
Comment 5 Mark 2013-01-07 10:04:12 UTC
The kernel was upgraded from 3.5.7 to 3.6.11.

It seems that during the upgrade we disabled CONFIG_RPCSEC_GSS_KRB5, which is the only option that depends on CRYPTO_MD5 (we're not using Kerberos authentication).
Comment 6 seraph@xs4all.nl 2013-03-10 23:11:14 UTC
I can confirm this; after I installed a 3.6.11 kernel with CONFIG_CRYPTO_MD5 disabled on my nfs server, all attempts to access files on nfs clients hung indefinitely.  Installing a 3.6.11 kernel with CONFIG_CRYPTO_MD5=m fixed the issue.
Comment 7 Matt Whitlock 2016-05-10 13:26:57 UTC
I ran into this bug on Linux 4.5.3. Without CONFIG_CRYPTO_MD5, I was seeing this in my kernel log at boot:

Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period (net ffffffff81a718c0)
NFSD: unable to generate recoverydir name (-2).
NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!

After I built and installed md5.ko and rebooted I now have only this in my kernel log:

Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period (net ffffffff81a718c0)

This bug has been open for more than three years, and evidently the "legacy client ID tracker" still has not been separated, so I'll second the request that NFSD_V4 should select CRYPTO_MD5.
Comment 8 Jeff Layton 2016-05-10 14:35:43 UTC
Is /usr/sbin/nfsdcltrack not installed on your machine? I'm just wondering why the kernel is falling back to legacy recovery tracking at all here. With the newer recovery tracking you also get the benefit of having the grace period lifted early if there are only v4.1 clients present and they've all finished reclaiming.
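The fallback discussed in this comment and in comments 7 and 11 can be summarised as an order of preference: use the nfsdcltrack upcall if it is available, otherwise the legacy MD5-based tracker, otherwise no tracking at all. The C sketch models that with two booleans. `pick_tracker()` and the enum are hypothetical; the kernel's real selection code is more involved and is not reproduced here.

```c
#include <stdbool.h>

/* Which client-tracking backend ends up in use (simplified). */
enum tracker {
    TRACKER_UMH,     /* nfsdcltrack usermode helper: no MD5 needed */
    TRACKER_LEGACY,  /* MD5-named dirs under /var/lib/nfs/v4recovery */
    TRACKER_NONE     /* disabled: reboot recovery will not work */
};

/* Hypothetical selection logic; an assumption, not the kernel's code. */
static enum tracker pick_tracker(bool helper_installed, bool md5_available)
{
    if (helper_installed)
        return TRACKER_UMH;      /* upcall to /usr/sbin/nfsdcltrack works */
    if (md5_available)
        return TRACKER_LEGACY;   /* legacy tracker can hash client names */
    /* Corresponds to the "disabling legacy clientid tracking" log line. */
    return TRACKER_NONE;
}
```

This matches what comment 11 reports: with nfsdcltrack installed, client tracking works even without md5.ko.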
Comment 9 Matt Whitlock 2016-05-10 22:48:11 UTC
(In reply to Jeff Layton from comment #8)
> Is /usr/sbin/nfsdcltrack not installed on your machine? I'm just wondering
> why the kernel is falling back to legacy recovery tracking at all here.

Indeed it is not. I'm running Gentoo and have disabled its "nfsdcld" ("Enable nfsdcld NFSv4 clientid tracking daemon") USE flag on the net-fs/nfs-utils package because I didn't know why I would want yet another daemon running on my system. NFSv4 mounts seem to work fine without it, or am I missing something? (This question is off-topic for this bug report, so if you reply, please reply by email.)

I think I'll open a bug report on Gentoo's bug tracker to make net-fs/nfs-utils emit a build warning if USE="+nfsv4 -nfsdcld" and CONFIG_CRYPTO_MD5="n".
Comment 10 Jeff Layton 2016-05-10 22:57:28 UTC
It's not a daemon, actually. It's a usermode helper upcall program, which means that it gets exec'ed by the kernel as necessary. The main benefit is that if you have only v4.1 clients, nfsdcltrack will lift the grace period early. There's no need to wait 90s before the server is usable again after rebooting it.
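The early-lift rule mentioned here can be expressed as a simple predicate. This is a hedged sketch: `struct client_state` and `can_end_grace_early()` are hypothetical, and "done reclaiming" stands in for however the server actually learns this (NFSv4.1 clients can send RECLAIM_COMPLETE; NFSv4.0 has no such operation).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client record; field names are illustrative. */
struct client_state {
    unsigned minorversion;   /* 0 = NFSv4.0, 1 = NFSv4.1, ... */
    bool reclaim_complete;   /* client has finished reclaiming state */
};

/* Returns true if the grace period could end before its timeout:
 * every known client must be v4.1 or later and done reclaiming.
 * A single v4.0 client means waiting out the full grace period. */
static bool can_end_grace_early(const struct client_state *clients,
                                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (clients[i].minorversion < 1 || !clients[i].reclaim_complete)
            return false;
    }
    return true;
}
```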

In any case, you are correct that this is a bit offtopic. Adding select CRYPTO_MD5 to the Kconfig might make sense, but ISTR the FIPS folks getting their knickers in a twist over using md5 hashes (even when we're not using them in a cryptographic way, as is the case here).

Feel free to propose a patch on the linux-nfs mailing list. If you're not comfortable doing that, I'll toss it onto my to-do pile, but it may be a while before I can get to it.
Comment 11 Matt Whitlock 2016-05-11 01:32:58 UTC
I opened a pull request to add a warning to Gentoo's net-fs/nfs-utils ebuild in case both CONFIG_CRYPTO_MD5="n" and USE="-nfsdcld".

https://github.com/gentoo/gentoo/pull/1448

I have rebuilt nfs-utils with nfsdcltrack, and now I have working client tracking without md5.ko, so I hereby _retract_ my request that CONFIG_NFSD_V4 select CONFIG_CRYPTO_MD5.

But I would suggest that the kernel emit a more helpful message than "unable to generate recoverydir name (-2)". It would be great if the kernel would additionally say *why*, such as: "MD5 crypto alg not available". It would also be great if the kernel emitted a one-time warning when it is unable to exec nfsdcltrack. These two warnings would have saved me some time and consternation.
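The -2 in the log is -ENOENT, which is what the kernel crypto API returns when a requested algorithm cannot be found. Both suggestions can be sketched in userspace C. Everything here is hypothetical: the function names and the nfsdcltrack warning text are illustrations, not existing kernel code or log lines; only the "unable to generate recoverydir name" prefix and the "MD5 crypto alg not available" wording come from this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mapping from errno to a human-readable cause.
 * Assumption: -ENOENT here means the "md5" algorithm lookup failed. */
static const char *recdir_error_reason(int err)
{
    return err == -ENOENT ? "MD5 crypto alg not available" : NULL;
}

/* Build the log line, appending the cause when one is known. */
static void recdir_error_msg(int err, char *buf, size_t len)
{
    const char *why = recdir_error_reason(err);

    if (why)
        snprintf(buf, len,
                 "NFSD: unable to generate recoverydir name (%d): %s",
                 err, why);
    else
        snprintf(buf, len,
                 "NFSD: unable to generate recoverydir name (%d)", err);
}

/* Hypothetical one-shot warning in the spirit of the kernel's
 * pr_warn_once(): only the first call prints and returns true. */
static bool warn_nfsdcltrack_exec_failed_once(void)
{
    static bool warned;

    if (warned)
        return false;
    warned = true;
    fprintf(stderr, "NFSD: unable to exec nfsdcltrack; "
                    "falling back to legacy client tracking\n");
    return true;
}
```

Keeping the errno-to-reason mapping in its own function makes it easy to extend later without touching the message format.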
Comment 12 bfields 2022-01-21 17:21:38 UTC
A better error is still a reasonable request. I'm sure it'd go in if somebody wanted to make a patch, but I haven't heard this complaint in a while, so it's just not a priority.