Bug 31442

Summary: nfs4/gssapi mounts hang
Product: File System Reporter: Brian J. Murrell (brian)
Component: NFSAssignee: Trond Myklebust (trondmy)
Status: RESOLVED OBSOLETE    
Severity: high CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38 Subsystem:
Regression: Yes Bisected commit-id:

Description Brian J. Murrell 2011-03-19 16:15:22 UTC
Per the downstream bug report at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/702385 I have been having problems with NFS4/GSSAPI mounts hanging since the downstream's 2.6.32-27-generic kernel.  That is, downstream's 2.6.32-22-generic works but downstream's 2.6.32-27-generic doesn't.  Those two kernels map to upstream kernels 
2.6.32.11+drm33.2 and 2.6.32.26+drm33.12 respectively.

I have now also reproduced this Ubuntu's "mainline" kernel 2.6.38-020638-generic which as I understand is the vanilla upstream kernel (2.6.38) with their .config.

Specifically, while the server is booted with this 2.6.38 kernel a client on a 2.6.35-24-generic (which maps to upstream kernel 2.6.35-24) will hang trying to mount an NFS4/GSSAPI export.  NFS3 works fine still.  If I revert the server back to the downstream kernel 2.6.32-22-generic functionality is restored.

On the client, the mount.nfs4 process is blocked with this stack:

[  153.186278] mount.nfs4    D f2915c38     0  3930   3929 0x00000000
[  153.186278]  f2915c48 00000086 00000002 f2915c38 f2f44b5c c05d99e0 c08c4700 c08c4700
[  153.186278]  21058719 00000019 c08c4700 c08c4700 21038edd 00000019 00000000 c08c4700
[  153.186278]  c08c4700 f2fd58d0 00000001 f2915c7c 00000000 f2915c84 f2915c50 fabeec0c
[  153.186278] Call Trace:
[  153.186278]  [<fabeec0c>] rpc_wait_bit_killable+0x1c/0x40 [sunrpc]
[  153.186278]  [<c05c823d>] __wait_on_bit+0x4d/0x70
[  153.235963]  [<fabeebf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[  153.235963]  [<fabeebf0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[  153.235963]  [<c05c830b>] out_of_line_wait_on_bit+0xab/0xc0
[  153.235963]  [<c0165f10>] ? wake_bit_function+0x0/0x50
[  153.235963]  [<fabef31b>] __rpc_execute+0xdb/0x250 [sunrpc]
[  153.235963]  [<fabeea17>] ? rpc_init_task+0xd7/0x120 [sunrpc]
[  153.235963]  [<c0230bef>] ? mntput_no_expire+0x1f/0xd0
[  153.235963]  [<fabef4fe>] rpc_execute+0x6e/0x80 [sunrpc]
[  153.235963]  [<fabe89af>] rpc_run_task+0x1f/0x30 [sunrpc]
[  153.235963]  [<fabe8abe>] rpc_call_sync+0x3e/0x60 [sunrpc]
[  153.235963]  [<f84696b2>] _nfs4_call_sync+0x22/0x30 [nfs]
[  153.235963]  [<f8467795>] nfs4_proc_get_root+0xa5/0x100 [nfs]
[  153.235963]  [<f84516f8>] nfs4_get_rootfh+0x48/0x130 [nfs]
[  153.235963]  [<f8453a33>] ? nfs_alloc_fattr+0x23/0xb0 [nfs]
[  153.235963]  [<f844bf19>] ? nfs4_init_server+0xf9/0x200 [nfs]
[  153.235963]  [<f844b4b4>] nfs4_server_common_setup+0x54/0x170 [nfs]
[  153.235963]  [<f844c062>] nfs4_create_server+0x42/0xc0 [nfs]
[  153.235963]  [<f84564eb>] nfs4_remote_get_sb+0x6b/0x250 [nfs]
[  153.235963]  [<c020fadf>] ? __alloc_percpu+0xf/0x20
[  153.235963]  [<c0231629>] ? alloc_vfsmnt+0xf9/0x130
[  153.235963]  [<c021b354>] vfs_kern_mount+0x74/0x1c0
[  153.235963]  [<f84578b9>] nfs_do_root_mount+0x69/0x90 [nfs]
[  153.235963]  [<f84579bf>] nfs4_try_mount+0x3f/0xb0 [nfs]
[  153.235963]  [<f8457ca1>] ? nfs_alloc_parsed_mount_data+0x41/0xa0 [nfs]
[  153.235963]  [<f8457d50>] nfs4_get_sb+0x50/0xd0 [nfs]
[  153.235963]  [<c0231629>] ? alloc_vfsmnt+0xf9/0x130
[  153.235963]  [<c021b354>] vfs_kern_mount+0x74/0x1c0
[  153.235963]  [<c022f9b3>] ? get_fs_type+0x33/0xb0
[  153.235963]  [<c021b4fe>] do_kern_mount+0x3e/0xe0
[  153.235963]  [<c0232b2c>] do_mount+0x1dc/0x220
[  153.235963]  [<c0232bdb>] sys_mount+0x6b/0xa0
[  153.235963]  [<c05c9cc4>] syscall_call+0x7/0xb
Comment 1 Trond Myklebust 2011-03-19 16:48:11 UTC
The above looks exactly like the sort of trace I'd expect if the server
is failing to reply to my RPC request.

If this is RPCSEC_GSS, are you perhaps enabling 3des or aes encryption against
a server that doesn't support it?
Comment 2 Brian J. Murrell 2011-03-19 17:05:07 UTC
(In reply to comment #1)
> The above looks exactly like the sort of trace I'd expect if the server
> is failing to reply to my RPC request.

Certainly sounds reasonable.
 
> If this is RPCSEC_GSS, are you perhaps enabling 3des or aes encryption
> against
> a server that doesn't support it?

The *only* thing I am doing is changing which kernel I boot.  How would I derive from the different kernels if this type of scenario is what is happening?
Comment 3 Trond Myklebust 2011-03-19 18:27:58 UTC
Hmm... I was hoping that rpc.gssd would log what enctype it uses, but it doesn't appear to do so.

One thing you might try is simply to comment out the call to
'parse_enctypes()' in utils/gssd/gssd_proc.c:handle_gssd_upcall() and then
recompile your nfs-utils.
Comment 4 Brian J. Murrell 2011-03-19 18:41:42 UTC
Since I hate to assume, let me confirm that you want that nfs-utils change on the server right?

(In reply to comment #3)
> 
> One thing you might try is simply to comment out the call to
> 'parse_enctypes()' in utils/gssd/gssd_proc.c:handle_gssd_upcall() and then
> recompile your nfs-utils.

Is that on the client or the server?  I'm thinking client since it's in utils/gssd/.

What are the results I am looking for when I've done that (and restarted gssd on the client)?
Comment 5 Brian J. Murrell 2011-03-19 19:16:38 UTC
(In reply to comment #3)
> 
> One thing you might try is simply to comment out the call to
> 'parse_enctypes()' in utils/gssd/gssd_proc.c:handle_gssd_upcall() and then
> recompile your nfs-utils.

Hrm.  I don't have a parse_enctypes() in my utils/gssd/gssd_proc.c:handle_gssd_upcall().  I am using nfs-utils-1.2.2.
Comment 6 Trond Myklebust 2011-03-19 19:28:14 UTC
Ah... What version of kerberos?

Also, what does 'sudo klist -keK' show on the client and server?
Comment 7 Brian J. Murrell 2011-03-19 19:43:48 UTC
(In reply to comment #6)
> Ah... What version of kerberos?

1.8.1 on both the client and server.  Specifically 1.8.1+dfsg-5ubuntu0.6 and 1.8.1+dfsg-2ubuntu0.6 respectively, for whatever it's worth.
 
> Also, what does 'sudo klist -keK' show on the client and server?

Client:
$ sudo klist -keK
Keytab name: WRFILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 nfs/pc.example.com@ILINX (Triple DES cbc mode with HMAC/sha1)  (0x3438...f462)
   2 nfs/pc.example.com@ILINX (DES cbc mode with CRC-32)  (0x7c...d5)
   4 host/pc.example.com@ILINX (Triple DES cbc mode with HMAC/sha1)  (0x926d...52b3)
   4 host/pc.example.com@ILINX (DES cbc mode with CRC-32)  (0xdf...c4)

Server:
# klist -keK
Keytab name: WRFILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   4 nfs/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0x98...51)
   5 imap/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0x25...52)
   5 imap/linux.example.com@ILINX (Triple DES cbc mode with HMAC/sha1)  (0xfd0d...32b3)
   3 smtp/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0x16...76)
   3 smtp/linux.example.com@ILINX (Triple DES cbc mode with HMAC/sha1)  (0xc8b6...4952)
   3 host/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0xd0...e5)
   3 host/linux.example.com@ILINX (Triple DES cbc mode with HMAC/sha1)  (0x6745...464a)
Comment 8 Trond Myklebust 2011-03-19 20:19:42 UTC
Can you try getting rid of the Triple DES 'nfs' and 'host' service entries on the
client (and possibly on the server too) keytab?

Then please run the rpc.gssd with the '-f -vvv' options on the client side to ensure that you get a full log of what happens when you try to mount.
Comment 9 Brian J. Murrell 2011-03-19 22:58:06 UTC
OK.  Removed the 3des entries:

Client:
pc# klist -keK
Keytab name: WRFILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 nfs/pc.example.com@ILINX (DES cbc mode with CRC-32)  (0x7c...d5)
   4 host/pc.example.com@ILINX (DES cbc mode with CRC-32)  (0xdf...c4)

Server:
linux# klist -keK
Keytab name: WRFILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   4 nfs/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0x98...51)
   5 imap/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0x25...52)
   3 smtp/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0x16...76)
   3 host/linux.example.com@ILINX (DES cbc mode with CRC-32)  (0xd0...e5)

pc# rpc.gssd with the -f -vvv
beginning poll
handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt6e1)
handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt6e1)
process_krb5_upcall: service is '<null>'
Full hostname for 'linux.example.com' is 'linux.example.com'
Full hostname for 'pc' is 'pc'
Key table entry not found while getting keytab entry for 'root/pc@ILINX'
Key table entry not found while getting keytab entry for 'nfs/pc@ILINX'
Key table entry not found while getting keytab entry for 'host/pc@ILINX'
Success getting keytab entry for nfs/*@ILINX
WARNING: Key table entry not found while getting initial ticket for principal 'nfs/pc.example.com@ILINX' using keytab 'WRFILE:/etc/krb5.keytab'
ERROR: No credentials found for connection to server linux.example.com
doing error downcall
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e1
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e0
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6df
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e4
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt6e3

pc$ sudo mount -t nfs4 -o sec=krb5 linux:/tmp /mnt/tmp
mount.nfs4: access denied by server while mounting linux:/tmp

I wasn't getting access denied before removing the 3des keytab entries.
Comment 10 Brian J. Murrell 2011-03-19 23:27:15 UTC
I also tries setting the hostname to the FQDN on the client given the number of errors about not being able to find keytab entries for "pc" and the end result was the same:

pc# rpc.gssd with the -f -vvv
beginning poll
handling gssd upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt75a)
handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 '
handling krb5 upcall (/var/lib/nfs/rpc_pipefs/nfs/clnt75a)
process_krb5_upcall: service is '<null>'
Full hostname for 'linux.example.com' is 'linux.example.com'
Full hostname for 'pc.example.com' is 'pc.example.com'
Key table entry not found while getting keytab entry for 'root/pc.example.com@ILINX'
Success getting keytab entry for 'nfs/pc.example.com@ILINX'
WARNING: Key table entry not found while getting initial ticket for principal 'nfs/pc.example.com@ILINX' using keytab 'WRFILE:/etc/krb5.keytab'
ERROR: No credentials found for connection to server linux.example.com
doing error downcall
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt75a
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt754
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt753
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt75d
destroying client /var/lib/nfs/rpc_pipefs/nfs/clnt75c

for

pc$ sudo mount -t nfs4 -o sec=krb5 linux:/tmp /mnt/tmp
mount.nfs4: access denied by server while mounting linux:/tmp

Although, as I said before, this now getting "access denied" is new since having removed the 3des entries and all of this, (client named "pc", 3des entries in keytabs, etc.) all work on older kernels.
Comment 11 Brian J. Murrell 2011-03-21 12:40:34 UTC
Thanx much for all of the input so far.  Is there anything more I can provide to help move this along to resolution?
Comment 12 Brian J. Murrell 2011-03-23 16:15:53 UTC
You probably saw this on the linux nfs list but just to keep information here complete (from http://permalink.gmane.org/gmane.linux.nfs/39578), when I posted the output of both the rpc.gssd and rpc.svcgssd services Kevin Coffman determined:

> prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and leng=
th 8

This shows that a des (enctype 4) session key is being negotiated and
delivered to both kernels, so I don't think any of the Kerberos issues
should be involved here.  (At least from the user-land perspective.)

I'm not sure what kernel change would be causing your hang ...