Bug 12256

Summary: [regression: 2.6.28] NFS client with locking fails
Product: File System
Reporter: Kees Cook (kees)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Severity: normal
CC: chuck.lever, neilb, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-rc8
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11808

Description Kees Cook 2008-12-19 13:38:51 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc7
Distribution: Ubuntu
Hardware Environment: Intel
Software Environment: Ubuntu Jaunty
Problem Description:
Attempting to mount an NFS share with locking will fail, claiming it cannot reach portmapper (which is running fine):
Dec 7 18:15:56 nattbrygga kernel: [28315.080038] rpcbind: server localhost not responding, timed out
Dec 7 18:15:56 nattbrygga kernel: [28315.080076] RPC: failed to contact local rpcbind server (errno 5).

$ rpcinfo -p localhost
   program vers proto port
    100000 2 tcp 111 portmapper
    100000 2 udp 111 portmapper
    100024 1 udp 56296 status
    100024 1 tcp 49528 status

Steps to reproduce:
# mount server:/storage/thing /mnt/thing -t nfs -o rw,hard,intr,lock,nfsvers=3,rsize=32768,wsize=32768,posix,sloppy

See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309268
Comment 1 Kees Cook 2008-12-19 13:42:50 UTC
Er, rather, see: https://bugs.launchpad.net/bugs/306016
Comment 2 Trond Myklebust 2008-12-19 14:15:29 UTC
I'm not seeing any such problems on my own setup with a 2.6.28-rc9 kernel
talking to the rpcbind from Fedora 9.

Having read the argument in the launchpad report, please do note that neither
rpcbind nor libtirpc are part of nfs-utils. They are separate packages, with
separate git repositories.
Comment 3 Kees Cook 2008-12-19 16:24:16 UTC
What would you recommend as the best way to further diagnose this problem?
Comment 4 Neil Brown 2008-12-22 14:45:06 UTC
tcpdump -i lo -s 0 -w /tmp/tcp &
strace -e mount -f -s 1000 mount server:/storage/thing /mnt/thing -t nfs .....
kill %
tshark -r /tmp/tcp

post the output of 'strace' and attach the file created by 'tcpdump'.

I'm guessing that an IPv6 address is being given to the kernel, so it
tries to talk to rpcbind, but as only portmap is listening, it has 
But that is only a guess.
Comment 5 Trond Myklebust 2008-12-23 06:58:22 UTC
If so, then that would be a .config error. You should _not_ be enabling
SUNRPC_REGISTER_V4 if you are running a legacy portmapper. That's why the
option defaults to 'n', and why it explicitly states in the help text that

'Distributions using the legacy Linux portmapper daemon must say N here.'
Comment 6 Chuck Lever 2008-12-23 08:17:28 UTC
Trond is correct that SUNRPC_REGISTER_V4 should be set to N for distributions that still use portmapper instead of rpcbind.  If this option is enabled, all portmap registrations are handled with an rpcbind v4 request, which the legacy portmapper does not support.
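
One rough way to tell which daemon is answering on port 111 is to look at the versions of program 100000 that `rpcinfo -p` lists: the legacy portmapper registers only version 2, while rpcbind normally also registers versions 3 and 4. The sketch below assumes that heuristic; the function names are hypothetical helpers, not part of any package.

```shell
#!/bin/sh
# Print the highest version of program 100000 (portmapper/rpcbind)
# found in `rpcinfo -p` output read from stdin. Prints 0 if none is listed.
highest_rpcbind_version() {
    awk '$1 == 100000 && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}

# Hypothetical helper: succeeds if only version 2 is registered,
# which suggests a legacy portmapper rather than rpcbind.
is_legacy_portmapper() {
    [ "$(highest_rpcbind_version)" -eq 2 ]
}

# Example use on a live system:
#   rpcinfo -p localhost | is_legacy_portmapper && echo "legacy portmapper"
```

Fed the `rpcinfo -p localhost` output from the description, `highest_rpcbind_version` prints 2, which is consistent with a legacy portmapper.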

To be sure that we are dealing with a portmap registration issue, enable debug messages before trying the mount with:

  sudo rpcdebug -m rpc -s bind svcsock svcdsp

Try the mount, then look at the tail of /var/log/messages.  To disable debug messages, use:

  sudo rpcdebug -m rpc -c

I would have expected an rpcbind protocol version mismatch to be reported immediately as such, rather than the request timing out.
Comment 7 Kees Cook 2008-12-23 09:54:02 UTC
I can confirm that Ubuntu Jaunty is not using rpcbind, yet has SUNRPC_REGISTER_V4=y.  I will try to get this fixed.  Thanks for the details!
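
For anyone wanting to check their own kernel for the same mismatch, a minimal sketch follows. It assumes the kernel build config is available as a plain-text file; the path in the usage comment is a common distribution convention, not something stated in this report, and `register_v4_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds (exit 0) if the given kernel config file sets
# CONFIG_SUNRPC_REGISTER_V4=y, fails otherwise.
register_v4_enabled() {
    grep -q '^CONFIG_SUNRPC_REGISTER_V4=y$' "$1"
}

# Example use (the config path is an assumption and varies by distribution):
#   if register_v4_enabled "/boot/config-$(uname -r)"; then
#       echo "SUNRPC_REGISTER_V4 is enabled; rpcbind (not portmap) is required"
#   fi
```

A line such as `# CONFIG_SUNRPC_REGISTER_V4 is not set` does not match, so the function reports the option as disabled in that case.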