Bug 12256 - [regression: 2.6.28] NFS client with locking fails
Status: CLOSED INVALID
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks: 11808
Reported: 2008-12-19 13:38 UTC by Kees Cook
Modified: 2008-12-23 14:07 UTC

See Also:
Kernel Version: 2.6.28-rc8
Subsystem:
Regression: Yes
Bisected commit-id:


Description Kees Cook 2008-12-19 13:38:51 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc7
Distribution: Ubuntu
Hardware Environment: Intel
Software Environment: Ubuntu Jaunty
Problem Description:
Attempting to mount an NFS share with locking will fail, claiming it cannot reach portmapper (which is running fine):
Dec 7 18:15:56 nattbrygga kernel: [28315.080038] rpcbind: server localhost not responding, timed out
Dec 7 18:15:56 nattbrygga kernel: [28315.080076] RPC: failed to contact local rpcbind server (errno 5).
...

$ rpcinfo -p localhost
   program vers proto port
    100000 2 tcp 111 portmapper
    100000 2 udp 111 portmapper
    100024 1 udp 56296 status
    100024 1 tcp 49528 status

Steps to reproduce:
# mount server:/storage/thing /mnt/thing -t nfs -o rw,hard,intr,lock,nfsvers=3,rsize=32768,wsize=32768,posix,sloppy

See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309268
Comment 1 Kees Cook 2008-12-19 13:42:50 UTC
Er, rather, see: https://bugs.launchpad.net/bugs/306016
Comment 2 Trond Myklebust 2008-12-19 14:15:29 UTC
I'm not seeing any such problems on my own setup with a 2.6.28-rc9 kernel
talking to the rpcbind from Fedora 9.

Having read the argument in the launchpad report, please do note that neither
rpcbind nor libtirpc are part of nfs-utils. They are separate packages, with
separate git repositories.
Comment 3 Kees Cook 2008-12-19 16:24:16 UTC
What would you recommend as the best way to further diagnose this problem?
Comment 4 Neil Brown 2008-12-22 14:45:06 UTC
tcpdump -i lo -s 0 -w /tmp/tcp &
strace -e mount -f -s 1000 mount server:/storage/thing /mnt/thing -t nfs .....
kill %
tshark -r /tmp/tcp

Then post the output of 'strace' and attach the file created by 'tcpdump'.

I'm guessing that an IPv6 address is being given to the kernel, so it
tries to talk to rpcbind; since only portmap is listening, the request
fails.
But that is only a guess.
Comment 5 Trond Myklebust 2008-12-23 06:58:22 UTC
If so, then that would be a .config error. You should _not_ be enabling
SUNRPC_REGISTER_V4 if you are running a legacy portmapper. That's why the
option defaults to 'n', and why it explicitly states in the help text that

'Distributions using the legacy Linux portmapper daemon must say N here.'
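
One way to check the setting Trond describes is to look for the option in the running kernel's config file. The following is a minimal sketch, not something posted in this thread: the function name check_register_v4 is hypothetical, and it assumes the Kconfig option appears in the config file as CONFIG_SUNRPC_REGISTER_V4 and that the distribution installs the config as /boot/config-<release> (common, but not guaranteed).

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel config file enables
# rpcbind v4 registration (CONFIG_SUNRPC_REGISTER_V4=y).
# Usage: check_register_v4 /path/to/kernel/config
check_register_v4() {
    config_file=$1
    if [ ! -r "$config_file" ]; then
        echo "unknown: cannot read $config_file"
        return 2
    fi
    # An enabled option appears as "CONFIG_...=y"; a disabled one is
    # either absent or written as "# CONFIG_... is not set".
    if grep -q '^CONFIG_SUNRPC_REGISTER_V4=y' "$config_file"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Example call (the path is an assumption about the distribution):
#   check_register_v4 "/boot/config-$(uname -r)"
```

A result of "enabled" on a system that still runs the legacy portmapper would match the misconfiguration described above.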
Comment 6 Chuck Lever 2008-12-23 08:17:28 UTC
Trond is correct that SUNRPC_REGISTER_V4 should be set to N for distributions that still use portmapper instead of rpcbind.  If this option is enabled, all portmap registrations are handled with an rpcbind v4 request, which the legacy portmapper does not support.

To be sure that we are dealing with a portmap registration issue, enable debug messages before trying the mount with:

  sudo rpcdebug -m rpc -s bind svcsock svcsdp

Try the mount, then look at the tail of /var/log/messages.  To disable debug messages, use:

  sudo rpcdebug -m rpc -c

I would have expected an rpcbind protocol version mismatch to be reported immediately as such, rather than the request timing out.
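
Whether the local binder is the legacy portmapper or rpcbind can also be read from 'rpcinfo -p' output such as the listing in the description. Program 100000 at version 2 is the portmap protocol, while versions 3 and 4 are the rpcbind protocol. The following is a rough sketch under that assumption; the function name has_rpcbind_v3plus is hypothetical and was not posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `rpcinfo -p` output on stdin and succeed
# (exit status 0) only if program 100000 is registered at version 3
# or higher, i.e. an rpcbind-capable binder is present.
has_rpcbind_v3plus() {
    awk '$1 == "100000" && ($2 + 0) >= 3 { found = 1 }
         END { exit found ? 0 : 1 }'
}

# Example call:
#   if rpcinfo -p localhost | has_rpcbind_v3plus; then
#       echo "rpcbind (v3/v4) is registered"
#   else
#       echo "only legacy portmap (v2) is registered"
#   fi
```

Applied to the listing in the description, which shows program 100000 only at version 2, the check fails, which is consistent with a legacy portmapper.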
Comment 7 Kees Cook 2008-12-23 09:54:02 UTC
I can confirm that Ubuntu Jaunty is not using rpcbind, yet has SUNRPC_REGISTER_V4=y.  I will try to get this fixed.  Thanks for the details!
