|Summary:||[regression: 2.6.28] NFS client with locking fails|
|Product:||File System||Reporter:||Kees Cook (kees)|
|Component:||NFS||Assignee:||Trond Myklebust (trondmy)|
|Severity:||normal||CC:||chuck.lever, neilb, rjw|
Description Kees Cook 2008-12-19 13:38:51 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc7
Distribution: Ubuntu
Hardware Environment: Intel
Software Environment: Ubuntu Jaunty

Problem Description:
Attempting to mount an NFS share with locking will fail, claiming it cannot reach portmapper (which is running fine):

Dec 7 18:15:56 nattbrygga kernel: [28315.080038] rpcbind: server localhost not responding, timed out
Dec 7 18:15:56 nattbrygga kernel: [28315.080076] RPC: failed to contact local rpcbind server (errno 5).
...
$ rpcinfo -p localhost
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  56296  status
    100024    1   tcp  49528  status

Steps to reproduce:
# mount server:/storage/thing on /mnt/thing -t nfs -o rw,hard,intr,lock,nfsvers=3,rsize=32768,wsize=32768,posix,sloppy

See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309268
Comment 2 Trond Myklebust 2008-12-19 14:15:29 UTC
I'm not seeing any such problems on my own setup with a 2.6.28-rc9 kernel talking to the rpcbind from Fedora 9. Having read the argument in the launchpad report, please do note that neither rpcbind nor libtirpc are part of nfs-utils. They are separate packages, with separate git repositories.
Comment 3 Kees Cook 2008-12-19 16:24:16 UTC
What would you recommend as the best way to further diagnose this problem?
Comment 4 Neil Brown 2008-12-22 14:45:06 UTC
tcpdump -i lo -s 0 -w /tmp/tcp &
strace -e mount -f -s 1000 mount server:/storage/thing on /mnt/thing -t nfs .....
kill %
tshark -r /tmp/tcp

Post the output of 'strace' and attach the file created by 'tcpdump'.

I'm guessing that an IPv6 address is being given to the kernel, so it tries to talk to rpcbind, but as only portmap is listening, it has problems. But that is only a guess.
Comment 5 Trond Myklebust 2008-12-23 06:58:22 UTC
If so, then that would be a .config error. You should _not_ be enabling SUNRPC_REGISTER_V4 if you are running a legacy portmapper. That's why the option defaults to 'n', and why it explicitly states in the help text that 'Distributions using the legacy Linux portmapper daemon must say N here.'
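The .config check described above can be sketched as a small shell helper. This is only an illustration: the function name `sunrpc_register_v4_state` is made up for this example, and the location of the kernel config file is an assumption that varies by distribution.

```shell
# Report how CONFIG_SUNRPC_REGISTER_V4 is set in a kernel config file.
# Hypothetical helper, not part of any package.
# Usage: sunrpc_register_v4_state /path/to/kernel/config
sunrpc_register_v4_state() {
    config_file=$1
    # Without a readable config file we cannot say anything useful.
    if [ ! -r "$config_file" ]; then
        echo "unknown"
        return 1
    fi
    # The option is only in effect when the line reads "=y".
    # A "# CONFIG_SUNRPC_REGISTER_V4 is not set" line, or no line at all,
    # means it is off.
    if grep -q '^CONFIG_SUNRPC_REGISTER_V4=y' "$config_file"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

On systems that install the config next to the kernel image, it could be called as `sunrpc_register_v4_state "/boot/config-$(uname -r)"`. A result of "enabled" on a machine that still runs the legacy portmapper would match the misconfiguration described in this comment.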
Comment 6 Chuck Lever 2008-12-23 08:17:28 UTC
Trond is correct that SUNRPC_REGISTER_V4 should be set to N for distributions that still use portmapper instead of rpcbind. If this option is enabled, all portmap registrations are handled with an rpcbind v4 request, which the legacy portmapper does not support.

To be sure that we are dealing with a portmap registration issue, enable debug messages before trying the mount with:

sudo rpcdebug -m rpc -s bind svcsock svcdsp

Try the mount, then look at the tail of /var/log/messages. To disable debug messages, use:

sudo rpcdebug -m rpc -c

I would have expected an rpcbind protocol version mismatch to be reported immediately as such, rather than the request timing out.
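One rough way to see which protocol versions the local registry offers is to look at the versions listed for program 100000 in `rpcinfo -p` output. The sketch below assumes that a daemon which speaks rpcbind v4 lists version 4 for that program; the function names are invented for this example.

```shell
# Read `rpcinfo -p` output on stdin and print the versions registered
# for program 100000 (the portmapper/rpcbind service), one per line.
# Hypothetical helper names, for illustration only.
registry_versions() {
    # The header line and other programs do not match 100000 in column 1.
    awk '$1 == 100000 { print $2 }' | sort -un
}

# Succeed only if version 4 is among the registered versions.
registry_speaks_v4() {
    registry_versions | grep -qx 4
}
```

For example, `rpcinfo -p localhost | registry_versions` applied to the output in the description would print only `2`, which fits a legacy portmapper that cannot answer an rpcbind v4 request.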
Comment 7 Kees Cook 2008-12-23 09:54:02 UTC
I can confirm that Ubuntu Jaunty is not using rpcbind, yet has SUNRPC_REGISTER_V4=y. I will try to get this fixed. Thanks for the details!