Bug 7274

Summary: NFSv4 fails to mount (timeout) on kernel 2.6.19-rc1
Product: File System
Reporter: Torsten Kaiser (just.for.lkml)
Component: NFS
Assignee: Trond Myklebust (trondmy)
Status: CLOSED CODE_FIX
Severity: normal
CC: bunk
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.19-rc1
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: NFSv4: Fix thinko in fs/nfs/super.c

Description Torsten Kaiser 2006-10-06 09:51:30 UTC
Most recent kernel where this bug did not occur:
vanilla 2.6.18
all 2.6.18-mm[1..3] have the bug.

I did a git-bisect to track it down to this commit:

51b6ded4d9a94a61035deba1d8f51a54e3a3dd86 is first bad commit
commit 51b6ded4d9a94a61035deba1d8f51a54e3a3dd86
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Fri Sep 15 16:31:56 2006 -0400

    NFSv4: When mounting with a port=0 argument, substitute port=2049

    RFC3530 states that the registered port 2049 for the NFS protocol should be
    the default configuration in order to allow clients not to use the RPC
    binding protocols.
    If the mount program sends us a port=0, we therefore substitute port=2049.

    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

:040000 040000 f093abbb7cb6a01155ee2d2a820839ca2e9edadb 4f90adecb2095bc921bcac83d9cff3c0fc1f5388 M  fs
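
The substitution described in the commit message above can be sketched in C as follows. This is only an illustration, not the code from fs/nfs/super.c: the names SKETCH_NFS_PORT and sketch_default_nfs4_port are made up for this example, and it assumes the server address is held in a struct sockaddr_in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Registered NFS port. The macro name is hypothetical. */
#define SKETCH_NFS_PORT 2049

/*
 * Hypothetical helper: if the mount program passed port 0,
 * substitute the registered NFS port. sin_port is kept in
 * network byte order, so the constant goes through htons().
 */
static void sketch_default_nfs4_port(struct sockaddr_in *addr)
{
    if (addr->sin_port == 0)
        addr->sin_port = htons((uint16_t)SKETCH_NFS_PORT);
}
```

An address that already carries a non-zero port is left untouched, so an explicit port= mount option would still win over the default.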

Distribution: Gentoo-2006.1
Hardware Environment: x86_64 server, i386 client
Software Environment:
NFS-Server: tested with 2.6.18-mm2, but the same error could be seen with 
2.6.17-rc6-mm2 as server.
NFS-Client: 2.6.17 and 2.6.18 work, but 2.6.19-rc1 fails with the same 
configuration

The fstab entry of the failing mount:
192.168.0.4:/portage /usr/portage nfs4 rw,noatime,intr,noauto 0 0

Problem Description:
When trying to mount this share, the mount process hangs for several minutes, 
then displays: "mount: 192.168.0.4:/portage: can't read superblock"
The syslog contains: "nfs: server 192.168.0.4 not responding, timed out"

There is no firewall between the two systems, and the server does listen on 
port 2049.
rpcinfo -p localhost on the server:
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32769  status
    100024    1   tcp  51226  status
    100021    1   udp  32770  nlockmgr
    100021    3   udp  32770  nlockmgr
    100021    4   udp  32770  nlockmgr
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   tcp  39360  nlockmgr
    100021    3   tcp  39360  nlockmgr
    100021    4   tcp  39360  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp    739  mountd
    100005    1   tcp    742  mountd
    100005    2   udp    739  mountd
    100005    2   tcp    742  mountd
    100005    3   udp    739  mountd
    100005    3   tcp    742  mountd

netstat --listen --inet contains these lines:
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN
udp        0      0 0.0.0.0:2049            0.0.0.0:*

Steps to reproduce:
Export NFS:
/var/export 192.168.0.2(fsid=0,rw,no_root_squash,sync)
/var/export/portage 192.168.0.2(rw,nohide,no_root_squash,sync)
Try to mount it via the fstab line above:
mount /usr/portage

Comment 1 Trond Myklebust 2006-10-06 10:23:58 UTC
Created attachment 9174
NFSv4: Fix thinko in fs/nfs/super.c

This patch ought to fix it for you...

BTW: Why _is_ the Gentoo mount command setting a 0 port value?

Comment 2 Torsten Kaiser 2006-10-06 10:50:09 UTC
I can confirm that the attached patch fixes the problem for me.

Gentoo may set the port to 0 because I did not specify a port number in the 
fstab.

I have the following mount program installed:
sys-apps/util-linux-2.12r-r4

Gentoo applies Loop-AES support and several other patches to the original 
sources from http://www.kernel.org/pub/linux/utils/util-linux/. One of them 
also enables NFSv4.
You can find the NFSv4-Patch here:
http://sources.gentoo.org/viewcvs.py/gentoo-x86/sys-apps/util-linux/files/util-linux-2.12i-nfsv4.patch?rev=1.1&view=markup

But as far as I can see, it should default to 2049...

Comment 3 Torsten Kaiser 2006-10-06 10:56:33 UTC
Ah.. I found the error in the Gentoo patch:

The port defaults correctly to 2049 and is also read from the command-line 
options. But the patch seems to never copy the 'int port' variable into 
the 'struct nfs4_mount_data' that is passed to the kernel.
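
The kind of omission described in Comment 3 can be illustrated with a small C sketch. This is not the Gentoo patch and does not use the real layout of 'struct nfs4_mount_data': the struct demo_mount_data, the macro DEMO_NFS_PORT and both functions are hypothetical stand-ins, and the sketch assumes the port ends up in a struct sockaddr_in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Registered NFS port. The macro name is hypothetical. */
#define DEMO_NFS_PORT 2049

/* Hypothetical stand-in, NOT the real struct nfs4_mount_data. */
struct demo_mount_data {
    struct sockaddr_in server_addr;
};

/* Return N from a "port=N" entry in a comma-separated option
 * string, or the default if the entry is absent or out of range. */
static int demo_parse_port(const char *opts)
{
    const char *p = opts;

    while (p != NULL && *p != '\0') {
        if (strncmp(p, "port=", 5) == 0) {
            long v = strtol(p + 5, NULL, 10);
            return (v > 0 && v <= 65535) ? (int)v : DEMO_NFS_PORT;
        }
        p = strchr(p, ',');   /* advance to the next option */
        if (p != NULL)
            p++;
    }
    return DEMO_NFS_PORT;
}

static void demo_fill_mount_data(struct demo_mount_data *data,
                                 const char *opts)
{
    int port = demo_parse_port(opts);

    memset(data, 0, sizeof(*data));
    data->server_addr.sin_family = AF_INET;
    /* The step reported as missing: without this assignment the
     * parsed value stays in the local variable and the structure
     * still carries port 0. */
    data->server_addr.sin_port = htons((uint16_t)port);
}
```

With the last assignment removed, the memset leaves sin_port at 0 regardless of what was parsed, which would match a kernel seeing port=0 even though the mount program defaulted to 2049.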

Comment 4 Adrian Bunk 2006-10-20 13:48:04 UTC
The fix is now in Linus' tree.