Subject : NFS oops in 2.6.26rc4
Submitter : Dave Jones <email@example.com>
Date : 2008-05-27 19:04
References : http://marc.info/?l=linux-kernel&m=121191548915522&w=4
This entry is being used for tracking a regression from 2.6.25. Please don't
close it until the problem is fixed in the mainline.
Can also reproduce this at will.
Created attachment 16552 [details]
NFS: Fix filehandle size comparisons in the mount code
Does the above patch help in any way?
Created attachment 16553 [details]
> Does the above patch help in any way?
Not here. If it's of any help, the mount is attempted against a server that returns access denied because of its configuration, and that's what is reported with the non-crashing stable kernels.
Can you try getting a dump of the kernel mount attempt using the command
echo 1024 >/proc/sys/sunrpc/nfs_debug
Nothing more in the kernel log. This is in syslog:
kernel:<4>NFS: nfs mount opts='addr=xxx.xxx.xxx.xxx'
kernel:<4>NFS: parsing nfs mount option 'addr=xxx.xxx.xxx.xxx'
kernel:<4>NFS: sending MNT request for xxx.xxx.xxx.xxx:/path/path
kernel:<4>NFS: MNT server returned result 13
So there is no evidence of the line
"NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13"
in the logs or when you do 'dmesg'?
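For reference, the 13 in "MNT server returned result 13" matches MNT3ERR_ACCES in the NFSv3 MOUNT protocol, and the -13 in the "unable to mount" line is -EACCES on Linux. A minimal userspace sketch of decoding such a code follows; describe_kernel_error() is a hypothetical helper for illustration, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel): map a kernel-style
 * return value (0 or a negative errno) to its textual description.
 */
static const char *describe_kernel_error(int err)
{
    assert(err <= 0);        /* kernel convention: 0 or -errno */
    return strerror(-err);   /* strerror() expects the positive errno */
}
```

Calling describe_kernel_error(-13) on a Linux/glibc system in the default locale should give the same "permission denied" meaning that mount.nfs reports as "access denied by server".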
I've tried this a number of times and it varies. Sometimes the machine is dead two lines into printing the trace, without any NFS debug output. Other times it prints the NFS debug output, including the "NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13" line, then continues directly after that with the whole trace and stays alive long enough to print dmesg and tail the syslog. In the cases where it stays alive, syslog has the NFS debug output but without the "unable to mount" line. Sorry if this is a bit confusing.
OK, so the mount code is propagating the error correctly, but somehow it
is getting lost on the way back to nfs_get_sb() in your kernel, though
apparently not in mine. The code itself seems pretty straightforward, since
both nfs_try_mount() and nfs_validate_mount_data() will immediately
propagate errors back to the caller.
I can only think of two possibilities:
- stack corruption or overflow
- a really bad gcc bug.
Let's test out the stack corruption hypothesis first...
Created attachment 16561 [details]
NFS: Reduce the NFS mount code stack usage.
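As a rough userspace illustration of the general technique named in the patch title (this is not the attached patch itself), the sketch below keeps large per-mount structures off the stack by allocating them on the heap and freeing them on a single exit path. All names here (demo_get_sb, demo_try_mount, struct demo_fh, struct demo_mount_data) and all sizes are assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FH_MAXSIZE 128  /* assumption: stand-in size for a filehandle */

/* Hypothetical stand-ins for large mount-time structures. */
struct demo_fh {
    unsigned short size;
    unsigned char data[DEMO_FH_MAXSIZE];
};

struct demo_mount_data {
    char hostname[256];
    char export_path[1024];
    int flags;
};

/* Pretend the server refuses the mount, as in this report. */
static int demo_try_mount(const struct demo_mount_data *data,
                          struct demo_fh *fh)
{
    (void)data;
    fh->size = 0;
    return -EACCES;
}

/*
 * Heap-allocate the big structures instead of declaring them as locals,
 * so this frame stays small however large the structures grow.
 */
int demo_get_sb(const char *host)
{
    struct demo_mount_data *data = calloc(1, sizeof(*data));
    struct demo_fh *fh = calloc(1, sizeof(*fh));
    int error = -ENOMEM;

    if (data == NULL || fh == NULL)
        goto out;

    strncpy(data->hostname, host, sizeof(data->hostname) - 1);
    error = demo_try_mount(data, fh);  /* error is propagated unchanged */
out:
    free(fh);    /* free(NULL) is a no-op, so one exit path suffices */
    free(data);
    return error;
}
```

With this layout, demo_get_sb("server") returns -EACCES straight from demo_try_mount(), and the caller's stack frame holds only two pointers and an int rather than well over a kilobyte of buffers.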
This one fixed it for me.
# mount -t nfs xxx.xxx.xxx.xxx:/path/path t
mount.nfs: access denied by server while mounting xxx.xxx.xxx.xxx:/path/path
Cool! davej, could you try this patch out and see if it fixes your
Dave confirmed that it has been fixed by commit 33852a1f2bb014e4047a844556c0d76a2f790c37