|Summary:|NFS oops in 2.6.26rc4|
|Product:|File System|Reporter:|Rafael J. Wysocki (rjw)|
|Component:|NFS|Assignee:|Trond Myklebust (trondmy)|
|Severity:|normal|CC:|bunk, davej, yaneti|
|Bug Depends on:|
NFS: Fix filehandle size comparisons in the mount code
NFS: Reduce the NFS mount code stack usage.
Description Rafael J. Wysocki 2008-05-29 14:53:00 UTC
Subject    : NFS oops in 2.6.26rc4
Submitter  : Dave Jones <firstname.lastname@example.org>
Date       : 2008-05-27 19:04
References : http://marc.info/?l=linux-kernel&m=121191548915522&w=4

This entry is being used for tracking a regression from 2.6.25. Please don't close it until the problem is fixed in the mainline.
Comment 1 Yanko Kaneti 2008-06-17 02:10:15 UTC
Can also reproduce this at will.
Comment 2 Trond Myklebust 2008-06-19 12:33:18 UTC
Created attachment 16552 [details] NFS: Fix filehandle size comparisons in the mount code
Comment 3 Trond Myklebust 2008-06-19 12:34:15 UTC
Does the above patch help in any way?
Comment 4 Yanko Kaneti 2008-06-19 14:23:34 UTC
Created attachment 16553 [details] crash log

> Does the above patch help in any way?

Not here. If it's of any help, the mount is attempted against a server that returns access denied because of its configuration, and that is what the non-crashing stable kernels report.
Comment 5 Trond Myklebust 2008-06-19 15:19:40 UTC
Can you try getting a dump of the kernel mount attempt using the command

echo 1024 >/proc/sys/sunrpc/nfs_debug
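For reference, 1024 is 0x0400, which as far as I know is the NFSDBG_MOUNT bit among the kernel's NFS debug flags, so the command above should enable only the mount-related debug messages. A minimal C sketch of the same write follows. The helper name set_debug_mask() is hypothetical, and the constant is redefined locally as an assumption rather than taken from a kernel header.

```c
#include <stdio.h>

/* Assumption: mirrors the kernel's NFSDBG_MOUNT debug bit (0x0400 == 1024). */
#define NFSDBG_MOUNT 0x0400

/*
 * Hypothetical helper: write a debug mask to a sysctl-style file,
 * e.g. /proc/sys/sunrpc/nfs_debug (needs root on a real system).
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int set_debug_mask(const char *path, unsigned int mask)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%u\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_debug_mask("/proc/sys/sunrpc/nfs_debug", NFSDBG_MOUNT) as root would be equivalent to the echo command; writing 0 the same way turns the debug output off again.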
Comment 6 Yanko Kaneti 2008-06-19 16:12:02 UTC
Nothing more in the kernel log. This is in syslog:

kernel:<4>NFS: nfs mount opts='addr=xxx.xxx.xxx.xxx'
kernel:<4>NFS: parsing nfs mount option 'addr=xxx.xxx.xxx.xxx'
kernel:<4>NFS: sending MNT request for xxx.xxx.xxx.xxx:/path/path
kernel:<4>NFS: MNT server returned result 13
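A note on the result code: in the MOUNT version 3 protocol (RFC 1813), status 13 is MNT3ERR_ACCES, and 13 is also the value of EACCES on Linux, which is consistent with the server denying access. The sketch below shows one way such a status could be turned into a negative errno. The function mnt_status_to_errno() and its default case are assumptions made for illustration, not the kernel's actual mapping code.

```c
#include <errno.h>

/* A few MOUNT v3 status values as listed in RFC 1813. */
enum mnt3_status {
    MNT3_OK       = 0,
    MNT3ERR_PERM  = 1,
    MNT3ERR_NOENT = 2,
    MNT3ERR_IO    = 5,
    MNT3ERR_ACCES = 13
};

/*
 * Hypothetical helper: translate a MOUNT status into the negative
 * errno convention used for kernel return values.
 */
int mnt_status_to_errno(int status)
{
    switch (status) {
    case MNT3_OK:
        return 0;
    case MNT3ERR_PERM:
        return -EPERM;
    case MNT3ERR_NOENT:
        return -ENOENT;
    case MNT3ERR_IO:
        return -EIO;
    case MNT3ERR_ACCES:
        return -EACCES;
    default:
        /* Assumption for this sketch: treat unknown codes as I/O errors. */
        return -EIO;
    }
}
```

With this mapping, a server result of 13 becomes -EACCES, i.e. -13 on Linux.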
Comment 7 Trond Myklebust 2008-06-19 16:43:45 UTC
So there is no evidence of the line "NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13" in the logs or when you do 'dmesg'?
Comment 8 Yanko Kaneti 2008-06-19 18:10:19 UTC
I've tried this a number of times and it varies. Sometimes the machine is dead two lines into printing the trace, without any NFS debug output. Other times it prints the NFS debug output, including the "NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13" line, then continues directly after that with the whole trace and stays alive long enough to print dmesg and tail the syslog. In the cases where it stays alive, syslog has the NFS debug output but without the "unable to mount" line. Sorry if this is a bit confusing.
Comment 9 Trond Myklebust 2008-06-20 07:01:26 UTC
OK, so the mount code is propagating the error correctly, but somehow it is getting lost on the way back to nfs_get_sb() in your kernel, but not apparently in mine. The code itself seems pretty straightforward, since both nfs_try_mount() and nfs_validate_mount_data() will immediately propagate errors back to the caller. I can only think of two possibilities:

- stack corruption or overflow
- a really bad gcc bug.

Let's test out the stack corruption hypothesis first...
Comment 10 Trond Myklebust 2008-06-20 07:02:20 UTC
Created attachment 16561 [details] NFS: Reduce the NFS mount code stack usage.
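As a generic illustration of what reducing stack usage usually means (this is not the content of attachment 16561): kernel stacks are small, commonly 4 KB or 8 KB on x86 at the time, so a large structure declared as a local variable can eat much of the available space. The usual remedy is to allocate it dynamically instead. The struct and function names below are hypothetical, and this userspace sketch uses calloc()/free() where kernel code would use kmalloc()/kfree() or similar.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a large per-mount parse structure (~2 KB). */
struct parsed_mount_opts {
    char hostname[1024];
    char export_path[1024];
    unsigned char filehandle[64];
    unsigned int fh_size;
};

/* Before: the whole structure lives in this function's stack frame. */
int try_mount_on_stack(const char *hostname)
{
    struct parsed_mount_opts opts;

    memset(&opts, 0, sizeof(opts));
    snprintf(opts.hostname, sizeof(opts.hostname), "%s", hostname);
    return opts.hostname[0] != '\0' ? 0 : -EINVAL;
}

/* After: only a pointer is on the stack; the structure is on the heap. */
int try_mount_on_heap(const char *hostname)
{
    struct parsed_mount_opts *opts;
    int err;

    opts = calloc(1, sizeof(*opts));
    if (opts == NULL)
        return -ENOMEM;

    snprintf(opts->hostname, sizeof(opts->hostname), "%s", hostname);
    err = opts->hostname[0] != '\0' ? 0 : -EINVAL;

    free(opts); /* every exit path after allocation must release it */
    return err;
}
```

Both functions return the same result for the same input; the difference is only where the roughly 2 KB structure is stored while the function runs.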
Comment 11 Yanko Kaneti 2008-06-20 08:53:04 UTC
This one fixed it for me.

# mount -t nfs xxx.xxx.xxx.xxx:/path/path t
mount.nfs: access denied by server while mounting xxx.xxx.xxx.xxx:/path/path
Comment 12 Trond Myklebust 2008-06-20 09:04:27 UTC
Cool! davej, could you try this patch out and see if it fixes your troubles too?
Comment 13 Adrian Bunk 2008-06-24 00:55:23 UTC
Dave confirmed that it has been fixed by commit 33852a1f2bb014e4047a844556c0d76a2f790c37