Bug 10826
| Summary: | NFS oops in 2.6.26rc4 | | |
|---|---|---|---|
| Product: | File System | Reporter: | Rafael J. Wysocki (rjw) |
| Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | bunk, davej, yaneti |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.26-rc4 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 10492 | | |
| Attachments: | NFS: Fix filehandle size comparisons in the mount code; crash log; NFS: Reduce the NFS mount code stack usage. | | |
Description
Rafael J. Wysocki
2008-05-29 14:53:00 UTC
Can also reproduce this at will.

Created attachment 16552
NFS: Fix filehandle size comparisons in the mount code
Does the above patch help in any way?

Created attachment 16553
crash log

> Does the above patch help in any way?

Not here. If it's of any help, the mount is being attempted against a server that returns "access denied" because of its configuration, and that is what the non-crashing stable kernels report.

Can you try getting a dump of the kernel mount attempt using the command `echo 1024 > /proc/sys/sunrpc/nfs_debug`?

Nothing more in the kernel log. This is what appears in syslog:

`kernel:<4>NFS: nfs mount opts='addr=xxx.xxx.xxx.xxx'`
`kernel:<4>NFS: parsing nfs mount option 'addr=xxx.xxx.xxx.xxx'`
`kernel:<4>NFS: sending MNT request for xxx.xxx.xxx.xxx:/path/path`
`kernel:<4>NFS: MNT server returned result 13`

So there is no sign of the line "NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13" in the logs, or when you run `dmesg`?

I've tried this a number of times and it varies. Sometimes the machine is dead two lines into printing the trace, without any NFS debug output. Other times it prints the NFS debug output, including "NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13", then continues directly with the whole trace and stays alive long enough to print dmesg and tail the syslog. When it stays alive, syslog has the NFS debug output but without the "unable to mount" line. Sorry if this is a bit confusing.

OK, so the mount code is propagating the error correctly, but somehow it is getting lost on the way back to nfs_get_sb() in your kernel, though apparently not in mine. The code itself seems pretty straightforward, since both nfs_try_mount() and nfs_validate_mount_data() immediately propagate errors back to the caller. I can only think of two possibilities:

- stack corruption or overflow
- a really bad gcc bug.

Let's test the stack corruption hypothesis first...

Created attachment 16561
NFS: Reduce the NFS mount code stack usage.
This one fixed it for me:

`# mount -t nfs xxx.xxx.xxx.xxx:/path/path t`
`mount.nfs: access denied by server while mounting xxx.xxx.xxx.xxx:/path/path`

Cool! davej, could you try this patch out and see if it fixes your troubles too?

Dave confirmed that it has been fixed by commit 33852a1f2bb014e4047a844556c0d76a2f790c37.