Bug 10826 - NFS oops in 2.6.26rc4
Summary: NFS oops in 2.6.26rc4
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks: 10492
Reported: 2008-05-29 14:53 UTC by Rafael J. Wysocki
Modified: 2008-06-24 00:55 UTC

See Also:
Kernel Version: 2.6.26-rc4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
NFS: Fix filehandle size comparisons in the mount code (1.61 KB, patch)
2008-06-19 12:33 UTC, Trond Myklebust
crash log (3.48 KB, text/plain)
2008-06-19 14:23 UTC, Yanko Kaneti
NFS: Reduce the NFS mount code stack usage. (4.89 KB, patch)
2008-06-20 07:02 UTC, Trond Myklebust

Description Rafael J. Wysocki 2008-05-29 14:53:00 UTC
Subject    : NFS oops in 2.6.26rc4
Submitter  : Dave Jones <davej@redhat.com>
Date       : 2008-05-27 19:04
References : http://marc.info/?l=linux-kernel&m=121191548915522&w=4

This entry is being used for tracking a regression from 2.6.25.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Yanko Kaneti 2008-06-17 02:10:15 UTC
Can also reproduce this at will.
Comment 2 Trond Myklebust 2008-06-19 12:33:18 UTC
Created attachment 16552
NFS: Fix filehandle size comparisons in the mount code
Comment 3 Trond Myklebust 2008-06-19 12:34:15 UTC
Does the above patch help in any way?
Comment 4 Yanko Kaneti 2008-06-19 14:23:34 UTC
Created attachment 16553
crash log

> Does the above patch help in any way?

Not here. If it's of any help, the mount is attempted against a server that returns access denied because of its configuration, and that is what the non-crashing stable kernels report.
Comment 5 Trond Myklebust 2008-06-19 15:19:40 UTC
Can you try getting a dump of the kernel mount attempt using the command

     echo 1024 >/proc/sys/sunrpc/nfs_debug
Comment 6 Yanko Kaneti 2008-06-19 16:12:02 UTC
Nothing more in the kernel log. This is in syslog:

kernel:<4>NFS: nfs mount opts='addr=xxx.xxx.xxx.xxx'
kernel:<4>NFS:   parsing nfs mount option 'addr=xxx.xxx.xxx.xxx'
kernel:<4>NFS: sending MNT request for xxx.xxx.xxx.xxx:/path/path
kernel:<4>NFS: MNT server returned result 13
Comment 7 Trond Myklebust 2008-06-19 16:43:45 UTC
So there is no evidence of the line

"NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13"

in the logs or when you do 'dmesg'?
Comment 8 Yanko Kaneti 2008-06-19 18:10:19 UTC
I've tried this a number of times and it varies. Sometimes the machine is dead two lines into printing the trace, without any NFS debug output. Other times it prints the NFS debug output, including the "NFS: unable to mount server xxxx.xxxx.xxxx.xxxx, error -13" line, then continues directly with the whole trace and stays alive long enough to print dmesg and tail the syslog. When it stays alive, syslog has the NFS debug output but without the "unable to mount" line. Sorry if this is a bit confusing.
Comment 9 Trond Myklebust 2008-06-20 07:01:26 UTC
OK, so the mount code is propagating the error correctly, but somehow it
is getting lost on the way back to nfs_get_sb() in your kernel, but not
apparently in mine. The code itself seems pretty straightforward, since
both nfs_try_mount(), and nfs_validate_mount_data() will immediately
propagate errors back to the caller.

I can only think of two possibilities:
  - stack corruption or overflow
  - a really bad gcc bug.

Let's test out the stack corruption hypothesis first...
Comment 10 Trond Myklebust 2008-06-20 07:02:20 UTC
Created attachment 16561
NFS: Reduce the NFS mount code stack usage.
Comment 11 Yanko Kaneti 2008-06-20 08:53:04 UTC
This one fixed it for me.

# mount -t nfs xxx.xxx.xxx.xxx:/path/path t
mount.nfs: access denied by server while mounting xxx.xxx.xxx.xxx:/path/path
Comment 12 Trond Myklebust 2008-06-20 09:04:27 UTC
Cool! davej, could you try this patch out and see if it fixes your
troubles too?
Comment 13 Adrian Bunk 2008-06-24 00:55:23 UTC
Dave confirmed that it has been fixed by commit 33852a1f2bb014e4047a844556c0d76a2f790c37
