Bug 16306

Summary: 2.6.35-rc3 BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 cifs_show_options
Product: File System
Reporter: Maciej Rutecki (maciej.rutecki)
Component: CIFS
Assignee: fs_cifs (fs_cifs)
Status: CLOSED UNREPRODUCIBLE
Severity: normal
CC: andrew.hendry, jlayton, maciej.rutecki, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.35-rc3
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 16055

Description Maciej Rutecki 2010-06-27 19:57:01 UTC
Subject    : 2.6.35-rc3 BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 cifs_show_options
Submitter  : Andrew Hendry <andrew.hendry@gmail.com>
Date       : 2010-06-26 10:46
Message-ID : AANLkTilhTrEBYZd4HxeXQk8B6-yV8rCJ2C0jXsEREgIR@mail.gmail.com
References : http://marc.info/?l=linux-kernel&m=127754922110501&w=2

This entry is being used for tracking a regression from 2.6.34.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Jeff Layton 2010-06-29 12:43:31 UTC
It seems likely that this is a situation where something is racing with a umount.

One possibility is that the reporter's machine is mounting and unmounting CIFS filesystems on suspend/resume -- seems like there was a bug report a while back where SuSE's dhcp client was doing that...

Andrew, does your machine unmount and remount the CIFS filesystems during a suspend/resume?
Comment 2 andrew.hendry 2010-07-06 12:41:16 UTC
Hi Jeff,

Yes, my system does that. I have some mount scripts in if-up.d and if-down.d.
I put them there a few months ago to work around another issue.

I have still only seen this error once; it is hard to reproduce.

Andrew.
Comment 3 Jeff Layton 2010-07-06 13:04:46 UTC
Interesting... how exactly is the umount done? Does it set any flags (-f, for instance)?
Comment 4 andrew.hendry 2010-07-06 22:02:02 UTC
No extra flags on the umounts

root@jaunty:/etc/network# cat if-down.d/mounts 
#!/bin/sh
umount /media/NAS-Videos
umount /media/NAS-Music
umount /media/NAS-Photos 
umount /media/NAS-Public

Mounts are also fairly standard.

mount -t cifs -o rw,username=xxx,password=xxx,uid=1000,gid=1000 //192.168.0.177/Videos /media/NAS-Videos
mount -t cifs -o rw,username=xxx,password=xxx,uid=1000,gid=1000 //192.168.0.177/Music /media/NAS-Music
mount -t cifs -o rw,username=xxx,password=xxx,uid=1000,gid=1000 //192.168.0.177/Photos /media/NAS-Photos 
mount -t cifs -o rw,user=,pass=,uid=1000,gid=1000  //192.168.0.177/Public /media/NAS-Public
Comment 5 Jeff Layton 2010-07-07 14:03:56 UTC
Ok, thanks. As best I can tell, it looks like a umount was allowed to occur while something was reading from /proc/pid/mountinfo. The server pointer that gets passed to the function that oopsed here gets zeroed out as the cifsd thread goes down. That should only occur when the last mount to the server has been unmounted.

IOW, that pointer should only get zeroed out after the mount has been detached from the tree and is no longer in the /proc/pid/mountinfo list. Is it possible that the iterator for /proc/pid/mountinfo is insufficiently protected against removal of list entries?
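
For anyone trying to exercise the reader side of the suspected race, a minimal sketch follows. It only reads a mountinfo file repeatedly and prints the mount point and the trailing per-superblock options field of each cifs entry, which is where filesystem-specific options show up. The function names `cifs_entries` and `hammer_mountinfo` are hypothetical and not from this report, and nothing here is known to trigger the oops, which was only seen once.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions, not part of the report.

# Print "<mount point> <super options>" for every cifs entry in a mountinfo file.
# The optional fields end at a lone "-"; fs type, source and super options follow it.
cifs_entries() {
    awk '{
        for (i = 7; i <= NF; i++) if ($i == "-") break
        if ($(i + 1) == "cifs") print $5, $(i + 3)
    }' "${1:-/proc/self/mountinfo}"
}

# Re-read the file N times (default 1000) to widen the window for a concurrent umount.
hammer_mountinfo() {
    n=${1:-1000}
    while [ "$n" -gt 0 ]; do
        cifs_entries "${2:-/proc/self/mountinfo}" > /dev/null
        n=$((n - 1))
    done
}
```

One would run `hammer_mountinfo` in one shell while the if-up.d and if-down.d mount scripts from comment 4 are cycled in another, then check dmesg for the oops.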
Comment 6 Rafael J. Wysocki 2010-07-08 23:27:56 UTC
Handled-By : Jeff Layton <jlayton@redhat.com>
Comment 7 Rafael J. Wysocki 2010-07-09 20:59:51 UTC
On Friday, July 09, 2010, Andrew Hendry wrote:
> It might not be a regression, and so far I haven't been able to reproduce it.
> It seems to be related to suspend/resume, interface up/down and
> mount/unmount.
> Also potentially with cifs waiting for a NAS to spin up to complete the mount.
> 
> On Fri, Jul 9, 2010 at 10:49 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > On Fri,  9 Jul 2010 01:41:39 +0200 (CEST)
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >
> >> This message has been generated automatically as a part of a summary report
> >> of recent regressions.
> >>
> >> The following bug entry is on the current list of known regressions
> >> from 2.6.34.  Please verify if it still should be listed and let the tracking team
> >> know (either way).
> >>
> >>
> >> Bug-Entry     : http://bugzilla.kernel.org/show_bug.cgi?id=16306
> >> Subject               : 2.6.35-rc3 BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 cifs_show_options
> >> Submitter     : Andrew Hendry <andrew.hendry@gmail.com>
> >> Date          : 2010-06-26 10:46 (13 days old)
> >
> > Not sure if this is a new bug or not...
> >
> > I don't think this is really a CIFS bug, per se. It seems like the
> > problem may be that the iterator for /proc/pid/mountinfo is not
> > sufficiently protected against removal from the vfsmount list.
> >
> > Filesystems don't seem to be expected to do any locking in their
> > show_options routines though so I'm guessing that something is borked
> > in the generic vfs layer.
> >
> > Either that or this is some sort of generic mem corruption? I'm open to
> > input from others that have a better grasp of this stuff at the VFS
> > layer...