Bug 38972

Summary: Kernel panic when syncing while cifs is mounted
Product: File System
Component: CIFS
Reporter: farbing
Assignee: Steve French (sfrench)
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
CC: florian, haegar, jlayton, maciej.rutecki, rjw, sfrench
Hardware: All
OS: Linux
Kernel Version: 3.0.0-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 36912
Attachments:
  Screenshot of kernel panic
  trace when touching a file
  patch -- move bdi_setup_and_register outside CONFIG_CIFS_DFS_UPCALL

Description farbing 2011-07-08 15:39:01 UTC
The kernel panics every time I run sync from the command line.

I bisected it to commit dd8544661947ad6d8d87b3c9d4333bfa1583d1bc (take bdi setup/destruction into cifs_mount/cifs_umount).

I mount the CIFS share with the following line in fstab:

//fritz.box/FRITZ.NAS/Verbatim-STORENGO-01      /mnt/fritz      cifs    credentials=/path/to/cred/file,user,gid=disk,nounix,file_mode=0664,dir_mode=0775,noauto,comment=systemd.automount 0 0

comment=systemd.automount means that it will be mounted on first access through an autofs mount point.

Attached is a screenshot of the kernel panic.

This is probably related to the bug report here: https://lkml.org/lkml/2011/7/4/289

Thanks
Comment 1 farbing 2011-07-08 15:40:43 UTC
Created attachment 64962 [details]
Screenshot of kernel panic
Comment 2 Jeff Layton 2011-07-08 16:54:45 UTC
Like Suresh, I'm not able to reproduce this either...

Assuming your offsets line up with the ones in my kernel (which is probably the case), this crashed here:

(gdb) list *(bdi_queue_work+0x40)
0xffffffff8115666e is in bdi_queue_work (include/linux/spinlock.h:290).
288	static inline void spin_lock_bh(spinlock_t *lock)
289	{
290		raw_spin_lock_bh(&lock->rlock);
291	}

...that would probably indicate that bdi.lock was NULL, which would be the case if bdi_setup_and_register had never been called. I don't see how that could happen, though; Al's patch is pretty straightforward. Does the mount otherwise work before you issue a sync?
Comment 3 Jeff Layton 2011-07-08 16:59:48 UTC
Sorry... I meant bdi.wb_lock, not bdi.lock.
Comment 4 Rafael J. Wysocki 2011-07-08 17:32:02 UTC
First-Bad-Commit : dd8544661947ad6d8d87b3c9d4333bfa1583d1bc
Comment 5 Jeff Layton 2011-07-08 17:46:49 UTC
Yeah, I'm just not seeing a bug here.

My initial thought was that maybe we have a cifs_sb that didn't have bdi_setup_and_register run on it, but I don't see how that could happen. Perhaps it's some sort of more generic memory corruption then? 

One (semi-remote) possibility is that it's related to some other mount fixes that I sent to Steve this week:

http://article.gmane.org/gmane.linux.kernel.cifs/3673

...and...

http://article.gmane.org/gmane.linux.kernel.cifs/3687

...I'm not sure what he's waiting on with regard to pushing them, but it may be worthwhile to test those before we dig into this more deeply.
Comment 6 Steve French 2011-07-08 19:16:19 UTC
Are you comfortable applying patches and rebuilding cifs.ko? If so, I may be able to add some debug code around this to isolate it further, since I am also having trouble reproducing this (although in my case that is due to other problems I am hitting in the radeon and VirtualBox drivers on 3.0-rc).
Comment 7 farbing 2011-07-08 19:18:14 UTC
Created attachment 65012 [details]
trace when touching a file

I just tested it with Steve French's tree, but unfortunately it still panics when syncing.

It also panics on other write operations (though, strangely, not on rm or mkdir/rmdir). Touching a file, for example, also triggers a panic.

Reading seems to work fine.
Comment 8 farbing 2011-07-08 19:28:55 UTC
(In reply to comment #6)
> Are you comfortable adding patches and rebuilding cifs.ko?  If so, may be
> able to add some debug code around this to isolate further since I am also
> having problems reproducing this (although in my case due to other problems
> I am hitting in radeon and virtualbox drivers on 3.0-rc)

Yes, I can try some patches. (I have cifs built into the kernel rather than as a module; maybe that's relevant?)
Comment 9 Jeff Layton 2011-07-08 19:44:11 UTC
Interesting -- sounds like the same issue that Adam Nielsen reported to the list yesterday. I've not been able to reproduce that either:

http://article.gmane.org/gmane.linux.kernel.cifs/3699

...I doubt that the module vs. built-in matters here, but at this point anything is possible.
Comment 10 Jeff Layton 2011-07-08 19:44:47 UTC
Reassigning to Steve since he's working on a debug patch...
Comment 11 farbing 2011-07-09 12:45:56 UTC
A workaround seems to be to enable CONFIG_CIFS_DFS_UPCALL. After enabling it, I can sync without panics.

Maybe the problem is that the call to bdi_setup_and_register sits inside an #ifdef CONFIG_CIFS_DFS_UPCALL block (connect.c:3006), so when CONFIG_CIFS_DFS_UPCALL is disabled it is never called.
Comment 12 Jeff Layton 2011-07-09 13:02:17 UTC
Created attachment 65082 [details]
patch -- move bdi_setup_and_register outside CONFIG_CIFS_DFS_UPCALL

<forehead slap>

Well spotted. That's almost certainly the bug. This patch ought to fix it. Can you test it out?
Comment 13 Sven-Haegar Koch 2011-07-09 13:36:02 UTC
YES!

The patch in #12 fixes the problem for me, all cifs operations work again!

Thanks a lot!
Comment 14 farbing 2011-07-09 13:43:36 UTC
Yep, I can also confirm that #12 fixes the problem.

Thanks!
Comment 15 Jeff Layton 2011-07-09 16:22:43 UTC
Thanks for testing it. Patch sent to Steve F. and linux-cifs@vger.kernel.org. It should make 3.0, assuming Steve pushes it to Linus soon.
Comment 16 Rafael J. Wysocki 2011-07-09 20:02:50 UTC
Handled-By : Jeff Layton <jlayton@redhat.com>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=65082
Comment 17 Rafael J. Wysocki 2011-07-09 20:03:57 UTC
*** Bug 39042 has been marked as a duplicate of this bug. ***
Comment 18 Rafael J. Wysocki 2011-08-14 19:07:29 UTC
Fixed by commit 20547490c12b0ee3d32152b85e9f9bd183aa7224.