Bug 25792

Summary: Kernel panic in __mark_inode_dirty (fs-writeback.c: 978)
Product: File System
Reporter: Karthick A R (karthick.linuxdreamer)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED CODE_FIX
Severity: high
CC: alan, karthick.linuxdreamer
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36-rc8
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Crash debug logs related to the kernel panic

Description Karthick A R 2010-12-29 00:10:24 UTC
Created attachment 41822
Crash debug logs related to the kernel panic

The panic trace and all associated crash debug logs are attached, along with dumps of the inode, the superblock structure, and the address_space mapping at the time of the panic, taken from a crash session on the vmcore.

It was first reproduced by my daughter. The trick, I am told, is to pull the USB cable while daddy is at work :-)

It's consistently reproducible when I run our product stack with our applications sandboxed to a USB drive (bind-mounted) and pull the USB drive while they are running. The panic is a result of the inode's bdi (backing device info) pointer going NULL while the inode state is I_DIRTY. The superblock's bdi pointer for the "ext4" superblock type is NULL. The panic in the attachment shows that the kernel was trying to resolve a write-protect page fault on an mmapped page with the USB drive as the backing storage. There appears to be a race in which sd_remove invalidates the ext4 superblock's bdi while a parallel write to the mmapped page in the backing store is in progress. I am wondering how the ext4 superblock's bdi was invalidated (set to NULL) while the inode was being dirtied.
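
For illustration only, here is a minimal user-space sketch of the kind of access pattern described above: stores to a shared file mapping, with a flush between rounds so that later stores are likely to take a write-protect fault and dirty the inode again. It is written in Python and is not the test code from the gist or from our product stack. The function name dirty_mapped_file and its parameters are hypothetical, and the assumption that each flush leaves the pages clean and write-protected reflects the usual Linux behaviour for shared file mappings rather than anything verified against this kernel. Removing the device has to be done by hand while the loop runs.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def dirty_mapped_file(path, pages=4, rounds=3):
    """Dirty every page of a shared file mapping, flushing after each round.

    Returns the number of single-byte stores performed.
    (Hypothetical helper for illustration; not taken from the bug report.)
    """
    size = pages * PAGE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)  # make sure the file backs the whole mapping
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            writes = 0
            for r in range(rounds):
                for p in range(pages):
                    # One store per page: on a clean shared page this is the
                    # store that is expected to fault and mark the inode dirty.
                    m[p * PAGE] = (r + 1) & 0xFF
                    writes += 1
                # msync() the mapping so the pages are written back and the
                # next round starts from clean pages again (assumption).
                m.flush()
        return writes
    finally:
        os.close(fd)
```

To approximate the scenario, one would call dirty_mapped_file() with a path on the mounted USB file system and a large rounds value, then pull the drive while it is running. Whether this alone hits the panic is not known.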

An effort to reproduce the problem outside our product stack, where it is consistently reproducible, has not been successful yet: https://gist.github.com/757928
I did, however, hit uninterruptible task hang warnings when I ran the above test code and removed the USB drive while the test was issuing writes.
Some of the child processes that write to the disk remained in "D" (uninterruptible) state forever after the USB device was forcefully ejected.
They appear to be stuck in an ext4 journalled write. The backtraces for all the uninterruptible tasks are also part of the crash debug attachment.
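
As a rough aid for spotting such tasks on a live system, the sketch below scans /proc/<pid>/stat and lists the processes whose state field is "D". It is only an illustration in Python: the helper names task_state and uninterruptible_tasks are hypothetical, and it looks at one stat file per process, so threads other than the main one are not covered.

```python
import os


def task_state(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def uninterruptible_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = task_state(f.read())
        except (OSError, ValueError):
            continue  # the task exited, or the line was not parseable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling uninterruptible_tasks() a few times, some seconds apart, after ejecting the drive would show whether the same writers stay in "D" state or eventually recover.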

I believe this issue is not fixed in 2.6.36, even though I am running the slightly older 2.6.36-rc8, since I don't see any fixes in ext4 or fs-writeback related to the above panic.

Since I am always able to reproduce the panic with our product stack sandboxed to the USB device, I can easily verify any patches related to this issue.
Regarding the ext4 uninterruptible task lockups/hangs, they are easily reproducible with the test code in the gist mentioned above: https://gist.github.com/757928

I believe this is a major issue considering the backtrace, crash debug logs and the probable race symptoms with sd_remove and ext4 writeback mentioned above.