Bug 23232 - kernel hang on insert ext4 usb key
Summary: kernel hang on insert ext4 usb key
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-18 16:45 UTC by shal
Modified: 2012-08-14 13:07 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.37-rc2-git3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
error messages of the kernel (2.96 KB, text/plain)
2010-11-18 16:45 UTC, shal
i/o read error on usb key (2.50 KB, text/plain)
2010-11-19 09:38 UTC, shal

Description shal 2010-11-18 16:45:39 UTC
Created attachment 37592 [details]
error messages of the kernel

Hello,

On my 2.6.37-rc2-git3 kernel (with the sched_autogroup patch), my computer crashes when I insert a USB key formatted as ext4. There is no problem if the USB key is formatted as vfat.

When the bug occurs, I can use the magic SysRq key, but the sync works.
Comment 1 Theodore Tso 2010-11-19 03:00:59 UTC
Does it crash if you boot some earlier version of the kernel?

Can you send an image of the USB key?

It looks like it's dying very early in the VFS layer, in vfs_kern_mount. It's not clear it has called into ext4 code at all when it gives the oops message.
Comment 2 shal 2010-11-19 09:38:50 UTC
Created attachment 37642 [details]
i/o read error on usb key
Comment 3 shal 2010-11-19 09:41:15 UTC
I have performed more tests.

I have found some interesting details:
The USB key works, but after the oops the USB key has I/O sector read problems (see attachment), although the kernel doesn't oops.

On another computer (running 2.6.34) I successfully ran fsck, and the filesystem was clean.
On the 2.6.37-rc2-git3 machine, the system hangs when the USB key is inserted.
I then put the key back in the 2.6.34 computer, and the filesystem was heavily corrupted.
I ran fsck -y, which corrected thousands of errors, and all files were erased....

I suspect that when the kernel crashes, it corrupts the USB key.

I found an inconsistency on my USB key (Corsair Flash Voyager GT, 16 GB): there is one partition of type W95 VFAT, and I put an ext4 partition.
That is perhaps the problem, but the kernel shouldn't crash in this case....
This key works without any problem on other systems (Linux or Windows).

With another ext4 USB key there is no problem.
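
One possible reading of the partition remark above is that the MBR entry is typed as W95 FAT while the partition actually holds an ext4 filesystem. The following is a minimal sketch, not something taken from this report, of how such a mismatch could be checked on a raw image of the key. It assumes a classic MBR partition table and 512-byte sectors; the function name `inspect_partitions` is hypothetical. The magic value 0xEF53 is shared by ext2, ext3 and ext4, so it only shows that an ext-family superblock is present.

```python
import struct

SECTOR_SIZE = 512              # assumption: 512-byte logical sectors
EXT_SUPERBLOCK_OFFSET = 1024   # superblock starts 1024 bytes into the partition
EXT_MAGIC_OFFSET = 0x38        # s_magic field inside the superblock
EXT_MAGIC = 0xEF53             # shared by ext2/ext3/ext4


def inspect_partitions(image: bytes):
    """Return (slot, type_id, start_lba, has_ext_magic) for each used MBR slot."""
    if len(image) < SECTOR_SIZE or image[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    results = []
    for slot in range(4):
        offset = 446 + slot * 16
        # Entry layout: status, 3 CHS bytes, type id, 3 CHS bytes,
        # start LBA (le32), sector count (le32).
        _status, type_id, start_lba, _count = struct.unpack_from(
            "<B3xB3xII", image, offset
        )
        if type_id == 0:
            continue  # unused slot
        magic_pos = start_lba * SECTOR_SIZE + EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
        raw = image[magic_pos:magic_pos + 2]
        has_magic = len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC
        results.append((slot + 1, type_id, start_lba, has_magic))
    return results
```

An entry whose type id is 0x0B or 0x0C (the W95 FAT32 ids) but whose last field is True would match that reading; 0x83 is the usual Linux type id. The input would be the bytes of an image of the whole key, read from a file.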
Comment 4 Lukas Czerner 2010-11-19 14:02:31 UTC
This might be related to a bug fixed by this patch. Can you try it to confirm that it solves the issue?

http://www.spinics.net/lists/linux-ext4/msg21743.html

Thanks!
-Lukas
Comment 5 shal 2010-11-19 15:30:27 UTC
Thanks,

This seems to correct the kernel crash problem.

But during my insertion/removal tests of the USB key, I had to run "fsck.ext4 -y" several times, without thousands of errors being corrected...
with this kind of error:
EXT4-fs (sdf1): ext4_check_descriptors: Checksum for group 0 failed (12243!=0)
EXT4-fs (sdf1): group descriptors corrupted!
Comment 6 Eric Sandeen 2010-11-19 15:31:41 UTC
Any chance you're pulling the key out without unmounting it first?
Comment 7 shal 2010-11-19 15:41:40 UTC
I have done tests without unmounting the key properly, and sometimes it was ugly.

The kernel crash problem is fixed by the patch.

The corruption of the filesystem is strange. I have never seen this kind of corruption (e.g. with loss of all data after fsck, and the fsck taking 5 minutes, even though I only inserted and removed the key without performing any write on it).

On the same suite of tests with another ext4 USB key of only 1 GB, there is no problem in any case.

I am not sure that the problems are linked.
Comment 8 Eric Sandeen 2010-11-19 15:45:48 UTC
It may well be related to the firmware on the usb key, I don't think there's a lot we can do on the ext4/kernel side.  (other than to "not crash" when it's terribly corrupted)
Comment 9 Theodore Tso 2010-11-19 16:02:53 UTC
I'm not convinced we even got into ext4_fill_super(); if we did there should have been at least some ext4 printk's. The OOPS is in vfs_kern_mount().

I've looked through ext4_fill_super(), and if vfs_kern_mount() had gotten as far as calling into ext4_fill_super(), I can't find any code path other than failing the first two memory allocations that wouldn't have resulted in some kind of kernel printk --- which wasn't in the oops display.

So I strongly suspect the failure was before vfs_kern_mount() calling ext4_mount(), or in mount_bdev() --- since all ext4_mount() does is call mount_bdev() passing in ext4_fill_super as a callback function.

Can the oops be replicated at this point?   If so, we can try instrumenting the kernel, but this strongly smells like a hardware problem and a failure in the generic code before we drop into the ext4 mount code in ext4_fill_super().

-- Ted
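
The reasoning in the comment above can be illustrated with a toy user-space model. This is only a sketch of the callback structure described there (vfs_kern_mount() calling ext4_mount(), which hands ext4_fill_super to mount_bdev() as a callback); it is not kernel code. The `log` list, the `fail_before_callback` flag and the message text are hypothetical stand-ins for printk output and for a failure in the generic code.

```python
def ext4_fill_super(sb, log):
    # Stand-in for the filesystem-specific setup; the message is a placeholder.
    log.append("EXT4-fs: placeholder message from fill_super")
    return sb


def mount_bdev(dev_name, fill_super, log, fail_before_callback=False):
    # Generic block-device mount path: the callback runs only if the
    # generic steps before it succeed.
    if fail_before_callback:
        raise RuntimeError("failure in generic code before fill_super")
    sb = {"dev": dev_name}
    return fill_super(sb, log)


def ext4_mount(dev_name, log, **kwargs):
    # All this does is call mount_bdev() with ext4_fill_super as the callback.
    return mount_bdev(dev_name, ext4_fill_super, log, **kwargs)


def vfs_kern_mount(mount_fn, dev_name, log, **kwargs):
    # Entry point that dispatches to the filesystem's mount function.
    return mount_fn(dev_name, log, **kwargs)
```

In this model, a failure raised before the callback leaves `log` empty, which mirrors the observation that no ext4 messages appeared in the oops display. A failure inside the callback would normally leave at least one entry.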
