Bug 6628

Summary: kernel BUG at fs/ext3/namei.c:383
Product: File System
Reporter: Georg Funke (schorsch.funke)
Component: ext3
Assignee: Theodore Tso (tytso)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, duaneg, srinivasa, tytso
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16.18
Subsystem:
Regression: ---
Bisected commit-id:

Description Georg Funke 2006-05-30 21:48:04 UTC
Most recent kernel where this bug did not occur:
Distribution: Ubuntu 6.06
Hardware Environment: Mobile AMD Sempron(tm) Processor 2800+
Software Environment:
Gnu C                  4.0.3
Gnu make               3.81beta4
binutils               2.16.91
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.38
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.6
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.93
udev                   079
Modules Loaded         sd_mod usb_storage scsi_mod binfmt_misc nls_iso8859_15
nls_cp850 vfat fat ipv6 lp af_packet usbhid uhci_hcd ehci_hcd usbcore parport_pc
parport

Problem Description: kernel BUG at fs/ext3/namei.c:383!
invalid opcode: 0000 [#1]
Modules linked in: sd_mod usb_storage scsi_mod binfmt_misc nls_iso8859_15 nls_cp850
vfat fat ipv6 lp af_packet usbhid uhci_hcd ehci_hcd usbcore parport_pc parport
CPU:    0
EIP:    0060:[<c0198db8>]    Not tainted VLI
EFLAGS: 00010292   (2.6.16.18 #38)
EIP is at dx_probe+0xf8/0x310
eax: 00000081   ebx: ce102000   ecx: c03430fc   edx: c03430fc
esi: ce102018   edi: cdf281e4   ebp: c179d6cc   esp: ce0c4ce4
ds: 007b   es: 007b   ss: 0068
Process trashapplet (pid: 6401, threadinfo=ce0c4000 task=ce683030)
Stack: <0>c0306888 c02eca67 c0304b22 0000017f c0306ebc ce0c4dd4 00000000 402f7554
       00000000 00000000 c0dd4000 ce0c4de4 ce0c4dbc ce0c4e6c c019a173 ce0c4dbc
       ce0c4de4 00000000 00000000 00000000 00000000 00000000 d2ecb190 ce0c4db4
Call Trace:
 [<c019a173>] ext3_find_entry+0x2b3/0x610
 [<c0163273>] do_lookup+0x53/0x150
 [<c016bb53>] dput+0x93/0x130
 [<c019adea>] ext3_lookup+0x3a/0xa0
 [<c0163345>] do_lookup+0x125/0x150
 [<c0163a75>] __link_path_walk+0x705/0xd30
 [<c01640ef>] link_path_walk+0x4f/0xe0
 [<c0164546>] do_path_lookup+0xe6/0x210
 [<c0162043>] getname+0xb3/0xe0
 [<c0164c7b>] __user_walk_fd+0x3b/0x60
 [<c015e23f>] vfs_lstat_fd+0x1f/0x50
 [<c015ea5f>] sys_lstat64+0xf/0x30
 [<c01141ed>] handle_vm86_trap+0x2d/0xf0
 [<c0102d13>] sysenter_past_esp+0x54/0x75
Code: 44 24 10 bc 6e 30 c0 c7 44 24 0c 7f 01 00 00 c7 44 24 08 22 4b 30 c0 c7 44
 24 04 67 ca 2e c0 c7 04 24 88 68 30 c0 e8 f8 25 f8 ff <0f> 0b 7f 01 22 4b 30 c0
 8b 44 24 3c c7 44 24 20 00 00 00 00 89


Steps to reproduce: Plug in an IDE drive with a USB 2.0 interface.
Comment 1 Andrew Morton 2006-05-30 23:34:16 UTC
Ted, it blew up in the htree code:

        assert(dx_get_limit(entries) == dx_root_limit(dir,
                                                      root->info.info_length));

Georg, is this reproducible?   It sounds like it is..
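
For readers following the trace: the failing assertion compares the entry limit stored in the on-disk htree root block with the limit computed from the block size. The following is a minimal userspace sketch of that comparison, not the ext3 source and not the patch that eventually fixed this bug. It assumes the usual ext3 layout (8-byte directory record header, names padded to a multiple of 4 bytes, 8-byte index entries). The names dir_rec_len, expected_root_limit and check_root_limit are made up for illustration. A mismatch is reported as an error instead of halting.

```c
#include <assert.h>

/* Size of one htree index entry: a 32-bit hash plus a 32-bit block number. */
#define DX_ENTRY_SIZE 8u

/* On-disk length of a directory record holding a name of name_len bytes:
 * 8 bytes of fixed header plus the name, rounded up to a multiple of 4. */
static unsigned dir_rec_len(unsigned name_len)
{
    return (name_len + 8 + 3) & ~3u;
}

/* Number of index entries that fit in the root block. The root block starts
 * with fake "." and ".." records, followed by an info header of info_length
 * bytes. The caller must already have checked that these fit in the block. */
static unsigned expected_root_limit(unsigned block_size, unsigned info_length)
{
    unsigned header = dir_rec_len(1) + dir_rec_len(2);

    /* Programming-error check on validated input, not on raw disk data. */
    assert(block_size >= header && info_length <= block_size - header);
    return (block_size - header - info_length) / DX_ENTRY_SIZE;
}

/* Validate values read from disk. Returns 0 if the stored limit matches the
 * computed one, -1 if the root block looks corrupt. */
static int check_root_limit(unsigned stored_limit, unsigned block_size,
                            unsigned info_length)
{
    unsigned header = dir_rec_len(1) + dir_rec_len(2);

    if (block_size < header || info_length > block_size - header)
        return -1;
    if (stored_limit != expected_root_limit(block_size, info_length))
        return -1;
    return 0;
}
```

With 4096-byte blocks and an info_length of 8, expected_root_limit() gives 508, so check_root_limit() would reject any other stored value and let the caller handle the corruption rather than hit a BUG.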
Comment 2 Georg Funke 2006-05-31 03:54:48 UTC
Yes, it is reproducible. I also see this bug under Linux 2.6.15.6. It happens
every time I plug my external IDE drive into the USB port and then try to open
the disk. I have to reboot before I can use my laptop normally again.
Comment 3 Georg Funke 2006-05-31 04:08:36 UTC
Yes, it is reproducible. I also see this bug under Linux 2.6.15.6. It happens
every time I plug my external IDE drive into the USB port and then try to open
the disk. I have to reboot before I can use my laptop normally again.
Comment 4 Georg Funke 2006-05-31 04:13:10 UTC
It is also reproducible under Linux 2.6.17-rc5. I'm not using the kernel
distributed with Ubuntu 6.06. 
Comment 5 Theodore Tso 2006-05-31 04:31:39 UTC
How big is the filesystem?  If you haven't run e2fsck yet, please DO NOT. 
Instead, if you are willing please send me the output of:

e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2

But substitute hda1 with sda4, or whatever device name you are using for the
external IDE disk that you are accessing via USB....

Please see the man page for e2image for more details, but what you will be
sending me is the filesystem metadata blocks only.  I will not see the contents
of any of your files, but I will see the filenames of your files.   In some
cases you could use the -s option to e2image to scramble the filenames, but in
this case, I must have the directories undisturbed in order to debug an htree
problem.

If you can reproduce this at will, this is great.   By sending me the e2image
file, I should be able to reproduce it on my system as well.   (Assuming it
isn't a hardware error, of course.  I assume you've checked /var/log/messages
and/or /var/log/kern.log and there are no error messages from the hard disk driver?)

Thanks!!
Comment 6 Georg Funke 2006-05-31 04:59:40 UTC
Hi,
I can't do anything with this disk. I've checked the logs and there are no
messages from the hard disk driver.
e2image doesn't work either. The logs now contain the following messages:
 <6>sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0x3
    ASC=0x11 ASCQ=0x0
end_request: I/O error, dev sda, sector 8126911
Buffer I/O error on device sda1, logical block 1015856
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0x3
    ASC=0x11 ASCQ=0x0
end_request: I/O error, dev sda, sector 8126919
Buffer I/O error on device sda1, logical block 1015857
end_request: I/O error, dev sda, sector 8126927
Buffer I/O error on device sda1, logical block 1015858
Buffer I/O error on device sda1, logical block 1015859
Buffer I/O error on device sda1, logical block 1015860
Buffer I/O error on device sda1, logical block 1015861
Buffer I/O error on device sda1, logical block 1015862
Buffer I/O error on device sda1, logical block 1015863
Buffer I/O error on device sda1, logical block 1015864
Buffer I/O error on device sda1, logical block 1015865
Buffer I/O error on device sda1, logical block 1015866
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0x3
    ASC=0x11 ASCQ=0x0
end_request: I/O error, dev sda, sector 8126911
printk: 7 messages suppressed.
Buffer I/O error on device sda1, logical block 1015856

Today or tomorrow I will connect the disk to an IDE port in my other PC. There I
can check whether it is a hardware error (smartmontools doesn't work with my USB
drive, but the disk has SMART features). If it is not a hardware error, I will
try e2image there. Georg
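
The sector and block numbers in the log above are consistent with each other, which can be checked with a little arithmetic. Sense key 0x3 with ASC 0x11 / ASCQ 0x0 is the SCSI code for a medium error (unrecovered read error). The sketch below assumes 512-byte sectors, 4096-byte buffer blocks, and a partition that starts at sector 63. None of these values is stated in the report; they are inferred because they make all three sector/block pairs in the log agree. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Convert a partition-relative logical block number into an absolute
 * sector number on the whole disk. */
static uint64_t block_to_sector(uint64_t part_start_sector, uint64_t block,
                                unsigned block_size)
{
    return part_start_sector + block * (block_size / SECTOR_SIZE);
}

/* Inverse mapping: absolute disk sector back to the logical block of the
 * partition that contains it. */
static uint64_t sector_to_block(uint64_t part_start_sector, uint64_t sector,
                                unsigned block_size)
{
    return (sector - part_start_sector) / (block_size / SECTOR_SIZE);
}
```

Under those assumptions, block_to_sector(63, 1015856, 4096) is 8126911, and blocks 1015857 and 1015858 map to sectors 8126919 and 8126927, matching the "end_request" and "Buffer I/O error" lines pair for pair.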
Comment 7 Theodore Tso 2006-05-31 06:37:52 UTC
Can you see if similar errors were in your system log before the "Problem
Description:kernel BUG at fs/ext3/namei.c:383!" kernel oops?   If there were, then
it looks like the problem has to do with the htree code being insufficiently
robust in the face of disk errors.   But I'd like to confirm this to be the case
if possible.

Thanks!!
Comment 8 Georg Funke 2006-05-31 07:32:04 UTC
No, there aren't any similar errors earlier in the log. Next time I will test the
disk attached to an IDE adapter.
Georg
Comment 9 Karl Koller 2006-08-30 12:49:49 UTC
I can reproduce this bug on a /tmp ext3 partition on my Gentoo box. It even
occurs when booting older Knoppix-based live CDs that use a 2.6.x kernel:
the kernel stops and the system freezes.
I can only access this filesystem from an old Knoppix disc from 2004
(LinuxTag 2004 DVD release) based on kernel 2.4. With that release I can still
access and read all the files on the partition. I tried to repair it once
with e2fsck from e2fsprogs version 1.35, but with 2.6.x the bug is still
reproducible.
I can provide an e2image, but it will be made with e2fsprogs version 1.35
from February 2004, since that is the version on the old Knoppix DVD. If it
helps, please contact me by email to get the image and any additional
information you need.
 
Comment 10 Theodore Tso 2006-08-30 15:54:34 UTC
Karl, can you send me the e2image file?  Create it with a command line like this:

            e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2

That would be very helpful!
Comment 11 Duane Griffin 2007-07-22 16:35:45 UTC
I've been looking at this following a bug report from a Gentoo user:
http://bugs.gentoo.org/show_bug.cgi?id=183207

I've written a directory index corrupter utility that can reproduce the problem, and an ext3 patch that I think fixes it. Please see the Gentoo bug for more information. I'll send the patch upstream once I have confirmation that it fixes the reported problems.
Comment 12 Natalie Protasevich 2008-03-24 13:04:43 UTC
According to the Gentoo bug mentioned in comment #11, the patch has been tested and submitted.
Closing the bug, thanks.