Bug 13756 - Kernel panic occasionally under load
Summary: Kernel panic occasionally under load
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: ReiserFS
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: ReiserFS developers team
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-10 11:43 UTC by Igor Novgorodov
Modified: 2009-08-26 17:22 UTC

See Also:
Kernel Version: 2.6.30.1 Vanilla x86_64
Tree: Mainline
Regression: Yes


Attachments
full kernel log since bootup up to panic and kernel config (67.32 KB, text/plain)
2009-07-10 11:43 UTC, Igor Novgorodov

Description Igor Novgorodov 2009-07-10 11:43:55 UTC
Created attachment 22297
full kernel log since bootup up to panic and kernel config

The kernel panics when compiling something huge (it is 100% repeatable when emerging erlang under Gentoo).
This issue began sometime after 2.6.28 and is not hardware dependent.
I've seen it on a simple office PC used as a test server and even on a production HP DL380G5 under VMware ESX 4.0, so it's just there.

As far as I can see, exit.c is related to process management, so I'm posting here.

Here's the panic log:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

IP: [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20

PGD bf885067 PUD abc0d067 PMD 0 

Oops: 0002 [#1] SMP 

last sysfs file: /sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda/uevent

CPU 0 

Modules linked in:

Pid: 23003, comm: install Not tainted 2.6.30.1-MAIL #1 VMware Virtual Platform

RIP: 0010:[<ffffffff80651cf6>]  [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20

RSP: 0018:ffff8800bf809a60  EFLAGS: 00010092

RAX: 0000000000000100 RBX: ffffe200025eeff0 RCX: 0000000000000001

RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018

RBP: 0000000000000000 R08: ffff88013e89f800 R09: 00000000002c8992

R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000

R13: 0000000000000018 R14: ffff88013e89f800 R15: ffff88010596655c

FS:  00002b8b8da59c40(0000) GS:ffff880028022000(0000) knlGS:0000000000000000

CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b

CR2: 0000000000000018 CR3: 00000000bf8f2000 CR4: 00000000000006e0

DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Process install (pid: 23003, threadinfo ffff8800bf808000, task ffff880105b0ef40)

Stack:

 ffffffff802c451d 0000000000000328 ffffc2001012e2f0 0000000000000000

 ffff880105966540 0000000000000000 ffffffff803165d6 0000000100000000

 000000010000df60 ffff880105966568 0001aef33f4a8000 ffffc20010082000

Call Trace:

 [<ffffffff802c451d>] ? __set_page_dirty+0x2d/0xd0

 [<ffffffff803165d6>] ? flush_commit_list+0x6e6/0x710

 [<ffffffff802502b0>] ? bit_waitqueue+0x10/0xb0

 [<ffffffff803169de>] ? flush_journal_list+0x16e/0x810

 [<ffffffff802446c4>] ? lock_timer_base+0x34/0x70

 [<ffffffff8031714b>] ? flush_used_journal_lists+0xcb/0xe0

 [<ffffffff80317f0e>] ? do_journal_end+0xbee/0xf20

 [<ffffffff803186d3>] ? do_journal_begin_r+0x283/0x330

 [<ffffffff802503c0>] ? autoremove_wake_function+0x0/0x30

 [<ffffffff8031883d>] ? journal_begin+0x8d/0x160

 [<ffffffff80312baf>] ? reiserfs_do_truncate+0x26f/0x500

 [<ffffffff80312e62>] ? reiserfs_delete_object+0x22/0x70

 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0

 [<ffffffff802ff52e>] ? reiserfs_delete_inode+0xae/0xf0

 [<ffffffff8044dc48>] ? _atomic_dec_and_lock+0x68/0x90

 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0

 [<ffffffff802b563e>] ? generic_delete_inode+0x8e/0x140

 [<ffffffff802ac632>] ? do_unlinkat+0x132/0x1e0

 [<ffffffff80223bb8>] ? do_page_fault+0x148/0x2f0

 [<ffffffff80452f01>] ? __up_read+0x21/0xc0

 [<ffffffff8020b42b>] ? system_call_fastpath+0x16/0x1b

Code: c3 0f 1f 40 00 9c 58 fa ba 00 01 00 00 f0 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 0f 1f 84 00 00 00 00 00 fa b8 00 01 00 00 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c3 66 2e 0f 1f 84 

RIP  [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20

 RSP <ffff8800bf809a60>

CR2: 0000000000000018

---[ end trace 3783cbbc4d105fa7 ]---

------------[ cut here ]------------

WARNING: at kernel/exit.c:896 do_exit+0x5f9/0x710()

Hardware name: VMware Virtual Platform

Modules linked in:

Pid: 23003, comm: install Tainted: G      D    2.6.30.1-MAIL #1

Call Trace:

 [<ffffffff8023df49>] ? do_exit+0x5f9/0x710

 [<ffffffff8023df49>] ? do_exit+0x5f9/0x710

 [<ffffffff8023a519>] ? warn_slowpath_common+0x79/0xd0

 [<ffffffff8023df49>] ? do_exit+0x5f9/0x710

 [<ffffffff80254776>] ? up+0x16/0x50

 [<ffffffff8023ac85>] ? release_console_sem+0x1a5/0x1f0

 [<ffffffff8020f6ed>] ? oops_end+0x9d/0xa0

 [<ffffffff80223547>] ? no_context+0xf7/0x270

 [<ffffffff8064fac4>] ? thread_return+0x6a/0x656

 [<ffffffff80223815>] ? __bad_area_nosemaphore+0x155/0x200

 [<ffffffff8043eed0>] ? elv_insert+0xe0/0x260

 [<ffffffff80440862>] ? __generic_unplug_device+0x12/0x40

 [<ffffffff80452d8a>] ? __down_read_trylock+0x3a/0x60

 [<ffffffff80452f01>] ? __up_read+0x21/0xc0

 [<ffffffff8065215f>] ? page_fault+0x1f/0x30

 [<ffffffff80651cf6>] ? _spin_lock_irq+0x6/0x20

 [<ffffffff802c451d>] ? __set_page_dirty+0x2d/0xd0

 [<ffffffff803165d6>] ? flush_commit_list+0x6e6/0x710

 [<ffffffff802502b0>] ? bit_waitqueue+0x10/0xb0

 [<ffffffff803169de>] ? flush_journal_list+0x16e/0x810

 [<ffffffff802446c4>] ? lock_timer_base+0x34/0x70

 [<ffffffff8031714b>] ? flush_used_journal_lists+0xcb/0xe0

 [<ffffffff80317f0e>] ? do_journal_end+0xbee/0xf20

 [<ffffffff803186d3>] ? do_journal_begin_r+0x283/0x330

 [<ffffffff802503c0>] ? autoremove_wake_function+0x0/0x30

 [<ffffffff8031883d>] ? journal_begin+0x8d/0x160

 [<ffffffff80312baf>] ? reiserfs_do_truncate+0x26f/0x500

 [<ffffffff80312e62>] ? reiserfs_delete_object+0x22/0x70

 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0

 [<ffffffff802ff52e>] ? reiserfs_delete_inode+0xae/0xf0

 [<ffffffff8044dc48>] ? _atomic_dec_and_lock+0x68/0x90

 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0

 [<ffffffff802b563e>] ? generic_delete_inode+0x8e/0x140

 [<ffffffff802ac632>] ? do_unlinkat+0x132/0x1e0

 [<ffffffff80223bb8>] ? do_page_fault+0x148/0x2f0

 [<ffffffff80452f01>] ? __up_read+0x21/0xc0

 [<ffffffff8020b42b>] ? system_call_fastpath+0x16/0x1b

---[ end trace 3783cbbc4d105fa8 ]---


I'm attaching the full kernel log.
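
For readers who want to pull the key facts out of a log like the one above, here is a minimal sketch in Python. It is not part of the original report. The function name `parse_oops` and the regular expressions are illustrative assumptions that only target the line shapes visible in this particular oops (the "NULL pointer dereference at" line, the "IP:" line, and trace lines of the form ` [<address>] ? symbol+0xoff/0xsize`); other kernel versions may format these lines differently.

```python
import re

# Assumed line shapes, taken only from the oops text in this report.
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"^IP: \[<[0-9a-f]+>\]\s+([\w.]+)\+0x", re.M)
# Trace lines start with optional whitespace and "[<addr>]"; the "? " marker is optional.
TRACE_RE = re.compile(
    r"^\s*\[<([0-9a-f]+)>\]\s+(?:\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_oops(text):
    """Extract the faulting address, the faulting symbol, and the trace frames."""
    fault = FAULT_RE.search(text)
    ip = IP_RE.search(text)
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line)
        if m:
            # Keep (symbol, offset into symbol) for each trace line.
            frames.append((m.group(2), int(m.group(3), 16)))
    return {
        "fault_address": int(fault.group(1), 16) if fault else None,
        "ip_symbol": ip.group(1) if ip else None,
        "frames": frames,
    }


if __name__ == "__main__":
    sample = (
        "BUG: unable to handle kernel NULL pointer dereference at 0000000000000018\n"
        "IP: [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20\n"
        "Call Trace:\n"
        " [<ffffffff802c451d>] ? __set_page_dirty+0x2d/0xd0\n"
        " [<ffffffff803165d6>] ? flush_commit_list+0x6e6/0x710\n"
    )
    info = parse_oops(sample)
    print(hex(info["fault_address"]), info["ip_symbol"])
    for symbol, offset in info["frames"]:
        print(symbol, hex(offset))
```

Run against the full log, the frame list would contain both the oops trace and the later do_exit warning trace, since both use the same line format.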
Comment 1 Roland Kletzing 2009-08-20 20:39:29 UTC
Can you please provide a list of the mount options for your local filesystems?

If that lists different filesystems with different mount options, which filesystem does the compile run on?

There is a recent possible bug with reiserfs and the mount option "data=journal", so perhaps you use that mount option?
Comment 2 Igor Novgorodov 2009-08-21 06:15:58 UTC
Yes, it seems that this is the problem.
My root fs is ReiserFS + data=journal option:

# cat /proc/mounts 
rootfs / rootfs rw 0 0
/dev/root / reiserfs rw,noatime,data=journal 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
rc-svcdir /lib64/rc/init.d tmpfs rw,nosuid,nodev,noexec,size=1024k,mode=755 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,nosuid,size=10240k,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,gid=5,mode=620 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
/dev/md1 /mnt/130gb_mirror xfs rw,noatime,attr2,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /mnt/200gb reiserfs rw,noatime,data=writeback 0 0
/dev/sdb1 /mnt/300gb xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
usbfs /proc/bus/usb usbfs rw,nosuid,noexec,devgid=85,devmode=664 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec 0 0
OWFS /var/lib/owfs/mnt fuse.OWFS rw,nosuid,nodev,user_id=150,group_id=150,allow_other 0 0

The others are XFS and ReiserFS + data=writeback.
I can't seem to find any info about this bug on Google :)
Is it fixed in any recent kernel?
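
The check discussed in this comment (which ReiserFS filesystems are mounted with data=journal) can be sketched in Python as follows. This is only an illustration, not something from the thread: the helper name `find_reiserfs_data_journal` is hypothetical, and the parsing assumes the whitespace-separated /proc/mounts layout shown above (device, mount point, filesystem type, options).

```python
def find_reiserfs_data_journal(mounts_text):
    """Return (device, mount_point) pairs for ReiserFS mounts using data=journal."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; compare whole tokens, not substrings.
        if fs_type == "reiserfs" and "data=journal" in options.split(","):
            hits.append((device, mount_point))
    return hits


# Two ReiserFS lines copied from the /proc/mounts output above.
sample_mounts = (
    "/dev/root / reiserfs rw,noatime,data=journal 0 0\n"
    "/dev/sdd1 /mnt/200gb reiserfs rw,noatime,data=writeback 0 0\n"
)
print(find_reiserfs_data_journal(sample_mounts))
```

On a live system the same function could be given the contents of /proc/mounts instead of the sample string.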
Comment 3 Roland Kletzing 2009-08-21 09:07:20 UTC
>/dev/root / reiserfs rw,noatime,data=journal 0 0

Is that an installation default, or did you tune that manually?

Can you confirm that the problem goes away if you remove data=journal?

>Is it fixed in any recent kernel?
No, I don't think so. Resources for fixing reiserfs bugs are scarce...
Comment 4 Roland Kletzing 2009-08-21 09:08:40 UTC
Oh, and please reassign the bug to reiserfs-devel@vger.kernel.org
Comment 5 Igor Novgorodov 2009-08-21 09:11:25 UTC
No, it was done manually, as it's Gentoo :)
I'll try to confirm this bug soon.

I can't reassign this bug myself; as far as it tells me, only the assignee (you?) can do that.
Comment 6 Roland Kletzing 2009-08-21 09:18:17 UTC
No, I'm not the assignee, and I'm not even a real kernel programmer. I just have some Linux skills and have fun hunting bugs :)

I'm getting:
You tried to change the Assignee field from process_other@kernel-bugs.osdl.org to reiserfs-devel@vger.kernel.org , but only the assignee of the bug, or a user with the required permissions may change that field.

So could someone please assign it appropriately, or give me permission to do that myself?
Comment 7 Roland Kletzing 2009-08-22 08:55:12 UTC
If this is the "data=journal" issue, could you please try this patch and confirm whether it fixes the problem: http://bugzilla.kernel.org/attachment.cgi?id=22800 ? Otherwise, you can try the .31-rc7 kernel, which already includes this patch.
Comment 8 Igor Novgorodov 2009-08-26 17:22:27 UTC
Well, it looks like the patch solves the problem!
Under .31-rc7 I've recompiled the toolchain three times, and it's rock solid.
I think we can close the bug now. Thanks to all of you!
