Created attachment 22297 [details]
full kernel log since bootup up to panic and kernel config

The kernel panics when compiling something huge (it is 100% repeatable when emerging erlang under Gentoo). This issue began somewhere after 2.6.28 and is not hardware dependent: I've seen it on a simple office PC used as a test server and even on a production HP DL380G5 under VMware ESX 4.0. exit.c is related to process management as far as I can see, so I am posting here.

Here's the panic log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20
PGD bf885067 PUD abc0d067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda/uevent
CPU 0
Modules linked in:
Pid: 23003, comm: install Not tainted 2.6.30.1-MAIL #1 VMware Virtual Platform
RIP: 0010:[<ffffffff80651cf6>] [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20
RSP: 0018:ffff8800bf809a60 EFLAGS: 00010092
RAX: 0000000000000100 RBX: ffffe200025eeff0 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
RBP: 0000000000000000 R08: ffff88013e89f800 R09: 00000000002c8992
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000018 R14: ffff88013e89f800 R15: ffff88010596655c
FS: 00002b8b8da59c40(0000) GS:ffff880028022000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 00000000bf8f2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process install (pid: 23003, threadinfo ffff8800bf808000, task ffff880105b0ef40)
Stack:
 ffffffff802c451d 0000000000000328 ffffc2001012e2f0 0000000000000000
 ffff880105966540 0000000000000000 ffffffff803165d6 0000000100000000
 000000010000df60 ffff880105966568 0001aef33f4a8000 ffffc20010082000
Call Trace:
 [<ffffffff802c451d>] ? __set_page_dirty+0x2d/0xd0
 [<ffffffff803165d6>] ? flush_commit_list+0x6e6/0x710
 [<ffffffff802502b0>] ? bit_waitqueue+0x10/0xb0
 [<ffffffff803169de>] ? flush_journal_list+0x16e/0x810
 [<ffffffff802446c4>] ? lock_timer_base+0x34/0x70
 [<ffffffff8031714b>] ? flush_used_journal_lists+0xcb/0xe0
 [<ffffffff80317f0e>] ? do_journal_end+0xbee/0xf20
 [<ffffffff803186d3>] ? do_journal_begin_r+0x283/0x330
 [<ffffffff802503c0>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff8031883d>] ? journal_begin+0x8d/0x160
 [<ffffffff80312baf>] ? reiserfs_do_truncate+0x26f/0x500
 [<ffffffff80312e62>] ? reiserfs_delete_object+0x22/0x70
 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0
 [<ffffffff802ff52e>] ? reiserfs_delete_inode+0xae/0xf0
 [<ffffffff8044dc48>] ? _atomic_dec_and_lock+0x68/0x90
 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0
 [<ffffffff802b563e>] ? generic_delete_inode+0x8e/0x140
 [<ffffffff802ac632>] ? do_unlinkat+0x132/0x1e0
 [<ffffffff80223bb8>] ? do_page_fault+0x148/0x2f0
 [<ffffffff80452f01>] ? __up_read+0x21/0xc0
 [<ffffffff8020b42b>] ? system_call_fastpath+0x16/0x1b
Code: c3 0f 1f 40 00 9c 58 fa ba 00 01 00 00 f0 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 0f 1f 84 00 00 00 00 00 fa b8 00 01 00 00 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c3 66 2e 0f 1f 84
RIP [<ffffffff80651cf6>] _spin_lock_irq+0x6/0x20
 RSP <ffff8800bf809a60>
CR2: 0000000000000018
---[ end trace 3783cbbc4d105fa7 ]---
------------[ cut here ]------------
WARNING: at kernel/exit.c:896 do_exit+0x5f9/0x710()
Hardware name: VMware Virtual Platform
Modules linked in:
Pid: 23003, comm: install Tainted: G D 2.6.30.1-MAIL #1
Call Trace:
 [<ffffffff8023df49>] ? do_exit+0x5f9/0x710
 [<ffffffff8023df49>] ? do_exit+0x5f9/0x710
 [<ffffffff8023a519>] ? warn_slowpath_common+0x79/0xd0
 [<ffffffff8023df49>] ? do_exit+0x5f9/0x710
 [<ffffffff80254776>] ? up+0x16/0x50
 [<ffffffff8023ac85>] ? release_console_sem+0x1a5/0x1f0
 [<ffffffff8020f6ed>] ? oops_end+0x9d/0xa0
 [<ffffffff80223547>] ? no_context+0xf7/0x270
 [<ffffffff8064fac4>] ? thread_return+0x6a/0x656
 [<ffffffff80223815>] ? __bad_area_nosemaphore+0x155/0x200
 [<ffffffff8043eed0>] ? elv_insert+0xe0/0x260
 [<ffffffff80440862>] ? __generic_unplug_device+0x12/0x40
 [<ffffffff80452d8a>] ? __down_read_trylock+0x3a/0x60
 [<ffffffff80452f01>] ? __up_read+0x21/0xc0
 [<ffffffff8065215f>] ? page_fault+0x1f/0x30
 [<ffffffff80651cf6>] ? _spin_lock_irq+0x6/0x20
 [<ffffffff802c451d>] ? __set_page_dirty+0x2d/0xd0
 [<ffffffff803165d6>] ? flush_commit_list+0x6e6/0x710
 [<ffffffff802502b0>] ? bit_waitqueue+0x10/0xb0
 [<ffffffff803169de>] ? flush_journal_list+0x16e/0x810
 [<ffffffff802446c4>] ? lock_timer_base+0x34/0x70
 [<ffffffff8031714b>] ? flush_used_journal_lists+0xcb/0xe0
 [<ffffffff80317f0e>] ? do_journal_end+0xbee/0xf20
 [<ffffffff803186d3>] ? do_journal_begin_r+0x283/0x330
 [<ffffffff802503c0>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff8031883d>] ? journal_begin+0x8d/0x160
 [<ffffffff80312baf>] ? reiserfs_do_truncate+0x26f/0x500
 [<ffffffff80312e62>] ? reiserfs_delete_object+0x22/0x70
 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0
 [<ffffffff802ff52e>] ? reiserfs_delete_inode+0xae/0xf0
 [<ffffffff8044dc48>] ? _atomic_dec_and_lock+0x68/0x90
 [<ffffffff802ff480>] ? reiserfs_delete_inode+0x0/0xf0
 [<ffffffff802b563e>] ? generic_delete_inode+0x8e/0x140
 [<ffffffff802ac632>] ? do_unlinkat+0x132/0x1e0
 [<ffffffff80223bb8>] ? do_page_fault+0x148/0x2f0
 [<ffffffff80452f01>] ? __up_read+0x21/0xc0
 [<ffffffff8020b42b>] ? system_call_fastpath+0x16/0x1b
---[ end trace 3783cbbc4d105fa8 ]---

I'm attaching the full kernel log.
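For anyone triaging a similar oops, one quick way to see which subsystem a trace runs through is to pull the symbol names out of the "Call Trace:" entries. The following is only a rough sketch: the function name `trace_symbols` and the regular expression are my own assumptions about the entry format shown above, not part of any kernel tooling.

```python
import re

# Matches entries such as "[<ffffffff80312baf>] ? reiserfs_do_truncate+0x26f/0x500"
# and captures the symbol name. The "?" marker is optional.
ENTRY_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def trace_symbols(oops_text):
    """Return the symbol names found in an oops, in order of appearance."""
    return ENTRY_RE.findall(oops_text)

# A few entries copied from the first call trace in this report.
sample = """
[<ffffffff802c451d>] ? __set_page_dirty+0x2d/0xd0
[<ffffffff803165d6>] ? flush_commit_list+0x6e6/0x710
[<ffffffff80312baf>] ? reiserfs_do_truncate+0x26f/0x500
[<ffffffff802ac632>] ? do_unlinkat+0x132/0x1e0
"""

symbols = trace_symbols(sample)
reiserfs_frames = [s for s in symbols if s.startswith("reiserfs_")]
print(symbols)
print(reiserfs_frames)
```

Filtering on the `reiserfs_` prefix is a crude heuristic; it misses the journal helpers such as `flush_commit_list`, which do not carry the prefix, so it only hints at where to look.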
Can you please provide a list of the mount options for your local filesystems? If that lists different filesystems with different mount options, which filesystem does the compile run on? There is a recent possible bug with reiserfs and the mount option "data=journal", so perhaps you use that mount option?
Yes, it seems that this is the problem. My root fs is ReiserFS with the data=journal option:

# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / reiserfs rw,noatime,data=journal 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
rc-svcdir /lib64/rc/init.d tmpfs rw,nosuid,nodev,noexec,size=1024k,mode=755 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,nosuid,size=10240k,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,gid=5,mode=620 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
/dev/md1 /mnt/130gb_mirror xfs rw,noatime,attr2,nobarrier,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdd1 /mnt/200gb reiserfs rw,noatime,data=writeback 0 0
/dev/sdb1 /mnt/300gb xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
usbfs /proc/bus/usb usbfs rw,nosuid,noexec,devgid=85,devmode=664 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec 0 0
OWFS /var/lib/owfs/mnt fuse.OWFS rw,nosuid,nodev,user_id=150,group_id=150,allow_other 0 0

The others are XFS and ReiserFS with data=writeback.

I can't seem to find any info about this bug on Google :) Is it fixed in any recent kernel?
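To check other machines for the same configuration, something like the following could scan /proc/mounts for reiserfs filesystems mounted with data=journal. This is a minimal sketch; the function name `find_reiserfs_data_journal` is hypothetical, and it only assumes the usual whitespace-separated /proc/mounts layout (device, mount point, type, options).

```python
def find_reiserfs_data_journal(mounts_text):
    """Return (device, mountpoint) pairs for reiserfs mounts using data=journal."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype == "reiserfs" and "data=journal" in options.split(","):
            hits.append((device, mountpoint))
    return hits

# Three lines copied from the /proc/mounts output in this report.
sample = """\
/dev/root / reiserfs rw,noatime,data=journal 0 0
/dev/sdd1 /mnt/200gb reiserfs rw,noatime,data=writeback 0 0
/dev/sdb1 /mnt/300gb xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
"""

affected = find_reiserfs_data_journal(sample)
for device, mountpoint in affected:
    print(f"{device} on {mountpoint}: reiserfs with data=journal")
```

On a live system the same function can be fed the contents of /proc/mounts, for example `find_reiserfs_data_journal(open("/proc/mounts").read())`.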
> /dev/root / reiserfs rw,noatime,data=journal 0 0

Is that an installation default or did you tune that manually? Can you confirm that the problem goes away if you remove data=journal?

> Is it fixed in any recent kernel?

No, I don't think so. Resources for fixing reiserfs bugs are scarce...
Oh, and please reassign the bug to reiserfs-devel@vger.kernel.org
No, it was done manually, as it's Gentoo :) I'll try to confirm this bug soon. I can't reassign this bug there; as far as Bugzilla tells me, only the assignee (you?) can do it.
No, I'm not the assignee, and I'm not even a real kernel programmer. I just have some Linux skills and fun hunting bugs :)

I'm getting:

You tried to change the Assignee field from process_other@kernel-bugs.osdl.org to reiserfs-devel@vger.kernel.org, but only the assignee of the bug, or a user with the required permissions may change that field.

So could someone please assign it appropriately or give me the permission to do that myself?
If this is the "data=journal" issue, could you please try this patch and confirm whether it fixes the problem?

http://bugzilla.kernel.org/attachment.cgi?id=22800

Otherwise you can try the .31-rc7 kernel, which already has this patch included.
Well, it looks like the patch solves the problem! Under 2.6.31-rc7 I've recompiled the toolchain three times, and it's rock solid. I think we can close the bug now. Thanks to all of you!