Most recent kernel where this bug did not occur:
Distribution: OpenSuSE 10.1
Hardware Environment:
> uname -a
Linux router 2.6.19-rc4-nopreempt #7 Wed Nov 1 03:50:53 CET 2006 i686 athlon i386 GNU/Linux
> lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8361 [KLE133] Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 16)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
00:09.0 Network controller: Cologne Chip Designs GmbH ISDN network controller [HFC-PCI] (rev 02)
00:0a.0 Network controller: ASUSTeK Computer Inc. ISDNLink P-IN100-ST-D (rev 01)
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1
>
Software Environment:
Problem Description:
Hello,

I am suffering severe crashes when using software RAID5 on top of LVM2 on top of 4 IDE disks on an x86 UP machine. The kernel does not have SMP support enabled. I upgraded the kernel from 2.6.18.1 to 2.6.19-rc4, but with no improvement. The crashes depend on the RAID5 resync speed: if the resync speed is high, the crashes occur within minutes; if it is set to the lowest possible value, they occur within days. So far, I have been unable to complete a full resync of that RAID5 device, because a crash has always happened long before the resync would have finished. I had been using software RAID4 on this machine for a long time (without sandwiching LVM2 between RAID4 and IDE), and that worked flawlessly.
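The resync speed mentioned above is throttled by md through two global sysctls, /proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max, both in KiB/s per member device. The sketch below lowers the ceiling to slow a resync down; it assumes those files exist and are writable (root), and the helper names are hypothetical. Because md does not throttle a resync that is running below speed_limit_min, the sketch never sets the ceiling below the current floor.

```python
from pathlib import Path

# Global md resync throttles, in KiB/s per member device.
SPEED_MIN = Path("/proc/sys/dev/raid/speed_limit_min")
SPEED_MAX = Path("/proc/sys/dev/raid/speed_limit_max")


def read_limit(path: Path) -> int:
    """Return the integer stored in a speed_limit_* file."""
    return int(path.read_text().strip())


def set_resync_ceiling(kib_per_s: int,
                       max_path: Path = SPEED_MAX,
                       min_path: Path = SPEED_MIN) -> int:
    """Set the md resync ceiling and return the value actually written.

    md does not throttle a resync running below speed_limit_min, so a
    ceiling under that floor would have no extra effect; clamp to it.
    """
    if kib_per_s <= 0:
        raise ValueError("limit must be a positive number of KiB/s")
    value = max(kib_per_s, read_limit(min_path))
    max_path.write_text(f"{value}\n")
    return value
```

On a real machine, calling set_resync_ceiling(1) as root drops the ceiling to whatever speed_limit_min currently holds, which approximates the "lowest possible value" setting described in the report.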
I have already disabled preemption and IRQ unmasking, with no apparent effect. Memtest86+ shows no signs of memory corruption. Using netpoll, I obtained the following crash logs:

===snip===
Nov 4 13:05:44 router slab: Internal list corruption detected in cache 'dm_tio'(127), slabp cba31000(80). Hexdump:
Nov 4 13:05:44 router 000: 00 e0 4f cb 50 c9 64 c1 18 02 00 00 18 12 a3 cb
Nov 4 13:05:44 router 010: 50 00 (last message repeated 2 times) 51 00 (last message repeated 6 times) 6b 00 00
Nov 4 13:05:44 router iptable_mangle ip_nat_ftp ip_conntrack_ftp ipt_REJECT iptable_filter sch_prio xt_mark xt_MARK nls_utf8 cifs act_police nls_cp850 nls_iso8859_1 smbfs sch_ingress cls_u32 sch_htb xt_tcpudp iptable_nat ip_nat ip_conntrack nfnetlink ip_tables x_tables rfcomm hidp l2cap bluetooth hisax isdn hfcpci mISDN_dsp
Nov 4 13:06:20 router BUG: spinlock lockup on CPU#0, md2_raid5/915, c164c974
Nov 4 13:06:20 router [<c0103c49>] dump_trace+0x64/0x1c5
Nov 4 13:06:20 router [<c0103dc4>] show_trace_log_lvl+0x1a/0x2f
Nov 4 13:06:20 router [<c0104383>] show_trace+0x12/0x14
Nov 4 13:06:20 router [<c01043c9>] dump_stack+0x19/0x1b
Nov 4 13:06:20 router [<c01fa933>] _raw_spin_lock+0xe1/0x105
Nov 4 13:06:20 router [<c0307cd8>] _spin_lock+0x32/0x38
Nov 4 13:06:20 router [<c015e63c>] cache_flusharray+0x40/0x10a
Nov 4 13:06:20 router [<c015e845>] kmem_cache_free+0x82/0xad
Nov 4 13:06:20 router [<c01497bd>] mempool_free_slab+0xe/0x10
Nov 4 13:06:20 router [<c0149823>] mempool_free+0x64/0x6a
Nov 4 13:06:20 router [<c02992a8>] clone_endio+0x93/0xa7
Nov 4 13:06:20 router [<c0181651>] bio_endio+0x5e/0x66
Nov 4 13:06:20 router [<c01e0893>] __end_that_request_first+0x165/0x40b
Nov 4 13:06:20 router [<c01e0b4e>] end_that_request_first+0xb/0xd
Nov 4 13:06:20 router [<c026aed6>] ide_end_request+0x87/0xd0
Nov 4 13:06:20 router [<c0271f65>] ide_dma_intr+0x58/0x98
Nov 4 13:06:20 router [<c026badd>] ide_intr+0x157/0x1ba
Nov 4 13:06:20 router [<c0143685>] handle_IRQ_event+0x1a/0x46
Nov 4 13:06:20 router [<c0144920>] handle_level_irq+0x8a/0xda
Nov 4 13:06:20 router [<c0104dc0>] do_IRQ+0x9b/0xb7
Nov 4 13:06:20 router [<c0103779>] common_interrupt+0x25/0x2c
Nov 4 13:06:20 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov 4 13:06:20 router Leftover inexact backtrace:
Nov 4 13:06:20 router [<c02851df>] page_is_zero+0x33/0x3f
Nov 4 13:06:20 router [<c0288946>] handle_stripe+0x1cff/0x237b
Nov 4 13:06:20 router [<c02890d4>] raid5d+0x112/0x13b
Nov 4 13:06:20 router [<c0294d6a>] md_thread+0xef/0x106
Nov 4 13:06:20 router [<c0129e86>] kthread+0xb0/0xde
Nov 4 13:06:20 router [<c010397b>] kernel_thread_helper+0x7/0x10
Nov 4 13:06:20 router =======================
===snap===

reboot

===snip===
Nov 4 13:41:55 router slab: Internal list corruption detected in cache 'biovec-1'(145), slabp c2b23000(54). Hexdump:
Nov 4 13:41:55 router 000: 00 01 10 00 00 02 20 00 60 02 00 00 60 32 b2 c2
Nov 4 13:41:55 router 010: 36 00 (last message repeated 2 times) 46 00 (last message repeated 4 times) 9b 83 03 00 00 ff ff 72 00 (last message repeated 2 times) 73 00 (last message repeated 2 times)
Nov 4 13:41:55 router ------------[ cut here ]------------
Nov 4 13:41:55 router kernel BUG at mm/slab.c:2911!
Nov 4 13:41:55 router invalid opcode: 0000 [#1]
Nov 4 13:41:55 router Modules linked in: nls_utf8 cifs nls_cp850 nls_iso8859_1 smbfs act_police sch_ingress cls_u32 sch_sfq sch_htb rfcomm hidp l2cap bluetooth hisax isdn hfcpci mISDN_dsp l3udss1 mISDN_l2 mISDN_l1 mISDN_core

Message from syslogd@router at Sat Nov 4 13:42:32 2006 ...
router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
Nov 4 13:42:32 router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
Nov 4 13:42:32 router [<c0103c49>] dump_trace+0x64/0x1c5
Nov 4 13:42:32 router [<c0103dc4>] show_trace_log_lvl+0x1a/0x2f
Nov 4 13:42:32 router [<c0104383>] show_trace+0x12/0x14
Nov 4 13:42:32 router [<c01043c9>] dump_stack+0x19/0x1b
Nov 4 13:42:32 router [<c01fa933>] _raw_spin_lock+0xe1/0x105
Nov 4 13:42:32 router [<c03080ca>] _spin_lock_irqsave+0x3b/0x44
Nov 4 13:42:33 router [<c026b99d>] ide_intr+0x17/0x1ba
Nov 4 13:42:33 router [<c0143685>] handle_IRQ_event+0x1a/0x46
Nov 4 13:42:33 router [<c0144920>] handle_level_irq+0x8a/0xda
Nov 4 13:42:33 router [<c0104dc0>] do_IRQ+0x9b/0xb7
Nov 4 13:42:33 router [<c0103779>] common_interrupt+0x25/0x2c
Nov 4 13:42:33 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov 4 13:42:33 router Leftover inexact backtrace:
Nov 4 13:42:33 router [<c010430d>] die+0x269/0x29e
Nov 4 13:42:33 router [<c0308343>] do_trap+0x81/0x9b
Nov 4 13:42:33 router [<c01047f3>] do_invalid_op+0x97/0xa1
Nov 4 13:42:33 router [<c0308159>] error_code+0x39/0x40
Nov 4 13:42:33 router [<c015ea11>]

Message from syslogd@router at Sat Nov 4 13:43:18 2006 ...
router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
===snap===

reboot

===snip===
Nov 4 14:14:26 router slab: Internal list corruption detected in cache 'dm_io'(113), slabp d1869000(72). Hexdump:
Nov 4 14:14:26 router 000: 00 01 10 00 00 02 20 00 e0 01 00 00 e0 91 86 d1
Nov 4 14:14:26 router 010: 48 00 (last message repeated 2 times) 0a 00 (last message repeated 6 times) fd ff ff
Nov 4 14:14:26 router x_tables hisax isdn hfcpci mISDN_dsp l3udss1 mISDN_l2 mISDN_l1 mISDN_core eeprom netconsole lp capability commoncap softdog nls_iso8859_15 isofs zlib_inflate loop psmouse pcips2 via_agp agpgart i2c_viapro via686a i2c_isa i2c_core cyblafb parport_pc parport 8250_pnp usblp
Nov 4 14:14:44 notebook kernel: CIFS VFS: server not responding
Nov 4 14:14:44 notebook kernel: CIFS VFS: No response for cmd 50 mid 29076
Nov 4 14:15:01 notebook /usr/sbin/cron[9642]: (root) CMD ( [ -f /openpkg/etc/rc ] && /openpkg/etc/rc all quarterly)

Message from syslogd@router at Sat Nov 4 14:15:03 2006 ...
router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
Nov 4 14:15:03 router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
Nov 4 14:15:03 router [<c0103c49>] dump_trace+0x64/0x1c5
Nov 4 14:15:03 router [<c0103dc4>] show_trace_log_lvl+0x1a/0x2f
Nov 4 14:15:03 router [<c0104383>] show_trace+0x12/0x14
Nov 4 14:15:03 router [<c01043c9>] dump_stack+0x19/0x1b
Nov 4 14:15:03 router [<c01fa933>] _raw_spin_lock+0xe1/0x105
Nov 4 14:15:03 router [<c03080ca>] _spin_lock_irqsave+0x3b/0x44
Nov 4 14:15:03 router [<c026b99d>] ide_intr+0x17/0x1ba
Nov 4 14:15:03 router [<c0143685>] handle_IRQ_event+0x1a/0x46
Nov 4 14:15:03 router [<c0144920>] handle_level_irq+0x8a/0xda
Nov 4 14:15:03 router [<c0104dc0>] do_IRQ+0x9b/0xb7
Nov 4 14:15:03 router [<c0103779>] common_interrupt+0x25/0x2c
Nov 4 14:15:03 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov 4 14:15:03 router Leftover inexact backtrace:
Nov 4 14:15:03 router [<c010430d>] die+0x269/0x29e
Nov 4 14:15:03 router [<c0308343>] do_trap+0x81/0x9b
Nov 4 14:15:03 router [<c01047f3>] do_invalid_op+0x97/0xa1
Nov 4 14:15:03 router [<c0308159>] error_code+0x39/0x40
Nov 4 14:15:03 router [<c015ea11>] free_block+0x6f/0x151
Nov 4 14:15:03 router [<c015e6a9>] cache_flusharray+0xad/0x10a
Nov 4 14:15:03 router [<c015e845>] kmem_cache_free+0x82/0xad
Nov 4 14:15:03 router [<c01497bd>] mempool_free_slab+0xe/0x10
Nov 4 14:15:03 router [<c0149823>] mempool_free+0x64/0x6a
Nov 4 14:15:03 router [<c0299105>] dec_pending+0x10d/0x115
Nov 4 14:15:03 router [<c0299285>] clone_endio+0x70/0xa7
Nov 4 14:15:03 router [<c0181651>] bio_endio+0x5e/0x66
Nov 4 14:15:03 router [<c01e0893>] __end_that_request_first+0x165/0x40b
Nov 4 14:15:03 router [<c01e0b4e>] end_that_request_first+0xb/0xd
Nov 4 14:15:03 router [<c026aed6>] ide_end_request+0x87/0xd0
Nov 4 14:15:03 router [<c0271f65>] ide_dma_intr+0x58/0x98
Nov 4 14:15:03 router [<c026badd>] ide_intr+0x157/0x1ba
Nov 4 14:15:03 router [<c0143685>] handle_IRQ_event+0x1a/0x46
Nov 4 14:15:03 router [<c0144920>] handle_level_irq+0x8a/0xda
Nov 4 14:15:03 router [<c0104dc0>] do_IRQ+0x9b/0xb7
Nov 4 14:15:03 router [<c0103779>] common_interrupt+0x25/0x2c
Nov 4 14:15:03 router [<c015e3e8>] cache_alloc_debugcheck_after+0x25/0x13e
Nov 4 14:15:03 router [<c015f414>] __kmalloc_track_caller+0xc9/0xd5
Nov 4 14:15:03 router [<c02adf08>] __alloc_skb+0x4f/0xfa
Nov 4 14:15:03 router [<c02d57c5>] tcp_sendmsg+0x14f/0xa0a
Nov 4 14:15:03 router [<c02ed6e0>] inet_sendmsg+0x3e/0x49
Nov 4 14:15:03 router [<c02a8424>] sock_aio_write+0xfb/0x107
Nov 4 14:15:03 router [<c0162623>] do_sync_write+0xc5/0x102
Nov 4 14:15:03 router [<c0162e77>] vfs_write+0xc3/0x168
Nov 4 14:15:03 router [<c01633be>] sys_write+0x3d/0x61
Nov 4 14:15:03 router [<c0102d8b>] syscall_call+0x7/0xb
Nov 4 14:15:03 router =======================

Message from syslogd@router at Sat Nov 4 14:15:49 2006 ...
router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
===snap===

To me, it looks like a locking error that allows two concurrent accesses to the memory-management data structures. I hope someone can help. I'm happy to run experiments on that machine or to apply speculative patches to its kernel.

Steps to reproduce:
(1) Use 4 physical disks connected to 2 IDE ports.
(2) Create LVM2 physical volumes on these disks.
(3) Create an LVM2 volume group from these physical volumes.
(4) Create 3 or 4 logical volumes on 3 or 4 of these physical volumes.
(5) Create a large MD RAID5 device from these logical volumes.
(6) Let the RAID5 device resync.
(7) Wait for the crash.
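The reproduction steps above can be written down as a dry-run command list. This is a sketch, not the reporter's actual commands: the disk names /dev/hda to /dev/hdd (master and slave on two IDE channels), the volume group vg_test, the 10G size, /dev/md2, and the helper names are hypothetical, and it assumes one logical volume per physical volume. It only builds and quotes the commands; running them would destroy the data on those disks.

```python
import shlex
from typing import List, Sequence


def repro_commands(disks: Sequence[str], vg: str = "vg_test",
                   lv_size: str = "10G", md: str = "/dev/md2") -> List[List[str]]:
    """Build argv lists for steps (2)-(5); nothing is executed here.

    Assumes one logical volume per physical volume. Creating the array
    kicks off its initial synchronisation (step 6) by itself.
    """
    if len(disks) < 3:
        raise ValueError("RAID5 needs at least three member devices")
    cmds = [["pvcreate", *disks],          # step (2)
            ["vgcreate", vg, *disks]]      # step (3)
    lvs = []
    for i, disk in enumerate(disks):       # step (4)
        name = f"lv{i}"
        # Naming a PV after the VG restricts this LV's extents to that disk.
        cmds.append(["lvcreate", "-L", lv_size, "-n", name, vg, disk])
        lvs.append(f"/dev/{vg}/{name}")
    cmds.append(["mdadm", "--create", md, "--level=5",   # step (5)
                 f"--raid-devices={len(lvs)}", *lvs])
    return cmds


def render(cmds: List[List[str]]) -> List[str]:
    """Quote each argv list as a shell command line for review."""
    return [shlex.join(argv) for argv in cmds]
```

Keeping the commands as argv lists rather than strings makes them easy to inspect before anyone hands them to subprocess on a scratch machine.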
This looks like random memory corruption in various slabs (dm_tio, dm_io, biovec-1). The slabs themselves have been trampled on. Does this problem persist with newer kernels?
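The slab dumps can be read against the check that fails. The sketch below assumes the 2.6.19/2.6.21-era layout of struct slab on 32-bit little-endian x86 (list_head next/prev, colouroff, s_mem, then inuse and free) and the LIST_POISON1/LIST_POISON2 values that list_del() writes in kernels of this era; the function names are hypothetical. freelist_consistent() mirrors the freelist walk in check_slabp(): it follows bufctl links from free until BUFCTL_END and reports corruption if the walk runs past num objects or the count differs from num - inuse.

```python
import struct
from typing import Dict, List

LIST_POISON1 = 0x00100100  # written to list_head.next by list_del()
LIST_POISON2 = 0x00200200  # written to list_head.prev by list_del()
BUFCTL_END = 0xFFFFFFFF    # terminates a slab's freelist
BUFCTL_FREE = 0xFFFFFFFE   # with slab debugging, stamped on allocated objects


def parse_row(row: str) -> bytes:
    """Turn a hexdump row ('000: 00 01 ...', syslog prefix allowed) into bytes."""
    _, _, hexbytes = row.rpartition(":")  # the offset colon is the last one
    return bytes(int(b, 16) for b in hexbytes.split())


def decode_slab_head(raw: bytes) -> Dict[str, int]:
    """Decode the first 16 bytes of struct slab (assumed 32-bit LE layout)."""
    nxt, prev, colouroff, s_mem = struct.unpack_from("<4I", raw)
    return {"next": nxt, "prev": prev, "colouroff": colouroff, "s_mem": s_mem}


def looks_list_deleted(head: Dict[str, int]) -> bool:
    """True if the slab's list_head carries the list_del() poison values."""
    return head["next"] == LIST_POISON1 and head["prev"] == LIST_POISON2


def freelist_consistent(free: int, bufctl: List[int], num: int, inuse: int) -> bool:
    """Walk the freelist like check_slabp() and apply the same two tests."""
    entries, i = 0, free
    while i != BUFCTL_END:
        entries += 1
        if entries > num or i >= num:  # cycle or out-of-range index
            return False
        i = bufctl[i]
    return entries == num - inuse
```

Fed the first row of the 'biovec-1' dump from 13:41:55, decode_slab_head() returns next = 0x00100100 and prev = 0x00200200, and the 'dm_io' dump from 14:14:26 begins with the same eight bytes. If the assumed layout holds, those slabs had already been unlinked with list_del() when check_slabp() examined them, while the 'dm_tio' header still holds ordinary kernel pointers.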
BTW, I'm seeing CIFS messages in the mix. Can you reproduce this with no CIFS mounts?
Sorry, I'm currently overseas (and will remain so for some months), so I cannot risk crashing the machine in question: I cannot restart it myself as long as I'm away.
Hello,

I had hoped this bug had vanished (and I actually thought it had), but it hit me hard on a newer kernel (2.6.21.3) :-( So yes, this problem persists with newer kernels. Now the machine sometimes reboots suddenly. At other times, it prints this stack trace and reboots a few minutes later:

Jun 11 00:10:25 router kernel: [ 6656.688000] slab: Internal list corruption detected in cache 'biovec-1'(145), slabp e4141000(86). Hexdump:
Jun 11 00:10:25 router kernel: [ 6656.688000]
Jun 11 00:10:25 router kernel: [ 6656.688000] 000: 00 10 3e c2 dc 77 7e e7 60 02 00 00 60 12 14 e4
Jun 11 00:10:25 router kernel: [ 6656.688000] 010: 56 00 00 00 7b 00 00 00 00 00 c8 9a 22 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 020: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 030: fe ff ff ff fe ff ff ff 2d 00 00 00 2c 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 040: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 050: fe ff ff ff fe ff ff ff fe ff ff ff 25 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 060: fe ff ff ff 7d 00 00 00 ff ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 070: 48 00 00 00 fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 080: fe ff ff ff fe ff ff ff 20 00 00 00 1b 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 090: 44 00 00 00 fe ff ff ff fe ff ff ff 5b 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0a0: 3e 00 00 00 4f 00 00 00 fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 0b0: 59 00 00 00 fe ff ff ff 51 00 00 00 07 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0c0: fe ff ff ff fe ff ff ff fe ff ff ff 58 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0d0: 27 00 00 00 fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 0e0: 87 00 00 00 fe ff ff ff 21 00 00 00 50 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0f0: fe ff ff ff 15 00 00 00 fe ff ff ff 78 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 100: fe ff ff ff fe ff ff ff 1d 00 00 00 6d 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 110: fe ff ff ff 31 00 00 00 fe ff ff ff 1c 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 120: 4a 00 00 00 fe ff ff ff fe ff ff ff 81 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 130: fe ff ff ff 5d 00 00 00 3b 00 00 00 67 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 140: 5a 00 00 00 47 00 00 00 fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 150: fe ff ff ff fe ff ff ff 13 00 00 00 86 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 160: 55 00 00 00 fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 170: 00 00 00 00 12 00 00 00 fe ff ff ff 62 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 180: 46 00 00 00 00 00 00 00 7c 00 00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 190: 63 00 00 00 fe ff ff ff 49 00 00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1a0: fe ff ff ff 5f 00 00 00 41 00 00 00 73 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 1b0: fe ff ff ff fe ff ff ff 64 00 00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1c0: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1d0: 80 00 00 00 fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1e0: fe ff ff ff fe ff ff ff 28 00 00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1f0: fe ff ff ff fe ff ff ff fe ff ff ff 36 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 200: fe ff ff ff fe ff ff ff 38 00 00 00 34 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 210: 33 00 00 00 fe ff ff ff fe ff ff ff 10 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 220: 08 00 00 00 fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 230: fe ff ff ff 56 00 00 00 3c 00 00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 240: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 250: fe ff ff ff fe ff ff ff fe ff ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] ------------[ cut here ]------------
Jun 11 00:10:25 router kernel: [ 6656.688000] kernel BUG at mm/slab.c:2936!
Jun 11 00:10:25 router kernel: [ 6656.688000] invalid opcode: 0000 [#1]
Jun 11 00:10:25 router kernel: [ 6656.688000] PREEMPT
Jun 11 00:10:25 router kernel: [ 6656.688000] Modules linked in: nls_utf8 cifs nls_cp850 nls_iso8859_1 smbfs act_police sch_ingress cls_u32 sch_sfq sch_htb rfcomm hidp l2cap bluetooth cls_fw sch_prio sch_tbf xt_mark xt_multiport xt_MARK ipt_MASQUERADE xt_TCPMSS ipt_TOS xt_length iptable_mangle nf_nat_ftp nf_conntrack_ftp ipt_REJECT iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables x_tables hisax isdn mISDN_dsp hfcpci mISDN_capi l3udss1 mISDN_l2 mISDN_l1 mISDN_core capi capifs kernelcapi eeprom lp capability commoncap softdog nls_iso8859_15 isofs zlib_inflate loop psmouse pcips2 8250_pnp 8250 usblp serial_core i2c_viapro via686a i2c_isa pcspkr i2c_core cyblafb via_agp parport_pc agpgart parport evdev dm_mirror pppoe pppox ppp_generic slhc ohci_hcd uhci_hcd usbmouse usbkbd usbhid usbcore ipv6 af_packet netconsole 8139too mii bitrev crc32 unix
Jun 11 00:10:25 router kernel: [ 6656.688000] CPU: 0
Jun 11 00:10:25 router kernel: [ 6656.688000] EIP: 0060:[<c0171ff0>] Not tainted VLI
Jun 11 00:10:25 router kernel: [ 6656.688000] EFLAGS: 00010086 (2.6.21.3lowLatency #2)
Jun 11 00:10:25 router kernel: [ 6656.688000] EIP is at check_slabp+0xf0/0x110
Jun 11 00:10:25 router kernel: [ 6656.688000] eax: 00000001 ebx: e414125f ecx: c7444000 edx: 00000001
Jun 11 00:10:25 router kernel: [ 6656.688000] esi: e4141000 edi: 00000260 ebp: c74459ac esp: c7445988
Jun 11 00:10:25 router kernel: [ 6656.688000] ds: 007b es:
007b fs: 00d8 gs: 0000 ss: 0068 Jun 11 00:10:25 router kernel: [ 6656.688000] Process md2_resync (pid: 8045, ti=c7444000 task=d835e490 task.ti=c7444000) Jun 11 00:10:25 router kernel: [ 6656.688000] Stack: c04405c1 000000ff 00000091 e4141000 00000056 e77c32a0 00000000 00000246 Jun 11 00:10:25 router kernel: [ 6656.688000] e4141000 c7445a18 c0173381 c0172602 00000000 00000044 c74459f0 00000000 Jun 11 00:10:25 router kernel: [ 6656.688000] 00011200 00011200 e77c32a0 e77d6dd0 00000010 e77e77dc e77db918 c74459f0 Jun 11 00:10:25 router kernel: [ 6656.688000] Call Trace: Jun 11 00:10:25 router kernel: [ 6656.688000] [<c010528a>] show_trace_log_lvl+0x1a/0x30 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0105351>] show_stack_log_lvl+0xb1/0xe0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c010557f>] show_registers+0x1ff/0x380 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0105823>] die+0x123/0x260 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c03791f2>] do_trap+0x82/0xb0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0105f07>] do_invalid_op+0x97/0xb0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0378fbc>] error_code+0x74/0x7c Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0173381>] cache_alloc_refill+0xd1/0x6b0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0173cf3>] kmem_cache_alloc+0xb3/0xc0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c015780e>] mempool_alloc_slab+0xe/0x10 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0157941>] mempool_alloc+0x31/0x140 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c019d823>] bio_alloc_bioset+0x73/0x140 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02f2407>] clone_bio+0x37/0x80 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02f2b8e>] __split_bio+0x17e/0x470 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02f39fe>] dm_request+0xce/0x140 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c020f90b>] generic_make_request+0x1bb/0x360 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02da183>] 
handle_stripe5+0xb53/0x17b0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02dc8d2>] handle_stripe+0x382/0x1a10 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02dec1d>] sync_request+0x21d/0xcc0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02ec8c7>] md_do_sync+0x7e7/0xd20 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c02eb901>] md_thread+0x31/0x110 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0134633>] kthread+0xa3/0xd0 Jun 11 00:10:25 router kernel: [ 6656.688000] [<c0104e77>] kernel_thread_helper+0x7/0x10 Jun 11 00:10:25 router kernel: [ 6656.688000] ======================= Jun 11 00:10:26 router kernel: [ 6656.688000] Code: ff 8b 55 f0 8b 42 20 8d 04 85 1c 00 00 00 39 f8 76 0d 83 c3 01 f7 c7 0f 00 00 00 75 ce eb b9 c7 04 24 c1 05 44 c0 e8 90 ea fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 67 ff ff ff 8d Jun 11 00:10:26 router kernel: [ 6656.688000] EIP: [<c0171ff0>] check_slabp+0xf0/0x110 SS:ESP 0068:c7445988 Jun 11 00:10:26 router kernel: [ 6656.688000] note: md2_resync[8045] exited with preempt_count 1
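For anyone trying to read the hexdump above: it is the slab management area that check_slabp() prints when its walk of the slab's free list fails. The following is a minimal userspace sketch that mirrors that walk. It rests on assumptions taken from a reading of the 2.6.21 mm/slab.c source, not from this report: on i386 the on-slab header (struct slab: list head, colouroff, s_mem, inuse, free, nodeid) is 28 bytes, each kmem_bufctl_t is a 32-bit little-endian object index, and BUFCTL_END is 0xffffffff. The helper names are hypothetical. The numbers in the message are at least consistent with this layout: the dump is 0x260 bytes long, i.e. 28 + 145 * 4, and the word at offset 0x10 (0x56 = 86) matches the inuse count printed as "(86)".

```python
import struct

# Assumptions (from a reading of 2.6.21 mm/slab.c on i386, not from this report):
BUFCTL_END = 0xFFFFFFFF  # end-of-free-list marker for a 32-bit kmem_bufctl_t
SLAB_HEADER_SIZE = 28    # sizeof(struct slab): list(8) + colouroff + s_mem + inuse + free + nodeid, padded


def parse_slab_header(dump: bytes) -> dict:
    """Decode the fixed struct slab fields at the start of the hexdump."""
    nxt, prev, colouroff, s_mem, inuse, free = struct.unpack_from("<6I", dump, 0)
    (nodeid,) = struct.unpack_from("<H", dump, 24)
    return {"next": nxt, "prev": prev, "colouroff": colouroff,
            "s_mem": s_mem, "inuse": inuse, "free": free, "nodeid": nodeid}


def parse_bufctl(dump: bytes, num: int) -> list:
    """Decode the bufctl array (one 32-bit index per object) that follows the header."""
    return list(struct.unpack_from(f"<{num}I", dump, SLAB_HEADER_SIZE))


def free_list_consistent(bufctl: list, free: int, inuse: int, num: int) -> bool:
    """Follow the free chain the way check_slabp() appears to: every link must
    stay below num, and exactly num - inuse entries must precede BUFCTL_END."""
    i, entries = free, 0
    while entries < num and i != BUFCTL_END:
        if i >= num:  # a link points past the last object in the slab
            return False
        i = bufctl[i]
        entries += 1
    return entries == num - inuse


if __name__ == "__main__":
    # First 28 bytes (offsets 000-01b) of the biovec-1 hexdump above.
    hdr = parse_slab_header(bytes.fromhex(
        "00103ec2dc777ee760020000601214e4" "560000007b0000000000c89a"))
    print(hdr["inuse"], hdr["free"])  # prints "86 123"
```

Decoded this way, the header gives inuse = 86 and free = 123 (0x7b), so for the cache's 145 objects the walk must find exactly 145 - 86 = 59 free entries. Running free_list_consistent() on the full 145-entry bufctl array from the dump shows whether the chain is short, too long, or leaves the slab.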
One additional observation: in one incident, the machine rebooted about 1 to 3 seconds after "smartd" had checked the SMART status of each of the IDE hard disks. Also, I was monitoring "/sys/block/md2/md/sync_completed", whose value normally changes constantly during a RAID rebuild. Before the reboot, the value had been advancing more slowly than usual and then did not change at all for about 1.5 seconds. This leads to the hypothesis that "smartd" may trigger these reboots, perhaps by inducing longer delays in disk access, which might lead to sudden error states or to timeouts kicking in that normally would not. The sudden-reboot problem may or may not be related to the slab corruption problem.
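The stall observation above came from watching sync_completed by hand. A minimal Python sketch of a stall detector for that attribute follows. It assumes the file reads as "done / total" while a resync is running and "none" otherwise, which matches our understanding of md's sysfs interface. The path defaults to the md2 array from this report; the function names, polling interval and stall threshold are hypothetical choices, not values from this thread.

```python
import time


def parse_sync_completed(text: str):
    """Parse md's sync_completed attribute: "done / total", or "none" when idle."""
    text = text.strip()
    if text == "none":
        return None
    done, total = (int(part) for part in text.split("/"))
    return done, total


def find_stalls(samples, interval, min_stall):
    """Return (start_index, seconds) for each run of unchanged readings that
    lasted at least min_stall seconds, given readings taken every interval seconds."""
    stalls = []
    run_start = 0
    for i in range(1, len(samples) + 1):
        # A run of identical readings ends at a change or at the end of the data.
        if i == len(samples) or samples[i] != samples[run_start]:
            duration = (i - 1 - run_start) * interval
            if duration >= min_stall:
                stalls.append((run_start, duration))
            run_start = i
    return stalls


def monitor(path="/sys/block/md2/md/sync_completed", interval=0.1, min_stall=1.0):
    """Poll the attribute until no resync is running, reporting each stall once."""
    last, since, reported = None, time.monotonic(), False
    while True:
        with open(path) as f:
            progress = parse_sync_completed(f.read())
        if progress is None:  # resync/recovery has finished or was never running
            return
        now = time.monotonic()
        if progress[0] != last:
            last, since, reported = progress[0], now, False
        elif not reported and now - since >= min_stall:
            print(f"no resync progress at {last} for {now - since:.1f} s", flush=True)
            reported = True
        time.sleep(interval)
```

time.monotonic() is used so that a wall-clock adjustment during the rebuild cannot masquerade as a stall. Note that a stall shorter than the polling interval can still be missed, so the interval should be well below the delays being investigated.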
I don't recognise the precise problem, but there have been fixes in related parts of the code, so do please keep retrying with newer kernels to see if it got fixed.
My guess is that this is a problem with the driver for the VIA IDE controller. I don't suppose you have a spare IDE card from a different manufacturer that you could try putting in?? Should we assign it to the IDE people to see if they can help? (I think you would need to do that, Alasdair.)
> did change slower than usual before. This leads to a hypothesis that "smartd"
> may trigger these reboots, maybe by inducing longer delays in disk access,
> maybe leading to sudden error states or maybe leading to timeouts kicking in

Yes, a SMART check may induce delays in disk access, but this shouldn't cause other problems (at least for IDE).

> My guess is that this is a problem with the driver for the VIA ide
> controller.

This is possible, but there are currently no open or known issues with the VIA host driver, so more information is needed (dmesg output).

> I don't suppose you have a spare IDE card from a different manufacturer
> that you could try putting in??

That would be useful. Also, does the issue still happen with 2.6.23?
PS: Disabling "smartd" completely and seeing whether that helps is also worth a try.
I have been doing extensive resyncs under Linux 2.6.22.7, using the slab allocator as the memory allocator, on the same machine with the same setup, and I cannot reproduce the bug anymore, regardless of whether smartd is switched on or off. I therefore assume that this bug was fixed (for some unknown reason) between Linux 2.6.21.3 and Linux 2.6.22.7. :-) Thank you very much for your support. :-) I'm closing this bug for the time being.
Great, thanks for reporting it.