Bug 7458 - Severe crashes using MD-RAID5 over LVM2 over IDE
Summary: Severe crashes using MD-RAID5 over LVM2 over IDE
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE
Hardware: i386 Linux
Importance: P2 blocking
Assignee: Bartlomiej Zolnierkiewicz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-11-04 05:52 UTC by Xu
Modified: 2008-01-09 11:19 UTC

See Also:
Kernel Version: 2.6.19-rc4
Subsystem:
Regression: ---
Bisected commit-id:


Description Xu 2006-11-04 05:52:43 UTC
Most recent kernel where this bug did not occur:
Distribution: OpenSuSE 10.1
Hardware Environment:
> uname -a
Linux router 2.6.19-rc4-nopreempt #7 Wed Nov 1 03:50:53 CET 2006 i686 athlon
i386 GNU/Linux
> lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8361 [KLE133] Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 16)
00:07.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 16)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio
Controller (rev 50)
00:09.0 Network controller: Cologne Chip Designs GmbH ISDN network controller
[HFC-PCI] (rev 02)
00:0a.0 Network controller: ASUSTeK Computer Inc. ISDNLink P-IN100-ST-D (rev 01)
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade/i1
>

Software Environment:
Problem Description:

Hello,

I suffer severe crashes when using software RAID5 over LVM2 over 4 IDE disks on
an x86 UP machine. The kernel does not have SMP support enabled. I upgraded the kernel
from 2.6.18.1 to version 2.6.19-rc4, but with no improvement. The crashes are
dependent on the RAID5 resync-speed. If the resync speed is high, the crashes
occur within minutes. If it is set to the lowest possible value, the crashes
occur within days. Up to now, I have been unable to do a full resync on that
RAID5 device, because the crashes always happened far earlier than the resync would have
finished.
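
The report does not say how the resync speed was changed. One common way is the md sysctl files speed_limit_min and speed_limit_max under /proc/sys/dev/raid (values in KB/s). A minimal Python sketch under that assumption; the function name and the example values are hypothetical, and writing to the real files needs root:

```python
from pathlib import Path


def set_resync_limits(min_kb_s, max_kb_s, base="/proc/sys/dev/raid"):
    """Write the md resync speed limits and return what the files now contain.

    Assumption: the limits are exposed as speed_limit_min / speed_limit_max
    below `base`. `base` is a parameter so the function can be tried against
    a scratch directory before touching a live system.
    """
    if not 0 < min_kb_s <= max_kb_s:
        raise ValueError("expected 0 < min <= max")
    root = Path(base)
    (root / "speed_limit_min").write_text(f"{min_kb_s}\n")
    (root / "speed_limit_max").write_text(f"{max_kb_s}\n")
    return {
        name: (root / name).read_text().strip()
        for name in ("speed_limit_min", "speed_limit_max")
    }
```

Lowering both values before starting the resync, then raising them step by step, would be one way to map how the time to crash depends on the resync rate.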

I had been using software RAID4 on this machine for a long time (without
sandwiching LVM2 between RAID4 and IDE), which worked flawlessly. I already
disabled preemption and IRQ unmasking with no apparent success. Memtest86+ shows
no signs of memory corruption.

Using netpoll, I obtained the following crash logs:


===snip===
Nov  4 13:05:44 router slab: Internal list corruption detected in cache
'dm_tio'(127), slabp cba31000(80). Hexdump:
Nov  4 13:05:44 router
Nov  4 13:05:44 router 000:
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router  e0
Nov  4 13:05:44 router  4f
Nov  4 13:05:44 router  cb
Nov  4 13:05:44 router  50
Nov  4 13:05:44 router  c9
Nov  4 13:05:44 router  64
Nov  4 13:05:44 router  c1
Nov  4 13:05:44 router  18
Nov  4 13:05:44 router  02
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router  18
Nov  4 13:05:44 router  12
Nov  4 13:05:44 router  a3
Nov  4 13:05:44 router  cb
Nov  4 13:05:44 router
Nov  4 13:05:44 router 010:
Nov  4 13:05:44 router  50
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router last message repeated 2 times
Nov  4 13:05:44 router  51
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router last message repeated 6 times
Nov  4 13:05:44 router  6b
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router  00
Nov  4 13:05:44 router  iptable_mangle
Nov  4 13:05:44 router  ip_nat_ftp
Nov  4 13:05:44 router  ip_conntrack_ftp
Nov  4 13:05:44 router  ipt_REJECT
Nov  4 13:05:44 router  iptable_filter
Nov  4 13:05:44 router  sch_prio
Nov  4 13:05:44 router  xt_mark
Nov  4 13:05:44 router  xt_MARK
Nov  4 13:05:44 router  nls_utf8
Nov  4 13:05:44 router  cifs
Nov  4 13:05:44 router  act_police
Nov  4 13:05:44 router  nls_cp850
Nov  4 13:05:44 router  nls_iso8859_1
Nov  4 13:05:44 router  smbfs
Nov  4 13:05:44 router  sch_ingress
Nov  4 13:05:44 router  cls_u32
Nov  4 13:05:44 router  sch_htb
Nov  4 13:05:44 router  xt_tcpudp
Nov  4 13:05:44 router  iptable_nat
Nov  4 13:05:44 router  ip_nat
Nov  4 13:05:44 router  ip_conntrack
Nov  4 13:05:44 router  nfnetlink
Nov  4 13:05:44 router  ip_tables
Nov  4 13:05:44 router  x_tables
Nov  4 13:05:44 router  rfcomm
Nov  4 13:05:44 router  hidp
Nov  4 13:05:44 router  l2cap
Nov  4 13:05:44 router  bluetooth
Nov  4 13:05:44 router  hisax
Nov  4 13:05:44 router  isdn
Nov  4 13:05:44 router  hfcpci
Nov  4 13:05:44 router  mISDN_dsp
Nov  4 13:06:20 router BUG: spinlock lockup on CPU#0, md2_raid5/915, c164c974
Nov  4 13:06:20 router  [<c0103c49>]
Nov  4 13:06:20 router dump_trace+0x64/0x1c5
Nov  4 13:06:20 router  [<c0103dc4>]
Nov  4 13:06:20 router show_trace_log_lvl+0x1a/0x2f
Nov  4 13:06:20 router  [<c0104383>]
Nov  4 13:06:20 router show_trace+0x12/0x14
Nov  4 13:06:20 router  [<c01043c9>]
Nov  4 13:06:20 router dump_stack+0x19/0x1b
Nov  4 13:06:20 router  [<c01fa933>]
Nov  4 13:06:20 router _raw_spin_lock+0xe1/0x105
Nov  4 13:06:20 router  [<c0307cd8>]
Nov  4 13:06:20 router _spin_lock+0x32/0x38
Nov  4 13:06:20 router  [<c015e63c>]
Nov  4 13:06:20 router cache_flusharray+0x40/0x10a
Nov  4 13:06:20 router  [<c015e845>]
Nov  4 13:06:20 router kmem_cache_free+0x82/0xad
Nov  4 13:06:20 router  [<c01497bd>]
Nov  4 13:06:20 router mempool_free_slab+0xe/0x10
Nov  4 13:06:20 router  [<c0149823>]
Nov  4 13:06:20 router mempool_free+0x64/0x6a
Nov  4 13:06:20 router  [<c02992a8>]
Nov  4 13:06:20 router clone_endio+0x93/0xa7
Nov  4 13:06:20 router  [<c0181651>]
Nov  4 13:06:20 router bio_endio+0x5e/0x66
Nov  4 13:06:20 router  [<c01e0893>]
Nov  4 13:06:20 router __end_that_request_first+0x165/0x40b
Nov  4 13:06:20 router  [<c01e0b4e>]
Nov  4 13:06:20 router end_that_request_first+0xb/0xd
Nov  4 13:06:20 router  [<c026aed6>]
Nov  4 13:06:20 router ide_end_request+0x87/0xd0
Nov  4 13:06:20 router  [<c0271f65>]
Nov  4 13:06:20 router ide_dma_intr+0x58/0x98
Nov  4 13:06:20 router  [<c026badd>]
Nov  4 13:06:20 router ide_intr+0x157/0x1ba
Nov  4 13:06:20 router  [<c0143685>]
Nov  4 13:06:20 router handle_IRQ_event+0x1a/0x46
Nov  4 13:06:20 router  [<c0144920>]
Nov  4 13:06:20 router handle_level_irq+0x8a/0xda
Nov  4 13:06:20 router  [<c0104dc0>]
Nov  4 13:06:20 router do_IRQ+0x9b/0xb7
Nov  4 13:06:20 router  [<c0103779>]
Nov  4 13:06:20 router common_interrupt+0x25/0x2c
Nov  4 13:06:20 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov  4 13:06:20 router
Nov  4 13:06:20 router Leftover inexact backtrace:
Nov  4 13:06:20 router
Nov  4 13:06:20 router  [<c02851df>]
Nov  4 13:06:20 router page_is_zero+0x33/0x3f
Nov  4 13:06:20 router  [<c0288946>]
Nov  4 13:06:20 router handle_stripe+0x1cff/0x237b
Nov  4 13:06:20 router  [<c02890d4>]
Nov  4 13:06:20 router raid5d+0x112/0x13b
Nov  4 13:06:20 router  [<c0294d6a>]
Nov  4 13:06:20 router md_thread+0xef/0x106
Nov  4 13:06:20 router  [<c0129e86>]
Nov  4 13:06:20 router kthread+0xb0/0xde
Nov  4 13:06:20 router  [<c010397b>]
Nov  4 13:06:20 router kernel_thread_helper+0x7/0x10
Nov  4 13:06:20 router  =======================
===snap===
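
In the log above every printk fragment arrived as its own syslog record, so each hexdump byte sits on a separate line and syslogd folds runs of identical fragments into "last message repeated N times". A minimal Python sketch for regrouping the hexdump bytes into rows. It assumes the repeat marker stands for N further copies of the previous fragment and that the syslog prefix has the "Mon D HH:MM:SS host" shape seen here; the function names are hypothetical:

```python
import re

# Syslog prefix as it appears in the log, e.g. "Nov  4 13:05:44 router ".
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+ ?")
REPEAT = re.compile(r"^last message repeated (\d+) times$")
OFFSET = re.compile(r"^[0-9a-f]{3}:$")   # row offset such as "010:"
BYTE = re.compile(r"^[0-9a-f]{2}$")      # a single hexdump byte such as "cb"


def payloads(lines):
    """Yield message payloads, expanding syslogd's repeat markers.

    Assumption: "repeated N times" means N additional copies of the
    previous payload.
    """
    previous = None
    for line in lines:
        text = PREFIX.sub("", line.rstrip("\n")).strip()
        match = REPEAT.match(text)
        if match:
            if previous is not None:
                for _ in range(int(match.group(1))):
                    yield previous
            continue
        previous = text
        yield text


def hexdump_rows(lines):
    """Group one-byte-per-line payloads into 'offset: bytes' rows."""
    rows = []
    for text in payloads(lines):
        if OFFSET.match(text):
            rows.append([text])
        elif BYTE.match(text) and rows:
            rows[-1].append(text)
    return [" ".join(row) for row in rows]
```

The sketch only regroups what was logged; bytes that netconsole or syslogd dropped cannot be recovered, so short rows are to be expected.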

reboot

===snip===
Nov  4 13:41:55 router slab: Internal list corruption detected in cache
'biovec-1'(145), slabp c2b23000(54). Hexdump:
Nov  4 13:41:55 router
Nov  4 13:41:55 router 000:
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  01
Nov  4 13:41:55 router  10
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  02
Nov  4 13:41:55 router  20
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  60
Nov  4 13:41:55 router  02
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  60
Nov  4 13:41:55 router  32
Nov  4 13:41:55 router  b2
Nov  4 13:41:55 router  c2
Nov  4 13:41:55 router
Nov  4 13:41:55 router 010:
Nov  4 13:41:55 router  36
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 2 times
Nov  4 13:41:55 router  46
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 4 times
Nov  4 13:41:55 router  9b
Nov  4 13:41:55 router  83
Nov  4 13:41:55 router  03
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  ff
Nov  4 13:41:55 router  ff
Nov  4 13:41:55 router  72
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 2 times
Nov  4 13:41:55 router  73
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 2 times
Nov  4 13:41:55 router
Nov  4 13:41:55 router ------------[ cut here ]------------
Nov  4 13:41:55 router kernel BUG at mm/slab.c:2911!
Nov  4 13:41:55 router invalid opcode: 0000 [#1]
Nov  4 13:41:55 router Modules linked in:
Nov  4 13:41:55 router  nls_utf8
Nov  4 13:41:55 router  cifs
Nov  4 13:41:55 router  nls_cp850
Nov  4 13:41:55 router  nls_iso8859_1
Nov  4 13:41:55 router  smbfs
Nov  4 13:41:55 router  act_police
Nov  4 13:41:55 router  sch_ingress
Nov  4 13:41:55 router  cls_u32
Nov  4 13:41:55 router  sch_sfq
Nov  4 13:41:55 router  sch_htb
Nov  4 13:41:55 router  rfcomm
Nov  4 13:41:55 router  hidp
Nov  4 13:41:55 router  l2cap
Nov  4 13:41:55 router  bluetooth
Nov  4 13:41:55 router  hisax
Nov  4 13:41:55 router  isdn
Nov  4 13:41:55 router  hfcpci
Nov  4 13:41:55 router  mISDN_dsp
Nov  4 13:41:55 router  l3udss1
Nov  4 13:41:55 router  mISDN_l2
Nov  4 13:41:55 router  mISDN_l1
Nov  4 13:41:55 router  mISDN_core
Nov  4 13:41:55 router slab: Internal list corruption detected in cache
'biovec-1'(145), slabp c2b23000(54). Hexdump:
Nov  4 13:41:55 router
Nov  4 13:41:55 router 000:
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  01
Nov  4 13:41:55 router  10
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  02
Nov  4 13:41:55 router  20
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  60
Nov  4 13:41:55 router  02
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  60
Nov  4 13:41:55 router  32
Nov  4 13:41:55 router  b2
Nov  4 13:41:55 router  c2
Nov  4 13:41:55 router
Nov  4 13:41:55 router 010:
Nov  4 13:41:55 router  36
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 2 times
Nov  4 13:41:55 router  46
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 4 times
Nov  4 13:41:55 router  9b
Nov  4 13:41:55 router  83
Nov  4 13:41:55 router  03
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router  ff
Nov  4 13:41:55 router  ff
Nov  4 13:41:55 router  72
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 2 times
Nov  4 13:41:55 router  73
Nov  4 13:41:55 router  00
Nov  4 13:41:55 router last message repeated 2 times
Nov  4 13:41:55 router
Nov  4 13:41:55 router ------------[ cut here ]------------
Nov  4 13:41:55 router kernel BUG at mm/slab.c:2911!
Nov  4 13:41:55 router invalid opcode: 0000 [#1]
Nov  4 13:41:55 router Modules linked in:
Nov  4 13:41:55 router  nls_utf8
Nov  4 13:41:55 router  cifs
Nov  4 13:41:55 router  nls_cp850
Nov  4 13:41:55 router  nls_iso8859_1
Nov  4 13:41:55 router  smbfs
Nov  4 13:41:55 router  act_police
Nov  4 13:41:55 router  sch_ingress
Nov  4 13:41:55 router  cls_u32
Nov  4 13:41:55 router  sch_sfq
Nov  4 13:41:55 router  sch_htb
Nov  4 13:41:55 router  rfcomm
Nov  4 13:41:55 router  hidp
Nov  4 13:41:55 router  l2cap
Nov  4 13:41:55 router  bluetooth
Nov  4 13:41:56 router  hisax
Nov  4 13:41:56 router  isdn
Nov  4 13:41:56 router  hfcpci
Nov  4 13:41:56 router  mISDN_dsp
Nov  4 13:41:56 router  l3udss1
Nov  4 13:41:56 router  mISDN_l2
Nov  4 13:41:56 router  mISDN_l1
Nov  4 13:41:56 router  mISDN_core


Message from syslogd@router at Sat Nov  4 13:42:32 2006 ...
router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
Nov  4 13:42:32 router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
Nov  4 13:42:32 router  [<c0103c49>]
Nov  4 13:42:32 router dump_trace+0x64/0x1c5
Nov  4 13:42:32 router  [<c0103dc4>]
Nov  4 13:42:32 router show_trace_log_lvl+0x1a/0x2f
Nov  4 13:42:32 router  [<c0104383>]
Nov  4 13:42:32 router show_trace+0x12/0x14
Nov  4 13:42:32 router  [<c01043c9>]
Nov  4 13:42:32 router dump_stack+0x19/0x1b
Nov  4 13:42:32 router  [<c01fa933>]
Nov  4 13:42:32 router _raw_spin_lock+0xe1/0x105
Nov  4 13:42:32 router  [<c03080ca>]
Nov  4 13:42:32 router _spin_lock_irqsave+0x3b/0x44
Nov  4 13:42:33 router  [<c026b99d>]
Nov  4 13:42:33 router ide_intr+0x17/0x1ba
Nov  4 13:42:33 router  [<c0143685>]
Nov  4 13:42:33 router handle_IRQ_event+0x1a/0x46
Nov  4 13:42:33 router  [<c0144920>]
Nov  4 13:42:33 router handle_level_irq+0x8a/0xda
Nov  4 13:42:33 router  [<c0104dc0>]
Nov  4 13:42:33 router do_IRQ+0x9b/0xb7
Nov  4 13:42:33 router  [<c0103779>]
Nov  4 13:42:33 router common_interrupt+0x25/0x2c
Nov  4 13:42:33 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov  4 13:42:33 router
Nov  4 13:42:33 router Leftover inexact backtrace:
Nov  4 13:42:33 router
Nov  4 13:42:33 router  [<c010430d>]
Nov  4 13:42:33 router die+0x269/0x29e
Nov  4 13:42:33 router  [<c0308343>]
Nov  4 13:42:33 router do_trap+0x81/0x9b
Nov  4 13:42:33 router  [<c01047f3>]
Nov  4 13:42:33 router do_invalid_op+0x97/0xa1
Nov  4 13:42:33 router  [<c0308159>]
Nov  4 13:42:33 router error_code+0x39/0x40
Nov  4 13:42:33 router  [<c015ea11>]
Nov  4 13:42:33 router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
Nov  4 13:42:33 router  [<c0103c49>]
Nov  4 13:42:33 router dump_trace+0x64/0x1c5
Nov  4 13:42:33 router  [<c0103dc4>]
Nov  4 13:42:33 router show_trace_log_lvl+0x1a/0x2f
Nov  4 13:42:33 router  [<c0104383>]
Nov  4 13:42:33 router show_trace+0x12/0x14
Nov  4 13:42:33 router  [<c01043c9>]
Nov  4 13:42:33 router dump_stack+0x19/0x1b
Nov  4 13:42:33 router  [<c01fa933>]
Nov  4 13:42:33 router _raw_spin_lock+0xe1/0x105
Nov  4 13:42:33 router  [<c03080ca>]
Nov  4 13:42:33 router _spin_lock_irqsave+0x3b/0x44
Nov  4 13:42:33 router  [<c026b99d>]
Nov  4 13:42:33 router ide_intr+0x17/0x1ba
Nov  4 13:42:33 router  [<c0143685>]
Nov  4 13:42:33 router handle_IRQ_event+0x1a/0x46
Nov  4 13:42:33 router  [<c0144920>]
Nov  4 13:42:33 router handle_level_irq+0x8a/0xda
Nov  4 13:42:33 router  [<c0104dc0>]
Nov  4 13:42:33 router do_IRQ+0x9b/0xb7
Nov  4 13:42:33 router  [<c0103779>]
Nov  4 13:42:33 router common_interrupt+0x25/0x2c
Nov  4 13:42:33 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov  4 13:42:33 router
Nov  4 13:42:33 router Leftover inexact backtrace:
Nov  4 13:42:33 router
Nov  4 13:42:33 router  [<c010430d>]
Nov  4 13:42:33 router die+0x269/0x29e
Nov  4 13:42:33 router  [<c0308343>]
Nov  4 13:42:33 router do_trap+0x81/0x9b
Nov  4 13:42:33 router  [<c01047f3>]
Nov  4 13:42:33 router do_invalid_op+0x97/0xa1
Nov  4 13:42:33 router  [<c0308159>]
Nov  4 13:42:33 router error_code+0x39/0x40
Nov  4 13:42:33 router  [<c015ea11>]

Message from syslogd@router at Sat Nov  4 13:43:18 2006 ...
router BUG: spinlock lockup on CPU#0, swapper/0, c03a4400
===snap===

reboot

===snip===
Nov  4 14:14:26 router slab: Internal list corruption detected in cache
'dm_io'(113), slabp d1869000(72). Hexdump:
Nov  4 14:14:26 router
Nov  4 14:14:26 router 000:
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router  01
Nov  4 14:14:26 router  10
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router  02
Nov  4 14:14:26 router  20
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router  e0
Nov  4 14:14:26 router  01
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router  e0
Nov  4 14:14:26 router  91
Nov  4 14:14:26 router  86
Nov  4 14:14:26 router  d1
Nov  4 14:14:26 router
Nov  4 14:14:26 router 010:
Nov  4 14:14:26 router  48
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router last message repeated 2 times
Nov  4 14:14:26 router  0a
Nov  4 14:14:26 router  00
Nov  4 14:14:26 router last message repeated 6 times
Nov  4 14:14:26 router  fd
Nov  4 14:14:26 router  ff
Nov  4 14:14:26 router  ff
Nov  4 14:14:26 router  x_tables
Nov  4 14:14:26 router  hisax
Nov  4 14:14:26 router  isdn
Nov  4 14:14:26 router  hfcpci
Nov  4 14:14:26 router  mISDN_dsp
Nov  4 14:14:26 router  l3udss1
Nov  4 14:14:26 router  mISDN_l2
Nov  4 14:14:26 router  mISDN_l1
Nov  4 14:14:26 router  mISDN_core
Nov  4 14:14:26 router  eeprom
Nov  4 14:14:26 router  netconsole
Nov  4 14:14:26 router  lp
Nov  4 14:14:26 router  capability
Nov  4 14:14:26 router  commoncap
Nov  4 14:14:26 router  softdog
Nov  4 14:14:26 router  nls_iso8859_15
Nov  4 14:14:26 router  isofs
Nov  4 14:14:26 router  zlib_inflate
Nov  4 14:14:26 router  loop
Nov  4 14:14:26 router  psmouse
Nov  4 14:14:26 router  pcips2
Nov  4 14:14:26 router  via_agp
Nov  4 14:14:26 router  agpgart
Nov  4 14:14:26 router  i2c_viapro
Nov  4 14:14:26 router  via686a
Nov  4 14:14:26 router  i2c_isa
Nov  4 14:14:26 router  i2c_core
Nov  4 14:14:26 router  cyblafb
Nov  4 14:14:26 router  parport_pc
Nov  4 14:14:26 router  parport
Nov  4 14:14:26 router  8250_pnp
Nov  4 14:14:26 router  usblp
Nov  4 14:14:44 notebook kernel:  CIFS VFS: server not responding
Nov  4 14:14:44 notebook kernel:  CIFS VFS: No response for cmd 50 mid 29076
Nov  4 14:15:01 notebook /usr/sbin/cron[9642]: (root) CMD ( [ -f /openpkg/etc/rc
] && /openpkg/etc/rc all quarterly)

Message from syslogd@router at Sat Nov  4 14:15:03 2006 ...
router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
Nov  4 14:15:03 router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
Nov  4 14:15:03 router  [<c0103c49>]
Nov  4 14:15:03 router dump_trace+0x64/0x1c5
Nov  4 14:15:03 router  [<c0103dc4>]
Nov  4 14:15:03 router show_trace_log_lvl+0x1a/0x2f
Nov  4 14:15:03 router  [<c0104383>]
Nov  4 14:15:03 router show_trace+0x12/0x14
Nov  4 14:15:03 router  [<c01043c9>]
Nov  4 14:15:03 router dump_stack+0x19/0x1b
Nov  4 14:15:03 router  [<c01fa933>]
Nov  4 14:15:03 router _raw_spin_lock+0xe1/0x105
Nov  4 14:15:03 router  [<c03080ca>]
Nov  4 14:15:03 router _spin_lock_irqsave+0x3b/0x44
Nov  4 14:15:03 router  [<c026b99d>]
Nov  4 14:15:03 router ide_intr+0x17/0x1ba
Nov  4 14:15:03 router  [<c0143685>]
Nov  4 14:15:03 router handle_IRQ_event+0x1a/0x46
Nov  4 14:15:03 router  [<c0144920>]
Nov  4 14:15:03 router handle_level_irq+0x8a/0xda
Nov  4 14:15:03 router  [<c0104dc0>]
Nov  4 14:15:03 router do_IRQ+0x9b/0xb7
Nov  4 14:15:03 router  [<c0103779>]
Nov  4 14:15:03 router common_interrupt+0x25/0x2c
Nov  4 14:15:03 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov  4 14:15:03 router
Nov  4 14:15:03 router Leftover inexact backtrace:
Nov  4 14:15:03 router
Nov  4 14:15:03 router  [<c010430d>]
Nov  4 14:15:03 router die+0x269/0x29e
Nov  4 14:15:03 router  [<c0308343>]
Nov  4 14:15:03 router do_trap+0x81/0x9b
Nov  4 14:15:03 router  [<c01047f3>]
Nov  4 14:15:03 router do_invalid_op+0x97/0xa1
Nov  4 14:15:03 router  [<c0308159>]
Nov  4 14:15:03 router error_code+0x39/0x40
Nov  4 14:15:03 router  [<c015ea11>]
Nov  4 14:15:03 router free_block+0x6f/0x151
Nov  4 14:15:03 router  [<c015e6a9>]
Nov  4 14:15:03 router cache_flusharray+0xad/0x10a
Nov  4 14:15:03 router  [<c015e845>]
Nov  4 14:15:03 router kmem_cache_free+0x82/0xad
Nov  4 14:15:03 router  [<c01497bd>]
Nov  4 14:15:03 router mempool_free_slab+0xe/0x10
Nov  4 14:15:03 router  [<c0149823>]
Nov  4 14:15:03 router mempool_free+0x64/0x6a
Nov  4 14:15:03 router  [<c0299105>]
Nov  4 14:15:03 router dec_pending+0x10d/0x115
Nov  4 14:15:03 router  [<c0299285>]
Nov  4 14:15:03 router clone_endio+0x70/0xa7
Nov  4 14:15:03 router  [<c0181651>]
Nov  4 14:15:03 router bio_endio+0x5e/0x66
Nov  4 14:15:03 router  [<c01e0893>]
Nov  4 14:15:03 router __end_that_request_first+0x165/0x40b
Nov  4 14:15:03 router  [<c01e0b4e>]
Nov  4 14:15:03 router end_that_request_first+0xb/0xd
Nov  4 14:15:03 router  [<c026aed6>]
Nov  4 14:15:03 router ide_end_request+0x87/0xd0
Nov  4 14:15:03 router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
Nov  4 14:15:03 router  [<c0271f65>]
Nov  4 14:15:03 router ide_dma_intr+0x58/0x98
Nov  4 14:15:03 router  [<c026badd>]
Nov  4 14:15:03 router  [<c0103c49>]
Nov  4 14:15:03 router ide_intr+0x157/0x1ba
Nov  4 14:15:03 router  [<c0143685>]
Nov  4 14:15:03 router handle_IRQ_event+0x1a/0x46
Nov  4 14:15:03 router dump_trace+0x64/0x1c5
Nov  4 14:15:03 router  [<c0144920>]
Nov  4 14:15:03 router handle_level_irq+0x8a/0xda
Nov  4 14:15:03 router  [<c0104dc0>]
Nov  4 14:15:03 router do_IRQ+0x9b/0xb7
Nov  4 14:15:03 router  [<c0103dc4>]
Nov  4 14:15:03 router  [<c0103779>]
Nov  4 14:15:03 router show_trace_log_lvl+0x1a/0x2f
Nov  4 14:15:03 router  [<c0104383>]
Nov  4 14:15:03 router show_trace+0x12/0x14
Nov  4 14:15:03 router  [<c01043c9>]
Nov  4 14:15:03 router dump_stack+0x19/0x1b
Nov  4 14:15:03 router  [<c01fa933>]
Nov  4 14:15:03 router common_interrupt+0x25/0x2c
Nov  4 14:15:03 router  [<c015e3e8>]
Nov  4 14:15:03 router cache_alloc_debugcheck_after+0x25/0x13e
Nov  4 14:15:03 router  [<c015f414>]
Nov  4 14:15:03 router _raw_spin_lock+0xe1/0x105
Nov  4 14:15:03 router __kmalloc_track_caller+0xc9/0xd5
Nov  4 14:15:03 router  [<c02adf08>]
Nov  4 14:15:03 router __alloc_skb+0x4f/0xfa
Nov  4 14:15:03 router  [<c03080ca>]
Nov  4 14:15:03 router  [<c02d57c5>]
Nov  4 14:15:03 router tcp_sendmsg+0x14f/0xa0a
Nov  4 14:15:03 router  [<c02ed6e0>]
Nov  4 14:15:03 router inet_sendmsg+0x3e/0x49
Nov  4 14:15:03 router _spin_lock_irqsave+0x3b/0x44
Nov  4 14:15:03 router  [<c02a8424>]
Nov  4 14:15:03 router sock_aio_write+0xfb/0x107
Nov  4 14:15:03 router  [<c0162623>]
Nov  4 14:15:03 router  [<c026b99d>]
Nov  4 14:15:03 router do_sync_write+0xc5/0x102
Nov  4 14:15:03 router  [<c0162e77>]
Nov  4 14:15:03 router vfs_write+0xc3/0x168
Nov  4 14:15:03 router  [<c01633be>]
Nov  4 14:15:03 router ide_intr+0x17/0x1ba
Nov  4 14:15:03 router sys_write+0x3d/0x61
Nov  4 14:15:03 router  [<c0102d8b>]
Nov  4 14:15:03 router syscall_call+0x7/0xb
Nov  4 14:15:03 router  [<c0143685>]
Nov  4 14:15:03 router  =======================
Nov  4 14:15:03 router handle_IRQ_event+0x1a/0x46
Nov  4 14:15:03 router  [<c0144920>]
Nov  4 14:15:03 router handle_level_irq+0x8a/0xda
Nov  4 14:15:03 router  [<c0104dc0>]
Nov  4 14:15:03 router do_IRQ+0x9b/0xb7
Nov  4 14:15:03 router  [<c0103779>]
Nov  4 14:15:03 router common_interrupt+0x25/0x2c
Nov  4 14:15:03 router DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
Nov  4 14:15:03 router
Nov  4 14:15:03 router  [<c010430d>]
Nov  4 14:15:03 router do_trap+0x81/0x9b
Nov  4 14:15:03 router error_code+0x39/0x40
Nov  4 14:15:03 router kmem_cache_free+0x82/0xad
Nov  4 14:15:03 router  [<c0149823>]
Nov  4 14:15:03 router  [<c0299105>]
Nov  4 14:15:03 router dec_pending+0x10d/0x115
Nov  4 14:15:03 router  [<c0299285>]
Nov  4 14:15:03 router clone_endio+0x70/0xa7
Nov  4 14:15:03 router  [<c0181651>]
Nov  4 14:15:03 router bio_endio+0x5e/0x66
Nov  4 14:15:03 router  [<c01e0893>]
Nov  4 14:15:03 router __end_that_request_first+0x165/0x40b
Nov  4 14:15:03 router  [<c01e0b4e>]
Nov  4 14:15:03 router end_that_request_first+0xb/0xd
Nov  4 14:15:03 router  [<c026aed6>]
Nov  4 14:15:03 router ide_end_request+0x87/0xd0
Nov  4 14:15:03 router  [<c0271f65>]
Nov  4 14:15:03 router ide_dma_intr+0x58/0x98
Nov  4 14:15:03 router  [<c026badd>]
Nov  4 14:15:03 router ide_intr+0x157/0x1ba
Nov  4 14:15:03 router  [<c0143685>]
Nov  4 14:15:03 router handle_IRQ_event+0x1a/0x46
Nov  4 14:15:03 router  [<c0144920>]
Nov  4 14:15:03 router handle_level_irq+0x8a/0xda
Nov  4 14:15:03 router  [<c0104dc0>]
Nov  4 14:15:03 router do_IRQ+0x9b/0xb7
Nov  4 14:15:03 router  [<c0103779>]
Nov  4 14:15:03 router common_interrupt+0x25/0x2c
Nov  4 14:15:03 router  [<c015e3e8>]
Nov  4 14:15:03 router cache_alloc_debugcheck_after+0x25/0x13e
Nov  4 14:15:03 router  [<c015f414>]
Nov  4 14:15:03 router __kmalloc_track_caller+0xc9/0xd5
Nov  4 14:15:03 router  [<c02adf08>]
Nov  4 14:15:03 router __alloc_skb+0x4f/0xfa
Nov  4 14:15:03 router  [<c02d57c5>]
Nov  4 14:15:03 router tcp_sendmsg+0x14f/0xa0a
Nov  4 14:15:03 router  [<c02ed6e0>]
Nov  4 14:15:03 router inet_sendmsg+0x3e/0x49
Nov  4 14:15:03 router  [<c02a8424>]
Nov  4 14:15:03 router sock_aio_write+0xfb/0x107
Nov  4 14:15:03 router  [<c0162623>]
Nov  4 14:15:03 router do_sync_write+0xc5/0x102
Nov  4 14:15:03 router  [<c0162e77>]
Nov  4 14:15:03 router vfs_write+0xc3/0x168
Nov  4 14:15:03 router  [<c01633be>]
Nov  4 14:15:03 router sys_write+0x3d/0x61
Nov  4 14:15:03 router  [<c0102d8b>]
Nov  4 14:15:03 router syscall_call+0x7/0xb
Nov  4 14:15:03 router  =======================

Message from syslogd@router at Sat Nov  4 14:15:49 2006 ...
router BUG: spinlock lockup on CPU#0, smbd/7861, c03a4400
===snap===


To me, it seems that there is a locking error that allows two concurrent
accesses to the memory management data structures.

I hope that someone can help. I'm happy to do various experiments with that
machine or apply various speculative kernel patches to that machine's kernel.



Steps to reproduce:
(1) Use 4 physical disks connected to 2 IDE ports.
(2) Create LVM2 physical volumes on these disks.
(3) Create an LVM2 volume group using these physical volumes.
(4) Create 3 or 4 logical volumes on 3 or 4 of these physical volumes.
(5) Create a large MD-RAID5 device out of these logical volumes.
(6) Let the RAID5 device resync.
(7) Wait for crash.
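
Steps (2) to (5) can be sketched as a command list. This is a minimal Python sketch that only builds the argument lists and executes nothing. The volume group name, logical volume names, and size are hypothetical; the report does not give them. /dev/md2 is a guess based on the md2_raid5 thread name in the logs:

```python
def reproduction_commands(disks, vg="vg_test", lv_size="10G", md="/dev/md2"):
    """Return argv lists for steps (2)-(5); nothing is executed here.

    All names and the size are placeholders, not values from the report.
    """
    cmds = [["pvcreate", disk] for disk in disks]        # step (2)
    cmds.append(["vgcreate", vg, *disks])                # step (3)
    lvs = []
    for index, disk in enumerate(disks):                 # step (4): one LV per PV
        name = f"lv{index}"
        cmds.append(["lvcreate", "-L", lv_size, "-n", name, vg, disk])
        lvs.append(f"/dev/{vg}/{name}")
    cmds.append(["mdadm", "--create", md, "--level=5",
                 f"--raid-devices={len(lvs)}", *lvs])    # step (5)
    return cmds
```

Printing each list joined by spaces gives commands to review by hand before running them as root on a scratch machine; pvcreate and mdadm --create destroy existing data on the devices they are given.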
Comment 1 Olaf Kirch 2007-03-08 23:29:07 UTC
This looks like random memory corruption in various slabs (dm_tio, dm_io,
biovec-1). The slabs themselves have been trampled on.

Does this problem persist with newer kernels?
Comment 2 Olaf Kirch 2007-03-21 08:02:51 UTC
BTW, I'm seeing CIFS messages in the mix. Can you reproduce this with
no CIFS mounts?
Comment 3 Xu 2007-03-21 09:17:17 UTC
Sorry, I'm currently overseas (and will remain overseas for some months), so I
cannot risk crashing the machine in question: I cannot restart it myself while
I'm away.

Comment 4 Xu 2007-06-10 23:33:54 UTC
Hello, I had hoped this bug had vanished (and I actually thought it had), but it
hit me hard on a newer kernel (2.6.21.3) :-( So yes, this problem persists with
newer kernels.

Now, sometimes, the machine reboots suddenly. Sometimes, the machine prints this
stack trace and reboots some minutes later:

Jun 11 00:10:25 router kernel: [ 6656.688000] slab: Internal list corruption
detected in cache 'biovec-1'(145), slabp e4141000(86). Hexdump:
Jun 11 00:10:25 router kernel: [ 6656.688000]
Jun 11 00:10:25 router kernel: [ 6656.688000] 000: 00 10 3e c2 dc 77 7e e7 60 02
00 00 60 12 14 e4
Jun 11 00:10:25 router kernel: [ 6656.688000] 010: 56 00 00 00 7b 00 00 00 00 00
c8 9a 22 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 020: fe ff ff ff fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 030: fe ff ff ff fe ff ff ff 2d 00
00 00 2c 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 040: fe ff ff ff fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 050: fe ff ff ff fe ff ff ff fe ff
ff ff 25 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 060: fe ff ff ff 7d 00 00 00 ff ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 070: 48 00 00 00 fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 080: fe ff ff ff fe ff ff ff 20 00
00 00 1b 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 090: 44 00 00 00 fe ff ff ff fe ff
ff ff 5b 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0a0: 3e 00 00 00 4f 00 00 00 fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 0b0: 59 00 00 00 fe ff ff ff 51 00
00 00 07 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0c0: fe ff ff ff fe ff ff ff fe ff
ff ff 58 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0d0: 27 00 00 00 fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 0e0: 87 00 00 00 fe ff ff ff 21 00
00 00 50 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 0f0: fe ff ff ff 15 00 00 00 fe ff
ff ff 78 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 100: fe ff ff ff fe ff ff ff 1d 00
00 00 6d 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 110: fe ff ff ff 31 00 00 00 fe ff
ff ff 1c 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 120: 4a 00 00 00 fe ff ff ff fe ff
ff ff 81 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 130: fe ff ff ff 5d 00 00 00 3b 00
00 00 67 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 140: 5a 00 00 00 47 00 00 00 fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 150: fe ff ff ff fe ff ff ff 13 00
00 00 86 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 160: 55 00 00 00 fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 170: 00 00 00 00 12 00 00 00 fe ff
ff ff 62 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 180: 46 00 00 00 00 00 00 00 7c 00
00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 190: 63 00 00 00 fe ff ff ff 49 00
00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1a0: fe ff ff ff 5f 00 00 00 41 00
00 00 73 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 1b0: fe ff ff ff fe ff ff ff 64 00
00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1c0: fe ff ff ff fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1d0: 80 00 00 00 fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1e0: fe ff ff ff fe ff ff ff 28 00
00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 1f0: fe ff ff ff fe ff ff ff fe ff
ff ff 36 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 200: fe ff ff ff fe ff ff ff 38 00
00 00 34 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 210: 33 00 00 00 fe ff ff ff fe ff
ff ff 10 00 00 00
Jun 11 00:10:25 router kernel: [ 6656.688000] 220: 08 00 00 00 fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 230: fe ff ff ff 56 00 00 00 3c 00
00 00 fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 240: fe ff ff ff fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] 250: fe ff ff ff fe ff ff ff fe ff
ff ff fe ff ff ff
Jun 11 00:10:25 router kernel: [ 6656.688000] ------------[ cut here ]------------
Jun 11 00:10:25 router kernel: [ 6656.688000] kernel BUG at mm/slab.c:2936!
Jun 11 00:10:25 router kernel: [ 6656.688000] invalid opcode: 0000 [#1]
Jun 11 00:10:25 router kernel: [ 6656.688000] PREEMPT
Jun 11 00:10:25 router kernel: [ 6656.688000] Modules linked in: nls_utf8 cifs
nls_cp850 nls_iso8859_1 smbfs act_police sch_ingress cls_u32 sch_sfq sch_htb
rfcomm hidp l2cap bluetooth cls_fw sch_prio sch_tbf xt_mark xt_multiport xt_MARK
ipt_MASQUERADE xt_TCPMSS ipt_TOS
xt_length iptable_mangle nf_nat_ftp nf_conntrack_ftp ipt_REJECT iptable_filter
xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables
x_tables hisax isdn mISDN_dsp hfcpci mISDN_capi l3udss1 mISDN_l2 mISDN_l1
mISDN_core capi capifs kernelcapi eep
rom lp capability commoncap softdog nls_iso8859_15 isofs zlib_inflate loop
psmouse pcips2 8250_pnp 8250 usblp serial_core i2c_viapro via686a i2c_isa pcspkr
i2c_core cyblafb via_agp parport_pc agpgart parport evdev dm_mirror pppoe pppox
ppp_generic slhc ohci_hcd uhci_hcd
 usbmouse usbkbd usbhid usbcore ipv6 af_packet netconsole 8139too mii bitrev
crc32 unix
Jun 11 00:10:25 router kernel: [ 6656.688000] CPU:    0
Jun 11 00:10:25 router kernel: [ 6656.688000] EIP:    0060:[<c0171ff0>]    Not
tainted VLI
Jun 11 00:10:25 router kernel: [ 6656.688000] EFLAGS: 00010086  
(2.6.21.3lowLatency #2)
Jun 11 00:10:25 router kernel: [ 6656.688000] EIP is at check_slabp+0xf0/0x110
Jun 11 00:10:25 router kernel: [ 6656.688000] eax: 00000001   ebx: e414125f  
ecx: c7444000   edx: 00000001
Jun 11 00:10:25 router kernel: [ 6656.688000] esi: e4141000   edi: 00000260  
ebp: c74459ac   esp: c7445988
Jun 11 00:10:25 router kernel: [ 6656.688000] ds: 007b   es: 007b   fs: 00d8 
gs: 0000  ss: 0068
Jun 11 00:10:25 router kernel: [ 6656.688000] Process md2_resync (pid: 8045,
ti=c7444000 task=d835e490 task.ti=c7444000)
Jun 11 00:10:25 router kernel: [ 6656.688000] Stack: c04405c1 000000ff 00000091
e4141000 00000056 e77c32a0 00000000 00000246
Jun 11 00:10:25 router kernel: [ 6656.688000]        e4141000 c7445a18 c0173381
c0172602 00000000 00000044 c74459f0 00000000
Jun 11 00:10:25 router kernel: [ 6656.688000]        00011200 00011200 e77c32a0
e77d6dd0 00000010 e77e77dc e77db918 c74459f0
Jun 11 00:10:25 router kernel: [ 6656.688000] Call Trace:
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c010528a>]
show_trace_log_lvl+0x1a/0x30
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0105351>]
show_stack_log_lvl+0xb1/0xe0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c010557f>]
show_registers+0x1ff/0x380
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0105823>] die+0x123/0x260
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c03791f2>] do_trap+0x82/0xb0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0105f07>] do_invalid_op+0x97/0xb0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0378fbc>] error_code+0x74/0x7c
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0173381>]
cache_alloc_refill+0xd1/0x6b0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0173cf3>]
kmem_cache_alloc+0xb3/0xc0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c015780e>]
mempool_alloc_slab+0xe/0x10
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0157941>] mempool_alloc+0x31/0x140
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c019d823>]
bio_alloc_bioset+0x73/0x140
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02f2407>] clone_bio+0x37/0x80
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02f2b8e>] __split_bio+0x17e/0x470
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02f39fe>] dm_request+0xce/0x140
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c020f90b>]
generic_make_request+0x1bb/0x360
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02da183>]
handle_stripe5+0xb53/0x17b0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02dc8d2>]
handle_stripe+0x382/0x1a10
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02dec1d>] sync_request+0x21d/0xcc0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02ec8c7>] md_do_sync+0x7e7/0xd20
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c02eb901>] md_thread+0x31/0x110
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0134633>] kthread+0xa3/0xd0
Jun 11 00:10:25 router kernel: [ 6656.688000]  [<c0104e77>]
kernel_thread_helper+0x7/0x10
Jun 11 00:10:25 router kernel: [ 6656.688000]  =======================
Jun 11 00:10:26 router kernel: [ 6656.688000] Code: ff 8b 55 f0 8b 42 20 8d 04
85 1c 00 00 00 39 f8 76 0d 83 c3 01 f7 c7 0f 00 00 00 75 ce eb b9 c7 04 24 c1 05
44 c0 e8 90 ea fa ff <0f> 0b eb fe 83 c4 18 5b 5e 5f 5d c3 8b 56 10 e9 67 ff ff
ff 8d
Jun 11 00:10:26 router kernel: [ 6656.688000] EIP: [<c0171ff0>]
check_slabp+0xf0/0x110 SS:ESP 0068:c7445988
Jun 11 00:10:26 router kernel: [ 6656.688000] note: md2_resync[8045] exited with
preempt_count 1
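
To make the hexdump above easier to inspect, here is a minimal Python sketch that extracts the dumped bytes from the syslog text and regroups them as 32-bit little-endian words (the machine is i386). The function names are hypothetical, and the sketch makes no attempt to interpret the words; what values such as 0xfffffffe mean depends on the slab allocator internals of the kernel in question.

```python
import re
import struct
from collections import Counter

# Matches "... ] 020: fe ff ..." including bytes wrapped onto the next line.
LINE_RE = re.compile(r"\]\s+([0-9a-f]{3}):((?:\s+[0-9a-f]{2}\b)+)")


def parse_hexdump(text):
    """Return the dumped bytes, ordered by the offsets printed in the log."""
    chunks = {}
    for match in LINE_RE.finditer(text):
        offset = int(match.group(1), 16)
        chunks[offset] = bytes.fromhex("".join(match.group(2).split()))
    return b"".join(chunks[offset] for offset in sorted(chunks))


def words_le32(data):
    """Split the bytes into 32-bit little-endian words, dropping any tail."""
    usable = len(data) - len(data) % 4
    return [word for (word,) in struct.iter_unpack("<I", data[:usable])]


def most_common_words(data, count=3):
    """Return the most frequent 32-bit words as (hex string, occurrences)."""
    tally = Counter(words_le32(data))
    return [(f"0x{word:08x}", seen) for word, seen in tally.most_common(count)]
```

Feeding the pasted log text to parse_hexdump() and then to most_common_words() gives a quick view of which values dominate the dump.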


Comment 5 Xu 2007-06-11 17:06:43 UTC
One additional observation:

In one instance, the machine rebooted about 1 to 3 seconds after "smartd" had checked the SMART status of each of the IDE hard disks. Monitoring the file "/sys/block/md2/md/sync_completed" also showed that its value (which normally changes constantly during a RAID rebuild) did not change for about 1.5 seconds before the reboot, and had been changing more slowly than usual before that. This leads to a hypothesis that "smartd" may trigger these reboots, maybe by inducing longer delays in disk access, maybe leading to sudden error states or maybe leading to timeouts kicking in (which do not kick in normally). Maybe the sudden-reboot problem is unrelated to the slab corruption problem, maybe not.
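
The monitoring described above could be done along the following lines. This is only a sketch, not the script that was actually used: the function names, the polling interval, and the stall threshold are assumptions, and the parser assumes the file contains either "none" or a sector count as its first field.

```python
import time


def parse_sync_completed(text):
    """Return the completed-sector count, or None if no number is present."""
    fields = text.split()
    if fields and fields[0].isdigit():
        return int(fields[0])
    return None


def sample_file(path="/sys/block/md2/md/sync_completed", interval=0.25, count=40):
    """Poll the sysfs file and return a list of (timestamp, value) samples."""
    samples = []
    for _ in range(count):
        with open(path) as handle:
            samples.append((time.monotonic(), parse_sync_completed(handle.read())))
        time.sleep(interval)
    return samples


def find_stalls(samples, min_stall_seconds=1.0):
    """Return (start, end) intervals during which the value did not change."""
    stalls = []
    stall_start = None
    for (t_prev, v_prev), (_, v_cur) in zip(samples, samples[1:]):
        if v_cur == v_prev:
            if stall_start is None:
                stall_start = t_prev  # first sample of an unchanged run
        else:
            if stall_start is not None and t_prev - stall_start >= min_stall_seconds:
                stalls.append((stall_start, t_prev))
            stall_start = None
    # A stall may still be in progress at the last sample.
    if stall_start is not None and samples[-1][0] - stall_start >= min_stall_seconds:
        stalls.append((stall_start, samples[-1][0]))
    return stalls
```

Logging the output of find_stalls() next to the smartd log timestamps would show whether stalls line up with SMART checks.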
Comment 6 Alasdair G Kergon 2007-11-13 04:41:07 UTC
I don't recognise the precise problem, but there have been fixes in related parts of the code, so do please keep retrying with newer kernels to see if it got fixed.
Comment 7 Neil Brown 2007-11-15 17:49:02 UTC
My guess is that this is a problem with the driver for the VIA ide controller.
I don't suppose you have a spare IDE card from a different manufacturer
that you could try putting in??

Should we assign it to the IDE people to see if they can help? (I think you
would need to do that, Alasdair.)
Comment 8 Bartlomiej Zolnierkiewicz 2007-11-16 03:21:47 UTC
> did change slower than usual before. This leads to a hypothesis that "smartd"
> may trigger these reboots, maybe by inducing longer delays in disk access,
> maybe leading to sudden error states or maybe leading to timeouts kicking in

Yes, a SMART check may induce delays in disk access, but this shouldn't cause other problems (at least for IDE).

> My guess is that this is a problem with the driver for the VIA ide
> controller.

This is possible, but there are currently no open/known issues with the VIA host driver, so more info is needed (dmesg output).

> I don't suppose you have a spare IDE card from a different manufacturer
> that you could try putting in??

That would be useful. Also, does the issue still happen with 2.6.23?
Comment 9 Bartlomiej Zolnierkiewicz 2007-11-16 03:22:46 UTC
PS: disabling "smartd" completely and seeing if it helps is also worth a try.
Comment 10 Xu 2008-01-09 05:09:04 UTC
I have been doing extensive resyncs under Linux 2.6.22.7 with the slab allocator as memory allocator on the same machine with the same setup, and I cannot reproduce the bug anymore, regardless of whether smartd is switched on or off. Thus, I assume that this bug was fixed (for some reason that is not exactly known) between Linux 2.6.21.3 and Linux 2.6.22.7. :-) Thank you very much for your support. :-)

Thus, I'm closing this bug for the time being.
Comment 11 Bartlomiej Zolnierkiewicz 2008-01-09 11:19:23 UTC
Great, thanks for reporting it.
