Bug 53281

Summary: megaraid_mbox kernel panic during boot
Product: SCSI Drivers
Reporter: miguel (romomoruno)
Component: Other
Assignee: scsi_drivers-other
Status: NEW
Severity: high
CC: justgivemeafkenaccountplx
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.26, 2.6.32, 3.2
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: /proc/version
/proc/cpuinfo
/proc/modules
/proc/ioports
/proc/iomem
lspci -vvv
/proc/scsi/scsi

Description miguel 2013-01-31 13:00:32 UTC
Distribution: Debian Squeeze
Kernel: 2.6.32

Tested kernel not failing: 2.6.18 (etch)
Tested kernel failing: 2.6.26 (lenny), 2.6.32 (squeeze), 3.2 (squeeze-backport)

It appears to be a problem with the megaraid_mbox kernel module during boot on Linux systems with an LSI MegaRAID SCSI adapter that has three SCSI hard disks in RAID 5 and a SCSI DAT tape drive (non-RAID device) connected to the same adapter.

I have 17 affected servers with different hardware configurations. The common factor is a DAT tape device connected to an LSI MegaRAID adapter together with a three-disk RAID 5 array.

Disconnecting the SCSI DAT tape device from the LSI adapter, or connecting it to a second adapter, solves the problem, but unfortunately that is not an option for me.

Hardware configurations:

LSI MegaRAID adapter  3 SCSI hard disks in RAID 5              SCSI DAT tape

530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108           SONY SDX-450V ver0100
530 SCSI 320-0X       69618MB FUJITSU MAP3735NC 0108           SONY SDX-450V ver0100
Intel SRCU41L         69618MB MAXTOR ATLAS10K4_73WLS DFL0      SEAGATE DAT DAT72-000 A1A0
520 SCSI 320-1        69808MB MAXTOR ATLAS10K5_73WLS JS03      HP C7438A V310
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108           SONY SDX-450V ver0100
530 SCSI 320-0X       69618MB FUJITSU MAW3073NC 0104           SONY SDX-450V ver0100
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108           SEAGATE DAT DAT72-000 A060
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108           SONY SDX-450V ver0100
Intel SRCU42E         140190MB FUJITSU MAU3147NP 0102          SEAGATE DAT DAT72-000 A1A0
520 SCSI 320-1        70006MB FUJITSU MAP3735NP HPF6           HP C7438A ZP5B
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108           SONY SDX-450V ver0100
520 SCSI 320-1        69808MB MAXTOR ATLAS10K5_73WLS JS03      HP C7438A ZP5B
Intel SRCU42E         140268MB FUJITSU MAU3147NP 0102          SEAGATE DAT DAT72-000 A1A0
520 SCSI 320-1        70006MB HP 73.4G ST373207LW HPC1         HP C7438A V310
520 SCSI 320-1        139712MB MAXTOR ATLAS10K5_147WLS JS03    HP C7438A ZP5B
520 SCSI 320-1        69808MB MAXTOR ATLAS10K5_73WLS JS03      HP C7438A ZP5B
518 DELL PERC 4/DC    70140MB MAXTOR ATLAS10K5_73WLS JNZH      SEAGATE DAT DAT72-000 A1A0
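
To make the "common factor" claim easier to check, the table above can be tallied by controller and by tape model. The following is only a minimal sketch: the `SERVERS` list is a hand-copied reduction of the table (firmware revisions and disk models dropped), and the variable names are illustrative rather than part of any tool. It shows the spread of hardware across the 17 machines; it says nothing about the root cause.

```python
from collections import Counter

# (controller, tape drive) pairs copied by hand from the hardware table.
# Firmware revisions and disk models are deliberately omitted.
SERVERS = [
    ("530 SCSI 320-0X", "SONY SDX-450V"),
    ("530 SCSI 320-0X", "SONY SDX-450V"),
    ("Intel SRCU41L", "SEAGATE DAT DAT72-000"),
    ("520 SCSI 320-1", "HP C7438A"),
    ("530 SCSI 320-0X", "SONY SDX-450V"),
    ("530 SCSI 320-0X", "SONY SDX-450V"),
    ("530 SCSI 320-0X", "SEAGATE DAT DAT72-000"),
    ("530 SCSI 320-0X", "SONY SDX-450V"),
    ("Intel SRCU42E", "SEAGATE DAT DAT72-000"),
    ("520 SCSI 320-1", "HP C7438A"),
    ("530 SCSI 320-0X", "SONY SDX-450V"),
    ("520 SCSI 320-1", "HP C7438A"),
    ("Intel SRCU42E", "SEAGATE DAT DAT72-000"),
    ("520 SCSI 320-1", "HP C7438A"),
    ("520 SCSI 320-1", "HP C7438A"),
    ("520 SCSI 320-1", "HP C7438A"),
    ("518 DELL PERC 4/DC", "SEAGATE DAT DAT72-000"),
]

# Count how many servers use each controller and each tape model.
controllers = Counter(controller for controller, _ in SERVERS)
tapes = Counter(tape for _, tape in SERVERS)

print(f"{len(SERVERS)} servers in total")
for name, count in controllers.most_common():
    print(f"controller {name}: {count}")
for name, count in tapes.most_common():
    print(f"tape {name}: {count}")
```

The tally covers five controller variants and three tape models, which is consistent with the problem not being tied to a single controller or tape drive model.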


modinfo megaraid_mbox (kernel 2.6.32)
filename: /lib/modules/2.6.32-5-686/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
version: 2.20.5.1
license: GPL
description: LSI Logic MegaRAID Mailbox Driver
author: megaraidlinux@…
srcversion: 4846B66054D71B69BF4A7D3
alias: pci:v00001000d00000409sv*sd*bc*sc*i*
alias: pci:v00001000d00001960sv*sd*bc*sc*i*
alias: pci:v0000101Ed00001960sv*sd*bc*sc*i*
alias: pci:v00001000d00000408sv*sd*bc*sc*i*
alias: pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
alias: pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
alias: pci:v00001000d00000407sv*sd*bc*sc*i*
alias: pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
alias: pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
alias: pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
depends: scsi_mod,megaraid_mm
vermagic: 2.6.32-5-686 SMP mod_unload modversions 686 
parm: unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
parm: busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm: max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm: cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm: fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm: debug_level:Debug level for driver (default=0) (int)


modinfo megaraid_mbox (kernel 2.6.18)
filename: /lib/modules/2.6.18-6-686/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
author: sju@…
description: LSI Logic MegaRAID Mailbox Driver
license: GPL
version: 2.20.4.9
vermagic: 2.6.18-6-686 SMP mod_unload 686 REGPARM gcc-4.1
depends: scsi_mod,megaraid_mm
alias: pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
alias: pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
alias: pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
alias: pci:v00001000d00000407sv*sd*bc*sc*i*
alias: pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
alias: pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
alias: pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
alias: pci:v00001000d00000408sv*sd*bc*sc*i*
alias: pci:v0000101Ed00001960sv*sd*bc*sc*i*
alias: pci:v00001000d00001960sv*sd*bc*sc*i*
alias: pci:v00001000d00000409sv*sd*bc*sc*i*
srcversion: 0B71F30F1E95E778A74A4D1
parm: debug_level:Debug level for driver (default=0) (int)
parm: fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm: cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm: max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm: busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm: unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
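
The two modinfo listings above print their fields in different orders, which makes them hard to compare by eye. As a rough sketch, the script below parses two modinfo-style texts and reports only the fields whose values differ. The sample strings `OLD` and `NEW` are shortened excerpts of the listings above (author and filename lines left out), and the helper names `parse_modinfo` and `diff_modinfo` are hypothetical, not part of any existing tool.

```python
from collections import defaultdict

# Shortened excerpts of the two modinfo listings (not the full output).
OLD = """
version: 2.20.4.9
vermagic: 2.6.18-6-686 SMP mod_unload 686 REGPARM gcc-4.1
depends: scsi_mod,megaraid_mm
alias: pci:v00001000d00000407sv*sd*bc*sc*i*
alias: pci:v00001000d00000409sv*sd*bc*sc*i*
srcversion: 0B71F30F1E95E778A74A4D1
parm: debug_level:Debug level for driver (default=0) (int)
parm: fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
"""

NEW = """
version: 2.20.5.1
srcversion: 4846B66054D71B69BF4A7D3
alias: pci:v00001000d00000409sv*sd*bc*sc*i*
alias: pci:v00001000d00000407sv*sd*bc*sc*i*
depends: scsi_mod,megaraid_mm
vermagic: 2.6.32-5-686 SMP mod_unload modversions 686
parm: fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm: debug_level:Debug level for driver (default=0) (int)
"""


def parse_modinfo(text):
    """Parse modinfo-style text into {field: sorted list of values}."""
    fields = defaultdict(list)
    for line in text.strip().splitlines():
        # Split on the first colon only; alias and parm values contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()].append(value.strip())
    # Sorting makes the comparison independent of output order.
    return {key: sorted(values) for key, values in fields.items()}


def diff_modinfo(old, new):
    """Return {field: (old_values, new_values)} for fields that differ."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


changed = diff_modinfo(parse_modinfo(OLD), parse_modinfo(NEW))
for field, (before, after) in changed.items():
    print(f"{field}: {before} -> {after}")
```

On these excerpts only version, srcversion and vermagic differ; the alias and parm entries match once ordering is ignored. To compare the complete listings, the full modinfo output from each kernel would need to be pasted into `OLD` and `NEW`.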


Kernel panic trace is attached in a screenshot: megaraid_mbox_panic.jpg
Comment 1 miguel 2013-01-31 13:05:32 UTC
Created attachment 92321 [details]
/proc/version
Comment 2 miguel 2013-01-31 13:05:54 UTC
Created attachment 92331 [details]
/proc/cpuinfo
Comment 3 miguel 2013-01-31 13:06:22 UTC
Created attachment 92341 [details]
/proc/modules
Comment 4 miguel 2013-01-31 13:07:03 UTC
Created attachment 92351 [details]
/proc/ioports
Comment 5 miguel 2013-01-31 13:07:21 UTC
Created attachment 92361 [details]
/proc/iomem
Comment 6 miguel 2013-01-31 13:07:45 UTC
Created attachment 92371 [details]
lspci -vvv
Comment 7 miguel 2013-01-31 13:08:05 UTC
Created attachment 92381 [details]
/proc/scsi/scsi
Comment 8 miguel 2013-01-31 13:19:00 UTC
URL to image: Kernel panic trace megaraid_mbox

http://imageshack.us/photo/my-images/19/megaraidmboxpanic.jpg/
Comment 9 miguel 2013-01-31 13:28:16 UTC
Download link of image (Kernel panic trace megaraid_mbox)

http://imageshack.us/download/19/megaraidmboxpanic.jpg
Comment 10 justgivemeafkenaccountplx 2013-03-18 11:52:33 UTC
I'm getting a very similar kernel dump on an LSI MegaRAID SATA 300-8X card (1000:0409), which also uses the megaraid_mbox driver. It boots fine, but crashes consistently when running badblocks -wsv over an array, usually within an hour. Sometimes it dumps to the console and flashes the keyboard lights; sometimes it just hangs.

Server is running nothing else at the moment beyond the basics, and the LSI currently has no filesystems. I have no reason to suspect the card or the server as both were running Win2003 server for the last 5 - 6 years without any issues beyond the fact Windows was running on it :)

Ubuntu 12.04 i386 server, same results with Ubuntu kernel 3.5.0-25 and the latest compiled kernel, 3.8.3. If there is any interest here I can post the kernel dump, hardware details, etc. In a nutshell, it's a single hyperthreaded P4 Xeon 2.4GHz on an Intel PCI-X server board with one jigglybyte of DDR1 ECC. The LSI card is running four 200GB SATA drives in RAID 5, configured with Write Through caching and DirectIO, 128MB of cache, no backup battery.

I have tried various kernel flags to no avail, but seemed to have some success when I turned off all the performance-enhancing settings in the LSI BIOS (multiple PCI delayed transactions, command queuing, HDD write caching). It got 4 - 5 hours through badblocks without crashing, but I stopped it as it was taking forever. That could well have been just a result of decreased load, though. Turning hyperthreading on/off makes no difference.

Currently testing various different settings in the LSI BIOS but it's slow going. Any help would be appreciated here.