Distribution: Debian Squeeze
Kernel: 2.6.32
Tested kernel not failing: 2.6.18 (etch)
Tested kernel failing: 2.6.26 (lenny), 2.6.32 (squeeze), 3.2 (squeeze-backport)

It appears to be a problem with the megaraid_mbox kernel module during boot on Linux systems with an LSI MegaRAID SCSI adaptor, with three SCSI hard disks in RAID 5 and a SCSI DAT tape drive (non-RAID device) connected to the same adaptor.

I have 17 servers affected, with different hardware configurations. The common factor is a DAT tape drive connected to an LSI MegaRAID adaptor that also carries a three-disk RAID 5 array.

Disconnecting the SCSI DAT tape drive from the LSI adaptor, or connecting it to a second adaptor, solves the problem, but unfortunately neither is an option for me.

Hardware configurations:

LSI MegaRAID adaptor  3 SCSI hard disks in RAID 5            SCSI DAT tape
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108         SONY SDX-450V ver0100
530 SCSI 320-0X       69618MB FUJITSU MAP3735NC 0108         SONY SDX-450V ver0100
Intel SRCU41L         69618MB MAXTOR ATLAS10K4_73WLS DFL0    SEAGATE DAT DAT72-000 A1A0
520 SCSI 320-1        69808MB MAXTOR ATLAS10K5_73WLS JS03    HP C7438A V310
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108         SONY SDX-450V ver0100
530 SCSI 320-0X       69618MB FUJITSU MAW3073NC 0104         SONY SDX-450V ver0100
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108         SEAGATE DAT DAT72-000 A060
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108         SONY SDX-450V ver0100
Intel SRCU42E         140190MB FUJITSU MAU3147NP 0102        SEAGATE DAT DAT72-000 A1A0
520 SCSI 320-1        70006MB FUJITSU MAP3735NP HPF6         HP C7438A ZP5B
530 SCSI 320-0X       69618MB FUJITSU MAT3073NC 0108         SONY SDX-450V ver0100
520 SCSI 320-1        69808MB MAXTOR ATLAS10K5_73WLS JS03    HP C7438A ZP5B
Intel SRCU42E         140268MB FUJITSU MAU3147NP 0102        SEAGATE DAT DAT72-000 A1A0
520 SCSI 320-1        70006MB HP 73.4G ST373207LW HPC1       HP C7438A V310
520 SCSI 320-1        139712MB MAXTOR ATLAS10K5_147WLS JS03  HP C7438A ZP5B
520 SCSI 320-1        69808MB MAXTOR ATLAS10K5_73WLS JS03    HP C7438A ZP5B
518 DELL PERC 4/DC    70140MB MAXTOR ATLAS10K5_73WLS JNZH    SEAGATE DAT DAT72-000 A1A0
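To check that the failure is not tied to one adaptor or one tape model, the hardware list above can be tallied. The following is only a sketch: the tuples are transcribed by hand from the list (disk sizes and firmware revisions dropped), and the name `SERVERS` is made up for illustration.

```python
from collections import Counter

# (adaptor, disk model, tape model) for each affected server, transcribed
# from the hardware list above. Sizes and firmware revisions are omitted.
SERVERS = [
    ("530 SCSI 320-0X", "FUJITSU MAT3073NC", "SONY SDX-450V"),
    ("530 SCSI 320-0X", "FUJITSU MAP3735NC", "SONY SDX-450V"),
    ("Intel SRCU41L", "MAXTOR ATLAS10K4_73WLS", "SEAGATE DAT DAT72-000"),
    ("520 SCSI 320-1", "MAXTOR ATLAS10K5_73WLS", "HP C7438A"),
    ("530 SCSI 320-0X", "FUJITSU MAT3073NC", "SONY SDX-450V"),
    ("530 SCSI 320-0X", "FUJITSU MAW3073NC", "SONY SDX-450V"),
    ("530 SCSI 320-0X", "FUJITSU MAT3073NC", "SEAGATE DAT DAT72-000"),
    ("530 SCSI 320-0X", "FUJITSU MAT3073NC", "SONY SDX-450V"),
    ("Intel SRCU42E", "FUJITSU MAU3147NP", "SEAGATE DAT DAT72-000"),
    ("520 SCSI 320-1", "FUJITSU MAP3735NP", "HP C7438A"),
    ("530 SCSI 320-0X", "FUJITSU MAT3073NC", "SONY SDX-450V"),
    ("520 SCSI 320-1", "MAXTOR ATLAS10K5_73WLS", "HP C7438A"),
    ("Intel SRCU42E", "FUJITSU MAU3147NP", "SEAGATE DAT DAT72-000"),
    ("520 SCSI 320-1", "HP 73.4G ST373207LW", "HP C7438A"),
    ("520 SCSI 320-1", "MAXTOR ATLAS10K5_147WLS", "HP C7438A"),
    ("520 SCSI 320-1", "MAXTOR ATLAS10K5_73WLS", "HP C7438A"),
    ("518 DELL PERC 4/DC", "MAXTOR ATLAS10K5_73WLS", "SEAGATE DAT DAT72-000"),
]

# Count how many servers use each adaptor and each tape model.
adapters = Counter(adapter for adapter, _, _ in SERVERS)
tapes = Counter(tape for _, _, tape in SERVERS)

print(len(SERVERS), "servers")
for label, counter in (("adaptor", adapters), ("tape", tapes)):
    for value, count in counter.most_common():
        print(f"{label:8} {value:24} {count}")
```

The tally shows several distinct adaptors and tape models among the affected machines, which is consistent with the combination (tape plus RAID 5 array on one adaptor) being the trigger rather than any single device.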
modinfo megaraid_mbox (kernel 2.6.32)

filename:       /lib/modules/2.6.32-5-686/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
version:        2.20.5.1
license:        GPL
description:    LSI Logic MegaRAID Mailbox Driver
author:         megaraidlinux@…
srcversion:     4846B66054D71B69BF4A7D3
alias:          pci:v00001000d00000409sv*sd*bc*sc*i*
alias:          pci:v00001000d00001960sv*sd*bc*sc*i*
alias:          pci:v0000101Ed00001960sv*sd*bc*sc*i*
alias:          pci:v00001000d00000408sv*sd*bc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
alias:          pci:v00001000d00000407sv*sd*bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
depends:        scsi_mod,megaraid_mm
vermagic:       2.6.32-5-686 SMP mod_unload modversions 686
parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm:           debug_level:Debug level for driver (default=0) (int)

modinfo megaraid_mbox (kernel 2.6.18)

filename:       /lib/modules/2.6.18-6-686/kernel/drivers/scsi/megaraid/megaraid_mbox.ko
author:         sju@…
description:    LSI Logic MegaRAID Mailbox Driver
license:        GPL
version:        2.20.4.9
vermagic:       2.6.18-6-686 SMP mod_unload 686 REGPARM gcc-4.1
depends:        scsi_mod,megaraid_mm
alias:          pci:v00001028d0000000Esv00001028sd00000123bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000520bc*sc*i*
alias:          pci:v00001000d00001960sv00001028sd00000518bc*sc*i*
alias:          pci:v00001000d00000407sv*sd*bc*sc*i*
alias:          pci:v00001028d0000000Fsv00001028sd0000014Abc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Cbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Dbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Ebc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd0000016Fbc*sc*i*
alias:          pci:v00001028d00000013sv00001028sd00000170bc*sc*i*
alias:          pci:v00001000d00000408sv*sd*bc*sc*i*
alias:          pci:v0000101Ed00001960sv*sd*bc*sc*i*
alias:          pci:v00001000d00001960sv*sd*bc*sc*i*
alias:          pci:v00001000d00000409sv*sd*bc*sc*i*
srcversion:     0B71F30F1E95E778A74A4D1
parm:           debug_level:Debug level for driver (default=0) (int)
parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)

Kernel panic trace is attached in a screenshot: megaraid_mbox_panic.jpg
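The two modinfo listings differ mostly in field order, so a field-by-field comparison makes the real differences easier to see. This is a rough sketch, not a finished tool: `OLD` and `NEW` are abbreviated copies of the listings above (filename, author and alias lines left out for brevity), and the helper names `parse_modinfo` and `diff_fields` are invented here.

```python
# Abbreviated modinfo output for the working (2.6.18) and failing (2.6.32)
# modules; filename, author and alias lines are left out for brevity.
OLD = """
version:        2.20.4.9
vermagic:       2.6.18-6-686 SMP mod_unload 686 REGPARM gcc-4.1
depends:        scsi_mod,megaraid_mm
srcversion:     0B71F30F1E95E778A74A4D1
parm:           debug_level:Debug level for driver (default=0) (int)
parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
"""

NEW = """
version:        2.20.5.1
srcversion:     4846B66054D71B69BF4A7D3
depends:        scsi_mod,megaraid_mm
vermagic:       2.6.32-5-686 SMP mod_unload modversions 686
parm:           unconf_disks:Set to expose unconfigured disks to kernel (default=0) (int)
parm:           busy_wait:Max wait for mailbox in microseconds if busy (default=10) (int)
parm:           max_sectors:Maximum number of sectors per IO command (default=128) (int)
parm:           cmd_per_lun:Maximum number of commands per logical unit (default=64) (int)
parm:           fast_load:Faster loading of the driver, skips physical devices! (default=0) (int)
parm:           debug_level:Debug level for driver (default=0) (int)
"""


def parse_modinfo(text):
    """Map each field name to the set of its values (alias/parm repeat)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only; parm values contain colons too.
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), set()).add(value.strip())
    return fields


def diff_fields(old, new):
    """Return {field: (old values, new values)} for fields that differ."""
    return {
        key: (old.get(key, set()), new.get(key, set()))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


changed = diff_fields(parse_modinfo(OLD), parse_modinfo(NEW))
for key, (before, after) in changed.items():
    print(f"{key}: {sorted(before)} -> {sorted(after)}")
```

On these excerpts only `version`, `srcversion` and `vermagic` differ. The module parameters and dependencies are the same in both kernels, so the regression would have to lie in the driver code or the SCSI layer it talks to rather than in a changed default.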
Created attachment 92321 [details] /proc/version
Created attachment 92331 [details] /proc/cpuinfo
Created attachment 92341 [details] /proc/modules
Created attachment 92351 [details] /proc/ioports
Created attachment 92361 [details] /proc/iomem
Created attachment 92371 [details] lspci -vvv
Created attachment 92381 [details] /proc/scsi/scsi
URL to image: Kernel panic trace megaraid_mbox http://imageshack.us/photo/my-images/19/megaraidmboxpanic.jpg/
Download link of image (Kernel panic trace megaraid_mbox) http://imageshack.us/download/19/megaraidmboxpanic.jpg
I'm getting a very similar kernel dump on an LSI MegaRAID SATA 300-8X card (1000:0409). This card also uses the megaraid_mbox driver. It boots fine, but crashes consistently when running badblocks -wsv over an array, usually within an hour. Sometimes it will dump to console and flash the keyboard lights, sometimes it just hangs.

The server is running nothing else at the moment beyond the basics, and the LSI currently has no filesystems. I have no reason to suspect the card or the server, as both were running Win2003 server for the last 5-6 years without any issues beyond the fact Windows was running on it :)

Ubuntu 12.04 i386 server, same results with Ubuntu kernel 3.5.0-25 and the latest compiled kernel 3.8.3. If there is any interest here I can post the kernel dump, hardware details, etc. In a nutshell it's a single hyperthreaded P4 Xeon 2.4GHz on an Intel PCI-X server board with one jigglybyte of DDR1 ECC. The LSI card is running four 200GB SATA drives in a RAID5, configured with Write Through caching and DirectIO, 128MB of cache, no backup battery.

I have tried various kernel flags to no avail, but seemed to have some success when I turned off all the performance-enhancing settings in the LSI BIOS (multiple PCI delayed transactions, command queuing, HDD write caching). It got 4-5 hours through badblocks without crashing, but I stopped it as it was taking forever. That could well have been just a result of decreased load, though. Turning hyperthreading on or off makes no difference.

Currently testing various different settings in the LSI BIOS, but it's slow going. Any help would be appreciated here.