Distribution: gentoo 2004.3
Hardware Environment: ppc64/PowerMac G5
Software Environment:
Problem Description: kernel oops/hang with Fusion MPT/mptscsih driver and LSI22320/LSI21320 SCSI-card

Steps to reproduce: modprobe -a mptscsih

result:
Oops: Kernel access of bad area, sig: 11 [#3]
SMP NR_CPUS=2 POWERMAC
NIP: C00000000007F94C XER: 0000000000000000 LR: C00000000007FAD8
REGS: c00000000ff8f900 TRAP: 0300 Not tainted (2.6.10-pristine-noide-snd)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 000000013d762008, DSISR: 0000000042000000
TASK: c00000003ef887b0[6] 'events/0' THREAD: c00000000ff8c000 CPU: 0
GPR00: 0000000000000038 C00000000FF8FB80 C00000000060AC98 C00000003EFD0B80
GPR04: C00000003EFD1410 C00000003D6F4028 0000000000100100 C00000000066B400
GPR08: 0000000000200200 000000013D762000 C00000003EB52000 C00000003D6F4280
GPR12: C00000003D6F4000 C0000000004D2800 0000000000000000 0000000000000000
GPR16: 0000000001400000 C000000000475990 0000000001872FA0 0000000001872FA0
GPR20: BFFFFFFFFEC00000 C000000000489000 C00000003EFD07B8 0000000000100100
GPR24: C00000003EFD0BB8 C00000003EFD1410 0000000000000018 C00000003EFD0B98
GPR28: 000000000000000D C00000003EFD0B80 C0000000004F8258 C00000000FF8FB80
NIP [c00000000007f94c] .free_block+0xf0/0x1b8
LR [c00000000007fad8] .drain_array_locked+0xc4/0x138
Call Trace:
[c00000000ff8fb80] [c00000000ff8fc20] 0xc00000000ff8fc20 (unreliable)
[c00000000ff8fc30] [c00000000007fad8] .drain_array_locked+0xc4/0x138
[c00000000ff8fcd0] [c000000000081100] .cache_reap+0xc4/0x2b4
[c00000000ff8fdb0] [c0000000000610a4] .worker_thread+0x254/0x320
[c00000000ff8fee0] [c000000000067964] .kthread+0x168/0x1bc
[c00000000ff8ff90] [c000000000013ee0] .kernel_thread+0x4c/0x6c
Created attachment 4443
dmesg/slabinfo/ps -ef dumps

A few dumps taken with slab.c/FORCE_DEBUG and mptbase.h/MPT_DEBUG enabled.
Should I use CONFIG_DEBUG_SLAB with slab.c?
Software Environment: 2.6.10 pristine
Still failing with command "modprobe -r mptscsih"; kernel is 2.6.12.3

[ 3399.025987] Fusion MPT base driver 3.01.20
[ 3399.025994] Copyright (c) 1999-2004 LSI Logic Corporation
[ 3399.026265] PCI: Enabling device: (0001:06:03.0), cmd 7
[ 3399.026300] mptbase: Initiating ioc0 bringup
[ 3399.141540] ioc0: 53C1030: Capabilities={Initiator}
[ 3415.355685] PCI: Enabling device: (0001:06:03.1), cmd 7
[ 3415.355715] mptbase: Initiating ioc1 bringup
[ 3415.472040] ioc1: 53C1030: Capabilities={Initiator}
[ 3415.649725] Fusion MPT SCSI Host driver 3.01.20
[ 3415.649855] scsi5 : ioc0: LSI53C1030, FwRev=01030700h, Ports=1, MaxQ=222, IRQ=53
[ 3419.401423] scsi6 : ioc1: LSI53C1030, FwRev=01030700h, Ports=1, MaxQ=222, IRQ=53
[ 3508.120049] mptbase: Initiating ioc0 recovery
[ 3513.072570] mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
[ 3538.370503] Oops: Kernel access of bad area, sig: 11 [#1]
[ 3538.370707] SMP NR_CPUS=2 POWERMAC
[ 3538.371179] Modules linked in: mptscsih mptbase sungem sungem_phy
[ 3538.371772] NIP: C000000000080864 XER: 00000000 LR: C000000000080758 CTR: 0000000000000000
[ 3538.372370] REGS: c00000003d063800 TRAP: 0300 Not tainted (2.6.12.3-noide-snd)
[ 3538.372945] MSR: 9000000002003032 EE: 0 PR: 0 FP: 1 ME: 1 IR/DR: 11 CR: 24222442
[ 3538.373518] DAR: 000000013d6a9008 DSISR: 0000000042000000
[ 3538.374035] TASK: c0000000262e0030[20286] 'run-crons' THREAD: c00000003d060000 CPU: 1
[ 3538.374617] GPR00: 0000000000000016 C00000003D063A80 C000000000622560 0000000000200200
[ 3538.375259] GPR04: C00000003FF85810 C00000003D4D5028 0000000000000000 C00000003D4D5000
[ 3538.375893] GPR08: 00000000000001D0 000000013D6A9000 00000000000000B0 C00000003FFD0B98
[ 3538.376541] GPR12: C00000003FFD0BA8 C0000000004E6800 C00000003D4CF2C0 C000000001E7BBE8
[ 3538.377190] GPR16: C00000003D4CF2A8 C00000003D4CF2D0 0000000000000000 0000000000000000
[ 3538.377839] GPR20: C00000003D063EA0 000001FFFFBB1440 00000000000000D0 0000000000000001
[ 3538.378485] GPR24: C00000003FFD0BF8 C00000003FFD0BB8 C00000003FFD0B98 C00000003FFD0B80
[ 3538.379134] GPR28: C00000003FF85800 0000000000100100 C0000000004FE190 0000000000000001
[ 3538.379805] NIP [c000000000080864] .cache_alloc_refill+0x1c8/0x6ec
[ 3538.380363] LR [c000000000080758] .cache_alloc_refill+0xbc/0x6ec
[ 3538.380910] Call Trace:
[ 3538.381341] [c00000003d063a80] [0000001000000004] 0x1000000004 (unreliable)
[ 3538.381933] [c00000003d063b50] [c000000000080440] .kmem_cache_alloc+0x74/0x78
[ 3538.382521] [c00000003d063bd0] [c000000000049460] .copy_process+0x6a8/0x1404
[ 3538.383106] [c00000003d063ce0] [c00000000004a30c] .do_fork+0x94/0x244
[ 3538.383672] [c00000003d063dc0] [c0000000000119e4] .sys_fork+0x28/0x40
[ 3538.384240] [c00000003d063e30] [c00000000000d97c] .ppc_fork+0x8/0xc
[ 3538.384798] Instruction dump:
[ 3538.385250] 91670024 60000000 60000000 801b0070 7f890040 409c000c 7cdf07b4 409aff80
[ 3538.385922] 2f1f0000 e9670008 e9270000 f92b0000 <f9690008> 80070024 fba70000 f8670008
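
The register dumps above contain the values 0x100100 and 0x200200, and in both oopses DAR equals GPR09 + 8. As a rough aid for reading such dumps, here is a minimal Python sketch that extracts the GPR values from oops text and flags the two constants. It assumes they correspond to LIST_POISON1 and LIST_POISON2 from include/linux/list.h in 2.6-era kernels; the helper names (parse_gprs, poisoned) and the SAMPLE string are made up for illustration and are not part of any kernel tool.

```python
import re

# Assumption: list poison constants as defined in include/linux/list.h (2.6 kernels).
LIST_POISON1 = 0x00100100
LIST_POISON2 = 0x00200200

# Matches "GPRnn:" followed by up to four 16-digit hex words.
GPR_LINE = re.compile(r"GPR(\d+):((?:\s+[0-9A-Fa-f]{16}\b){1,4})")

# Two lines copied from the 2.6.12.3 oops above.
SAMPLE = (
    "[ 3538.374617] GPR00: 0000000000000016 C00000003D063A80 "
    "C000000000622560 0000000000200200\n"
    "[ 3538.379134] GPR28: C00000003FF85800 0000000000100100 "
    "C0000000004FE190 0000000000000001\n"
)

def parse_gprs(text):
    """Return {register_number: value} for every GPR found in the oops text."""
    regs = {}
    for match in GPR_LINE.finditer(text):
        base = int(match.group(1))
        for offset, word in enumerate(match.group(2).split()):
            regs[base + offset] = int(word, 16)
    return regs

def poisoned(regs):
    """Return {register_number: poison_name} for registers holding a poison value."""
    names = {LIST_POISON1: "LIST_POISON1", LIST_POISON2: "LIST_POISON2"}
    return {num: names[val] for num, val in regs.items() if val in names}

if __name__ == "__main__":
    for num, name in sorted(poisoned(parse_gprs(SAMPLE)).items()):
        print(f"GPR{num:02d} holds {name}")
```

Poison values in registers would be consistent with code touching a list entry that was already deleted, but that is only a guess from the dump and not a confirmed diagnosis.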
Hmm, the bug seems to have disappeared with the 2.6.13-rc5 kernel (MPT version is now 3.03.02). At least both commands (modprobe -a/-r) work now, though I'm not sure how stable the system is in the long run...