Bug 4077 - kernel oops with Fusion MPT SCSI driver
Summary: kernel oops with Fusion MPT SCSI driver
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 high
Assignee: Mike Anderson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-21 10:59 UTC by jd
Modified: 2006-02-04 21:10 UTC

See Also:
Kernel Version: 2.6.10
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg/slabinfo/ps -ef dumps (33.25 KB, text/plain)
2005-01-22 10:28 UTC, jd

Description jd 2005-01-21 10:59:33 UTC
Distribution:   gentoo 2004.3
Hardware Environment:   ppc64/PowerMac G5
Software Environment:
Problem Description:
    kernel oops/hang with Fusion MPT/mptscsih driver and 
    LSI22320/LSI21320 SCSI-card

Steps to reproduce:

   modprobe -a mptscsih

  result:

    Oops: Kernel access of bad area, sig: 11 [#3]
    SMP NR_CPUS=2 POWERMAC 
    NIP: C00000000007F94C XER: 0000000000000000 LR: C00000000007FAD8
    REGS: c00000000ff8f900 TRAP: 0300   Not tainted  (2.6.10-pristine-noide-snd)
    MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
    DAR: 000000013d762008, DSISR: 0000000042000000
    TASK: c00000003ef887b0[6] 'events/0' THREAD: c00000000ff8c000 CPU: 0
    GPR00: 0000000000000038 C00000000FF8FB80 C00000000060AC98 C00000003EFD0B80 
    GPR04: C00000003EFD1410 C00000003D6F4028 0000000000100100 C00000000066B400 
    GPR08: 0000000000200200 000000013D762000 C00000003EB52000 C00000003D6F4280 
    GPR12: C00000003D6F4000 C0000000004D2800 0000000000000000 0000000000000000 
    GPR16: 0000000001400000 C000000000475990 0000000001872FA0 0000000001872FA0 
    GPR20: BFFFFFFFFEC00000 C000000000489000 C00000003EFD07B8 0000000000100100 
    GPR24: C00000003EFD0BB8 C00000003EFD1410 0000000000000018 C00000003EFD0B98 
    GPR28: 000000000000000D C00000003EFD0B80 C0000000004F8258 C00000000FF8FB80 
    NIP [c00000000007f94c] .free_block+0xf0/0x1b8
    LR [c00000000007fad8] .drain_array_locked+0xc4/0x138
    Call Trace:
    [c00000000ff8fb80] [c00000000ff8fc20] 0xc00000000ff8fc20 (unreliable)
    [c00000000ff8fc30] [c00000000007fad8] .drain_array_locked+0xc4/0x138
    [c00000000ff8fcd0] [c000000000081100] .cache_reap+0xc4/0x2b4
    [c00000000ff8fdb0] [c0000000000610a4] .worker_thread+0x254/0x320
    [c00000000ff8fee0] [c000000000067964] .kthread+0x168/0x1bc
    [c00000000ff8ff90] [c000000000013ee0] .kernel_thread+0x4c/0x6c
Comment 1 jd 2005-01-22 10:28:08 UTC
Created attachment 4443
dmesg/slabinfo/ps -ef  dumps

A few dumps taken with slab.c/FORCE_DEBUG and mptbase.h/MPT_DEBUG enabled.
Should I use CONFIG_DEBUG_SLAB with slab.c?
Comment 2 jd 2005-02-14 05:38:59 UTC
Software Environment: 2.6.10  pristine
Comment 3 jd 2005-08-04 09:22:31 UTC
Still failing with the command "modprobe -r mptscsih"; the kernel is 2.6.12.3.


[ 3399.025987] Fusion MPT base driver 3.01.20
[ 3399.025994] Copyright (c) 1999-2004 LSI Logic Corporation
[ 3399.026265] PCI: Enabling device: (0001:06:03.0), cmd 7
[ 3399.026300] mptbase: Initiating ioc0 bringup
[ 3399.141540] ioc0: 53C1030: Capabilities={Initiator}
[ 3415.355685] PCI: Enabling device: (0001:06:03.1), cmd 7
[ 3415.355715] mptbase: Initiating ioc1 bringup
[ 3415.472040] ioc1: 53C1030: Capabilities={Initiator}
[ 3415.649725] Fusion MPT SCSI Host driver 3.01.20
[ 3415.649855] scsi5 : ioc0: LSI53C1030, FwRev=01030700h, Ports=1, MaxQ=222, IRQ=53
[ 3419.401423] scsi6 : ioc1: LSI53C1030, FwRev=01030700h, Ports=1, MaxQ=222, IRQ=53
[ 3508.120049] mptbase: Initiating ioc0 recovery
[ 3513.072570] mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
[ 3538.370503] Oops: Kernel access of bad area, sig: 11 [#1]
[ 3538.370707] SMP NR_CPUS=2 POWERMAC 
[ 3538.371179] Modules linked in: mptscsih mptbase sungem sungem_phy
[ 3538.371772] NIP: C000000000080864 XER: 00000000 LR: C000000000080758 CTR: 0000000000000000
[ 3538.372370] REGS: c00000003d063800 TRAP: 0300   Not tainted  (2.6.12.3-noide-snd)
[ 3538.372945] MSR: 9000000002003032 EE: 0 PR: 0 FP: 1 ME: 1 IR/DR: 11 CR: 24222442
[ 3538.373518] DAR: 000000013d6a9008 DSISR: 0000000042000000
[ 3538.374035] TASK: c0000000262e0030[20286] 'run-crons' THREAD: c00000003d060000 CPU: 1
[ 3538.374617] GPR00: 0000000000000016 C00000003D063A80 C000000000622560 0000000000200200
[ 3538.375259] GPR04: C00000003FF85810 C00000003D4D5028 0000000000000000 C00000003D4D5000
[ 3538.375893] GPR08: 00000000000001D0 000000013D6A9000 00000000000000B0 C00000003FFD0B98
[ 3538.376541] GPR12: C00000003FFD0BA8 C0000000004E6800 C00000003D4CF2C0 C000000001E7BBE8
[ 3538.377190] GPR16: C00000003D4CF2A8 C00000003D4CF2D0 0000000000000000 0000000000000000
[ 3538.377839] GPR20: C00000003D063EA0 000001FFFFBB1440 00000000000000D0 0000000000000001
[ 3538.378485] GPR24: C00000003FFD0BF8 C00000003FFD0BB8 C00000003FFD0B98 C00000003FFD0B80
[ 3538.379134] GPR28: C00000003FF85800 0000000000100100 C0000000004FE190 0000000000000001
[ 3538.379805] NIP [c000000000080864] .cache_alloc_refill+0x1c8/0x6ec
[ 3538.380363] LR [c000000000080758] .cache_alloc_refill+0xbc/0x6ec
[ 3538.380910] Call Trace:
[ 3538.381341] [c00000003d063a80] [0000001000000004] 0x1000000004 (unreliable)
[ 3538.381933] [c00000003d063b50] [c000000000080440] .kmem_cache_alloc+0x74/0x78
[ 3538.382521] [c00000003d063bd0] [c000000000049460] .copy_process+0x6a8/0x1404
[ 3538.383106] [c00000003d063ce0] [c00000000004a30c] .do_fork+0x94/0x244
[ 3538.383672] [c00000003d063dc0] [c0000000000119e4] .sys_fork+0x28/0x40
[ 3538.384240] [c00000003d063e30] [c00000000000d97c] .ppc_fork+0x8/0xc
[ 3538.384798] Instruction dump:
[ 3538.385250] 91670024 60000000 60000000 801b0070 7f890040 409c000c 7cdf07b4 409aff80
[ 3538.385922] 2f1f0000 e9670008 e9270000 f92b0000 <f9690008> 80070024 fba70000 f8670008
Comment 4 jd 2005-08-04 13:02:34 UTC
Hmm, the problem seems to have disappeared with the 2.6.13-rc5 kernel
(-> MPT version is 3.03.02).

At least both commands (modprobe -a/-r) work now, though I'm
not sure how stable the system is in the long run...
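
For anyone comparing the two oopses in this report, here is a minimal sketch of pulling the resolved frames out of a ppc64 call trace so the failing slab paths (free_block/drain_array_locked in the first oops, cache_alloc_refill/kmem_cache_alloc in the second) can be listed side by side. The names parse_trace, TRACE_RE and SAMPLE are hypothetical and not part of any kernel tool; the pattern assumes the "[stack] [address] .symbol+0xoff/0xsize" layout seen in the dumps above, with or without printk timestamps, and it skips the "(unreliable)" entries that carry no symbol.

```python
import re

# Assumed frame layout, taken from the dumps in this report:
#   [c00000000ff8fc30] [c00000000007fad8] .drain_array_locked+0xc4/0x138
# An optional printk timestamp such as "[ 3538.381933]" may precede it.
TRACE_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[(?P<addr>[0-9a-f]{16})\]\s+"
    r"\.(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace(text):
    """Return (symbol, offset, size) for each frame that has a resolved symbol."""
    frames = []
    for line in text.splitlines():
        match = TRACE_RE.search(line)
        if match:
            frames.append(
                (
                    match.group("sym"),
                    int(match.group("off"), 16),
                    int(match.group("size"), 16),
                )
            )
    return frames


# A few lines copied from the traces above (hypothetical sample variable).
SAMPLE = """\
[ 3538.381341] [c00000003d063a80] [0000001000000004] 0x1000000004 (unreliable)
[ 3538.381933] [c00000003d063b50] [c000000000080440] .kmem_cache_alloc+0x74/0x78
[ 3538.382521] [c00000003d063bd0] [c000000000049460] .copy_process+0x6a8/0x1404
    [c00000000ff8fc30] [c00000000007fad8] .drain_array_locked+0xc4/0x138
"""

if __name__ == "__main__":
    # Print each resolved frame in the same symbol+offset/size form as the oops.
    for sym, off, size in parse_trace(SAMPLE):
        print(f"{sym}+{off:#x}/{size:#x}")
```

Feeding the full dmesg text from the attachment through parse_trace in the same way would give one frame list per oops, which makes it easier to see that both faults are in the slab allocator rather than in mptscsih itself.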
