Bug 16547
Summary: | mptscsih: ioc0: attempting task abort, raid array LUNs not detected properly on some boots | ||
---|---|---|---|
Product: | SCSI Drivers | Reporter: | Martin Steigerwald (martin.steigerwald) |
Component: | Other | Assignee: | scsi_drivers-other |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, io_other, kashyap.desai, ksb, linux-scsi |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32-bpo.5-amd64 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | lspci -nnvv of one of the servers; config for the 2.6.32-5-amd64 debian backport kernel | | |
Description
Martin Steigerwald
2010-08-09 09:22:06 UTC
Created attachment 27386
lspci -nnvv of one of the servers
Created attachment 27387
config for the 2.6.32-5-amd64 debian backport kernel
Some additional information on the MPT driver version and controller:

    backend01:~# grep -r "" /proc/mpt/*
    /proc/mpt/ioc0/summary:ioc0: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, LanAddr=00:06:[...], IRQ=33
    /proc/mpt/ioc0/info:ioc0:
    /proc/mpt/ioc0/info: ProductID = 0x1005 (LSIFC949E A1)
    /proc/mpt/ioc0/info: FWVersion = 0x01030e00 (fw_size=190556)
    /proc/mpt/ioc0/info: MsgVersion = 0x0105
    /proc/mpt/ioc0/info: FirstWhoInit = 0x00
    /proc/mpt/ioc0/info: EventState = 0x00
    /proc/mpt/ioc0/info: CurrentHostMfaHighAddr = 0x00000004
    /proc/mpt/ioc0/info: CurrentSenseBufferHighAddr = 0x00000004
    /proc/mpt/ioc0/info: MaxChainDepth = 0x3e frames
    /proc/mpt/ioc0/info: MinBlockSize = 0x20 bytes
    /proc/mpt/ioc0/info: RequestFrames @ 0xffff88043c102800 (Dma @ 0x000000043c102800)
    /proc/mpt/ioc0/info: {CurReqSz=128} x {CurReqDepth=1023} = 130944 bytes ^= 0x20000
    /proc/mpt/ioc0/info: {MaxReqSz=128} {MaxReqDepth=1023}
    /proc/mpt/ioc0/info: Frames @ 0xffff88043c100000 (Dma @ 0x000000043c100000)
    /proc/mpt/ioc0/info: {CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
    /proc/mpt/ioc0/info: {MaxRepSz=0} {MaxRepDepth=1023}
    /proc/mpt/ioc0/info: MaxDevices = 255
    /proc/mpt/ioc0/info: MaxBuses = 2
    /proc/mpt/ioc0/info: PortNumber = 1 (of 1)
    /proc/mpt/ioc0/info: LanAddr = 00:06:[...]
    /proc/mpt/ioc0/info: WWN = 2000[...]
    /proc/mpt/ioc1/summary:ioc1: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, LanAddr=00:06:2B:11:3B:79, IRQ=31
    /proc/mpt/ioc1/info:ioc1:
    /proc/mpt/ioc1/info: ProductID = 0x1005 (LSIFC949E A1)
    /proc/mpt/ioc1/info: FWVersion = 0x01030e00 (fw_size=190556)
    /proc/mpt/ioc1/info: MsgVersion = 0x0105
    /proc/mpt/ioc1/info: FirstWhoInit = 0x00
    /proc/mpt/ioc1/info: EventState = 0x00
    /proc/mpt/ioc1/info: CurrentHostMfaHighAddr = 0x00000004
    /proc/mpt/ioc1/info: CurrentSenseBufferHighAddr = 0x00000004
    /proc/mpt/ioc1/info: MaxChainDepth = 0x3e frames
    /proc/mpt/ioc1/info: MinBlockSize = 0x20 bytes
    /proc/mpt/ioc1/info: RequestFrames @ 0xffff88043c202800 (Dma @ 0x000000043c202800)
    /proc/mpt/ioc1/info: {CurReqSz=128} x {CurReqDepth=1023} = 130944 bytes ^= 0x20000
    /proc/mpt/ioc1/info: {MaxReqSz=128} {MaxReqDepth=1023}
    /proc/mpt/ioc1/info: Frames @ 0xffff88043c200000 (Dma @ 0x000000043c200000)
    /proc/mpt/ioc1/info: {CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
    /proc/mpt/ioc1/info: {MaxRepSz=0} {MaxRepDepth=1023}
    /proc/mpt/ioc1/info: MaxDevices = 255
    /proc/mpt/ioc1/info: MaxBuses = 2
    /proc/mpt/ioc1/info: PortNumber = 1 (of 1)
    /proc/mpt/ioc1/info: LanAddr = 00:06:[...]
    /proc/mpt/ioc1/info: WWN = 2000[...]
    /proc/mpt/summary:ioc0: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, LanAddr=00:06:2B:11:3B:78, IRQ=33
    /proc/mpt/summary:ioc1: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, LanAddr=00:06:2B:11:3B:79, IRQ=31
    /proc/mpt/version:mptlinux-3.04.12
    /proc/mpt/version: Fusion MPT base driver
    /proc/mpt/version: Fusion MPT FC host driver

I also have something like this:

    [ 4499.860030] mptscsih: ioc0: attempting task abort! (sc=ffff88007a588200)
    [ 4499.860036] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dc f8 9f 00 04 00 00
    [ 4499.894551] mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403) cb_idx mptbase_reply
    [ 4501.256258] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done
    [ 4501.268298] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007a588200)
    [ 4503.256426] mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403) cb_idx mptscsih_io_done
    [ 4503.256439] mptscsih: ioc0: attempting task abort! (sc=ffff88007ab5cc00)
    [ 4503.256443] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dc fc 9f 00 04 00 00
    [ 4503.256455] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007ab5cc00)
    [ 4503.506394] mptscsih: ioc0: attempting task abort! (sc=ffff88007a588000)
    [ 4503.506399] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dd 00 9f 00 04 00 00
    [ 4503.506412] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007a588000)
    ... and so on.

This happens during heavy disk write activity. It is the same on Ubuntu's stock 2.6.32-24 kernel and on custom-built 2.6.35.4 and 2.6.36-rc3 kernels.
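The hex values in the log above can be unpacked by hand. Below is a minimal Python sketch, assuming the LogInfo word is laid out as bus type in bits 31 to 28, originator in bits 27 to 24, code in bits 23 to 16 and subcode in bits 15 to 0. This layout is an assumption, but it reproduces the decoded text the driver printed next to each value (0x31120403 gives originator 1 = {PL}, code 0x12 = {Abort}, subcode 0x0403). The name tables are hypothetical and contain only the values that appear in this log. The CDB decoder follows the standard SCSI WRITE(10) layout: opcode 0x2A, LBA in bytes 2 to 5, and transfer length in bytes 7 to 8, both big-endian.

```python
"""Sketch: decode MPT LogInfo words and WRITE(10) CDBs from the log above."""
from dataclasses import dataclass

# Hypothetical name tables: only values seen in this log are mapped.
ORIGINATORS = {1: "PL"}
PL_CODES = {0x12: "Abort", 0x14: "IO Executed"}


@dataclass
class LogInfo:
    bus_type: int
    originator: int
    code: int
    subcode: int


def decode_loginfo(value: int) -> LogInfo:
    """Split a 32-bit LogInfo word using the assumed bit layout."""
    return LogInfo(
        bus_type=(value >> 28) & 0xF,
        originator=(value >> 24) & 0xF,
        code=(value >> 16) & 0xFF,
        subcode=value & 0xFFFF,
    )


def describe(info: LogInfo) -> str:
    """Render the fields in the same style mptbase prints them."""
    orig = ORIGINATORS.get(info.originator, f"0x{info.originator:x}")
    code = f"0x{info.code:02x}"
    if orig == "PL":
        code = PL_CODES.get(info.code, code)
    return f"Originator={{{orig}}}, Code={{{code}}}, SubCode(0x{info.subcode:04x})"


@dataclass
class Write10:
    lba: int     # logical block address, CDB bytes 2..5
    blocks: int  # transfer length in blocks, CDB bytes 7..8


def decode_write10(cdb_hex: str) -> Write10:
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return Write10(
        lba=int.from_bytes(cdb[2:6], "big"),
        blocks=int.from_bytes(cdb[7:9], "big"),
    )


if __name__ == "__main__":
    for word in (0x31120403, 0x31140000):
        print(hex(word), describe(decode_loginfo(word)))
    writes = [decode_write10(c) for c in (
        "2a 00 0f dc f8 9f 00 04 00 00",
        "2a 00 0f dc fc 9f 00 04 00 00",
        "2a 00 0f dd 00 9f 00 04 00 00",
    )]
    for w in writes:
        print(f"LBA=0x{w.lba:08x} blocks={w.blocks}")
    # True if each aborted write starts exactly where the previous one ended
    print(all(b.lba == a.lba + a.blocks for a, b in zip(writes, writes[1:])))
```

Decoded this way, the three aborted CDBs are sequential 1024-block writes, and each one starts where the previous one ended. This is consistent with the reporter's note that the aborts appear under heavy write load.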
    cat /proc/mpt/version
    mptlinux-3.04.17
    Fusion MPT base driver
    Fusion MPT SAS host driver

    cat /proc/mpt/summary
    ioc0: LSISAS1064E B2, FwRev=01140000h, Ports=1, MaxQ=511, IRQ=17

    cat /proc/mpt/ioc0/info
    ioc0:
    ProductID = 0x2204 (LSISAS1064E B2)
    FWVersion = 0x01140000
    MsgVersion = 0x0105
    FirstWhoInit = 0x00
    EventState = 0x00
    CurrentHostMfaHighAddr = 0x00000000
    CurrentSenseBufferHighAddr = 0x00000000
    MaxChainDepth = 0x60 frames
    MinBlockSize = 0x20 bytes
    RequestFrames @ 0xffff88007a502800 (Dma @ 0x000000007a502800)
    {CurReqSz=128} x {CurReqDepth=511} = 65408 bytes ^= 0x10000
    {MaxReqSz=128} {MaxReqDepth=511}
    Frames @ 0xffff88007a500000 (Dma @ 0x000000007a500000)
    {CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
    {MaxRepSz=0} {MaxRepDepth=511}
    MaxDevices = 173
    MaxBuses = 1
    PortNumber = 1 (of 1)

(In reply to comment #3)
> Some additional information on the MPT driver version and controller:
>
> backend01:~# grep -r "" /proc/mpt/*
> /proc/mpt/ioc0/summary:ioc0: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, LanAddr=00:06:[...], IRQ=33
> [...]
> /proc/mpt/version:mptlinux-3.04.12
> /proc/mpt/version: Fusion MPT base driver
> /proc/mpt/version: Fusion MPT FC host driver

Your bug is a completely different issue. The Red Hat bugzilla entry you pointed to concerns a SAS controller. In your case it is an FC controller.
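The FC/SAS distinction can be read directly from the /proc/mpt/summary lines quoted in this report: LSIFC949E in comment #3 and LSISAS1064E in comment #4. Below is a minimal Python sketch, assuming the `iocN: PRODUCT REV, Key=Value, ...` line shape seen here. The `controller_family` helper and its prefix rule are hypothetical illustrations and are not part of the MPT driver.

```python
import re

# Assumed line shape, taken from the summary lines in this report:
#   "ioc0: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, IRQ=33"
SUMMARY_RE = re.compile(
    r"^(?P<ioc>ioc\d+): (?P<product>\S+) (?P<rev>\S+?), (?P<rest>.*)$"
)


def parse_summary(line: str) -> dict:
    """Split one /proc/mpt summary line into a dict of string fields."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    fields = {"ioc": m["ioc"], "product": m["product"], "rev": m["rev"]}
    for item in m["rest"].split(", "):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def controller_family(product: str) -> str:
    """Hypothetical helper: guess FC vs SAS from the product-name prefix."""
    if product.startswith("LSIFC"):
        return "FC"
    if product.startswith("LSISAS"):
        return "SAS"
    return "unknown"


if __name__ == "__main__":
    for line in (
        "ioc0: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023, LanAddr=00:06:[...], IRQ=33",
        "ioc0: LSISAS1064E B2, FwRev=01140000h, Ports=1, MaxQ=511, IRQ=17",
    ):
        f = parse_summary(line)
        print(f["product"], controller_family(f["product"]), "MaxQ=" + f["MaxQ"])
```

The first line comes out as an FC controller with MaxQ=1023, and the second as a SAS controller with MaxQ=511.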
You mentioned that "Latest kernel known to work: 2.6.26 from Debian Backports". Can you tell me the driver version with which things work fine? If a working kernel exists, I would like to upgrade only the MPT Fusion driver, not the whole kernel. That way only one component of the system changes at a time, which will help to understand where things broke.

FYI, the MPTFC driver is mostly in maintenance mode. Very few changes have gone into the MPTFC driver since 2008. The last MPTFC change that went upstream is
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=03cb3829e0e5650518ce37e2b4420a35e034dc9e

Thanks,
Kashyap

(In reply to comment #4)
> [ 4499.860030] mptscsih: ioc0: attempting task abort! (sc=ffff88007a588200)
> [...]
> cat /proc/mpt/summary
> ioc0: LSISAS1064E B2, FwRev=01140000h, Ports=1, MaxQ=511, IRQ=17
> [...]

Your bug is not similar to the originally reported bug. Please open a new bugzilla entry, since your product is an LSI SAS controller and the original bug was reported for an LSI FC controller.

Thanks,
Kashyap

(In reply to comment #5)
> (In reply to comment #3)
> > Some additional information on the MPT driver version and controller:
> >
> > backend01:~# grep -r "" /proc/mpt/*
> > /proc/mpt/ioc0/summary:ioc0: LSIFC949E A1, FwRev=01030e00h, Ports=1, MaxQ=1023,
> > LanAddr=00:06:[...], IRQ=33
> > /proc/mpt/ioc0/info:ioc0:
> > /proc/mpt/ioc0/info: ProductID = 0x1005 (LSIFC949E A1)
> > /proc/mpt/ioc0/info: FWVersion = 0x01030e00 (fw_size=190556)
[...]
> Your bug is completely different issue. Whatever you are point to redhat
> bugzilla is with respect to SAS controller.

I thought it might be related nevertheless. I don't know the inner structure of the MPT driver. It also sounded similar, because that bug report also mentions that it worked with 2.6.26 but, AFAIR, not with 2.6.27.
Maybe it's a general change in the SCSI layer that triggers the issue.

> In your case it is FC controller.

Yes, I know.

> You have mentioned that
> "Latest kernel known to work: 2.6.26 from Debian Backports"
>
> Can you provide me driver version where things are working fine.

Here is the version from a 2.6.26 Lenny kernel, which should be the one that was backported to Etch:

    pasta:~# modinfo /lib/modules/2.6.26-2-amd64/kernel/drivers/message/fusion/mptfc.ko
    filename:       /lib/modules/2.6.26-2-amd64/kernel/drivers/message/fusion/mptfc.ko
    version:        3.04.06
    license:        GPL
    description:    Fusion MPT FC Host driver
    author:         LSI Corporation
    srcversion:     F3D99FE0544BDDD1455BAAA
    alias:          pci:v00001657d00000646sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000646sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000640sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000642sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000626sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000628sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000622sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000624sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000621sv*sd*bc*sc*i*
    depends:        mptscsih,scsi_transport_fc,scsi_mod,mptbase
    vermagic:       2.6.26-2-amd64 SMP mod_unload modversions
    parm:           mptfc_dev_loss_tmo: Initial time the driver programs the transport to wait for an rport to return following a device loss event. Default=60. (int)
    parm:           max_lun: max lun, default=16895 (int)

The 2.6.32 kernel, where we see the described issues, has:

    backend01:~# modinfo mptfc
    filename:       /lib/modules/2.6.32-bpo.5-amd64/kernel/drivers/message/fusion/mptfc.ko
    version:        3.04.12
    license:        GPL
    description:    Fusion MPT FC Host driver
    author:         LSI Corporation
    srcversion:     92E350C096B75A9714B8B0E
    alias:          pci:v00001657d00000646sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000646sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000640sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000642sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000626sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000628sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000622sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000624sv*sd*bc*sc*i*
    alias:          pci:v00001000d00000621sv*sd*bc*sc*i*
    depends:        mptscsih,mptbase,scsi_transport_fc,scsi_mod
    vermagic:       2.6.32-bpo.5-amd64 SMP mod_unload modversions
    parm:           mptfc_dev_loss_tmo: Initial time the driver programs the transport to wait for an rport to return following a device loss event. Default=60. (int)
    parm:           max_lun: max lun, default=16895 (int)
    backend01:~#

> In case of
> some working kernel is there, I would like to simply upgrade MPTFUSION driver
> (do not upgrade a whole kernel). This way I would like to change only one
> component of the system at a time...

Well, the old 2.6.26 kernel worked. But it does not actually boot on the new servers, because the old version of ata_piix does not work with the newer onboard SATA controller. So we would need a newer ata_piix and a newer MPT Fusion FC driver on the 2.6.26 kernel. I don't know whether that's feasible. It's a production machine and I need to be careful with testing; I can only test with the customer's agreement. For a defined test case it might be workable, though. Would it be as easy as replacing the driver source directories with a newer version? From 2.6.26 to 2.6.32 is quite a step.

> This will help to understand where things are broken.

I understand.

> FYI,
> MPTFC drive is highly in mentionation mode. There are very very minimal
> changes
> happened to MPTFC driver since 2008.
>
> Last change went to upstream for MPTFC is
>
> http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commit;h=03cb3829e0e5650518ce37e2b4420a35e034dc9e

I don't think that commit made it into 2.6.32, which Linus released on 3 December 2009. It also does not seem to be in any of the stable patches:

    ms@mango:~/Linux/Kernel/Mainline> ls ChangeLog-2.6.32*
    ChangeLog-2.6.32     ChangeLog-2.6.32.16  ChangeLog-2.6.32.3
    ChangeLog-2.6.32.1   ChangeLog-2.6.32.17  ChangeLog-2.6.32.4
    ChangeLog-2.6.32.10  ChangeLog-2.6.32.18  ChangeLog-2.6.32.5
    ChangeLog-2.6.32.11  ChangeLog-2.6.32.19  ChangeLog-2.6.32.6
    ChangeLog-2.6.32.12  ChangeLog-2.6.32.2   ChangeLog-2.6.32.7
    ChangeLog-2.6.32.13  ChangeLog-2.6.32.20  ChangeLog-2.6.32.8
    ChangeLog-2.6.32.14  ChangeLog-2.6.32.21  ChangeLog-2.6.32.9
    ChangeLog-2.6.32.15  ChangeLog-2.6.32.22
    ms@mango:~/Linux/Kernel/Mainline> grep 03cb3829e0e5650518ce37e2b4420a35e034dc9e ChangeLog-2.6.32*
    ms@mango:~/Linux/Kernel/Mainline#1>

Thanks,
Martin

Since the issue is seen on a production system with an MPTFC controller, I would recommend that the customer report it through the LSI support channel.

Thanks,
Kashyap
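Martin's two `modinfo mptfc` listings (3.04.06 from 2.6.26, 3.04.12 from 2.6.32) can be compared mechanically to see which module metadata changed between the builds. Below is a minimal Python sketch, assuming modinfo's `key: value` line format. The embedded listings are abbreviated copies of the ones in this report, and `parse_modinfo` and `diff_modinfo` are hypothetical helpers, not an existing kernel tool.

```python
from __future__ import annotations

# Abbreviated copies of the two listings from this report
# (filename, license, description, author and parm lines omitted).
OLD = """
version:        3.04.06
srcversion:     F3D99FE0544BDDD1455BAAA
alias:          pci:v00001657d00000646sv*sd*bc*sc*i*
alias:          pci:v00001000d00000646sv*sd*bc*sc*i*
alias:          pci:v00001000d00000640sv*sd*bc*sc*i*
alias:          pci:v00001000d00000642sv*sd*bc*sc*i*
alias:          pci:v00001000d00000626sv*sd*bc*sc*i*
alias:          pci:v00001000d00000628sv*sd*bc*sc*i*
alias:          pci:v00001000d00000622sv*sd*bc*sc*i*
alias:          pci:v00001000d00000624sv*sd*bc*sc*i*
alias:          pci:v00001000d00000621sv*sd*bc*sc*i*
depends:        mptscsih,scsi_transport_fc,scsi_mod,mptbase
vermagic:       2.6.26-2-amd64 SMP mod_unload modversions
"""

NEW = """
version:        3.04.12
srcversion:     92E350C096B75A9714B8B0E
alias:          pci:v00001657d00000646sv*sd*bc*sc*i*
alias:          pci:v00001000d00000646sv*sd*bc*sc*i*
alias:          pci:v00001000d00000640sv*sd*bc*sc*i*
alias:          pci:v00001000d00000642sv*sd*bc*sc*i*
alias:          pci:v00001000d00000626sv*sd*bc*sc*i*
alias:          pci:v00001000d00000628sv*sd*bc*sc*i*
alias:          pci:v00001000d00000622sv*sd*bc*sc*i*
alias:          pci:v00001000d00000624sv*sd*bc*sc*i*
alias:          pci:v00001000d00000621sv*sd*bc*sc*i*
depends:        mptscsih,mptbase,scsi_transport_fc,scsi_mod
vermagic:       2.6.32-bpo.5-amd64 SMP mod_unload modversions
"""


def parse_modinfo(text: str) -> dict[str, set[str]]:
    """Collect 'key: value' lines; repeated keys (alias, parm) accumulate in a set."""
    fields: dict[str, set[str]] = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip(), value.strip()
        # depends is one comma-separated list; compare it as an unordered set
        items = value.split(",") if key == "depends" else [value]
        fields.setdefault(key, set()).update(items)
    return fields


def diff_modinfo(old: dict[str, set[str]],
                 new: dict[str, set[str]]) -> dict[str, tuple[set[str], set[str]]]:
    """Return only the keys whose value sets differ between the two listings."""
    return {
        key: (old.get(key, set()), new.get(key, set()))
        for key in sorted(old.keys() | new.keys())
        if old.get(key, set()) != new.get(key, set())
    }


if __name__ == "__main__":
    changes = diff_modinfo(parse_modinfo(OLD), parse_modinfo(NEW))
    for key, (before, after) in changes.items():
        print(f"{key}: {sorted(before)} -> {sorted(after)}")
```

On these excerpts, only `version`, `srcversion` and `vermagic` differ. Both builds carry the same nine PCI aliases and the same dependency set, listed in a different order. This says nothing about code changes inside the driver. It only shows that both builds claim the same PCI device IDs.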