Bug 18652 - mptscsih: ioc0: attempting task abort when heavy disk operations on MPT SAS
mptscsih: ioc0: attempting task abort when heavy disk operations on MPT SAS
Status: RESOLVED OBSOLETE
Product: SCSI Drivers
Classification: Unclassified
Component: Other
All Linux
: P1 normal
Assigned To: scsi_drivers-other
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-09-16 08:14 UTC by ksb
Modified: 2013-12-10 22:06 UTC (History)
7 users (show)

See Also:
Kernel Version: 2.6.35.4 - 2.6.36.1
Tree: Mainline
Regression: No


Attachments

Description ksb 2010-09-16 08:14:46 UTC
Happens when heavy disk write operations ongoing:
[ 4499.860030] mptscsih: ioc0: attempting task abort! (sc=ffff88007a588200)
[ 4499.860036] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dc f8 9f 00 04 00 00
[ 4499.894551] mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403) cb_idx mptbase_reply
[ 4501.256258] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_do
[ 4501.268298] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007a588200)
[ 4503.256426] mptbase: ioc0: LogInfo(0x31120403): Originator={PL}, Code={Abort}, SubCode(0x0403) cb_idx mptscsih_io_done
[ 4503.256439] mptscsih: ioc0: attempting task abort! (sc=ffff88007ab5cc00)
[ 4503.256443] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dc fc 9f 00 04 00 00
[ 4503.256455] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007ab5cc00)
[ 4503.506394] mptscsih: ioc0: attempting task abort! (sc=ffff88007a588000)
[ 4503.506399] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dd 00 9f 00 04 00 00
[ 4503.506412] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007a588000)
[ 4503.506698] mptscsih: ioc0: attempting task abort! (sc=ffff88007ab5c700)
[ 4503.506702] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dd 04 9f 00 02 50 00
[ 4503.506715] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007ab5c700)
[ 4503.506999] mptscsih: ioc0: attempting task abort! (sc=ffff88007a589600)
[ 4503.507002] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dd 06 f7 00 01 80 00
[ 4503.507014] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007a589600)

[.. a lot of them, and ends with..]

[ 4503.525540] mptscsih: ioc0: attempting task abort! (sc=ffff88007becbe00)
[ 4503.525543] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 10 4c 78 ff 00 01 50 00
[ 4503.525555] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88007becbe00)
[ 4503.525858] mptscsih: ioc0: attempting target reset! (sc=ffff88007a588200)
[ 4503.525861] sd 4:0:0:0: [sda] CDB: Write(10): 2a 00 0f dc f8 9f 00 04 00 00
[ 4505.006219] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88007a588200)
Happens when heavy disk write operations ongoing.
Identically on ubuntu's stock 2.6.32-24 and also on custom built 2.6.35.4 and
2.6.36-rc3 kernels.

following on 2.6.35.4 kernel:
# cat /proc/mpt/version
mptlinux-3.04.15
  Fusion MPT base driver
  Fusion MPT SAS host driver

# cat /proc/mpt/summary
ioc0: LSISAS1064E B2, FwRev=01140000h, Ports=1, MaxQ=511, IRQ=17 

# cat /proc/mpt/ioc0/info
ioc0:
  ProductID = 0x2204 (LSISAS1064E B2)
  FWVersion = 0x01140000
  MsgVersion = 0x0105
  FirstWhoInit = 0x00
  EventState = 0x00
  CurrentHostMfaHighAddr = 0x00000000
  CurrentSenseBufferHighAddr = 0x00000000
  MaxChainDepth = 0x60 frames
  MinBlockSize = 0x20 bytes
  RequestFrames @ 0xffff88007a402800 (Dma @ 0x000000007a402800)
    {CurReqSz=128} x {CurReqDepth=511} = 65408 bytes ^= 0x10000
    {MaxReqSz=128}   {MaxReqDepth=511}
  Frames   @ 0xffff88007a400000 (Dma @ 0x000000007a400000)
    {CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
    {MaxRepSz=0}   {MaxRepDepth=511}
  MaxDevices = 173
  MaxBuses = 1
  PortNumber = 1 (of 1)

# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: TSSTcorp Model: DVD-ROM SH-D162D Rev: SB00
  Type:   CD-ROM                           ANSI  SCSI revision: 05
Host: scsi4 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD2500AAKS-0 Rev: 1B01
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi4 Channel: 00 Id: 01 Lun: 00
  Vendor: ATA      Model: WDC WD2500AAKS-0 Rev: 1B01
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi4 Channel: 00 Id: 02 Lun: 00
  Vendor: FUJITSU  Model: MAX3147RC        Rev: 0104
  Type:   Direct-Access                    ANSI  SCSI revision: 03
Host: scsi4 Channel: 00 Id: 03 Lun: 00
  Vendor: FUJITSU  Model: MAX3147RC        Rev: 0104
  Type:   Direct-Access                    ANSI  SCSI revision: 03
Host: scsi4 Channel: 00 Id: 04 Lun: 00
  Vendor: ESG-SHV. Model: SCA HSBP M1..... Rev: 2.15
  Type:   Enclosure                        ANSI  SCSI revision: 03

Happens at least on WDC's (SATA).
Comment 1 pipa.tk 2010-10-13 07:31:36 UTC
I have the same issue, and I am testing the patch from http://lkml.org/lkml/2010/4/26/335

It seems resolved, try it.
Comment 2 ksb 2010-10-14 14:09:18 UTC
Tested again with 2.6.35.7 where the patch already included. Unfortunately nothing changed - errors on heavy writes. Test cases like smartctl don't trigger any errors.
Further tests will be with other sata hdd's.
Comment 3 pipa.tk 2010-10-15 01:10:37 UTC
Yes, this patch not functioned completely. After I apply this patch with 2.6.18.8-xen and run for 2 days, IO still hang every 1~3 hours. It's more less than before. The message shows: 

mptscsih: ioc0: attempting task abort! (sc=ffff88028efdcb40)
sd 4:0:6:0:
        command: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88028efdcb40)
mptscsih: ioc0: attempting task abort! (sc=ffff88028f12e080)
sd 4:0:6:0:
        command: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff88028f12e080)
Comment 4 pipa.tk 2010-10-29 03:34:31 UTC
Which harddisk ary you used, ST31500341AS?

Relate bug: https://bugzilla.kernel.org/show_bug.cgi?id=13594
Comment 5 ksb 2010-11-04 09:59:46 UTC
As I said before - happens only on WD'c, exactly WD2500AAKS. It don't happen on other older hdd's, i.e. WDC WD2500KS.
HDD's are connected to 6-Drive Hot-Swap Expander Board AXX6DRV3GEXP, but when I attach SATA hdd's directly to mainboard - everything works ok, without any errors.
Comment 6 quintin 2010-11-06 11:51:34 UTC
I too am seeing this behavior on 2.6.26 after applying the patch from: http://lkml.org/lkml/2010/4/26/335 - The drives we're using are: WD5000BEKT

Sometimes we have 1 drive drop from the array, other times multiple. The messages from the latest were:

[2149681.954263] EXT3 FS on dm-1, internal journal
[2149681.954396] EXT3-fs: mounted filesystem with ordered data mode.
[2150378.948488] mptscsih: ioc0: attempting task abort! (sc=ffff88000b13b580)
[2150378.948589] sd 1:0:6:0: [sdh] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[2150383.337348] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[2150383.337806] mptsas: ioc0: removing sata device, channel 0, id 6, phy 6
[2150383.337902]  port-1:6: mptsas: ioc0: delete port (6)
[2150383.338043] sd 1:0:6:0: [sdh] Synchronizing SCSI cache
[2150383.554191] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88000b13b580)
[2150383.554292] mptscsih: ioc0: attempting target reset! (sc=ffff88000b13b580)
[2150383.554372] sd 1:0:6:0: [sdh] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[2150383.810191] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88000b13b580)
[2150383.810288] mptscsih: ioc0: attempting bus reset! (sc=ffff88000b13b580)
[2150383.810367] sd 1:0:6:0: [sdh] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[2150384.578124] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff88000b13b580)
[2150394.582457] mptscsih: ioc0: attempting host reset! (sc=ffff88000b13b580)
[2150394.582555] mptbase: ioc0: Initiating recovery
[2150415.384673] mptscsih: ioc0: host reset: SUCCESS (sc=ffff88000b13b580)
[2150415.384773] sd 1:0:6:0: Device offlined - not ready after error recovery
[2150415.384865] end_request: I/O error, dev sdh, sector 976767931
[2150415.384944] md: super_written gets error=-5, uptodate=0
[2150415.385021] raid10: Disk failure on sdh6, disabling device.
[2150415.385022] raid10: Operation continuing on 7 devices.
[2150415.385223] raid10: sdh6: rescheduling sector 439149440
[2150415.385302] raid10: sdh6: rescheduling sector 439149456
[2150415.385378] raid10: sdh6: rescheduling sector 439149464
[2150415.385454] raid10: sdh6: rescheduling sector 439149472
[2150415.385530] raid10: sdh6: rescheduling sector 439149480
[2150415.385606] raid10: sdh6: rescheduling sector 439149488
[2150415.385682] raid10: sdh6: rescheduling sector 439149496
[2150415.385758] raid10: sdh6: rescheduling sector 439149504
[2150415.385834] raid10: sdh6: rescheduling sector 439149512
[2150415.385910] raid10: sdh6: rescheduling sector 439149520
[2150415.385990] end_request: I/O error, dev sdh, sector 20000703
[2150415.386067] md: super_written gets error=-5, uptodate=0
[2150415.386143] raid1: Disk failure on sdh1, disabling device.
[2150415.386144] raid1: Operation continuing on 7 devices.
[2150415.386422] sd 1:0:6:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
[2150415.760573] raid10: Disk failure on sdh5, disabling device.
[2150415.760576] raid10: Operation continuing on 7 devices.
[2150415.786990] RAID1 conf printout:
[2150415.787080]  --- wd:7 rd:8
[2150415.787151]  disk 0, wo:0, o:1, dev:sdb1
[2150415.787225]  disk 1, wo:0, o:1, dev:sdc1
[2150415.787298]  disk 2, wo:0, o:1, dev:sdd1
[2150415.787381]  disk 3, wo:0, o:1, dev:sde1
[2150415.787455]  disk 4, wo:0, o:1, dev:sdf1
[2150415.787528]  disk 5, wo:0, o:1, dev:sdg1
[2150415.787611]  disk 6, wo:1, o:0, dev:sdh1
[2150415.787664]  disk 7, wo:0, o:1, dev:sdi1
[2150415.796490] RAID1 conf printout:
[2150415.796581]  --- wd:7 rd:8
[2150415.796653]  disk 0, wo:0, o:1, dev:sdb1
[2150415.796727]  disk 1, wo:0, o:1, dev:sdc1
[2150415.796800]  disk 2, wo:0, o:1, dev:sdd1
[2150415.796873]  disk 3, wo:0, o:1, dev:sde1
[2150415.796958]  disk 4, wo:0, o:1, dev:sdf1
[2150415.797032]  disk 5, wo:0, o:1, dev:sdg1
[2150415.797105]  disk 7, wo:0, o:1, dev:sdi1
[2150415.813487] RAID10 conf printout:
[2150415.813573]  --- wd:7 rd:8
[2150415.813645]  disk 0, wo:0, o:1, dev:sdb6
[2150415.813730]  disk 1, wo:0, o:1, dev:sdc6
[2150415.813803]  disk 2, wo:0, o:1, dev:sdd6
[2150415.813885]  disk 3, wo:0, o:1, dev:sde6
[2150415.813970]  disk 4, wo:0, o:1, dev:sdf6
[2150415.814044]  disk 5, wo:0, o:1, dev:sdg6
[2150415.814117]  disk 6, wo:1, o:0, dev:sdh6
[2150415.814200]  disk 7, wo:0, o:1, dev:sdi6
[2150415.815864] RAID10 conf printout:
[2150415.815937]  --- wd:7 rd:8
[2150415.816009]  disk 0, wo:0, o:1, dev:sdb6
[2150415.816082]  disk 1, wo:0, o:1, dev:sdc6
[2150415.816156]  disk 2, wo:0, o:1, dev:sdd6
[2150415.816229]  disk 3, wo:0, o:1, dev:sde6
[2150415.816302]  disk 4, wo:0, o:1, dev:sdf6
[2150415.821507]  disk 5, wo:0, o:1, dev:sdg6
[2150415.821580]  disk 7, wo:0, o:1, dev:sdi6
[2150415.843762] RAID10 conf printout:
[2150415.843839]  --- wd:7 rd:8
[2150415.843911]  disk 0, wo:0, o:1, dev:sdb5
[2150415.843984]  disk 1, wo:0, o:1, dev:sdc5
[2150415.844058]  disk 2, wo:0, o:1, dev:sdd5
[2150415.844131]  disk 3, wo:0, o:1, dev:sde5
[2150415.844204]  disk 4, wo:0, o:1, dev:sdf5
[2150415.844278]  disk 5, wo:0, o:1, dev:sdg5
[2150415.844351]  disk 6, wo:1, o:0, dev:sdh5
[2150415.844424]  disk 7, wo:0, o:1, dev:sdi5
[2150415.853059] RAID10 conf printout:
[2150415.853135]  --- wd:7 rd:8
[2150415.853206]  disk 0, wo:0, o:1, dev:sdb5
[2150415.853279]  disk 1, wo:0, o:1, dev:sdc5
[2150415.853352]  disk 2, wo:0, o:1, dev:sdd5
[2150415.853426]  disk 3, wo:0, o:1, dev:sde5
[2150415.853499]  disk 4, wo:0, o:1, dev:sdf5
[2150415.853572]  disk 5, wo:0, o:1, dev:sdg5
[2150415.853645]  disk 7, wo:0, o:1, dev:sdi5
[2150450.460260] mptsas: ioc0: attaching sata device, channel 0, id 6, phy 6
[2150450.466970] scsi 1:0:8:0: Direct-Access     ATA      WDC WD5000BEKT-6 1A01 PQ: 0 ANSI: 5
[2150450.468312] sd 1:0:8:0: [sdj] 976773168 512-byte hardware sectors (500108 MB)
[2150450.475724] sd 1:0:8:0: [sdj] Write Protect is off
[2150450.475802] sd 1:0:8:0: [sdj] Mode Sense: 73 00 00 08
[2150450.481065] sd 1:0:8:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[2150450.481385] sd 1:0:8:0: [sdj] 976773168 512-byte hardware sectors (500108 MB)
[2150450.489103] sd 1:0:8:0: [sdj] Write Protect is off
[2150450.489179] sd 1:0:8:0: [sdj] Mode Sense: 73 00 00 08
[2150450.494628] sd 1:0:8:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[2150450.494731]  sdj: sdj1 sdj2 < sdj5 sdj6 >
[2150450.831656] sd 1:0:8:0: [sdj] Attached SCSI disk

Upon further testing the drive turns out to be fine. I have hardware available & are happy to test patches / perform additional debugging if required.
Comment 7 ksb 2010-11-10 20:53:25 UTC
After testing on 2.6.36 situation seems to be slightly better. Probably it happens little bit rarer, but the problem still exists.
Comment 8 Daobang Wang 2011-08-03 06:31:49 UTC
I test it on 2.6.36.4 with HitachiHDE72101, it has the same issue.

sd 7:0:9:0: [sdj] CDB: cdb[0]=0x28: 28 00 00 18 ac 00 00 01 c8 00
mptbase: ioc0: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700) cb_idx mptbase_reply
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8801937cd1c0)
mptbase: ioc0: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700) cb_idx mptscsih_io_done
mptscsih: ioc0: attempting task abort! (sc=ffff8801e562d4c0)
sd 7:0:9:0: [sdj] CDB: cdb[0]=0x2a: 2a 00 00 18 ab 00 00 01 00 00
mptscsih: ioc0: task abort: SUCCESS (sc=ffff8801e562d4c0)
mptscsih: ioc0: attempting task abort! (sc=ffff88021eb6c880)
...............

And i saw the /var/log/messages, there were some log:

Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1627904
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1628160
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1628920
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1629096
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1630120
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1631144
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619456
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1632168
Aug  3 12:00:13 nss12345678 kernel: 
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1616384
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620992
Aug  3 12:00:13 nss12345678 kernel:  00 00 18 ae 00 00 04 00 00
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1617408
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1618432
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619808
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619968
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619984
Aug  3 12:00:13 nss12345678 kernel:  00 18 b8 18 00 00 e8 00
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619992
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620224
Aug  3 12:00:13 nss12345678 kernel:  00
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620480
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620736
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1621248
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1621504
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1621760
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1622784
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1623808
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1624832
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1625856
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1626880
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1627648
Aug  3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 16
Aug  3 12:00:13 nss12345678 kernel: md: super_written gets error=-5, uptodate=0
Aug  3 12:00:13 nss12345678 kernel: md/raid:md127: Disk failure on sdb, disabling device.
But it was ok when test it with smartctl
Comment 9 Eric 2012-02-14 18:52:44 UTC
I have this issue as well (Bug #42765) in Linux on VMWare, but the patch from lkml is not in ESXi 5.0. I created a VIB to patch the ESXi scsi-mptsas driver and will follow up when I have time to test it.

Mine occurs during high disk write utilization, seemingly independent of what disk is attached to the controller.

Note You need to log in before you can comment on or make changes to this bug.