Bug 18652
Summary: | mptscsih: ioc0: attempting task abort when heavy disk operations on MPT SAS | ||
---|---|---|---|
Product: | SCSI Drivers | Reporter: | ksb |
Component: | Other | Assignee: | scsi_drivers-other |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, bigplum, eric.hidle, jamireste, quintin, tobias.pal, wangdb |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.35.4 - 2.6.36.1 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
ksb
2010-09-16 08:14:46 UTC
I have the same issue, and I am testing the patch from http://lkml.org/lkml/2010/4/26/335 It seems resolved, try it. Tested again with 2.6.35.7 where the patch already included. Unfortunately nothing changed - errors on heavy writes. Test cases like smartctl don't trigger any errors. Further tests will be with other sata hdd's. Yes, this patch not functioned completely. After I apply this patch with 2.6.18.8-xen and run for 2 days, IO still hang every 1~3 hours. It's more less than before. The message shows: mptscsih: ioc0: attempting task abort! (sc=ffff88028efdcb40) sd 4:0:6:0: command: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) mptscsih: ioc0: task abort: SUCCESS (sc=ffff88028efdcb40) mptscsih: ioc0: attempting task abort! (sc=ffff88028f12e080) sd 4:0:6:0: command: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) mptscsih: ioc0: task abort: SUCCESS (sc=ffff88028f12e080) Which harddisk ary you used, ST31500341AS? Relate bug: https://bugzilla.kernel.org/show_bug.cgi?id=13594 As I said before - happens only on WD'c, exactly WD2500AAKS. It don't happen on other older hdd's, i.e. WDC WD2500KS. HDD's are connected to 6-Drive Hot-Swap Expander Board AXX6DRV3GEXP, but when I attach SATA hdd's directly to mainboard - everything works ok, without any errors. I too am seeing this behavior on 2.6.26 after applying the patch from: http://lkml.org/lkml/2010/4/26/335 - The drives we're using are: WD5000BEKT Sometimes we have 1 drive drop from the array, other times multiple. The messages from the latest were: [2149681.954263] EXT3 FS on dm-1, internal journal [2149681.954396] EXT3-fs: mounted filesystem with ordered data mode. [2150378.948488] mptscsih: ioc0: attempting task abort! (sc=ffff88000b13b580) [2150378.948589] sd 1:0:6:0: [sdh] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 [2150383.337348] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) [2150383.337806] mptsas: ioc0: removing sata device, channel 0, id 6, phy 6 [2150383.337902] port-1:6: mptsas: ioc0: delete port (6) [2150383.338043] sd 1:0:6:0: [sdh] Synchronizing SCSI cache [2150383.554191] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88000b13b580) [2150383.554292] mptscsih: ioc0: attempting target reset! (sc=ffff88000b13b580) [2150383.554372] sd 1:0:6:0: [sdh] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 [2150383.810191] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88000b13b580) [2150383.810288] mptscsih: ioc0: attempting bus reset! (sc=ffff88000b13b580) [2150383.810367] sd 1:0:6:0: [sdh] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00 [2150384.578124] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff88000b13b580) [2150394.582457] mptscsih: ioc0: attempting host reset! (sc=ffff88000b13b580) [2150394.582555] mptbase: ioc0: Initiating recovery [2150415.384673] mptscsih: ioc0: host reset: SUCCESS (sc=ffff88000b13b580) [2150415.384773] sd 1:0:6:0: Device offlined - not ready after error recovery [2150415.384865] end_request: I/O error, dev sdh, sector 976767931 [2150415.384944] md: super_written gets error=-5, uptodate=0 [2150415.385021] raid10: Disk failure on sdh6, disabling device. [2150415.385022] raid10: Operation continuing on 7 devices. [2150415.385223] raid10: sdh6: rescheduling sector 439149440 [2150415.385302] raid10: sdh6: rescheduling sector 439149456 [2150415.385378] raid10: sdh6: rescheduling sector 439149464 [2150415.385454] raid10: sdh6: rescheduling sector 439149472 [2150415.385530] raid10: sdh6: rescheduling sector 439149480 [2150415.385606] raid10: sdh6: rescheduling sector 439149488 [2150415.385682] raid10: sdh6: rescheduling sector 439149496 [2150415.385758] raid10: sdh6: rescheduling sector 439149504 [2150415.385834] raid10: sdh6: rescheduling sector 439149512 [2150415.385910] raid10: sdh6: rescheduling sector 439149520 [2150415.385990] end_request: I/O error, dev sdh, sector 20000703 [2150415.386067] md: super_written gets error=-5, uptodate=0 [2150415.386143] raid1: Disk failure on sdh1, disabling device. [2150415.386144] raid1: Operation continuing on 7 devices. [2150415.386422] sd 1:0:6:0: [sdh] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK [2150415.760573] raid10: Disk failure on sdh5, disabling device. [2150415.760576] raid10: Operation continuing on 7 devices. [2150415.786990] RAID1 conf printout: [2150415.787080] --- wd:7 rd:8 [2150415.787151] disk 0, wo:0, o:1, dev:sdb1 [2150415.787225] disk 1, wo:0, o:1, dev:sdc1 [2150415.787298] disk 2, wo:0, o:1, dev:sdd1 [2150415.787381] disk 3, wo:0, o:1, dev:sde1 [2150415.787455] disk 4, wo:0, o:1, dev:sdf1 [2150415.787528] disk 5, wo:0, o:1, dev:sdg1 [2150415.787611] disk 6, wo:1, o:0, dev:sdh1 [2150415.787664] disk 7, wo:0, o:1, dev:sdi1 [2150415.796490] RAID1 conf printout: [2150415.796581] --- wd:7 rd:8 [2150415.796653] disk 0, wo:0, o:1, dev:sdb1 [2150415.796727] disk 1, wo:0, o:1, dev:sdc1 [2150415.796800] disk 2, wo:0, o:1, dev:sdd1 [2150415.796873] disk 3, wo:0, o:1, dev:sde1 [2150415.796958] disk 4, wo:0, o:1, dev:sdf1 [2150415.797032] disk 5, wo:0, o:1, dev:sdg1 [2150415.797105] disk 7, wo:0, o:1, dev:sdi1 [2150415.813487] RAID10 conf printout: [2150415.813573] --- wd:7 rd:8 [2150415.813645] disk 0, wo:0, o:1, dev:sdb6 [2150415.813730] disk 1, wo:0, o:1, dev:sdc6 [2150415.813803] disk 2, wo:0, o:1, dev:sdd6 [2150415.813885] disk 3, wo:0, o:1, dev:sde6 [2150415.813970] disk 4, wo:0, o:1, dev:sdf6 [2150415.814044] disk 5, wo:0, o:1, dev:sdg6 [2150415.814117] disk 6, wo:1, o:0, dev:sdh6 [2150415.814200] disk 7, wo:0, o:1, dev:sdi6 [2150415.815864] RAID10 conf printout: [2150415.815937] --- wd:7 rd:8 [2150415.816009] disk 0, wo:0, o:1, dev:sdb6 [2150415.816082] disk 1, wo:0, o:1, dev:sdc6 [2150415.816156] disk 2, wo:0, o:1, dev:sdd6 [2150415.816229] disk 3, wo:0, o:1, dev:sde6 [2150415.816302] disk 4, wo:0, o:1, dev:sdf6 [2150415.821507] disk 5, wo:0, o:1, dev:sdg6 [2150415.821580] disk 7, wo:0, o:1, dev:sdi6 [2150415.843762] RAID10 conf printout: [2150415.843839] --- wd:7 rd:8 [2150415.843911] disk 0, wo:0, o:1, dev:sdb5 [2150415.843984] disk 1, wo:0, o:1, dev:sdc5 [2150415.844058] disk 2, wo:0, o:1, dev:sdd5 [2150415.844131] disk 3, wo:0, o:1, dev:sde5 [2150415.844204] disk 4, wo:0, o:1, dev:sdf5 [2150415.844278] disk 5, wo:0, o:1, dev:sdg5 [2150415.844351] disk 6, wo:1, o:0, dev:sdh5 [2150415.844424] disk 7, wo:0, o:1, dev:sdi5 [2150415.853059] RAID10 conf printout: [2150415.853135] --- wd:7 rd:8 [2150415.853206] disk 0, wo:0, o:1, dev:sdb5 [2150415.853279] disk 1, wo:0, o:1, dev:sdc5 [2150415.853352] disk 2, wo:0, o:1, dev:sdd5 [2150415.853426] disk 3, wo:0, o:1, dev:sde5 [2150415.853499] disk 4, wo:0, o:1, dev:sdf5 [2150415.853572] disk 5, wo:0, o:1, dev:sdg5 [2150415.853645] disk 7, wo:0, o:1, dev:sdi5 [2150450.460260] mptsas: ioc0: attaching sata device, channel 0, id 6, phy 6 [2150450.466970] scsi 1:0:8:0: Direct-Access ATA WDC WD5000BEKT-6 1A01 PQ: 0 ANSI: 5 [2150450.468312] sd 1:0:8:0: [sdj] 976773168 512-byte hardware sectors (500108 MB) [2150450.475724] sd 1:0:8:0: [sdj] Write Protect is off [2150450.475802] sd 1:0:8:0: [sdj] Mode Sense: 73 00 00 08 [2150450.481065] sd 1:0:8:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [2150450.481385] sd 1:0:8:0: [sdj] 976773168 512-byte hardware sectors (500108 MB) [2150450.489103] sd 1:0:8:0: [sdj] Write Protect is off [2150450.489179] sd 1:0:8:0: [sdj] Mode Sense: 73 00 00 08 [2150450.494628] sd 1:0:8:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [2150450.494731] sdj: sdj1 sdj2 < sdj5 sdj6 > [2150450.831656] sd 1:0:8:0: [sdj] Attached SCSI disk Upon further testing the drive turns out to be fine. I have hardware available & are happy to test patches / perform additional debugging if required. After testing on 2.6.36 situation seems to be slightly better. Probably it happens little bit rarer, but the problem still exists. I test it on 2.6.36.4 with HitachiHDE72101, it has the same issue. sd 7:0:9:0: [sdj] CDB: cdb[0]=0x28: 28 00 00 18 ac 00 00 01 c8 00 mptbase: ioc0: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700) cb_idx mptbase_reply mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000) cb_idx mptscsih_io_done mptscsih: ioc0: task abort: SUCCESS (sc=ffff8801937cd1c0) mptbase: ioc0: LogInfo(0x31110700): Originator={PL}, Code={Reset}, SubCode(0x0700) cb_idx mptscsih_io_done mptscsih: ioc0: attempting task abort! (sc=ffff8801e562d4c0) sd 7:0:9:0: [sdj] CDB: cdb[0]=0x2a: 2a 00 00 18 ab 00 00 01 00 00 mptscsih: ioc0: task abort: SUCCESS (sc=ffff8801e562d4c0) mptscsih: ioc0: attempting task abort! (sc=ffff88021eb6c880) ............... And i saw the /var/log/messages, there were some log: Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1627904 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1628160 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1628920 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1629096 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1630120 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1631144 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619456 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1632168 Aug 3 12:00:13 nss12345678 kernel: Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1616384 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620992 Aug 3 12:00:13 nss12345678 kernel: 00 00 18 ae 00 00 04 00 00 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1617408 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1618432 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619808 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619968 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619984 Aug 3 12:00:13 nss12345678 kernel: 00 18 b8 18 00 00 e8 00 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1619992 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620224 Aug 3 12:00:13 nss12345678 kernel: 00 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620480 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1620736 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1621248 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1621504 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1621760 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1622784 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1623808 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1624832 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1625856 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1626880 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 1627648 Aug 3 12:00:13 nss12345678 kernel: end_request: I/O error, dev sdb, sector 16 Aug 3 12:00:13 nss12345678 kernel: md: super_written gets error=-5, uptodate=0 Aug 3 12:00:13 nss12345678 kernel: md/raid:md127: Disk failure on sdb, disabling device. But it was ok when test it with smartctl I have this issue as well (Bug #42765) in Linux on VMWare, but the patch from lkml is not in ESXi 5.0. I created a VIB to patch the ESXi scsi-mptsas driver and will follow up when I have time to test it. Mine occurs during high disk write utilization, seemingly independent of what disk is attached to the controller. |