Bug 15573

Summary: marvell 9123 sata ahci initialization errors
Product: IO/Storage
Reporter: lethalwp
Component: Serial ATA
Assignee: Tejun Heo (tj)
Status: RESOLVED CODE_FIX
Severity: normal
CC: crashbit, laurent, lmacken, matrix.use.linux, moojix, phodyssey, postillion, thomas, tj, willp
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.34-rc1-next-20100318+
Subsystem:
Regression: No
Bisected commit-id:
Attachments: grep ata7 from /var/log/messages to show the various (but same) errors
Patch that adds the PCI ID of the Marvell 9123 sata controller to the ahci driver
disable-fpdma-aa.patch
dmesg with patch from comment #15
A dmesg containing errors (with patch from comment #15 applied)

Description lethalwp 2010-03-18 12:09:19 UTC
Created attachment 25590
grep ata7 from /var/log/messages to show the various (but same) errors

Hello,


I've plugged a Crucial RealSSD into a Marvell 9123 SATA3 6 Gbps chip (motherboard: ASUS P7H57D-V EVO). I get some errors and retries at initialisation (this hangs the boot for about a minute), and after the errors it finally works.

The SSD has been tried on an Intel SATA controller and worked flawlessly, so the problem must be in the Marvell chip with the generic ahci module. But I haven't tried any other HDD on the Marvell.

Distro: Fedora 13 (alpha), with an upgraded kernel to check the problem.

If more information is needed, I'd be happy to provide it =)

The main error seems to be "ata7.00: device reported invalid CHS sector 0": a command is sent, but the controller replies with what looks like all zeros (00000?).



Here is part of my /var/log/messages (in this case, sdb is the SSD, and ata7 and ata8 are the Marvell chip's ports):

Mar 18 12:31:14 little kernel: dracut: Mounted root filesystem /dev/sdb5
Mar 18 12:31:14 little kernel: ata9.00: ATAPI: PIONEER DVD-RW  DVR-112D, 1.24, max UDMA/66
Mar 18 12:31:14 little kernel: ata9.00: configured for UDMA/66
Mar 18 12:31:14 little kernel: scsi 8:0:0:0: CD-ROM            PIONEER  DVD-RW  DVR-112D 1.24 PQ: 0 ANSI: 5
Mar 18 12:31:14 little kernel: dracut: Switching root
Mar 18 12:31:14 little kernel: sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
Mar 18 12:31:14 little kernel: Uniform CD-ROM driver Revision: 3.20
Mar 18 12:31:14 little kernel: sr 8:0:0:0: Attached scsi generic sg2 type 5
Mar 18 12:31:14 little kernel: ata7.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
Mar 18 12:31:14 little kernel: ata7.00: failed command: READ FPDMA QUEUED
Mar 18 12:31:14 little kernel: ata7.00: cmd 60/08:08:4e:f9:47/00:00:09:00:00/40 tag 1 ncq 4096 in
Mar 18 12:31:14 little kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 18 12:31:14 little kernel: ata7.00: status: { DRDY }
Mar 18 12:31:14 little kernel: ata7: hard resetting link
Mar 18 12:31:14 little kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
Mar 18 12:31:14 little kernel: ata7.00: configured for UDMA/133
Mar 18 12:31:14 little kernel: ata7.00: device reported invalid CHS sector 0
Mar 18 12:31:14 little kernel: ata7: EH complete
Mar 18 12:31:14 little kernel: ata7.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x6 frozen
Mar 18 12:31:14 little kernel: ata7.00: failed command: READ FPDMA QUEUED
Mar 18 12:31:14 little kernel: ata7.00: cmd 60/08:08:4e:ee:47/00:00:08:00:00/40 tag 1 ncq 4096 in
Mar 18 12:31:14 little kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 18 12:31:14 little kernel: ata7.00: status: { DRDY }
Mar 18 12:31:14 little kernel: ata7.00: failed command: READ FPDMA QUEUED
Mar 18 12:31:14 little kernel: ata7.00: cmd 60/20:10:c6:4c:4b/00:00:07:00:00/40 tag 2 ncq 16384 in
Mar 18 12:31:14 little kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 18 12:31:14 little kernel: ata7.00: status: { DRDY }
Mar 18 12:31:14 little kernel: ata7: hard resetting link
Mar 18 12:31:14 little kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
Mar 18 12:31:14 little kernel: ata7.00: configured for UDMA/133
Mar 18 12:31:14 little kernel: ata7.00: device reported invalid CHS sector 0
Mar 18 12:31:14 little kernel: ata7.00: device reported invalid CHS sector 0
Mar 18 12:31:14 little kernel: ata7: EH complete
<snip>
Mar 18 12:31:14 little kernel: ata7.00: device reported invalid CHS sector 0
Mar 18 12:31:14 little kernel: ata7: EH complete
Mar 18 12:31:14 little kernel: udev: starting version 151
Mar 18 12:31:14 little kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Mar 18 12:31:14 little kernel: r8169 0000:01:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
<rest of the boot...>



All ata7-related lines grepped from /var/log/messages are in the attached file.
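The `cmd` lines in the failed-command reports above can be decoded by hand. A minimal sketch, assuming libata prints the taskfile as `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`. That field order is an assumption, but it agrees with the `tag N ncq BYTES` suffix on every line: for READ/WRITE FPDMA QUEUED (0x60/0x61) the sector count travels in the feature bytes and the NCQ tag in bits 7:3 of the sector-count byte. `decode_fpdma` is a hypothetical helper name.

```python
# Decode the "cmd" taskfile libata prints for NCQ (FPDMA QUEUED) commands.
# Assumed field order, checked against the "tag N ncq BYTES" suffix in the logs:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device

FPDMA_OPS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_fpdma(taskfile: str) -> dict:
    cmd, low, high, device = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(x, 16) for x in high.split(":"))
    command = int(cmd, 16)
    if command not in FPDMA_OPS:
        raise ValueError(f"not an FPDMA QUEUED command: {cmd}")
    # For NCQ commands the sector count lives in FEATURE (low byte) / HOB FEATURE (high byte).
    sectors = (hob_feature << 8) | feature
    # The NCQ queue tag lives in bits 7:3 of SECTOR COUNT.
    tag = nsect >> 3
    # 48-bit LBA: HOB bytes hold the upper 24 bits, the current bytes the lower 24.
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "op": FPDMA_OPS[command],
        "tag": tag,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # 512-byte logical sectors, as hdparm -I reports for this drive
        "lba_bit": bool(int(device, 16) & 0x40),  # device register bit 6 selects LBA addressing
    }


# Usage, with the first failed command from the log above:
print(decode_fpdma("60/08:08:4e:f9:47/00:00:09:00:00/40"))
```

For that line the sketch yields tag 1 and 8 sectors (4096 bytes), matching libata's own `tag 1 ncq 4096 in`.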
Comment 1 lethalwp 2010-03-18 12:11:57 UTC
Forgot to add the PCI IDs, in case they help:
08:00.0 SATA controller: Device 1b4b:9123 (rev 10)





]$ lspci
00:00.0 Host bridge: Intel Corporation Auburndale/Havendale DRAM Controller (rev 12)
00:02.0 VGA compatible controller: Intel Corporation Auburndale/Havendale Integrated Graphics Controller (rev 12)
00:16.0 Communication controller: Intel Corporation Ibex Peak HECI Controller (rev 06)
00:1a.0 USB Controller: Intel Corporation Ibex Peak USB2 Enhanced Host Controller (rev 06)
00:1c.0 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 1 (rev 06)
00:1c.4 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 5 (rev 06)
00:1c.5 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 6 (rev 06)
00:1c.6 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 7 (rev 06)
00:1c.7 PCI bridge: Intel Corporation Ibex Peak PCI Express Root Port 8 (rev 06)
00:1d.0 USB Controller: Intel Corporation Ibex Peak USB2 Enhanced Host Controller (rev 06)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation Ibex Peak LPC Interface Controller (rev 06)
00:1f.2 SATA controller: Intel Corporation Ibex Peak 6 port SATA AHCI Controller (rev 06)
00:1f.3 SMBus: Intel Corporation Ibex Peak SMBus Controller (rev 06)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b2)
05:00.0 PCI bridge: PLX Technology, Inc. Device 8608 (rev ba)
06:01.0 PCI bridge: PLX Technology, Inc. Device 8608 (rev ba)
06:05.0 PCI bridge: PLX Technology, Inc. Device 8608 (rev ba)
06:07.0 PCI bridge: PLX Technology, Inc. Device 8608 (rev ba)
06:09.0 PCI bridge: PLX Technology, Inc. Device 8608 (rev ba)
08:00.0 SATA controller: Device 1b4b:9123 (rev 10)
Comment 2 lethalwp 2010-03-19 14:15:15 UTC
Playing with queue_depth seems to mess up the drive even more. Once:


[root@little device]# cat queue_depth 
31
[root@little device]# hdparm -t /dev/sdb

/dev/sdb:
 Timing buffered disk reads:  624 MB in  3.01 seconds = 207.54 MB/sec
[root@little device]# hdparm -t /dev/sdb

/dev/sdb:
 Timing buffered disk reads:  636 MB in  3.00 seconds = 211.86 MB/sec
[root@little device]# echo 2 > queue_depth 
[root@little device]# hdparm -t /dev/sdb

/dev/sdb:
 Timing buffered disk reads:  112 MB in 31.52 seconds =   3.55 MB/sec
[root@little device]# 

before finally:
/dev/sdb:
 Timing buffered disk reads:  read() failed: Input/output error


In the meantime, while the read rate drops (each timeout takes 20 to 30 seconds), dmesg reads:
ata7.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/00:00:00:79:03/01:00:00:00:00/40 tag 0 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/08:08:f8:78:03/00:00:00:00:00/40 tag 1 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
ata7.00: configured for UDMA/133
ata7.00: device reported invalid CHS sector 0
ata7.00: device reported invalid CHS sector 0
ata7: EH complete
ata7.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/00:00:00:2b:00/01:00:00:00:00/40 tag 0 ncq 131072 in
         res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/08:08:f8:2a:00/00:00:00:00:00/40 tag 1 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
ata7.00: configured for UDMA/133
ata7.00: device reported invalid CHS sector 0
ata7.00: device reported invalid CHS sector 0
ata7: EH complete
ata7.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/f0:00:00:c1:01/00:00:00:00:00/40 tag 0 ncq 122880 in
         res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/10:08:f0:c0:01/00:00:00:00:00/40 tag 1 ncq 8192 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7: hard resetting link
ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
ata7.00: qc timeout (cmd 0xec)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata7.00: revalidation failed (errno=-5)
ata7: hard resetting link
ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
ata7.00: qc timeout (cmd 0xec)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata7.00: revalidation failed (errno=-5)
ata7: limiting SATA link speed to 3.0 Gbps
ata7: hard resetting link
ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata7.00: qc timeout (cmd 0xec)
ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata7.00: revalidation failed (errno=-5)
ata7.00: disabled
ata7.00: device reported invalid CHS sector 0
ata7.00: device reported invalid CHS sector 0
ata7: hard resetting link
ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata7: EH complete
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 01 c0 f0 00 00 10 00
end_request: I/O error, dev sdb, sector 114928
Buffer I/O error on device sdb, logical block 14366
Buffer I/O error on device sdb, logical block 14367
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 01 c1 00 00 00 f0 00
end_request: I/O error, dev sdb, sector 114944
Buffer I/O error on device sdb, logical block 14368
Buffer I/O error on device sdb, logical block 14369
Buffer I/O error on device sdb, logical block 14370
Buffer I/O error on device sdb, logical block 14371
Buffer I/O error on device sdb, logical block 14372
Buffer I/O error on device sdb, logical block 14373
Buffer I/O error on device sdb, logical block 14374
Buffer I/O error on device sdb, logical block 14375
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 01 c1 f0 00 00 10 00
end_request: I/O error, dev sdb, sector 115184
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 09 4e 4c 46 00 00 10 00
end_request: I/O error, dev sdb, sector 156126278
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 09 b8 00 fe 00 00 08 00
end_request: I/O error, dev sdb, sector 163053822
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 0a 96 a6 e6 00 00 08 00
end_request: I/O error, dev sdb, sector 177645286
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 03 c7 b9 be 00 00 08 00
end_request: I/O error, dev sdb, sector 63420862
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 05 51 aa ee 00 00 88 00
end_request: I/O error, dev sdb, sector 89238254
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 09 57 ae ee 00 00 08 00
end_request: I/O error, dev sdb, sector 156741358
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 0a 97 90 d6 00 00 08 00
end_request: I/O error, dev sdb, sector 177705174
Aborting journal on device sdb5-8.
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 0a 97 90 d6 00 00 08 00
end_request: I/O error, dev sdb, sector 177705174
EXT4-fs error (device sdb5): ext4_journal_start_sb: Detected aborted journal
EXT4-fs (sdb5): Remounting filesystem read-only
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 07 62 18 76 00 00 08 00
end_request: I/O error, dev sdb, sector 123869302
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 07 62 18 a6 00 00 08 00
end_request: I/O error, dev sdb, sector 123869350
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 06 61 36 7e 00 00 08 00
end_request: I/O error, dev sdb, sector 107034238
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 06 61 36 8e 00 00 08 00
end_request: I/O error, dev sdb, sector 107034254
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 06 61 36 a6 00 00 08 00
end_request: I/O error, dev sdb, sector 107034278
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 06 61 36 e6 00 00 08 00
end_request: I/O error, dev sdb, sector 107034342
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 06 61 36 e6 00 00 08 00
end_request: I/O error, dev sdb, sector 107034342
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 09 4a 8e e6 00 00 08 00
end_request: I/O error, dev sdb, sector 155881190
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 01 c0 f0 00 00 08 00
end_request: I/O error, dev sdb, sector 114928
JBD2: I/O error detected when updating journal superblock for sdb5-8.
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 09 b8 1b e6 00 00 08 00
end_request: I/O error, dev sdb, sector 163060710
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 03 c7 b9 be 00 00 08 00
end_request: I/O error, dev sdb, sector 63420862
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Write(10): 2a 00 09 57 b6 f6 00 00 08 00
end_request: I/O error, dev sdb, sector 156743414
JBD2: Detected IO errors while flushing file data on sdb5-8
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
end_request: I/O error, dev sdb, sector 0
quiet_error: 61 callbacks suppressed
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
Buffer I/O error on device sdb, logical block 4
Buffer I/O error on device sdb, logical block 5
Buffer I/O error on device sdb, logical block 6
Buffer I/O error on device sdb, logical block 7
Buffer I/O error on device sdb, logical block 8
Buffer I/O error on device sdb, logical block 9
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
end_request: I/O error, dev sdb, sector 0
quiet_error: 55 callbacks suppressed
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
Buffer I/O error on device sdb, logical block 4
Buffer I/O error on device sdb, logical block 5
Buffer I/O error on device sdb, logical block 6
Buffer I/O error on device sdb, logical block 7
Buffer I/O error on device sdb, logical block 8
Buffer I/O error on device sdb, logical block 9
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
end_request: I/O error, dev sdb, sector 0
quiet_error: 55 callbacks suppressed
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
Buffer I/O error on device sdb, logical block 4
Buffer I/O error on device sdb, logical block 5
Buffer I/O error on device sdb, logical block 6
Buffer I/O error on device sdb, logical block 7
Buffer I/O error on device sdb, logical block 8
Buffer I/O error on device sdb, logical block 9
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
end_request: I/O error, dev sdb, sector 0
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 0b 8a bb 36 00 00 20 00
end_request: I/O error, dev sdb, sector 193641270
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 0b 8a bb 36 00 00 08 00
end_request: I/O error, dev sdb, sector 193641270
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 0b 8a bb 36 00 00 08 00
end_request: I/O error, dev sdb, sector 193641270
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 02 00 00
end_request: I/O error, dev sdb, sector 0
quiet_error: 120 callbacks suppressed
Buffer I/O error on device sdb, logical block 0
Buffer I/O error on device sdb, logical block 1
Buffer I/O error on device sdb, logical block 2
Buffer I/O error on device sdb, logical block 3
Buffer I/O error on device sdb, logical block 4
Buffer I/O error on device sdb, logical block 5
Buffer I/O error on device sdb, logical block 6
Buffer I/O error on device sdb, logical block 7
Buffer I/O error on device sdb, logical block 8
Buffer I/O error on device sdb, logical block 9
sd 6:0:0:0: [sdb] Unhandled error code
sd 6:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0
end_request: I/O error, dev sdb, sector 0


And the drive doesn't seem to recover; I think I need a reboot.
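The `CDB:` lines above can be tied back to the `end_request` sectors and the `Buffer I/O error` block numbers. A minimal sketch, assuming the standard READ(10)/WRITE(10) layout (opcode 0x28/0x2a, 4-byte big-endian LBA in bytes 2 to 5, 2-byte transfer length in bytes 7 and 8) and, for the block numbers, 4 KiB buffer blocks on /dev/sdb; that block size is inferred from the numbers in the log, not stated anywhere. The helper names are illustrative.

```python
# Decode READ(10)/WRITE(10) CDBs as printed by the sd driver.
# Layout: byte 0 opcode, bytes 2-5 big-endian LBA, bytes 7-8 big-endian transfer length.

OPCODES = {0x28: "Read(10)", 0x2A: "Write(10)"}


def decode_cdb10(cdb_hex: str) -> tuple:
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces used in the log
    if len(cdb) != 10 or cdb[0] not in OPCODES:
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return OPCODES[cdb[0]], lba, blocks


def buffer_block(sector: int, block_size: int = 4096, sector_size: int = 512) -> int:
    # Assumed 4 KiB buffer-cache blocks: maps a 512-byte sector number to the
    # "logical block" number shown in the Buffer I/O error lines.
    return sector // (block_size // sector_size)
```

For example, `28 00 00 01 c1 00 00 00 f0 00` decodes to a 240-block read at LBA 114944 (buffer block 14368), the same range as the timed-out `60/f0:00:00:c1:01/...` tag 0 command earlier in this log, so these I/O errors appear to be the timed-out NCQ reads being failed back after ata7.00 was disabled.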
Comment 3 Tejun Heo 2010-03-23 06:10:25 UTC
Does it also happen with "libata.force=3.0Gbps" or "libata.force=1.5Gbps"?

Thanks.
Comment 4 lethalwp 2010-03-23 09:14:11 UTC
It behaves slightly differently with force=3.0Gbps or 1.5Gbps (the errors show up later at boot: after the root switch, and at udev load in the 1.5Gbps case), but the errors are still there.

On the first boot at 1.5 Gbps I got the error only once; the second boot got half a dozen.
For instance, from the first boot:
ata7.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/20:08:c6:3f:4c/00:00:07:00:00/40 tag 1 ncq 16384 in
ata7.00: status: { DRDY }
ata7: hard resetting link
ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata7.00: configured for UDMA/133
ata7.00: device reported invalid CHS sector 0
ata7: EH complete


IIRC, hdparm -T (cached reads) performs better at 1.5 or 3.0 Gbps than at 6.0 Gbps.


If you wish, I can try plugging another HDD (SATA2) into the second Marvell port, but I suppose it will behave the same as the first one (SATA3).
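The `SStatus` values in the link-up messages (133 at 6.0 Gbps, 123 after limiting to 3.0 Gbps, 113 with the forced 1.5 Gbps above) can be read field by field. A minimal sketch, assuming the standard SATA SStatus / AHCI PxSSTS layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that libata prints the register in hex; the table and function names are made up for illustration. It shows that the forced speed limit really took effect and that the link comes back present and active after each hard reset.

```python
# Decode the SStatus value libata prints on "SATA link up".
# Assumed layout (SATA SStatus / AHCI PxSSTS): bits 3:0 DET, bits 7:4 SPD, bits 11:8 IPM.

SPD = {0: "no speed negotiated", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
DET = {
    0: "no device",
    1: "device present, no phy comm",
    3: "device present, phy comm established",
    4: "phy offline",
}
IPM = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(hex_value: str) -> dict:
    s = int(hex_value, 16)  # e.g. "133" as printed in "SStatus 133"
    return {
        "det": DET.get(s & 0xF, "reserved"),
        "spd": SPD.get((s >> 4) & 0xF, "reserved"),
        "ipm": IPM.get((s >> 8) & 0xF, "reserved"),
    }
```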
Comment 5 Tejun Heo 2010-03-23 09:32:31 UTC
Hmmm... If you have a different hard drive (different model, preferably different vendor), giving it a shot might shed some light on where the problem is.
Comment 6 lethalwp 2010-03-23 17:17:20 UTC
OK, I've tried this:
adding a 40GB SATA1 HDD on the second port,
and after that, replacing it with a 750GB SATA2 HDD on the second port.

At boot, only ata7 (the first port) gave the dmesg problems mentioned above.
After boot, playing with queue_depth and hdparm -tT to make the drive go read-only gave problems on the same first port (when changing the queue depth of the first drive to confirm the problem), but the second port was OK, with no dmesg output at all.


In case it's worth mentioning: when the first port is retrying, communication with the second port seems to be delayed too (hdparm -tT on the second port waits for the first one to be ready before continuing; probably some sort of spinlock?).

I've already tried plugging the 6 Gbps SSD into the second port; the behaviour is the same.

So it's the Crucial SSD + 6 Gbps + Marvell combination that causes problems.

I have no other 6 Gbps drive to test, and no other 6 Gbps controller.
Comment 7 lethalwp 2010-03-27 17:42:37 UTC
I've tested the 2.6.34-rc2-git3 kernel.

Additional error messages appeared. First, the same kind of messages as before:
ata7.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
ata7.00: failed command: WRITE FPDMA QUEUED
ata7.00: cmd 61/10:00:a6:99:c7/00:00:08:00:00/40 tag 0 ncq 8192 out
         res 40/00:04:a0:8e:c6/00:00:03:00:00/40 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7.00: failed command: WRITE FPDMA QUEUED
ata7.00: cmd 61/10:08:be:99:c7/00:00:08:00:00/40 tag 1 ncq 8192 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }
ata7.00: failed command: WRITE FPDMA QUEUED
ata7.00: cmd 61/10:10:d6:99:c7/00:00:08:00:00/40 tag 2 ncq 8192 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata7.00: status: { DRDY }

etc.

Plus a new kind of message:
ata7.00: device reported invalid CHS sector 0
sd 6:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 00 
sd 6:0:0:0: [sdb] Add. Sense: No additional sense information
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 07 46 f4 c6 00 00 08 00
end_request: I/O error, dev sdb, sector 122090694
sd 6:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 6:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 00 
sd 6:0:0:0: [sdb] Add. Sense: No additional sense information
sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 07 46 f4 ce 00 00 20 00
end_request: I/O error, dev sdb, sector 122090702
ata7: EH complete
ata7.00: NCQ disabled due to excessive errors
ata7.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen
ata7.00: failed command: READ FPDMA QUEUED
ata7.00: cmd 60/28:00:06:d5:c8/00:00:0b:00:00/40 tag 0 ncq 20480 in
         res 40/00:04:a0:8e:c6/00:00:03:00:00/40 Emask 0x4 (timeout)
ata7.00: status: { DRDY }




Is there something I can do to help debug and fix this issue?
Comment 8 Tejun Heo 2010-03-29 04:29:29 UTC
Can you please try libata.force=noncq? If that alone doesn't work, try combining it with 3.0Gbps or 1.5Gbps, i.e. "libata.force=noncq,3.0Gbps".

As for the root cause of the problem, I'm a bit skeptical that it's something which can be worked around from the driver. Contacting the drive vendor would be a good idea.

Thanks.
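The combined form works because, according to the kernel-parameters documentation, libata.force takes a comma-separated list of `[ID:]VAL` entries, where ID is `PORT[.DEVICE]` and an entry without an ID applies to every device. A minimal sketch that only splits such a value to make that structure visible; it is not libata's parser and does not validate VAL, and `parse_libata_force` is a hypothetical name. Under that reading, `libata.force=7.00:noncq` should confine the workaround to ata7.00 and leave NCQ on for the other disks, an assumption worth confirming in dmesg.

```python
from typing import List, Optional, Tuple


def parse_libata_force(value: str) -> List[Tuple[Optional[str], str]]:
    """Split a libata.force value into (ID, VAL) pairs; ID is None when omitted."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        ident, sep, val = item.partition(":")
        if sep:
            entries.append((ident.strip(), val.strip()))  # "PORT[.DEVICE]:VAL"
        else:
            entries.append((None, item))  # bare VAL applies to all devices
    return entries
```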
Comment 9 lethalwp 2010-03-29 17:46:11 UTC
With noncq, the drive runs at 6 Gbps and no errors are shown at all on this 2.6.34-rc2-git3 kernel.

I will send an email to ASUS, Marvell and Crucial, but I don't have high hopes of getting anything more than a generic response :/
Comment 10 Luke Macken 2010-03-29 21:25:35 UTC
I am able to reproduce this issue with an Asus P6X58D mobo + Marvell 88SE9123 sata3 6g controller + Crucial C300 + kernel-2.6.33.1-19.fc13.x86_64
Comment 11 Luke Macken 2010-03-30 08:31:44 UTC
Created attachment 25763
Patch that adds the PCI ID of the Marvell 9123 sata controller to the ahci driver

This patch appears to get things working properly...

    ata7: SATA max UDMA/133 irq_stat 0x80400040, connection status changed irq 56
    ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
    ata7.00: ATA-8: C300-CTFDDAC256MAG, 0001, max UDMA/133
    ata7.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
    ata7.00: configured for UDMA/133
Comment 12 lethalwp 2010-03-30 09:55:35 UTC
It doesn't seem to work for me:

I tried it on kernels 2.6.34-rc1-next-20100318+ and 2.6.34-rc2-git3, patching ahci.c and recompiling with make (without a make mrproper).

If I boot without noncq, I still get the messages; I tried both a warm reboot and a cold boot.

Should I try the FC13 kernel, or something else?
Comment 13 lethalwp 2010-03-30 09:57:01 UTC
Does it still work for you if you change queue_depth and run hdparm -tT on the drive? (For me this makes the drive unresponsive after a while, forcing a reboot, so be sure to sync before the test.)
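The queue_depth step of this test (the `echo 2 > queue_depth` from comment 2) can be scripted. A minimal sketch, assuming the usual SCSI sysfs attribute at /sys/block/<disk>/device/queue_depth and root privileges; `set_queue_depth` is a hypothetical helper, and its `sysfs_block` parameter exists only so it can be pointed at a scratch directory. As far as I know, libata treats a depth of 1 as NCQ off, which would make this a runtime counterpart of libata.force=noncq; treat that as an assumption. As with the manual test, sync first.

```python
from pathlib import Path


def set_queue_depth(disk: str, depth: int, sysfs_block: Path = Path("/sys/block")) -> int:
    """Write `depth` to the disk's queue_depth attribute and return the value read back.

    The kernel may clamp the value, so the read-back is what actually took effect.
    Writing under the real /sys requires root.
    """
    attr = sysfs_block / disk / "device" / "queue_depth"
    attr.write_text(f"{depth}\n")
    return int(attr.read_text().strip())
```

On the affected machine, `set_queue_depth("sdb", 2)` would mirror the manual step from comment 2.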
Comment 14 Luke Macken 2010-03-30 19:17:56 UTC
OK, so my patch seems to fix the initial timeout issue during startup. However, during heavy I/O, similar problems occur. pci=nomsi gets me further along, but it's still unstable.
Comment 15 Tejun Heo 2010-03-31 01:55:46 UTC
Created attachment 25770
disable-fpdma-aa.patch

The patch attached in comment #11 is a no-op as long as the controller's class code indicates AHCI. The default board id is board_ahci anyway.

Does the attached patch make any difference?
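Why the comment #11 patch changes nothing can be shown with a simplified model of PCI ID-table matching. Assumptions: the ahci driver's table ends with a generic entry that matches any vendor/device carrying the AHCI class code 0x010601 and maps it to board_ahci, and the comment #11 patch adds an explicit 1b4b:9123 entry that also maps to board_ahci. The table layout and `match_board` are hypothetical, not the driver's code.

```python
from typing import Optional

AHCI_CLASS = 0x010601  # class 01 (storage) / subclass 06 (SATA) / prog-if 01 (AHCI 1.0)

# Simplified, hypothetical match table: (vendor, device, class, class_mask, board).
# None means "any"; entries are tried in order and the first match wins.
TABLE = [
    (0x1B4B, 0x9123, None, 0, "board_ahci"),           # explicit ID, as the comment #11 patch adds
    (None, None, AHCI_CLASS, 0xFFFFFF, "board_ahci"),  # generic catch-all by class code
]


def match_board(vendor: int, device: int, class_code: int, table=TABLE) -> Optional[str]:
    for ven, dev, cls, mask, board in table:
        if ven is not None and ven != vendor:
            continue
        if dev is not None and dev != device:
            continue
        if cls is not None and (class_code & mask) != (cls & mask):
            continue
        return board
    return None
```

With or without the explicit entry (`TABLE` vs `TABLE[1:]`), a controller advertising the AHCI class lands on board_ahci; the explicit entry only matters for a non-AHCI class code (0x01018F is just an arbitrary example).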
Comment 16 Luke Macken 2010-03-31 03:20:01 UTC
(In reply to comment #15)
> Created an attachment (id=25770)
> disable-fpdma-aa.patch
> 
> The patch attached in comment#11 is no-op as long as the controller's class
> code indicates ahci.  The default board id is board_ahci anyway.
> 
> Does the attached patch make any difference?

This gives me a usable OS under minimal I/O, as did the other patch. Under high I/O with this patch I'm no longer getting the previous ata exceptions, but I now get ext4 errors:

    EXT4-fs error (device dm-1) in ext4_reserve_inode_write: IO failure
    EXT4-fs warning (device dm-1): ext4_delete_inode:
    EXT4-fs(dm-1): couldn't mark inode dirty (err -5)
    I/O error while writing superblock
    JBD2: Detected IO errors while flushing file data on dm-1-8
Comment 17 Tejun Heo 2010-03-31 04:55:03 UTC
Luke, can you please attach the full kernel log, including boot messages and the error log?
Comment 18 Luke Macken 2010-03-31 05:12:22 UTC
Created attachment 25771
dmesg with patch from comment #15
Comment 19 Luke Macken 2010-03-31 05:18:48 UTC
Created attachment 25772
A dmesg containing errors (with patch from comment #15 applied)
Comment 20 Tejun Heo 2010-03-31 07:09:15 UTC
The drive timed out and then completely checked out. libata tried to talk to it several times and then gave up, leaving ext4 with a filesystem on top of a dead block device.

* Just to clarify, a traditional hard drive works fine w/ NCQ enabled connected to the marvell controller, right?

* Does marvell controller + the ssd combination work under windows w/ NCQ enabled?

Thanks.
Comment 21 lethalwp 2010-03-31 07:25:12 UTC
Just got an email back from Crucial, quoting:

Thank you for contacting us. We know that with our M225 line of SSDs you sometimes need to disable NCQ (native command queuing) to avoid just the type of errors you're seeing. Our recommendation for the M225 is to add libata.force=noncq to your Linux kernel boot options, under the kernel ATA library option.

I have sent your feedback to the engineers working on the C300, and asked them to please pass it on to the firmware team. I have been notified that they are in the process of testing and finalizing a new firmware version, that you can expect to see released around the end of April. We’ll keep you posted as to when it will be available for download.


---

As far as AHCI+NCQ under Windows goes, it worked for me, but I'll double-check after work.
Comment 22 lethalwp 2010-03-31 13:46:28 UTC
And I just got another reply from ASUS, translated:

"We don't support Linux for your motherboard; we would if the problem occurred in Windows."
Comment 23 Tejun Heo 2010-04-01 01:44:17 UTC
Hmmm... I suppose we'll need to blacklist NCQ for the drive w/ old and current firmwares. Can you please post the "hdparm -I" output of the drive? Also, please let us know here if they release new firmware which fixes NCQ.

Thanks.
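Blacklisting NCQ "for the drive w/ old and current firmwares" means keying the quirk on both the model string and the firmware revision reported by IDENTIFY, so a fixed firmware would slip past it automatically. libata keeps a device blacklist along these lines, but the sketch below is not its code: the table, the fnmatch-style patterns and the flag name are assumptions, and the example values come from comment #11 (C300-CTFDDAC256MAG, firmware 0001).

```python
from fnmatch import fnmatchcase
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical quirk table: (model pattern, firmware pattern or None = any firmware, flags).
# The entry is illustrative only; it is not taken from the kernel source.
QUIRKS: List[Tuple[str, Optional[str], FrozenSet[str]]] = [
    ("C300-CTFDDAC*", "0001", frozenset({"noncq"})),
]


def quirks_for(model: str, firmware: str, table=QUIRKS) -> FrozenSet[str]:
    # IDENTIFY strings are space-padded (see the hdparm -I output), so strip first.
    model, firmware = model.strip(), firmware.strip()
    for model_pat, fw_pat, flags in table:
        if fnmatchcase(model, model_pat) and (fw_pat is None or fnmatchcase(firmware, fw_pat)):
            return flags
    return frozenset()
```

A drive reporting a newer firmware revision gets an empty set, i.e. NCQ stays enabled, which is the behaviour a firmware-specific blacklist entry is meant to have.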
Comment 24 lethalwp 2010-04-03 13:11:58 UTC
AHCI is enabled in Windows, so I suppose NCQ is too. The only way I found to check(?) was the third-party EVEREST tool, which says NCQ is supported on the drive.


Here is the hdparm -I output for the drive:
]# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
        Model Number:       C300-CTFDDAC128MAG                      
        Serial Number:      0000000010080000873B
        Firmware Revision:  0001    
Standards:
        Used: unknown (minor revision code 0x0028) 
        Supported: 8 7 6 5 
        Likely used: 8
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:  250069680
        LBA    user addressable sectors:  250069680
        LBA48  user addressable sectors:  250069680
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:      122104 MBytes
        device size with M = 1000*1000:      128035 MBytes (128 GB)
        cache/buffer size  = unknown
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 1
        Advanced power management level: disabled
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4 
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Advanced Power Management feature set
                Power-Up In Standby feature set
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
                Media Card Pass-Through
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
           *    IDLE_IMMEDIATE with UNLOAD
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    unknown 76[3]
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
           *    Data Set Management TRIM supported
Security: 
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        10min for SECURITY ERASE UNIT. 10min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 507500a13b870000
        NAA             : 5
        IEEE OUI        : 07500a
        Unique ID       : 13b870000
Checksum: correct



I'd like to point out again that the problem occurs only on the Marvell 6 Gbps controller, not on the Intel 3 Gbps one.

It seems the new firmware, due out at the end of the month, also fixes corruption issues (and slow startup after TRIM). I don't have those problems, just mentioning it =)
Comment 25 Tejun Heo 2010-04-05 01:53:22 UTC
Patch sent upstream.  Resolving as FIXED for now.  Please report whether the new firmware fixes NCQ here.  Thanks.

  http://article.gmane.org/gmane.linux.ide/45663
Comment 26 Luke Macken 2010-05-04 17:00:03 UTC
Crucial released new firmware for the C300 today:
   
    http://www.crucial.com/support/firmware.aspx

I'm unable to test this at the moment, so hopefully someone else can check whether it works with the Marvell SATA 6G controller...
Comment 27 lethalwp 2010-05-04 22:53:09 UTC
Thank you, Luke, for noticing the update. I looked a couple of days ago and it wasn't there yet =)

But I'm a bit disappointed: it's still a no-go.

After the backup, update (it WILL delete all your data) and restore, I had the same symptoms with the FC13-alpha rescue CD kernel and with my 2.6.32-rc2-git3 kernel (before the patch).

Since it took me the whole evening to back up, upgrade and restore to a half-restored but bootable system (Windows will wait :p), I'll add more information in the next couple of days.
Comment 28 Pho Dyssey 2010-05-05 17:50:26 UTC
I also have this bug, but I'm not using a C300.

My setup:

- Asus P6X58D Premium with onboard Marvell 9123 sata3 controller
- two Seagate Barracuda XT SATA 6Gb/s 2TB
- Fedora 13 beta (with all updates)
- 2.6.33.3-72.fc13.x86_64

In the BIOS, the Marvell controller is set to AHCI. When I set it to IDE, the Linux kernel doesn't even find the drive!

---------------

When I plug one of the drives (either one, or both) into the Marvell controller, I get errors after 1 or 2 minutes of high-load transfers (several GB):

kernel: ata8.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
kernel: ata8.00: failed command: WRITE FPDMA QUEUED
kernel: ata8.00: cmd 61/00:00:00:3c:b0/04:00:01:00:00/40 tag 0 ncq 524288 out
kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
kernel: ata8.00: status: { DRDY }
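For readers decoding the "cmd" line above: libata prints the taskfile registers in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device. The minimal C sketch below assumes that layout; the helper names (`parse_tf`, `tf_sectors`, `tf_tag`, `tf_lba48`) are made up for illustration. For FPDMA QUEUED commands (0x60 read, 0x61 write) the sector count sits in the feature registers and the NCQ tag in nsect bits 7:3, which is how `61/00:...:/04:...` works out to 0x400 sectors, i.e. the 524288 bytes libata prints.

```c
#include <stdio.h>

/* Taskfile registers in the order libata prints them in "cmd"/"res" lines:
 * command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device */
struct tf {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device;
};

/* Parse the register part of a libata "cmd" line; returns 0 on success. */
static int parse_tf(const char *s, struct tf *t)
{
    int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
                   &t->command, &t->feature, &t->nsect,
                   &t->lbal, &t->lbam, &t->lbah,
                   &t->hob_feature, &t->hob_nsect,
                   &t->hob_lbal, &t->hob_lbam, &t->hob_lbah,
                   &t->device);
    return n == 12 ? 0 : -1;
}

/* READ/WRITE FPDMA QUEUED carry the sector count in feature/hob_feature. */
static int tf_is_fpdma(const struct tf *t)
{
    return t->command == 0x60 || t->command == 0x61;
}

/* Number of 512-byte sectors transferred, or 0 for commands this sketch
 * does not handle. */
static unsigned long tf_sectors(const struct tf *t)
{
    if (tf_is_fpdma(t)) {
        unsigned long n = ((unsigned long)t->hob_feature << 8) | t->feature;
        return n ? n : 65536UL;                   /* 16-bit count: 0 means 65536 */
    }
    if (t->command == 0xc8 || t->command == 0xca) /* READ/WRITE DMA (28-bit) */
        return t->nsect ? t->nsect : 256UL;       /* 8-bit count: 0 means 256 */
    return 0;
}

/* NCQ tag, carried in nsect bits 7:3 for FPDMA commands. */
static unsigned int tf_tag(const struct tf *t)
{
    return t->nsect >> 3;
}

/* 48-bit LBA assembled from the low and "hob" (high order byte) registers. */
static unsigned long long tf_lba48(const struct tf *t)
{
    return ((unsigned long long)t->hob_lbah << 40) |
           ((unsigned long long)t->hob_lbam << 32) |
           ((unsigned long long)t->hob_lbal << 24) |
           ((unsigned long long)t->lbah << 16) |
           ((unsigned long long)t->lbam << 8) |
           (unsigned long long)t->lbal;
}
```

Fed the WRITE FPDMA QUEUED line above, these helpers give tag 0 and 1024 sectors (524288 bytes) at LBA 0x1b03c00. The same code reproduces the `dma 131072` and `dma 126976` sizes of the non-NCQ READ DMA (0xc8) errors reported later in this thread, where an nsect of 0 means 256 sectors.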

and then the SATA link is reset and I have to hard reboot:

kernel: ata8: hard resetting link
kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
kernel: ata8.00: configured for UDMA/133
kernel: ata8.00: device reported invalid CHS sector 0

--------------

I have tried 2.6.34-0.38.rc5.git0.fc14.x86_64 with no success (same errors).

--------------

Both drives work OK (as far as I can tell) when they are connected to the SATA2 controller.

--------------

I have tried booting with libata.force=noncq. I usually get further (maybe 10 minutes of sustained writing), but at some (random?) point I get another error. I haven't been able to log it (the drive fails before the message is written to /var/log/messages), but if I remember correctly it's something like:

kernel: ata8.00: failed command: WRITE DMA EXT
kernel: ata8.00: failed command: READ DMA EXT

-----------

No matter which controller I use, and whether or not I booted with libata.force=noncq, hdparm always gives the same numbers (which are SATA2 numbers, aren't they?):
hdparm -t /dev/sdc1
 Timing buffered disk reads:  414 MB in  3.01 seconds = 137.54 MB/sec

I expected transfer rates on the order of 200 MB/sec.
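As a rough sanity check on whether 137.54 MB/sec says anything about the link: SATA uses 8b/10b line coding, so each payload byte costs 10 line bits and the raw ceiling is the line rate divided by 10 (150, 300 and 600 MB/s for 1.5, 3.0 and 6.0 Gbps, before protocol overhead). The C sketch below assumes hdparm's "MB" means 1024*1024 bytes; the function names are illustrative only.

```c
/* SATA uses 8b/10b line coding: each payload byte occupies 10 line bits.
 * Raw payload ceiling in decimal MB/s, ignoring FIS/CRC protocol overhead. */
static double sata_ceiling_mb_per_s(double line_gbps)
{
    return line_gbps * 1e9 / 10.0 / 1e6;
}

/* Assumption: hdparm -t reports "MB" as 1024*1024 bytes.
 * Converts such a figure to decimal MB (10^6 bytes). */
static double mib_to_mb(double mib)
{
    return mib * 1024.0 * 1024.0 / 1e6;
}
```

By this arithmetic, 414 MB in 3.01 s is about 144 decimal MB/s, which is below even the 1.5 Gbps ceiling. So the figure most likely reflects the drive's own sequential read rate rather than the negotiated link speed; the conclusion does not change if hdparm's MB is decimal instead.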

-----------

The status of this bug is currently "Resolved". Is that true for my case, or only for the C300 drive? Where can I find an RPM of a kernel with the patch applied? Will it appear in fedora-rawhide soon?

-----------

If you need any further info (dmesg, /var/log/messages, etc.), please let me know.

Pho.
Comment 29 lethalwp 2010-05-05 19:21:27 UTC
Pho: the fix only works around this bug for the C300 with firmware 0001: it forces NCQ off for that drive (the same effect as libata.force=noncq), which avoids the problem by limiting the drive's abilities.

So it won't apply to your drive model.

But I'm surprised the noncq parameter didn't fix it for you :/ That suggests NCQ may be only part of your problem.

By the way, have you tried putting your drive on the Intel SATA ports? Have you also tried another SATA3 cable for your drive? (I think Asus motherboards come with 2 SATA3 cables and 2 *classic* SATA cables.)
Comment 30 Pho Dyssey 2010-05-05 19:47:14 UTC
Thank you lethalwp for your reply.

I have 2 SATA3 drives and 1 SATA2 drive currently plugged in. As I type this, they are all connected to the Intel ICH10R SATA2 ports (there are 6 such ports on the board). With this configuration everything works fine, but of course all drives run at SATA2 speed.

I do have 2 SATA3 cables; they are labeled 6Gb/s. Those are the cables I used when I tried the SATA3 drives on the Marvell ports. No matter which of the two SATA3 cables and which of the two SATA3 drives I choose, I get the errors mentioned in comment #28 when I plug them into the Marvell ports. It would be really surprising if both cables or both drives were defective. So my guess is: 1) the Marvell controller is broken, or 2) there is a bug in the kernel. BTW the computer is brand new.

Pho
Comment 31 Luke Macken 2010-05-05 22:49:23 UTC
P6X58D Premium 0808 BIOS (beta) was released a couple of weeks ago. It only claims to improve memory compatibility, but is probably still worth testing against.

So it sounds like this bug should probably be reopened, as it seems to be a problem with the Marvell SATA3 controller and not with the C300 drive specifically.
Comment 32 Pho Dyssey 2010-05-06 01:33:03 UTC
Thank you for the suggestion. I upgraded from BIOS version 0703 to 0808 and reconnected one of the drives to the Marvell port. Unfortunately, no change. After 4 minutes of continuous writing to the drive, I get:

kernel: ata2.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen 
kernel: ata2.00: failed command: WRITE FPDMA QUEUED
kernel: ata2.00: cmd 61/00:00:00:88:39/04:00:53:00:00/40 tag 0 ncq 524288 out
kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
kernel: ata2.00: status: { DRDY }
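The `SAct` value in these exception lines is a bitmask with one bit per NCQ tag that was still outstanding when error handling started (this is my reading of libata's report format). Counting the set bits is a quick way to see how many queued commands were stuck; the helper names below are hypothetical.

```c
/* SAct-style bitmask: bit N set means NCQ tag N is still outstanding. */
static unsigned int sact_count(unsigned long sact)
{
    unsigned int n = 0;
    while (sact) {
        sact &= sact - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Lowest outstanding tag, or -1 if the mask is empty. */
static int sact_lowest_tag(unsigned long sact)
{
    for (int tag = 0; tag < 32; tag++)
        if (sact & (1UL << tag))
            return tag;
    return -1;
}
```

0x7fffffff gives 31 outstanding tags, i.e. the whole depth-31 queue libata reports for these drives (`NCQ (depth 31/32)`), and the 0x1fffff seen in comment 34 gives 21. That pattern is consistent with the controller or device stalling with a busy queue rather than rejecting one particular command.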

and then contact is lost.

Again, if i can provide more information to help spot where the bug is, please tell me.

Pho.
Comment 33 Tejun Heo 2010-05-06 14:56:55 UTC
cc'ing Mark.

Mark, it looks like there may be a PHY compatibility issue at 6Gbps.  6Gbps Seagates are timing out.  Do you have any contact at Marvell who can look into this?

Pho: Does forcing lower link speed make the situation better?

Thanks.
Comment 34 Pho Dyssey 2010-05-06 22:33:56 UTC
I don't know how to force a lower link speed.

However, I tried this: I plugged a Barracuda SATA 3Gb/s 1TB disk into the Marvell controller (I have 3 Seagate disks, 2 SATA3 and 1 SATA2, so this time I tried the SATA2 drive).

Messages in /var/log/messages confirm that the link is at 3Gb/s.
kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
kernel: ata2.00: ATA-8: ST31000333AS, CC1F, max UDMA/133
kernel: ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
kernel: ata2.00: configured for UDMA/133

The situation is not better. After 2 mins of file transfer, the errors appear again and the link is broken:
kernel: ata2.00: exception Emask 0x0 SAct 0x1fffff SErr 0x0 action 0x6 frozen
kernel: ata2.00: failed command: WRITE FPDMA QUEUED
kernel: ata2.00: cmd 61/00:00:3f:90:58/04:00:3c:00:00/40 tag 0 ncq 524288 out
kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
kernel: ata2.00: status: { DRDY }
kernel: ata2: hard resetting link
kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
kernel: ata2.00: qc timeout (cmd 0xec)
kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
kernel: ata2.00: revalidation failed (errno=-5)
kernel: ata2: hard resetting link
kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
kernel: ata2.00: qc timeout (cmd 0xec)
kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
kernel: ata2.00: revalidation failed (errno=-5)
kernel: ata2: limiting SATA link speed to 1.5 Gbps
kernel: ata2: hard resetting link
kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
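The `SStatus` values in these link-up messages are hex dumps of the SATA SStatus register: DET (device detection) in bits 3:0, SPD (negotiated speed generation) in bits 7:4 and IPM (interface power state) in bits 11:8. A small decoding sketch in C, with illustrative names:

```c
/* Fields of the SATA SStatus register (AHCI PxSSTS). */
struct sstatus {
    unsigned int det;   /* bits 3:0  device detection (3 = present, PHY up) */
    unsigned int spd;   /* bits 7:4  negotiated speed generation */
    unsigned int ipm;   /* bits 11:8 interface power state (1 = active) */
};

static struct sstatus sstatus_decode(unsigned int v)
{
    struct sstatus s;
    s.det = v & 0xf;
    s.spd = (v >> 4) & 0xf;
    s.ipm = (v >> 8) & 0xf;
    return s;
}

/* Line rate in tenths of a Gbps for SPD 1..3; 0 if no speed negotiated. */
static unsigned int spd_to_decigbps(unsigned int spd)
{
    switch (spd) {
    case 1: return 15;   /* Gen1, 1.5 Gbps */
    case 2: return 30;   /* Gen2, 3.0 Gbps */
    case 3: return 60;   /* Gen3, 6.0 Gbps */
    default: return 0;
    }
}
```

0x133, 0x123 and 0x113 all decode to DET=3 (device present, PHY communication established) and IPM=1 (active), differing only in SPD: 6.0, 3.0 and 1.5 Gbps respectively. The step-down in the log above is therefore visible directly in the SPD field.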

Pho.
Comment 35 Mark Lord 2010-05-07 01:02:34 UTC
I have zero information on any of Marvell's AHCI chips.

Saeed Bishara <saeed@marvell.com> is the person to contact -- he also sometimes posts patches to linux-ide.
Comment 36 Tejun Heo 2010-05-07 05:28:51 UTC
You can force the link speed with libata.force=3Gbps or libata.force=1.5Gbps.  Can you please try both?
Comment 37 Pho Dyssey 2010-05-07 11:58:04 UTC
ata: failed to parse force parameter "3Gbps" (unknown value)

I had to replace libata.force=3Gbps with libata.force=3.0Gbps ...

In both cases (libata.force=3.0Gbps or libata.force=1.5Gbps) the same error happens again. It looks like forcing the link speed down doesn't help.

Pho
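On the parse error above: the kernel documentation describes `libata.force` as a comma-separated list of `[ID:]VAL` items, where ID is `PORT[.DEVICE]` as printed by libata (e.g. `ata7.00`). The sketch below illustrates why an exact-match value table rejects `3Gbps` but accepts `3.0Gbps`. The table is hypothetical and contains only the values used in this thread; it is not libata's source, and `parse_force_item` and `struct force_entry` are made-up names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down value table (only values used in this thread). */
static const char *const known_values[] = { "1.5Gbps", "3.0Gbps", "noncq" };

struct force_entry {
    int port;          /* -1 = applies to all ports */
    int device;        /* -1 = all devices on the port */
    const char *val;   /* points into known_values, or NULL if unknown */
};

/* Parse one "[PORT[.DEVICE]:]VAL" item. Returns 0 on success, -1 if the
 * ID is malformed or the value is not in the table. */
static int parse_force_item(const char *item, struct force_entry *e)
{
    const char *colon = strchr(item, ':');
    const char *val = item;
    char *end;

    e->port = -1;
    e->device = -1;
    e->val = NULL;

    if (colon) {
        e->port = (int)strtol(item, &end, 10);
        if (end == item)
            return -1;                 /* no port number before ':' */
        if (*end == '.') {
            const char *dev = end + 1;
            e->device = (int)strtol(dev, &end, 10);
            if (end == dev)
                return -1;             /* '.' not followed by a device number */
        }
        if (end != colon)
            return -1;                 /* junk between ID and ':' */
        val = colon + 1;
    }

    /* Exact string match: "3Gbps" is not "3.0Gbps", hence the boot error. */
    for (size_t i = 0; i < sizeof known_values / sizeof known_values[0]; i++) {
        if (strcmp(val, known_values[i]) == 0) {
            e->val = known_values[i];
            return 0;
        }
    }
    return -1;
}
```

Under that documented format, something like `libata.force=7:noncq` should restrict the NCQ workaround to a single port instead of every disk, assuming the Marvell port number is stable on a given system (it appears as ata8 and later as ata2 in this thread).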
Comment 38 Pho Dyssey 2010-05-07 15:08:19 UTC
Just for the record, here are two links where other people seem to also have problems with their Marvell 9123 controller:
https://bugs.launchpad.net/ubuntu/+bug/550559
http://ubuntuforums.org/showthread.php?t=1396465

Pho.
Comment 39 lethalwp 2010-05-07 16:01:40 UTC
Pho: have you checked whether you have the latest firmware on the drives? (Be
careful, it can destroy data! Don't do all of them at once; test on one
first.)

On seagate.com, Downloads -> the firmware topic leads to a knowledge base
at:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931&NewLang=en

but if your drive isn't listed, check again from the main site.

Another possibility is that a firmware update for your drive hasn't been released yet.
Comment 40 Pho Dyssey 2010-05-07 17:13:30 UTC
My drives:
- ST32000641AS (sata3 2T), 2 of them
- ST31000333AS (sata2 1T)

I found no firmware update for the 2TB drives, but I found one for the 1TB drive. However, the download page says:
"Note: If your drive has CC firmware, your drive is not affected and no further action is required. Attempting to flash the firmware of a drive with CC firmware will result in rendering your drive inoperable."
After checking with their firmware update utility, the firmwares on my drives are:
- ST32000641AS : CC13
- ST31000333AS : CC1F
So I decided not to flash the firmware on the 1TB drive.

--------------------------

The boot screen shows that the Marvell controller has firmware/BIOS version 1.0.0.1012. A little googling shows that a 1.0.0.1027 version exists somewhere (and maybe even 1030). How can I upgrade to 1.0.0.1027? I haven't found any info on Marvell's site.

Pho.
Comment 41 Pho Dyssey 2010-05-07 21:44:54 UTC
Correction to the second part of comment #40.

The exact wording of what I see at boot time is:
Marvell 88SE91xx Adapter - BIOS Version 0.0.0.1012

What i have found on the Internet looks like a windows driver for the controller:
Asus P6X58D Premium Marvell 9123 Controller Driver 1.0.0.1027 WHQL

Pho.
Comment 42 Luke Macken 2010-05-09 11:32:06 UTC
(In reply to comment #35)
> I have zero information on any of Marvell's AHCI chips.
> 
> Saeed Bishara <saeed@marvell.com> is the person to contact -- he also
> sometimes
> posts patches to linux-ide.

I contacted Saeed, who forwarded this issue to the relevant people at Marvell.
Comment 43 lethalwp 2010-05-15 14:03:11 UTC
Is it just me, or did Crucial remove the C300 firmware?
(That would mean they have trouble with it.)
Comment 44 Luke Macken 2010-05-15 19:08:28 UTC
(In reply to comment #43)
> is it me or did crucial remove the C300 firmware?
> (which means they have troubles with it)

Yep, it looks like this firmware update bricked many drives...

http://forum.crucial.com/t5/Solid-State-Drives-SSD/C300-Firmware-update-and-Feedback-thread/td-p/12363
Comment 45 matrix 2010-07-02 02:01:54 UTC
(In reply to comment #28)

I have 2 WD1002FAEX (SATA3) drives connected to the Marvell 9123 controller; the drive connected to port 0 shows the same error...

Details -> http://vip.asus.com/forum/view.aspx?id=20100702050055531&board_id=1&model=P6X58D+Premium&page=1&SLanguage=en-us

I can't find any firmware update for this drive... Any news from Marvell?

m4tr1x
Comment 46 Crashbit 2010-07-31 13:40:07 UTC
I have the same problem described in comment #45. I'm using Ubuntu Maverick Meerkat with the 2.6.35-12 kernel.

The Barracuda drive works fine if I connect it to the JMicron controller, but it fails if I connect it to the Marvell controller.

Sorry for my English
Comment 47 moojix 2010-08-06 14:40:52 UTC
I have the same ata errors (frozen/timeout) with my new MoBo/C300 system.
(https://bugs.launchpad.net/ubuntu/+bug/550559/comments/11).
I only got a generic, irrelevant response from Asus to my support request.

2.6.32-24 kernel (64-bit Ubuntu Lucid)
Asus P7P55D-E PREMIUM (firmware 1002)
Crucial C300-CTFDDAC256M (firmware 002)

lspci: SATA controller: Device 1b4b:9123 (rev 10)

I could solve this issue only with the recommended workaround libata.force=noncq.
Comment 48 postillion 2010-08-13 09:47:09 UTC
While setting up a new system with two C300s and an onboard Marvell 9128
controller, the successor to the Marvell 9123 (AFAIK the only differences are RAID
support and newer firmware, Rev.11), I ran into the same problems.
Notably, only one of the two drives threw errors. My configuration:

Gigabyte GA-X58A-UD7 Rev.1 BIOS F6-F7x, Marvell (88SE)9128 (rev.11),
2 x C300-CTFDDAC128M Rev.0002, openSUSE 11.3, kernel 2.6.34 w/, w/o
ATA_HORKAGE_NONCQ patch for C300 Rev.0002


1. with NCQ support (because drive with Rev.0002 doesn't match Rev.0001
in libata-core.c)-->queue_depth:31

...
ata12.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6
ata12.00: irq_stat 0x40000008
ata12.00: failed command: READ FPDMA QUEUED
ata12.00: cmd 60/20:00:e0:b4:c9/00:00:03:00:00/40 tag 0 ncq 16384 in
          res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x403 (HSM violation) <F>
ata12: hard resetting link
...
ata12.00: exception Emask 0x1 SAct 0xc SErr 0x0 action 0x0
ata12.00: irq_stat 0x40000008
ata12.00: failed command: READ FPDMA QUEUED
ata12.00: cmd 60/00:10:00:1c:80/04:00:02:00:00/40 tag 2 ncq 524288 in
          res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
ata12.00: status: { DRDY }
...
ata12.00: NCQ disabled due to excessive errors
...
ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata12.00: configured for UDMA/100
ata12.00: device reported invalid CHS sector 0
ata12: EH complete
...

Configured as the boot/root device, this drive was absolutely unusable. Typically,
the failed commands were "READ FPDMA QUEUED". Heavy filesystem corruption; I had to
do a new installation.


2. without NCQ support (patched libata-core.c to
...
{ "C300-CTFDDAC128MAG", "0002",         ATA_HORKAGE_NONCQ, },
...
)-->queue_depth:1

...
ata12.00: ATA-9: C300-CTFDDAC128MAG, 0002, max UDMA/100
ata12.00: 250069680 sectors, multi 16: LBA48 NCQ (not used)
ata12.00: configured for UDMA/100
...
ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata12.00: irq_stat 0x40000001
ata12.00: failed command: READ DMA
ata12.00: cmd c8/00:00:60:aa:11/00:00:00:00:00/e4 tag 0 dma 131072 in
          res 51/04:01:01:00:00/00:00:04:00:00/00 Emask 0x1 (device error)
ata12.00: status: { DRDY ERR }
ata12.00: error: { ABRT }
...
ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata12.00: irq_stat 0x40000001
ata12.00: failed command: READ DMA
ata12.00: cmd c8/00:f8:08:24:00/00:00:00:00:00/e4 tag 0 dma 126976 in
          res 51/04:01:01:00:00/00:00:04:00:00/00 Emask 0x1 (device error)
ata12.00: status: { DRDY ERR }
ata12.00: error: { ABRT }
ata12.00: configured for UDMA/100
ata12: EH complete
...

The drive was usable, but whenever the failed command "READ DMA" appeared, it timed out. No
filesystem corruption, but quite unsatisfying. Top read speeds reached 310 MB/s
when not interrupted by an ATA exception.
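Why the Rev.0002 drive still used NCQ in case 1: the blacklist entry is keyed on both the model string and the firmware revision. The simplified C sketch below illustrates that kind of lookup; it is not libata's code, the flag value and function names are made up, and the trailing-`*` wildcard is an assumption about how such tables are typically matched.

```c
#include <stddef.h>
#include <string.h>

#define HORKAGE_NONCQ 0x1u   /* hypothetical flag value for this sketch */

struct blacklist_entry {
    const char *model;   /* exact string, or a prefix ending in '*' */
    const char *rev;     /* firmware revision pattern; NULL = any */
    unsigned int horkage;
};

/* Match a pattern that is either exact or ends in a single '*' wildcard. */
static int pattern_match(const char *pat, const char *s)
{
    size_t n = strlen(pat);
    if (n > 0 && pat[n - 1] == '*')
        return strncmp(pat, s, n - 1) == 0;
    return strcmp(pat, s) == 0;
}

/* Table with only the entry discussed in this thread. */
static const struct blacklist_entry table[] = {
    { "C300-CTFDDAC128MAG", "0001", HORKAGE_NONCQ },
};

/* Return the quirk flags for a drive, or 0 if no entry matches. */
static unsigned int lookup_horkage(const char *model, const char *rev)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (!pattern_match(table[i].model, model))
            continue;
        if (table[i].rev && !pattern_match(table[i].rev, rev))
            continue;
        return table[i].horkage;
    }
    return 0;
}
```

With only the `"0001"` entry present, a C300-CTFDDAC128MAG reporting firmware `0002` falls through and keeps NCQ, which matches the `queue_depth:31` observed before the entry was patched.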

Again, the other C300 never issued any errors. So I swapped the original SATA II
cables shipped with the motherboard between the SSD and the ROM drive. All errors stopped
immediately. SATA-IO (http://www.sata-io.org) recommends high-quality cables for
use with 6G devices, although the technical cable specifications for SATA II and III are the same.

After switching back to NCQ, both drives perform very well. Max. write speed:

dd if=/dev/zero of=testfile bs=4M count=1024 conv=fsync
4294967296 bytes (4.3 GB) copied, 29.8529 s, 144 MB/s

Max read speed:

dd if=/dev/sde of=/dev/null conv=fsync bs=4M count=1024
4294967296 bytes (4.3 GB) copied, 11.5944 s, 370 MB/s

But when running two dd's on both drives at same time: 

dd if=/dev/sde of=/dev/null conv=fsync bs=4M
128035676160 bytes (128 GB) copied, 601.066 s, 213 MB/s

dd if=/dev/sdd of=/dev/null conv=fsync bs=4M
128035676160 bytes (128 GB) copied, 602.902 s, 212 MB/s

This costs 25 % I/O wait, so the controller apparently can't handle more than ~425 MB/s
in aggregate (213 + 212 MB/s).
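The throughput figures can be recomputed from dd's own byte counts and times (dd reports decimal MB, 10^6 bytes). For the concurrent run, dividing the total bytes by the longer elapsed time gives a slightly conservative aggregate; the helper names are illustrative.

```c
/* Throughput in decimal MB/s, as dd prints it. */
static double mb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}

/* Aggregate rate of two concurrent transfers: total bytes over the
 * wall-clock span, approximated here by the longer of the two times. */
static double aggregate_mb_per_s(double b1, double t1, double b2, double t2)
{
    double span = t1 > t2 ? t1 : t2;
    return (b1 + b2) / span / 1e6;
}
```

This gives about 144 MB/s for the write, about 370 MB/s for the single read, and roughly 425 MB/s for the two parallel reads (about 424.7 MB/s by the conservative method), consistent with the aggregate ceiling estimated above.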

Cheers
Postillion
Comment 49 Will P 2010-11-13 16:35:15 UTC
I am experiencing the same problem with a Marvell 9128 SATA controller and 2x2TB drives in a RAID1 mirror, on Fedora 14 (2.6.35.6-48 kernel).

It only happens under high IO load.

Setting libata.force=noncq so far has eliminated the problem (with some performance hit.)  I have only tested with libata.force=noncq for about half an hour, so I'm not sure if this is a complete workaround or if the threshold is much higher now.


Output from hdparm -I:
/dev/sda:

ATA device, with non-removable media
	Model Number:       MARVELL Raid VD 0                       
	Serial Number:      d1201dc47aed0010
	Firmware Revision:  MV.R00-0
Standards:
	Used: ATA/ATAPI-7 T13 1532D revision 4a 
	Supported: 7 6 5 4 
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	3
	--
	CHS current addressable sectors:    7864440
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors: 3906898048
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:     1907665 MBytes
	device size with M = 1000*1000:     2000331 MBytes (2000 GB)
	cache/buffer size  = 8192 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(cannot be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = ?
	Recommended acoustic management value: 254, current value: 0
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 udma6 *udma7 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	48-bit Address feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	unknown 76[3]
	   *	unknown 76[4]
	   *	unknown 76[5]
	   *	unknown 76[6]
	   *	unknown 76[7]
	   *	Native Command Queueing (NCQ)
Security: 
	Master password revision code = 65534
	88min for SECURITY ERASE UNIT. 88min for ENHANCED SECURITY ERASE UNIT.

And lspci -v output for the marvell controller:
01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9128 (rev 11) (prog-if 01 [AHCI 1.0])
	Subsystem: Giga-byte Technology Device b000
	Flags: bus master, fast devsel, latency 0, IRQ 47
	I/O ports at af00 [size=8]
	I/O ports at ae00 [size=4]
	I/O ports at ad00 [size=8]
	I/O ports at ac00 [size=4]
	I/O ports at ab00 [size=16]
	Memory at fb9ff000 (32-bit, non-prefetchable) [size=2K]
	[virtual] Expansion ROM at f4000000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: ahci
Comment 50 Thomas Hallgren 2011-01-21 11:43:46 UTC
I have the exact same Marvell controller and Seagate disks as Pho reports in comment #28, but a more recent 2.6.35.10-74.fc14.x86_64 kernel, and the controller firmware is 1.0.0.1019. I get the same "failed command: WRITE FPDMA QUEUED" errors on my system.

I'd be happy to help provide any info needed from my system to resolve this.
Comment 51 Laurent Dinclaux 2012-11-01 06:00:29 UTC
Hello,

Can this bug be reopened? I have the exact same issue on Ubuntu with the mainline kernel v3.7-rc2 (Marvell 9123, brand-new Corsair Neutron 240 GB SSD).

Regards.
Comment 52 Laurent Dinclaux 2013-04-26 01:10:34 UTC
The bug still exists in 3.8 (Ubuntu Raring).