Bug 13551

Summary: SATA: link online but device misclassified
Product: IO/Storage
Reporter: Marc Bowes (marcbowes)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, alan, chrisgaukroger, Nicolas.Mailhot, tj
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29-gentoo-r5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg
lspci
dmidecode
dmidecode with Onboard SATA/IDE Disable - Quick Boot OK
dmidecode with Onboard SATA/IDE Enable - Slow SATA startup
dmesg with Onboard SATA/IDE Enable - Slow SATA startup
gigabyte-unreliable-online-1.patch
gigabyte-unreliable-online-2.patch
dmesg with patch #22176
gigabyte-unreliable-online-3.patch
dmesg on GA-EP45-DS5 with patch #22253

Description Marc Bowes 2009-06-16 18:07:00 UTC
Created attachment 21940 [details]
dmesg

There is a long wait while booting with my new motherboard (Gigabyte EP45-DQ6) as it appears to be unable to detect the presence of drives correctly. It (the motherboard) has 6 SATA slots, and I have 4 drives plugged in. I have attached the full dmesg output, but the area of interest is:

ata1: link is slow to respond, please be patient (ready=0)
ata1: softreset failed (device not ready)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1: link online but device misclassified, retrying
ata1: link is slow to respond, please be patient (ready=0)
ata1: softreset failed (device not ready)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1: link online but device misclassified, retrying
ata1: link is slow to respond, please be patient (ready=0)
ata1: softreset failed (device not ready)
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1: link online but device misclassified, retrying
ata1: limiting SATA link speed to 1.5 Gbps
ata1: softreset failed (device not ready)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1: link online but device misclassified, device detection might fail
ata2: link is slow to respond, please be patient (ready=0)
ata2: softreset failed (device not ready)
... (as with ata1)

Enabling AHCI in the BIOS makes no difference (except for numbering). The 4 drives appear to be working correctly (speeds are good, and I haven't noticed any hiccups). On the linux-ide mailing list, it was suggested that the issue is that the kernel is unable to detect that there are no drives on the other end of the 2 unused ports.
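
For readers following the log, the SStatus and SControl values can be decoded by hand. The sketch below is not taken from the kernel or from any patch in this report; it assumes the values are printed in hexadecimal and uses the standard SATA register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). All function and struct names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a SATA SStatus value (names here are illustrative). */
struct sstatus_fields {
	unsigned det; /* bits 3:0, device detection state */
	unsigned spd; /* bits 7:4, negotiated speed generation */
	unsigned ipm; /* bits 11:8, interface power management state */
};

/* Split a raw SStatus value into its three low fields. */
static struct sstatus_fields decode_sstatus(uint32_t v)
{
	struct sstatus_fields f;

	f.det = v & 0xf;
	f.spd = (v >> 4) & 0xf;
	f.ipm = (v >> 8) & 0xf;
	return f;
}

/* DET == 3: device presence detected and PHY communication established. */
static int link_online(uint32_t sstatus)
{
	return (sstatus & 0xf) == 3;
}

/* Map an SPD value to a link speed string. */
static const char *spd_name(unsigned spd)
{
	switch (spd) {
	case 1:
		return "1.5 Gbps";
	case 2:
		return "3.0 Gbps";
	default:
		return "unknown";
	}
}

/* SControl SPD field: 0 means no speed restriction, 1 limits to 1.5 Gbps. */
static unsigned scontrol_speed_limit(uint32_t scontrol)
{
	return (scontrol >> 4) & 0xf;
}
```

Under these assumptions, SStatus 123 decodes to DET=3, SPD=2, IPM=1 (link established at 3.0 Gbps), SStatus 113 to the same state at 1.5 Gbps, and SControl 310 differs from 300 only in the speed limit field. That matches the log: the port reports an established link even on the ports where no drive is plugged in.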

From lspci, the following is relevant (I will attach the full output in a comment):

0a:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
Comment 1 Marc Bowes 2009-06-16 18:09:08 UTC
Created attachment 21941 [details]
lspci
Comment 2 Marc Bowes 2009-06-16 18:10:10 UTC
I forgot to mention that the two unoccupied slots are the first two numbered slots on the motherboard. This is because they are awkward to plug into (facing a different direction).
Comment 3 Andrew Morton 2009-06-23 06:12:50 UTC
Reassigned to SATA.
Comment 4 Tejun Heo 2009-06-26 03:23:23 UTC
Can you please attach the output of dmidecode?
Comment 5 Tejun Heo 2009-06-26 04:22:57 UTC
Ah.. this is the same report as the one on the mailing list.  No need to attach anything.  Thanks.
Comment 6 Marc Bowes 2009-06-26 10:21:28 UTC
Created attachment 22102 [details]
dmidecode

Here is dmidecode anyways
Comment 7 Chris Gaukroger 2009-06-28 19:06:03 UTC
A temporary workaround for me (with GA-EP45-DS5) was:
- disable the extra SATA ports not being used.
Remember the ROM BIOS settings in case you disable the ports you are using!
In ROM BIOS set Onboard SATA/IDE Device to Disable
Now quick boots OK!
Comment 8 Tejun Heo 2009-06-29 10:31:51 UTC
Chris, please attach the output of dmidecode.  Yours is a different board.  Thanks.
Comment 9 Chris Gaukroger 2009-06-30 18:26:48 UTC
I collected the dmidecode output for a quick boot with Onboard SATA/IDE Device set to Disable and another with it set to Enable. I also collected the dmesg for the Enable case, showing the problems on startup. It takes several minutes to get past this stage.
Comment 10 Chris Gaukroger 2009-06-30 18:30:01 UTC
Created attachment 22151 [details]
dmidecode with Onboard SATA/IDE Disable - Quick Boot OK
Comment 11 Chris Gaukroger 2009-06-30 18:31:39 UTC
Created attachment 22152 [details]
dmidecode with Onboard SATA/IDE Enable - Slow SATA startup
Comment 12 Chris Gaukroger 2009-06-30 18:32:43 UTC
Created attachment 22153 [details]
dmesg with Onboard SATA/IDE Enable - Slow SATA startup

This is just the SATA initialization section. It boots OK but takes several minutes.
Comment 13 Chris Gaukroger 2009-06-30 18:48:35 UTC
I reviewed the dmidecode outputs:
The dmidecode output for a quick boot with Onboard SATA Disable is the same as that with Onboard SATA Enable. I have repeated this and the dmidecode output is identical, yet with it Disabled it boots fast and with it Enabled it takes several minutes of retrying, as shown in the dmesg output.
Comment 14 Tejun Heo 2009-07-01 23:35:51 UTC
Created attachment 22172 [details]
gigabyte-unreliable-online-1.patch

Yeap, the dmidecode output won't change.  I just needed to add it to the workaround list.  Can you please test the attached patch and post the boot log?  Thanks.
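
As a rough illustration of what a board workaround list looks like, here is a user-space sketch, not the attached patch. In the kernel this kind of matching is done against DMI strings; the struct, table, and function names below are hypothetical, and the vendor and board strings are assumptions based on the board names mentioned in this report rather than on the attached dmidecode output. Any PCI location check that the real patch may perform is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry keyed on strings of the kind dmidecode reports. */
struct board_quirk {
	const char *vendor; /* substring of the board vendor string */
	const char *board;  /* substring of the board name string */
};

/* Illustrative list only; the exact DMI strings are assumed. */
static const struct board_quirk unreliable_online_boards[] = {
	{ "Gigabyte", "EP45-DQ6" },
	{ "Gigabyte", "EP45-DS5" },
};

/* Return 1 if the given vendor/board strings match an entry, else 0. */
static int board_needs_quirk(const char *vendor, const char *board)
{
	size_t n = sizeof(unreliable_online_boards) /
		   sizeof(unreliable_online_boards[0]);
	size_t i;

	if (vendor == NULL || board == NULL)
		return 0;

	for (i = 0; i < n; i++) {
		const struct board_quirk *q = &unreliable_online_boards[i];

		/* Substring match, so longer vendor strings still hit. */
		if (strstr(vendor, q->vendor) && strstr(board, q->board))
			return 1;
	}
	return 0;
}
```

With this sketch, a call such as board_needs_quirk("Gigabyte Technology Co., Ltd.", "EP45-DS5") returns 1, while an unlisted board returns 0. This is why the dmidecode output is needed even though it does not change with the BIOS setting: it only identifies the board.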
Comment 15 Tejun Heo 2009-07-02 11:35:40 UTC
*** Bug 12083 has been marked as a duplicate of this bug. ***
Comment 16 Tejun Heo 2009-07-02 11:56:13 UTC
Created attachment 22176 [details]
gigabyte-unreliable-online-2.patch

Please test this one.  I got the bus number wrong.  Thanks.
Comment 17 Nicolas Mailhot 2009-07-05 19:31:27 UTC
Created attachment 22222 [details]
dmesg with patch #22176

Seems to work fine on EP45-DS5. Thank you very much, please get it merged quick
Comment 18 Tejun Heo 2009-07-08 04:11:56 UTC
Eh... the fix collides with a different fix.  I'll prep another patch soon.  Thanks.
Comment 19 Tejun Heo 2009-07-08 05:02:15 UTC
Created attachment 22253 [details]
gigabyte-unreliable-online-3.patch

Please test the attached patch and post the kernel boot log preferably with printk timestamp turned on.  Thanks.
Comment 20 Marc Bowes 2009-07-08 12:28:41 UTC
$ patch -p1 -R < gigabyte-unreliable-online.patch 
patching file drivers/ata/ahci.c
Hunk #1 succeeded at 2579 with fuzz 1 (offset -98 lines).
Hunk #2 succeeded at 2684 with fuzz 1 (offset -104 lines).
patching file drivers/ata/libata-eh.c
Hunk #1 succeeded at 2580 (offset -15 lines).
patching file include/linux/libata.h

$ patch -p1 < gigabyte-unreliable-online-3.patch 
patching file drivers/ata/ahci.c
Hunk #1 FAILED at 219.
Hunk #2 succeeded at 1637 (offset -23 lines).
Hunk #3 succeeded at 1676 (offset -23 lines).
Hunk #4 succeeded at 2591 with fuzz 1 (offset -142 lines).
Hunk #5 succeeded at 2746 with fuzz 1 (offset -152 lines).
1 out of 5 hunks FAILED -- saving rejects to file drivers/ata/ahci.c.rej

$ cat drivers/ata/ahci.c.rej 
***************
*** 219,224 ****
  	AHCI_HFLAG_SECT255		= (1 << 8), /* max 255 sectors */
  	AHCI_HFLAG_YES_NCQ		= (1 << 9), /* force NCQ cap on */
  	AHCI_HFLAG_NO_SUSPEND		= (1 << 10), /* don't suspend */
  
  	/* ap->flags bits */
  
--- 219,226 ----
  	AHCI_HFLAG_SECT255		= (1 << 8), /* max 255 sectors */
  	AHCI_HFLAG_YES_NCQ		= (1 << 9), /* force NCQ cap on */
  	AHCI_HFLAG_NO_SUSPEND		= (1 << 10), /* don't suspend */
+ 	AHCI_HFLAG_SRST_TOUT_IS_OFFLINE	= (1 << 11), /* treat SRST timeout as
+ 							link offline */
  
  	/* ap->flags bits */
Comment 21 Tejun Heo 2009-07-08 12:39:27 UTC
Eh... the patch is against the current devel branch and the conflict is probably from missing or extra HFLAG flags defined.  Just add AHCI_HFLAG_SRST_TOUT_IS_OFFLINE and give it a unique bit and it should be fine.
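
A minimal sketch of what "give it a unique bit" means, using the flag values shown in the rejected hunk above. The two helper functions and the srst_outcome enum are hypothetical and exist only to illustrate the idea; they are not part of ahci.c, and the real handling of the flag lives in the patch.

```c
/* Host flags as listed in the rejected hunk, plus the new flag. */
enum {
	AHCI_HFLAG_SECT255		= (1 << 8),  /* max 255 sectors */
	AHCI_HFLAG_YES_NCQ		= (1 << 9),  /* force NCQ cap on */
	AHCI_HFLAG_NO_SUSPEND		= (1 << 10), /* don't suspend */
	AHCI_HFLAG_SRST_TOUT_IS_OFFLINE	= (1 << 11), /* treat SRST timeout as
							link offline */
};

/*
 * Hypothetical check: a new flag is safe if it is exactly one bit
 * and that bit is not already present in the mask of used flags.
 */
static int hflag_is_unique(unsigned int flag, unsigned int used)
{
	int single_bit = flag != 0 && (flag & (flag - 1)) == 0;

	return single_bit && (flag & used) == 0;
}

/* Hypothetical outcome of a softreset timeout, for illustration only. */
enum srst_outcome { SRST_RETRY, SRST_LINK_OFFLINE };

/* With the flag set, a timeout is read as "no device" instead of retried. */
static enum srst_outcome after_srst_timeout(unsigned int hflags)
{
	if (hflags & AHCI_HFLAG_SRST_TOUT_IS_OFFLINE)
		return SRST_LINK_OFFLINE;
	return SRST_RETRY;
}
```

The exact bit number does not matter as long as no other AHCI_HFLAG_* value in the tree being patched uses it, so (1 << 11) remains valid in an older tree where bit 10 is not defined.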
Comment 22 Marc Bowes 2009-07-08 14:11:29 UTC
A snippet from my drivers/ata/ahci.c

AHCI_HFLAG_SECT255                 = (1 << 8),  /* max 255 sectors */
AHCI_HFLAG_YES_NCQ                 = (1 << 9),  /* force NCQ cap on */
AHCI_HFLAG_SRST_TOUT_IS_OFFLINE    = (1 << 11), /* treat SRST timeout as link offline */

Will post dmesg when I get a chance to reboot.
Comment 23 Nicolas Mailhot 2009-08-01 12:32:28 UTC
Created attachment 22564 [details]
dmesg on GA-EP45-DS5 with patch #22253

Rawhide finally stabilized enough to do some kernel building, and the result works fine. Can patch #22253 be merged in time for 2.6.31 now?

Built kernel & build logs at
https://koji.fedoraproject.org/koji/taskinfo?taskID=1571709
Comment 24 Tejun Heo 2009-08-04 07:12:00 UTC
Patch forwarded upstream.  Sorry about the delay.

  http://thread.gmane.org/gmane.linux.ide/42148
Comment 25 Nicolas Mailhot 2009-08-04 10:18:53 UTC
Thank you very much, it will be nice not to dread boot anymore!