Bug 7170 - sata_piix fails to detect any drives
Status: RESOLVED CODE_FIX
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: i386 Linux
Importance: P2 normal
Assigned To: Tejun Heo
Depends on:
Blocks:
Reported: 2006-09-20 03:07 UTC by Joakim Crafack
Modified: 2007-02-27 06:58 UTC

See Also:
Kernel Version: 2.6.17
Tree: Mainline
Regression: ---


Attachments
Output lspci -vvv (on working kernel) (15.13 KB, text/plain)
2006-09-20 03:10 UTC, Joakim Crafack

Description Joakim Crafack 2006-09-20 03:07:21 UTC
Most recent kernel where this bug did not occur: 2.6.16.23
Distribution: Gentoo (Tested with git kernels, see below)
Hardware Environment: P4, Intel ICH6 (chipset rev. 3), 1 SATA disk
Problem Description:
Kernel fails to detect SATA drive during boot.

[output snippet]
ata1: SRST failed (status 0xFF)
ata1: SRST failed (err_mask=0x100)
ata1: softreset failed, retrying in 5 secs
ata1: SRST failed (status 0xFF)
ata1: SRST failed (err_mask=0x100)
ata1: softreset failed, retrying in 5 secs
ata1: reset failed, giving up
[end]
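
A note on the status value in the log above: 0xFF has every bit of the 8-bit ATA status register set, including BSY, DRQ and ERR at once. A working drive would not normally report that, and it is usually what a read returns when nothing responds on the port. The sketch below decodes a status byte using the bit layout from the ATA specification. The function names are hypothetical and this is not driver code, only an aid for reading the log.

```python
# Bit positions of the ATA status register (bit 7 down to bit 0).
# CORR and IDX are obsolete in later ATA revisions but still occupy their bits.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error
}


def decode_ata_status(status):
    """Return the names of the bits set in an 8-bit status value (hypothetical helper)."""
    return [name for mask, name in ATA_STATUS_BITS.items() if status & mask]


def looks_like_floating_bus(status):
    """True when all eight bits are set, as in the 'status 0xFF' log lines."""
    return status == 0xFF


if __name__ == "__main__":
    for value in (0xFF, 0x50):
        print(hex(value), decode_ata_status(value), looks_like_floating_bus(value))
```

For comparison, 0x50 (DRDY plus DSC) is the value an idle, ready drive typically shows.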

git bisect says the following commit is responsible:

  d133ecab8ff1233c2eb3ecb94f7956aa10002300 is first bad commit
  commit d133ecab8ff1233c2eb3ecb94f7956aa10002300
  Author: Tejun Heo <htejun@gmail.com>
  Date:   Wed Mar 1 01:25:39 2006 +0900
  [PATCH] ata_piix: reimplement piix_sata_probe()
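
For readers unfamiliar with how bisecting arrives at a single commit: it is a binary search over the history between a known-good and a known-bad kernel. The sketch below shows only the search idea, assuming the history is a simple ordered list and that every commit after the first bad one is also bad. The commit names and the is_bad callback are hypothetical; in practice each test is a build and a boot.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit.

    commits is ordered oldest to newest; commits[0] is assumed known good
    and commits[-1] known bad. is_bad(commit) stands in for "build, boot,
    and check whether the drive is detected".
    Returns (first_bad, number_of_tests).
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


if __name__ == "__main__":
    history = ["c%d" % i for i in range(10)]       # hypothetical commit names
    broken_from = 6                                # hypothetical first bad index
    result = first_bad_commit(history, lambda c: history.index(c) >= broken_from)
    print(result)
```

Each test halves the remaining range, so even a few thousand commits between two releases need only around a dozen boots.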

Steps to reproduce:
Boot a kernel that includes the above-mentioned commit.
Comment 1 Joakim Crafack 2006-09-20 03:10:17 UTC
Created attachment 9054
Output lspci -vvv (on working kernel)
Comment 2 Tejun Heo 2006-09-21 07:14:09 UTC
Hello,

We've been seeing a number of detection failure reports on various piix chips
and there have been several different attempts to fix them, but none has fixed
all of them yet.  The commit that broke your machine is one of those attempts,
and it actually fixed problems on other machines.

Can you please test 2.6.18 and report boot dmesg?

Thanks a lot for reporting the bug and taking the time to bisect it.
Comment 3 Joakim Crafack 2006-09-21 09:49:14 UTC
Hi.

I've tested the release 2.6.18, and still get the SRST error above.

Unfortunately I've not (yet) had time to set up another box on the network to
catch a netfeed of the bootlog, so I can't provide the boot output.

I'll try to set up the other box tomorrow.

Kind regards, and thanks for the reply!
/Joakim
Comment 4 Tejun Heo 2006-10-08 17:06:06 UTC
Can you test the patch in the following mail?  You can use either git or
download the full kernel tarball to get the modified kernel.

http://article.gmane.org/gmane.linux.ide/13284
Comment 5 Joakim Crafack 2006-10-09 02:28:01 UTC
Hi.

The patch doesn't help.
The reported error is the same as originally reported.

Used config: http://crafack.dk/kernel/config
Comment 6 Tejun Heo 2006-10-09 16:05:44 UTC
Thanks for testing.

Hmmm... the offending commit is related to PCS and the patch makes ata_piix not
use PCS for device detection anymore.  However, the SRST failure messages can be
caused by another bug which is fixed by another patch, so you might see the
error message for unoccupied ports.  So...

* Can you post dmesg of successful boot on earlier kernel?
* Does the SRST failure occur on the port w/ device attached?

Thanks.
Comment 7 Joakim Crafack 2006-10-09 23:01:22 UTC
Hi.

Dmesg for working kernel (2.6.16.something):

  http://crafack.dk/kernel/dmesg

As far as I can see, the error occurs on ata1, the port where the disk is normally found.

Comment 8 Tejun Heo 2006-10-31 01:05:37 UTC
Can you please test 2.6.19-rc3-mm1?  Sorry about all the trouble but this PCS
problem is really puzzling and different generations of piix controllers show
various behaviors.  We're trying to solve the PCS problem but aren't there yet.
Thanks.
Comment 9 Joakim Crafack 2006-10-31 02:27:17 UTC
Hi. Just tried with 2.6.19-rc3-git8, but that was a no go. Will give rc3-mm1 a go.

I have some spare time in the following days/weeks. Is there anything I could
try (e.g. a specially instrumented driver) that could narrow the bug hunt?
Comment 10 Joakim Crafack 2006-10-31 14:09:50 UTC
Hi. Just tested on 2.6.19-rc3-mm1, with negative result.

I also tried disabling "old" ATA (CONFIG_IDE), which caused renumbering of
devices, but didn't work either.

Please let me know how I can be of assistance (unfortunately I have no
experience doing bughunts in the kernel).
Comment 11 Tejun Heo 2007-02-27 06:58:12 UTC
All ata_piix detection bugs have been ironed out as of 2.6.20.  We're defaulting
to polling IDENTIFY and skipping 0xff wait.  Closing as fixed.  Please reopen if
it's still broken for you.
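
To illustrate what "skipping 0xff wait" can mean in general terms: a reset routine that polls the status register until BSY clears may either keep waiting while the register reads 0xFF, or stop at once and leave detection to a later step such as IDENTIFY. The sketch below is a simulation of that choice only. It is not the libata code, and the function name, return values, and poll count are all hypothetical.

```python
ATA_BUSY = 0x80  # BSY bit of the ATA status register


def wait_not_busy(read_status, max_polls, skip_ff_wait=True):
    """Poll read_status() until BSY clears (hypothetical helper).

    Returns ("ready", status) when a non-0xFF status with BSY clear is seen,
    ("skipped", 0xFF) when 0xFF is read and skip_ff_wait is set,
    and ("timeout", last_status) when max_polls is exhausted.
    """
    status = 0xFF
    for _ in range(max_polls):
        status = read_status()
        if status == 0xFF:
            if skip_ff_wait:
                return ("skipped", status)  # do not burn the timeout on 0xFF
            continue                        # old behaviour: keep waiting
        if not (status & ATA_BUSY):
            return ("ready", status)
    return ("timeout", status)


if __name__ == "__main__":
    # Simulated port that becomes ready after two busy reads.
    print(wait_not_busy(iter([0x80, 0x80, 0x50]).__next__, max_polls=10))
    # Simulated port that always reads 0xFF, with and without the skip.
    print(wait_not_busy(lambda: 0xFF, max_polls=10, skip_ff_wait=True))
    print(wait_not_busy(lambda: 0xFF, max_polls=10, skip_ff_wait=False))
```

With the skip enabled, a port that reads 0xFF returns immediately instead of using up the whole timeout, which is the difference between the repeated 5-second retries in the original report and moving straight on to the next detection step.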