Bug 43153 - Random SATA drives on PMPs on sata_sil24 cards not being detected at boot since 3.2/3.4
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 high
Assignee: Jeff Garzik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-23 20:26 UTC by Daniel Smedegaard Buus
Modified: 2013-11-24 11:02 UTC

See Also:
Kernel Version: 3.2, 3.4, 3.6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg output (93.28 KB, text/plain)
2012-04-23 20:27 UTC, Daniel Smedegaard Buus
lspci output (21.16 KB, text/x-log)
2012-04-23 20:27 UTC, Daniel Smedegaard Buus
version info (34 bytes, text/x-log)
2012-04-23 20:28 UTC, Daniel Smedegaard Buus

Description Daniel Smedegaard Buus 2012-04-23 20:26:55 UTC
Hi :)

I originally reported this to the Ubuntu kernel bugzilla (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/987353) and was directed here.

Since switching my Kubuntu system from Oneiric (kernel 3.0) to a Precise daily build (kernel 3.2), boot hangs for a minute or more immediately after the GRUB boot selection. According to dmesg, that time is spent hard resetting the links on my sata_sil24-based PCIe controllers, which have 1:5 port multipliers attached to them.

Eventually it semi-succeeds and continues booting, but one or two SATA drives are missing until I manually unplug them and plug them back in, at which point they function normally (AFAICT; I haven't really stress-tested this, but at least they're all present and seem to work without issues).

The box in question (amd64) has 22 SATA drives:
6 on ICH10R
15 on three sata_sil24 PCIe 1-port cards using three 1:5 PMPs
1 on a sata_sil PCI32 4-port card
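
For anyone wanting to check a boot quickly, here is a minimal sketch that compares the number of detected disks against the layout above. It assumes every drive shows up as an sd* entry under /sys/block and that no other sd* devices (USB sticks and the like) are attached; the function and variable names are made up for illustration.

```python
import os

# Expected layout as listed in this report: 6 + 15 + 1 = 22 drives.
EXPECTED = {"ICH10R": 6, "sata_sil24 via PMPs": 15, "sata_sil": 1}


def count_sd_disks(sys_block="/sys/block"):
    """Count whole-disk sd* entries (sda, sdb, ..., sdaa) under sys_block."""
    return sum(1 for name in os.listdir(sys_block) if name.startswith("sd"))


def missing_disks(detected, expected=EXPECTED):
    """Return how many drives are missing relative to the expected total."""
    return sum(expected.values()) - detected
```

On the affected machine, `missing_disks(count_sd_disks())` would be expected to return 1 or 2 after a bad boot and 0 after replugging the drives.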

There is no fakeraid configured.

The problem showed up with the kernel shipped with the Precise daily build I installed, 3.2.0-23-generic. I then installed 3.4.0-030400rc4-generic from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc4-precise/, which didn't help. Reverting to 3.0.0-17-generic resolved the issue immediately.

I'll attach some files for reference (all from the 3.2 configuration; there are more at the Ubuntu link mentioned above, and I'm not sure which are relevant to you). Please let me know if I should provide or do anything else.

Thanks for your time and effort,
Daniel :)
Comment 1 Daniel Smedegaard Buus 2012-04-23 20:27:17 UTC
Created attachment 73050
dmesg output
Comment 2 Daniel Smedegaard Buus 2012-04-23 20:27:33 UTC
Created attachment 73051
lspci output
Comment 3 Daniel Smedegaard Buus 2012-04-23 20:28:15 UTC
Created attachment 73052
version info
Comment 4 Daniel Smedegaard Buus 2012-05-04 11:56:40 UTC
Just checking up after a short couple of weeks. Do I need to report this elsewhere or am I just impatient? :)

Thanks
Comment 5 Daniel Smedegaard Buus 2012-05-30 04:54:01 UTC
It's now been another three weeks plus change. As I haven't heard anything and the status hasn't changed at all, I'm concerned that I haven't reported this bug correctly or in the right place.

Either way, diffing the sata_sil24 driver module from 3.0 with the one from 3.3 doesn't really show any difference AFAICT if you ignore renaming of some function calls and a couple of type changes. My C knowledge isn't exactly vast, but it'd appear the problem originates elsewhere?
Comment 6 Daniel Smedegaard Buus 2012-07-29 15:09:22 UTC
And now two additional months. Anyone there?
Comment 7 Daniel Smedegaard Buus 2012-08-29 20:44:21 UTC
Just thought I'd update the bug, adding 3.6 to the list of affected versions as I just had a test run on the mainline 3.6 RC3 kernel for Quantal :)
Comment 8 Alan 2012-09-06 11:07:30 UTC
Bugzilla is just used for tracking the existence of bugs. Your distribution is the point of contact for bug fixing.

If you are working off an upstream kernel, it's probably also worth reporting to linux-ide@vger.kernel.org, especially if you have time to build a series of kernels to bisect it to a specific patch.
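
As a rough illustration of what bisecting involves, the sketch below does a binary search over an ordered list of kernel builds. It is not a real tool: `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands in for the manual step of building a kernel, booting it, and checking whether any drives are missing. In practice `git bisect` drives this same search over commits.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is true.

    Assumes versions are ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and the bug persists once introduced.
    """
    lo, hi = 0, len(versions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return versions[hi]
```

Each round halves the remaining range, so even a few thousand commits between a good and a bad kernel need only a dozen or so test boots.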
Comment 9 Daniel Smedegaard Buus 2012-09-07 10:06:51 UTC
Hi Alan,

Not sure what you mean... The Ubuntu folk sent me here after confirming that the bug existed in the "clean" kernel, not just the Ubuntu one.

Should I report the bug to that mailing list instead?

Thanks
Comment 10 Alan 2012-09-07 10:23:10 UTC
The Ubuntu folks ask people to file bugs upstream as well so that we know about them (and so that we can see common problems between distributions). Upstream bugs don't, however, magically fix themselves.

Best thing to do given you can reproduce this reliably is to send a mail to the list.
Comment 11 Daniel Smedegaard Buus 2012-09-07 13:47:51 UTC
Righty-o, I'll do that :) Thanks
Comment 12 Daniel Smedegaard Buus 2013-11-24 11:02:53 UTC
Fixed as of kernel v3.11 (might be as of v3.10, not entirely sure).
