Bug 8933 (ata_piix_failure)
Summary: ata_piix blocks ahci from accessing the harddisk
Product: IO/Storage
Component: Serial ATA
Severity: normal
Reporter: Robert M. Albrecht (mail)
Assignee: Tejun Heo (htejun)
CC: greg, hmh, htejun, jgarzik, kay, lcapitulino, martin, notting, protasnb
lspci -nn -vvv
/var/log/messages from a working system
dmesg from a working system
kernel-log from booting
several logs from a non-working installation
/proc/iomem after fresh boot
/proc/ioports after a fresh boot
Description Robert M. Albrecht 2007-08-24 10:58:59 UTC
Most recent kernel where this bug did not occur: 2.6.17
Distribution: Fedora 6 Core test 3
Hardware Environment: Toshiba Tecra S3
Software Environment: Fedora & Ubuntu
Problem Description: If I boot the Fedora 7 or Ubuntu 7 install DVD with the kernel parameter noprobe, the problem can be worked around. When manually selecting the drivers, the ahci driver has to be loaded first, which brings up the hard disk. Then the ata_piix driver has to be loaded to get access to the optical drive. The installed system does not boot.
Steps to reproduce:
lspci
00:00.0 0600: 8086:2590 (rev 03)
00:01.0 0604: 8086:2591 (rev 03)
00:1b.0 0403: 8086:2668 (rev 03)
00:1c.0 0604: 8086:2660 (rev 03)
00:1c.1 0604: 8086:2662 (rev 03)
00:1d.0 0c03: 8086:2658 (rev 03)
00:1d.1 0c03: 8086:2659 (rev 03)
00:1d.2 0c03: 8086:265a (rev 03)
00:1d.3 0c03: 8086:265b (rev 03)
00:1d.7 0c03: 8086:265c (rev 03)
00:1e.0 0604: 8086:2448 (rev d3)
00:1f.0 0601: 8086:2641 (rev 03)
00:1f.1 0101: 8086:266f (rev 03)
00:1f.2 0106: 8086:2653 (rev 03)
01:00.0 0300: 10de:0148 (rev a2)
02:00.0 0200: 11ab:4362 (rev 15)
05:05.0 0280: 8086:4220 (rev 05)
05:0b.0 0607: 104c:8031
05:0b.2 0c00: 104c:8032
05:0b.4 0805: 104c:8034
This bug is present in both Ubuntu and Fedora, so I don't think it is a distribution-specific bug.
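For readers scanning the lspci listing above: the two devices the rest of this thread is about are the ones whose PCI class starts with 01 (mass storage), namely 00:1f.1 (class 0101, IDE) and 00:1f.2 (class 0106, SATA). A minimal sketch for picking them out of `lspci -n` style lines follows; the function name `storage_controllers` and the `SAMPLE` list are illustrative only and not part of any tool.

```python
from typing import List, Tuple


def storage_controllers(lspci_n_lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (slot, class, vendor:device) for mass-storage devices.

    Expects lines shaped like '00:1f.2 0106: 8086:2653 (rev 03)'.
    PCI base class 01 is mass storage; subclass 01 is IDE, 06 is SATA.
    """
    found = []
    for line in lspci_n_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # not a device line
        slot = parts[0]
        pci_class = parts[1].rstrip(":")
        ids = parts[2]
        if pci_class.startswith("01"):
            found.append((slot, pci_class, ids))
    return found


# Three lines taken from the listing in this report.
SAMPLE = [
    "00:1f.0 0601: 8086:2641 (rev 03)",
    "00:1f.1 0101: 8086:266f (rev 03)",
    "00:1f.2 0106: 8086:2653 (rev 03)",
]

for slot, pci_class, ids in storage_controllers(SAMPLE):
    print(slot, pci_class, ids)
```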
Comment 1 Andrew Morton 2007-08-24 11:07:40 UTC
recategorised as an ata bug, assigned to Jeff, marked as a regression.
Comment 2 Tejun Heo 2007-08-26 19:57:27 UTC
We need the boot log from the failed boot showing how ahci loading fails. Also, the result of 'cat /proc/ioports', 'cat /proc/iomem' and the result of 'lspci -nnvvv' would be useful.
* Use serial or net console.
* Most installation systems have a command console where you can copy the boot logs to a USB disk, or manually configure a network interface and send them.
* Take a photo if none of the above works.
Thanks.
Comment 4 Robert M. Albrecht 2007-08-27 00:05:23 UTC
Created attachment 12558 [details] /proc/ioports
Comment 5 Robert M. Albrecht 2007-08-27 00:05:44 UTC
Created attachment 12559 [details] lspci -nn -vvv
Comment 6 Robert M. Albrecht 2007-08-27 00:06:14 UTC
Created attachment 12560 [details] /var/log/messages from a working system
Comment 7 Robert M. Albrecht 2007-08-27 00:06:36 UTC
Created attachment 12561 [details] dmesg from a working system
Comment 8 Robert M. Albrecht 2007-08-27 00:08:24 UTC
I attached the requested files, generated on a working installation (Fedora 6 Test 3) that has been updated since then. New installations will not work; I will generate the logs on a non-working system this evening.
Comment 9 Robert M. Albrecht 2007-08-27 13:51:39 UTC
Created attachment 12578 [details] kernel-log from booting
Afterward the system freezes and the caps-lock LED flashes.
Comment 10 Robert M. Albrecht 2007-08-27 13:56:09 UTC
Created attachment 12579 [details] several logs from a non-working installation
I booted the Fedora Live CD with noprobe and ran:
modprobe -r ahci
modprobe -r ata_piix
modprobe ahci
modprobe ata_piix
The dmesg-* files are from loading the modules.
Comment 11 Tejun Heo 2007-08-28 17:50:30 UTC
* The invalid MAP value message can be ignored. The message is bogus.
* If PCI device 1f.2 is driven in ata_piix mode, depending on configuration, the controller can only access P0, P2 or P1, P3 (only two of the four SATA ports), and your installer is loading ata_piix before ahci, thus limiting access to the other ports. Please file bug reports against the distros. If both ahci and ata_piix are candidates, ahci should be loaded before ata_piix, which your previous installation apparently did.
* Dunno why ata_piix didn't attach to 1f.1 when loaded first tho. Please post the result of 'cat /proc/ioports' and 'cat /proc/iomem' _before_ you unload and reload modules.
Thanks.
Comment 12 Robert M. Albrecht 2007-08-28 23:56:59 UTC
Created attachment 12594 [details] /proc/iomem after fresh boot
Comment 13 Robert M. Albrecht 2007-08-28 23:57:21 UTC
Created attachment 12595 [details] /proc/ioports after a fresh boot
Comment 14 Robert M. Albrecht 2007-08-28 23:58:30 UTC
Hi, I booted from a Fedora 7 Live CD where the hdd is not accessible and took the files directly after booting. cu romal
Comment 15 Tejun Heo 2007-08-31 17:50:10 UTC
Okay, ata_piix is attached to 1f.1. Checking log again... OIC, it's there.
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 21
ata1: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001af10 irq 14
ata2: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001af18 irq 15
scsi0 : ata_piix
ata1.00: ATAPI, max UDMA/33
ata1.00: configured for UDMA/33
scsi1 : ata_piix
scsi 0:0:0:0: CD-ROM MATSHITA DVD-RAM UJ-832S 1.00 PQ: 0 ANSI: 5
That leaves us only with the module loading order. Rejecting as INVALID. Feel free to reopen if there are any issues left.
Comment 16 Robert M. Albrecht 2007-09-17 11:07:58 UTC
Hi,
Bill Nottingham from Red Hat has a different opinion.
------- Additional Comments From email@example.com 2007-09-17 14:00 EST -------
Tejun is wrong. Module loading is non-deterministic, thanks to the kernel-mandated use of udev. If drivers cannot cope with this, they are broken.
-----
And this problem shows up in both Ubuntu and Fedora.
cu romal
Comment 17 Tejun Heo 2007-09-18 21:48:45 UTC
The controller has two interfaces - a compatible IDE interface and an AHCI one. The IDE interface can be driven by ata_piix while the AHCI interface can be driven by ahci. If ata_piix is loaded first, it grabs the device; otherwise, ahci does.
This is a special case. IIRC, in all other cases, dual mode controllers can be put into either mode and that is determined either by BIOS or PCI quirks, so there is no driver contention there. Only ICH6s use the same PCI ID for both IDE and AHCI modes, and those two drivers differ in capability.
Drivers are coping quite well but they can't cope beyond hardware restrictions. One solution I can think of is to conditionalize out those duplicate PCI IDs from ata_piix if CONFIG_SATA_AHCI is 'y' or 'm', but I'm afraid that would just end up upsetting more users. Things have been this way for a very long time now.
Distros can comment out duplicate PCI IDs from ata_piix if making module loading order deterministic is difficult. Robert, can you cc Bill Nottingham here?
Comment 18 Robert M. Albrecht 2007-09-18 22:56:31 UTC
Hi Tejun, yes, that's a problem. If automatic detection is difficult, could a kernel parameter like ata_piix=ich6s help?
Comment 19 Tejun Heo 2007-09-18 23:12:28 UTC
[cc'ing a bunch of distro people.]
The problem is that on ICH6, AHCI and IDE mode share the same PCI IDs. This creates two problems.
1. ABAR might not be allocated by BIOS, in which case ahci will fail to probe. ata_piix should be loaded as a fallback.
2. If ata_piix is loaded before ahci, it will grab the device, but it's more limited in things like NCQ or hotplug and, depending on configuration, might be able to access only part of the available ports.
Another interesting aspect is that the ahci implementation on ICH6 is somewhat flaky and might not work with certain devices, so mainline needs to allow choosing between ata_piix and ahci.
Distros can solve this as follows: if both ahci and other drivers are candidates for a device, load ahci first. If that fails, fall back to the other driver.
I don't really see how this can be solved from mainline such that it magically works on all distros, and the above solution shouldn't be too difficult to implement. If you have different ideas, please let me know. If your distro doesn't implement such a mechanism, please tell the person in charge to do so.
Thanks.
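The "load ahci first, fall back to the other driver" policy described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DRIVER_PRIORITY`, `order_candidates`, `bind_with_fallback` and the `try_load` callback are hypothetical names, and no distro's installer or initrd is claimed to work this way.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical preference table: a lower number means "try this driver first".
DRIVER_PRIORITY = {"ahci": 0, "ata_piix": 1}


def order_candidates(candidates: Iterable[str]) -> List[str]:
    """Sort candidate drivers so that preferred ones are tried first.

    Drivers missing from the table are tried last, in alphabetical order.
    """
    return sorted(candidates, key=lambda name: (DRIVER_PRIORITY.get(name, 100), name))


def bind_with_fallback(
    candidates: Iterable[str], try_load: Callable[[str], bool]
) -> Optional[str]:
    """Try each candidate in priority order; return the first that binds.

    try_load is a stand-in for "load the module and check whether it
    attached to the device"; it is an assumption of this sketch.
    """
    for name in order_candidates(candidates):
        if try_load(name):
            return name
    return None


# ABAR not allocated by the BIOS: ahci fails to probe, ata_piix is the fallback.
print(bind_with_fallback(["ata_piix", "ahci"], lambda name: name != "ahci"))
```

The point of keeping the priority in a table rather than in the drivers is that a user or distro can override it, which is the flexibility argued for in this comment.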
Comment 20 Bill Nottingham 2007-09-19 07:35:22 UTC
Well, module loading has never been deterministic - udev simply loads anything that matches the modalias, and the ordering is determined by
a) timing
b) order of modules.alias
This is the same issue you have with rtl8139 and 8139too, but *that* is handled in the kernel by having only one driver attach. The same should be done here - make the drivers handle it.
Suggestion:
- make ata_piix attach if ABAR is not allocated
- make ata_piix *NOT* attach if ABARs are allocated
- fix ahci for the issues mentioned
- add some sort of module parameter to blacklist/force if necessary
So, only the last of those would require user configuration - the rest would work out of the box.
Comment 21 Tejun Heo 2007-09-19 18:33:35 UTC
The problem is that one driver is not necessarily better than the other. ahci is generally better than ata_piix but that isn't necessarily true on at least ich6 (and I have an ich6 right beside me). The intel spec even says not to use ahci on ich6s, which is a bit twisted as not all ports are accessible via the IDE interface under certain configurations, as reported here, so we can't really support only one driver exclusively in this case - this takes out the first two suggestions.
ahci isn't broken in any way. Currently the kernel can't do PCI BAR allocation reliably. We depend on the BIOS for that. It can be pretty complex and the only sane approach would be reallocating all IO BARs, which isn't guaranteed to work as BIOSen may use fixed IO ports and/or mem regions not listed anywhere. I think it can create more problems than it solves.
The fourth suggestion can be done, but I'm not sure whether that would solve the problem completely. Let's say we have "ata_piix.skip_possible_ahcis=1" and udev tries to load ata_piix. ata_piix will refuse to attach to the device, so udev falls back to ahci, but unfortunately ABAR isn't mapped or the ahci controller reset fails. We end up with no driver.
Would it be difficult to make udev or whatever favor ahci when there are multiple choices? Also, I don't know how other guys handle initrd modules but SUSE doesn't use udev to load modules for the root device. The initrd is generated during kernel installation and that's where ahci priority is needed & implemented. Is it different for other distros?
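The "we end up with no driver" scenario above can be walked through with a toy model. Everything here is an assumption made for illustration: the parameter `skip_possible_ahcis` is the hypothetical one named in this comment, and the probe functions reduce each driver's real probing to a single boolean decision.

```python
from typing import Optional, Sequence


def ahci_attaches(abar_allocated: bool, reset_ok: bool) -> bool:
    """Toy model: ahci needs ABAR allocated by the BIOS and a successful reset."""
    return abar_allocated and reset_ok


def ata_piix_attaches(skip_possible_ahcis: bool) -> bool:
    """Toy model: ata_piix attaches unless told to skip AHCI-capable devices."""
    return not skip_possible_ahcis


def bound_driver(
    load_order: Sequence[str],
    abar_allocated: bool,
    reset_ok: bool,
    skip_possible_ahcis: bool,
) -> Optional[str]:
    """Return the first driver in load_order that attaches, or None."""
    for driver in load_order:
        if driver == "ahci":
            attached = ahci_attaches(abar_allocated, reset_ok)
        elif driver == "ata_piix":
            attached = ata_piix_attaches(skip_possible_ahcis)
        else:
            attached = False
        if attached:
            return driver
    return None


# ata_piix refuses the device and ABAR is not mapped: nothing binds.
print(bound_driver(["ata_piix", "ahci"], abar_allocated=False,
                   reset_ok=True, skip_possible_ahcis=True))
```

In this model, the only combination that leaves the device without a driver is the one described in the comment: ata_piix told to skip while ahci cannot probe.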
Comment 22 Natalie Protasevich 2007-11-08 05:40:49 UTC
Greg can probably help to answer this question.
Comment 23 Greg Kroah-Hartman 2007-11-08 09:05:36 UTC
Greg can answer which question?
Comment 24 Natalie Protasevich 2007-11-08 09:21:08 UTC
Oops, sorry - the ones in #21, by Tejun. I think the discussion came down to whether this problem can be handled through udev and whether this is something unique to each distro...
Comment 25 Robert M. Albrecht 2007-11-29 09:46:21 UTC
Still broken in 2.6.24-rc3 . Any further ideas ?
Comment 26 Tejun Heo 2007-11-29 16:50:09 UTC
This isn't a kernel problem. You need to file bug report against your distro.
Comment 27 Tejun Heo 2007-11-29 16:51:29 UTC
Rejecting as INVALID. Feel free to re-open if you don't agree.
Comment 28 Bill Nottingham 2007-11-29 18:45:01 UTC
Reopening then. (Well, I would, but your bz is sufficiently draconian.) udev has no mechanism for device priority or ordering. It simply does 'modprobe $modalias' (i.e., the PCI string.) Ergo, if we have two drivers that claim the same alias that both need to be enabled for a normal setup, they need to coexist and DTRT. This is not a distro issue, unless it's up to the distros to paper over broken drivers.
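To make the "two drivers claim the same alias" point concrete, here is a small sketch of wildcard matching against modules.alias-style patterns. The `ALIASES` entries are written by hand in the usual `pci:v...d...sv*sd*bc*sc*i*` shape for illustration and are not copied from a real modules.alias file; the subsystem IDs in the example modalias strings are placeholders.

```python
from fnmatch import fnmatchcase
from typing import List

# Illustrative (pattern, module) pairs in modules.alias style.
# Assumed for this sketch: both drivers list 8086:2653, only ata_piix lists 8086:266F.
ALIASES = [
    ("pci:v00008086d00002653sv*sd*bc*sc*i*", "ata_piix"),
    ("pci:v00008086d00002653sv*sd*bc*sc*i*", "ahci"),
    ("pci:v00008086d0000266Fsv*sd*bc*sc*i*", "ata_piix"),
]


def modules_for(modalias: str) -> List[str]:
    """Return every module whose alias pattern matches, in table order."""
    return [module for pattern, module in ALIASES if fnmatchcase(modalias, pattern)]


# 00:1f.2 (class 0106); subsystem IDs below are placeholders.
sata = "pci:v00008086d00002653sv00001179sd00000001bc01sc06i01"
# 00:1f.1 (class 0101); subsystem IDs and prog-if below are placeholders.
ide = "pci:v00008086d0000266Fsv00001179sd00000001bc01sc01i8A"

print(modules_for(sata))
print(modules_for(ide))
```

Nothing in the matching step expresses a preference between the two modules returned for the SATA device; which one binds depends on what happens after the match, which is the disagreement in this thread.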
Comment 29 Bill Nottingham 2007-11-29 19:13:37 UTC
Note that the same issue (driver ordering on load) has been referenced in the USB stack as well (uhci/ohci vs ehci). In that case as well the solution proffered is to fix the kernel to enforce proper ordering.
Comment 30 Tejun Heo 2007-11-29 19:26:21 UTC
No no, the driver isn't broken here. The module loader should implement proper priority, and getting that right falls on distros. Drivers are that way because hardware is that way. Hard coding such policy into drivers isn't a good idea.
My experience with distros is limited. I used debian for many years but I always used custom kernels. I recently switched to opensuse and am now using the distro kernel and thus the distro module loading mechanism, and it does the right thing. It has sensible defaults for module priority and lets me choose differently in its configuration tool and/or on the boot command line.
It would be nice to have generic module precedence mechanisms and acceptable policies in udev but I'm not sure it would fly. Different distros and even different versions of the same distro can easily require different policies. Plus, that wouldn't solve the whole problem because loading modules for the root device is done by a distro-specific initrd which doesn't always use udev.
Different distros use different mechanisms to probe, configure and manage hardware. They need to extend udev according to their setup, which should include a mechanism to prioritize drivers.
This issue isn't even new. We have quite a few drivers whose supported device lists overlap. Sometimes a certain driver is clearly better. In such cases, distros usually just ship only the better one, but there are cases where one isn't always better than the others and all duplicate drivers need to be shipped. eepro100 and e100 a while back and competing wifi drivers these days are good examples.
And, in this case, the priority isn't clear because of the different behaviors ICHs show depending on configuration and revision. If there's a clear winning choice, libata always chooses that. For example, JMicrons and ATI SB00's are always forced into ahci mode but we can't do that for ICHs. AHCI can't be enabled for all controllers depending on BIOS configuration, while the ata_piix interface is inferior in feature set and sometimes doesn't cover all ports depending on configuration.
I hope I explained things better. cc'd Kay (hello) just in case.
Comment 31 Tejun Heo 2007-11-29 19:29:37 UTC
ehci is different in that there is one good solution and changing the module behavior doesn't break anything. This is different. Both drivers have pros and cons. Can't force the choice from kernel.
Comment 32 Tejun Heo 2007-11-29 19:37:53 UTC
Then again, I really like the idea of standardizing the mechanism. Kay, Greg, would that be difficult?
Comment 33 Bill Nottingham 2007-11-29 20:00:02 UTC
(In reply to comment #31)
> ehci is different in that there is one good solution and changing the module behavior doesn't break anything. This is different. Both drivers have pros and cons. Can't force the choice from kernel.

It's not forcing a choice, it's forcing ordering *that you just requested*. The user would be free to disable either driver at compile time or via blacklist, the same way they are now. I think the USB situation is a direct analogy - the improper order leads to a broken system.
Furthermore, by saying 'punt it to distros and userspace', you're saying that *EVERY SINGLE DISTRO* should implement similar hacks for its installer, its initramfs, and its udev configuration, *just for your driver*, that it never has to do for other drivers.
You've got a situation right now where the default mechanism used to load drivers leads to a broken situation, *directly because of the way the drivers are architected to both claim IDs and initialize*. Comparing it to e100/eepro100 doesn't make sense, either - in that case both drivers *actually supported* all the hardware they claimed IDs for - no matter what happened, the user would have something that worked reasonably (modulo the bugs of the day.)
In this case you have a driver that doesn't support hardware it claims, based on vagaries of BIOS configuration, that actively breaks in ways that you claim are 'not broken'. Claiming that it's a problem for distros to solve, or simply pushing it to userspace - I find that incredibly short-sighted.
Comment 34 Tejun Heo 2007-11-29 21:26:53 UTC
(In reply to comment #33)
> It's not forcing a choice, it's forcing ordering *that you just requested*. The user would be free to disable either driver at compile time or via blacklist, the same way they are now. I think the USB situation is a direct analogy - the improper order leads to a broken system. Furthermore, by saying 'punt it to distros and userspace', you're saying that *EVERY SINGLE DISTRO* should implement similar hacks for its installer, its initramfs, and its udev configuration, *just for your driver*, that it never has to do for other drivers.

First of all, it's not my driver. Second, yes, I'm sorry you're annoyed but that's how ATA controllers are these days. Adding brand new features and hardware interfaces while maintaining compatibility with 15+ year old software results in this kind of mess. And yes, every single distro will need to deal with it, which doesn't mean distros can't cooperate and come up with a standard solution.

> You've got a situation right now where the default mechanism used to load drivers leads to a broken situation, *directly because of the way the drivers are architected to both claim IDs and initialize*. Comparing it to e100/eepro100 doesn't make sense, either - in that case both drivers *actually supported* all the hardware they claimed IDs for - no matter what happened, the user would have something that worked reasonably (modulo the bugs of the day.)

So, it would be okay if the module loader loads e100 one day and eepro100 another day? Come on... You gotta be kidding me. Module loading is automated these days and selections which used to be made by users should be made automatically too. There have been, are and will continue to be duplicate modules supporting the same device or feature. Distros have always made those choices and this is no different.

> In this case you have a driver that doesn't support hardware it claims, based on vagaries of BIOS configuration, that actively breaks in ways that you claim is 'not broken'. Claiming that it's a problem for distros to solve, or simply pushing it to userspace - I find that incredibly short-sighted.

If you put an ICH controller into combined mode, on most controllers you won't be able to access half of the SATA ports, and because linux on pc depends on the BIOS for IO resource allocation (or rather can't override it because something random might break), we can't change the mode afterwards. We can flip bits in the controller and put it into a different mode but we can't reallocate the necessary IO resources, so the user is stuck with only half of the ports working. So, yes, depending on BIOS configuration, not all hardware capability can be exported.
If you put ICH6 into enhanced IDE mode and the BIOS is nice enough to allocate ABAR, you can select between IDE mode and AHCI mode. AHCI mode has more features such as NCQ and hotplug support, while the ata_piix interface can be more reliable depending on which device is connected. They both have slight pros and cons but generally work okay. So, here you have to make a choice. There's no one clearly better solution, unlike the ehci case.
So, what do you want to see here? Dropping ICH6 DIDs from ata_piix? Adding module parameters to ata_piix and ahci such that they fail to attach for certain DIDs according to the parameter?
Comment 35 Tejun Heo 2007-11-29 21:56:25 UTC
Robert, can you please open a separate bug report w/ subject ata_piix fail to detect device on ICH6M and attach probing message of ata_piix on 2.6.24-rc3? Let's see if we can fix that. Thanks.
Comment 36 Bill Nottingham 2007-11-30 11:21:47 UTC
(In reply to comment #34)
> First of all, it's not my driver.

Sorry for the implication. I'll make sure I yell at garzik.

> And yes, every single distro will need to deal with it which doesn't mean distros can't cooperate and come up with standard solution.

You're still saying that these two drivers are *special* enough that they need these hacks coded for them that no other drivers need. I find that the sort of argument that leads to bad hacks and workarounds.

> > You've got a situation right now where the default mechanism used to load drivers leads to a broken situation, *directly because of the way the drivers are architected to both claim IDs and initialize*. Comparing it to e100/eepro100 doesn't make sense, either - in that case both drivers *actually supported* all the hardware they claimed IDs for - no matter what happened, the user would have something that worked reasonably (modulo the bugs of the day.)
>
> So, it would be okay if the module loader loads e100 one day and eepro100 another day? Come on...

Right now, it does. (Well, it depends on the order of modules.alias, so it's probably stable across kernel boots.)
So, any sane distributor has to work around this kernel brokenness by just not shipping one driver, or neutering the IDs out of one of them.

> automatically too. There have been, are and will continue to be duplicate modules supporting the same device or feature. Distros have always made those choices and this is no different.

Yes, *it is*. This is a case of a driver where it claims IDs that *will not work* in some configurations. e100/eepro100 will at least work for everything they claim.

> > In this case you have a driver that doesn't support hardware it claims, based on vagaries of BIOS configuration, that actively breaks in ways that you claim is 'not broken'. Claiming that it's a problem for distros to solve, or simply pushing it to userspace - I find that incredibly short-sighted.
>
> If you put an ICH controller into combined mode on most controllers you won't be able to access half of SATA ports and because linux on pc depends on BIOS for IO resource allocation (or rather can't override it because something random might break), we can't change the mode afterwards. We can flip bits in the controller and put it into different mode but we can't reallocate necessary IO resources, so the user is stuck with only half of working ports. So, yes, depending on BIOS configuration, not all hardware capability can be exported.
>
> If you put ICH6 into enhanced IDE mode and the BIOS is nice enough to allocate ABAR. You can select between IDE mode and AHCI mode. AHCI mode has more features such as NCQ and hotplug support while ata_piix interface can be more reliable depending on which device is connected. They both have slight pros and cons but generally work okay. So, here you have to make a choice. There's no one clearly better solution unlike ehci case.

'can be more reliable based on which device is connected'. Sounds like driver bugs that should be fixed from my end.
All I'm saying is that if you want the algorithm to be 'try ahci first', then enforce that at the lowest level so everyone can benefit (even those using raw modprobe.) That's at the kernel/driver.
Comment 37 Tejun Heo 2007-11-30 15:45:42 UTC
> > So, it would be okay if the module loader loads e100 one day and eepro100 another day? Come on...
>
> Right now, it does. (Well, it depends on the order of modules.alias, so it's probably stable across kernel boots.)
>
> So, any sane distributor has to work around this kernel brokenness by just not shipping one driver, or neutering the IDs out of one of them.

Now, it's an easy choice. e100 is the answer. Back then, you couldn't really make the choice statically because for a while e100 was broken for some users while eepro100 was broken for others. Someone at some level *had to* deal with those issues and make the choice.

> > automatically too. There have been, are and will continue to be duplicate modules supporting the same device or feature. Distros have always made those choices and this is no different.
>
> Yes, *it is*. This is a case of a driver where it claims IDs that *will not work* in some configurations. e100/eepro100 will at least work for everything they claim.

Yeah, now you can say that about e100 and eepro100. All I'm saying is that situations like this arise. Often it's a transient situation that is short enough that no one really has to care, but at other times those transitional phases are long enough that there needs to be a mechanism to deal with them.

> 'can be more reliable based on which device is connected'. Sounds like driver bugs that should be fixed from my end.

That's a chip bug. If I remember the SATA trace correctly, its PHY spews complete garbage when combined with certain devices.

> All I'm saying is that if you want the algorithm to be 'try ahci first', then enforce that at the lowest level so everyone can benefit (even those using raw modprobe.) That's at the kernel/driver.

When users use modprobe, they name the module they wanna load. Now that module loading is done automatically, something has to make the choice, and it should allow the user to choose between alternatives while supplying sensible defaults. A distro just needs to implement such a mechanism if it intends to load modules automatically for users. Granted, there is only one module in most cases, but there have been and will be cases where this doesn't hold. This is a job that a distro has to do.
I mean, how do you guys handle wifi drivers? They will settle down in time, but when development is progressing fast, it's expected to have duplicate drivers with different partial capabilities. You can say "it's a driver bug, not our problem" and wait till things settle down. I don't think many people will like such a distro.
I think I have repeated this multiple times now but just one more time - module precedence and selection are a real, existing problem that requires configurable selection logic in the automatic module loader.
That said, I think ICH6R ahci probing can be improved such that it vetoes if the controller is in combined mode, and for ICH6M maybe not allowing ata_piix to attach if the class code is AHCI, as in newer controllers (not too sure yet). But this wouldn't solve all your problems. I don't want to make ahci not attach to ich6r based on class code (especially because it's a write-once/read register) and I'm pretty sure other libata developers would agree, which means ahci and ata_piix will always have duplicate entries for the DID.
Comment 38 Robert M. Albrecht 2007-12-02 10:11:24 UTC
I created another bug: http://bugzilla.kernel.org/show_bug.cgi?id=9491