This bug report is based on investigation for https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1137513 I have an old Acer Aspire One AOA150 netbook which has a BIOS that does not disable ASPM for ath5k wireless device, 168c:001c, Atheros Communications Inc. AR242x / AR542x PCI-Express adapter. 2 years ago in 6ccf15a1a76d2ff915cdef6ae4d12d0170087118 ath5k driver got this: pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S); The card was working fine until Ubuntu raring 13.04 switched to 3.8 and the same symptoms that were fixed by the commit above started to appear - random md5 generator instead of file downloads, SSH disconnects, etc. The card had LnkCtl: ASPM L0s L1 Enabled; according to lspci in 3.8+ while it had ASPM disabled in kernels <= 3.7. I have tested all the available builds of 3.8 mainline kernel and found that it was introduced early in 3.8 series. Then I did a bisect between 3.7 and 3.8 and found that the first bad commit is: commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 Author: Taku Izumi <izumi.taku@jp.fujitsu.com> Date: Tue Oct 30 15:27:13 2012 +0900 PCI/ACPI: Request _OSC control before scanning PCI root bus This patch moves up the code block to request _OSC control in order to separate ACPI work and PCI work in acpi_pci_root_add(). Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> I see the following in dmesg: [ 0.145833] pci_root PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM [ 0.145840] pci_root PNP0A08:00: Unable to request _OSC control (_OSC support mask: 0x08) ... [ 0.181256] ACPI _OSC control for PCIe not granted, disabling ASPM So it looks like it is actually doing something, yet ASPM remains enabled causing the card to malfunction during data transfers.
Created attachment 95431 [details] fix aspm sequence Please check if attached patch fixes the problem.
Created attachment 95441 [details] remove not needed checking Looks like we can go further to make hotplug working with disabling aspm in quirks and drivers. Could just remove aspm_disabled checking.
Thank you! The "fix aspm sequence" patch fixes the issue. 03:00.0 Class [0200]: Device [168c:001c] (rev 01) Subsystem: Device [105b:e008] ... LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
Can you post boog log with and without patch?
Created attachment 95581 [details] Original 3.8 kernel dmesg
Created attachment 95591 [details] 3.8 kernel dmesg with patch applied
Done. As I see, the difference between the output is: -pci_root PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM +pci_root PNP0A08:00: ACPI _OSC support notification failed, PCIe ASPM will be disabled ... +pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force' ... -ACPI _OSC control for PCIe not granted, disabling ASPM +pci_root PNP0A08:00: ACPI _OSC support notification failed, PCIe ASPM disabled 03:00.0 is Atheros Communications Inc. AR242x / AR542x Wireless Network Adapter (PCI-Express)
Created attachment 95601 [details] remove not needed checking ok, that is from sanity_checking, that is why old sequence is happened to work. Anyway please test new patch that should be more complete and handle hotplug case. Thanks
Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino Wireless-N 1000 [8086:0084]. Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a question was raised why pci_disable_link_state stopped doing anything for iwlwifi devices too.
On Sun, Mar 17, 2013 at 8:50 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=55211 > --- Comment #9 from Roman Yepishev <roman.yepishev@gmail.com> 2013-03-17 > 15:50:23 --- > Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a > Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino > Wireless-N 1000 [8086:0084]. > > Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a > question was raised why pci_disable_link_state stopped doing anything for > iwlwifi devices too. > good. Did you test first patch or second patch? Please test second patch only on affected platforms. old logic is quite strange, on boot path and hotplug path it have different aspm_disabled setting. as it could set aspm_disabled after pci root bus scanning. Second patch will not restore to old logic, and just remove not needed aspm_disabled checking for disabling path. So second patch is right fix, but it need more test. Thanks Yinghai
Created attachment 95651 [details] First patch fix aspm sequence restore old logic
Created attachment 95661 [details] Second patch remove not needed checking this one should be complete fix that still keep hotplug and boot path have same aspm_disabled set.
I got a bit confused regarding the fixes, so tested everything with the netbook/ath5k: Booting without patches: LnkCtl: ASPM L0s L1 Enabled; Booting with first patch only, "fix aspm sequence", id 95651 LnkCtl: ASPM Disabled; Booting with second patch only, "remove not needed checking", id 95661 LnkCtl: ASPM Disabled; Booting with first+second patch applied: LnkCtl: ASPM Disabled; So, in any case the issue appeared fixed. As I don't have any hotpluggable PCI-X device I am not sure how to test the hotplug interaction.
I think this regression has nothing to do with pci_disable_link_state(). When aspm_disabled is set, pci_disable_link_state() doesn't do anything. In both 3.7 and 3.8, aspm_disabled is set in acpi_pci_root_add() before any driver probe routines are run, so it looks like calling pci_disable_link_state() from a driver had no effect even in 3.7. This is a problem, of course, but not the one Roman is seeing, because ath5k calls pci_disable_link_state() from the driver probe routine. There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call pci_disable_link_state(). In 3.7, these quirks are run before aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up before we start scanning the bus, so in 3.8, aspm_disabled is set *before* we run them. I think that means 8c33f51d broke all these quirks. That's also a problem, of course, but this isn't the one Roman is seeing either. I think the problem Roman is seeing happens when pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device enumeration. In 3.8, the fact that aspm_disabled is already set by the time we get here means we skip the check for pre-1.1 PCIe devices, and I think *this* is what Roman is seeing. The dmesg log attached at comment #6 (https://bugzilla.kernel.org/attachment.cgi?id=95591) confirms that Roman's ath5k is indeed a pre-1.1 device: pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. ... ath5k 0000:03:00.0: registered as 'phy0' I'm not comfortable with any of Yinghai's patches yet. I know they fix Roman's issue, but I'm worried they may introduce other regressions, so I plan to simply revert 8c33f51d for now.
This should be fixed by this commit, which appeared in v3.9-rc6: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/acpi/pci_root.c?id=b8178f130e25c1bdac1c33e0996f1ff6e20ec08e
This should be resolved in v3.9. Please reopen if you disagree. Note that bug #57331 is a similar but different ASPM problem where a driver uses pci_disable_link_state(..., L0S | L1 | CKLPM), but the device still enters L1.