Kernel Bug Tracker – Bug 55211
ASPM enabled on pre-1.1 PCIe devices; causes ath5k and other device malfunctions
Last modified: 2013-04-30 19:38:33 UTC
This bug report is based on investigation for https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1137513
I have an old Acer Aspire One AOA150 netbook which has a BIOS that does not disable ASPM for ath5k wireless device, 168c:001c, Atheros Communications Inc. AR242x / AR542x PCI-Express adapter.
2 years ago in 6ccf15a1a76d2ff915cdef6ae4d12d0170087118 ath5k driver got this:
The card was working fine until Ubuntu raring 13.04 switched to 3.8 and the same symptoms that were fixed by the commit above started to appear - random md5 generator instead of file downloads, SSH disconnects, etc. The card had LnkCtl: ASPM L0s L1 Enabled; according to lspci in 3.8+ while it had ASPM disabled in kernels <= 3.7.
I have tested all the available builds of 3.8 mainline kernel and found that it was introduced early in 3.8 series. Then I did a bisect between 3.7 and 3.8 and found that the first bad commit is:
Author: Taku Izumi <email@example.com>
Date: Tue Oct 30 15:27:13 2012 +0900
PCI/ACPI: Request _OSC control before scanning PCI root bus
This patch moves up the code block to request _OSC control in order to
separate ACPI work and PCI work in acpi_pci_root_add().
Signed-off-by: Taku Izumi <firstname.lastname@example.org>
Signed-off-by: Bjorn Helgaas <email@example.com>
I see the following in dmesg:
[ 0.145833] pci_root PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[ 0.145840] pci_root PNP0A08:00: Unable to request _OSC control (_OSC support mask: 0x08)
[ 0.181256] ACPI _OSC control for PCIe not granted, disabling ASPM
So it looks like it is actually doing something, yet ASPM remains enabled causing the card to malfunction during data transfers.
Created attachment 95431 [details]
fix aspm sequence
Please check if attached patch fixes the problem.
Created attachment 95441 [details]
remove not needed checking
Looks like we can go further to make hotplug working with disabling
aspm in quirks and drivers.
Could just remove aspm_disabled checking.
The "fix aspm sequence" patch fixes the issue.
03:00.0 Class : Device [168c:001c] (rev 01)
Subsystem: Device [105b:e008]
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
Can you post boog log with and without patch?
Created attachment 95581 [details]
Original 3.8 kernel dmesg
Created attachment 95591 [details]
3.8 kernel dmesg with patch applied
As I see, the difference between the output is:
-pci_root PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM
+pci_root PNP0A08:00: ACPI _OSC support notification failed, PCIe ASPM will be disabled
+pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
-ACPI _OSC control for PCIe not granted, disabling ASPM
+pci_root PNP0A08:00: ACPI _OSC support notification failed, PCIe ASPM disabled
03:00.0 is Atheros Communications Inc. AR242x / AR542x Wireless Network Adapter (PCI-Express)
Created attachment 95601 [details]
remove not needed checking
ok, that is from sanity_checking, that is why old sequence is happened to
Anyway please test new patch that should be more complete and handle
Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino Wireless-N 1000 [8086:0084].
Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a question was raised why pci_disable_link_state stopped doing anything for iwlwifi devices too.
On Sun, Mar 17, 2013 at 8:50 AM, <firstname.lastname@example.org> wrote:
> --- Comment #9 from Roman Yepishev <email@example.com> 2013-03-17 15:50:23 ---
> Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a
> Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino
> Wireless-N 1000 [8086:0084].
> Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a
> question was raised why pci_disable_link_state stopped doing anything for
> iwlwifi devices too.
Did you test first patch or second patch?
Please test second patch only on affected platforms.
old logic is quite strange, on boot path and hotplug path it have
different aspm_disabled setting.
as it could set aspm_disabled after pci root bus scanning.
Second patch will not restore to old logic, and just remove not needed
for disabling path.
So second patch is right fix, but it need more test.
Created attachment 95651 [details]
First patch fix aspm sequence
restore old logic
Created attachment 95661 [details]
Second patch remove not needed checking
this one should be complete fix that still keep hotplug and boot path have
same aspm_disabled set.
I got a bit confused regarding the fixes, so tested everything with the netbook/ath5k:
Booting without patches:
LnkCtl: ASPM L0s L1 Enabled;
Booting with first patch only, "fix aspm sequence", id 95651
LnkCtl: ASPM Disabled;
Booting with second patch only, "remove not needed checking", id 95661
LnkCtl: ASPM Disabled;
Booting with first+second patch applied:
LnkCtl: ASPM Disabled;
So, in any case the issue appeared fixed. As I don't have any hotpluggable PCI-X device I am not sure how to test the hotplug interaction.
I think this regression has nothing to do with pci_disable_link_state().
When aspm_disabled is set, pci_disable_link_state() doesn't do anything.
In both 3.7 and 3.8, aspm_disabled is set in acpi_pci_root_add() before
any driver probe routines are run, so it looks like calling
pci_disable_link_state() from a driver had no effect even in 3.7. This
is a problem, of course, but not the one Roman is seeing, because ath5k
calls pci_disable_link_state() from the driver probe routine.
There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call
pci_disable_link_state(). In 3.7, these quirks are run before
aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up
before we start scanning the bus, so in 3.8, aspm_disabled is set
*before* we run them. I think that means 8c33f51d broke all these
quirks. That's also a problem, of course, but this isn't the one Roman
is seeing either.
I think the problem Roman is seeing happens when
pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device
enumeration. In 3.8, the fact that aspm_disabled is already set by the
time we get here means we skip the check for pre-1.1 PCIe devices, and
I think *this* is what Roman is seeing. The dmesg log attached at
comment #6 (https://bugzilla.kernel.org/attachment.cgi?id=95591) confirms
that Roman's ath5k is indeed a pre-1.1 device:
pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. ...
ath5k 0000:03:00.0: registered as 'phy0'
I'm not comfortable with any of Yinghai's patches yet. I know they fix
Roman's issue, but I'm worried they may introduce other regressions, so
I plan to simply revert 8c33f51d for now.
This should be fixed by this commit, which appeared in v3.9-rc6:
This should be resolved in v3.9. Please reopen if you disagree.
Note that bug #57331 is a similar but different ASPM problem where a driver uses pci_disable_link_state(..., L0S | L1 | CKLPM), but the device still enters L1.