Bug 55211
Summary: | ASPM enabled on pre-1.1 PCIe devices; causes ath5k and other device malfunctions | ||
---|---|---|---|
Product: | Drivers | Reporter: | Roman Yepishev (roman.yepishev) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | bjorn, roman.yepishev, yinghai |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.8.0 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
fix aspm sequence
remove not needed checking Original 3.8 kernel dmesg 3.8 kernel dmesg with patch applied remove not needed checking First patch fix aspm sequence Second patch remove not needed checking |
Description
Roman Yepishev
2013-03-14 10:04:48 UTC
Created attachment 95431 [details]
fix aspm sequence
Please check if attached patch fixes the problem.
Created attachment 95441 [details]
remove not needed checking
Looks like we can go further to make hotplug working with disabling
aspm in quirks and drivers.
Could just remove aspm_disabled checking.
Thank you! The "fix aspm sequence" patch fixes the issue. 03:00.0 Class [0200]: Device [168c:001c] (rev 01) Subsystem: Device [105b:e008] ... LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+ Can you post boog log with and without patch? Created attachment 95581 [details]
Original 3.8 kernel dmesg
Created attachment 95591 [details]
3.8 kernel dmesg with patch applied
Done. As I see, the difference between the output is: -pci_root PNP0A08:00: ACPI _OSC support notification failed, disabling PCIe ASPM +pci_root PNP0A08:00: ACPI _OSC support notification failed, PCIe ASPM will be disabled ... +pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force' ... -ACPI _OSC control for PCIe not granted, disabling ASPM +pci_root PNP0A08:00: ACPI _OSC support notification failed, PCIe ASPM disabled 03:00.0 is Atheros Communications Inc. AR242x / AR542x Wireless Network Adapter (PCI-Express) Created attachment 95601 [details]
remove not needed checking
ok, that is from sanity_checking, that is why old sequence is happened to
work.
Anyway please test new patch that should be more complete and handle
hotplug case.
Thanks
Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino Wireless-N 1000 [8086:0084]. Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a question was raised why pci_disable_link_state stopped doing anything for iwlwifi devices too. On Sun, Mar 17, 2013 at 8:50 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=55211 > --- Comment #9 from Roman Yepishev <roman.yepishev@gmail.com> 2013-03-17 > 15:50:23 --- > Re-tested on two laptops - AOA150, ath5k device got ASPM disabled and on a > Lenovo E420 I got ASPM disabled for iwlwifi-driven Intel Corporation Centrino > Wireless-N 1000 [8086:0084]. > > Also I found http://article.gmane.org/gmane.linux.kernel.pci/20640 where a > question was raised why pci_disable_link_state stopped doing anything for > iwlwifi devices too. > good. Did you test first patch or second patch? Please test second patch only on affected platforms. old logic is quite strange, on boot path and hotplug path it have different aspm_disabled setting. as it could set aspm_disabled after pci root bus scanning. Second patch will not restore to old logic, and just remove not needed aspm_disabled checking for disabling path. So second patch is right fix, but it need more test. Thanks Yinghai Created attachment 95651 [details]
First patch fix aspm sequence
restore old logic
Created attachment 95661 [details]
Second patch remove not needed checking
this one should be complete fix that still keep hotplug and boot path have
same aspm_disabled set.
I got a bit confused regarding the fixes, so tested everything with the netbook/ath5k: Booting without patches: LnkCtl: ASPM L0s L1 Enabled; Booting with first patch only, "fix aspm sequence", id 95651 LnkCtl: ASPM Disabled; Booting with second patch only, "remove not needed checking", id 95661 LnkCtl: ASPM Disabled; Booting with first+second patch applied: LnkCtl: ASPM Disabled; So, in any case the issue appeared fixed. As I don't have any hotpluggable PCI-X device I am not sure how to test the hotplug interaction. I think this regression has nothing to do with pci_disable_link_state(). When aspm_disabled is set, pci_disable_link_state() doesn't do anything. In both 3.7 and 3.8, aspm_disabled is set in acpi_pci_root_add() before any driver probe routines are run, so it looks like calling pci_disable_link_state() from a driver had no effect even in 3.7. This is a problem, of course, but not the one Roman is seeing, because ath5k calls pci_disable_link_state() from the driver probe routine. There are also PCI_FIXUP_FINAL quirks for 82575 and 82598 NICs that call pci_disable_link_state(). In 3.7, these quirks are run before aspm_disabled is set, but 8c33f51d moved the pcie_no_aspm() call up before we start scanning the bus, so in 3.8, aspm_disabled is set *before* we run them. I think that means 8c33f51d broke all these quirks. That's also a problem, of course, but this isn't the one Roman is seeing either. I think the problem Roman is seeing happens when pcie_aspm_init_link_state() calls pcie_aspm_sanity_check() during device enumeration. In 3.8, the fact that aspm_disabled is already set by the time we get here means we skip the check for pre-1.1 PCIe devices, and I think *this* is what Roman is seeing. The dmesg log attached at comment #6 (https://bugzilla.kernel.org/attachment.cgi?id=95591) confirms that Roman's ath5k is indeed a pre-1.1 device: pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. ... ath5k 0000:03:00.0: registered as 'phy0' I'm not comfortable with any of Yinghai's patches yet. I know they fix Roman's issue, but I'm worried they may introduce other regressions, so I plan to simply revert 8c33f51d for now. This should be fixed by this commit, which appeared in v3.9-rc6: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/acpi/pci_root.c?id=b8178f130e25c1bdac1c33e0996f1ff6e20ec08e This should be resolved in v3.9. Please reopen if you disagree. Note that bug #57331 is a similar but different ASPM problem where a driver uses pci_disable_link_state(..., L0S | L1 | CKLPM), but the device still enters L1. |