Bug 57331

Summary: pci_disable_link_state() doesn't disable L1
Product: Drivers
Reporter: Bjorn Helgaas (bjorn)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Severity: normal
CC: linville
Priority: P1
Hardware: All
OS: Linux
URL: https://lkml.kernel.org/r/CANUX_P3F5YhbZX3WGU-j1AGpbXb_T9Bis2ErhvKkFMtDvzatVQ@mail.gmail.com
Kernel Version: 3.7.9
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: v3.7.9+ dmesg
lspci before loading iwlwifi
lspci after loading iwlwifi
instrumentation patches
experiment logs

Description Bjorn Helgaas 2013-04-30 19:36:07 UTC
Created attachment 100331
v3.7.9+ dmesg

Emmanuel found that an iwlwifi device seems to enter the L1 power-saving state even though the driver uses pci_disable_link_state() to disable L1 in this path:

        pci_disable_link_state(..., L0S | L1 | CKLPM)

lspci shows LnkCtl with L1 enabled even after iwlwifi is loaded:

  02:00.0 Network controller: Intel Corporation Device 08b1 (rev 5b)
    LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+ ...
Comment 1 Bjorn Helgaas 2013-04-30 19:36:44 UTC
Created attachment 100341
lspci before loading iwlwifi
Comment 2 Bjorn Helgaas 2013-04-30 19:37:09 UTC
Created attachment 100351
lspci after loading iwlwifi
Comment 3 Bjorn Helgaas 2013-05-16 22:13:34 UTC
Created attachment 101781
instrumentation patches

Some devices have hardware problems related to using ASPM.  Linux
drivers for these devices use pci_disable_link_state() to prevent
their device from entering L0s or L1.  But if the BIOS declines to
grant us control over ASPM (either via the FADT ACPI_FADT_NO_ASPM bit
or the _OSC method), pci_disable_link_state() currently does nothing.
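
For illustration only, here is a rough user-space model of that behavior. It is not the kernel code: model_link, model_disable_link_state() and the os_controls_aspm flag are made-up names, and the flag stands in for whatever the FADT and _OSC negotiation decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two ASPM Control bits in Link Control. */
#define MODEL_ASPM_L0S  0x0001u
#define MODEL_ASPM_L1   0x0002u
#define MODEL_ASPM_MASK (MODEL_ASPM_L0S | MODEL_ASPM_L1)

/* Hypothetical stand-in for a PCIe link; only Link Control is modeled. */
struct model_link {
    uint16_t lnkctl;
};

/*
 * Toy model of the behavior described above, not the kernel function.
 * Clears the requested ASPM bits only if the firmware granted the OS
 * control over ASPM.  Returns true if the register was updated.
 */
static bool model_disable_link_state(struct model_link *link,
                                     uint16_t states, bool os_controls_aspm)
{
    assert(link != NULL);

    if (!os_controls_aspm)
        return false;   /* request dropped; register stays as firmware set it */

    link->lnkctl = (uint16_t)(link->lnkctl & ~(states & MODEL_ASPM_MASK));
    return true;
}
```

In this model, a link that starts at 0x0003 ends at 0x0000 when the OS has control, and stays at 0x0003 when it does not, which is the situation this bug is about.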

Windows has a similar mechanism: drivers can use "PciASPMOptOut" in
their .INF file, which specifies that the device does not properly
support ASPM.

The question is whether pci_disable_link_state() should disable ASPM
even when the OS doesn't have control over ASPM in general.

I instrumented qemu to log PCI config space accesses and did several
experiments with Linux and Windows.  Attached here are:

  - qemu patches to add a bit more ASPM support, turn on ASPM
    by default, and log PCI config accesses.
  - qemu q35-chipset.cfg with XHCI PCIe device with ASPM support.
  - seabios patch to remove _OSC method.
  - Linux XHCI driver patch to call pci_disable_link_state().
  - Windows XHCI .INF file patch to add PciASPMOptOut.
Comment 4 Bjorn Helgaas 2013-05-16 22:15:49 UTC
Created attachment 101791
experiment logs

Attached here are the results of the experiments.  I tested the
following cases using Linux v3.9 and Windows 7 (power plan set
to "power saver"):

  01: no XHCI driver installed.

  02: Unmodified XHCI driver installed.  On Windows, this is the
  "NEC USB3.0 xHCI Driver" from dell.com.

  03: Added pci_disable_link_state() call to Linux driver.  Added
  PciASPMOptOut to Windows driver .INF file.

  04: Same as 03, but removed _OSC method from qemu firmware.

The table below shows the resulting ASPM state and the accesses
made to the Link Control register.  The Link Control encodings
are (PCIe spec v3.0, sec 7.8.7):

  0x0000 - ASPM disabled
  0x0001 - ASPM L0s enabled
  0x0002 - ASPM L1 enabled
  0x0003 - ASPM L0s and L1 enabled
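
A small decoding sketch for these encodings, in case it helps when reading the logs.  The macro and function names are local to this example and are not the kernel's definitions; it only assumes that ASPM Control is bits 1:0 of Link Control, as listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASPM Control occupies bits 1:0 of the Link Control register. */
#define LNKCTL_ASPM_L0S  0x0001u
#define LNKCTL_ASPM_L1   0x0002u
#define LNKCTL_ASPM_MASK 0x0003u

static_assert((LNKCTL_ASPM_L0S | LNKCTL_ASPM_L1) == LNKCTL_ASPM_MASK,
              "ASPM Control is exactly the L0s and L1 bits");

/* True if the L0s bit is set in a raw Link Control value. */
static bool aspm_l0s_enabled(uint16_t lnkctl)
{
    return (lnkctl & LNKCTL_ASPM_L0S) != 0;
}

/* True if the L1 bit is set in a raw Link Control value. */
static bool aspm_l1_enabled(uint16_t lnkctl)
{
    return (lnkctl & LNKCTL_ASPM_L1) != 0;
}

/* Short label for the ASPM Control field; other bits are ignored. */
static const char *aspm_state_name(uint16_t lnkctl)
{
    static const char *const names[] = {
        "disabled", "L0s", "L1", "L0s+L1"
    };
    return names[lnkctl & LNKCTL_ASPM_MASK];
}
```

For example, a made-up Link Control value of 0x0042 (bit 6, Common Clock Configuration, also set) decodes as L1 only, because the bits outside the ASPM Control field are masked off.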

                     Linux v3.9        Windows 7

01  no XHCI driver   enabled           disabled

                     read  0x0003      read  0x0003
                     write 0x0003      write 0x0000
                     read  0x0003

02  XHCI driver      enabled           enabled (L1 only)
                     read  0x0003      read  0x0003
                     write 0x0003      write 0x0000
                     read  0x0003      write 0x0000
                                       write 0x0002

03  XHCI driver      disabled          disabled
    ASPM OptOut
                     read  0x0003      read  0x0003
                     write 0x0003      write 0x0000
                     read  0x0003      write 0x0000
                     read  0x0003
                     write 0x0000

04  XHCI driver      enabled           enabled (never written)
    ASPM OptOut
    no _OSC method
                     read  0x0003      read  0x0003
                     write 0x0003
                     read  0x0003
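
To cross-check the "enabled"/"disabled" results against the logged accesses, the writes for each case can be replayed.  This is only a sketch: final_aspm() is a made-up helper, and it assumes every logged write replaces the whole ASPM Control field and that reads have no side effects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Replay the Link Control writes logged for one case and return the
 * resulting ASPM Control bits (bits 1:0).  Reads are omitted because
 * they do not change the register.
 */
static uint16_t final_aspm(uint16_t initial, const uint16_t *writes, size_t n)
{
    uint16_t lnkctl = initial;

    assert(writes != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        lnkctl = writes[i];     /* last write wins */

    return lnkctl & 0x0003u;
}
```

Starting from the 0x0003 that every first read returned, the Windows writes in case 02 (0x0000, 0x0000, 0x0002) end at 0x0002, i.e. L1 only, and the Linux writes in case 03 (0x0003, 0x0000) end at 0x0000.  With no writes at all, as for Windows in case 04, the value stays 0x0003.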
Comment 5 Bjorn Helgaas 2013-05-16 22:16:52 UTC
The case relevant to Emmanuel's issue is 04, where the driver requests
that ASPM be disabled, but the _OSC method failed (his dmesg log in
attachment 100331 shows "pci0000:00: ACPI _OSC support notification
failed, disabling PCIe ASPM").

In this case, the OS does not have permission to manage the PCIe
ASPM functionality, and both Linux and Windows leave ASPM enabled.

This may not be what the driver writer intended, but it does seem
safest to follow the Windows behavior.  Maybe we can add a note to
dmesg to at least explain what's happening.

It's not relevant to this bug, but it is interesting that in case 01,
where no driver is present, Windows disables ASPM but Linux leaves it