Bug 79611

Summary: Incorrect pciehp messages occur during boot
Product: Drivers Reporter: Myron Stowe (myron.stowe)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: RESOLVED CODE_FIX    
Severity: normal CC: bjorn, rajatxjain
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.16-rc4 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg log from a RHEL based system (issue verified to occur with upstream kernel also)

Description Myron Stowe 2014-07-07 17:11:39 UTC
Created attachment 142301 [details]
dmesg log from a RHEL based system (issue verified to occur with upstream kernel also)

Seeing the following 'pciehp' messages during boot:
  ...
  pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at 0000:83:00, cannot hot-add
  pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
  ...

These are occurring during normal booting, no forced hot-plug activity is in play.

The corresponding thread on the linux-pci list is https://lkml.org/lkml/2014/6/17/531
Comment 1 Myron Stowe 2014-07-07 20:04:55 UTC
From the 'dmesg' log we can isolate the hierarchy involved

ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-fe])
acpi PNP0A08:01: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
acpi PNP0A08:01: PCIe AER handled by firmware
acpi PNP0A08:01: _OSC: OS now controls [PCIeHotplug PME PCIeCapability]
PCI host bridge to bus 0000:80
pci_bus 0000:80: root bus resource [bus 80-fe]
pci_bus 0000:80: root bus resource [io  0x8000-0xffff]
pci_bus 0000:80: root bus resource [mem 0xc8000000-0xfbffbfff]
pci_bus 0000:80: root bus resource [mem 0x34000000000-0x37fffffffff]

pci 0000:80:03.0: PCI bridge to [bus 81-86]
pci 0000:80:03.0:   bridge window [mem 0xc8000000-0xc87fffff]

pci 0000:81:00.0: PCI bridge to [bus 82-86]
pci 0000:81:00.0:   bridge window [mem 0xc8000000-0xc86fffff]

pci 0000:82:04.0: [10b5:8734] type 01 class 0x060400
pci 0000:82:04.0: PCI bridge to [bus 83]
pci 0000:82:04.0:   bridge window [mem 0xc8600000-0xc86fffff]
 pciehp 0000:82:04.0:pcie24: Slot #4 AttnBtn- AttnInd- PwrInd- PwrCtrl- MRL-
                                     Interlock- NoCompl- LLActRep+

pci 0000:83:00.0: [144d:a820] type 00 class 0x010802
pci 0000:83:00.0: reg 0x10: [mem 0xc8600000-0xc8603fff 64bit]

The device is a Samsung Electronics Co Ltd z820 NVMe SSD Controller (171X) residing under a PLX Technology, Inc PEX 8734 32-line, 8-Port PCIe Gen 3 Switch's downstream port.

The messages occur during boot and come from 'board_added' so the logical path to follow is
  pciehp_probe
    pciehp_enable_slot
      board_added

We get into 'pciehp_enable_slot' via a work queue based "ENABLE_REQ" event.  pciehp_ctrl.c::'interrupt_event_handler' calls handlers for various events:
  INT_BUTTON_PRESS       handle_button_press_event()
  INT_POWER_FAULT        pciehp_set_attention_status(), pciehp_green_led_off()
  INT_PRESENCE_{ON,OFF}  handle_surprise_event()
  INT_LINK_{UP,DOWN}     handle_link_event()

Looking at the slot capabilities of the switch's downstream port leading to the SSD device we see

 pciehp 0000:82:04.0:pcie24: Slot #4 AttnBtn- AttnInd- PwrInd- PwrCtrl- MRL-
                                     Interlock- NoCompl- LLActRep+

So the switch's downstream port does not support any "slot control" except
LLActRep (Data Link Layer Link Active Reporting Capable) [7.8.10 Slot Control
Register (offset 18h)].

Putting everything together we have

pciehp_probe            // Entry point for setting up PCIe Hot-Plug
  pcie_init
    if (NO_CMD_CMPL(ctrl) ||
        !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
            ctrl->no_cmd_complete = 1;          // Intel switch HACK

    /* Check if Data Link Layer Link Active Reporting is implemented */
    pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
    if (link_cap & PCI_EXP_LNKCAP_DLLLARC) {
            ctrl_dbg(ctrl, "Link Active Reporting supported\n");
            ctrl->link_active_reporting = 1;
    }

    /* Clear all remaining event bits in Slot Status register */
    pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
            PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
            PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_PDC |
            PCI_EXP_SLTSTA_CC);

Note: The "Data Link Layer State Changed" bit is *not* cleared!  [7.8.11 Slot Status Register (Offset 1Ah)]

    init_slot
    pcie_init_notification                      // Put event handlers in place
      pciehp_request_irq
        request_irq(..., pcie_isr, ...          // PCIe hot-plug ISR: 'pcie_isr'
      pcie_enable_notification                  // Enable events
        pcie_write_cmd(PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE |
                       PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE)

We likely get a LLActRep interrupt at this point which puts us into 'pcie_isr':

pcie_isr
  wake_up(&ctrl->queue)                         // ??  Expect occurs
  ...
  pciehp_handle_linkstate_change
    pciehp_check_link_active
    queue_interrupt_event(p_slot, INT_LINK_UP)
      interrupt_event_handler                   // INT_LINK_UP
        handle_link_event(p_slot, INT_LINK_UP)
          info->req = event == INT_LINK_UP ? ENABLE_REQ : DISABLE_REQ;
          INIT_WORK(&info->work, pciehp_power_thread);
            case ENABLE_REQ:
              pciehp_enable_slot(p_slot);

So we likely are getting a LLActRep.  I believe this is occurring as a result of commit e48f1b67f668 *and* the BIOS not clearing DLLSC before handing off to the OS.


Adding PCI_EXP_SLTSTA_DLLSC to the 'pcie_capability_write_word' of 'pcie_init' to clear out any outstanding event bits in the slot status register fixes the issue indicating that the above analysis is correct.  I do not know if BIOS should, or should not, be clearing DLLSC before handoff to the OS.  Seems as if it should but perhaps there is a reason for not doing such I'm unaware of.  In either case, adding DLLSC to the cleared out events during initialization solves this issue.
Comment 2 Myron Stowe 2014-07-08 17:59:33 UTC
Created attachment 142501 [details]
dmesg log from a RHEL based system (issue verified to occur with upstream kernel also)
Comment 3 Bjorn Helgaas 2014-11-04 14:56:18 UTC
Fixed by 0d25d35c987d ("PCI: pciehp: Clear Data Link Layer State Changed during init") which appeared in v3.17-rc1.