Bug 64821

Summary: pciehp prints erroneous error messages & causes 1 second delay on each HP controller
Product: Drivers Reporter: Rajat Jain (rajatxjain)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: NEW ---    
Severity: normal CC: rajatxjain, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
URL: http://marc.info/?t=138369090600001&r=1&w=2
Kernel Version: 3.10.16 Subsystem:
Regression: No Bisected commit-id:

Description Rajat Jain 2013-11-11 20:49:20 UTC
Hi,

This problem was discussed here:

http://marc.info/?t=138369090600001&r=1&w=2

Essentially, today the pciehp assumes that any HP controller that does not implement the following 4 HP elements, but yet advertises support for "command completed" notifications, will not generate command completed notifications on the slot control register:

/*
 * Controller doesn't notify of command completion if the "No
 * Command Completed Support" bit is set in Slot Capability
 * register or the controller supports none of power
 * controller, attention led, power led and EMI.
 */
if (NO_CMD_CMPL(ctrl) ||
    !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
    ctrl->no_cmd_complete = 1;

However, this is apparently not according to the PCI Express specs:

Bjorn Helgaas <bhelgaas@google.com> wrote:
> I'm not sure this "cmd completed" is actually spurious.  Spec section
> 7.8.10 is very clear that any write to Slot Control must cause a
> hot-plug command to be generated (if the port is hot-plug capable).

The current behaviour causes a problem for the HP controllers that do not implement the above mentioned HP elements, but yet generate "command Completed" notifications for pcie_disable_notification():

    pciehp_probe
       pcie_init
         readl(SLTCAP)
         if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI))  # true
           ctrl->no_cmd_complete = 1     # set (condition above is true)
         writew(SLTSTA, 0x1f)            # clear ABP PFD MRLSC PDC CC
         pcie_disable_notification
           pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
             readw(SLTSTA)               # CC == 0 (was cleared above)
             writew(SLTCTL)
             ...
             # no waiting here because no_cmd_complete == 1
             ...
             hardware sets SLTSTA.CC
             ...
       pcie_init_notification
         pcie_enable_notification
           pcie_write_cmd
             readw(SLTSTA)               # CC == 1 (was set by HW above)
             if (SLTSTA.CC)              # true
               if (!NO_CMD_CMPL))        # true
                 dbg("Unexpected CMD_COMPLETED. ...") <---- Problem 1 
                 ctrl->no_cmd_complete = 0
             writew(SLTCTL)
             ...
             # this time we wait because no_cmd_complete == 0
             ...
             pcie_wait_cmd(..., poll == 0)# now no_cmd_complete == 0
               wait_event_timeout(..., !ctrl->cmd_busy,...) #cmd_busy is never cleared <-- Problem 2
                 Timeout & print(""Command not completed in 1000 msec\n"")

Why cmd_busy is not cleared?
  -> because the pcie_isr is never invoked
     -> because IRQ is not generated
        -> since the CC bit was never cleared.

I verified that setting ctrl->no_cmd_complete = 0 solves the problem.

I see 2 problems in the current code:

1) In case of spurious "command completed" interrupt detected in pcie_write_cmd(), it is never cleared:
        if (slot_status & PCI_EXP_SLTSTA_CC) {
           /* Not cleared here. Unless we cleared it, further interrupts will not be generated */
	}

2) 1 second delay in probe and following 2 error messages needs to get rid of:
 Unexpected CMD_COMPLETED. Need to wait for command completed event.
 Command not completed in 1000 msec


Here is the lspci -vv and relevent dmesg output:

PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02)
 (prog-if 00 [Normal decode])
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
 Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
 Bus: primary=02, secondary=50, subordinate=5f, sec-latency=0
 I/O behind bridge: 00003000-00003fff
 Memory behind bridge: 8c000000-8fffffff
 Prefetchable memory behind bridge: 00000000b0600000-00000000b07fffff
 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
         DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                 ExtTag+ RBE+ FLReset-
         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                 RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                 MaxPayload 128 bytes, MaxReadReq 128 bytes
         DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
         LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L0s L1, Latency
 L0 <4us, L1 <4us
                 ClockPM- Surprise+ LLActRep+ BwNot+
         LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
         LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk-
 DLActive- BWMgmt- ABWMgmt-
         SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
                 Slot #3, PowerLimit 0.000W; Interlock- NoCompl-
         SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
                 Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
         SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet- Interlock-
                 Changed: MRL- PresDet- LinkState-
         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
 OBFF Not Supported ARIFwd+
         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
 OBFF Disabled ARIFwd-
         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
 Selectable De-emphasis: -6dB
                  Transmit Margin: Normal Operating Range,
 EnterModifiedCompliance- ComplianceSOS-
                  Compliance De-emphasis: -6dB
         LnkSta2: Current De-emphasis Level: -6dB,
 EqualizationComplete-, EqualizationPhase1-
                  EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
Capabilities: [c0] Power Management version 3
         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
 PME(D0+,D1-,D2-,D3hot+,D3cold+)
         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
         Address: 00000000ff041740  Data: 0003
 Capabilities: [100 v1] Advanced Error Reporting
         UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
         CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
 Capabilities: [200 v1] Virtual Channel
         Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
         Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
         Status: InProgress-
         VC0:    Caps:   PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
                 Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                 Status: NegoPending- InProgress-
                 Port Arbitration Table <?>
 Capabilities: [320 v1] Access Control Services
         ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
 EgressCtrl+ DirectTrans+
         ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
 EgressCtrl- DirectTrans-
 Capabilities: [330 v1] #12
 Kernel driver in use: pcieport

 In addition, here is what the pciehp driver spews out:

 Hotplug Controller:
   Seg/Bus/Dev/Func/IRQ : 0000:02:03.0 IRQ 21
   Vendor ID            : 0x111d
   Device ID            : 0x807a
   Subsystem ID         : 0x0000
   Subsystem Vendor ID  : 0x0000
   PCIe Cap offset      : 0x40
   PCI resource [7]     : [io  0x13000-0x13fff]
   PCI resource [8]     : [mem 0xc0c000000-0xc0fffffff]
   PCI resource [9]     : [mem 0xc30600000-0xc307fffff 64bit pref]
 Slot Capabilities      : 0x00180040
   Physical Slot Number : 3
   Attention Button     :  no
   Power Controller     :  no
   MRL Sensor           :  no
   Attention Indicator  :  no
   Power Indicator      :  no
   Hot-Plug Surprise    :  no
   EMI Present          :  no
   Command Completed    : yes
 Slot Status            : 0x0010
 Slot Control           : 0x0000
 Link Active Reporting supported
 HPC vendor_id 111d device_id 807a ss_vid 0 ss_did 0
 Registering domain:bus:dev=0000:50:00 sun=3
 Unexpected CMD_COMPLETED. Need to wait for command completed event.
 Command not completed in 1000 msec


Thanks,

Rajat