Bug 64821 - pciehp prints erroneous error messages & causes 1 second delay on each HP controller
Summary: pciehp prints erroneous error messages & causes 1 second delay on each HP con...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL: http://marc.info/?t=138369090600001&r...
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-11 20:49 UTC by Rajat Jain
Modified: 2016-02-18 21:32 UTC (History)
2 users (show)

See Also:
Kernel Version: 3.10.16
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Rajat Jain 2013-11-11 20:49:20 UTC
Hi,

This problem was discussed here:

http://marc.info/?t=138369090600001&r=1&w=2

Essentially, today the pciehp assumes that any HP controller that does not implement the following 4 HP elements, but yet advertises support for "command completed" notifications, will not generate command completed notifications on the slot control register:

/*
 * Controller doesn't notify of command completion if the "No
 * Command Completed Support" bit is set in Slot Capability
 * register or the controller supports none of power
 * controller, attention led, power led and EMI.
 */
if (NO_CMD_CMPL(ctrl) ||
    !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
    ctrl->no_cmd_complete = 1;

However, this is apparently not according to the PCI Express specs:

Bjorn Helgaas <bhelgaas@google.com> wrote:
> I'm not sure this "cmd completed" is actually spurious.  Spec section
> 7.8.10 is very clear that any write to Slot Control must cause a
> hot-plug command to be generated (if the port is hot-plug capable).

The current behaviour causes a problem for the HP controllers that do not implement the above mentioned HP elements, but yet generate "command Completed" notifications for pcie_disable_notification():

    pciehp_probe
       pcie_init
         readl(SLTCAP)
         if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI))  # true
           ctrl->no_cmd_complete = 1     # set (condition above is true)
         writew(SLTSTA, 0x1f)            # clear ABP PFD MRLSC PDC CC
         pcie_disable_notification
           pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
             readw(SLTSTA)               # CC == 0 (was cleared above)
             writew(SLTCTL)
             ...
             # no waiting here because no_cmd_complete == 1
             ...
             hardware sets SLTSTA.CC
             ...
       pcie_init_notification
         pcie_enable_notification
           pcie_write_cmd
             readw(SLTSTA)               # CC == 1 (was set by HW above)
             if (SLTSTA.CC)              # true
               if (!NO_CMD_CMPL))        # true
                 dbg("Unexpected CMD_COMPLETED. ...") <---- Problem 1 
                 ctrl->no_cmd_complete = 0
             writew(SLTCTL)
             ...
             # this time we wait because no_cmd_complete == 0
             ...
             pcie_wait_cmd(..., poll == 0)# now no_cmd_complete == 0
               wait_event_timeout(..., !ctrl->cmd_busy,...) #cmd_busy is never cleared <-- Problem 2
                 Timeout & print(""Command not completed in 1000 msec\n"")

Why cmd_busy is not cleared?
  -> because the pcie_isr is never invoked
     -> because IRQ is not generated
        -> since the CC bit was never cleared.

I verified that setting ctrl->no_cmd_complete = 0 solves the problem.

I see 2 problems in the current code:

1) In case of spurious "command completed" interrupt detected in pcie_write_cmd(), it is never cleared:
        if (slot_status & PCI_EXP_SLTSTA_CC) {
           /* Not cleared here. Unless we cleared it, further interrupts will not be generated */
	}

2) 1 second delay in probe and following 2 error messages needs to get rid of:
 Unexpected CMD_COMPLETED. Need to wait for command completed event.
 Command not completed in 1000 msec


Here is the lspci -vv and relevent dmesg output:

PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02)
 (prog-if 00 [Normal decode])
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
 Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
 Bus: primary=02, secondary=50, subordinate=5f, sec-latency=0
 I/O behind bridge: 00003000-00003fff
 Memory behind bridge: 8c000000-8fffffff
 Prefetchable memory behind bridge: 00000000b0600000-00000000b07fffff
 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
         DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                 ExtTag+ RBE+ FLReset-
         DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                 RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                 MaxPayload 128 bytes, MaxReadReq 128 bytes
         DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
         LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L0s L1, Latency
 L0 <4us, L1 <4us
                 ClockPM- Surprise+ LLActRep+ BwNot+
         LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                 ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
         LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk-
 DLActive- BWMgmt- ABWMgmt-
         SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
                 Slot #3, PowerLimit 0.000W; Interlock- NoCompl-
         SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
                 Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
         SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet- Interlock-
                 Changed: MRL- PresDet- LinkState-
         DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
 OBFF Not Supported ARIFwd+
         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
 OBFF Disabled ARIFwd-
         LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
 Selectable De-emphasis: -6dB
                  Transmit Margin: Normal Operating Range,
 EnterModifiedCompliance- ComplianceSOS-
                  Compliance De-emphasis: -6dB
         LnkSta2: Current De-emphasis Level: -6dB,
 EqualizationComplete-, EqualizationPhase1-
                  EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
Capabilities: [c0] Power Management version 3
         Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
 PME(D0+,D1-,D2-,D3hot+,D3cold+)
         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
         Address: 00000000ff041740  Data: 0003
 Capabilities: [100 v1] Advanced Error Reporting
         UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
         CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
         AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
 Capabilities: [200 v1] Virtual Channel
         Caps:   LPEVC=0 RefClk=100ns PATEntryBits=4
         Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
         Status: InProgress-
         VC0:    Caps:   PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
                 Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                 Status: NegoPending- InProgress-
                 Port Arbitration Table <?>
 Capabilities: [320 v1] Access Control Services
         ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
 EgressCtrl+ DirectTrans+
         ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
 EgressCtrl- DirectTrans-
 Capabilities: [330 v1] #12
 Kernel driver in use: pcieport

 In addition, here is what the pciehp driver spews out:

 Hotplug Controller:
   Seg/Bus/Dev/Func/IRQ : 0000:02:03.0 IRQ 21
   Vendor ID            : 0x111d
   Device ID            : 0x807a
   Subsystem ID         : 0x0000
   Subsystem Vendor ID  : 0x0000
   PCIe Cap offset      : 0x40
   PCI resource [7]     : [io  0x13000-0x13fff]
   PCI resource [8]     : [mem 0xc0c000000-0xc0fffffff]
   PCI resource [9]     : [mem 0xc30600000-0xc307fffff 64bit pref]
 Slot Capabilities      : 0x00180040
   Physical Slot Number : 3
   Attention Button     :  no
   Power Controller     :  no
   MRL Sensor           :  no
   Attention Indicator  :  no
   Power Indicator      :  no
   Hot-Plug Surprise    :  no
   EMI Present          :  no
   Command Completed    : yes
 Slot Status            : 0x0010
 Slot Control           : 0x0000
 Link Active Reporting supported
 HPC vendor_id 111d device_id 807a ss_vid 0 ss_did 0
 Registering domain:bus:dev=0000:50:00 sun=3
 Unexpected CMD_COMPLETED. Need to wait for command completed event.
 Command not completed in 1000 msec


Thanks,

Rajat

Note You need to log in before you can comment on or make changes to this bug.