Bug 71691

Summary: pciehp surprise removal broken by INTx enable
Product: Drivers Reporter: Bjorn Helgaas (bjorn)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: RESOLVED CODE_FIX    
Severity: normal CC: andreas.noever
Priority: P1    
Hardware: All   
OS: Linux   
URL: http://lkml.kernel.org/r/CAMxnaaXd9-VbxSYa9q1YVT+fwfDFYKGG2Hq-1jkc-awLX-HwtQ@mail.gmail.com
Kernel Version: v3.14-rc4 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg
lscpi -vv
lspci -vvv -xxxx -s 06:03 (after boot, cold-plugged adapter)
lspci -vvv -xxxx -s 06:03 (unplugged adapter (first time, worked))
lspci -vvv -xxxx -s 06:03 (hotplugged adapter)
lspci -vvv -xxxx -s 06:03 (unplugged adapter (second time, failed))

Description Bjorn Helgaas 2014-03-07 16:39:32 UTC
Andreas noticed that 1f42db786b14 ("PCI: Enable INTx if BIOS left them disabled") broke pciehp surprise removal.  Original report at http://lkml.kernel.org/r/CAMxnaaXd9-VbxSYa9q1YVT+fwfDFYKGG2Hq-1jkc-awLX-HwtQ@mail.gmail.com
Comment 1 Andreas Noever 2014-03-07 23:18:56 UTC
Created attachment 128501 [details]
dmesg
Comment 2 Andreas Noever 2014-03-07 23:19:52 UTC
Created attachment 128511 [details]
lscpi -vv
Comment 3 Andreas Noever 2014-03-07 23:58:16 UTC
The hp bridge in question is the Thunderbolt downstream port (06:03). 

Note that the MacBook firmware only initializes cold-plugged thunderbolt devices. Scenario 3 requires my driver: https://lkml.org/lkml/2013/11/28/492

I observed the following:

Scenario 1 (as expected):
 - boot with coldplugged Thunderbolt adapter
 - DisINTx is not set
 - load pcieport driver
 - DisINTx is set (by pci_enable_msi)
 - unplug adapter
pciehp 0000:06:03.0:pcie24: Card not present on Slot(3-1)
pciehp 0000:09:00.0:pcie24: unloading service driver pciehp
pci_bus 0000:0a: busn_res: [bus 0a] is released
pci_bus 0000:09: busn_res: [bus 09-0a] is released

Scenario 2 (verify that DisINTx- breaks notifications):
 - boot with cold-plugged Thunderbolt adapter
 - DisINTx is not set
 - load pcieport driver
 - DisINTx is set (by pci_enable_msi)
 - clear DisINTx: setpci -s 06:03 COMMAND=0007
 - unplug adapter
 - pciehp does nothing, lspci -vv now shows INTx+ (the bridge is asserting the intx interrupt)
 - set DisINTx again: setpci -s 06:03 COMMAND=0407
 - pciehp finally reacts, as in Scenario 1

Scenario 3 (verify that 1f42db786b14 clears DisINTx):
 - boot with cold-plugged Thunderbolt adapter
 - unplug, pciehp removes the device
 - load thunderbolt driver, hotplug adapter
 - pciehp detects the adapter
 - do_pci_enable_device get called for 06:03, clears DisINTx
 - unplug
 - pciehp does nothing, lspci -vv now shows INTx+ (the bridge is asserting the intx interrupt)
 - set DisINTx again: setpci -s 06:03 COMMAND=0407
 - pciehp finally reacts, as in Scenario 1

Without 1f42db786b14 Scenario 3 works as expected (the devices are removed when unplugged).

It would be interesting to see whether that problem is specific to the thunderbolt controller or whether it also appears on other bridges.


The PCI Express Base Spec v3 states in 6.7.3.4 Software Notification of Hot-Plug Events:

If the Port is enabled for level-triggered interrupt signaling using the INTx messages, the virtual INTx wire must be asserted whenever and as long as the following conditions are satisfied: ...

and

If the Port is enabled for edge-triggered interrupt signaling using MSI or MSI-X, an interrupt message must be sent every time the logical AND of the following conditions transitions from FALSE to TRUE: ...


Since we enable MSI on the bridge I would say that the device is at fault here. Still the msi code is very careful and always calls pci_intx_for_msi (which calls pci_intx, unless the device is blacklisted) whenever we enable/disable msi (see for example: http://lxr.free-electrons.com/source/drivers/pci/msi.c#L627 ). Since the original patch was addressing a problem where the BIOS set the flag I would suggest that we only clear it if it was indeed set by the BIOS and leave it alone otherwise.
Comment 4 Andreas Noever 2014-03-08 00:10:45 UTC
Created attachment 128521 [details]
lspci -vvv -xxxx -s 06:03 (after boot, cold-plugged adapter)
Comment 5 Andreas Noever 2014-03-08 00:12:00 UTC
Created attachment 128531 [details]
lspci -vvv -xxxx -s 06:03 (unplugged adapter (first time, worked))
Comment 6 Andreas Noever 2014-03-08 00:12:24 UTC
Created attachment 128541 [details]
lspci -vvv -xxxx -s 06:03 (hotplugged adapter)
Comment 7 Andreas Noever 2014-03-08 00:12:45 UTC
Created attachment 128551 [details]
lspci -vvv -xxxx -s 06:03 (unplugged adapter (second time, failed))
Comment 8 Bjorn Helgaas 2014-06-03 23:09:33 UTC
This should be fixed by 

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3cdeb713dc66057b50682048c151eae07b186c42

which appeared in v3.14-rc7.  Andreas, let me know if that is not the case.