Bug 204639 - PCIEHP deadlock w/ Thunderbolt on ICL hardware
Summary: PCIEHP deadlock w/ Thunderbolt on ICL hardware
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-20 16:50 UTC by Mario Limonciello
Modified: 2019-08-27 16:37 UTC (History)
4 users (show)

See Also:
Kernel Version: v5.3-rc5
Tree: Mainline
Regression: No


Attachments
lspci (310 bytes, text/plain)
2019-08-20 16:50 UTC, Mario Limonciello
Details
dmesg (124.50 KB, text/plain)
2019-08-20 16:50 UTC, Mario Limonciello
Details
lspci (fixed truncation) (37.59 KB, text/plain)
2019-08-21 15:34 UTC, Mario Limonciello
Details
dmesg (pure 5.3-rc5 no extra) (97.19 KB, text/plain)
2019-08-21 15:34 UTC, Mario Limonciello
Details
[PATCH] PCI: pciehp: Avoid lockdep false positives for reset_lock (2.68 KB, patch)
2019-08-25 10:42 UTC, Lukas Wunner
Details | Diff
updated dmesg, 5.3-rc6 with Lukas' patch (105.21 KB, text/plain)
2019-08-26 14:07 UTC, Mario Limonciello
Details
[PATCH v2] PCI: pciehp: Avoid lockdep false positives for reset_lock (2.49 KB, patch)
2019-08-27 11:39 UTC, Lukas Wunner
Details | Diff
dmesg w/ 5.3-rc6 + v2 of patch (104.65 KB, text/plain)
2019-08-27 16:37 UTC, Mario Limonciello
Details

Description Mario Limonciello 2019-08-20 16:50:09 UTC
Created attachment 284535 [details]
lspci

w/ 5.3rc5 I find that plugging in or unplugging a Thunderbolt dock (WD19TB) into this ICL system will cause a deadlock.  At that point it's not possible to use the dock anymore.

As a debugging tactic I did test applying https://lore.kernel.org/patchwork/patch/1113961/ but it did not help the behavior.
Comment 1 Mario Limonciello 2019-08-20 16:50:36 UTC
Created attachment 284537 [details]
dmesg
Comment 2 Mario Limonciello 2019-08-20 16:54:59 UTC
A few notes:

1) Kernel command line does have modprobe.blacklist=intel_lpss_pci currently due to a BIOS issue that is not yet resolved.

2) I also has pci=nobar for that BIOS issue, but it's actually no longer needed.  It does allow the dock to enumerate the next time.  The deadlock spew only comes up the first time.
Comment 3 Lukas Wunner 2019-08-21 04:54:56 UTC
Previously reported by Theodore Ts'o for v5.0:
https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/

Similar report by Oliver O'Halloran for v5.2:
https://lore.kernel.org/linux-pci/CAOSf1CGnYaGUTZQHbgy39dC47XNZRi4At2aTWRK0MeBvupKrxg@mail.gmail.com/

Report by Mario Limonciello for v5.3:
https://lore.kernel.org/linux-acpi/5486107424db48f2a06ed4c8a81f75b0@AUSX13MPC101.AMER.DELL.COM/

As I've said in my response to Theodore, this looks like a false positive to me, maybe we can annotate the locks to quiesce lockdep:
https://lore.kernel.org/linux-pci/20190402083257.kyqmirq4ovzscxsv@wunner.de/
Comment 4 Mika Westerberg 2019-08-21 07:50:04 UTC
I noticed you have pci=nobar in the command line. Probably has nothing to do with the issue but just curious why you need it? Ideally you should not need any additional command line parameters.
Comment 5 Mika Westerberg 2019-08-21 07:51:11 UTC
Please ignore the above, I just noticed your comment above explaining it ;-)
Comment 6 Mika Westerberg 2019-08-21 11:31:26 UTC
I agree with Lukas' analysis that the lockdep splat is most likely false positive.

Regardless of that we still have the issue that the dock does not work anymore after re-plug:

[   71.169906] xhci_hcd 0000:03:00.0: init 0000:03:00.0 fail, -16
[   71.169914] xhci_hcd: probe of 0000:03:00.0 failed with error -16

I saw something similar recently and it was caused by the fact that the IOMMU code did not detach the domain from xHCI when it was hot-removed. The following commit fixes it and it should be in v5.4-rc5 but can you double-check that you have it?

458b7c8e0dde iommu/vt-d: Detach domain when move device out of group

If that does not help, please try to disable IOMMU (intel_iommu=off).
Comment 7 Mika Westerberg 2019-08-21 13:00:36 UTC
Additionally please add xhci_hcd.dyndbg to the kernel command line so we can hopefully see where the -EBUSY gets returned.
Comment 8 Mario Limonciello 2019-08-21 15:34:16 UTC
Created attachment 284557 [details]
lspci (fixed truncation)
Comment 9 Mario Limonciello 2019-08-21 15:34:47 UTC
Created attachment 284559 [details]
dmesg (pure 5.3-rc5 no extra)
Comment 10 Mario Limonciello 2019-08-21 15:40:54 UTC
I've added updated attachments:
* Fix the truncation on lspci output
* just 5.3-rc5 
* adding xhci_hcd.dyndbg
* remove pci=nobar
* leave modprobe.blacklist=intel_lpss_pci (BIOS bug)


> I noticed you have pci=nobar in the command line. Probably has nothing to do
> with the issue but just curious why you need it? Ideally you should not need
> any additional command line parameters.

I had added it when I was identifying the BIOS bug and forgot to remove it.

> [   71.169906] xhci_hcd 0000:03:00.0: init 0000:03:00.0 fail, -16
> [   71.169914] xhci_hcd: probe of 0000:03:00.0 failed with error -16

Actually removing pci=nobar fixes the dock add/remove hotplug not working properly as well.  So I guess the USB debugging wasn't really needed.

>As I've said in my response to Theodore, this looks like a false positive to
>me, maybe we can annotate the locks to quiesce lockdep:

This part I can readily reproduce, so If you'd like me to try something, happy to do so.
Comment 11 Lukas Wunner 2019-08-25 10:42:03 UTC
Created attachment 284595 [details]
[PATCH] PCI: pciehp: Avoid lockdep false positives for reset_lock

Here's an attempt to quiesce lockdep. Mario, could you give this a spin?
Comment 12 Mario Limonciello 2019-08-26 14:07:03 UTC
@Lukas,

I booted with dock connected, unplugged it and plugged it back in.
Unfortunately no help.  I tested just that on top of 5.3-rc6.
I'll attach an updated dmesg.
Comment 13 Mario Limonciello 2019-08-26 14:07:33 UTC
Created attachment 284611 [details]
updated dmesg, 5.3-rc6 with Lukas' patch
Comment 14 Lukas Wunner 2019-08-26 17:45:10 UTC
(In reply to Mario Limonciello from comment #12)
> I booted with dock connected, unplugged it and plugged it back in.
> Unfortunately no help.

Well, it helped a *little* bit. The lockdep splat complained about a reset_lock/reset_lock deadlock before. Now it's complaining about a reset_lock/pci_lock_rescan_remove deadlock. :-)

That second lockdep splat might even be real, not a false positive, but I'm not sure because it involves acpiphp, which I'm not familiar with. If I understand the splat correctly, acpiphp can acquire pci_lock_rescan_remove, then runtime resume a pciehp hotplug port, whereupon reset_lock is acquired. lockdep further assumes that the same hotplug port's IRQ thread pciehp_ist() may run concurrently, acquire reset_lock and then acquire pci_lock_rescan_remove to bring down the slot and deconfigure the hotplugged device. So a AB/BA deadlock. Mika, help, can this happen?

Note, as I've explained in the commit message of the patch, lockdep considers all reset_locks on the same hierarchy level as identical. So the splat may also be caused if a pciehp hotplug port's pciehp_ist() runs while  acpiphp acquires pci_lock_rescan_remove and then reset_lock of a *different* hotplug port on the same hierarchy level.
Comment 15 Mika Westerberg 2019-08-27 09:50:27 UTC
I'm not 100% sure about this but reading through the code one scenario
where I think this may happen is when we get ACPI Notify() to the root
port which is currently runtime suspended (this happens when the root
port itself is in D3cold). acpiphp runtime resumes the root port which
ends up enabling its interrupt in pciehp_resume(). Any hotplug event to
that port now eventually schedules pciehp_ist() which may run on another
CPU concurrently.

I wonder if we can get rid of the reset_lock completely by stopping
pciehp event handling while pciehp_reset_slot() is run?
Comment 16 Lukas Wunner 2019-08-27 11:15:38 UTC
(In reply to Mika Westerberg from comment #15)
> I'm not 100% sure about this but reading through the code one scenario
> where I think this may happen is when we get ACPI Notify() to the root
> port which is currently runtime suspended (this happens when the root
> port itself is in D3cold). acpiphp runtime resumes the root port which
> ends up enabling its interrupt in pciehp_resume(). Any hotplug event to
> that port now eventually schedules pciehp_ist() which may run on another
> CPU concurrently.

Note that at the top of pciehp_ist(), the root port is runtime resumed using pci_config_pm_runtime_get(). (That's assuming that the root port's current_state == PCI_D3cold, anything else would be a bug.)

If the root port is already being runtime resumed via ACPI Notify(), then pciehp_ist() will wait for the runtime resume to finish. And that includes pciehp_resume() having finished. So reset_lock has already been released by the time pciehp_ist() continues execution after pci_config_pm_runtime_get().

Hence I don't see how the particular scenario you've described can deadlock. (Again, if the root port's current_state isn't updated to D3cold upon runtime suspend, we may be in trouble, you may want to verify on an ICL system whether that's the case.)

> I wonder if we can get rid of the reset_lock completely by stopping
> pciehp event handling while pciehp_reset_slot() is run?

Well, that's precisely the point of the reset_lock. How else do you want to stop event handling if not with a lock?

Regarding the patch I've attached above, I've realized that the maximum number of subclasses is 8, not 255 (see MAX_LOCKDEP_SUBCLASSES), which seems too small, so I've updated the patch to register and assign a class for each controller. Will attach the new patch to this bugzilla once I've compile-tested it.
Comment 17 Lukas Wunner 2019-08-27 11:39:41 UTC
Created attachment 284631 [details]
[PATCH v2] PCI: pciehp: Avoid lockdep false positives for reset_lock

Second attempt at making lockdep happy, now with a per-controller lock class. Mario, if you could test this and post dmesg again I'd be grateful. Thanks.
Comment 18 Mika Westerberg 2019-08-27 11:44:44 UTC
(In reply to Lukas Wunner from comment #16)
> (In reply to Mika Westerberg from comment #15)
> > I'm not 100% sure about this but reading through the code one scenario
> > where I think this may happen is when we get ACPI Notify() to the root
> > port which is currently runtime suspended (this happens when the root
> > port itself is in D3cold). acpiphp runtime resumes the root port which
> > ends up enabling its interrupt in pciehp_resume(). Any hotplug event to
> > that port now eventually schedules pciehp_ist() which may run on another
> > CPU concurrently.
> 
> Note that at the top of pciehp_ist(), the root port is runtime resumed using
> pci_config_pm_runtime_get(). (That's assuming that the root port's
> current_state == PCI_D3cold, anything else would be a bug.)
> 
> If the root port is already being runtime resumed via ACPI Notify(), then
> pciehp_ist() will wait for the runtime resume to finish. And that includes
> pciehp_resume() having finished. So reset_lock has already been released by
> the time pciehp_ist() continues execution after pci_config_pm_runtime_get().

Yes, I think you are right. I missed that pci_config_pm_runtime_get() calls
pm_runtime_barrier().

> Hence I don't see how the particular scenario you've described can deadlock.
> (Again, if the root port's current_state isn't updated to D3cold upon
> runtime suspend, we may be in trouble, you may want to verify on an ICL
> system whether that's the case.)

With v5.3-rcX kernels and beyond the root port should be put into D3cold when it drops its reference to its power resources. So we should be fine.

> > I wonder if we can get rid of the reset_lock completely by stopping
> > pciehp event handling while pciehp_reset_slot() is run?
> 
> Well, that's precisely the point of the reset_lock. How else do you want to
> stop event handling if not with a lock?

I was thinking more like what pciehp_remove() and pcie_shutdown_notification() in particular are doing.
Comment 19 Lukas Wunner 2019-08-27 11:52:03 UTC
(In reply to Mika Westerberg from comment #18)
> (In reply to Lukas Wunner from comment #16)
> > (Again, if the root port's current_state isn't updated to D3cold upon
> > runtime suspend, we may be in trouble, you may want to verify on an ICL
> > system whether that's the case.)
> 
> With v5.3-rcX kernels and beyond the root port should be put into D3cold
> when it drops its reference to its power resources. So we should be fine.

I'm just wondering whether current_state is properly updated to D3cold as well?
Comment 20 Mika Westerberg 2019-08-27 12:00:00 UTC
> I'm just wondering whether current_state is properly updated to D3cold as
> well?

Yes, after 83a16e3f6d70 ("PCI / ACPI: Use cached ACPI device state to get PCI device power state") it should be updated properly.
Comment 21 Mario Limonciello 2019-08-27 16:37:12 UTC
Created attachment 284637 [details]
dmesg w/ 5.3-rc6 + v2 of patch

Note You need to log in before you can comment on or make changes to this bug.