Bug 200985 - Problems with recent change to use slot reset instead of bus reset
Summary: Problems with recent change to use slot reset instead of bus reset
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 blocking
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-31 17:22 UTC by Dennis
Modified: 2022-01-19 22:20 UTC (History)
3 users (show)

See Also:
Kernel Version: 4.19-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Extend pci_reset_bus() for probe time. (3.23 KB, patch)
2018-08-31 20:16 UTC, Sinan Kaya
Details | Diff
[PATCH 1/2] PCI: Request reset type as part of function reset (6.96 KB, patch)
2018-09-05 03:39 UTC, Sinan Kaya
Details | Diff
[PATCH 2/2] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type (1.06 KB, patch)
2018-09-05 03:40 UTC, Sinan Kaya
Details | Diff
[PATCH 1/2] PCI: Request reset type as part of function reset (6.61 KB, patch)
2018-09-05 03:43 UTC, Sinan Kaya
Details | Diff

Description Dennis 2018-08-31 17:22:45 UTC
Issue #1
--------
pci_reset_bus(dev)
	return pci_probe_reset_slot(pdev->slot) ?
		__pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus);

pci_probe_rest can return an errno. This check needs to be modified to handle	the case where we are not dealing with a slot or it's not resetable.

Root cause: 811c5cb   PCI: Unify try slot and bus reset API

Issue #2
--------
If we fix this and end up calling into __pci_reset_bus(pdev->bus)
the first thing that happens is pci_bus_reset(bus, 1) is called. Since we are passing in 1, the function early exists and returns 0.

Back in __pci_reset_bus() the code falls down to pci_bus_save_and_disable(bus)however since this is possible to be called during probe we hit a deadlock when trying to lock the dev.

To fix this we need to implement a locked reset API or something of that nature. Until then these patches should probably just be reverted. I will also post a band-aid patch to linux-pci and linux-rdma.

Root cause:
c6a44ba950d1 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()")
409888e0966e ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset")

This has also been discussed on linux-pci:
https://marc.info/?l=linux-pci&m=153539692917785&w=2
Comment 1 Sinan Kaya 2018-08-31 17:35:47 UTC
I would rather fix the issue rather than reverting the patch. By revert, we are leaving another hole open.
Comment 2 Dennis 2018-08-31 17:40:46 UTC
The hfi1 driver is not usable due to these regressions and it is important to get a fix in the -rc timeframe.
Comment 3 Dennis 2018-08-31 17:42:35 UTC
(In reply to Sinan Kaya from comment #1)
> I would rather fix the issue rather than reverting the patch. By revert, we
> are leaving another hole open.

I'm not suggesting we revert and abandon the idea. Just until we can properly rework the API. If that is something doable in -RC time frame I'm OK with that too. My main concern is 4.19 is broken for our HW.
Comment 4 Sinan Kaya 2018-08-31 17:47:07 UTC
I see the code now. It might be OK. Let me circulate some ideas on the list. We might come up with a better fix if time permits. Otherwise, I don't object your code.
Comment 5 Dennis 2018-08-31 17:57:24 UTC
Sounds good. Do let me know if you have something you want me to test.
Comment 6 Sinan Kaya 2018-08-31 20:16:05 UTC
Created attachment 278221 [details]
Extend pci_reset_bus() for probe time.
Comment 7 Sinan Kaya 2018-08-31 20:16:57 UTC
Can you please test the attached patch as a replacement for 2/2 along with your first parch (1/2)?

I just did a compile test.
Comment 8 Sinan Kaya 2018-09-04 18:56:48 UTC
Dennis,

Can you confirm that you have seen the attached patch in the Bugzilla and are testing on your system?

I'll post to the list only after you confirm that it fixes the issue for you.

Sinan
Comment 9 Dennis 2018-09-04 19:07:48 UTC
(In reply to Sinan Kaya from comment #8)
> Can you confirm that you have seen the attached patch in the Bugzilla and
> are testing on your system?

Yep, building an RPM right now actually.  I will throw it on and run some sanity checks and report back here.
Comment 10 Sinan Kaya 2018-09-04 20:03:44 UTC
Thanks,

Note that you still need 1/2 of your patchset. Attached patch is a replacement for 2/2.
Comment 11 Dennis 2018-09-05 02:46:34 UTC
Yes I kept my 1/2 and applied your patch as 2/2. It passes basic tests on my hardware. The driver loads and appears to do the gen3 bump as expected.
Comment 12 Sinan Kaya 2018-09-05 03:39:45 UTC
Created attachment 278297 [details]
[PATCH 1/2] PCI: Request reset type as part of function reset
Comment 13 Sinan Kaya 2018-09-05 03:40:19 UTC
Created attachment 278299 [details]
[PATCH 2/2] IB/hfi1: Prefer new __pci_reset_function_locked() API  with reset type
Comment 14 Sinan Kaya 2018-09-05 03:43:57 UTC
Created attachment 278301 [details]
[PATCH 1/2] PCI: Request reset type as part of function reset
Comment 15 Dennis 2018-09-05 13:18:40 UTC
I have tested the two new patches applied on top of 4.19-rc1 and they seem to be working as well.
Comment 16 Bjorn Helgaas 2022-01-19 22:20:04 UTC
Not sure where we are with this.

We did merge this: https://git.kernel.org/linus/bfc456060d0c ("IB/hfi1,PCI: Allow bus reset while probing"), which mentions this bugzilla.

I'm going to close this as obsolete, but please reopen or open a new report if we still need to do something.

Note You need to log in before you can comment on or make changes to this bug.