Issue #1 -------- pci_reset_bus(dev) return pci_probe_reset_slot(pdev->slot) ? __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); pci_probe_rest can return an errno. This check needs to be modified to handle the case where we are not dealing with a slot or it's not resetable. Root cause: 811c5cb PCI: Unify try slot and bus reset API Issue #2 -------- If we fix this and end up calling into __pci_reset_bus(pdev->bus) the first thing that happens is pci_bus_reset(bus, 1) is called. Since we are passing in 1, the function early exists and returns 0. Back in __pci_reset_bus() the code falls down to pci_bus_save_and_disable(bus)however since this is possible to be called during probe we hit a deadlock when trying to lock the dev. To fix this we need to implement a locked reset API or something of that nature. Until then these patches should probably just be reverted. I will also post a band-aid patch to linux-pci and linux-rdma. Root cause: c6a44ba950d1 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()") 409888e0966e ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset") This has also been discussed on linux-pci: https://marc.info/?l=linux-pci&m=153539692917785&w=2
I would rather fix the issue rather than reverting the patch. By revert, we are leaving another hole open.
The hfi1 driver is not usable due to these regressions and it is important to get a fix in the -rc timeframe.
(In reply to Sinan Kaya from comment #1) > I would rather fix the issue rather than reverting the patch. By revert, we > are leaving another hole open. I'm not suggesting we revert and abandon the idea. Just until we can properly rework the API. If that is something doable in -RC time frame I'm OK with that too. My main concern is 4.19 is broken for our HW.
I see the code now. It might be OK. Let me circulate some ideas on the list. We might come up with a better fix if time permits. Otherwise, I don't object your code.
Sounds good. Do let me know if you have something you want me to test.
Created attachment 278221 [details] Extend pci_reset_bus() for probe time.
Can you please test the attached patch as a replacement for 2/2 along with your first parch (1/2)? I just did a compile test.
Dennis, Can you confirm that you have seen the attached patch in the Bugzilla and are testing on your system? I'll post to the list only after you confirm that it fixes the issue for you. Sinan
(In reply to Sinan Kaya from comment #8) > Can you confirm that you have seen the attached patch in the Bugzilla and > are testing on your system? Yep, building an RPM right now actually. I will throw it on and run some sanity checks and report back here.
Thanks, Note that you still need 1/2 of your patchset. Attached patch is a replacement for 2/2.
Yes I kept my 1/2 and applied your patch as 2/2. It passes basic tests on my hardware. The driver loads and appears to do the gen3 bump as expected.
Created attachment 278297 [details] [PATCH 1/2] PCI: Request reset type as part of function reset
Created attachment 278299 [details] [PATCH 2/2] IB/hfi1: Prefer new __pci_reset_function_locked() API with reset type
Created attachment 278301 [details] [PATCH 1/2] PCI: Request reset type as part of function reset
I have tested the two new patches applied on top of 4.19-rc1 and they seem to be working as well.
Not sure where we are with this. We did merge this: https://git.kernel.org/linus/bfc456060d0c ("IB/hfi1,PCI: Allow bus reset while probing"), which mentions this bugzilla. I'm going to close this as obsolete, but please reopen or open a new report if we still need to do something.