Bug 201127

Summary: [REGRESSION][BISECTED] Boot stall related to drivers/pci/hotplug/acpiphp_glue.c
Product: Drivers
Component: PCI
Reporter: Peter Anemone (peter.anemone)
Assignee: Greg Kroah-Hartman (greg)
Status: RESOLVED CODE_FIX
Severity: blocking
Priority: P1
CC: bjorn, lenb, mika.westerberg, rjw
Hardware: x86-64
OS: Linux
Kernel Version: 4.18
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  more information about my system
  First fix attempt
  acpidump
  Add more debugging to ACPI hotplug and skip resource distribution
  dmesg
  new dmesg
  Third fix attempt
  dmesg after third patch

Description Peter Anemone 2018-09-14 09:28:27 UTC
Created attachment 278533 [details]
more information about my system

Linux has been unbootable for me since 4.18. Even the fallback image does not boot. This is still not fixed in 4.19-rc3.

I used git bisect to track down the first bad commit:
84c8b58ed3addf17d3beb2e5037b001ffa65c5ef

The commit is about:
"ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84c8b58ed3addf17d3beb2e5037b001ffa65c5ef

Linux 4.17 booted fine.

Reverting commit 84c8b58ed3addf17d3beb2e5037b001ffa65c5ef makes 4.18 and 4.19-rc3 bootable again.

Information about my system:
Model: HP 6730b laptop
CPU: Intel Core 2 Duo P8600
Boot manager: systemd-boot (UEFI)
More information about my system is in the attachments.
Comment 1 Greg Kroah-Hartman 2018-09-14 10:33:22 UTC
On Fri, Sep 14, 2018 at 09:28:27AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201127
> 
>             Bug ID: 201127
>            Summary: [REGRESSION][BISECTED] Boot stall related to
>                     drivers/pci/hotplug/acpiphp_glue.c
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 4.18

All USB bugs should be sent to the linux-usb@vger.kernel.org mailing
list, and not entered into bugzilla.  Please bring this issue up there,
if it is still a problem in the latest kernel release.
Comment 2 Peter Anemone 2018-09-14 10:51:10 UTC
Okay, thanks. I had no idea this was a USB bug.

The MAINTAINERS file has this section in it: 

ACPI
M:	"Rafael J. Wysocki" <rjw@rjwysocki.net>
M:	Len Brown <lenb@kernel.org>
L:	linux-acpi@vger.kernel.org
W:	https://01.org/linux-acpi
Q:	https://patchwork.kernel.org/project/linux-acpi/list/
T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
B:	https://bugzilla.kernel.org
S:	Supported
F:	drivers/acpi/
F:	drivers/pnp/pnpacpi/
F:	include/linux/acpi.h
F:	include/linux/fwnode.h
F:	include/acpi/
F:	Documentation/acpi/
F:	Documentation/ABI/testing/sysfs-bus-acpi
F:	Documentation/ABI/testing/configfs-acpi
F:	drivers/pci/*acpi*
F:	drivers/pci/*/*acpi*
F:	drivers/pci/*/*/*acpi*
F:	tools/power/acpi/

That's why I entered this in Bugzilla. The product might need to be changed to ACPI, though.
Comment 3 Greg Kroah-Hartman 2018-09-14 11:04:28 UTC
On Fri, Sep 14, 2018 at 10:51:10AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201127
> 
> --- Comment #2 from Peter Anemone (peter.anemone@gmail.com) ---
> Okay, thanks. I had no idea of this being an USB bug.

Sorry, it isn't; my scripts just triggered the response.

It's a PCI issue, please reassign to the PCI component and email the
linux-pci@vger.kernel.org list.
Comment 4 Mika Westerberg 2018-09-17 07:45:43 UTC
Created attachment 278583 [details]
First fix attempt

Could you try the attached patch and see if it helps? If not, can you next try commenting out the following lines (with the patch applied) in drivers/pci/hotplug/acpiphp_glue.c::enable_slot()?

//  if (bus->self->subordinate)
//          pci_assign_unassigned_bridge_resources(bus->self);
Comment 5 Mika Westerberg 2018-09-17 12:18:43 UTC
In addition, can you attach output of 'sudo lspci -vv' and acpidump from the working kernel?
Comment 6 Mika Westerberg 2018-09-17 13:16:57 UTC
Sorry, I did not notice that you already have the lspci output in the tar file you attached. So only the acpidump then :)
Comment 7 Peter Anemone 2018-09-18 03:09:12 UTC
Created attachment 278619 [details]
acpidump

Commenting out the requested lines worked (the patch alone did not work without commenting them out):

//  if (bus->self->subordinate)
//          pci_assign_unassigned_bridge_resources(bus->self);

Here's an acpidump from the patched 4.19-rc4 kernel.
Comment 8 Mika Westerberg 2018-09-18 07:52:01 UTC
Created attachment 278621 [details]
Add more debugging to ACPI hotplug and skip resource distribution

Could you try this patch instead? Please attach full dmesg as well (assuming your system does not crash).
Comment 9 Peter Anemone 2018-09-18 11:28:19 UTC
Created attachment 278625 [details]
dmesg

Here's dmesg from the previous patched kernel.

The new patch did not work.
Comment 10 Mika Westerberg 2018-09-18 11:40:26 UTC
Hi, can you keep the last patch applied but comment out the line:

  pci_assign_unassigned_bridge_resources(bus->self);

in drivers/pci/hotplug/acpiphp_glue.c::enable_slot() just like you did previously and then attach the full dmesg? It should show when the ACPI hotplug event happens.
Comment 11 Peter Anemone 2018-09-18 12:33:31 UTC
Created attachment 278627 [details]
new dmesg

(In reply to Mika Westerberg from comment #10)
> Hi, can you do so that you keep the last patch applied but comment out the
> line:
> 
>   pci_assign_unassigned_bridge_resources(bus->self);
> 
> in drivers/pci/hotplug/acpiphp_glue.c::enable_slot() just like you did
> previously and then attach full dmesg? It should show when the ACPI hotplug
> event happens.

Did that; here's the dmesg output.
Comment 12 Mika Westerberg 2018-09-18 13:32:52 UTC
Thanks. It looks like the ACPI notify (bus check) comes to the NIC:

[    0.942239] ACPI: \_SB_.PCI0.RP06.NIC_: Bus check in hotplug_event()

and I think enable_slot() in acpiphp_glue.c should not treat it as the "ACPI hotplug to native PCIe hotplug port" special case here.
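
The distinction above can be sketched as a small stand-alone model. This is not the kernel code and not the attached patch: `struct model_port`, `enum notify_target`, and `use_native_port_special_case()` are hypothetical names invented for illustration. The model only captures the idea that the special case should apply when the bus check targets a natively managed hotplug port itself, and not when it targets a device below it, such as the NIC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's pci_dev/pci_bus. */
enum notify_target {
    TARGET_BRIDGE, /* bus check delivered to the bridge (port) itself */
    TARGET_SLOT    /* bus check delivered to a device in a slot, e.g. a NIC */
};

struct model_port {
    bool native_hotplug; /* assumed flag: port is managed by native PCIe hotplug */
};

/*
 * Returns true only when the notification targets the bridge itself and
 * that bridge is handled by native PCIe hotplug. A bus check aimed at a
 * device behind the port falls through to the ordinary ACPI path.
 */
bool use_native_port_special_case(const struct model_port *port,
                                  enum notify_target target)
{
    if (port == NULL)
        return false;

    return target == TARGET_BRIDGE && port->native_hotplug;
}
```

In this model, a bus check aimed at a device behind the port (`TARGET_SLOT`) never selects the special case, even when the port above it uses native hotplug.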
Comment 13 Mika Westerberg 2018-09-18 13:36:23 UTC
Created attachment 278629 [details]
Third fix attempt

Can you try the attached patch and see if it works? Please attach dmesg again.
Comment 14 Peter Anemone 2018-09-18 14:32:13 UTC
Created attachment 278631 [details]
dmesg after third patch

Here's new dmesg for the latest patch. It worked.
Comment 15 Mika Westerberg 2018-09-19 11:28:09 UTC
Thanks for testing. I'll submit a proper patch upstream shortly.
Comment 16 Peter Anemone 2018-10-03 08:52:10 UTC
Marking this resolved as the patch is included in 4.19-rc6 and in the stable-queue.

Thank you Mika :)
(and all the others involved)