Bug 218268

Summary: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Product: Drivers
Component: PCI
Reporter: Jonathan Woithe (jwoithe)
Assignee: drivers_pci (drivers_pci)
Status: NEW
Severity: normal
Priority: P3
CC: imammedo
Hardware: Intel
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 40613da52b13fb21c5566f10b287e0ca8c12c4e9
Attachments:

 * Kernel configuration for 5.15.139
 * Kernel messages from /var/log/messages
 * Kernel messages from /var/log/syslog
 * Output from "lspci -tv"
 * Tarball of "acpidump -b" output
 * Output from dmesg with extra debug flags
 * Output from dmesg with "pci=realloc" command line parameter

Description Jonathan Woithe 2023-12-15 11:12:28 UTC
Following an update from 5.15.72 to 5.15.139 on one of my machines, the console froze partway through the boot process. The machine still managed to boot: it could be reached via the network, and a keyboard-initiated shutdown worked as expected. However, the screen remained static the whole time and the X login never appeared. Only a reboot would restore the display. The kernel logs reported:

    radeon 0000:4b:00.0: Fatal error during GPU init
    radeon: probe of 0000:4b:00.0 failed with error -12

Elsewhere, the sequence

    kernel: ATOM BIOS: C09302
    kernel: radeon 0000:4b:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
    kernel: radeon 0000:4b:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
    kernel: [drm] Detected VRAM RAM=1024M, BAR=0M
    kernel: [drm] RAM width 64bits DDR
    kernel: [drm] radeon: finishing device.

appeared where I would normally see

    kernel: ATOM BIOS: C09302
    kernel: radeon 0000:4b:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
    kernel: radeon 0000:4b:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
    kernel: [drm] Detected VRAM RAM=1024M, BAR=256M
    kernel: [drm] RAM width 64bits DDR
    kernel: [drm] radeon: 1024M of VRAM memory ready
    kernel: [drm] radeon: 1024M of GTT memory ready.
    kernel: [drm] Loading CEDAR Microcode
    kernel: [drm] Internal thermal controller with fan control
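
The visible difference between the two sequences is the BAR size on the "Detected VRAM" line: 0M on the failing kernel versus 256M on the working one. As a rough sketch only, a saved log could be checked for the failing signature as follows. It assumes the message format is exactly as quoted above; the function name `find_zero_bar` is hypothetical and not part of any kernel tooling.

```python
import re

# Matches the radeon line quoted above, for example:
#   kernel: [drm] Detected VRAM RAM=1024M, BAR=0M
# Assumption: sizes are always reported in whole megabytes with an "M" suffix.
VRAM_LINE = re.compile(r"Detected VRAM RAM=(\d+)M, BAR=(\d+)M")


def find_zero_bar(lines):
    """Return a (ram_mb, bar_mb) tuple for every line that reports BAR=0M."""
    hits = []
    for line in lines:
        match = VRAM_LINE.search(line)
        if match and int(match.group(2)) == 0:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```

It can be fed any iterable of lines, for example an open handle on a saved copy of /var/log/messages.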

There were also some new errors from the thunderbolt and xhci_hcd devices (see the kernel logs attached below).

The machine runs Slackware64 15.0 which uses unmodified kernel.org kernels.

A git bisect resulted in the following report:

    d9ce077f8b1f731407e6b612b03bba464fd18d9b is the first bad commit
    commit d9ce077f8b1f731407e6b612b03bba464fd18d9b
    Author: Igor Mammedov <imammedo@redhat.com>
    Date:   Mon Apr 24 21:15:57 2023 +0200

        PCI: acpiphp: Reassign resources on bridge if necessary

        [ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]

This has been reported on list: https://lore.kernel.org/linux-pci/20231215102921.587a9857@imammedo.users.ipa.redhat.com/T/

The following additional items requested on-list will be attached:

 * The kernel configuration used (config-5.15.139)

 * Kernel messages (syslog, messages)

 * Output from "lspci -tv"

Comment 1 Jonathan Woithe 2023-12-15 11:14:06 UTC
Created attachment 305602
Kernel configuration for 5.15.139

config-5.15.139: The configuration used for the 5.15.139 kernel which exhibited the problem.

Comment 2 Jonathan Woithe 2023-12-15 11:15:35 UTC
Created attachment 305603
Kernel messages from /var/log/messages

messages: Kernel messages from 5.15.139 which was affected by the regression.

Comment 3 Jonathan Woithe 2023-12-15 11:17:43 UTC
Created attachment 305604
Kernel messages from /var/log/syslog

syslog: Kernel messages from 5.15.139 which was affected by the regression.

Comment 4 Jonathan Woithe 2023-12-15 11:18:35 UTC
Created attachment 305605
Output from "lspci -tv"

lspci-tv: Output from "lspci -tv" as requested on list.

Comment 5 Jonathan Woithe 2023-12-15 23:32:38 UTC
Created attachment 305614
Tarball of "acpidump -b" output

Files produced by "acpidump -b" as requested on list.

Comment 6 Jonathan Woithe 2023-12-21 05:55:36 UTC
Created attachment 305640
Output from dmesg with extra debug flags

As requested on list, this is the dmesg output obtained after booting the kernel with the following command line parameters:

    dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel
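
Because the parameter is long and easy to mistype at a boot prompt, here is a small sketch of assembling it from the list of source files. The file list is taken from the command line above; the helper name `dyndbg_param` is hypothetical, and the sketch only builds the string without touching the bootloader configuration.

```python
# Source files whose debug messages were enabled, as listed in the
# command line above.
FILES = [
    "drivers/pci/access.c",
    "drivers/pci/hotplug/acpiphp_glue.c",
    "drivers/pci/bus.c",
    "drivers/pci/pci.c",
    "drivers/pci/setup-bus.c",
]


def dyndbg_param(files):
    """Build a dyndbg= boot parameter with one 'file <path> +p' query per file."""
    queries = "; ".join(f"file {path} +p" for path in files)
    return f'dyndbg="{queries}"'


if __name__ == "__main__":
    # ignore_loglevel is appended separately, as in the command line above.
    print(dyndbg_param(FILES), "ignore_loglevel")
```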

Comment 7 Jonathan Woithe 2023-12-21 06:09:18 UTC
Created attachment 305641
Output from dmesg with "pci=realloc" command line parameter

As requested on list, this is the dmesg output from 5.15.139 when booted with "pci=realloc". With this command line option, GPU initialisation did not fail and the machine appeared to boot normally.