Bug 218268 - [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Summary: [Regression] Commit 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel Linux
Importance: P3 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-15 11:12 UTC by Jonathan Woithe
Modified: 2023-12-21 06:09 UTC
1 user

See Also:
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 40613da52b13fb21c5566f10b287e0ca8c12c4e9


Attachments
Kernel configuration for 5.15.139 (234.02 KB, text/plain)
2023-12-15 11:14 UTC, Jonathan Woithe
Kernel messages from /var/log/messages (85.22 KB, text/plain)
2023-12-15 11:15 UTC, Jonathan Woithe
Kernel messages from /var/log/syslog (10.73 KB, text/plain)
2023-12-15 11:17 UTC, Jonathan Woithe
Output from "lspci -tv" (7.55 KB, text/plain)
2023-12-15 11:18 UTC, Jonathan Woithe
Tarball of "acpidump -b" output (29.00 KB, application/gzip)
2023-12-15 23:32 UTC, Jonathan Woithe
Output from dmesg with extra debug flags (80.39 KB, text/plain)
2023-12-21 05:55 UTC, Jonathan Woithe
Output from dmesg with "pci=realloc" command line parameter (87.85 KB, text/plain)
2023-12-21 06:09 UTC, Jonathan Woithe

Description Jonathan Woithe 2023-12-15 11:12:28 UTC
Following an update from 5.15.72 to 5.15.139 on one of my machines, the console froze partway through the boot process. The machine still managed to boot: it could be reached over the network, and a keyboard-initiated shutdown worked as expected. The problem was that the screen remained static the whole time and the X login never appeared. Only a reboot would restore the display. The kernel logs reported:

    radeon 0000:4b:00.0: Fatal error during GPU init
    radeon: probe of 0000:4b:00.0 failed with error -12
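Kernel probe functions return a negated errno value on failure, so "error -12" can be decoded with Python's standard `errno` and `os` modules. This is a minimal triage sketch assuming a Linux host (where the numbers match the kernel's); `decode_probe_error` is a hypothetical helper name. It shows that -12 is -ENOMEM, i.e. the driver failed to obtain some resource it needed.

```python
import errno
import os


def decode_probe_error(code: int) -> str:
    """Map a negative kernel error code (e.g. -12) to its errno name and message."""
    num = -code  # kernel functions report failure as -errno
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"


print(decode_probe_error(-12))  # ENOMEM: Cannot allocate memory
```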

Elsewhere, the sequence

    kernel: ATOM BIOS: C09302
    kernel: radeon 0000:4b:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
    kernel: radeon 0000:4b:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
    kernel: [drm] Detected VRAM RAM=1024M, BAR=0M
    kernel: [drm] RAM width 64bits DDR
    kernel: [drm] radeon: finishing device.

appeared where I would normally see

    kernel: ATOM BIOS: C09302
    kernel: radeon 0000:4b:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
    kernel: radeon 0000:4b:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF
    kernel: [drm] Detected VRAM RAM=1024M, BAR=256M
    kernel: [drm] RAM width 64bits DDR
    kernel: [drm] radeon: 1024M of VRAM memory ready
    kernel: [drm] radeon: 1024M of GTT memory ready.
    kernel: [drm] Loading CEDAR Microcode
    kernel: [drm] Internal thermal controller with fan control
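The key difference between the two sequences is the BAR size in the "[drm] Detected VRAM" line (0M on the failing boot versus 256M normally). When comparing many boot logs, a small scanner can flag the failing case. This is a hypothetical sketch: the regex assumes the exact line format quoted above, and `vram_bar_sizes`/`bar_missing` are illustrative names, not part of any kernel tool.

```python
import re
from typing import Optional

# Matches the drm line quoted in this report, e.g.
# "kernel: [drm] Detected VRAM RAM=1024M, BAR=256M"
VRAM_RE = re.compile(r"\[drm\] Detected VRAM RAM=(\d+)M, BAR=(\d+)M")


def vram_bar_sizes(log_text: str) -> Optional[tuple[int, int]]:
    """Return (RAM, BAR) sizes as printed (in M) from the first match, or None."""
    m = VRAM_RE.search(log_text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def bar_missing(log_text: str) -> bool:
    """True if the driver reported a zero-sized BAR, as on the failing boot."""
    sizes = vram_bar_sizes(log_text)
    return sizes is not None and sizes[1] == 0


bad = "kernel: [drm] Detected VRAM RAM=1024M, BAR=0M"
good = "kernel: [drm] Detected VRAM RAM=1024M, BAR=256M"
print(bar_missing(bad), bar_missing(good))  # True False
```

Keying on the BAR field rather than on the later "Fatal error during GPU init" message flags the problem at the first point where the two boots diverge.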

There were also some new errors from the thunderbolt and xhci_hcd devices (see the kernel logs attached below).

The machine runs Slackware64 15.0 which uses unmodified kernel.org kernels.

A git bisect resulted in the following report:

    d9ce077f8b1f731407e6b612b03bba464fd18d9b is the first bad commit
    commit d9ce077f8b1f731407e6b612b03bba464fd18d9b
    Author: Igor Mammedov <imammedo@redhat.com>
    Date:   Mon Apr 24 21:15:57 2023 +0200

        PCI: acpiphp: Reassign resources on bridge if necessary

        [ Upstream commit 40613da52b13fb21c5566f10b287e0ca8c12c4e9 ]

This has also been reported on the mailing list: https://lore.kernel.org/linux-pci/20231215102921.587a9857@imammedo.users.ipa.redhat.com/T/

The following additional items requested on-list will be attached:

 * The kernel configuration used (config-5.15.139)

 * Kernel messages (syslog, messages)

 * Output from "lspci -tv"
Comment 1 Jonathan Woithe 2023-12-15 11:14:06 UTC
Created attachment 305602
Kernel configuration for 5.15.139

config-5.15.139: The configuration used for the 5.15.139 kernel which exhibited the problem.
Comment 2 Jonathan Woithe 2023-12-15 11:15:35 UTC
Created attachment 305603
Kernel messages from /var/log/messages

messages: Kernel messages from 5.15.139 which was affected by the regression.
Comment 3 Jonathan Woithe 2023-12-15 11:17:43 UTC
Created attachment 305604
Kernel messages from /var/log/syslog

syslog: Kernel messages from 5.15.139, which was affected by the regression.
Comment 4 Jonathan Woithe 2023-12-15 11:18:35 UTC
Created attachment 305605
Output from "lspci -tv"

lcpsi-tv: Output from "lspci -tv" as requested on list.
Comment 5 Jonathan Woithe 2023-12-15 23:32:38 UTC
Created attachment 305614
Tarball of "acpidump -b" output

Files produced by "acpidump -b" as requested on list.
Comment 6 Jonathan Woithe 2023-12-21 05:55:36 UTC
Created attachment 305640
Output from dmesg with extra debug flags

As requested on list, this is the dmesg output obtained having booted the kernel with the following command line parameters:

    dyndbg="file drivers/pci/access.c +p; file drivers/pci/hotplug/acpiphp_glue.c +p; file drivers/pci/bus.c +p; file drivers/pci/pci.c +p; file drivers/pci/setup-bus.c +p" ignore_loglevel
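The debug parameters above form a single dyndbg argument: several "file <path> +p" queries (each enabling debug prints for one source file) separated by semicolons, followed by ignore_loglevel. A sketch that assembles the same string from a list of files, so it cannot be accidentally broken across lines when pasted into a boot loader config; `dyndbg_param` is a hypothetical helper name, and the file list is the one used in this report.

```python
# Source files whose debug output was enabled for the debug boot in this report.
FILES = [
    "drivers/pci/access.c",
    "drivers/pci/hotplug/acpiphp_glue.c",
    "drivers/pci/bus.c",
    "drivers/pci/pci.c",
    "drivers/pci/setup-bus.c",
]


def dyndbg_param(files: list[str]) -> str:
    """Join per-file '+p' queries with '; ' into one dyndbg="..." argument."""
    queries = "; ".join(f"file {path} +p" for path in files)
    return f'dyndbg="{queries}"'


cmdline_extra = dyndbg_param(FILES) + " ignore_loglevel"
print(cmdline_extra)
```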
Comment 7 Jonathan Woithe 2023-12-21 06:09:18 UTC
Created attachment 305641
Output from dmesg with "pci=realloc" command line parameter

As requested on list, this is the dmesg output from 5.15.139 when booted with "pci=realloc". With this command line option, the GPU initialisation did not fail and the machine seemingly managed to boot normally.
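Since the outcome depends on whether "pci=realloc" was actually in effect, it can help to confirm it from the running system's command line (the contents of /proc/cmdline). A minimal sketch, assuming that pci= takes comma-separated options and that "realloc" and "realloc=on" both request reallocation; `shlex.split` only approximates the kernel's own quoting rules, and `pci_options`/`realloc_requested` are hypothetical names.

```python
import shlex


def pci_options(cmdline: str) -> list[str]:
    """Collect the comma-separated values of every pci= argument."""
    opts: list[str] = []
    for arg in shlex.split(cmdline):
        if arg.startswith("pci="):
            opts.extend(o for o in arg[len("pci="):].split(",") if o)
    return opts


def realloc_requested(cmdline: str) -> bool:
    """True if the command line asks the PCI core to reallocate bridge resources."""
    return any(o in ("realloc", "realloc=on") for o in pci_options(cmdline))


print(realloc_requested("root=/dev/sda1 ro pci=realloc"))  # True
```

On a live system the argument would be the text read from /proc/cmdline.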
