Bug 219906 - Kernel Oops in pcie_update_link_speed when hotplugging TB4 dock on x870e / kernel 6.14.0-rc7
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: AMD Linux
Importance: P3 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-03-21 18:30 UTC by Wouter Bijlsma
Modified: 2025-03-21 22:44 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output hotplugging dock (144.00 KB, text/plain)
2025-03-21 18:30 UTC, Wouter Bijlsma

Description Wouter Bijlsma 2025-03-21 18:30:40 UTC
Created attachment 307878
dmesg output hotplugging dock

When connecting a Lenovo TB4 dock (40B0) to the USB4 port of an ASRock X870E Nova motherboard, Linux kernel 6.14.0-rc7 Oopses with a NULL pointer dereference.

This is 100% reproducible on my system and happens immediately when plugging the USB4 cable into the docking station. Interestingly, USB devices connected to the docking station still work, but displays connected to it do not. After the Oops the system is also left in a corrupted state: tools such as mkinitcpio hang at the autodetect hook while enumerating udev devices, and shutdown gets stuck indefinitely on hanging udev_worker processes.

When booting the machine while the dock is already attached, the kernel boots without any apparent problems, but displays connected to the dock will still not show any image. 

This dock has been working without any issue with a different Linux laptop with USB4 and an AMD 780M, with a MacBook Pro, and with a Windows 11 laptop.

Attached is the dmesg output after hot-plugging the dock.

-Wouter
Comment 1 Lukas Wunner 2025-03-21 19:39:15 UTC
This is a regression caused by

    commit 665745f274870c921020f610e2c99a3b1613519b
    Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Date:   Fri Oct 18 17:47:52 2024 +0300

    PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller

which went into v6.13-rc1.  It can be overcome by duplicating

    commit 62e4492c3063048a163d238cd1734273f2fc757d
    Author: Andreas Noever <andreas.noever@gmail.com>
    Date:   Mon Jun 9 23:03:32 2014 +0200

    PCI: Prevent NULL dereference during pciehp probe

in the bandwidth controller driver.

I can submit a patch tomorrow unless someone beats me to it.
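
For illustration only, and not the patch itself: as far as I understand it, the 2014 pciehp commit makes the probe return -ENODEV when the bridge has no secondary bus, which can happen when bus numbers run out during enumeration. A minimal user-space sketch of that guard follows. The mock_* types and functions are hypothetical stand-ins, not the kernel's struct pci_dev, struct pci_bus, or the bandwidth controller's actual probe routine.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the bridge's secondary bus. */
struct mock_bus {
	int cur_bus_speed;
};

/* Hypothetical stand-in for the hotplug port (bridge) device. */
struct mock_port {
	struct mock_bus *subordinate;	/* NULL if no secondary bus was created */
};

/* Dereferences the bus unconditionally, as a link speed update would. */
static void mock_update_link_speed(struct mock_bus *bus, int speed)
{
	bus->cur_bus_speed = speed;
}

/* Probe-time guard: bail out before touching a missing secondary bus. */
static int mock_bwctrl_probe(struct mock_port *port, int speed)
{
	if (!port->subordinate)
		return -ENODEV;

	mock_update_link_speed(port->subordinate, speed);
	return 0;
}
```

Without the check, the call into mock_update_link_speed() would dereference a NULL bus pointer, which is the kind of failure the Oops above reports.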
Comment 2 Wouter Bijlsma 2025-03-21 22:44:36 UTC
Great, I can test the patch once there is one!

In the kernel log, before the Oops, I see various other messages related to plugging in the dock. Some of them suggest something may already be wrong before this crash, and may be related to it.

For example I see messages like these:

[   29.843931] pci 0000:16:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   29.844218] pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
[   29.845440] pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring

Could these be related to the link speed issue, or are they unrelated/harmless/expected?

-Wouter