Bug 219259

Summary: Attaching Ryzen 7 7840HS laptop to Lenovo Thunderbolt 4 Dock causes high CPU load
Product: Drivers Reporter: Thilo-Alexander Ginkel (thilo)
Component: USBAssignee: Default virtual assignee for Drivers/USB (drivers_usb)
Status: NEW ---    
Severity: normal    
Priority: P3    
Hardware: AMD   
OS: Linux   
Kernel Version: 6.11.0-rc6 Subsystem:
Regression: No Bisected commit-id:
Attachments: lspci -v
dmesg

Description Thilo-Alexander Ginkel 2024-09-10 19:52:38 UTC
Created attachment 306854 [details]
lspci -v

Hi there,

I recently got a Framework 16 laptop (based on AMD Ryzen 7 7840HS).
When the laptop is placed on my desk, I am docking it using a Lenovo Thunderbolt 4 Dock (via Thunderbolt), to which an 8K monitor (via USB-C) and various USB peripherals are connected.

Recently I noticed that my laptop's load is > 4 while docked although the system is mostly idle.

This seems to be caused by a few kworker/0:x+pm tasks spending plenty of time in "uninterruptible sleep" ("D"). A few random stack traces captured for them look like this:

[<0>] rpm_resume+0x25f/0x700
[<0>] rpm_resume+0x2d3/0x700
[<0>] rpm_resume+0x2d3/0x700
[<0>] pm_runtime_work+0x70/0xb0
[<0>] process_one_work+0x17b/0x330
[<0>] worker_thread+0x2e2/0x410
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

or

[<0>] pci_power_up+0x144/0x190
[<0>] pci_pm_runtime_resume+0x33/0xf0
[<0>] __rpm_callback+0x41/0x170
[<0>] rpm_callback+0x55/0x60
[<0>] rpm_resume+0x4d3/0x700
[<0>] rpm_suspend+0x5db/0x5f0
[<0>] pm_runtime_work+0x84/0xb0
[<0>] process_one_work+0x17b/0x330
[<0>] worker_thread+0x2e2/0x410
[<0>] kthread+0xcf/0x100
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

Fast-forward, I isolated the cause of this to be related to the power
management of the following AMD Thunderbolt PCI devices:

00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
        Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 4
        Bus: primary=00, secondary=03, subordinate=61, sec-latency=0
        I/O behind bridge: 6000-9fff [size=16K] [16-bit]
        Memory behind bridge: 78000000-8fffffff [size=384M] [32-bit]
        Prefetchable memory behind bridge: 7800000000-87ffffffff [size=64G] [32-bit]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot+), IntMsgNum 0
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
        Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [270] Secondary PCI Express
        Capabilities: [400] Data Link Feature <?>
        Kernel driver in use: pcieport

00:04.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
        Flags: bus master, fast devsel, latency 0, IRQ 41, IOMMU group 5
        Bus: primary=00, secondary=62, subordinate=c0, sec-latency=0
        I/O behind bridge: a000-efff [size=20K] [16-bit]
        Memory behind bridge: 60000000-77ffffff [size=384M] [32-bit]
        Prefetchable memory behind bridge: 6800000000-77ffffffff [size=64G] [32-bit]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot+), IntMsgNum 0
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453
        Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [270] Secondary PCI Express
        Capabilities: [400] Data Link Feature <?>
        Kernel driver in use: pcieport

c3:00.5 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #1 (prog-if 40 [USB4 Host Interface])
        Subsystem: Framework Computer Inc. Device 0005
        Flags: bus master, fast devsel, latency 0, IRQ 86, IOMMU group 26
        Memory at 90800000 (64-bit, non-prefetchable) [size=512K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, IntMsgNum 0
        Capabilities: [a0] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [2a0] Access Control Services
        Kernel driver in use: thunderbolt
        Kernel modules: thunderbolt

c3:00.6 USB controller: Advanced Micro Devices, Inc. [AMD] Pink
Sardine USB4/Thunderbolt NHI controller #2 (prog-if 40 [USB4 Host
Interface])
        Subsystem: Framework Computer Inc. Device 0005
        Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 27
        Memory at 90880000 (64-bit, non-prefetchable) [size=512K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, IntMsgNum 0
        Capabilities: [a0] MSI: Enable- Count=1/16 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [2a0] Access Control Services
        Kernel driver in use: thunderbolt
        Kernel modules: thunderbolt

As soon as I disable power management for those in powertop (internally doing echo 'on' > '/sys/bus/pci/devices/0000:c3:00.5/power/control') the load falls below 1 back to normal levels.

I originally spotted this behavior in 6.10.8-arch1-1 (using the LTS kernel based on 6.6.49 makes no difference), but have since also been able to reproduce it using the 6.11.0-rc6 mainline kernel.

I'd appreciate any hints on how to further isolate the root cause of this issue.

Thanks,
Thilo
Comment 1 Thilo-Alexander Ginkel 2024-09-10 19:54:18 UTC
Created attachment 306855 [details]
dmesg