Bug 208825

Summary: lspci triggers NULL pointer dereference on AMD Renoir 4800H/5600M laptop
Product: Drivers
Reporter: Jon Tourville (jontourville)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: high
CC: alexdeucher
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.8.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: lspci triggers NULL pointer dereference on AMD Renoir laptop

Description Jon Tourville 2020-08-06 00:33:47 UTC
Created attachment 290791
lspci triggers NULL pointer dereference on AMD Renoir laptop

Running Arch Linux with a 5.8.0 kernel built from linux-mainline on a Dell G5 15 SE 5505 laptop with an AMD 4800H Renoir APU and a 5600M discrete GPU.

On a fresh install of Arch, running lspci triggers an oops with a NULL pointer dereference. The oops is not triggered if the kernel is booted with amdgpu.runpm=0, so the problem appears to be related to runtime power management. The oops begins with the following errors (full dmesg and lspci -vvv output attached):

[   93.485414] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[   93.485452] [drm] PSP is resuming...
[   93.514696] [drm] reserve 0x900000 from 0x800f400000 for PSP TMR
[   93.684656] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[   93.704673] amdgpu: SMU is resuming...
[   95.835970] amdgpu: failed send message:     RunBtc (58) 	param: 0x00000000 response 0xffffffc2
[   95.835971] amdgpu: RunBtc failed!
[   95.836016] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[   95.836053] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
[   95.851331] snd_hda_intel 0000:03:00.1: refused to change power state from D3hot to D0
[   95.956286] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
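
For triage, the amdgpu resume failures in the excerpt above can be pulled out of a full dmesg dump mechanically. The following is only a minimal sketch in Python, not part of the original report: the function name find_drm_errors and both regular expressions are made up for illustration and assume the "[drm:<function> [amdgpu]] *ERROR* ..." line shape seen in this log. On Linux, error 62 is ETIME (timer expired), which would be consistent with the SMU not answering the RunBtc message in time.

```python
import re

# Hypothetical pattern, written against the line shape seen in this report:
# [   95.836016] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
ERROR_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)$"
)

# A trailing negative error code, written either as "-62" or as "(-62)."
CODE_RE = re.compile(r"\(?(-\d+)\)?\.?$")


def find_drm_errors(log_text):
    """Return (timestamp, function, message, code) for each amdgpu DRM *ERROR* line.

    code is None when the message does not end in a negative number.
    """
    results = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match is None:
            continue  # not a "[drm:... [amdgpu]] *ERROR*" line
        code = CODE_RE.search(match.group("msg"))
        results.append((
            float(match.group("ts")),
            match.group("func"),
            match.group("msg"),
            int(code.group(1)) if code else None,
        ))
    return results


if __name__ == "__main__":
    excerpt = """\
[   95.835971] amdgpu: RunBtc failed!
[   95.836016] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[   95.836053] [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-62).
"""
    for ts, func, msg, code in find_drm_errors(excerpt):
        print(ts, func, code)
```

Saving the output of dmesg to a file and passing its contents to find_drm_errors would list which resume step failed first and with which error code, without reading the whole log by hand.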

Comment 1 Jon Tourville 2020-09-08 20:16:33 UTC
Appears to be resolved as of 5.8.6 or 5.8.7

Comment 2 Alex Deucher 2020-09-14 05:52:14 UTC
Can you bisect and determine what patch fixed it?

Comment 3 Jon Tourville 2020-09-14 19:13:17 UTC
I am now unable to reproduce the problem even on versions earlier than 5.8.6, which I know still had it. I suspect that a firmware update or some other change resolved the issue for me.