Bug 210415

Summary: [amdgpu] constant GPU hangs followed by kernel "BUG" and following kernel oops
Product: Drivers
Reporter: David Rubio (david.alejandro.rubio)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.9.11
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output
lspci -vvv output
lscpu output

Description David Rubio 2020-11-29 19:12:08 UTC
Created attachment 293863
dmesg output

I have an RX 480. Since kernel 5.4 (!) I've been getting random GPU hangs every few hours. Since kernel 5.9 they have become more frequent, and after a hang the kernel now also logs messages like:

Nov 29 15:44:31 reimu kernel: [drm] Bailing on TDR for s_job:34a, as another already in progress
Nov 29 15:44:31 reimu kernel: BUG: kernel NULL pointer dereference, address: 0000000000000020
Nov 29 15:44:31 reimu kernel: #PF: supervisor write access in kernel mode
Nov 29 15:44:31 reimu kernel: #PF: error_code(0x0002) - not-present page

This is followed immediately by an Oops:
Oops: 0002 [#2] PREEMPT SMP NOPTI
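
For anyone triaging the log: the value in the "#PF: error_code(...)" line, which is repeated in the "Oops:" line, is the x86 page-fault error code and can be decoded bit by bit. The following is a minimal sketch, assuming the standard x86 bit layout for that code. The constant and function names are illustrative only and are not kernel or amdgpu API.

```python
# Bit layout of the x86 page-fault error code (names here are illustrative).
PF_PROT = 0x01   # 0: page not present, 1: protection violation
PF_WRITE = 0x02  # 0: read access, 1: write access
PF_USER = 0x04   # 0: supervisor (kernel) access, 1: user access
PF_RSVD = 0x08   # 1: reserved bit set in a page-table entry
PF_INSTR = 0x10  # 1: fault happened on an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Translate a page-fault error code into human-readable fields."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit_set": bool(code & PF_RSVD),
    }


if __name__ == "__main__":
    # 0x0002 is the value reported in this bug's log excerpt.
    print(decode_pf_error_code(0x0002))
```

For the 0x0002 reported here, this decodes to a supervisor-mode write to a not-present page, which matches the "#PF: supervisor write access in kernel mode" and "not-present page" lines. The faulting address 0x20 is consistent with a write to a structure field at a small offset from a NULL pointer, though the log excerpt alone does not identify which structure.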

The full dmesg is attached. The kernel is compiled with the Arch Linux kernel configuration, but a kernel taken directly from kernel.org and compiled with only the modules I need gives me the same error.

Attached error.
Comment 1 David Rubio 2020-11-29 19:13:17 UTC
This has been happening for a long time, but the newly appearing kernel Oops and BUG messages made me realize it was necessary to report it.
The exact GPU model is MSI RX 480 GAMING X.
Comment 2 David Rubio 2020-11-29 19:13:43 UTC
Created attachment 293865
lspci -vvv output
Comment 3 David Rubio 2020-11-29 19:14:05 UTC
Created attachment 293867
lscpu output