Bug 212259

Summary: Entire graphics stack locks up when running SteamVR and sometimes Sway; is sometimes unrecoverable
Product: Drivers
Reporter: happysmash27
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: high
CC: alexdeucher, happysmash27
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 5.11.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Last 406 lines of dmesg at the time of error
  Full dmesg 2021-03-12
  Full dmesg 2021-03-27

Description happysmash27 2021-03-13 05:39:48 UTC
Created attachment 295831
Last 406 lines of dmesg at the time of error

This bug is driving me nuts! 

The first time around, it happened frequently when I ran Sway and LXDE at the same time: my entire graphics stack would freeze, and I could not switch virtual terminals or kill X, even with kill -9 on everything.

Now I am experiencing this bug again, this time with SteamVR, and at a MUCH higher frequency. It happens, seemingly at random, when I start SteamVR, and seems more likely when my system or Waterfox has been running for a long time.

A couple of weeks ago I found a way to reset my GPU as a workaround for the bug (https://www.reddit.com/r/linuxquestions/comments/lpiwkg/how_to_reset_graphics_stack_when_x_stops/), but today it did not work. My screen went black, but I was still unable to kill -9 X, even after trying the reset many times. dmesg showed messages about the reset still being in progress, but nothing seemed to change. Eventually I gave up on fixing this gracefully without downtime for my website and ran poweroff, then held the power button after the HDD activity light went off, since even powering off does not work properly with this bug. Only then could I reboot and recover.

I have attached the end of the most recent dmesg from when this failed. I also have more dmesg logs that may or may not show the same bug, but the attachment field only allows one.
Comment 1 Alex Deucher 2021-03-15 18:17:52 UTC
What chip is this?  Can you attach your full dmesg output?  Does updating mesa help?
Comment 2 happysmash27 2021-03-20 10:02:30 UTC
Created attachment 295961
Full dmesg 2021-03-12

This is on the AMD Radeon RX 480. Updating Mesa does not help at all; it is already on the latest stable release, and the last time I was on the -git tree, SteamVR failed to start entirely. I have attached the full dmesg to this comment; if you would like, I can also attach several more dmesg logs from when similar bugs occurred.
Comment 3 happysmash27 2021-03-27 22:05:23 UTC
Created attachment 296093
Full dmesg 2021-03-27

It has happened again today. Attaching another log.
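
For anyone going through the attached logs, here is a minimal sketch of pulling the GPU-related lines out of a saved dmesg file. The keyword pattern, the function name filter_gpu_lines, and the file names are assumptions chosen for illustration; they are not taken from the attachments, and the pattern will likely need adjusting to whatever the log actually contains.

```python
import re
import sys

# Assumed keywords for illustration only; adjust to match your own log.
DEFAULT_PATTERN = re.compile(r"amdgpu|\bdrm\b|gpu reset|timeout", re.IGNORECASE)


def filter_gpu_lines(log_text, pattern=DEFAULT_PATTERN):
    """Return the lines of log_text that match pattern, in original order."""
    return [line for line in log_text.splitlines() if pattern.search(line)]


# Usage (hypothetical file names): python3 filter_dmesg.py dmesg.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    # errors="replace" keeps going if the log contains undecodable bytes.
    with open(sys.argv[1], errors="replace") as log_file:
        for matched_line in filter_gpu_lines(log_file.read()):
            print(matched_line)
```

This only narrows down what to read first; the full, unfiltered dmesg is still what should be attached to the bug.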