Bug 204559
| Summary: | amdgpu: kernel oops with constant gpu resets while using mpv | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Maxim Sheviakov (shoegaze) |
| Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | alexdeucher, kode54, postix, reuben_p, shoegaze, thejoe |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 5.2.7 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | oops.txt; journalctl --dmesg output; dmesg -w without runpm parameter; dmesg -w with runpm=0 parameter | | |
Description
Maxim Sheviakov
2019-08-12 10:53:06 UTC
Please attach your full dmesg output from boot.

Created attachment 284337
journalctl --dmesg output
As I can't run dmesg after the system's hung, here's journalctl --dmesg output from that particular boot.

You can fetch the output before the hang. Looks like your system has two GPUs. Can you try booting with amdgpu.runpm=0? Does that fix the issue?

Created attachment 284341
dmesg -w without runpm parameter
Here's the whole dmesg from a fresh boot up until the hang, no kernel parameters were modified.

Created attachment 284345
dmesg -w with runpm=0 parameter
I have left my laptop with a video playing for about half an hour and it seems like no GPU-related warnings have been produced so far, only RTL8821CE spam. Seems like the root cause of the problem is somewhere in the runtime power management and/or GPU switching stuff as far as I can see.
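For anyone repeating the runpm=0 test, it may help to confirm that the parameter actually took effect before drawing conclusions. The following is a minimal Python sketch, not part of the original report. It looks for amdgpu.runpm on the kernel command line (/proc/cmdline) and reads the module parameter from /sys/module/amdgpu/parameters/runpm, assuming the amdgpu module is loaded and exposes that file. The helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def runpm_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the amdgpu.runpm value found on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "amdgpu.runpm":
            # Keep the last occurrence (assumption: a later value overrides an earlier one).
            value = val
    return value


def read_text_or_none(path: str) -> Optional[str]:
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("amdgpu.runpm on command line:", runpm_from_cmdline(cmdline))
    # This path is only present while the amdgpu module is loaded.
    print("amdgpu runpm module parameter:",
          read_text_or_none("/sys/module/amdgpu/parameters/runpm"))
```

If the first line prints None, the parameter was not passed at boot and the module is presumably running with its default setting.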
By the way, how *exactly* does disabling runpm affect the system? Does it leave the discrete GPU always on, or vice versa? Or does it vary from system to system? I have tried running The Crew via Wine + DXVK while having amdgpu.runpm=0 in my kernel params and it seems that the discrete GPU was being used, as the framerate was more than fine.

(In reply to Maxim Sheviakov from comment #7)
> By the way, how *exactly* does disabling runpm affect the system? Does it leave the discrete GPU always on, or vice versa? Or does it vary from system to system?
It leaves the dGPU powered up all the time rather than dynamically powering it on/off as needed.
> I have tried running The Crew via Wine + DXVK while having amdgpu.runpm=0 in my kernel params and it seems that the discrete GPU was being used, as the framerate was more than fine.
You can use xrandr to pick which GPU you want to use for rendering.

Thanks for your explanation. By the way, disabling runpm also seems to fix the other issue with disabling the display after activating the lockscreen as a powersaving measure.

Is there anything else I can do to help with this one? The whole thing seems to be an issue somewhere in the dynamic switching mechanism, which works - but is not really stable, with all these hangs under certain conditions.

This looks like an issue I'm having intermittently with the GPU failing to resume from system sleep mode. Do I need to report a separate issue for this? Should I also bother to test the runpm=0 workaround?

Oops, I neglected to mention: the system is non-responsive to input devices, as the USB input devices all appear to be completely powered off after the GPU crashes, but the network interface is still working, as is sound output, and I'm able to log into the machine via SSH. It does, however, lock up if I attempt to soft reboot it. The full dmesg from the session that eventually crashed is still available in the journal, up to where it was flooding sdma0 timeouts and failures.
Have seen the same kernel oops on a Dell XPS 15 2-in-1 9575 with Vega M hybrid graphics. As far as I could tell it was not triggered by anything specific (e.g. mpv playback) though. Running with runpm=0 now, and haven't seen it again yet, but I had only seen it once or twice without runpm=0.

I'm on kernel 5.4.7 now and it seems like this particular issue is fixed - I tried playing some movies with runpm enabled and things seemed to be okay. It looks like dGPU performance with runpm is considerably worse than without runpm, but I guess that's another issue :) Can anyone confirm if everything's fine now?

I have not seen the oops on a 5.3.x kernel (Ubuntu eoan), even without tweaking the runpm setting (again, I only saw it a few times on an earlier kernel).
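Because the hang floods the log with sdma0 timeouts, it can be easier to compare kernels if only the GPU-related lines are pulled out of a saved log. Below is a rough Python sketch of such a filter. It assumes the lines of interest mention amdgpu followed by "timeout" or "reset"; that keyword pattern and the function name are illustrative guesses, not taken from the attached logs.

```python
import re
import sys
from typing import Iterable, List

# Assumption: relevant lines mention "amdgpu" and later "timeout" or "reset".
GPU_PATTERN = re.compile(r"amdgpu.*(timeout|reset)", re.IGNORECASE)


def gpu_trouble_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match the GPU timeout/reset pattern."""
    return [line.rstrip("\n") for line in lines if GPU_PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read a previously saved log file given as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        matches = gpu_trouble_lines(log)
    for line in matches:
        print(line)
    print(f"{len(matches)} matching line(s)", file=sys.stderr)
```

A possible way to use it, with a hypothetical script name: save the previous boot's kernel log with `journalctl --dmesg -b -1 > last_boot.log`, then run `python3 gpu_lines.py last_boot.log`. A count of zero on a kernel with runpm enabled would be consistent with the reports that the problem no longer appears there.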