Bug 209939

Summary: radeontop causes kernel panic
Product: Drivers
Reporter: Janpieter Sollie (janpieter.sollie)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED OBSOLETE
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.9.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: kernel .config file of 3 PCs

Description Janpieter Sollie 2020-10-29 13:22:23 UTC
Created attachment 293297
kernel .config file of 3 PCs

(view 3 .config files)
> PC1: problem pc, Ryzen 2400GE APU with Vega 11 and 5.9.1 kernel (Xorg
> running)
> PC2: working pc, ryzen V1605 APU with vega 8 and 5.8.14 kernel (Xorg running)
> PC3: working pc, Threadripper 1950 + Fiji GPU and 5.9.1 kernel (CLI only)

As the summary states, the kernel on PC1 cannot handle the radeontop program. Whether it panics depends on when radeontop is started:
> - while hardware-accelerated content is running: panic
> - in console mode: fine
> - after switching from console to X: fine for a few moments
> - early on (X running sddm, radeontop started via ssh): panic

By *panic* I mean that the PC no longer responds: the Num Lock key no longer toggles its LED, no input is accepted, the clock in the GUI stops updating, and SSH is dead.

What I have tried so far:
> - pstore is empty
> - dd if=/dev/kmsg of=/dev/sdb1 & while [ 1 ]; do echo s > /proc/sysrq-trigger;
> sleep 10; done & radeontop (then reading the log back from that partition afterwards)
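
The capture attempt above can also be written as a small loop that flushes to disk after every log line, so that less is lost if the machine freezes mid-write. This is only a sketch and was not used in this report: capture_log is a hypothetical helper name, and the destination path in the usage comment is a placeholder.

```shell
# capture_log SRC DEST
# Copy SRC to DEST line by line and call sync after each line, so that every
# line is pushed to disk as soon as it has been read.
# Hypothetical helper. With SRC=/dev/kmsg, cat keeps waiting for new kernel
# messages, so the function only returns when it is killed.
capture_log() {
    src=$1
    dest=$2
    # cat does the actual reading; the loop only appends and flushes.
    cat "$src" | while IFS= read -r line; do
        printf '%s\n' "$line" >> "$dest"
        sync
    done
}

# Example (as root; the destination is a placeholder path on a mounted disk):
#   capture_log /dev/kmsg /mnt/usb/kmsg.log &
#   radeontop
```

Writing to a file on a mounted filesystem, rather than to the raw partition, makes the log easier to read back afterwards. Calling sync per line is slow, but that hardly matters for a one-off debugging session.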

The mainboard does not have an RS232 port, so debugging over a serial console is not possible.
I also doubt I would be able to use KDB if the screen is stuck in GUI mode.

If I can do anything to gather more info, let me know.
Comment 1 Alex Deucher 2020-10-29 13:31:38 UTC
Does setting amdgpu.runpm=0 on the kernel command line in grub fix the issue?  How are you running radeontop?  If you are running it such that it tries to access MMIO space directly rather than going through the kernel, that could cause an issue.
Comment 2 Janpieter Sollie 2020-10-29 20:21:01 UTC
I am running radeontop the usual way: default build, no arguments.
Setting amdgpu.runpm=0 has no effect.
Comment 3 Alex Deucher 2020-10-29 21:00:27 UTC
Does setting amdgpu.ppfeaturemask=0xffff3fff on the kernel command line in grub fix it?
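
To see what this mask changes, it helps to list the bits it clears relative to an all-ones 32-bit mask. The sketch below is plain shell arithmetic; cleared_bits is a hypothetical helper name. As far as I can tell from the PP_FEATURE_MASK enum in the kernel's amd_shared.h, the two cleared bits are the overdrive (0x4000) and GFXOFF (0x8000) features, but treat that mapping as an assumption and check it against the kernel version in use.

```shell
# cleared_bits MASK
# Print, as a 32-bit hex value, the bits that MASK clears relative to 0xffffffff.
# Hypothetical helper, pure arithmetic, touches no hardware.
cleared_bits() {
    printf '0x%08x\n' $(( 0xffffffff & ~($1) ))
}

cleared_bits 0xffff3fff   # prints 0x0000c000, i.e. bits 14 and 15
```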
Comment 4 Janpieter Sollie 2020-10-30 07:53:31 UTC
Sorry, no, still the same.
Just to be sure: setting this on the kernel command line overrides the settings in /etc/modprobe.d/amdgpu.conf, right?
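
One way to settle this kind of question is to read back the value the loaded module is actually using, rather than reasoning about which setting wins. Loaded modules expose their readable parameters under /sys/module/<module>/parameters/. A minimal sketch, assuming the parameter is exported as readable: module_param is a hypothetical helper name, and its optional third argument (an alternative sysfs root) exists only to make the function easy to try out.

```shell
# module_param MODULE PARAM [SYSFS_ROOT]
# Print the current value of a module parameter as exposed in sysfs.
# Hypothetical helper; SYSFS_ROOT defaults to /sys/module.
module_param() {
    root=${3:-/sys/module}
    file="$root/$1/parameters/$2"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "parameter $2 of module $1 is not readable (module not loaded?)" >&2
        return 1
    fi
}

# Example, after booting with the options under test:
#   module_param amdgpu ppfeaturemask
#   module_param amdgpu runpm
#   cat /proc/cmdline    # what was actually passed at boot
```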
Comment 5 Janpieter Sollie 2020-11-04 07:41:47 UTC
I also tried netconsole (thanks to a hint from Gentoo).
With netconsole, nothing is logged during the freeze: the kernel buffer from before radeontop starts is printed correctly, but no further output arrives during the "kernel panic". Apparently either the kernel does not live long enough to push anything to netconsole, or a bug in radeontop causes a hardware freeze.
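
For anyone wanting to repeat the netconsole test, the sketch below assembles the netconsole= parameter in the form described in the kernel's netconsole documentation (source port@source IP/device,target port@target IP/target MAC). The exact setup used in this report is not stated, so netconsole_arg is a hypothetical helper and every address, port and interface name here is a placeholder.

```shell
# netconsole_arg SRC_PORT SRC_IP DEV TGT_PORT TGT_IP TGT_MAC
# Print a netconsole= parameter for the kernel command line or for
# "modprobe netconsole". Hypothetical helper; it only formats a string.
netconsole_arg() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values: replace with the real interface and addresses.
netconsole_arg 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55
# prints netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/00:11:22:33:44:55
```

On the receiving machine, something has to listen on the target UDP port, and the console log level on the sending machine has to be high enough for the messages of interest to be sent at all.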