Bug 202493

Summary: Soft lockup ryzen
Product: Platform Specific/Hardware
Reporter: Jon (jon780)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED INVALID
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.20.5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: journalctl output during lockup

Description Jon 2019-02-02 13:13:59 UTC
Created attachment 280929
journalctl output during lockup

Fedora 29
Ryzen 1700X
AMD RX 560

The kernel is currently 4.20.5, compiled from kernel.org sources using the Fedora .config. I have the same issues with every kernel I've tried from the Fedora 29 repos.

The machine freezes, usually multiple times per day. The mouse cursor moves but won't respond to clicks, and no other input works. The Num Lock light on the keyboard is frozen (I can't toggle it with the Num Lock key). I cannot switch to virtual terminals, and SysRq+REISUB does nothing. There is no response to ICMP, and no other services respond. It seems to happen most frequently when waking from sleep, but that might just be my impression.

What I have tried:
Set Power Supply Idle Control to "Typical Current Idle" in the BIOS
Disabled C-state control in the BIOS
Compiled the kernel with RCU_NOCB_CPU (it was already enabled in the Fedora kernel config)
Added rcu_nocbs=0-15 to the kernel boot line
Added idle=nomwait to the kernel boot line
Added processor.max_cstate=5 to the kernel boot line

(I realize some of this is redundant; I was grasping at straws.)

For reference, here is my kernel boot line:
BOOT_IMAGE=/vmlinuz-4.20.5-jmd root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait processor.max_cstate=5 rcu_nocbs=0-15
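
When several workarounds are stacked on one boot line, it can help to confirm which ones are actually present. The following is a minimal Python sketch, not part of the original report, that splits the boot line quoted above into parameters. The function name `parse_cmdline` is hypothetical, and the parsing is a simplification of how the kernel itself handles its command line.

```python
import shlex

def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as "ro" map to True. If a name repeats
    (rd.lvm.lv does here), the last value wins; this is a
    simplification made for the sketch.
    """
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params

# Boot line as quoted in the report.
CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-4.20.5-jmd "
    "root=/dev/mapper/fedora_localhost--live-root ro "
    "resume=/dev/mapper/fedora_localhost--live-swap "
    "rd.lvm.lv=fedora_localhost-live/root "
    "rd.lvm.lv=fedora_localhost-live/swap "
    "rhgb quiet LANG=en_US.UTF-8 "
    "idle=nomwait processor.max_cstate=5 rcu_nocbs=0-15"
)

params = parse_cmdline(CMDLINE)
for name in ("idle", "processor.max_cstate", "rcu_nocbs"):
    print(name, "=", params.get(name))
```

On a running system the same function could be given the contents of /proc/cmdline instead of the hard-coded string.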

Attached is journalctl output covering the time period, and a little further back.
Comment 1 Alex Deucher 2019-02-02 17:29:07 UTC
Do you still get the issues with the amdgpu driver blacklisted?
Comment 2 Jon 2019-02-02 19:43:29 UTC
I've blacklisted amdgpu and confirmed via lsmod that it is not loaded. Unfortunately, I'm now not sure what video driver is in use, and I cannot correctly configure the displays (one monitor has no input and the other two monitors mirror each other). xrandr says only one device is connected. So I cannot test like this for an extended period of time.
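
The two open questions in this comment (is the module really unloaded, and which driver is in use) can be checked roughly as sketched below. This Python sketch is not from the thread. `module_loaded` and `bound_driver` are hypothetical helper names, `card0` is an assumed device name, and with no DRM driver bound there may be no card entry at all, in which case `bound_driver` returns None.

```python
import os

def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is the first field of any line of
    /proc/modules-style text (the data that lsmod presents)."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )

def bound_driver(card: str = "card0", sysfs_root: str = "/sys/class/drm"):
    """Return the name of the kernel driver bound to a DRM card,
    or None if the card or its driver symlink does not exist."""
    link = os.path.join(sysfs_root, card, "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            print("amdgpu loaded:", module_loaded("amdgpu", f.read()))
    print("driver bound to card0:", bound_driver())
```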

I'm going to try disabling the new AMDGPU display code (DC) and see if that helps. I feel like my issues started happening on and off around 4.17, when it was enabled by default.
Comment 3 Jon 2019-02-03 18:59:11 UTC
Disabling the display code (DC) appears to have resolved the problem. Uptime is 23 hours, and the machine has woken from sleep multiple times without issue. It is even correctly waking the monitor connected via DisplayPort. Previously, after waking the computer, I had to turn that display --off and then --auto with xrandr to get it to wake up.
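
The comment does not say how the display code was disabled. One common way is the amdgpu module parameter dc (for example amdgpu.dc=0 on the kernel boot line); that this is what was used here is an assumption. Under that assumption, the Python sketch below reads the value back from sysfs. The helper name `read_module_param` is hypothetical, and the value meanings in `DC_MEANING` follow the driver's parameter description as I understand it.

```python
from pathlib import Path
from typing import Optional

def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return a loaded module's parameter value as text, or None
    if the module is not loaded or the parameter is not exposed."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None

# Assumed meanings of the amdgpu "dc" parameter values.
DC_MEANING = {"-1": "auto (driver default)", "0": "disabled", "1": "enabled"}

value = read_module_param("amdgpu", "dc")
if value is None:
    print("amdgpu.dc not readable (module not loaded or parameter not exposed)")
else:
    print("amdgpu.dc =", value, "->", DC_MEANING.get(value, "unknown"))
```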

I will close this and file a bug against the amdgpu driver.