Bug 219175

Summary: Random reboots with 6.10.3+ (AMD Ryzen 7700X)
Product: Power Management
Reporter: g0000ga
Component: Other
Assignee: Rafael J. Wysocki (rjw)
Status: NEW ---
Severity: normal
CC: regressions, spaced.wombat
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description g0000ga 2024-08-18 19:46:16 UTC
Starting with 6.10.3 (which brought in a lot of AMD features and later fixes), the system began to crash and reboot at random. The longest it has stayed up without crashing is about 12 hours. 6.10.2 runs without issues. I'm fairly sure this is related to processor or integrated-graphics power management, since on the newer kernels the processor reports temperatures a couple of °C lower than on 6.10.2. I've also tried recent LTS kernels, which carry the same patches: all versions 6.9.44-6.9.46 crash too.

Hardware 
Ryzen 7700X
ASRock B650 Steel Legend WiFi (without E)
Integrated video
64 GB of ECC unbuffered RAM. memtest doesn't report any errors, and at run time there are no memory errors in dmesg. In fact, the dmesg output from the previous boot that I can retrieve from systemd doesn't show any errors at all.

So the only thing I know is that 6.10.2 doesn't have this issue. With that kernel the system runs for days:
21:41:25 up 2 days,  5:09,  2 users,  load average: 0.01, 0.06, 0.23

P.S. I do run the processor undervolted by 20 mV, but I've reset the BIOS and tested without undervolting, with exactly the same result: it just reboots.
Comment 1 Artem S. Tashkinov 2024-08-18 20:36:55 UTC
You could try to bisect: https://docs.kernel.org/admin-guide/bug-bisect.html
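
For anyone unfamiliar with the idea, bisecting is a binary search over the commits between a known-good and a known-bad kernel (here 6.10.2 and 6.10.3). The following is only a minimal sketch of that search, not how git implements it: real history is a graph rather than a flat list, and the commit names and the `is_bad` check are hypothetical placeholders.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes the commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # hi always points at a known-bad commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # culprit is mid or something older
        else:
            lo = mid + 1    # culprit is newer than mid
    return commits[lo]


# Hypothetical placeholder history; real commit ids would come from git.
commits = [f"c{i:02d}" for i in range(1, 17)]

# Stand-in for "build this kernel, boot it, and see whether it reboots".
culprit = first_bad(commits, lambda c: c >= "c11")
print(culprit)
```

Since the reboots are random and the reporter saw up to about 12 hours between them, a kernel should probably only be judged good after running noticeably longer than that.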
Comment 2 spaced.wombat 2024-08-30 06:17:14 UTC
I can confirm this issue since 6.10.3.

Hardware:
- ASRock B650M Pro RS WiFi with BIOS 2.10 (ComboAM5 1.1.0.3), 3.01 (AGESA .1.7.0) and 3.06 (AGESA 1.2.0.0a)
- Ryzen 7700
- iGPU driving a 4K FreeSync monitor

BIOS changes such as enabling ECO mode or disabling PBO had no effect.

Discussed and bisected in the Arch forum:
https://bbs.archlinux.org/viewtopic.php?id=298360&p=4


Offending commit:
drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
    commit a03ebf116303e5d13ba9a2b65726b106cb1e96f6

which was added to solve https://gitlab.freedesktop.org/drm/amd/-/issues/3440


This is already fixed, though: there are no more random reboots with kernel 6.11-rc5, because the fix has been merged to mainline:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3e4bf58bad1576ac732a1429f53e3d4bfb82b4b

I hope it also gets backported to 6.10.x; so far it has not been:
https://github.com/gregkh/linux/commits/linux-6.10.y/