Bug 219175 - Random reboots with 6.10.3+ (AMD Ryzen 7700X)
Summary: Random reboots with 6.10.3+ (AMD Ryzen 7700X)
Status: NEW
Alias: None
Product: Power Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P3 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-08-18 19:46 UTC by g0000ga
Modified: 2024-08-30 06:17 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Description g0000ga 2024-08-18 19:46:16 UTC
Starting with 6.10.3 (which introduced many AMD features and later fixes), the system began to crash and reboot at random. The longest it stayed up without crashing was about 12 hours. 6.10.2 runs without issues. I'm fairly sure this is related to CPU/integrated graphics power management, since on the newer kernels the processor reports temperatures a couple of °C lower than on 6.10.2. I've also tried recent LTS kernels, which carry the same patches: all versions 6.6.44-6.6.46 crash too.

Hardware:
Ryzen 7700X
ASRock B650 Steel Legend WiFi (not the B650E variant)
Integrated video
64 GB of ECC unbuffered RAM. memtest doesn't report any errors, and there are no memory errors in dmesg at run time. The dmesg output from the previous boot that I can retrieve from systemd doesn't contain any errors at all.

So the only thing I know is that 6.10.2 doesn't have this issue. With that kernel the system runs for days:
21:41:25 up 2 days,  5:09,  2 users,  load average: 0.01, 0.06, 0.23

P.S. I run the processor undervolted by 20 mV, but I've reset the BIOS and tested without undervolting, with exactly the same result: it just reboots.
Comment 1 Artem S. Tashkinov 2024-08-18 20:36:55 UTC
You could try to bisect: https://docs.kernel.org/admin-guide/bug-bisect.html
Comment 2 spaced.wombat 2024-08-30 06:17:14 UTC
Can confirm this issue since 6.10.3

Hardware:
- ASRock B650M Pro RS WiFi with BIOS 2.10 (ComboAM5 1.1.0.3), 3.01 (AGESA .1.7.0) and 3.06 (AGESA 1.2.0.0a)
- Ryzen 7700
- iGPU driving a 4K FreeSync monitor

BIOS changes such as enabling ECO mode or disabling PBO had no effect.

Discussed and bisected in the Arch Linux forum:
https://bbs.archlinux.org/viewtopic.php?id=298360&p=4


Offending commit:
drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell
    commit a03ebf116303e5d13ba9a2b65726b106cb1e96f6

That commit was meant to solve https://gitlab.freedesktop.org/drm/amd/-/issues/3440


This is fixed now: no more random reboots with kernel 6.11-rc5, because the fix has been merged into mainline:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3e4bf58bad1576ac732a1429f53e3d4bfb82b4b

I hope it also gets backported to 6.10.x; it has not been backported yet:
https://github.com/gregkh/linux/commits/linux-6.10.y/
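
For anyone who wants to check later whether the fix has reached a stable branch: stable backports normally quote the upstream commit ID in their commit message, so searching the branch log for that ID is one way to tell. Below is a minimal Python sketch of that check. It assumes a local clone of the stable tree with a remote named `origin` and the `linux-6.10.y` branch fetched; the helper names and the example path are hypothetical, not part of any existing tool.

```python
import subprocess

# Upstream (mainline) fix referenced in this report.
UPSTREAM_FIX = "e3e4bf58bad1576ac732a1429f53e3d4bfb82b4b"


def backport_log_command(upstream_sha, rev_range):
    """Build a `git log` command that lists commits in rev_range whose
    message mentions upstream_sha (as stable backports usually do)."""
    return ["git", "log", "--oneline", f"--grep={upstream_sha}", rev_range]


def find_backports(repo_dir, upstream_sha, rev_range):
    """Run the search inside repo_dir and return the matching one-line
    commit summaries. An empty list means no backport was found."""
    result = subprocess.run(
        backport_log_command(upstream_sha, rev_range),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


# Example call (path, remote name and range are assumptions):
# find_backports("/path/to/stable-linux", UPSTREAM_FIX,
#                "v6.10.3..origin/linux-6.10.y")
```

An empty result only means that no commit in the given range mentions the ID; a backport whose message omits the upstream reference would not show up this way.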
