Bug 217141 - [amdgpu] ring gfx_0.0.0 timeout steam deck AMD APU
Status: RESOLVED ANSWERED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: AMD Linux
Importance: P1 high
Assignee: drivers_video-dri
Reported: 2023-03-05 15:32 UTC by Serg Podtynnyi
Modified: 2023-03-07 06:29 UTC

Kernel Version: 6.1.12
Regression: No
Description Serg Podtynnyi 2023-03-05 15:32:25 UTC
[  257.182206] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=26043, emitted[64/36172]
[  257.182668] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process NMS.exe pid 2571 thread NMS.exe pid 2571
[  257.183084] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[  257.183094] ------------[ cut here ]------------
[  257.183095] Evicting all processes
[  257.183151] WARNING: CPU: 6 PID: 745 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1935 kfd_suspend_all_processes+0x100/0x110 [amdgpu]
[  257.183562] Modules linked in: uinput snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm algif_aead cbc des_generic libdes ecb md4 cmac algif_hash algif_skcipher af_alg bnep ramoops reed_solomon snd_acp5x_pcm_dma snd_soc_acp5x_mach snd_acp5x_i2s snd_sof_amd_rembrandt rtw88_8822ce snd_sof_amd_renoir rtw88_8822c snd_sof_amd_acp rtw88_pci intel_rapl_msr snd_sof_pci intel_rapl_common rtw88_core snd_sof edac_mce_amd snd_sof_utils btusb kvm_amd btrtl snd_pci_ps mac80211 snd_hda_codec_hdmi btbcm snd_soc_cs35l41_spi btintel kvm snd_soc_cs35l41 snd_rpl_pci_acp6x snd_hda_intel btmtk snd_soc_wm_adsp snd_intel_dspcfg cs_dsp snd_acp_pci libarc4 leds_steamdeck extcon_steamdeck snd_pci_acp6x snd_intel_sdw_acpi snd_soc_nau8821 snd_soc_cs35l41_lib steamdeck_hwmon irqbypass bluetooth snd_hda_codec snd_pci_acp5x snd_soc_core rapl snd_rn_pci_acp3x cfg80211 pcspkr snd_hda_core snd_compress i2c_piix4 mousedev cdc_acm ac97_bus snd_acp_config joydev ecdh_generic snd_pcm_dmaengine snd_hwdep snd_soc_acpi
[  257.183627]  snd_pci_acp3x snd_pcm dwc3_pci rfkill ina2xx_adc kfifo_buf snd_timer opt3001 ltrf216a steamdeck spi_amd ina2xx industrialio snd acpi_cpufreq mac_hid soundcore fuse ip_tables x_tables overlay ext4 crc16 mbcache jbd2 hid_steam usbhid amdgpu vfat fat gpu_sched drm_buddy serio_raw sdhci_pci nvme_tcp drm_display_helper atkbd cqhci libps2 nvme_fabrics crct10dif_pclmul vivaldi_fmap crc32_pclmul polyval_clmulni sdhci polyval_generic cec i8042 gf128mul nvme hid_multitouch drm_ttm_helper ghash_clmulni_intel xhci_pci sha512_ssse3 nvme_core aesni_intel crypto_simd sp5100_tco cryptd wdat_wdt ttm xhci_pci_renesas ccp mmc_core nvme_common serio video i2c_hid_acpi wmi 8250_dw i2c_hid btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_intel dm_mirror dm_region_hash dm_log dm_mod pkcs8_key_parser crypto_user
[  257.183700] CPU: 6 PID: 745 Comm: kworker/u32:7 Not tainted 6.1.12-valve2-1-neptune-61 #1 4091faa51bd1be3bbac5fd4c3ce3432202f24d92
[  257.183704] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
[  257.183708] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[  257.183718] RIP: 0010:kfd_suspend_all_processes+0x100/0x110 [amdgpu]
[  257.184119] Code: c7 c7 00 b3 3f c1 41 5c 41 5d e9 cb 4f 5f f1 be 03 00 00 00 e8 d1 89 a3 f1 e9 59 ff ff ff 48 c7 c7 14 a2 24 c1 e8 12 d6 06 f2 <0f> 0b e9 24 ff ff ff 0f 0b eb c5 0f 1f 44 00 00 66 0f 1f 00 0f 1f
[  257.184122] RSP: 0018:ffffad1140f67cf8 EFLAGS: 00010286
[  257.184125] RAX: 0000000000000000 RBX: ffff993b46b68400 RCX: 0000000000000027
[  257.184127] RDX: ffff993e6eda0728 RSI: 0000000000000001 RDI: ffff993e6eda0720
[  257.184128] RBP: ffff993b44620000 R08: 0000000000000000 R09: ffffad1140f67b78
[  257.184130] R10: 0000000000000003 R11: ffff993e7ef7ffe8 R12: ffffad1140f67dd0
[  257.184131] R13: 0000000000000000 R14: ffff993b89dbe400 R15: 0000000000000000
[  257.184133] FS:  0000000000000000(0000) GS:ffff993e6ed80000(0000) knlGS:0000000000000000
[  257.184135] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  257.184137] CR2: 000055d62521f000 CR3: 0000000108b04000 CR4: 0000000000350ee0
[  257.184139] Call Trace:
[  257.184143]  <TASK>
[  257.184147]  kgd2kfd_suspend.part.0+0x3d/0x40 [amdgpu ad613437896db6c29581f2be9152cc5a6dd35ad7]
[  257.184571]  kgd2kfd_pre_reset+0x47/0x60 [amdgpu ad613437896db6c29581f2be9152cc5a6dd35ad7]
[  257.184965]  amdgpu_device_gpu_recover.cold+0x119/0xb40 [amdgpu ad613437896db6c29581f2be9152cc5a6dd35ad7]
[  257.185430]  amdgpu_job_timedout+0x1dc/0x220 [amdgpu ad613437896db6c29581f2be9152cc5a6dd35ad7]
[  257.185866]  ? try_to_wake_up+0xd9/0x560
[  257.185874]  drm_sched_job_timedout+0x7a/0x110 [gpu_sched 32db77b2b4e1fdeaf45e32d64ce206e5c0ca90ae]
[  257.185885]  process_one_work+0x1c7/0x380
[  257.185892]  worker_thread+0x51/0x390
[  257.185897]  ? rescuer_thread+0x3b0/0x3b0
[  257.185901]  kthread+0xde/0x110
[  257.185905]  ? kthread_complete_and_exit+0x20/0x20
[  257.185909]  ret_from_fork+0x22/0x30
[  257.185917]  </TASK>
[  257.185918] ---[ end trace 0000000000000000 ]---
[  257.284610] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[  257.294783] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
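For triage, the ring timeout and the offending process can be pulled out of a log like the one above with a short script. This is only a sketch: the function name `summarize_timeouts` and the regular expressions are illustrative assumptions modelled on the two `*ERROR*` lines in this report, not a kernel interface, and the text after `emitted` is deliberately left unparsed because it is not clear from this paste.

```python
import re

# Patterns modelled on the two *ERROR* lines in this report (assumed format).
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+)"
)
PROCESS_INFO = re.compile(
    r"Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def summarize_timeouts(log_text):
    """Return one dict per ring timeout found in a dmesg-style log."""
    events = []
    for line in log_text.splitlines():
        match = RING_TIMEOUT.search(line)
        if match:
            events.append({
                "ring": match["ring"],
                "signaled_seq": int(match["signaled"]),
                "process": None,
                "pid": None,
            })
            continue
        match = PROCESS_INFO.search(line)
        if match and events:
            # Attach the process line to the most recent timeout.
            events[-1]["process"] = match["process"]
            events[-1]["pid"] = int(match["pid"])
    return events
```

Running it over a saved copy of the log (for example the output of `dmesg`) would report ring `gfx_0.0.0`, signaled sequence 26043, and process `NMS.exe` with pid 2571 for this report. Process names containing spaces are not handled by this sketch.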

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux-neptune-61 console=tty1 rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 rd.systemd.gpt_auto=no amdgpu.noretry=0 amdgpu.ppfeaturemask=0xffffbfff amdgpu.lockup_timeout=20000 amdgpu.job_hang_limit=2 drm.debug=0x1ff amdgpu.debug_evictions=true1 tsc=directsync module_blacklist=tpm log_buf_len=4M amd_iommu=off amdgpu.gttsize=8128 spi_amd.speed_dev=1 audit=0 fbcon=rotate:1 loglevel=3 splash quiet plymouth.ignore-serial-consoles fbcon=vc:4-6 steamos.efi=PARTUUID=8bdf3e52-bf2f-7c45-9f00-45e568aa5af0
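When comparing reports, it can help to list only the amdgpu module parameters from a command line like the one above. The helper below is a minimal sketch; `amdgpu_params` is a hypothetical name, and the function only splits `amdgpu.key=value` tokens without checking that the values are valid (so `debug_evictions=true1` is returned exactly as typed).

```python
def amdgpu_params(cmdline):
    """Collect amdgpu.* options from a kernel command line string."""
    prefix = "amdgpu."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix):
            # Split on the first '=' only; a bare flag yields an empty value.
            key, _, value = token[len(prefix):].partition("=")
            params[key] = value
    return params


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    with open("/proc/cmdline") as handle:
        for key, value in sorted(amdgpu_params(handle.read()).items()):
            print(f"{key} = {value}")
```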


Linux Thorax 6.1.12-valve2-1-neptune-61 #1 SMP PREEMPT_DYNAMIC Mon, 27 Feb 2023 21:06:42 +0000 x86_64 GNU/Linux


Devices:
========
GPU0:
        apiVersion         = 4206830 (1.3.238)
        driverVersion      = 96469091 (0x5c00063)
        vendorID           = 0x1002
        deviceID           = 0x163f
        deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
        deviceName         = AMD Custom GPU 0405 (RADV VANGOGH)
        driverID           = DRIVER_ID_MESA_RADV
        driverName         = radv
        driverInfo         = Mesa 23.1.0-devel (git-16283f7b97)
        conformanceVersion = 1.3.0.0
        deviceUUID         = 00000000-0400-0000-0000-000000000000
        driverUUID         = 414d442d-4d45-5341-2d44-525600000000
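
The packed `apiVersion` value above can be checked against the dotted form printed next to it. The sketch below unpacks the standard Vulkan version layout (major in bits 22 and up, minor in bits 12 to 21, patch in bits 0 to 11). `decode_vk_version` is an illustrative name, and applying the same layout to `driverVersion` is an assumption, since that field's encoding is defined by the driver.

```python
def decode_vk_version(packed):
    """Unpack a Vulkan version integer into (major, minor, patch)."""
    major = (packed >> 22) & 0x7F   # 7-bit major field
    minor = (packed >> 12) & 0x3FF  # 10-bit minor field
    patch = packed & 0xFFF          # 12-bit patch field
    return major, minor, patch


print(decode_vk_version(4206830))    # apiVersion -> (1, 3, 238)
print(decode_vk_version(0x5C00063))  # driverVersion, assumed same layout -> (23, 0, 99)
```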
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-06 08:55:18 UTC
Sorry for the trouble (note: I'm just the messenger here), but most of the core graphics driver developers, like many other kernel developers, don't really look at this bug tracker. Please report the issue to the following place instead, as that's where the developers of the driver in question expect issues to be reported: https://gitlab.freedesktop.org/drm/amd/-/issues

If you do so, it would be great if you could afterwards share the link to your report here.
