Bug 217692 - amdgpu crashes on resume from standby - amdgpu-reset-dev drm_sched_job_timedout
Status: RESOLVED ANSWERED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: AMD Linux
Importance: P3 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-21 12:23 UTC by Michal Turecki
Modified: 2023-07-21 16:46 UTC
CC List: 0 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Single log with dmesg joined with suspend + resume log until crash (223.12 KB, text/plain)
2023-07-21 12:23 UTC, Michal Turecki

Description Michal Turecki 2023-07-21 12:23:55 UTC
Created attachment 304679
Single log with dmesg joined with suspend + resume log until crash

Kernel version: 6.4.3-2-MANJARO #1 SMP PREEMPT_DYNAMIC Sun Jul 16 16:55:12 UTC 2023 x86_64 GNU/Linux

Suspend took longer than usual, and after resuming by pressing a key on the keyboard, the screen did not turn on.

A hopefully relevant part of the trace (the complete log is attached):

Jul 20 21:44:50 too-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1605372, emitted seq=1605374
Jul 20 21:44:50 too-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jul 20 21:44:50 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
Jul 20 21:44:51 too-pc systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jul 20 21:44:54 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
Jul 20 21:44:54 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Jul 20 21:44:58 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
Jul 20 21:44:58 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Jul 20 21:45:02 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: failed to suspend display audio
Jul 20 21:45:02 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to disallow df cstate
Jul 20 21:45:02 too-pc kernel: ------------[ cut here ]------------
Jul 20 21:45:02 too-pc kernel: WARNING: CPU: 24 PID: 574073 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 20 21:45:02 too-pc kernel: Modules linked in: tls vhost_net vhost vhost_iotlb tap tun dm_crypt cbc encrypted_keys trusted asn1_encoder tee xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter br_netfilter snd_seq_dummy snd_hrtimer snd_seq bridge stp llc rfkill qrtr overlay uvcvideo videobuf2_vmalloc uvc videobuf2_memops snd_usb_audio videobuf2_v4l2 snd_usbmidi_lib videodev uas snd_rawmidi videobuf2_common snd_seq_device mousedev mc joydev vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek ccp snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_core polyval_generic gf128mul snd_hwdep r8169 ghash_clmulni_intel snd_pcm sha512_ssse3 aesni_intel snd_timer crypto_simd wmi_bmof sp5100_tco realtek cryptd snd rapl mdio_devres pcspkr soundcore k10temp i2c_piix4 libphy ryzen_smu(OE)
Jul 20 21:45:02 too-pc kernel:  mac_hid uinput i2c_dev dm_multipath fuse crypto_user dm_mod loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usb_storage usbhid crc32c_intel nvme xhci_pci nvme_core xhci_pci_renesas nvme_common vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd amdgpu i2c_algo_bit drm_ttm_helper ttm video wmi drm_suballoc_helper drm_buddy gpu_sched drm_display_helper cec
Jul 20 21:45:02 too-pc kernel: CPU: 24 PID: 574073 Comm: kworker/u64:25 Tainted: G        W  OE      6.4.3-2-MANJARO #1 00e85e278f33e769813ea7ae6ba8e625d37e2fac
Jul 20 21:45:02 too-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING PLUS (MS-7C37), BIOS A.I0 08/10/2022
Jul 20 21:45:02 too-pc kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Jul 20 21:45:02 too-pc kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 20 21:45:02 too-pc kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 8f c5 ae f0 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 7e c5 ae f0 b8 ea ff ff ff e9 74 c5 ae f0
Jul 20 21:45:02 too-pc kernel: RSP: 0018:ffffa87787687c98 EFLAGS: 00010246
Jul 20 21:45:02 too-pc kernel: RAX: ffff9c5f5233c980 RBX: ffff9c5f546c0000 RCX: 0000000000000000
Jul 20 21:45:02 too-pc kernel: RDX: 0000000000000000 RSI: ffff9c5f546d9250 RDI: ffff9c5f546c0000
Jul 20 21:45:02 too-pc kernel: RBP: ffff9c5f546c0000 R08: 000000000003ac80 R09: 0000000000000006
Jul 20 21:45:02 too-pc kernel: R10: 0000000000000100 R11: 0000000000000000 R12: 0000000000001050
Jul 20 21:45:02 too-pc kernel: R13: ffff9c5f546e5d70 R14: ffff9c666fb86000 R15: 0000000000000000
Jul 20 21:45:02 too-pc kernel: FS:  0000000000000000(0000) GS:ffff9c6a3f000000(0000) knlGS:0000000000000000
Jul 20 21:45:02 too-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 21:45:02 too-pc kernel: CR2: 00007f26044c5038 CR3: 000000036936a000 CR4: 0000000000350ee0
Jul 20 21:45:02 too-pc kernel: Call Trace:
Jul 20 21:45:02 too-pc kernel:  <TASK>
Jul 20 21:45:02 too-pc kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  ? __warn+0x81/0x130
Jul 20 21:45:02 too-pc kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  ? report_bug+0x171/0x1a0
Jul 20 21:45:02 too-pc kernel:  ? handle_bug+0x3c/0x80
Jul 20 21:45:02 too-pc kernel:  ? exc_invalid_op+0x17/0x70
Jul 20 21:45:02 too-pc kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jul 20 21:45:02 too-pc kernel:  ? amdgpu_irq_put+0x46/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  gfx_v10_0_hw_fini+0x1e/0x160 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  amdgpu_device_ip_suspend+0x36/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  amdgpu_device_gpu_recover+0x4c9/0xd80 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  amdgpu_job_timedout+0x18d/0x240 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel:  drm_sched_job_timedout+0x7a/0x110 [gpu_sched e56a49900f4cfad6f449f14abf3e996a15ceae97]
Jul 20 21:45:02 too-pc kernel:  process_one_work+0x1c7/0x3d0
Jul 20 21:45:02 too-pc kernel:  worker_thread+0x51/0x390
Jul 20 21:45:02 too-pc kernel:  ? __pfx_worker_thread+0x10/0x10
Jul 20 21:45:02 too-pc kernel:  kthread+0xe8/0x120
Jul 20 21:45:02 too-pc kernel:  ? __pfx_kthread+0x10/0x10
Jul 20 21:45:02 too-pc kernel:  ret_from_fork+0x2c/0x50
Jul 20 21:45:02 too-pc kernel:  </TASK>
Jul 20 21:45:02 too-pc kernel: ---[ end trace 0000000000000000 ]---
Jul 20 21:45:02 too-pc kernel: ------------[ cut here ]------------
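
For triage, the first error line in the excerpt above (the sdma0 ring timeout) can be pulled out of a journal log mechanically. The following is a minimal sketch, not part of the original report: the function name parse_ring_timeout and the regular expression are illustrative assumptions that match only the message format shown in this log. Reading "emitted seq" minus "signaled seq" as the number of fences still outstanding on the ring is likewise an assumption about the message's meaning.

```python
import re

# Matches the ring-timeout message as it appears in the excerpt above.
# The pattern is an assumption based on this one log line only.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match."""
    m = RING_TIMEOUT.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # pending = sequence numbers emitted but not yet signaled (assumed meaning)
    return m.group("ring"), signaled, emitted, emitted - signaled


line = (
    "Jul 20 21:44:50 too-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring sdma0 timeout, signaled seq=1605372, emitted seq=1605374"
)
print(parse_ring_timeout(line))  # ('sdma0', 1605372, 1605374, 2)
```

Applied to the attached log, this would flag the sdma0 timeout at 21:44:50 that precedes the GPU reset and the amdgpu_irq_put warning.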
Comment 1 Artem S. Tashkinov 2023-07-21 14:51:01 UTC
Please take it here instead: https://gitlab.freedesktop.org/drm/amd/-/issues
Comment 2 Michal Turecki 2023-07-21 16:46:24 UTC
Thanks, posted on https://gitlab.freedesktop.org/drm/amd/-/issues/2711.
Apologies for confusing DRI and DRM. Please feel free to close/delete the issue.
