Bug 217692
Summary: | amdgpu crashes on resume from standby - amdgpu-reset-dev drm_sched_job_timedout | |
---|---|---|---
Product: | Drivers | Reporter: | Michal Turecki (turecki)
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri
Status: | RESOLVED ANSWERED | |
Severity: | normal | |
Priority: | P3 | |
Hardware: | AMD | |
OS: | Linux | |
Kernel Version: | | Subsystem: |
Regression: | No | Bisected commit-id: |
Attachments: | Single log with dmesg joined with suspend + resume log until crash | |
Please take it here instead: https://gitlab.freedesktop.org/drm/amd/-/issues

Thanks, posted on https://gitlab.freedesktop.org/drm/amd/-/issues/2711. Apologies for confusing DRI and DRM. Please feel free to close/delete the issue.
Created attachment 304679
Single log with dmesg joined with suspend + resume log until crash

Kernel version: 6.4.3-2-MANJARO #1 SMP PREEMPT_DYNAMIC Sun Jul 16 16:55:12 UTC 2023 x86_64 GNU/Linux

Suspend took longer than usual, and after resuming by pressing a key on the keyboard, the screen did not turn on.

A hopefully relevant part of the trace (the complete log is attached):

Jul 20 21:44:50 too-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1605372, emitted seq=1605374
Jul 20 21:44:50 too-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Jul 20 21:44:50 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
Jul 20 21:44:51 too-pc systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Jul 20 21:44:54 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
Jul 20 21:44:54 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Jul 20 21:44:58 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
Jul 20 21:44:58 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Jul 20 21:45:02 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: failed to suspend display audio
Jul 20 21:45:02 too-pc kernel: amdgpu 0000:2f:00.0: amdgpu: Failed to disallow df cstate
Jul 20 21:45:02 too-pc kernel: ------------[ cut here ]------------
Jul 20 21:45:02 too-pc kernel: WARNING: CPU: 24 PID: 574073 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:599 amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 20 21:45:02 too-pc kernel: Modules linked in: tls vhost_net vhost vhost_iotlb tap tun dm_crypt cbc encrypted_keys trusted asn1_encoder tee xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter br_netfilter snd_seq_dummy snd_hrtimer snd_seq bridge stp llc rfkill qrtr overlay uvcvideo videobuf2_vmalloc uvc videobuf2_memops snd_usb_audio videobuf2_v4l2 snd_usbmidi_lib videodev uas snd_rawmidi videobuf2_common snd_seq_device mousedev mc joydev vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_hda_codec_realtek ccp snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_core polyval_generic gf128mul snd_hwdep r8169 ghash_clmulni_intel snd_pcm sha512_ssse3 aesni_intel snd_timer crypto_simd wmi_bmof sp5100_tco realtek cryptd snd rapl mdio_devres pcspkr soundcore k10temp i2c_piix4 libphy ryzen_smu(OE)
Jul 20 21:45:02 too-pc kernel: mac_hid uinput i2c_dev dm_multipath fuse crypto_user dm_mod loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usb_storage usbhid crc32c_intel nvme xhci_pci nvme_core xhci_pci_renesas nvme_common vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd amdgpu i2c_algo_bit drm_ttm_helper ttm video wmi drm_suballoc_helper drm_buddy gpu_sched drm_display_helper cec
Jul 20 21:45:02 too-pc kernel: CPU: 24 PID: 574073 Comm: kworker/u64:25 Tainted: G W OE 6.4.3-2-MANJARO #1 00e85e278f33e769813ea7ae6ba8e625d37e2fac
Jul 20 21:45:02 too-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING PLUS (MS-7C37), BIOS A.I0 08/10/2022
Jul 20 21:45:02 too-pc kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Jul 20 21:45:02 too-pc kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
Jul 20 21:45:02 too-pc kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 8f c5 ae f0 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 7e c5 ae f0 b8 ea ff ff ff e9 74 c5 ae f0
Jul 20 21:45:02 too-pc kernel: RSP: 0018:ffffa87787687c98 EFLAGS: 00010246
Jul 20 21:45:02 too-pc kernel: RAX: ffff9c5f5233c980 RBX: ffff9c5f546c0000 RCX: 0000000000000000
Jul 20 21:45:02 too-pc kernel: RDX: 0000000000000000 RSI: ffff9c5f546d9250 RDI: ffff9c5f546c0000
Jul 20 21:45:02 too-pc kernel: RBP: ffff9c5f546c0000 R08: 000000000003ac80 R09: 0000000000000006
Jul 20 21:45:02 too-pc kernel: R10: 0000000000000100 R11: 0000000000000000 R12: 0000000000001050
Jul 20 21:45:02 too-pc kernel: R13: ffff9c5f546e5d70 R14: ffff9c666fb86000 R15: 0000000000000000
Jul 20 21:45:02 too-pc kernel: FS: 0000000000000000(0000) GS:ffff9c6a3f000000(0000) knlGS:0000000000000000
Jul 20 21:45:02 too-pc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 21:45:02 too-pc kernel: CR2: 00007f26044c5038 CR3: 000000036936a000 CR4: 0000000000350ee0
Jul 20 21:45:02 too-pc kernel: Call Trace:
Jul 20 21:45:02 too-pc kernel: <TASK>
Jul 20 21:45:02 too-pc kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: ? __warn+0x81/0x130
Jul 20 21:45:02 too-pc kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: ? report_bug+0x171/0x1a0
Jul 20 21:45:02 too-pc kernel: ? handle_bug+0x3c/0x80
Jul 20 21:45:02 too-pc kernel: ? exc_invalid_op+0x17/0x70
Jul 20 21:45:02 too-pc kernel: ? asm_exc_invalid_op+0x1a/0x20
Jul 20 21:45:02 too-pc kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: gfx_v10_0_hw_fini+0x1e/0x160 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: amdgpu_device_ip_suspend_phase2+0x107/0x1a0 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: ? amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: amdgpu_device_ip_suspend+0x36/0x70 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: amdgpu_device_pre_asic_reset+0xd3/0x2b0 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: amdgpu_device_gpu_recover+0x4c9/0xd80 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: amdgpu_job_timedout+0x18d/0x240 [amdgpu e53a5ce3236982c3abd9d891fdaf5541b1a96ab9]
Jul 20 21:45:02 too-pc kernel: drm_sched_job_timedout+0x7a/0x110 [gpu_sched e56a49900f4cfad6f449f14abf3e996a15ceae97]
Jul 20 21:45:02 too-pc kernel: process_one_work+0x1c7/0x3d0
Jul 20 21:45:02 too-pc kernel: worker_thread+0x51/0x390
Jul 20 21:45:02 too-pc kernel: ? __pfx_worker_thread+0x10/0x10
Jul 20 21:45:02 too-pc kernel: kthread+0xe8/0x120
Jul 20 21:45:02 too-pc kernel: ? __pfx_kthread+0x10/0x10
Jul 20 21:45:02 too-pc kernel: ret_from_fork+0x2c/0x50
Jul 20 21:45:02 too-pc kernel: </TASK>
Jul 20 21:45:02 too-pc kernel: ---[ end trace 0000000000000000 ]---
Jul 20 21:45:02 too-pc kernel: ------------[ cut here ]------------