Created attachment 276689
HWInfo

I have observed a GPU lockup when the system resumes from sleep. The duration of the sleep doesn't matter; the problem occurs every time the system is put to sleep.

I was able to narrow the problem down a little. When I switch to the console and then put the system to sleep, the system comes up properly (with a trace in an amdgpu function). If I then switch back to the login manager or to the desktop, the GPU faults and eventually hangs. See the logs below.

I can also reproduce the problem with kernel 4.16.13. Furthermore, it doesn't matter whether amdgpu.dc is enabled or disabled.

System
----------
Linux 4.17.2
Debian Unstable
X.Org 1.20
Mesa 18.1.1
Radeon RX 580 Series (POLARIS10, DRM 3.25.0, 4.17.2, LLVM 6.0.0)
CPU Intel Core i7-8700K
MB Asus Prime Z370-A

Kernel log after the resume from console:
-----------------------------------------
Jun 19 14:24:39 moc kernel: sd 0:0:0:0: [sda] Starting disk
Jun 19 14:24:39 moc kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
Jun 19 14:24:39 moc kernel: WARNING: CPU: 7 PID: 28047 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:725 amdgpu_dm_display_resume+0x213/0x220 [amdgpu]
Jun 19 14:24:39 moc kernel: Modules linked in: vmnet(OE) vmw_vsock_vmci_transport(E) vsock(E) vmw_vmci(E) vmmon(OE) fuse(E) joydev(E) hid_cherry(E) hid_generic(E) usbhid(E) hid(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) ir
Jun 19 14:24:39 moc kernel: asus_wmi(E) evdev(E) efi_pstore(E) intel_uncore(E) sparse_keymap(E) wmi_bmof(E) mxm_wmi(E) i2c_algo_bit(E) rfkill(E) sg(E) intel_rapl_perf(E) iTCO_wdt(E) efivars(E) snd(E) mei_me(E) iTCO_vendor_support(E) soundcore(E) mei(E) shpchp(E) wmi(E) v
Jun 19 14:24:39 moc kernel: btrfs(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid10(E) raid456(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md
Jun 19 14:24:39 moc kernel: CPU: 7 PID: 28047 Comm: kworker/u24:7 Tainted: G OE 4.17.2 #1
Jun 19 14:24:39 moc kernel: Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 0805 05/18/2018
Jun 19 14:24:39 moc kernel: Workqueue: events_unbound async_run_entry_fn
Jun 19 14:24:39 moc kernel: RIP: 0010:amdgpu_dm_display_resume+0x213/0x220 [amdgpu]
Jun 19 14:24:39 moc kernel: RSP: 0000:ffffaadd4447fd60 EFLAGS: 00010202
Jun 19 14:24:39 moc kernel: RAX: 0000000000000002 RBX: ffff96d7a48b0000 RCX: 0000000000000006
Jun 19 14:24:39 moc kernel: RDX: 0000000000000006 RSI: ffff96d6915a2c80 RDI: ffff96d7898f7800
Jun 19 14:24:39 moc kernel: RBP: ffff96d79fb9d800 R08: 0000000000000000 R09: ffffffffc14a7174
Jun 19 14:24:39 moc kernel: R10: ffffe4dea0a9a840 R11: 0000000000000001 R12: 0000000000000000
Jun 19 14:24:39 moc kernel: R13: ffff96d7a5e43800 R14: ffff96d7a9ca8d40 R15: ffffffffb4695dbb
Jun 19 14:24:39 moc kernel: FS: 0000000000000000(0000) GS:ffff96d7ae3c0000(0000) knlGS:0000000000000000
Jun 19 14:24:39 moc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 19 14:24:39 moc kernel: CR2: 0000000000000000 CR3: 00000003aa80a001 CR4: 00000000003606e0
Jun 19 14:24:39 moc kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 19 14:24:39 moc kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 19 14:24:39 moc kernel: Call Trace:
Jun 19 14:24:39 moc kernel: amdgpu_device_ip_resume_phase2+0x45/0xb0 [amdgpu]
Jun 19 14:24:39 moc kernel: amdgpu_device_resume+0xbf/0x380 [amdgpu]
Jun 19 14:24:39 moc kernel: ? pci_pm_freeze+0xd0/0xd0
Jun 19 14:24:39 moc kernel: ? pci_pm_freeze+0xd0/0xd0
Jun 19 14:24:39 moc kernel: dpm_run_callback+0x4d/0x130
Jun 19 14:24:39 moc kernel: device_resume+0x97/0x190
Jun 19 14:24:39 moc kernel: async_resume+0x19/0x40
Jun 19 14:24:39 moc kernel: async_run_entry_fn+0x39/0x160
Jun 19 14:24:39 moc kernel: process_one_work+0x17b/0x360
Jun 19 14:24:39 moc kernel: worker_thread+0x2e/0x390
Jun 19 14:24:39 moc kernel: ? process_one_work+0x360/0x360
Jun 19 14:24:39 moc kernel: kthread+0x113/0x130
Jun 19 14:24:39 moc kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Jun 19 14:24:39 moc kernel: ret_from_fork+0x35/0x40
Jun 19 14:24:39 moc kernel: Code: 00 7f ac 48 89 ef e8 dd df a5 ff 48 c7 83 90 aa 00 00 00 00 00 00 89 c5 48 89 df e8 c8 17 00 00 89 e8 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b e9 48 ff ff ff 0f 0b eb a5 66 90 0f 1f 44 00 00 53 48 89
Jun 19 14:24:39 moc kernel: ---[ end trace c39336409cdb2ae3 ]---
Jun 19 14:24:39 moc kernel: [drm] UVD and UVD ENC initialized successfully.
Jun 19 14:24:39 moc kernel: ixgbe 0000:03:00.0: Multiqueue Enabled: Rx Queue count = 12, Tx Queue count = 12 XDP Queue count = 0

Log after switching to X11
---------------------------
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0a304401
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:13 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0E40C60C
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 239126028, read from 'TC4' (0x54433400) (72)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 0) at page 0, read from 'TC4' (0x54433400) (72)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0804800C
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 0) at page 0, read from 'TC4' (0x54433400) (72)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0a304401
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08404D46
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08044001
Jun 19 14:29:14 moc kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4, pasid 0) at page 138431814, read from 'TC5' (0x54433500) (68)
Jun 19 14:29:24 moc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=384604, last emitted seq=384605
Jun 19 14:29:24 moc kernel: [drm] IP block:gfx_v8_0 is hung!
Jun 19 14:29:24 moc kernel: [drm] GPU recovery disabled.
-- Reboot --
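
In case it helps anyone reading the fault lines, here is a rough Python sketch that splits the raw VM_CONTEXT1_PROTECTION_FAULT_STATUS, VM_CONTEXT1_PROTECTION_FAULT_ADDR and client values into the fields the driver prints in the "VM fault (...)" line. The bit positions and the helper name decode_fault are my own assumptions, chosen so that the result matches the values logged here; they are not taken from this report and may not hold for other GPU generations.

```python
# Sketch: decode the VM fault values printed by amdgpu in the log.
# ASSUMPTION: the bit layout below is inferred so that it reproduces the
# logged "VM fault (...)" lines; it is not stated anywhere in this report.
PROTECTIONS_MASK, PROTECTIONS_SHIFT = 0x000000FF, 0
CLIENT_ID_MASK, CLIENT_ID_SHIFT = 0x000FF000, 12
CLIENT_RW_MASK, CLIENT_RW_SHIFT = 0x01000000, 24
VMID_MASK, VMID_SHIFT = 0x1E000000, 25


def decode_fault(status, addr, mc_client):
    """Split the raw register values into the fields shown in the log line."""
    # The client tag is four ASCII bytes packed big-endian, NUL padded.
    tag = mc_client.to_bytes(4, "big").rstrip(b"\0").decode("ascii", "replace")
    is_write = (status & CLIENT_RW_MASK) >> CLIENT_RW_SHIFT
    return {
        "protections": (status & PROTECTIONS_MASK) >> PROTECTIONS_SHIFT,
        "vmid": (status & VMID_MASK) >> VMID_SHIFT,
        "client_id": (status & CLIENT_ID_MASK) >> CLIENT_ID_SHIFT,
        "access": "write" if is_write else "read",
        "page": addr,  # the log prints the fault address as a decimal page number
        "client": tag,
    }


if __name__ == "__main__":
    # Values copied from the two distinct fault patterns in the log.
    print(decode_fault(0x08044001, 0x08404D46, 0x54433500))
    print(decode_fault(0x0804800C, 0x00000000, 0x54433400))
```

Under these assumptions, every fault in the log is a read in vmid 4 from client 'TC4' or 'TC5', and only three distinct pages (138431814, 239126028 and 0) are involved.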
Duplicate of bug https://bugzilla.kernel.org/show_bug.cgi?id=199959