After upgrading to Linux 5.0.1 I started seeing the following warning in dmesg whenever I try to put the machine to sleep:

[ 156.103095] PM: suspend entry (deep)
[ 156.103099] PM: Syncing filesystems ... done.
[ 156.122217] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 156.124208] OOM killer disabled.
[ 156.124208] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[ 156.125070] printk: Suspending console(s) (use no_console_suspend to debug)
[ 156.125431] wlp2s0: deauthenticating from 18:d6:c7:fc:d5:3d by local choice (Reason: 3=DEAUTH_LEAVING)
[ 156.461444] WARNING: CPU: 7 PID: 169 at irq_startup+0xd6/0xe0
[ 156.461446] Modules linked in: rfcomm bnep btusb btrtl btbcm btintel bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common ecdh_generic overlay squashfs kvm_amd r8822be(C) tpm_crb r8169 realtek tpm_tis tpm_tis_core amdgpu chash gpu_sched ttm
[ 156.461469] CPU: 7 PID: 169 Comm: kworker/u32:6 Tainted: G C 5.0.1 #7
[ 156.461471] Hardware name: LENOVO 20MU000CPB/20MU000CPB, BIOS R0WET48W (1.16 ) 01/03/2019
[ 156.461476] Workqueue: events_unbound async_run_entry_fn
[ 156.461479] RIP: 0010:irq_startup+0xd6/0xe0
[ 156.461481] Code: 31 f6 4c 89 ef e8 0a 2b 00 00 85 c0 75 20 48 89 ee 31 d2 4c 89 ef e8 69 db ff ff 48 89 df e8 d1 fe ff ff 89 c5 e9 57 ff ff ff <0f> 0b eb b9 0f 0b eb b5 66 90 55 48 89 fd 53 48 8b 47 38 89 f3 8b
[ 156.461482] RSP: 0018:ffffaa3e0047bc28 EFLAGS: 00010002
[ 156.461483] RAX: 0000000000000010 RBX: ffffa0b427beea00 RCX: 0000000000000040
[ 156.461484] RDX: 0000000000000000 RSI: ffffffffad2efeb0 RDI: ffffa0b427beea18
[ 156.461485] RBP: ffffa0b427beea18 R08: 0000000000000000 R09: ffffa0b42bb38c40
[ 156.461486] R10: 0000000000000000 R11: ffffffffad2364c8 R12: 0000000000000001
[ 156.461487] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000010
[ 156.461489] FS: 0000000000000000(0000) GS:ffffa0b42fdc0000(0000) knlGS:0000000000000000
[ 156.461490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 156.461491] CR2: 00007f6e4a053000 CR3: 00000007ade44000 CR4: 00000000003406e0
[ 156.461492] Call Trace:
[ 156.461497] enable_irq+0x41/0x80
[ 156.461503] nvme_poll_irqdisable+0xd4/0x230
[ 156.461506] __nvme_disable_io_queues+0x1ae/0x1e0
[ 156.461508] ? nvme_del_queue_end+0x20/0x20
[ 156.461509] nvme_dev_disable+0x1b6/0x1d0
[ 156.461512] nvme_suspend+0x11/0x20
[ 156.461515] pci_pm_suspend+0x6e/0x1b0
[ 156.461517] ? pci_pm_suspend_noirq+0x280/0x280
[ 156.461521] dpm_run_callback+0x46/0x130
[ 156.461523] __device_suspend+0x126/0x4e0
[ 156.461527] ? __wake_up_common+0x72/0x140
[ 156.461529] ? dpm_show_time+0xc0/0xc0
[ 156.461531] async_suspend+0x15/0x80
[ 156.461533] async_run_entry_fn+0x32/0xe0
[ 156.461537] process_one_work+0x1e3/0x3e0
[ 156.461540] worker_thread+0x28/0x3c0
[ 156.461541] ? process_one_work+0x3e0/0x3e0
[ 156.461544] kthread+0x10d/0x130
[ 156.461546] ? kthread_park+0x80/0x80
[ 156.461550] ret_from_fork+0x22/0x40
[ 156.461553] ---[ end trace 27b1790090a517c4 ]---
[ 156.591877] ACPI: EC: interrupt blocked
[ 156.639422] ACPI: Preparing to enter system sleep state S3
[ 156.646539] ACPI: EC: event blocked
[ 156.646540] ACPI: EC: EC stopped
[ 156.646541] PM: Saving platform NVS memory
[ 156.647033] Disabling non-boot CPUs ..

The computer in question is a Lenovo ThinkPad A485. I'm not sure whether it's important here, but it has two NVMe drives: a 1TB ADATA SX6000LNP and a 240GB TOSHIBA-RC100.

If required, I can bisect to the specific commit that caused this regression, but from a quick look at the changes between 4.20 and 5.0, two commits look like likely candidates:

1. nvme-pci: remove the CQ lock for interrupt driven queues (3a7afd8ee42a68d4f24ab9c947a4ef82d4d52375)
2. nvme-pci: don't poll from irq context when deleting queues (d1ed6aa14bc418531220478604c7b12c5e98fdca)
Created attachment 281775 [details]
Dmesg from linux 5.0.1

Attaching full dmesg.
Created attachment 282087 [details]
kernel 5.0.5 trace

Same issue on a Lenovo ThinkPad E485 with kernel 5.0.5.
The problem still persists in kernel 5.1.0
I stumbled on this post on the Phoronix forums:
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1089720-amdgpu-laptop-suspend-hang?p=1091151#post1091151

I can confirm that this change solves the problem on my laptop.
Created attachment 285363 [details]
Full dmesg on 5.2.13

I get this bug a lot too, on 5.2.13.

Hardware:
* Ryzen 2700
* 64GB ECC RAM
* Samsung 970 Pro 512GB NVMe
Happens on 5.3.6 too.
I think the kernel warning refers to the following source code:

	if (cpumask_any_and(aff, cpu_online_mask) >= nr_cpu_ids) {
		/*
		 * Catch code which fiddles with enable_irq() on a managed
		 * and potentially shutdown IRQ. Chained interrupt
		 * installment or irq auto probing should not happen on
		 * managed irqs either.
		 */
		if (WARN_ON_ONCE(force))
			return IRQ_STARTUP_ABORT;
		/*
		 * The interrupt was requested, but there is no online CPU
		 * in it's affinity mask. Put it into managed shutdown
		 * state and let the cpu hotplug mechanism start it up once
		 * a CPU in the mask becomes available.
		 */
		return IRQ_STARTUP_ABORT;
	}
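To make the quoted check easier to reason about, here is a minimal user-space sketch of the same decision. It is not kernel code: the names model_startup_managed, cpu_mask_t, struct startup_outcome and the STARTUP_* values are hypothetical stand-ins, the CPU masks are modelled as plain 64-bit bitmaps, and the "once" part of WARN_ON_ONCE is ignored. It only illustrates that the warning fires when a managed interrupt has no online CPU in its affinity mask and the caller asks for a forced startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's IRQ_STARTUP_* codes (illustrative values). */
enum startup_result { STARTUP_NORMAL, STARTUP_MANAGED, STARTUP_ABORT };

/* Simplified CPU mask: one bit per CPU, at most 64 CPUs (assumption of this model). */
typedef uint64_t cpu_mask_t;

struct startup_outcome {
	enum startup_result result;
	bool warned; /* true where the real code would hit WARN_ON_ONCE(force) */
};

/*
 * Model of the quoted check: a managed interrupt whose affinity mask has
 * no online CPU is not started. If the caller forced the startup, the
 * real code additionally emits the warning seen in the traces.
 */
static struct startup_outcome
model_startup_managed(bool managed, cpu_mask_t affinity,
		      cpu_mask_t online, bool force)
{
	struct startup_outcome out = { STARTUP_NORMAL, false };

	/* Non-managed interrupts are not subject to this check. */
	if (!managed)
		return out;

	if ((affinity & online) == 0) {
		/* No online CPU in the affinity mask. */
		out.warned = force;
		out.result = STARTUP_ABORT;
		return out;
	}

	out.result = STARTUP_MANAGED;
	return out;
}
```

For example, calling model_startup_managed(true, 0x0F, 0xF0, true) yields STARTUP_ABORT with warned set, while the same masks with force false abort silently. Why the affinity mask would have no online CPU on the affected machines is not something this sketch explains.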
See also https://lore.kernel.org/lkml/CAEJqkggcnW98Sk3BEBCCZf57Uwd9rdqD5Da0tmuTaNfkJN5kVg@mail.gmail.com/
I am also affected by this bug on kernel 5.7.2-arch1-1, with a Dell Inspiron 7405, AMD Ryzen 7 4700u, and a PC SN530 NVMe WDC 512GB SSD. Here is the oops from my kernel messages:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 2042 at kernel/irq/chip.c:210 irq_startup+0xdf/0xf0
Modules linked in: fuse ccm btusb btrtl btbcm btintel bluetooth uvcvideo joydev videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 mousedev video>
 pinctrl_amd acpi_tad ac drm crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crc32c_intel x>
CPU: 6 PID: 2042 Comm: systemd-sleep Not tainted 5.6.15-arch1-1 #1
Hardware name: Dell Inc. Inspiron 7405 2n1/042J14, BIOS 1.0.0 03/19/2020
RIP: 0010:irq_startup+0xdf/0xf0
Code: f6 4c 89 e7 e8 02 45 00 00 85 c0 75 21 4c 89 e7 31 d2 4c 89 ee e8 21 c9 ff ff 48 89 ef e8 b9 fe ff ff 41 89 c4 e9 53 ff ff ff <0f> 0b eb b>
RSP: 0018:ffffb7c8c2de3d90 EFLAGS: 00010002
RAX: 0000000000000140 RBX: 0000000000000001 RCX: 0000000000000140
RDX: 0000000000000004 RSI: ffffffff8b565160 RDI: ffff976402fb0818
RBP: ffff976402fb0800 R08: 0000000000000000 R09: 0000000000000140
R10: 0000000000000000 R11: ffffffff8b453648 R12: 0000000000000001
R13: ffff976402fb0818 R14: ffff976402fb08e4 R15: 0000000000000000
FS: 00007f2aae131a80(0000) GS:ffff976407580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005634481363e6 CR3: 000000017a266000 CR4: 0000000000340ee0
Call Trace:
 resume_irqs+0xb6/0xf0
 dpm_resume_noirq+0xf/0x20
 suspend_devices_and_enter+0x338/0x8a0
 pm_suspend.cold+0x333/0x387
 state_store+0x42/0x90
 kernfs_fop_write+0xce/0x1b0
 vfs_write+0xb6/0x1a0
 ksys_write+0x67/0xe0
 do_syscall_64+0x49/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2aaf094b57
Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f>
RSP: 002b:00007ffe27d04b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2aaf094b57
RDX: 0000000000000004 RSI: 00007ffe27d04c80 RDI: 0000000000000004
RBP: 00007ffe27d04c80 R08: 000055df0939ea90 R09: 000000000000000d
R10: 000055df0939ac00 R11: 0000000000000246 R12: 0000000000000004
R13: 000055df0939a3c0 R14: 0000000000000004 R15: 00007f2aaf165700
---[ end trace b43f52b8ea5824de ]---