Created attachment 300069 [details] Output from ver_linux Moving from kernel 5.14.21 to 5.15.x , I noticed sporadic hangs mostly related to NetworkManager. Rebooting the box triggers the behaviour in all cases, NetworkManager hangs when receiving a stop job, and eventually these messages are logged: Dec 18 10:04:07 pyjamarama kernel: INFO: task kworker/15:2:251 blocked for more than 245 seconds. Dec 18 10:04:07 pyjamarama kernel: Not tainted 5.15.10-MANJARO #1 Dec 18 10:04:07 pyjamarama kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 18 10:04:07 pyjamarama kernel: task:kworker/15:2 state:D stack: 0 pid: 251 ppid: 2 flags:0x00004000 Dec 18 10:04:07 pyjamarama kernel: Workqueue: ipv6_addrconf addrconf_verify_work Dec 18 10:04:07 pyjamarama kernel: Call Trace: Dec 18 10:04:07 pyjamarama kernel: <TASK> Dec 18 10:04:07 pyjamarama kernel: __schedule+0x311/0x940 Dec 18 10:04:07 pyjamarama kernel: schedule+0x59/0xc0 Dec 18 10:04:07 pyjamarama kernel: schedule_preempt_disabled+0x11/0x20 Dec 18 10:04:07 pyjamarama kernel: __mutex_lock.constprop.0+0x258/0x460 Dec 18 10:04:07 pyjamarama kernel: addrconf_verify_work+0xa/0x20 Dec 18 10:04:07 pyjamarama kernel: process_one_work+0x1e8/0x3c0 Dec 18 10:04:07 pyjamarama kernel: worker_thread+0x50/0x3b0 Dec 18 10:04:07 pyjamarama kernel: ? process_one_work+0x3c0/0x3c0 Dec 18 10:04:07 pyjamarama kernel: kthread+0x132/0x160 Dec 18 10:04:07 pyjamarama kernel: ? set_kthread_struct+0x40/0x40 Dec 18 10:04:07 pyjamarama kernel: ret_from_fork+0x22/0x30 Dec 18 10:04:07 pyjamarama kernel: </TASK> Dec 18 10:04:07 pyjamarama kernel: INFO: task NetworkManager:681 blocked for more than 245 seconds. Dec 18 10:04:07 pyjamarama kernel: Not tainted 5.15.10-MANJARO #1 Dec 18 10:04:07 pyjamarama kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 18 10:04:07 pyjamarama kernel: task:NetworkManager state:D stack: 0 pid: 681 ppid: 1 flags:0x00004004 Dec 18 10:04:07 pyjamarama kernel: Call Trace: Dec 18 10:04:07 pyjamarama kernel: <TASK> Dec 18 10:04:07 pyjamarama kernel: __schedule+0x311/0x940 Dec 18 10:04:07 pyjamarama kernel: schedule+0x59/0xc0 Dec 18 10:04:07 pyjamarama kernel: schedule_preempt_disabled+0x11/0x20 Dec 18 10:04:07 pyjamarama kernel: __mutex_lock.constprop.0+0x258/0x460 Dec 18 10:04:07 pyjamarama kernel: ? delay_halt+0x3b/0x60 Dec 18 10:04:07 pyjamarama kernel: igb_resume+0xff/0x1d0 [igb] Dec 18 10:04:07 pyjamarama kernel: pci_pm_runtime_resume+0xaa/0xc0 Dec 18 10:04:07 pyjamarama kernel: ? pci_pm_freeze_noirq+0x100/0x100 Dec 18 10:04:07 pyjamarama kernel: __rpm_callback+0x44/0x110 Dec 18 10:04:07 pyjamarama kernel: ? pci_pm_freeze_noirq+0x100/0x100 Dec 18 10:04:07 pyjamarama kernel: rpm_callback+0x35/0x70 Dec 18 10:04:07 pyjamarama kernel: rpm_resume+0x55b/0x800 Dec 18 10:04:07 pyjamarama kernel: __pm_runtime_resume+0x4a/0x80 Dec 18 10:04:07 pyjamarama kernel: dev_ethtool+0xd4/0x2d60 Dec 18 10:04:07 pyjamarama kernel: ? enqueue_task_fair+0x8e/0x690 Dec 18 10:04:07 pyjamarama kernel: ? inet_ioctl+0xde/0x1e0 Dec 18 10:04:07 pyjamarama kernel: dev_ioctl+0x1ae/0x560 Dec 18 10:04:07 pyjamarama kernel: ? _copy_from_user+0x28/0x60 Dec 18 10:04:07 pyjamarama kernel: sock_do_ioctl+0xe4/0x130 Dec 18 10:04:07 pyjamarama kernel: sock_ioctl+0x204/0x310 Dec 18 10:04:07 pyjamarama kernel: __x64_sys_ioctl+0x82/0xb0 Dec 18 10:04:07 pyjamarama kernel: do_syscall_64+0x3b/0x90 Dec 18 10:04:07 pyjamarama kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae Dec 18 10:04:07 pyjamarama kernel: RIP: 0033:0x7f8828d4c59b Dec 18 10:04:07 pyjamarama kernel: RSP: 002b:00007ffd61b19498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Dec 18 10:04:07 pyjamarama kernel: RAX: ffffffffffffffda RBX: 00007ffd61b19520 RCX: 00007f8828d4c59b Dec 18 10:04:07 pyjamarama kernel: RDX: 00007ffd61b19540 RSI: 0000000000008946 RDI: 0000000000000013 Dec 18 10:04:07 pyjamarama kernel: RBP: 00007ffd61b19690 R08: 0000000000000007 R09: 003066307334706e Dec 18 10:04:07 pyjamarama kernel: R10: 0000000000000021 R11: 0000000000000246 R12: 0000000000000002 Dec 18 10:04:07 pyjamarama kernel: R13: 0000000000000000 R14: 00007ffd61b19540 R15: 00007ffd61b19520 Dec 18 10:04:07 pyjamarama kernel: </TASK> I bisected the kernel from 5.14.21 to 5.15.0 ( when the issue started ) and it points to this commit: commit f32a213765739f2a1db319346799f130a3d08820 Author: Heiner Kallweit <hkallweit1@gmail.com> Date: Sun Aug 1 12:36:48 2021 +0200 ethtool: runtime-resume netdev parent before ethtool ioctl ops If a network device is runtime-suspended then: - network device may be flagged as detached and all ethtool ops (even if not accessing the device) will fail because netif_device_present() returns false - ethtool ops may fail because device is not accessible (e.g. because being in D3 in case of a PCI device) It may not be desirable that userspace can't use even simple ethtool ops that not access the device if interface or link is down. To be more friendly to userspace let's ensure that device is runtime-resumed when executing the respective ethtool op in kernel. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> I reverted that patch in 5.15.10 and it seems to have cured the hang on reboot behaviour, however i'll be testing more to see if the other random hangs disappear too. thanks for your attention
Created attachment 300071 [details] cpuinfo.txt
Created attachment 300073 [details] lspci
Hi, this is your Linux kernel regression tracker speaking. Not totally sure, but I guess this will be fixed by this patch: https://lore.kernel.org/netdev/6bb28d2f-4884-7696-0582-c26c35534bae@gmail.com/ Will write a mail later why I takes to long to get this mainlined and backported. This bug thus is like a duplicate of Bug 215129
Thanks Thorsten - it seems indeed a duplicate - ( the discussion seemed to hint that more work was needed - but will try the patch )
Hello, I re-applied the commit in question and applied the patch detailed in Bug 215129, which resolves the issue as well.
*** This bug has been marked as a duplicate of bug 215129 ***