Hi, Please help: This really annoying bug seems to always get the complete network part of the kernel into a MUTEX-LOCK :(( As most processes these days do something with network... this basically renders the machine useless. E.g. a "ping 1.1.1.1" will be stuck and become an unkillable process! Same for all programs with network access. Only way to get out of this mess: SYSRQ + R-E-I-S-U-B :(( I am not sure which driver to blame here. As it seems network related and the only network drivers active during the problem happened are INTEL-WIRELESS, I classified it as such. I'll attach the dmesg from the last occurance. I have also the complete systemd journal of this and of many other occurances. I don't know yet a way to reproduce it, but it happens 1 every couple of days, sometimes even several times a day. Please advise if you need any other info. Thanks a lot for any hint or help, Tormen
Created attachment 285959 [details] dmesg (kernel target of journalctl) from last kernel network layer mutex lockup
As this bug forces to SYSRQ reboot the machine everytime it occurs I find it pretty bad / important. I had another look at the log and these were all the reported hung_task_timeouts: Nov 17 16:26:44 huit kernel: INFO: task kworker/10:3:834 blocked for more than 122 seconds. Nov 17 16:26:44 huit kernel: INFO: task sway_statusline:1138 blocked for more than 122 seconds. Nov 17 16:26:44 huit kernel: INFO: task ThreadPoolForeg:3973 blocked for more than 122 seconds. Nov 17 16:26:44 huit kernel: INFO: task ip:3979 blocked for more than 122 seconds. Nov 17 16:28:47 huit kernel: INFO: task kworker/10:3:834 blocked for more than 245 seconds. Nov 17 16:28:47 huit kernel: INFO: task wpa_supplicant:887 blocked for more than 122 seconds. Nov 17 16:28:47 huit kernel: INFO: task sway_statusline:1138 blocked for more than 245 seconds. Nov 17 16:28:47 huit kernel: INFO: task ThreadPoolForeg:3973 blocked for more than 245 seconds. Nov 17 16:28:47 huit kernel: INFO: task ThreadPoolForeg:3977 blocked for more than 122 seconds. Nov 17 16:28:47 huit kernel: INFO: task ip:3979 blocked for more than 245 seconds. The worst and the one that should interest the kernel community I assume should be `kworker`. And this is the first kworker hung_task stack trace: Nov 17 16:26:44 huit kernel: INFO: task kworker/10:3:834 blocked for more than 122 seconds. Nov 17 16:26:44 huit kernel: Tainted: G OE 5.3.7-arch1-1-ARCH #1 Nov 17 16:26:44 huit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 17 16:26:44 huit kernel: kworker/10:3 D 0 834 2 0x80004080 Nov 17 16:26:44 huit kernel: Workqueue: events rtl_work_func_t [r8152] Nov 17 16:26:44 huit kernel: Call Trace: Nov 17 16:26:44 huit kernel: ? __schedule+0x27f/0x6d0 Nov 17 16:26:44 huit kernel: schedule+0x43/0xd0 Nov 17 16:26:44 huit kernel: rpm_resume+0x18b/0x790 Nov 17 16:26:44 huit kernel: ? wait_woken+0x70/0x70 Nov 17 16:26:44 huit kernel: rpm_resume+0x302/0x790 Nov 17 16:26:44 huit kernel: __pm_runtime_resume+0x3b/0x60 Nov 17 16:26:44 huit kernel: usb_autopm_get_interface+0x18/0x50 Nov 17 16:26:44 huit kernel: rtl_work_func_t+0x6b/0x290 [r8152] Nov 17 16:26:44 huit kernel: ? __schedule+0x287/0x6d0 Nov 17 16:26:44 huit kernel: process_one_work+0x1d1/0x3a0 Nov 17 16:26:44 huit kernel: worker_thread+0x4a/0x3d0 Nov 17 16:26:44 huit kernel: kthread+0xfb/0x130 Nov 17 16:26:44 huit kernel: ? process_one_work+0x3a0/0x3a0 Nov 17 16:26:44 huit kernel: ? kthread_park+0x80/0x80 Nov 17 16:26:44 huit kernel: ret_from_fork+0x35/0x40 ... and this points to rpm_resume and then __pm_runtime_resume. Still not 100% sure were to best address this bug though in the kernel bugzilla. Please advise.
I find this bug pretty SEVERE (as it can leave you data-loss as you cannot save anymore (e.g. when writing on a website)), hence the Importance P1 high.
I think I may be experiencing the same issue. The repeated part in my journal is very similar: Nov 29 17:53:31 rxps kernel: INFO: task kworker/7:1:172 blocked for more than 122 seconds. Nov 29 17:53:31 rxps kernel: Tainted: G OE 5.3.13-arch1-1 #1 Nov 29 17:53:31 rxps kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 29 17:53:31 rxps kernel: kworker/7:1 D 0 172 2 0x80004000 Nov 29 17:53:31 rxps kernel: Workqueue: events rtl_work_func_t [r8152] Nov 29 17:53:31 rxps kernel: Call Trace: Nov 29 17:53:31 rxps kernel: ? __schedule+0x27f/0x6d0 Nov 29 17:53:31 rxps kernel: schedule+0x43/0xd0 Nov 29 17:53:31 rxps kernel: rpm_resume+0x18b/0x790 Nov 29 17:53:31 rxps kernel: ? wait_woken+0x70/0x70 Nov 29 17:53:31 rxps kernel: rpm_resume+0x302/0x790 Nov 29 17:53:31 rxps kernel: __pm_runtime_resume+0x3b/0x60 Nov 29 17:53:31 rxps kernel: usb_autopm_get_interface+0x18/0x50 Nov 29 17:53:31 rxps kernel: rtl8152_set_mac_address+0x61/0x1c0 [r8152] Nov 29 17:53:31 rxps kernel: set_ethernet_addr+0x83/0x90 [r8152] Nov 29 17:53:31 rxps kernel: rtl8152_reset_resume+0x48/0x60 [r8152] Nov 29 17:53:31 rxps kernel: usb_resume_interface.isra.0+0x46/0xd0 Nov 29 17:53:31 rxps kernel: usb_resume_both+0xe8/0x140 Nov 29 17:53:31 rxps kernel: ? usb_runtime_suspend+0x70/0x70 Nov 29 17:53:31 rxps kernel: __rpm_callback+0x7b/0x130 Nov 29 17:53:31 rxps kernel: rpm_callback+0x4f/0x70 Nov 29 17:53:31 rxps kernel: ? usb_runtime_suspend+0x70/0x70 Nov 29 17:53:31 rxps kernel: rpm_resume+0x591/0x790 Nov 29 17:53:31 rxps kernel: rpm_resume+0x302/0x790 Nov 29 17:53:31 rxps kernel: __pm_runtime_resume+0x3b/0x60 Nov 29 17:53:31 rxps kernel: usb_autopm_get_interface+0x18/0x50 Nov 29 17:53:31 rxps kernel: rtl_work_func_t+0x6b/0x290 [r8152] Nov 29 17:53:31 rxps kernel: ? __schedule+0x287/0x6d0 Nov 29 17:53:31 rxps kernel: process_one_work+0x1d1/0x3a0 Nov 29 17:53:31 rxps kernel: worker_thread+0x4a/0x3d0 Nov 29 17:53:31 rxps kernel: kthread+0xfb/0x130 Nov 29 17:53:31 rxps kernel: ? process_one_work+0x3a0/0x3a0 Nov 29 17:53:31 rxps kernel: ? kthread_park+0x80/0x80 Nov 29 17:53:31 rxps kernel: ret_from_fork+0x1f/0x40 I can probably reproduce with an untainted kernel, if you'd prefer. It's tainted by bbswitch atm. Some basic info: $ sudo dmidecode -s system-product-name XPS 15 9560 $ uname -r 5.3.13-arch1-1 $ lspci | grep Network 02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78) I have a Dell DA200 USB-C to Ethernet adapter I use with this laptop, and I've only experienced the lockup using that adapter. When I unplug the adapter, ksoftirqd will fill 100% cpu, my laptop becomes quickly unusable, and can only be shut down by a hard power-off.
I think this issue is actually a duplicate of #205155.
@ronan, that is actually true. And I am also kind of sorry for this "cross-post"... but (a) I thought my initial bug report was not as concise as it could have been. (b) I got no reaction whatsoever in weeks on a clearly broken driver so I was kind of desperate and honestly did not know what to do other than to try it again. But I made peace with this ever since. So I will close this bug report. Tormen *** This bug has been marked as a duplicate of bug 205155 ***