Bug 204141
| Summary: | iwlwifi - kernel BUG at lib/list_debug.c:54 - WIFI-28668 | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Tom Seewald (tseewald) |
| Component: | network-wireless | Assignee: | DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | chrzaszc, dev, georgmueller, luca, skyler, tomi |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | 5.1.16-300.fc30.x86_64 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Full journalctl output for full context; lspci -vvv output | | |
Created attachment 283629: lspci -vvv output
Thanks for reporting. I have created an internal ticket to track this and will have someone look into this issue ASAP.

This was also discussed in this thread: https://lkml.org/lkml/2019/5/30/723

I am unsure if this is relevant, but in my case at least, this has so far occurred only on a network using WPA2-EAP.

There is also a bug report in redhat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1717115

With a look at the source code, I think there is a modification of the list without a lock. The change was introduced here:

iwlwifi: mvm: support mac80211 TXQs model
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cfbc6c4c5b91c7725ef14465b98ac347d31f2334

In the patch, there is one list_del_init (the one which causes the oops) in iwl_mvm_add_new_dqa_stream_wk(), and one list_add_tail() in iwl_mvm_mac_wake_tx_queue(). They do not share the same lock. There is a mutex in iwl_mvm_add_new_dqa_stream_wk(), but nothing in iwl_mvm_mac_wake_tx_queue().

Maybe it would help to use the mutex here or - if this is too expensive - introduce a spin lock for this list? I am just guessing, but the list_add_tail() looks like the only thing not guarded by the mutex.

Let me know if there is any additional information/logs needed, or if there are patches you'd like us to test.

I haven't seen this bug in several months so I'm going to assume it's been fixed, but I don't know what fixed it. I'll set this as resolved for now.

I also haven't seen it in several months. There must have been a patch that fixed it either directly or indirectly.

Awesome! Thanks for reporting.
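For illustration, the locking idea suggested in the comments (take the same lock around both the list_add_tail() and the list_del_init()) might look roughly like the userspace sketch below. This is not the driver code and not the patch that eventually fixed the bug, which this thread never identifies. It assumes a pthread mutex standing in for a kernel mutex or spin lock, and every name in it (txq_node, pending, wake_tx_queue, drain_pending, and the small list helpers) is hypothetical.

```c
/* Userspace sketch only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_entry(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init_entry(struct list_head *n)
{
    /* Sanity check in the spirit of the kernel's list debugging:
     * both neighbours must still point back at this entry. */
    assert(n->prev->next == n && n->next->prev == n);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-in for a TX queue waiting to be set up. */
struct txq_node { struct list_head link; int id; };

static void txq_init(struct txq_node *q, int id)
{
    q->id = id;
    list_init(&q->link);
}

/* The shared list and the single lock that guards every access to it. */
static struct list_head pending = { &pending, &pending };
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Producer side: plays the role of the list_add_tail() caller. */
static void wake_tx_queue(struct txq_node *q)
{
    pthread_mutex_lock(&pending_lock);
    if (list_is_empty(&q->link))        /* not queued yet */
        list_add_tail_entry(&q->link, &pending);
    pthread_mutex_unlock(&pending_lock);
}

/* Consumer side: plays the role of the worker doing list_del_init().
 * Returns how many entries were removed. */
static int drain_pending(void)
{
    int removed = 0;

    pthread_mutex_lock(&pending_lock);
    while (!list_is_empty(&pending)) {
        list_del_init_entry(pending.next);
        removed++;
    }
    pthread_mutex_unlock(&pending_lock);
    return removed;
}
```

Calling wake_tx_queue() twice on the same node queues it only once, and drain_pending() leaves every removed node self-linked again. Because both paths take pending_lock, an add from one thread cannot interleave with a delete from another. In the kernel, the choice between a mutex and a spin lock would depend on whether the adding path is allowed to sleep, which this sketch does not model.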
Created attachment 283627: Full journalctl output for full context

Laptop Model: Dell Latitude 7490
Wifi Card: Intel 8265
Distribution: Fedora 30

Problem: A kernel bug appears to be randomly triggered after AP association. It causes the laptop to misbehave and ultimately requires a forced shutdown by holding the power button down. For example, the laptop does not fully shut down without physical intervention, and random applications hang completely and cannot be killed.

See journalctl.txt for the full context of the error. Let me know if you need any additional information.