Bug 204141 - iwlwifi - kernel BUG at lib/list_debug.c:54 - WIFI-28668
Summary: iwlwifi - kernel BUG at lib/list_debug.c:54 - WIFI-28668
Status: ASSIGNED
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 normal
Assignee: Intel Linux Wireless
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-11 15:33 UTC by Tom Seewald
Modified: 2019-08-26 21:32 UTC
CC List: 6 users

See Also:
Kernel Version: 5.1.16-300.fc30.x86_64
Tree: Mainline
Regression: No


Attachments
Full journalctl output for full context (926.10 KB, text/plain)
2019-07-11 15:33 UTC, Tom Seewald
lspci -vvv output (39.28 KB, text/plain)
2019-07-11 15:34 UTC, Tom Seewald

Description Tom Seewald 2019-07-11 15:33:45 UTC
Created attachment 283627
Full journalctl output for full context

Laptop Model: Dell Latitude 7490
Wifi Card: Intel 8265
Distribution: Fedora 30

Problem: A kernel BUG (lib/list_debug.c:54) is triggered at seemingly random times after AP association. Afterwards the laptop misbehaves and can ultimately only be shut down by holding the power button.

For example, the laptop does not fully shut down without physical intervention, and random applications hang completely and cannot be killed.

See journalctl.txt for the full context of the error.

Let me know if you need any additional information.
Comment 1 Tom Seewald 2019-07-11 15:34:19 UTC
Created attachment 283629
lspci -vvv output
Comment 2 Luca Coelho 2019-07-12 10:08:36 UTC
Thanks for reporting.  I have created an internal ticket to track this and will have someone look into this issue ASAP.

This was also discussed in this thread:

https://lkml.org/lkml/2019/5/30/723
Comment 3 Tom Seewald 2019-07-12 14:47:31 UTC
I am unsure if this is relevant, but in my case at least, this has so far occurred only on a network using WPA2-EAP.
Comment 4 Georg Müller 2019-07-15 13:50:44 UTC
There is also a bug report in the Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1717115

Looking at the source code, I think a list is being modified without holding a lock.

The change was introduced here:
iwlwifi: mvm: support mac80211 TXQs model
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cfbc6c4c5b91c7725ef14465b98ac347d31f2334

In the patch, there is one list_del_init (the one which causes the oops) in iwl_mvm_add_new_dqa_stream_wk(), and one list_add_tail() in iwl_mvm_mac_wake_tx_queue().

They do not share the same lock. There is a mutex in iwl_mvm_add_new_dqa_stream_wk(), but nothing in iwl_mvm_mac_wake_tx_queue(). Maybe it would help to use the mutex here or - if this is too expensive - introduce a spin lock for this list?

I am just guessing, but the list_add_tail() looks like the only thing not guarded by the mutex.
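
To illustrate the idea of putting both list operations under one lock, here is a minimal userspace sketch in C. It is not the iwlwifi code: struct txq, wake_queue(), service_one(), hammer() and pending_lock are hypothetical stand-ins, the list helpers only mimic list_add_tail() and list_del_init(), and a pthread mutex takes the place of the kernel lock. It assumes the add path skips entries that are already queued.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

/* Append n before head, like list_add_tail(). */
static void list_push_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and re-initialise it, like list_del_init(). */
static void list_remove_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

#define NUM_QUEUES 8
#define NUM_THREADS 4
#define ROUNDS 20000

struct txq { struct list_node node; long serviced; };  /* hypothetical */

static struct list_node pending;
static struct txq queues[NUM_QUEUES];
static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of the list_add_tail() path; here it takes the lock. */
static void wake_queue(struct txq *q)
{
    pthread_mutex_lock(&pending_lock);
    if (list_is_empty(&q->node))          /* not already queued */
        list_push_tail(&q->node, &pending);
    pthread_mutex_unlock(&pending_lock);
}

/* Plays the role of the list_del_init() path; returns 1 if an entry was serviced. */
static int service_one(void)
{
    struct txq *q = NULL;

    pthread_mutex_lock(&pending_lock);
    if (!list_is_empty(&pending)) {
        q = (struct txq *)((char *)pending.next - offsetof(struct txq, node));
        list_remove_init(&q->node);
        q->serviced++;
    }
    pthread_mutex_unlock(&pending_lock);
    return q != NULL;
}

/* Each thread exercises both paths so that adds and removals interleave. */
static void *hammer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        wake_queue(&queues[i % NUM_QUEUES]);
        service_one();
    }
    return NULL;
}

/* Runs the threads, drains the list, and returns how many entries were serviced. */
long run_demo(void)
{
    pthread_t threads[NUM_THREADS];
    long total = 0;

    list_init(&pending);
    for (int i = 0; i < NUM_QUEUES; i++) {
        list_init(&queues[i].node);
        queues[i].serviced = 0;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, hammer, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    while (service_one())
        ;                                 /* drain whatever is left */
    assert(list_is_empty(&pending));
    for (int i = 0; i < NUM_QUEUES; i++) {
        assert(list_is_empty(&queues[i].node));
        total += queues[i].serviced;
    }
    return total;
}
```

Calling run_demo() from a small main() and building with `cc -pthread` should finish with an empty, consistent list. If the lock is dropped from wake_queue() only, adds and removals are no longer serialised, which is the kind of situation described above. In the driver itself, the choice between the existing mutex and a new spinlock would depend on which contexts the add path can run in.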
Comment 5 Tom Seewald 2019-08-13 06:02:40 UTC
Let me know if there is any additional information/logs needed, or if there are patches you'd like us to test.
