Bug 214291
Summary: | iwlwifi: AX201: Packet injection crashes system | ||
---|---|---|---|
Product: | Drivers | Reporter: | Iyán (me) |
Component: | network-wireless-intel | Assignee: | Iyán (me) |
Status: | REOPENED --- | ||
Severity: | normal | CC: | golan.ben.ami, linsy_king, ZeroBeat |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.13.13 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg output during iwlwifi crashing
attachment-22035-0.html
Photo of dmesg on a VT when laptop crashed
Error after hcxdumptool crashed
attachment-32274-0.html
Netconsole log during laptop crash
Caps Lock led behaviour after crash
Crash log
New photo of dmesg on a VT when laptop crashed |
Description
Iyán
2021-09-02 15:32:47 UTC
My laptop actually crashed again while I was trying to generate the dmesg log file, so I am providing the output of journalctl -k -b -1 instead. I guess it's almost the same as the output from dmesg directly.

Created attachment 298643 [details]
dmesg output during iwlwifi crashing
This could be related to this issue: https://bbs.archlinux.org/viewtopic.php?id=254766 and this one: https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5936552.html "Title: Intel AX210 iwlwifi firmware crash under stress tests: Microcode SW error detected. Restarting 0x0." And for sure, running hcxdumptool/hcxlabtool is a stress test for every driver!

Did you change the kernel in any way? If I try to switch sniffer to channel 88, I get kernel reports: Channel is disabled command failed: Invalid argument (-22)

This 0x200014FC | ADVANCED_SYSASSERT indicates that the channel configured is not a legal channel for this BW. This assert is there to protect against regulatory violations.

@ Golan Ben Ami Thanks for that information, which is useful. hcxdumptool scans the whole frequency range (depending on the band) by ioctl(SIOCSIWFREQ). If the result is not -1, I assume it is a valid frequency. The regulatory domain is respected and it is working fine on several drivers, e.g. on mt76x0u:
$ sudo hcxdumptool -i wlp39s0f3u1u1u1 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels: 1 / 2412MHz (14 dBm) 2 / 2417MHz (14 dBm) 3 / 2422MHz (14 dBm) 4 / 2427MHz (14 dBm) 5 / 2432MHz (14 dBm) 6 / 2437MHz (14 dBm) 7 / 2442MHz (14 dBm) 8 / 2447MHz (14 dBm) 9 / 2452MHz (14 dBm) 10 / 2457MHz (14 dBm) 11 / 2462MHz (14 dBm) 12 / 2467MHz (14 dBm) 13 / 2472MHz (14 dBm) 36 / 5180MHz (17 dBm) 40 / 5200MHz (17 dBm) 44 / 5220MHz (17 dBm) 48 / 5240MHz (17 dBm) 52 / 5260MHz (17 dBm) 56 / 5280MHz (17 dBm) 60 / 5300MHz (17 dBm) 64 / 5320MHz (17 dBm) 100 / 5500MHz (17 dBm) 104 / 5520MHz (17 dBm) 108 / 5540MHz (17 dBm) 112 / 5560MHz (17 dBm) 116 / 5580MHz (17 dBm) 120 / 5600MHz (17 dBm) 124 / 5620MHz (17 dBm) 128 / 5640MHz (17 dBm) 132 / 5660MHz (17 dBm) 136 / 5680MHz (17 dBm) 140 / 5700MHz (17 dBm) 144 / 5720MHz (17 dBm) 149 / 5745MHz (17 dBm) 153 / 5765MHz (17 dBm) 157 / 5785MHz (17 dBm) 161 / 5805MHz (17 dBm) 165 / 5825MHz (17 dBm) 169 / 5845MHz (17 dBm) 173 / 5865MHz (17 dBm)

or rt2800usb
$ sudo hcxdumptool -i wlp39s0f3u1u4 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels: 1 / 2412MHz (20 dBm) 2 / 2417MHz (20 dBm) 3 / 2422MHz (20 dBm) 4 / 2427MHz (20 dBm) 5 / 2432MHz (20 dBm) 6 / 2437MHz (20 dBm) 7 / 2442MHz (20 dBm) 8 / 2447MHz (20 dBm) 9 / 2452MHz (20 dBm) 10 / 2457MHz (20 dBm) 11 / 2462MHz (20 dBm) 12 / 2467MHz (20 dBm) 13 / 2472MHz (20 dBm) 36 / 5180MHz (30 dBm) 38 / 5190MHz (30 dBm) 40 / 5200MHz (30 dBm) 42 / 5210MHz (30 dBm) 44 / 5220MHz (30 dBm) 46 / 5230MHz (30 dBm) 48 / 5240MHz (30 dBm) 50 / 5250MHz (24 dBm) 52 / 5260MHz (24 dBm) 54 / 5270MHz (24 dBm) 56 / 5280MHz (24 dBm) 58 / 5290MHz (24 dBm) 60 / 5300MHz (24 dBm) 62 / 5310MHz (24 dBm) 64 / 5320MHz (24 dBm) 100 / 5500MHz (24 dBm) 102 / 5510MHz (24 dBm) 104 / 5520MHz (24 dBm) 106 / 5530MHz (24 dBm) 108 / 5540MHz (24 dBm) 110 / 5550MHz (24 dBm) 112 / 5560MHz (24 dBm) 114 / 5570MHz (24 dBm) 116 / 5580MHz (24 dBm) 118 / 5590MHz (24 dBm) 120 / 5600MHz (24 dBm) 122 / 5610MHz (24 dBm) 124 / 5620MHz (24 dBm) 126 / 5630MHz (24 dBm) 128 / 5640MHz (24 dBm) 130 / 5650MHz (24 dBm) 132 / 5660MHz (24 dBm) 134 / 5670MHz (24 dBm) 136 / 5680MHz (24 dBm) 138 / 5690MHz (24 dBm) 140 / 5700MHz (24 dBm) 149 / 5745MHz (30 dBm) 151 / 5755MHz (30 dBm) 153 / 5765MHz (30 dBm) 155 / 5775MHz (30 dBm) 157 / 5785MHz (30 dBm) 159 / 5795MHz (30 dBm) 161 / 5805MHz (30 dBm) 165 / 5825MHz (30 dBm) terminating...

but not on the Intel driver.

or rtl8192cu
$ sudo hcxdumptool -i wlp39s0f3u1u4 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels: 1 / 2412MHz (20 dBm) 2 / 2417MHz (20 dBm) 3 / 2422MHz (20 dBm) 4 / 2427MHz (20 dBm) 5 / 2432MHz (20 dBm) 6 / 2437MHz (20 dBm) 7 / 2442MHz (20 dBm) 8 / 2447MHz (20 dBm) 9 / 2452MHz (20 dBm) 10 / 2457MHz (20 dBm) 11 / 2462MHz (20 dBm) 12 / 2467MHz (20 dBm) 13 / 2472MHz (20 dBm) terminating...

It looks like the Intel driver handles ioctl(SIOCSIWFREQ) a little differently from the other drivers, but unfortunately I have no Intel chipset here to test that.
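The probing approach described above (try each candidate frequency, keep it if the call does not return -1) can be sketched roughly as follows. This is not hcxdumptool's code: `set_freq_fn`, `collect_usable` and `fake_set_freq_2ghz` are hypothetical names, and the fake callback only stands in for a wrapper around ioctl(SIOCSIWFREQ) so that the loop can be shown without hardware. Its accepted range is an assumption chosen to resemble the rtl8192cu listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: tries to tune to `mhz` and returns -1 on rejection,
 * mirroring the "-1 means invalid" convention described in the thread.
 * A real implementation would wrap ioctl(SIOCSIWFREQ). */
typedef int (*set_freq_fn)(int mhz, void *ctx);

/* Copies every candidate frequency that the callback accepts into
 * usable_mhz (which must have room for n entries) and returns the count. */
size_t collect_usable(const int *candidates_mhz, size_t n,
                      set_freq_fn set_freq, void *ctx, int *usable_mhz)
{
    assert(candidates_mhz != NULL && set_freq != NULL && usable_mhz != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (set_freq(candidates_mhz[i], ctx) != -1)
            usable_mhz[kept++] = candidates_mhz[i];
    }
    return kept;
}

/* Illustrative stand-in for a 2.4 GHz-only device (assumption, not a
 * real driver): accepts 2412..2472 MHz and rejects everything else. */
int fake_set_freq_2ghz(int mhz, void *ctx)
{
    (void)ctx; /* unused in this fake */
    return (mhz >= 2412 && mhz <= 2472) ? 0 : -1;
}
```

The scheme relies on the driver rejecting a frequency it cannot use. If a driver accepts the request and the firmware objects only afterwards, the return value alone would not reveal the problem.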
or ath9k_htc
$ sudo hcxdumptool -i wlp39s0f3u1u4 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels: 1 / 2412MHz (20 dBm) 2 / 2417MHz (20 dBm) 3 / 2422MHz (20 dBm) 4 / 2427MHz (20 dBm) 5 / 2432MHz (20 dBm) 6 / 2437MHz (20 dBm) 7 / 2442MHz (20 dBm) 8 / 2447MHz (20 dBm) 9 / 2452MHz (20 dBm) 10 / 2457MHz (20 dBm) 11 / 2462MHz (20 dBm) 12 / 2467MHz (20 dBm) 13 / 2472MHz (20 dBm) 14 / 2484MHz (20 dBm) terminating...

Don't forget that I'm able to completely crash my system. No matter what (e.g. setting an illegal channel), I don't think a wireless driver should ever crash a system so that hard rebooting is the only option.

(In reply to Iyán from comment #10)
> Don't forget that I'm able to completely crash my system. No matter what
> (e.g. setting an illegal channel), I don't think a wireless driver should
> ever crash a system so that hard rebooting is the only option.

Yeah I saw that, though unfortunately you haven't been able to provide the dump.

Could you run the tool that causes this from the VT (ctrl-alt-f1/f2/f3/f4 etc until you see a login prompt, then log in there) and take a picture when it crashes? The logs you've provided are all cut off before the crash.

What I've seen in the logs so far is some regulatory checking in the firmware, which is odd because it's going to channel 14 and 88 etc. but that's not really the cause of the crashes, I think. It's not supposed to be possible *either*, but perhaps not our first priority here.

Created attachment 299809 [details]
attachment-22035-0.html
Hi,
Thank you for your email.
I'm OOO today 1/12/21.
Available by WhatsApp for urgent issues.
Thanks,
Golan
(In reply to Johannes Berg from comment #11)
> (In reply to Iyán from comment #10)
> > Don't forget that I'm able to completely crash my system. No matter what
> > (e.g. setting an illegal channel), I don't think a wireless driver should
> > ever crash a system so that hard rebooting is the only option.
>
> Yeah I saw that, though unfortunately you haven't been able to provide the
> dump.
>
> Could you run the tool that causes this from the VT (ctrl-alt-f1/f2/f3/f4
> etc until you see a login prompt, then log in there) and take a picture when
> it crashes? The logs you've provided are all cut off before the crash.

So I logged into two VTs. In the first one I ran hcxdumptool -i wlp39s0f3u1u4 --do_rcascan, while in the second I left dmesg -w running. I attach a photo of the second terminal taken when the laptop crashed. The first one doesn't show anything useful. The computer does not crash every time. Sometimes, hcxdumptool terminates after reaching 100 driver issues.

> What I've seen in the logs so far is some regulatory checking in the
> firmware, which is odd because it's going to channel 14 and 88 etc. but
> that's not really the cause of the crashes, I think. It's not supposed to be
> possible *either*, but perhaps not our first priority here.

I have crda and wireless-regdb installed, and WIRELESS_REGDOM=CH configured in /etc/conf.d/wireless-regdom, just in case it's relevant.

Created attachment 299819 [details]
Photo of dmesg on a VT when laptop crashed
Okay, after a few more tries, I was able to crash the laptop again while keeping the hcxdumptool VT open. I attach another photo with the error that I got there.

Created attachment 299821 [details]
Error after hcxdumptool crashed
Thanks for all your efforts!

That's unfortunately only half the information, I don't know what's going on, why did part of the screen go blank?!

If you have wired network on this system, it might be worth trying to configure netconsole (https://www.kernel.org/doc/html/latest/networking/netconsole.html) and send all the debug data to another machine on the local network, but failing that I guess I should somehow try to replicate your setup.

Created attachment 299831 [details]
attachment-32274-0.html
Hi,
Thank you for your email.
I'm OOO today 2/12/21.
Available by WhatsApp for urgent issues.
Thanks,
Golan
(In reply to Johannes Berg from comment #17)
> Thanks for all your efforts!
>
> That's unfortunately only half the information, I don't know what's going
> on, why did part of the screen go blank?!

I don't know. Only the bottom was showing some messages.

> If you have wired network on this system, it might be worth trying to
> configure netconsole
> (https://www.kernel.org/doc/html/latest/networking/netconsole.html) and send
> all the debug data to another machine on the local network, but failing that
> I guess I should somehow try to replicate your setup.

I think I can try that, but it will have to wait till this evening/night. I will write here if I get any additional logs.

Thanks,
Iyán

Created attachment 299857 [details]
Netconsole log during laptop crash
I attached the log from netconsole. Hope it helps. If not, please let me know what else I could try on my side.

Thanks,
Iyán

Created attachment 299859 [details]
Caps Lock led behaviour after crash
This is a video of the weird behavior of the Caps Lock LED after a system crash. I couldn't find any information in the Lenovo documentation about what this means. Probably not relevant here, but just in case.
Any updates about this? Could I provide anything else that would be useful to debug this?

Looking at this now ... strangely even the netconsole doesn't contain any indication that the system actually *crashed*?

Are you in X/wayland when this happens? Maybe you can make it happen from the console (ctrl-alt-f2/f3/... or so) and then take a picture when the crash occurs on the screen, there it should normally be printed. Still strange that netconsole has nothing though.

The firmware crashing is kind of maybe normal - you're trying to transmit on channels that it's not allowed to transmit on. We should probably prevent that properly though in the driver, not sure why that's going through.

Good to see you have found the time to look at it. To get all frequencies supported by the device I use ioctl(SIOCSIWFREQ) https://github.com/ZerBea/hcxdumptool/blob/master/hcxdumptool.c#L7380 and check the return values of ioctl(SIOCGIWFREQ) to get frequency and exponent: pwrq.u.freq.m pwrq.u.freq.e https://github.com/torvalds/linux/blob/master/include/uapi/linux/wireless.h#L706 Looks like this procedure crashed the driver. BTW: 4 days ago I received another issue report regarding the driver. Maybe related to this one, too: https://github.com/ZerBea/hcxdumptool/issues/245

I noticed a similar behavior on some Realtek drivers, too. Looks like they dropped support for some ioctl() calls in favor of NETLINK. In this case it is mandatory to set promiscuous by NETLINK instead of ioctl(SIOCSIWMODE), otherwise the device is not initialized correctly. NETLINK is fine, but unfortunately very much asynchronous. We need purely synchronous ioctl() system calls.

Let's not mix this up - netlink/ioctl are basically equivalent; for it to be synchronous you just need to wait for the ACK messages.
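As an illustration of the point about ACKs: when a netlink request is sent with the NLM_F_ACK flag, the kernel answers with an NLMSG_ERROR message whose error field is 0 on success or a negative errno on failure. A caller that keeps reading until it has seen that message for its sequence number gets ioctl-like synchronous behaviour. The sketch below only shows the check on an already received buffer; `nl_check_ack` and the `nl_ack_result` enum are hypothetical names, not part of libnl or hcxdumptool, and sending the request and calling recv() are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical result codes for this sketch. */
enum nl_ack_result {
    NL_ACK_OK = 0,      /* kernel acknowledged the request (error == 0) */
    NL_ACK_ERROR = 1,   /* kernel rejected the request, see *kernel_errno */
    NL_ACK_PENDING = 2  /* no answer for this sequence number yet */
};

/* Walks the messages in a received buffer and looks for the NLMSG_ERROR
 * reply that belongs to sequence number `seq`. On NL_ACK_ERROR the positive
 * errno value (e.g. 22 for EINVAL) is stored in *kernel_errno. */
enum nl_ack_result nl_check_ack(const struct nlmsghdr *nlh, int len,
                                unsigned int seq, int *kernel_errno)
{
    assert(nlh != NULL && kernel_errno != NULL);
    *kernel_errno = 0;

    for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_seq != seq || nlh->nlmsg_type != NLMSG_ERROR)
            continue;
        if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
            continue; /* truncated error message, ignore it */

        const struct nlmsgerr *e = NLMSG_DATA(nlh);
        *kernel_errno = -e->error;
        return e->error == 0 ? NL_ACK_OK : NL_ACK_ERROR;
    }
    return NL_ACK_PENDING;
}
```

A caller would loop on recv() and this check until the result is no longer NL_ACK_PENDING before issuing the next command.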
The wext ioctls are in fact deprecated - I suggest you look at replacing them very soon, new generation hardware will likely not support them any more: https://patchwork.kernel.org/project/linux-wireless/list/?series=692204&state=%2A&archive=both

Anyway - my point was that I don't see a kernel *crash* here, even in the netconsole. Are you saying "kernel crashed" because there's a stack dump (which really came from a warning)? OTOH, the caps lock LED flashing _would_ have been a real kernel crash. So I'm not sure what's going on, because apart from the LED flashing video I see no evidence of a real crash, even in the netconsole.

Like I said in comment #24, maybe you can make it happen from the console?

Created attachment 303535 [details]
Crash log
This is the crash log by `journalctl -b-1 -p3`.
My device is Intel Corporation Wi-Fi 6 AX201 160MHz.
Thanks for your explanation. Well, you're in good company - I'm not sure what's going on, either. Unfortunately I can't test it, because I only have "wext friendly" devices (mt76 and rt2x00). But I asked the last user who reported this to join our conversation here and to provide his log files. Maybe we will get some additional information. Aircrack-ng uses libnl and doesn't seem to have these problems, which is why my first thought was that it is related to ioctl(). https://github.com/aircrack-ng/aircrack-ng/issues/2274 BTW: Thank you for your recommendation. Currently I'm testing NETLINK communication (without using libnl).

(In reply to Johannes Berg from comment #24)
> Looking at this now ... strangely even the netconsole doesn't contain any
> indication that the system actually *crashed*?
>
> Are you in X/wayland when this happens? maybe you can make it happen from
> the console (ctrl-alt-f2/f3/... or so) and then take a picture when the
> crash occurs on the screen, there it should normally be printed. Still
> strange that netconsole has nothing though.

No, I was not using X/Wayland when it happened. I was running hcxdumptool in VT 3 and dmesg -w in VT 4, as was suggested in previous comments.

> The firmware crashing is kind of maybe normal - you're trying to transmit on
> channels that it's not allowed to transmit on. We should probably prevent
> that properly though in the driver, not sure why that's going through.

I tried to replicate the issue again with the same hardware but with the latest kernel, firmware and BIOS versions, and I can still obtain the total crash with the caps lock key LED blinking. This time the VT running dmesg didn't go half empty. I attach a new photo.

Created attachment 303536 [details]
New photo of dmesg on a VT when laptop crashed
The lines that cannot be read are not a photo artifact, they were shown exactly like that on my laptop when it crashed.

Yeah looks like something got corrupted - but still great! Could you attach iwlmvm.ko from the crash? Might make it a bit easier.

@Johannes Berg As suggested in your comment: https://bugzilla.kernel.org/show_bug.cgi?id=214291#c27 I removed WEXT completely. In addition to that, hcxdumptool will now get the supported frequencies via NL80211_ATTR_WIPHY_BANDS. It also takes the wireless regulatory domain settings into account and skips disabled channels. @ Iyán I don't have an Intel device to test. Please try the latest git head of hcxdumptool (NL80211 version) and let's figure out if the driver still crashes.

I can reproduce the issue with the latest hcxdumptool. Exactly the same behavior as originally described. @Johannes Berg: sorry, I missed your previous comment. What file exactly do you need?

By the way, I still can't get anything useful from a netconsole.

I have provided all the data you have asked for (except the last comment): versions, screenshots, even a video... you cannot close this bug saying "not enough data". If Intel has no interest in fixing/supporting monitor and injection mode in iwlwifi, please state that clearly so I don't waste more time reporting bugs. |