Bug 214291

Summary: iwlwifi: AX201: Packet injection crashes system
Product: Drivers
Reporter: Iyán (me)
Component: network-wireless-intel
Assignee: Iyán (me)
Status: REOPENED
Severity: normal
CC: golan.ben.ami, linsy_king, ZeroBeat
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.13.13
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output during iwlwifi crashing
attachment-22035-0.html
Photo of dmesg on a VT when laptop crashed
Error after hcxdumptool crashed
attachment-32274-0.html
Netconsole log during laptop crash
Caps Lock led behaviour after crash
Crash log
New photo of dmesg on a VT when laptop crashed

Description Iyán 2021-09-02 15:32:47 UTC
I opened an issue at hcxdumptool [1] github repo since I didn't know where to begin, but after some discussion I was told to open a bug report here.

There seems to be an issue with iwlwifi and packet injection, at least with the AX201 chipset (I don't have any others to try). dmesg shows plenty of errors when running aireplay-ng --test or hcxdumptool --check_injection. Injection is broken, and in many tests I cannot deauth clients that are literally next to my laptop. I was even able to completely crash the system (I had to force a reboot with the power button) by running hcxdumptool -s 1 --do_rcascan. The laptop froze and kept the Caps Lock LED blinking; I checked the ThinkPad manual and contacted their support, but they couldn't tell me what it means.

I will attach a full dmesg file running hcxdumptool --check_driver, hcxdumptool --check_injection and hcxdumptool --do_rcascan.

Please, since this is my first bug report here, let me know anything else that I could provide from my part that may be helpful to debug this issue.

[1] https://github.com/ZerBea/hcxdumptool/issues/186
Comment 1 Iyán 2021-09-02 15:44:51 UTC
My laptop actually crashed again while trying to generate the dmesg log file, so I provide here the output of journalctl -k -b -1. I guess it's almost the same as from dmesg directly.
Comment 2 Iyán 2021-09-02 15:45:44 UTC
Created attachment 298643 [details]
dmesg output during iwlwifi crashing
Comment 3 Michael 2021-09-02 16:52:36 UTC
Could be related to this issue:
https://bbs.archlinux.org/viewtopic.php?id=254766
Comment 4 Michael 2021-09-02 16:54:34 UTC
and this one:
https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5936552.html
"Title:
  Intel AX210 iwlwifi firmware crash under stress tests: Microcode SW
  error detected. Restarting 0x0."

And for sure, running hcxdumptool/hcxlabtool is a stress test for every driver!
Comment 5 Johannes Berg 2021-11-30 14:37:18 UTC
Did you change the kernel in any way? If I try to switch sniffer to channel 88, I get

kernel reports: Channel is disabled
command failed: Invalid argument (-22)
Comment 6 Golan Ben Ami 2021-11-30 17:39:48 UTC
This 0x200014FC | ADVANCED_SYSASSERT indicates that the channel configured is not a legal channel for this BW.
This assert is here to guard against regulatory violations.
Comment 7 Michael 2021-11-30 19:46:30 UTC
@ Golan Ben Ami
Thanks for that information, which is useful.
hcxdumptool scans the whole frequency range (depending on the band) via ioctl(SIOCSIWFREQ). If the result is not -1, I assume it is a valid frequency.

The regulatory domain is respected and it is working fine on several drivers, e.g. on mt76x0u:
$ sudo hcxdumptool -i wlp39s0f3u1u1u1 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels:
  1 / 2412MHz (14 dBm)
  2 / 2417MHz (14 dBm)
  3 / 2422MHz (14 dBm)
  4 / 2427MHz (14 dBm)
  5 / 2432MHz (14 dBm)
  6 / 2437MHz (14 dBm)
  7 / 2442MHz (14 dBm)
  8 / 2447MHz (14 dBm)
  9 / 2452MHz (14 dBm)
 10 / 2457MHz (14 dBm)
 11 / 2462MHz (14 dBm)
 12 / 2467MHz (14 dBm)
 13 / 2472MHz (14 dBm)
 36 / 5180MHz (17 dBm)
 40 / 5200MHz (17 dBm)
 44 / 5220MHz (17 dBm)
 48 / 5240MHz (17 dBm)
 52 / 5260MHz (17 dBm)
 56 / 5280MHz (17 dBm)
 60 / 5300MHz (17 dBm)
 64 / 5320MHz (17 dBm)
100 / 5500MHz (17 dBm)
104 / 5520MHz (17 dBm)
108 / 5540MHz (17 dBm)
112 / 5560MHz (17 dBm)
116 / 5580MHz (17 dBm)
120 / 5600MHz (17 dBm)
124 / 5620MHz (17 dBm)
128 / 5640MHz (17 dBm)
132 / 5660MHz (17 dBm)
136 / 5680MHz (17 dBm)
140 / 5700MHz (17 dBm)
144 / 5720MHz (17 dBm)
149 / 5745MHz (17 dBm)
153 / 5765MHz (17 dBm)
157 / 5785MHz (17 dBm)
161 / 5805MHz (17 dBm)
165 / 5825MHz (17 dBm)
169 / 5845MHz (17 dBm)
173 / 5865MHz (17 dBm)


or rt2800usb
$ sudo hcxdumptool -i wlp39s0f3u1u4 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels:
  1 / 2412MHz (20 dBm)
  2 / 2417MHz (20 dBm)
  3 / 2422MHz (20 dBm)
  4 / 2427MHz (20 dBm)
  5 / 2432MHz (20 dBm)
  6 / 2437MHz (20 dBm)
  7 / 2442MHz (20 dBm)
  8 / 2447MHz (20 dBm)
  9 / 2452MHz (20 dBm)
 10 / 2457MHz (20 dBm)
 11 / 2462MHz (20 dBm)
 12 / 2467MHz (20 dBm)
 13 / 2472MHz (20 dBm)
 36 / 5180MHz (30 dBm)
 38 / 5190MHz (30 dBm)
 40 / 5200MHz (30 dBm)
 42 / 5210MHz (30 dBm)
 44 / 5220MHz (30 dBm)
 46 / 5230MHz (30 dBm)
 48 / 5240MHz (30 dBm)
 50 / 5250MHz (24 dBm)
 52 / 5260MHz (24 dBm)
 54 / 5270MHz (24 dBm)
 56 / 5280MHz (24 dBm)
 58 / 5290MHz (24 dBm)
 60 / 5300MHz (24 dBm)
 62 / 5310MHz (24 dBm)
 64 / 5320MHz (24 dBm)
100 / 5500MHz (24 dBm)
102 / 5510MHz (24 dBm)
104 / 5520MHz (24 dBm)
106 / 5530MHz (24 dBm)
108 / 5540MHz (24 dBm)
110 / 5550MHz (24 dBm)
112 / 5560MHz (24 dBm)
114 / 5570MHz (24 dBm)
116 / 5580MHz (24 dBm)
118 / 5590MHz (24 dBm)
120 / 5600MHz (24 dBm)
122 / 5610MHz (24 dBm)
124 / 5620MHz (24 dBm)
126 / 5630MHz (24 dBm)
128 / 5640MHz (24 dBm)
130 / 5650MHz (24 dBm)
132 / 5660MHz (24 dBm)
134 / 5670MHz (24 dBm)
136 / 5680MHz (24 dBm)
138 / 5690MHz (24 dBm)
140 / 5700MHz (24 dBm)
149 / 5745MHz (30 dBm)
151 / 5755MHz (30 dBm)
153 / 5765MHz (30 dBm)
155 / 5775MHz (30 dBm)
157 / 5785MHz (30 dBm)
159 / 5795MHz (30 dBm)
161 / 5805MHz (30 dBm)
165 / 5825MHz (30 dBm)

terminating...

but not on the Intel driver.
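
For reference, the channel/frequency pairs in the listings above follow the usual IEEE 802.11 numbering for the 2.4 GHz and 5 GHz bands. The following is only a minimal sketch of that mapping; the function name `channel_to_mhz` is made up for illustration and is not part of hcxdumptool. Whether a given channel is actually enabled depends on the regulatory domain and the driver, which this sketch does not model.

```python
def channel_to_mhz(channel: int) -> int:
    """Map an 802.11 channel number to its center frequency in MHz.

    Covers only the 2.4 GHz and 5 GHz numbering seen in the listings.
    """
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:
        return 2484  # channel 14 does not follow the 5 MHz spacing
    if 32 <= channel <= 177:
        return 5000 + 5 * channel
    raise ValueError(f"channel {channel} is outside the ranges handled here")


if __name__ == "__main__":
    # Channels 14 and 88 are the ones mentioned in the firmware logs.
    for ch in (1, 13, 14, 36, 88, 173):
        print(f"{ch:3d} / {channel_to_mhz(ch)}MHz")
```

By this numbering, channel 88 would sit at 5440 MHz, which does not appear in any of the listings above.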
Comment 8 Michael 2021-11-30 19:52:58 UTC
or rtl8192cu
$ sudo hcxdumptool -i wlp39s0f3u1u4 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels:
  1 / 2412MHz (20 dBm)
  2 / 2417MHz (20 dBm)
  3 / 2422MHz (20 dBm)
  4 / 2427MHz (20 dBm)
  5 / 2432MHz (20 dBm)
  6 / 2437MHz (20 dBm)
  7 / 2442MHz (20 dBm)
  8 / 2447MHz (20 dBm)
  9 / 2452MHz (20 dBm)
 10 / 2457MHz (20 dBm)
 11 / 2462MHz (20 dBm)
 12 / 2467MHz (20 dBm)
 13 / 2472MHz (20 dBm)

terminating...

It looks like the Intel driver handles ioctl(SIOCSIWFREQ) a little differently from the other drivers, but unfortunately I have no Intel chipset here to test that.
Comment 9 Michael 2021-11-30 19:56:41 UTC
or ath9k_htc

$ sudo hcxdumptool -i wlp39s0f3u1u4 -C
initialization of hcxdumptool 6.2.4-62-gafc9e51...
available channels:
  1 / 2412MHz (20 dBm)
  2 / 2417MHz (20 dBm)
  3 / 2422MHz (20 dBm)
  4 / 2427MHz (20 dBm)
  5 / 2432MHz (20 dBm)
  6 / 2437MHz (20 dBm)
  7 / 2442MHz (20 dBm)
  8 / 2447MHz (20 dBm)
  9 / 2452MHz (20 dBm)
 10 / 2457MHz (20 dBm)
 11 / 2462MHz (20 dBm)
 12 / 2467MHz (20 dBm)
 13 / 2472MHz (20 dBm)
 14 / 2484MHz (20 dBm)

terminating...
Comment 10 Iyán 2021-11-30 20:29:40 UTC
Don't forget that I'm able to completely crash my system. No matter what (e.g. setting an illegal channel), I don't think a wireless driver should ever crash a system so that hard rebooting is the only option.
Comment 11 Johannes Berg 2021-12-01 08:02:32 UTC
(In reply to Iyán from comment #10)
> Don't forget that I'm able to completely crash my system. No matter what
> (e.g. setting an illegal channel), I don't think a wireless driver should
> ever crash a system so that hard rebooting is the only option.

Yeah I saw that, though unfortunately you haven't been able to provide the dump.

Could you run the tool that causes this from the VT (ctrl-alt-f1/f2/f3/f4 etc until you see a login prompt, then log in there) and take a picture when it crashes? The logs you've provided are all cut off before the crash.

What I've seen in the logs so far is some regulatory checking in the firmware, which is odd because it's going to channel 14 and 88 etc. but that's not really the cause of the crashes, I think. It's not supposed to be possible *either*, but perhaps not our first priority here.
Comment 12 Golan Ben Ami 2021-12-01 08:03:05 UTC
Created attachment 299809 [details]
attachment-22035-0.html

Hi,
Thank you for your email.
I'm OOO today 1/12/21.

Available by WhatsApp for urgent issues.

Thanks,
Golan
Comment 13 Iyán 2021-12-01 22:05:47 UTC
(In reply to Johannes Berg from comment #11)
> (In reply to Iyán from comment #10)
> > Don't forget that I'm able to completely crash my system. No matter what
> > (e.g. setting an illegal channel), I don't think a wireless driver should
> > ever crash a system so that hard rebooting is the only option.
> 
> Yeah I saw that, though unfortunately you haven't been able to provide the
> dump.
> 
> Could you run the tool that causes this from the VT (ctrl-alt-f1/f2/f3/f4
> etc until you see a login prompt, then log in there) and take a picture when
> it crashes? The logs you've provided are all cut off before the crash.

So I logged into two VTs. In the first one I ran hcxdumptool -i wlp39s0f3u1u4 --do_rcascan, while in the second I left dmesg -w running. I attach a photo of the second terminal when the laptop crashed. The first one doesn't show anything useful. The computer does not crash every time; sometimes hcxdumptool terminates after reaching 100 driver issues.

> What I've seen in the logs so far is some regulatory checking in the
> firmware, which is odd because it's going to channel 14 and 88 etc. but
> that's not really the cause of the crashes, I think. It's not supposed to be
> possible *either*, but perhaps not our first priority here.

I have crda and wireless-regdb installed, and WIRELESS_REGDOM=CH configured in /etc/conf.d/wireless-regdom, just in case it's relevant.
Comment 14 Iyán 2021-12-01 22:08:48 UTC
Created attachment 299819 [details]
Photo of dmesg on a VT when laptop crashed
Comment 15 Iyán 2021-12-01 22:28:15 UTC
Okay, after a few more tries, I was able to crash the laptop again while keeping the hcxdumptool VT open. I attach another photo with the error that I got there.
Comment 16 Iyán 2021-12-01 22:28:44 UTC
Created attachment 299821 [details]
Error after hcxdumptool crashed
Comment 17 Johannes Berg 2021-12-02 08:22:03 UTC
Thanks for all your efforts!

That's unfortunately only half the information, I don't know what's going on, why did part of the screen go blank?!

If you have wired network on this system, it might be worth trying to configure netconsole (https://www.kernel.org/doc/html/latest/networking/netconsole.html) and send all the debug data to another machine on the local network, but failing that I guess I should somehow try to replicate your setup.
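
On the receiving machine, any UDP listener can collect the netconsole output. The following is a minimal sketch, assuming the sender uses netconsole's default target port 6666; the helper names `open_listener` and `iter_netconsole` are hypothetical, and the linked kernel documentation remains the reference for configuring the sending side.

```python
import socket
from typing import Iterator, Optional


def open_listener(host: str = "0.0.0.0", port: int = 6666,
                  timeout: Optional[float] = None) -> socket.socket:
    """Bind a UDP socket for incoming netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)  # None means block until data arrives
    return sock


def iter_netconsole(sock: socket.socket) -> Iterator[str]:
    """Yield each received datagram as text until the socket times out."""
    while True:
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            return  # sender went quiet, for example because it froze
        yield data.decode("utf-8", errors="replace").rstrip("\n")


if __name__ == "__main__":
    listener = open_listener()
    for line in iter_netconsole(listener):
        # Flush immediately so nothing is lost if this side is interrupted.
        print(line, flush=True)
```

Printing each datagram as it arrives, rather than buffering, keeps the last messages before a freeze visible.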
Comment 18 Golan Ben Ami 2021-12-02 08:22:22 UTC
Created attachment 299831 [details]
attachment-32274-0.html

Hi,
Thank you for your email.
I'm OOO today 2/12/21.

Available by WhatsApp for urgent issues.

Thanks,
Golan
Comment 19 Iyán 2021-12-02 08:30:26 UTC
(In reply to Johannes Berg from comment #17)
> Thanks for all your efforts!
> 
> That's unfortunately only half the information, I don't know what's going
> on, why did part of the screen go blank?!

I don't know. Only the bottom was showing some messages.

> If you have wired network on this system, it might be worth trying to
> configure netconsole
> (https://www.kernel.org/doc/html/latest/networking/netconsole.html) and send
> all the debug data to another machine on the local network, but failing that
> I guess I should somehow try to replicate your setup.

I think I can try that, but it will have to wait till this evening/night. I will write here if I get any additional logs.

Thanks,
Iyán
Comment 20 Iyán 2021-12-03 23:49:24 UTC
Created attachment 299857 [details]
Netconsole log during laptop crash
Comment 21 Iyán 2021-12-03 23:50:10 UTC
I attached the log from netconsole. Hope it helps. If not, please let me know what else I could try from my part.

Thanks,
Iyán
Comment 22 Iyán 2021-12-04 00:06:14 UTC
Created attachment 299859 [details]
Caps Lock led behaviour after crash

This is a video of the weird behavior of the Caps Lock LED after a system crash. I couldn't find any information in the Lenovo documentation about what this means. Probably not relevant here, but just in case.
Comment 23 Iyán 2021-12-13 09:26:55 UTC
Any updates about this? Could I provide anything else that would be useful to debug this?
Comment 24 Johannes Berg 2023-01-05 15:42:08 UTC
Looking at this now ... strangely even the netconsole doesn't contain any indication that the system actually *crashed*?

Are you in X/wayland when this happens? maybe you can make it happen from the console (ctrl-alt-f2/f3/... or so) and then take a picture when the crash occurs on the screen, there it should normally be printed. Still strange that netconsole has nothing though.

The firmware crashing is kind of maybe normal - you're trying to transmit on channels that it's not allowed to transmit on. We should probably prevent that properly though in the driver, not sure why that's going through.
Comment 25 Michael 2023-01-05 16:37:50 UTC
Good to see that you have found the time to look at it.


To get all frequencies supported by the device, I use ioctl(SIOCSIWFREQ)
https://github.com/ZerBea/hcxdumptool/blob/master/hcxdumptool.c#L7380

and check the return values of ioctl(SIOCGIWFREQ) to get the frequency and exponent:
pwrq.u.freq.m
pwrq.u.freq.e 
https://github.com/torvalds/linux/blob/master/include/uapi/linux/wireless.h#L706

Looks like this procedure crashed the driver.
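
To illustrate the m/e fields mentioned above: in `struct iw_freq` from linux/wireless.h, `m` is a mantissa and `e` a decimal exponent, so the frequency is `m * 10^e` Hz. The following is a minimal decoding sketch; the helper name `iw_freq_to_hz` is hypothetical, and this is not how hcxdumptool itself is written.

```python
import struct

# Field layout of struct iw_freq in linux/wireless.h:
#   __s32 m (mantissa), __s16 e (exponent), __u8 i (index), __u8 flags
IW_FREQ = struct.Struct("=ihBB")


def iw_freq_to_hz(raw: bytes) -> int:
    """Decode a packed struct iw_freq into a frequency in Hz."""
    m, e, _index, _flags = IW_FREQ.unpack(raw[:IW_FREQ.size])
    if e < 0:
        raise ValueError("negative exponent is not handled in this sketch")
    return m * 10 ** e


if __name__ == "__main__":
    # Illustrative value only: mantissa 2412 with exponent 6.
    sample = IW_FREQ.pack(2412, 6, 0, 0)
    print(iw_freq_to_hz(sample) // 1_000_000, "MHz")
```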

BTW: 4 days ago I received another issue report regarding the driver. Maybe related to this one, too:
https://github.com/ZerBea/hcxdumptool/issues/245
Comment 26 Michael 2023-01-05 16:51:11 UTC
I noticed similar behavior on some Realtek drivers, too. It looks like they dropped support for some ioctl() calls in favor of NETLINK. In this case it is mandatory to set promiscuous mode via NETLINK instead of ioctl(SIOCSIWMODE), otherwise the device is not initialized correctly.
NETLINK is fine, but unfortunately very much asynchronous. We need purely synchronous ioctl() system calls.
Comment 27 Johannes Berg 2023-01-06 08:07:44 UTC
Let's not mix this up - netlink and ioctl are basically equivalent; for netlink to be synchronous, you just need to wait for the ACK messages. The wext ioctls are in fact deprecated - I suggest you look at replacing them very soon, since new generation hardware will likely not support them any more:

https://patchwork.kernel.org/project/linux-wireless/list/?series=692204&state=%2A&archive=both
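
To illustrate the ACK point above: a netlink request sent with NLM_F_ACK is answered with an NLMSG_ERROR message whose error field is 0 on success or a negative errno on failure, so waiting for that reply makes the call effectively synchronous. The following is a minimal sketch of building such a header and checking the reply; the helper names are hypothetical and no socket handling is shown.

```python
import struct

NLMSG_ERROR = 0x2      # message type used for both errors and ACKs
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR = struct.Struct("=IHHII")


def build_request(msg_type: int, seq: int, payload: bytes = b"") -> bytes:
    """Build a netlink message that asks the kernel for an explicit ACK."""
    flags = NLM_F_REQUEST | NLM_F_ACK
    length = NLMSGHDR.size + len(payload)
    return NLMSGHDR.pack(length, msg_type, flags, seq, 0) + payload


def check_ack(reply: bytes, expected_seq: int) -> int:
    """Return 0 for an ACK, or the positive errno the kernel reported."""
    _length, msg_type, _flags, seq, _pid = NLMSGHDR.unpack_from(reply)
    if msg_type != NLMSG_ERROR or seq != expected_seq:
        raise ValueError("not the ACK/error reply for this request")
    (error,) = struct.unpack_from("=i", reply, NLMSGHDR.size)
    return -error  # the kernel stores a negative errno; 0 means ACK
```

A caller would send the request, read replies until `check_ack` accepts one with the matching sequence number, and only then issue the next command.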


Anyway - my point was that I don't see a kernel *crash* here, even in the netconsole. Are you saying "kernel crashed" because there's a stack dump (which really came from a warning)? OTOH, the caps lock LED flashing _would_ have been a real kernel crash.

So I'm not sure what's going on, because apart from the LED flashing video I see no evidence of a real crash, even in the netconsole.

Like I said in comment #24, maybe you can make it happen from the console?
Comment 28 Linsy King 2023-01-06 10:06:14 UTC
Created attachment 303535 [details]
Crash log

This is the crash log from `journalctl -b-1 -p3`.

My device is Intel Corporation Wi-Fi 6 AX201 160MHz.
Comment 29 Michael 2023-01-06 10:08:36 UTC
Thanks for your explanation.
Well, you're in good company - I'm not sure what's going on, either.

Unfortunately I can't test it, because I only have "wext friendly" devices (mt76 and rt2x00). But I asked the last user who reported this to join our conversation here and to provide his log files. Maybe we will get some additional information.
Aircrack-ng uses libnl and doesn't seem to have these problems, which is why my first thought was that it is related to ioctl().
https://github.com/aircrack-ng/aircrack-ng/issues/2274

BTW:
Thank you for your recommendation. Currently I'm testing NETLINK communication (without using libnl).
Comment 30 Iyán 2023-01-06 11:15:56 UTC
(In reply to Johannes Berg from comment #24)
> Looking at this now ... strangely even the netconsole doesn't contain any
> indication that the system actually *crashed*?
> 
> Are you in X/wayland when this happens? maybe you can make it happen from
> the console (ctrl-alt-f2/f3/... or so) and then take a picture when the
> crash occurs on the screen, there it should normally be printed. Still
> strange that netconsole has nothing though.

No, I was not using X/Wayland when it happened. I was running hcxdumptool in VT 3 and dmesg -w in VT 4 as it was suggested in previous comments.

> The firmware crashing is kind of maybe normal - you're trying to transmit on
> channels that it's not allowed to transmit on. We should probably prevent
> that properly though in the driver, not sure why that's going through.

I tried to replicate the issue again with the same hardware but with the latest kernel, firmware and BIOS versions, and I can still trigger the total crash with the Caps Lock LED blinking. This time the VT running dmesg didn't go half empty. I attach a new photo.
Comment 31 Iyán 2023-01-06 11:17:16 UTC
Created attachment 303536 [details]
New photo of dmesg on a VT when laptop crashed
Comment 32 Iyán 2023-01-06 11:19:41 UTC
The lines that cannot be read are not a photo artifact, they were shown exactly like that on my laptop when it crashed.
Comment 33 Johannes Berg 2023-01-09 08:33:33 UTC
Yeah looks like something got corrupted - but still great! Could you attach iwlmvm.ko from the crash? Might make it a bit easier.
Comment 34 Michael 2023-04-26 13:31:50 UTC
@Johannes Berg
As suggested in your comment:
https://bugzilla.kernel.org/show_bug.cgi?id=214291#c27
I removed WEXT completely. In addition to that, hcxdumptool will now get the supported frequencies via NL80211_ATTR_WIPHY_BANDS. It also takes the wireless regulatory domain settings into account and skips disabled channels.
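
As an illustration of skipping disabled channels (not hcxdumptool's actual code): each frequency entry nested under NL80211_ATTR_WIPHY_BANDS is a run of netlink attributes, where NL80211_FREQUENCY_ATTR_FREQ carries the frequency in MHz and NL80211_FREQUENCY_ATTR_DISABLED is a flag attribute. The following is a minimal sketch that parses one such entry from raw bytes; the helper names are hypothetical.

```python
import struct
from typing import Dict, Optional

NL80211_FREQUENCY_ATTR_FREQ = 1      # u32, frequency in MHz
NL80211_FREQUENCY_ATTR_DISABLED = 2  # flag, present if channel is disabled

NLA_HDR = struct.Struct("=HH")  # struct nlattr: nla_len, nla_type
NLA_TYPE_MASK = 0x3FFF          # strips the NESTED / NET_BYTEORDER bits


def parse_attrs(buf: bytes) -> Dict[int, bytes]:
    """Split a buffer of netlink attributes into {type: payload}."""
    attrs = {}
    offset = 0
    while offset + NLA_HDR.size <= len(buf):
        nla_len, nla_type = NLA_HDR.unpack_from(buf, offset)
        if nla_len < NLA_HDR.size:
            break  # malformed attribute, stop parsing
        payload = buf[offset + NLA_HDR.size:offset + nla_len]
        attrs[nla_type & NLA_TYPE_MASK] = payload
        offset += (nla_len + 3) & ~3  # attributes are 4-byte aligned
    return attrs


def usable_frequency(entry: bytes) -> Optional[int]:
    """Return the frequency in MHz, or None if the entry is disabled."""
    attrs = parse_attrs(entry)
    if NL80211_FREQUENCY_ATTR_DISABLED in attrs:
        return None
    raw = attrs.get(NL80211_FREQUENCY_ATTR_FREQ)
    if raw is None or len(raw) < 4:
        return None
    return struct.unpack("=I", raw[:4])[0]
```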

@ Iyán
I don't have an Intel device to test. Please try the latest git head of hcxdumptool (NL80211 version) and let's figure out if the driver still crashes.
Comment 35 Iyán 2023-07-02 10:40:45 UTC
I can reproduce the issue with the latest hcxdumptool. Exactly same behavior as originally described.

@Johannes Berg: sorry, I missed your previous comment. What file exactly do you need?

By the way, I still can't get anything useful from a netconsole.
Comment 36 Iyán 2023-07-02 10:45:53 UTC
I have provided all the data you have asked for (except what was requested in the last comment), versions, screenshots, even a video... you cannot close this bug saying "not enough data".

If Intel has no interest in fixing/supporting monitor and injection mode in iwlwifi, please state that clearly so I don't waste more time reporting bugs.