Bug 218884

Summary: mac80211: simplify non-chanctx drivers (kernel 6.9) breaks monitor mode
Product: Drivers
Reporter: Michael (ZeroBeat)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: regressions
Priority: P3
Hardware: All
OS: Linux
Kernel Version: >= 6.9
Subsystem:
Regression: Yes
Bisected commit-id: 0a44dfc070749514b804ccac0b1fd38718f7daa1
Attachments: bisect output of discovered commit

Description Michael 2024-05-24 17:04:51 UTC
Some features are broken since kernel 6.9.1 when running monitor mode.

First bug:
Switching a channel via "NL80211_ATTR_WIPHY_FREQ" does not switch the channel/frequency.

That is not a device driver problem, because all device drivers (Mediatek, Ralink, Realtek, ...) are affected.

To reproduce:
set monitor mode (by iw)
change channel (by iw)
check if channel has been changed (by iw)
record traffic and check radiotap header (Channel frequency)
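
The last step (checking the radiotap channel frequency) can be sketched in Python. This is a minimal sketch, not part of the original report: the helper names `channel_to_freq` and `radiotap_channel_freq` are made up for this example, only the 2.4 GHz and 5 GHz bands are covered, and the parser only walks the first four radiotap fields (TSFT, Flags, Rate, Channel). The idea is to compare the frequency you asked for, e.g. with `iw dev <interface> set channel 6`, against what captured frames actually report.

```python
import struct


def channel_to_freq(channel: int, band: str = "2.4") -> int:
    """Center frequency in MHz for a channel number (2.4/5 GHz only)."""
    if band == "2.4":
        if channel == 14:
            return 2484
        if 1 <= channel <= 13:
            return 2407 + 5 * channel
    elif band == "5" and 32 <= channel <= 177:
        return 5000 + 5 * channel
    raise ValueError(f"unsupported channel {channel} in band {band}")


def radiotap_channel_freq(frame: bytes):
    """Return the radiotap Channel frequency in MHz, or None if absent."""
    version, _pad, length, present = struct.unpack_from("<BBHI", frame, 0)
    if version != 0 or length > len(frame):
        raise ValueError("not a valid radiotap header")
    offset = 8
    # Bit 31 of a "present" word means another present word follows.
    word = present
    while word & (1 << 31):
        (word,) = struct.unpack_from("<I", frame, offset)
        offset += 4
    if present & (1 << 0):          # TSFT: u64, 8-byte aligned
        offset = (offset + 7) & ~7
        offset += 8
    if present & (1 << 1):          # Flags: u8
        offset += 1
    if present & (1 << 2):          # Rate: u8
        offset += 1
    if not present & (1 << 3):      # Channel field not present
        return None
    offset = (offset + 1) & ~1      # Channel: u16 freq + u16 flags, 2-byte aligned
    freq, _flags = struct.unpack_from("<HH", frame, offset)
    return freq


def channel_matches(frame: bytes, channel: int, band: str = "2.4") -> bool:
    """True if the captured frame reports the frequency of the requested channel."""
    return radiotap_channel_freq(frame) == channel_to_freq(channel, band)
```

On an affected kernel, `channel_matches()` would be expected to return False for frames captured after the channel change, because the radiotap header still carries the old frequency.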

More information is here:
https://github.com/ZerBea/hcxdumptool/discussions/454
https://github.com/morrownr/8821au-20210708/issues/133#issuecomment-2125425552
confirmed by other users:
https://github.com/morrownr/8821au-20210708/issues/133#issuecomment-2125392151


Second bug:
frame injection is broken

To reproduce:
Try to send an 802.11 frame via a raw socket (PF_PACKET). It is not transmitted over the air.
https://github.com/ZerBea/hcxdumptool/discussions/454
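
For reference, injection via a raw PF_PACKET socket can be sketched in Python as follows. This is a minimal sketch, not the code hcxdumptool uses: the function names, the choice of a broadcast probe request and the source MAC are assumptions for illustration. `inject()` needs root and an interface that is already in monitor mode, and a successful `send()` does not prove that the frame went over the air, so a second device in monitor mode is still needed to verify the transmission.

```python
import socket
import struct

ETH_P_ALL = 0x0003

# Minimal radiotap header: version 0, pad 0, length 8, no optional fields.
RADIOTAP_MIN = struct.pack("<BBHI", 0, 0, 8, 0)


def build_probe_request(src_mac: bytes, ssid: bytes = b"") -> bytes:
    """Build radiotap header + 802.11 broadcast probe request (without FCS)."""
    if len(src_mac) != 6 or len(ssid) > 32:
        raise ValueError("src_mac must be 6 bytes, ssid at most 32 bytes")
    broadcast = b"\xff" * 6
    mac_header = struct.pack(
        "<HH6s6s6sH",
        0x0040,      # frame control: management frame, subtype probe request
        0,           # duration
        broadcast,   # addr1: destination
        src_mac,     # addr2: source
        broadcast,   # addr3: BSSID (wildcard)
        0,           # sequence control
    )
    ssid_ie = bytes([0, len(ssid)]) + ssid
    rates_ie = bytes([1, 4, 0x02, 0x04, 0x0B, 0x16])  # 1, 2, 5.5, 11 Mb/s
    return RADIOTAP_MIN + mac_header + ssid_ie + rates_ie


def inject(ifname: str, frame: bytes) -> int:
    """Send one frame on a monitor mode interface (Linux only, needs root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ALL)) as sock:
        sock.bind((ifname, 0))
        return sock.send(frame)


# Example (interface name is a placeholder):
#   frame = build_probe_request(bytes.fromhex("020000000001"))
#   inject("wlan0", frame)
```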


Since kernel 6.9, none of the WiFi tools (iw, Wireshark, airodump-ng, aireplay-ng, hcxdumptool, hcxlabtool) is able to switch the channel or to transmit 802.11 frames in monitor mode.
Comment 1 Michael 2024-05-26 09:00:58 UTC
BTW:
Testing monitor mode these days is no easy task, because most of the device drivers are not working as expected:
https://bugzilla.kernel.org/show_bug.cgi?id=218528
https://bugzilla.kernel.org/show_bug.cgi?id=217465
https://github.com/openwrt/mt76/issues/839

And now, since kernel 6.9.1, the mac stack is broken, too.
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-31 07:27:07 UTC
With a bit of luck this ticket might lead to some result, but I'd say it's unlikely, as it does a few things wrong that are listed on https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/

Most importantly: 

* having two bugs reported in one ticket

* reporting it here in bugzilla, as the responsible developers are unlikely to see this, as most of them afaics are not following bugzilla (and don't even get a copy of bugs filed here).

If you report the 6.9 regression (e.g. your second bug) in a new ticket (CC me!) I'll forward it to the developers; ideally check 6.10-rc and use a git bisection to find the culprit, as developers then will be obliged to fix it (I'll help with that).

See also: https://docs.kernel.org/admin-guide/reporting-issues.html
Comment 3 Michael 2024-05-31 07:46:49 UTC
Thanks for the information. Maybe I put it the wrong way.
These are not two different bugs, because both problems are related to each other.
In monitor mode, commands are not forwarded from nl80211 to the device drivers.
I've only checked two of them: switching a channel and transmitting a frame.
I'm sure there are more.

I've already checked 6.10-rc:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/net/wireless/nl80211.c?id=v6.10-rc1&id2=v6.9
There are no changes that solve this problem.
Comment 4 Michael 2024-05-31 07:50:10 UTC
Instead of:

First bug:
Switching a channel via "NL80211_ATTR_WIPHY_FREQ" does not switch the channel/frequency.

Second bug:
frame injection is broken

Let's call it:
In monitor mode, commands via NL80211 are not forwarded to the device drivers.
Comment 5 Michael 2024-05-31 07:55:38 UTC
Thanks for your offer. It would be great if you can forward this to the nl80211 developers.
Comment 6 Michael 2024-05-31 08:04:10 UTC
Closed this report because:
"reporting it here in bugzilla, as the responsible developers are unlike to see this, as most of them afaics are not following bugzilla (and don't even get a copy of bugs filed here)"
Comment 7 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-31 08:06:46 UTC
No need to close this; and FWIW, I briefly mentioned this to the developers already, without effect: https://lore.kernel.org/all/a51f223f-18ac-4d67-9120-8da1c169b7eb@leemhuis.info/

A bisection result would make the difference https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html

But before doing one you might want to run a search like https://lore.kernel.org/all/?q=net%2Fwireless%2Fnl80211.c to see if fixes are in the works already.
Comment 8 Michael 2024-05-31 08:14:25 UTC
Again thanks for this information.
I've already tried to check the nl80211 code, but it is really complex.
My work is to code some penetration testing tools that use this stack:
https://github.com/ZerBea/hcxdumptool

I can't reach the developers here and I don't want to waste your time with this problem which affects only a few penetration testers.

But I'm sure we will get more issue reports when the most common penetration testing distributions move to this kernel.

Again, thanks for your kind help.
Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-05-31 08:23:51 UTC
(In reply to Michael from comment #8)
> I can't reach the developers here and I don't want to waste your time 

Don't worry about that, that's what I'm here for; just bisect the problem please, otherwise it might never be fixed; and it should, see https://docs.kernel.org/process/handling-regressions.html
Comment 10 Michael 2024-06-03 10:32:55 UTC
Created attachment 306404 [details]
bisect output of discovered commit
Comment 11 Michael 2024-06-03 10:35:11 UTC
reopened as recommended:
https://bugzilla.kernel.org/show_bug.cgi?id=218884#c7

bisected as suggested:
https://bugzilla.kernel.org/show_bug.cgi?id=218884#c9
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect good e8f897f4afef0031fe618a8e94127a0934896aba
# status: waiting for bad commit, 1 good commit known
# bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
# bad: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 9187210eee7d87eea37b45ea93454a88681894a4
# good: [a01c9fe32378636ae65bec8047b5de3fdb2ba5c8] Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect good a01c9fe32378636ae65bec8047b5de3fdb2ba5c8
# bad: [ca61ba3885274a684c83d8a538eb77b30e38ee92] Merge branch 'rework-genet-mdioclocking'
git bisect bad ca61ba3885274a684c83d8a538eb77b30e38ee92
# good: [f42822f22b1c5f72c7e3497d9683f379ab0c5fe4] bnxt_en: Use firmware provided maximum filter counts.
git bisect good f42822f22b1c5f72c7e3497d9683f379ab0c5fe4
# bad: [e10cd2ddd89e8b3e61b49247067e79f7debec2f1] wifi: rtw89: load BB parameters to PHY-1
git bisect bad e10cd2ddd89e8b3e61b49247067e79f7debec2f1
# good: [2594e4d9e1a2d79bf7bb262974abaf5ef153e371] wifi: iwlwifi: prepare for reading SAR tables from UEFI
git bisect good 2594e4d9e1a2d79bf7bb262974abaf5ef153e371
# bad: [719036ae06d4bfdb65139e3947a8404dec298bc5] wifi: cfg80211: move puncturing validation code
git bisect bad 719036ae06d4bfdb65139e3947a8404dec298bc5
# good: [1209f487d452ff7e822dec30661fd6b5163fb8cf] wifi: rtl8xxxu: Add TP-Link TL-WN823N V2
git bisect good 1209f487d452ff7e822dec30661fd6b5163fb8cf
# good: [4dbd964f33aab6f99891b9610ad4b36cc215be0d] wifi: rtw89: 8922a: add chip_ops::rfk_hw_init
git bisect good 4dbd964f33aab6f99891b9610ad4b36cc215be0d
# good: [2fd53eb04c492eb9a2b06f994b36e5cf34ba7541] wifi: mac80211: remove unused MAX_MSG_LEN define
git bisect good 2fd53eb04c492eb9a2b06f994b36e5cf34ba7541
# bad: [0a44dfc070749514b804ccac0b1fd38718f7daa1] wifi: mac80211: simplify non-chanctx drivers
git bisect bad 0a44dfc070749514b804ccac0b1fd38718f7daa1
# good: [61f0261131c8dc2beeb6b34781a54788221081e9] wifi: mac80211: clean up band switch in duration
git bisect good 61f0261131c8dc2beeb6b34781a54788221081e9
# good: [2d9698dd32d086e47b8bff3df4322cc017c17b55] wifi: mac80211: clean up HE 6 GHz and EHT chandef parsing
git bisect good 2d9698dd32d086e47b8bff3df4322cc017c17b55
# first bad commit: [0a44dfc070749514b804ccac0b1fd38718f7daa1] wifi: mac80211: simplify non-chanctx drivers
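
The result can also be pulled out of a saved `git bisect log` mechanically. A minimal sketch, assuming the log has the format shown above; the function names are made up for this example.

```python
import re
from collections import Counter

FIRST_BAD = re.compile(
    r"^# first bad commit: \[([0-9a-f]{40})\] (.*)$", re.MULTILINE
)
STEP = re.compile(r"^git bisect (good|bad) [0-9a-f]{40}$", re.MULTILINE)


def first_bad_commit(bisect_log: str):
    """Return (commit id, subject) of the first bad commit, or None."""
    match = FIRST_BAD.search(bisect_log)
    return (match.group(1), match.group(2)) if match else None


def count_steps(bisect_log: str) -> dict:
    """Count how many commits were marked good and bad."""
    return dict(Counter(STEP.findall(bisect_log)))
```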

attached identified commit as requested:
https://bugzilla.kernel.org/show_bug.cgi?id=218884#c10

When I started this bug report, it wasn't clear to me whether
it is a single bug (with a big impact on monitor mode) or several bugs.

I still suspect a single bug with many consequences (channel switching broken,
packet injection broken), due to massive changes in net/mac80211.

Now I'm waiting for a patch to figure out if they are related
to each other or not.
Comment 12 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-06-03 10:39:43 UTC
What driver are you using? 

Fixes for mt76 and rtlwifi that are related to that commit are heading towards mainline already:

https://lore.kernel.org/all/1fabb8e4-adf3-47ae-8462-8aea963bc2a5@gmail.com/
https://lore.kernel.org/all/20240528142308.3f7db1821e68.I531135d7ad76331a50244d6d5288e14aa9668390@changeid/
Comment 13 Michael 2024-06-03 10:47:46 UTC
For this test: mt7601u

But all drivers are affected.
https://github.com/ZerBea/hcxdumptool/discussions/454

Even a self-compiled, out-of-tree driver is affected:
https://github.com/lwfinger/rtw88

Unfortunately my hardware is slow and bisecting a kernel is far beyond its capabilities. It took me several days to bisect the kernel.
Comment 14 Michael 2024-06-03 11:27:38 UTC
I took a closer look at the fixes you have mentioned.
If they fix the problem, it is mandatory to apply the same fix to all in-tree WiFi device drivers.
Comment 15 Michael 2024-06-03 12:56:05 UTC
The 2 patches you have mentioned above fix 2 drivers which had been forgotten in the faulty commit.


The mt7601u driver was modified in the faulty commit:
 const struct ieee80211_ops mt7601u_ops = {
+	.add_chanctx = ieee80211_emulate_add_chanctx,
+	.remove_chanctx = ieee80211_emulate_remove_chanctx,
+	.change_chanctx = ieee80211_emulate_change_chanctx,
+	.switch_vif_chanctx = ieee80211_emulate_switch_vif_chanctx,
 	.tx = mt7601u_tx,
 	.wake_tx_queue = ieee80211_handle_wake_tx_queue,
 	.start = mt7601u_start,
diff --git a/drivers/net/wireless/purelifi/plfxlc/mac.c b/drivers/net/wireless/purelifi/plfxlc/mac.c
index 506d2f31efb5..7a1b27764f53 100644
--- a/drivers/net/wireless/purelifi/plfxlc/mac.c
+++ b/drivers/net/wireless/purelifi/plfxlc/mac.c
@@ -685,6 +685,10 @@ static int plfxlc_set_rts_threshold(struct ieee80211_hw *hw, u32 value)
 }
So it should work, but it doesn't.

Like all tested drivers it is still affected by the bug. The patches you've mentioned above do not fix my reported bug.
Comment 16 Michael 2024-06-08 08:16:51 UTC
I'm monitoring the Linux Wireless Mailing List and I have noticed massive changes to mac80211 and cfg80211 nearly every day.
Let's mark monitor mode of kernel 6.9.x as broken and set focus on 6.10?
Comment 17 Michael 2024-06-10 15:20:33 UTC
I was asked to test
$ uname -r
6.10.0-rc3-1-git

As expected, nothing has changed. Switching a channel and packet injection are still broken.
Comment 18 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-06-12 08:30:20 UTC
Might be worth giving the patch in this message a try (albeit I'm not totally sure if it is related to this issue as well):

https://lore.kernel.org/all/7869b9b29b6796c95fd5af649e4bd6696e56dcaf.camel@sipsolutions.net/
Comment 19 Michael 2024-06-13 09:27:47 UTC
Great work. The patch works under these test conditions:
$ uname -r
6.10.0-rc3-1-git-dirty

$ lsusb
ID 7392:7710 Edimax Technology Co., Ltd Edimax Wi-Fi

$ hcxdumptool -l
   0	  4	74da38eb45fc	66c5d3c23aa0	*	wlp5s0f4u2      	mt7601u	NETLINK

$ sudo hcxdumptool -i wlp5s0f4u2 --rcascan=active
...
0 ERROR(s) during runtime
233 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
76 PROBERESPONSE(s) captured

I looked at the history and I fully agree with Ping-Ke Shih:
"We have a draft fix of rtw88 driver for RTL8821CE, but as mentioned some drivers are affected, so I don't plan to send out the patch. Instead we are looking for the fix of cfg80211/mac80211."
https://lore.kernel.org/all/0e65ca6b471b4186a370b9a57de11abe@realtek.com/

This issue affects all drivers, and fixing it in cfg80211/mac80211 is the right way.

If helpful, I can test other devices/drivers except TP-Link TL-WN8200ND v3 (ID 2357:0126 TP-Link 802.11n NIC) which is affected by a driver problem:
https://bugzilla.kernel.org/show_bug.cgi?id=218528

Best regards
Michael
Comment 20 Michael 2024-06-13 17:50:51 UTC
"Savyasaachi reports that scanning for other stations in monitor mode
does not work anymore with his RTL8821CE wireless network card for linux
kernels after 6.8.9."
https://lore.kernel.org/all/chwoymvpzwtbmzryrlitpwmta5j6mtndocxsyqvdyikqu63lon@gfds653hkknl/

This device is now working, too.

I'm closing this report because a working patch is available.