Bug 219086

Summary: mt76 driver: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Product: Drivers
Reporter: Michael (ZeroBeat)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: high
CC: regressions, ZeroBeat
Priority: P3
Hardware: All
OS: Linux
Kernel Version: >= 6.9.5
Subsystem:
Regression: Yes
Bisected commit-id: 0d9c2beed116e623ac30810d382bd67163650f98

Description Michael 2024-07-23 15:38:43 UTC
A user reported the kernel oops below in this discussion:
https://github.com/ZerBea/hcxdumptool/discussions/465

Jul 21 05:40:39 rpi4b-aarch kernel: mt76x2u 2-2:1.0 wlan1: entered promiscuous mode
Jul 21 05:40:45 rpi4b-aarch kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Jul 21 05:40:45 rpi4b-aarch kernel: Mem abort info:
Jul 21 05:40:45 rpi4b-aarch kernel:   ESR = 0x0000000096000044
Jul 21 05:40:45 rpi4b-aarch kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 21 05:40:45 rpi4b-aarch kernel:   SET = 0, FnV = 0
Jul 21 05:40:45 rpi4b-aarch kernel:   EA = 0, S1PTW = 0
Jul 21 05:40:45 rpi4b-aarch kernel:   FSC = 0x04: level 0 translation fault
Jul 21 05:40:45 rpi4b-aarch kernel: Data abort info:
Jul 21 05:40:45 rpi4b-aarch kernel:   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
Jul 21 05:40:45 rpi4b-aarch kernel:   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
Jul 21 05:40:45 rpi4b-aarch kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 21 05:40:45 rpi4b-aarch kernel: user pgtable: 4k pages, 48-bit VAs, pgdp=0000000041300000
Jul 21 05:40:45 rpi4b-aarch kernel: [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Jul 21 05:40:45 rpi4b-aarch kernel: Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
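
The fields the kernel prints under "Mem abort info" and "Data abort info" are all packed into the ESR value. As a minimal sketch, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and FSC in ISS bits 5:0), the value from the log can be decoded by hand. The function name `decode_esr` is made up for this illustration and is not part of any kernel tool:

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR_ELx value into the fields shown in the oops."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 = trapped instruction was 32 bits wide
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # data aborts only: 1 = write, 0 = read
        "FSC": iss & 0x3F,             # fault status code: ISS bits [5:0]
    }


# ESR value taken from the log above
fields = decode_esr(0x96000044)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

For 0x96000044 this gives EC = 0x25, ISS = 0x44, WnR = 1 and FSC = 0x04, which matches the log: a data abort taken at the current exception level, caused by a write to address 0 that hit a level 0 translation fault.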

I ran a test (AMD Ryzen, Arch Linux) on kernels 6.9.10 and 6.10, which confirmed the problem:
Trying to inject an 802.11 packet caused my AMD systems to become unresponsive.
I don't have a dmesg log, because the entire system crashed and I had to power it off.

To reproduce on kernels 6.9.5 through 6.10:
1. Plug in an ALFA AWUS036ACM (mt76x2u).
2. Set monitor mode.
3. Set the WiFi channel and inject a packet:
$ sudo hcxdumptool -i wlp5s0f4u2 --rcascan=active
or
$ sudo ./aireplay-ng --test wlp5s0f4u2

Kernel 6.6.40 is not affected, and the user reported that kernel 6.8.2 is not affected either.
This looks like a regression, and git bisect identified the commit that caused the problem:

commit 0d9c2beed116e623ac30810d382bd67163650f98
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Wed Jun 12 12:23:51 2024 +0200

    wifi: mac80211: fix monitor channel with chanctx emulation
    
    After the channel context emulation, there were reports that
    changing the monitor channel no longer works. This is because
    those drivers don't have WANT_MONITOR_VIF, so the setting the
    channel always exits out quickly.
    
    Fix this by always allocating the virtual monitor sdata, and
    simply not telling the driver about it unless it wanted to.
    This way, we have an interface/sdata to bind the chanctx to,
    and the emulation can work correctly.
    
    Cc: stable@vger.kernel.org
    Fixes: 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers")
    Reported-and-tested-by: Savyasaachi Vanga <savyasaachiv@gmail.com>
    Closes: https://lore.kernel.org/r/chwoymvpzwtbmzryrlitpwmta5j6mtndocxsyqvdyikqu63lon@gfds653hkknl
    Link: https://msgid.link/20240612122351.b12d4a109dde.I1831a44417faaab92bea1071209abbe4efbe3fba@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>

 net/mac80211/driver-ops.c | 17 +++++++++++++++++
 net/mac80211/iface.c      | 21 +++++++++------------
 net/mac80211/util.c       |  2 +-
 3 files changed, 27 insertions(+), 13 deletions(-)

It looks like the patch that was meant to fix monitor mode breaks the mt76x2u driver.

BTW, my reasons for setting the severity to high:
"Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000"
and
a simple command that I would not have expected to crash my entire system.
Comment 1 Michael 2024-07-23 17:17:13 UTC
After some more tests, I'm no longer sure that the problem is caused by the commit mentioned above. It looks like it is only a symptom.
I tested several mt76 devices, e.g. this one:
ID 148f:761a Ralink Technology, Corp. MT7610U ("Archer T2U" 2.4G+5G WLAN Adapter

Driver is mt76x0u:
$ hcxdumptool -l
  0	  3	503eaa1a736c	f49da7d6f202	*	wlp48s0f4u2u4   	mt76x0u	NETLINK

All of them run into the problem described above,
while other devices work as expected, e.g.:
ID 2357:010c TP-Link TL-WN722N v2/v3 [Realtek RTL8188EUS]

Driver is rtl8xxxu:
$ hcxdumptool -l
  0	  3	9ca2f4094fe1	c8aacc8562e3	+	wlp48s0f4u2u4   	rtl8xxxu	NETLINK

This leads me to assume that the "chanctx emulation" inside the mt76 series drivers is the real cause of the problem.
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-07-24 07:29:47 UTC
I'd like to forward your report by mail. Can I CC you? This would expose your email address to the public.
Comment 3 Michael 2024-07-24 07:34:31 UTC
Yes, you can add me to CC.
The entire chanctx implementation is much more complex than expected. Good to be in CC if questions arise.
Comment 4 Michael 2024-07-25 17:47:40 UTC
This patch fixed it:
https://patchwork.kernel.org/project/linux-wireless/patch/20240725184836.25d334157a8e.I02574086da2c5cf0e18264ce5807db6f14ffd9c0@changeid/

$ uname -r
6.10.0-1-git-12246-g786c8248dbd3-dirty

$ sudo hcxdumptool -i wlp5s0f4u2 --rcascan=active
...
^C
24 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
14 PROBERESPONSE(s) captured