Bug 216844

Summary: Mediatek MT7921 Driver Crashing Upon Modprobe
Product: Networking
Component: Wireless
Reporter: Ralph (cflandrepro)
Assignee: networking_wireless (networking_wireless)
Status: NEW
Severity: normal
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: 6.1.1
Subsystem:
Regression: No
Bisected commit-id:
CC: cflandrepro, larkimpressive, mario.limonciello, regressions
Attachments: Basic System Information; potential patch (v1)

Description Ralph 2022-12-24 18:11:04 UTC
Created attachment 303468
Basic System Information

Some preliminary information:

```
$ uname -a
Linux *hostname* 6.1.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 21 Dec 2022 22:27:55 +0000 x86_64 GNU/Linux
```

```
$ lspci -vvnn
...
02:00.0 Network controller [0280]: MEDIATEK Corp. MT7921 802.11ax PCI Express Wireless Network Adapter [14c3:7961]
    Subsystem: Foxconn International, Inc. Device [105b:e0b7]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 88
    IOMMU group: 10
    Region 0: Memory at fcf0200000 (64-bit, prefetchable) [size=1M]
    Region 2: Memory at fcf0300000 (64-bit, prefetchable) [size=16K]
    Region 4: Memory at fcf0304000 (64-bit, prefetchable) [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: mt7921e
    Kernel modules: mt7921e
...
```

After blacklisting the module and rebooting, the following error was captured in dmesg when loading it with modprobe:

```
[   21.640345] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   21.640452] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   21.700904] mt7921e 0000:02:00.0: enabling device (0000 -> 0002)
[   21.721280] mt7921e 0000:02:00.0: ASIC revision: 79610010
[   21.799909] mt7921e 0000:02:00.0: sar cnt = 0
[   21.799916] BUG: kernel NULL pointer dereference, address: 0000000000000004
[   21.799951] #PF: supervisor read access in kernel mode
[   21.799970] #PF: error_code(0x0000) - not-present page
[   21.799988] PGD 0 P4D 0 
[   21.800000] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   21.800017] CPU: 9 PID: 559 Comm: modprobe Not tainted 6.1.1-arch1-1 #1 9bd09188b430be630e611f984454e4f3c489be77
[   21.800048] Hardware name: Dell Inc. Inspiron 15 3525/0PX9H7, BIOS 1.3.0 04/02/2022
[   21.800071] RIP: 0010:mt7921_init_acpi_sar+0x1c1/0x220 [mt7921_common]
[   21.800104] Code: ff 88 05 c1 c7 44 24 04 00 00 00 00 e8 b8 fc ff ff 41 89 c4 85 c0 0f 85 e4 fe ff ff 48 8b 43 08 ba 06 00 00 00 b9 06 00 00 00 <80> 78 04 00 40 0f 95 c6 e9 36 ff ff ff 48 8b 73 10 48 8b bd d0 03
[   21.800153] RSP: 0018:ffffb5aa40f0bae8 EFLAGS: 00010246
[   21.800172] RAX: 0000000000000000 RBX: ffff90adc9d11988 RCX: 0000000000000006
[   21.800194] RDX: 0000000000000006 RSI: 941c8c56d44496cf RDI: 0000000000038080
[   21.800216] RBP: ffff90adc5822080 R08: 0000000000000000 R09: ffffb5aa40f0b818
[   21.800240] R10: 0000000000000003 R11: ffffffffa62cb768 R12: 0000000000000000
[   21.800265] R13: 0000000000000000 R14: ffffb5aa40f0baec R15: ffffb5aa40f0bdc8
[   21.800290] FS:  00007f533ec9f740(0000) GS:ffff90aec6840000(0000) knlGS:0000000000000000
[   21.800320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.800342] CR2: 0000000000000004 CR3: 00000001045f6000 CR4: 0000000000750ee0
[   21.800368] PKRU: 55555554
[   21.800381] Call Trace:
[   21.800395]  <TASK>
[   21.800406]  mt7921_register_device+0x323/0x540 [mt7921_common eab0bdebbd12dfe392417c96bcae99c380288bb6]
[   21.800444]  mt7921_pci_probe+0x290/0x2c0 [mt7921e edb9e69dab4307dba494c96e4b7e5de618b3cd2a]
[   21.800479]  ? __pm_runtime_resume+0x58/0x80
[   21.800503]  local_pci_probe+0x45/0x80
[   21.800524]  pci_device_probe+0xc1/0x250
[   21.800542]  ? sysfs_do_create_link_sd+0x6e/0xe0
[   21.800563]  really_probe+0xde/0x380
[   21.800580]  ? pm_runtime_barrier+0x54/0x90
[   21.800598]  __driver_probe_device+0x78/0x170
[   21.800616]  driver_probe_device+0x1f/0x90
[   21.800634]  __driver_attach+0xd5/0x1d0
[   21.800651]  ? __device_attach_driver+0x110/0x110
[   21.800670]  bus_for_each_dev+0x8b/0xd0
[   21.800686]  bus_add_driver+0x1b2/0x200
[   21.800703]  driver_register+0x8d/0xe0
[   21.800721]  ? 0xffffffffc1064000
[   21.800759]  do_one_initcall+0x5d/0x220
[   21.800781]  do_init_module+0x4a/0x1e0
[   21.800801]  __do_sys_init_module+0x17f/0x1b0
[   21.800822]  do_syscall_64+0x5f/0x90
[   21.800843]  ? syscall_exit_to_user_mode+0x1b/0x40
[   21.800863]  ? do_syscall_64+0x6b/0x90
[   21.800880]  ? exc_page_fault+0x74/0x170
[   21.800897]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   21.800920] RIP: 0033:0x7f533e721eae
[   21.800937] Code: 48 8b 0d dd ee 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa ee 0c 00 f7 d8 64 89 01 48
[   21.800991] RSP: 002b:00007fff32d77568 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   21.801021] RAX: ffffffffffffffda RBX: 000055abccd61030 RCX: 00007f533e721eae
[   21.801039] RDX: 000055abcae64cb2 RSI: 000000000002dd2f RDI: 000055abccf39e10
[   21.801057] RBP: 000055abcae64cb2 R08: 27d4eb2f165667c5 R09: 85ebca77c2b2ae63
[   21.801075] R10: 000000000009a251 R11: 0000000000000246 R12: 0000000000040000
[   21.801094] R13: 000055abccd610b0 R14: 0000000000000000 R15: 000055abccd649d0
[   21.801114]  </TASK>
[   21.801123] Modules linked in: mt7921e(+) mt7921_common mt76_connac_lib mt76 mac80211 libarc4 cfg80211 intel_rapl_msr intel_rapl_common snd_acp3x_rn snd_soc_dmic snd_acp3x_pdm_dma snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp edac_mce_amd joydev snd_sof_pci mousedev snd_ctl_led snd_sof kvm_amd snd_sof_utils snd_soc_core hid_multitouch snd_hda_codec_realtek snd_compress kvm snd_hda_codec_generic snd_hda_codec_hdmi ac97_bus irqbypass snd_hda_intel snd_pcm_dmaengine crct10dif_pclmul crc32_pclmul snd_intel_dspcfg polyval_clmulni snd_pci_ps polyval_generic snd_intel_sdw_acpi gf128mul snd_rpl_pci_acp6x ghash_clmulni_intel snd_acp_pci snd_hda_codec sha512_ssse3 snd_pci_acp6x btusb dell_laptop aesni_intel snd_hda_core dell_smm_hwmon snd_pci_acp5x btrtl crypto_simd snd_hwdep cryptd btbcm snd_rn_pci_acp3x snd_pcm snd_acp_config dell_wmi rapl btintel snd_soc_acpi snd_timer sp5100_tco ledtrig_audio btmtk dell_smbios pcspkr psmouse sparse_keymap dcdbas bluetooth ecdh_generic
[   21.801168]  dell_wmi_descriptor wmi_bmof snd ccp soundcore snd_pci_acp3x i2c_piix4 k10temp dell_rbtn i2c_hid_acpi amd_pmc i2c_hid rfkill acpi_cpufreq mac_hid pkcs8_key_parser dm_multipath dm_mod sg crypto_user fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 amdgpu drm_ttm_helper ttm video serio_raw atkbd libps2 gpu_sched vivaldi_fmap nvme drm_buddy crc32c_intel nvme_core drm_display_helper i8042 xhci_pci nvme_common cec xhci_pci_renesas serio wmi
[   21.804966] CR2: 0000000000000004
[   21.805835] ---[ end trace 0000000000000000 ]---
[   21.806367] RIP: 0010:mt7921_init_acpi_sar+0x1c1/0x220 [mt7921_common]
[   21.806796] Code: ff 88 05 c1 c7 44 24 04 00 00 00 00 e8 b8 fc ff ff 41 89 c4 85 c0 0f 85 e4 fe ff ff 48 8b 43 08 ba 06 00 00 00 b9 06 00 00 00 <80> 78 04 00 40 0f 95 c6 e9 36 ff ff ff 48 8b 73 10 48 8b bd d0 03
[   21.807242] RSP: 0018:ffffb5aa40f0bae8 EFLAGS: 00010246
[   21.807687] RAX: 0000000000000000 RBX: ffff90adc9d11988 RCX: 0000000000000006
[   21.808352] RDX: 0000000000000006 RSI: 941c8c56d44496cf RDI: 0000000000038080
[   21.809158] RBP: ffff90adc5822080 R08: 0000000000000000 R09: ffffb5aa40f0b818
[   21.809984] R10: 0000000000000003 R11: ffffffffa62cb768 R12: 0000000000000000
[   21.810968] R13: 0000000000000000 R14: ffffb5aa40f0baec R15: ffffb5aa40f0bdc8
[   21.811957] FS:  00007f533ec9f740(0000) GS:ffff90aec6840000(0000) knlGS:0000000000000000
[   21.812935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.813471] CR2: 0000000000000004 CR3: 00000001045f6000 CR4: 0000000000750ee0
[   21.813903] PKRU: 55555554
```

I have temporarily worked around this by reverting to Linux 5.19.11, which is obviously not ideal.

Basic system information added as a picture attachment.
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-12-26 11:04:32 UTC
Bug 216839 is another recent report about problems with mediatek devices. It looks different, but maybe that's due to your blacklisting. It's hence a wild guess (I'm not one of the wifi developers), but maybe this patch will help:

https://patchwork.kernel.org/project/linux-wireless/patch/20221217085624.52077-1-nbd@nbd.name/
Comment 2 Ralph 2022-12-28 16:28:23 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #1)
> Bug 216839 is another recent report about problems with mediatek devices. It
> looks different, but maybe that's due to your blacklisting. It's hence a
> wild guess (I'm not one of the wifi developers), but maybe this patch will
> help:
> 
> https://patchwork.kernel.org/project/linux-wireless/patch/20221217085624.52077-1-nbd@nbd.name/

I'll definitely take a look at this. It sounds similar to what I'm experiencing, though there are some minor differences. Thank you!
Comment 3 Mario Limonciello (AMD) 2023-01-04 22:57:51 UTC
Created attachment 303527
potential patch (v1)

It looks to me like the problem is that the driver tries to access data fetched from ACPI after that data has been freed.

Can you see if this patch helps?