Bug 217415 - mt7921e error on hibernate resume path
Summary: mt7921e error on hibernate resume path
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: AMD Linux
Importance: P3 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-08 09:36 UTC by kolAflash
Modified: 2024-03-25 14:04 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with hibernation and failure to start wifi (130.30 KB, text/plain)
2024-03-17 16:55 UTC, Alex Maras
dmesg with failure to rmmod (118.32 KB, text/plain)
2024-03-25 14:03 UTC, Alex Maras
failure to reboot after rmmod stalls (389.24 KB, image/jpeg)
2024-03-25 14:04 UTC, Alex Maras

Description kolAflash 2023-05-08 09:36:29 UTC
When resuming from hibernation (suspend to disk) I got this error from mt7921e.

[T29172] mt7921e 0000:01:00.0: Message 00020007 (seq 11) timeout
[T29172] mt7921e 0000:01:00.0: PM: dpm_run_callback(): pci_pm_restore+0x0/0x90 returns -110
[T29172] mt7921e 0000:01:00.0: PM: failed to restore async: error -110
[T29172] mt7921e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220311230842a
[T29172] 
[T29172] mt7921e 0000:01:00.0: WM Firmware Version: ____010000, Build Time: 20220311230931

Full dmesg:
https://gitlab.freedesktop.org/drm/amd/uploads/4ae31a3d6b9a7db839943c16e06d8704/Ryzen-5650U_6.1.26-with-8cf17c25e_hibernation-wakeup.txt

Came up as part of a different problem:
Ryzen 3500U and 5650U: StandBy and External Monitors broken since >= 6.1
https://gitlab.freedesktop.org/drm/amd/-/issues/2492#note_1894147

Maybe related:
wifi mediatek mt7921e problem after suspend
https://bugzilla.kernel.org/show_bug.cgi?id=215463

Comment 2 kolAflash 2023-08-15 09:10:04 UTC
(In reply to Mario Limonciello (AMD) from comment #1)
> Maybe
> https://patchwork.kernel.org/project/linux-wireless/patch/19f1aae1ab9ea867eb42742fc5b72ed4d7307b0a.1687159671.git.deren.wu@mediatek.com/
> helps

I've tested Linux-6.1.43 which includes that patch.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=97ccc14d114b1cf3bc16670fa09f74ec7233b643


dmesg shows these errors multiple times. This is the first occurrence.
[ T1314] mt7921e 0000:01:00.0: enabling device (0000 -> 0002)
[ T1314] mt7921e 0000:01:00.0: ASIC revision: 79610010
[  T404] mt7921e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
[  T404] mt7921e 0000:01:00.0: WM Firmware Version: ____010000, Build Time: 20230526130958
[ T3390] mt7921e 0000:01:00.0: Message 00020007 (seq 4) timeout
[ T3390] mt7921e 0000:01:00.0: PM: dpm_run_callback(): pci_pm_restore+0x0/0x90 returns -110
[ T3390] mt7921e 0000:01:00.0: PM: failed to restore async: error -110
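
For anyone reading the log: error -110 is -ETIMEDOUT on Linux, so the restore callback is reporting a timeout, which is consistent with the "Message 00020007 (seq 4) timeout" line right before it. Below is a minimal sketch for pulling these failures out of a saved dmesg log. It assumes the message format shown above, and the names `find_pm_failures`, `PM_FAIL` and `ERRNO_NAMES` are made up for illustration; they are not part of any kernel tooling.

```python
import re

# Linux errno value seen in this report (include/uapi/asm-generic/errno.h).
# Only the code that appears in the log is listed; extend as needed.
ERRNO_NAMES = {110: "ETIMEDOUT"}

# Assumed line format, taken from the excerpt above:
# [ T3390] mt7921e 0000:01:00.0: PM: failed to restore async: error -110
PM_FAIL = re.compile(
    r"mt7921e (?P<dev>[0-9a-f:.]+): PM: failed to (?P<phase>\w+) async: "
    r"error -(?P<err>\d+)"
)


def find_pm_failures(log_text):
    """Return (pci_device, pm_phase, errno_name) for each mt7921e PM failure."""
    results = []
    for line in log_text.splitlines():
        match = PM_FAIL.search(line)
        if match:
            code = int(match.group("err"))
            # Fall back to the raw number for codes not in the table.
            name = ERRNO_NAMES.get(code, str(code))
            results.append((match.group("dev"), match.group("phase"), name))
    return results


sample = """\
[ T3390] mt7921e 0000:01:00.0: Message 00020007 (seq 4) timeout
[ T3390] mt7921e 0000:01:00.0: PM: dpm_run_callback(): pci_pm_restore+0x0/0x90 returns -110
[ T3390] mt7921e 0000:01:00.0: PM: failed to restore async: error -110
"""

for failure in find_pm_failures(sample):
    print(failure)
```

Counting the returned tuples gives a quick way to see how often the restore path failed across a long log.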


Later on there are also multiple stack traces involving net/mac80211/rx.c.
This is the first occurrence:
[ T1410] ------------[ cut here ]------------
[ T1410] WARNING: CPU: 2 PID: 1410 at /home/myuser/opt/linux-kernel/build.backup-exclude-m461c/build_bisect/worktree/net/mac80211/rx.c:5169 ieee80211_rx_list+0x588/0xc60 [mac80211]
[...]
[ T1410] CPU: 2 PID: 1410 Comm: napi/phy0-8193 Tainted: G        W   E      6.1.43-v6.1.43 #18
[ T1410] Hardware name: HP HP EliteBook 845 G8 Notebook PC/8895, BIOS T82 Ver. 01.13.01 03/31/2023
[ T1410] RIP: 0010:ieee80211_rx_list+0x588/0xc60 [mac80211]
[ T1410] Code: ff 8b b5 18 05 00 00 85 f6 74 0b a9 00 00 04 00 0f 84 98 00 00 00 48 89 df e8 a4 06 60 ea e9 6d fb ff ff 0f 0b e9 59 fb ff ff <0f> 0b e9 52 fb ff ff 8b 85 78 15 00 00 85 c0 0f 84 10 fe ff ff e9
[ T1410] RSP: 0018:ffffb40a006b7c20 EFLAGS: 00010246
[ T1410] RAX: 000000ff0000ff00 RBX: ffff8eba878cd600 RCX: ffff8eb99a602198
[ T1410] RDX: ffff8eb99a6003a0 RSI: 0000000000000000 RDI: ffff8eb99a6008e0
[ T1410] RBP: ffff8eb99a6008e0 R08: 00000000ffffffa6 R09: ffffb40a006b7d30
[ T1410] R10: 000000000000143c R11: 000000002d1db16d R12: ffff8eb99a602080
[ T1410] R13: 0000000000000000 R14: ffff8eb99a603748 R15: 0000000000000000
[ T1410] FS:  0000000000000000(0000) GS:ffff8ec7ce680000(0000) knlGS:0000000000000000
[ T1410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T1410] CR2: 00007f1a789f7020 CR3: 0000000094010000 CR4: 0000000000750ee0
[ T1410] PKRU: 55555554
[ T1410] Call Trace:
[ T1410]  <TASK>
[ T1410]  ? __warn+0x7d/0x140
[ T1410]  ? ieee80211_rx_list+0x588/0xc60 [mac80211]
[ T1410]  ? report_bug+0xf8/0x1e0
[ T1410]  ? handle_bug+0x44/0x80
[ T1410]  ? exc_invalid_op+0x13/0x60
[ T1410]  ? asm_exc_invalid_op+0x16/0x20
[ T1410]  ? ieee80211_rx_list+0x588/0xc60 [mac80211]
[ T1410]  ? swiotlb_tbl_map_single+0x5d6/0x6b0
[ T1410]  mt76_rx_complete+0x198/0x2e0 [mt76]
[ T1410]  ? swiotlb_map+0x96/0x260
[ T1410]  mt76_rx_poll_complete+0x373/0x570 [mt76]
[ T1410]  ? mt76_dma_rx_poll+0x25d/0x480 [mt76]
[ T1410]  mt76_dma_rx_poll+0x25d/0x480 [mt76]
[ T1410]  ? __napi_poll+0x1b0/0x1b0
[ T1410]  mt7921_poll_rx+0x4a/0xe0 [mt7921e]
[ T1410]  __napi_poll+0x29/0x1b0
[ T1410]  ? napi_threaded_poll+0x80/0x100
[ T1410]  napi_threaded_poll+0x9d/0x100
[ T1410]  kthread+0xd9/0x100
[ T1410]  ? kthread_complete_and_exit+0x20/0x20
[ T1410]  ret_from_fork+0x22/0x30
[ T1410]  </TASK>
[ T1410] ---[ end trace 0000000000000000 ]---


Finally, the system crashes when waking from standby. The crash itself is more likely an amdgpu driver issue, but there is also another stack trace involving mt7921e, which I was able to recover from pstore:
[ T1410] ------------[ cut here ]------------
[ T1410] WARNING: CPU: 9 PID: 1410 at /home/myuser/opt/linux-kernel/build.backup-exclude-m461c/build_bisect/worktree/net/mac80211/rx.c:5169 ieee80211_rx_list+0x588/0xc60 [mac80211]
[...]
[ T1410] CPU: 9 PID: 1410 Comm: napi/phy0-8193 Tainted: G        W   E      6.1.43-v6.1.43 #18
[ T1410] Hardware name: HP HP EliteBook 845 G8 Notebook PC/8895, BIOS T82 Ver. 01.13.01 03/31/2023
[ T1410] RIP: 0010:ieee80211_rx_list+0x588/0xc60 [mac80211]
[ T1410] Code: ff 8b b5 18 05 00 00 85 f6 74 0b a9 00 00 04 00 0f 84 98 00 00 00 48 89 df e8 a4 06 60 ea e9 6d fb ff ff 0f 0b e9 59 fb ff ff <0f> 0b e9 52 fb ff ff 8b 85 78 15 00 00 85 c0 0f 84 10 fe ff ff e9
[ T1410] RSP: 0018:ffffb40a006b7c20 EFLAGS: 00010246
[ T1410] RAX: 000000ff0000ff00 RBX: ffff8ec16adfca00 RCX: ffff8eb99a602198
[ T1410] RDX: ffff8eb99a6003a0 RSI: 0000000000000000 RDI: ffff8eb99a6008e0
[ T1410] RBP: ffff8eb99a6008e0 R08: 00000000ffffffa7 R09: ffffb40a006b7d30
[ T1410] R10: 000000000000143c R11: 0000000004c72c40 R12: ffff8eb99a602080
[ T1410] R13: 0000000000000000 R14: ffff8eb99a603748 R15: 0000000000000000
[ T1410] FS:  0000000000000000(0000) GS:ffff8ec7ce840000(0000) knlGS:0000000000000000
[ T1410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ T1410] CR2: 0000000000000000 CR3: 0000000094010000 CR4: 0000000000750ee0
[ T1410] PKRU: 55555554
[ T1410] Call Trace:
[ T1410]  <TASK>
[ T1410]  ? __warn+0x7d/0x140
[ T1410]  ? ieee80211_rx_list+0x588/0xc60 [mac80211]
[ T1410]  ? report_bug+0xf8/0x1e0
[ T1410]  ? handle_bug+0x44/0x80
[ T1410]  ? exc_invalid_op+0x13/0x60
[ T1410]  ? asm_exc_invalid_op+0x16/0x20
[ T1410]  ? ieee80211_rx_list+0x588/0xc60 [mac80211]
[ T1410]  ? swiotlb_tbl_map_single+0x5d6/0x6b0
[ T1410]  mt76_rx_complete+0x198/0x2e0 [mt76]
[ T1410]  ? swiotlb_map+0x96/0x260
[ T1410]  mt76_rx_poll_complete+0x373/0x570 [mt76]
[ T1410]  ? mt76_dma_rx_poll+0x25d/0x480 [mt76]
[ T1410]  mt76_dma_rx_poll+0x25d/0x480 [mt76]
[ T1410]  ? __napi_poll+0x1b0/0x1b0
[ T1410]  mt7921_poll_rx+0x4a/0xe0 [mt7921e]
[ T1410]  __napi_poll+0x29/0x1b0
[ T1410]  ? napi_threaded_poll+0x80/0x100
[ T1410]  napi_threaded_poll+0x9d/0x100
[ T1410]  kthread+0xd9/0x100
[ T1410]  ? kthread_complete_and_exit+0x20/0x20
[ T1410]  ret_from_fork+0x22/0x30
[ T1410]  </TASK>
[ T1410] ---[ end trace 0000000000000000 ]---


See here for full logs and the mentioned amdgpu driver issue:
https://gitlab.freedesktop.org/drm/amd/-/issues/2492#note_2043652

Comment 3 Alex Maras 2024-03-17 16:55:55 UTC
Created attachment 306000
dmesg with hibernation and failure to start wifi

I'm seeing an ongoing issue with the same symptoms. After resuming from hibernation, wifi is completely dead and dmesg shows similar messages.

Running `rmmod mt7921e` followed by `modprobe mt7921e` fixes it, except that the `rmmod` sometimes never finishes. When that happens, it does not respond to `kill -9` and blocks reboot indefinitely.

I'll try to capture a `dmesg` log the next time the `rmmod` fails to finish, but I don't know whether it will show anything.

Attached is the dmesg for a failure when resuming from hibernation.

Comment 4 Alex Maras 2024-03-25 14:03:03 UTC
Created attachment 306040
dmesg with failure to rmmod

Attaching a dmesg from when I tried to `rmmod` and it timed out and failed. This happens once every several times I do it.

Comment 5 Alex Maras 2024-03-25 14:04:53 UTC
Created attachment 306041
failure to reboot after rmmod stalls

And a photo of the output that eventually shows up when the `rmmod` gets stuck and I try rebooting.
