Bug 209571 - mt7601u device causes panic on removal
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: All Linux
Importance: P1 high
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-07 18:20 UTC by Jarod Wilson
Modified: 2020-10-26 13:05 UTC
CC List: 3 users

See Also:
Kernel Version: 5.9-rc8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Screen shot of panic with kernel 5.9-rc8 (2.59 MB, image/heic)
2020-10-07 18:25 UTC, Jarod Wilson
Screen shot of panic with kernel 5.9-rc8 v2 (3.35 MB, image/jpeg)
2020-10-22 20:23 UTC, Jarod Wilson

Description Jarod Wilson 2020-10-07 18:20:32 UTC
I have a Ralink USB wifi adapter with an MT7601U chipset in it which, when removed from a system after use, now causes a panic. This is a regression from earlier kernel versions, though I see very little recent change in the relevant driver code. To date, I've been unable to acquire a vmcore from 5.9-rc6 or rc8, but both panic the system. I *was* able to get a vmcore from a Red Hat Enterprise Linux 8 kernel with the wireless stack backported up to effectively the same code as 5.9-rc8 though, which may be of some use here.

[ 1472.863223] usb 2-2: USB disconnect, device number 2
[ 1472.873065] wlp0s29f7u2: deauthenticating from d4:5d:64:9f:9a:58 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 1472.883575] mt7601u 2-2:1.0: mt7601u_rxdc_cal timed out
[ 1472.926086] ------------[ cut here ]------------
[ 1472.926089] kernel BUG at mm/slub.c:294!
[ 1472.926100] invalid opcode: 0000 [#1] SMP PTI
[ 1472.926104] CPU: 0 PID: 3018 Comm: kworker/0:0 Kdump: loaded Not tainted 4.18.0-239.el8.wifi.x86_64 #1
[ 1472.926106] Hardware name: LENOVO 6465CTO/6465CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011
[ 1472.926114] Workqueue: usb_hub_wq hub_event
[ 1472.926120] RIP: 0010:__slab_free+0x17d/0x360
[ 1472.926122] Code: fa 66 66 90 66 66 90 f0 49 0f ba 2c 24 00 0f 82 94 00 00 00 4d 3b 6c 24 20 74 11 49 0f ba 34 24 00 57 9d 66 66 90 66 90 eb 9c <0f> 0b 49 3b 54 24 28 75 e8 49 89 5c 24 20 49 89 4c 24 28 49 0f ba
[ 1472.926125] RSP: 0018:ffffbe1001effa60 EFLAGS: 00010246
[ 1472.926127] RAX: ffff9909d49b0400 RBX: ffff9909d49b0400 RCX: ffff9909d49b0400
[ 1472.926129] RDX: 000000008010000e RSI: ffffdf99c2526c00 RDI: ffff990a47c02a00
[ 1472.926131] RBP: ffffbe1001effb00 R08: 0000000000000001 R09: ffffffffba724217
[ 1472.926132] R10: ffff9909d49b0400 R11: 0000000000000001 R12: ffffdf99c2526c00
[ 1472.926134] R13: ffff9909d49b0400 R14: ffff990a47c02a00 R15: ffff9909ee54b030
[ 1472.926136] FS:  0000000000000000(0000) GS:ffff990a77a00000(0000) knlGS:0000000000000000
[ 1472.926139] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1472.926140] CR2: 00007f12e266c890 CR3: 000000010bdc8000 CR4: 00000000000006f0
[ 1472.926142] Call Trace:
[ 1472.926204]  ? __ieee80211_tx_status+0x325/0x850 [mac80211]
[ 1472.926215]  ? mt7601u_dma_enqueue_tx+0x230/0x250 [mt7601u]
[ 1472.926219]  consume_skb+0x27/0x90
[ 1472.926238]  ieee80211_tx_status+0x6a/0x90 [mac80211]
[ 1472.926246]  mt7601u_tx_status+0x8b/0xb0 [mt7601u]
[ 1472.926252]  mt7601u_dma_cleanup+0xc8/0x110 [mt7601u]
[ 1472.926258]  mt7601u_cleanup+0x23/0x30 [mt7601u]
[ 1472.926263]  mt7601u_disconnect+0x21/0x60 [mt7601u]
[ 1472.926267]  usb_unbind_interface+0x78/0x260
[ 1472.926273]  device_release_driver_internal+0xf4/0x1e0
[ 1472.926276]  bus_remove_device+0xf7/0x170
[ 1472.926278]  device_del+0x172/0x380
[ 1472.926282]  usb_disable_device+0x93/0x240
[ 1472.926284]  usb_disconnect+0xbc/0x250
[ 1472.926287]  hub_port_connect+0x7f/0xa50
[ 1472.926291]  port_event+0x538/0x7c0
[ 1472.926294]  hub_event+0x14a/0x3c0
[ 1472.926298]  process_one_work+0x1a7/0x360
[ 1472.926301]  worker_thread+0x30/0x390
[ 1472.926304]  ? create_worker+0x1a0/0x1a0
[ 1472.926306]  kthread+0x112/0x130
[ 1472.926309]  ? kthread_flush_work_fn+0x10/0x10
[ 1472.926313]  ret_from_fork+0x35/0x40
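
For anyone triaging the trace above, the following is a minimal sketch (not part of the original report) of how the call-trace frames could be pulled out of a saved dmesg log and filtered down to the driver's own functions. The names `FRAME_RE`, `parse_frames`, and `SAMPLE` are hypothetical helpers introduced for illustration; the sample lines are copied from the trace in this report. Frames prefixed with `?` are ones the kernel's unwinder could not confirm, so the sketch flags them as unreliable.

```python
import re

# Hypothetical helper: matches call-trace frames such as
# "[ 1472.926215]  ? mt7601u_dma_enqueue_tx+0x230/0x250 [mt7601u]".
FRAME_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?\s*$"
)


def parse_frames(log_text):
    """Return one dict per call-trace frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue  # not a symbol+offset/size frame line
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for built-in code
            "reliable": m.group("unreliable") is None,
        })
    return frames


# Lines copied from the trace in this report.
SAMPLE = """\
[ 1472.926204]  ? __ieee80211_tx_status+0x325/0x850 [mac80211]
[ 1472.926215]  ? mt7601u_dma_enqueue_tx+0x230/0x250 [mt7601u]
[ 1472.926219]  consume_skb+0x27/0x90
[ 1472.926238]  ieee80211_tx_status+0x6a/0x90 [mac80211]
[ 1472.926246]  mt7601u_tx_status+0x8b/0xb0 [mt7601u]
[ 1472.926252]  mt7601u_dma_cleanup+0xc8/0x110 [mt7601u]
"""

frames = parse_frames(SAMPLE)
# Keep only confirmed frames that belong to the mt7601u module.
driver_frames = [
    f["symbol"] for f in frames
    if f["module"] == "mt7601u" and f["reliable"]
]
print(driver_frames)
```

On the sample lines this leaves `mt7601u_tx_status` and `mt7601u_dma_cleanup`, the two confirmed driver frames on the path from the disconnect handler to `consume_skb`.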
Comment 1 Jarod Wilson 2020-10-07 18:24:03 UTC
Note: was able to take a screen shot of the laptop display after a panic under 5.9-rc8, and it does look like the same panic. Can upload the pic shortly...
Comment 2 Jarod Wilson 2020-10-07 18:25:00 UTC
Created attachment 292895
Screen shot of panic with kernel 5.9-rc8
Comment 3 Jarod Wilson 2020-10-15 21:08:25 UTC
With an earlier kernel that doesn't instantly panic, I see:

[ 1115.223575] usb 2-1: USB disconnect, device number 7
[ 1115.229573] mt7601u 2-1:1.0: Warning: TX DMA did not stop!
[ 1117.629508] mt7601u 2-1:1.0: Warning: MAC TX did not stop!
[ 1118.230508] mt7601u 2-1:1.0: Warning: MAC RX did not stop!
[ 1118.230521] mt7601u 2-1:1.0: Warning: RX DMA did not stop!

I assume this might be a relevant hint.
Comment 4 Jakub Kicinski 2020-10-21 22:56:27 UTC
The attachment has a highly unusual format, I can't open it on Fedora 32 :(

FWIW, Lorenzo Bianconi or Stanislaw Gruszka, who have fixed this driver in the past, may know off the top of their heads what this is.
Comment 5 Jarod Wilson 2020-10-22 20:23:06 UTC
Created attachment 293131
Screen shot of panic with kernel 5.9-rc8 v2

Sorry about that; it was some sort of iPhone image format. I've exported it to a JPEG and attached that instead.
Comment 6 Jakub Kicinski 2020-10-24 19:52:49 UTC
Thanks! That's a strange stack trace; it looks like something socket-related gets triggered from ieee80211_tx_status().

Would you mind sending an email out to linux-wireless, and CCing Felix Fietkau?
Looks like he was touching status reporting recently.
Comment 7 Jarod Wilson 2020-10-26 13:05:42 UTC
Throwing Felix and Lorenzo on cc here on the bug, will send something to linux-wireless today as well.
