Bug 210857 - Crash in mt7601u driver
Summary: Crash in mt7601u driver
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-22 17:30 UTC by Matthias
Modified: 2021-01-20 08:09 UTC

See Also:
Kernel Version: 5.9.16-200.fc33.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
output of journalctl -kb -1 (3.46 KB, text/plain)
2020-12-22 17:31 UTC, Matthias
mt7601u_crash.patch (977 bytes, patch)
2021-01-13 23:09 UTC, Lorenzo Bianconi
kernel log when unplugging the device from USB port (4.69 KB, text/plain)
2021-01-14 22:53 UTC, Matthias
memory-exposure.txt (4.98 KB, text/plain)
2021-01-14 23:24 UTC, Matthias
journal (42.20 KB, text/plain)
2021-01-15 00:37 UTC, Moises Lima
kernel log when unplugging the device from USB port (95.69 KB, text/plain)
2021-01-15 22:22 UTC, Matthias
journal unplugging the device (32.81 KB, text/plain)
2021-01-15 22:59 UTC, Moises Lima
0001-mt7601u-fix-kernel-crash-unplugging-the-device.patch (2.58 KB, patch)
2021-01-17 21:51 UTC, Lorenzo Bianconi
journal-patch-calibration-EPROTO-0001unplug (4.03 KB, text/plain)
2021-01-18 20:00 UTC, Moises Lima

Description Matthias 2020-12-22 17:30:17 UTC
Hi there,

I'm currently experiencing regular kernel crashes, perhaps 2 or 3 times a day. The stack trace always mentions the mt7601u_rx_tasklet function, so that's why I'm reporting this as a Wifi driver bug. lsusb tells me this about the device:

  idVendor           0x148f Ralink Technology, Corp.
  idProduct          0x7601 MT7601U Wireless Adapter
  bcdDevice            0.00
  iManufacturer           1 MediaTek
  iProduct                2 802.11 n WLAN
  iSerial                 3 1.0
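
To check from a script whether an adapter of this model is attached, the vendor and product IDs shown above (148f:7601) can be matched against the one-line-per-device output of plain `lsusb`. This is only a minimal sketch: the function name `find_mt7601u` and the sample lines are made up for illustration and are not part of the driver or of any tool.

```python
import re

# Vendor and product IDs taken from the lsusb output above.
MT7601U_ID = ("148f", "7601")

# Plain `lsusb` prints one line per device, for example:
#   Bus 001 Device 004: ID 148f:7601 Ralink Technology, Corp. ...
_ID_RE = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def find_mt7601u(lsusb_output):
    """Return the lsusb lines whose vendor:product ID is 148f:7601."""
    matches = []
    for line in lsusb_output.splitlines():
        m = _ID_RE.search(line)
        if m and (m.group(1).lower(), m.group(2).lower()) == MT7601U_ID:
            matches.append(line.strip())
    return matches


if __name__ == "__main__":
    # Hypothetical sample text; in practice this would be the output of `lsusb`.
    sample = (
        "Bus 001 Device 004: ID 148f:7601 Ralink Technology, Corp. "
        "MT7601U Wireless Adapter\n"
        "Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub\n"
    )
    for hit in find_mt7601u(sample):
        print(hit)
```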


I've attached the relevant part of the output of `journalctl -kb -1`. Unfortunately I haven't been able to narrow down the circumstances any further. Everything works fine, and then all of a sudden it just freezes and I need to reset the machine. But maybe the stack trace helps?
Comment 1 Matthias 2020-12-22 17:31:47 UTC
Created attachment 294301 [details]
output of journalctl -kb -1
Comment 2 Matthias 2020-12-22 17:33:27 UTC
Oh, and this crash recently caused me to lose some of my DOOM savegames
Comment 3 Moises Lima 2021-01-05 12:04:38 UTC
I'm having this problem on 2 PCs with mt7601u.
I get this crash, then after some seconds or minutes the system freezes and I need to do a hard reset.

Before kernel 5.10.3 the crash was happening but it did not freeze the PC.
Comment 4 Matthias 2021-01-09 17:04:41 UTC
I have noticed that this happens also when I unplug the USB Wifi adapter from the computer, so that should make it much easier to debug.

These things are available for less than 3 € these days, so if anybody wants to work on this, I don't mind having one of these shipped to them.
Comment 5 Lorenzo Bianconi 2021-01-13 23:09:23 UTC
Created attachment 294625 [details]
mt7601u_crash.patch
Comment 6 Lorenzo Bianconi 2021-01-13 23:11:40 UTC
Hi Matthias and Moises,

can you please try the attached patch (mt7601u_crash.patch)?
Comment 7 Matthias 2021-01-14 22:53:04 UTC
(In reply to Lorenzo Bianconi from comment #6)
> Hi Matthias and Moises,
> 
> can you please try the attached patch (mt7601u_crash.patch)?


Hi Lorenzo,

it seems that that patch is already contained in Linux 5.10.7. That isn't shipped with Fedora just yet, so I have compiled it myself and I'm now running it.

As I said before, I've seen this driver crash during normal operation (e. g. large downloads, or even just watching videos online). It doesn't happen all that often, so I'm going to see what happens for a few days and report back then.


But I have also seen this driver crash when unplugging it from the USB port, and this is still happening. I'm attaching the kernel log output.
Comment 8 Matthias 2021-01-14 22:53:48 UTC
Created attachment 294637 [details]
kernel log when unplugging the device from USB port
Comment 9 Matthias 2021-01-14 23:24:32 UTC
Created attachment 294639 [details]
memory-exposure.txt

Hey,

I have just experienced another issue with this new kernel 5.10.7.

The internet suddenly stopped working and I found a message in the kernel log:
usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_head_cache' (offset 132, size 1208)!

I'm attaching the relevant part of the kernel log.
It didn't crash the machine at least and I could reboot it without problems, so I guess
that's an improvement. But this time the stack trace doesn't mention mt7601u,
although it still seems to be network-related.
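
For anyone collecting these reports from several logs, the usercopy line quoted above can be split into its fields with a regular expression. This is a rough sketch that models only the exact wording of the one message quoted in this comment; other kernel versions may word it differently, and the names `USERCOPY_RE` and `parse_usercopy` are made up for illustration.

```python
import re

# Matches only the message wording quoted in this comment.
USERCOPY_RE = re.compile(
    r"usercopy: Kernel memory exposure attempt detected from "
    r"(?P<kind>\w+) object '(?P<cache>[^']+)' "
    r"\(offset (?P<offset>\d+), size (?P<size>\d+)\)"
)


def parse_usercopy(line):
    """Return the fields of a usercopy warning as a dict, or None if no match."""
    m = USERCOPY_RE.search(line)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "cache": m.group("cache"),
        "offset": int(m.group("offset")),
        "size": int(m.group("size")),
    }


if __name__ == "__main__":
    msg = (
        "usercopy: Kernel memory exposure attempt detected from SLUB object "
        "'skbuff_head_cache' (offset 132, size 1208)!"
    )
    print(parse_usercopy(msg))
```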

Let me know if there's anything more I can do. When I built the kernel it also built
a bunch of debug rpm packages. Can those be used to provide a more helpful stack
trace perhaps?

All the best,
Matthias
Comment 10 Matthias 2021-01-14 23:31:44 UTC
Ok, this new bug seems to be happening a lot. I've been running the 5.10.7 kernel for maybe an hour or two now and it already occurred twice. Sorry to be the bearer of bad news :-(
Comment 11 Moises Lima 2021-01-15 00:37:23 UTC
Created attachment 294641 [details]
journal

Unplugging the device from USB port
Comment 12 Moises Lima 2021-01-15 00:53:40 UTC
(In reply to Lorenzo Bianconi from comment #6)
> Hi Matthias and Moises,
> 
> can you please try the attached patch (mt7601u_crash.patch)?

Hi Lorenzo,

After applying the patch on kernel 5.10.7,
I confirm the driver still crashes when unplugging.
The mouse light stays on and the fans start running fast. I need to hard reset the PC.

No crash during normal usage after 2 hours of use.

Thanks!
Comment 13 Lorenzo Bianconi 2021-01-15 13:11:35 UTC
(In reply to Matthias from comment #7)
> (In reply to Lorenzo Bianconi from comment #6)
> > Hi Matthias and Moises,
> > 
> > can you please try the attached patch (mt7601u_crash.patch)?
> 
> 
> Hi Lorenzo,
> 
> it seems that that patch is already contained in Linux 5.10.7. That isn't
> shipped with Fedora just yet, so I have compiled it myself and I'm now
> running it.
> 
> As I said before, I've seen this driver crash during normal operation (e. g.
> large downloads, or even just watching videos online). It doesn't happen all
> that often, so I'm going to see what happens for a few days and report back
> then.
> 
> 
> But I have also seen this driver crash when unplugging it from the USB port,
> and this is still happening. I'm attaching the kernel log output.

Regarding the normal operation crash, mt7601u_crash.patch is wrong. Can you please try this one? https://patchwork.kernel.org/project/linux-wireless/patch/62b2380c8c2091834cfad05e1059b55f945bd114.1610643952.git.lorenzo@kernel.org/

Regarding the disconnect issue, can you please try this commit to see if it helps?
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git/commit/?id=0e40dbd56d67a5b01b13f4bfb62b470cd99125cd
Comment 14 Matthias 2021-01-15 18:45:38 UTC
I'm compiling a kernel with the first of your proposed patches, the put_page one. I'll try running it for a few days and report what I find.
Comment 15 Matthias 2021-01-15 18:55:19 UTC
Ah, never mind, I'll just compile a kernel with both of the patches applied.
Comment 16 Matthias 2021-01-15 22:22:12 UTC
Created attachment 294657 [details]
kernel log when unplugging the device from USB port

Hey there,

I'm sorry to report that the -EPROTO patch didn't fix the USB unplugging issue. It again crashed immediately after unplugging, and you can find the log in the attachment.

I'll keep running this kernel for a few more days to see if it still has issues during normal use.

Thank you for your efforts!
Comment 17 Moises Lima 2021-01-15 22:59:21 UTC
Created attachment 294659 [details]
journal unplugging the device

journal unplugging the device
Comment 18 Moises Lima 2021-01-15 23:03:20 UTC
Unplugging the device still crashes.

It works if I remove the module before unplugging:

sudo modprobe -r mt7601u
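
The workaround above could be scripted so the module is unloaded only when it is actually loaded. This is a minimal sketch, not a fix for the crash: the helper names are made up for illustration, it assumes the usual `/proc/modules` layout where the module name is the first field of each line, and the real `modprobe -r` call needs root privileges.

```python
import subprocess


def is_module_loaded(name, proc_modules_text):
    """True if `name` is the first field of a line in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def unload_command(name):
    """Build the same command as in the comment above, without sudo."""
    return ["modprobe", "-r", name]


def unload_if_loaded(name, proc_modules_text, run=subprocess.run):
    """Run `modprobe -r name` if the module is loaded; return True if it ran.

    `run` is injectable so the logic can be exercised without root.
    """
    if not is_module_loaded(name, proc_modules_text):
        return False
    run(unload_command(name), check=True)
    return True
```

A real invocation would read the text from `/proc/modules` and call `unload_if_loaded("mt7601u", text)` as root before pulling the adapter.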

Testing normal use
Comment 19 Matthias 2021-01-17 14:27:16 UTC
Hey there,

I've been using this patched driver over the weekend and downloaded several gigabytes of data. I have witnessed no more crashes, so apparently this fixes the issue. Thanks, Lorenzo!

As I mentioned before, the USB disconnect issue is still there, but this is not really an issue for me because there isn't really any reason to ever unplug it on this particular machine. But I'll be happy to help if you need me to test any more patches.
Comment 20 Lorenzo Bianconi 2021-01-17 21:51:23 UTC
Created attachment 294709 [details]
0001-mt7601u-fix-kernel-crash-unplugging-the-device.patch

Hi Matthias and Moises,

can you please try this patch for the unplugging kernel crash?
Comment 21 Matthias 2021-01-17 23:11:23 UTC
Should I apply only this patch or both this one and the other one (-EPROTO)?
Comment 22 Moises Lima 2021-01-18 02:49:02 UTC
(In reply to Lorenzo Bianconi from comment #20)
> Created attachment 294709 [details]
> 0001-mt7601u-fix-kernel-crash-unplugging-the-device.patch
> 
> Hi Matthias and Moises,
> 
> can you please try this patch for the unplugging kernel crash?

Hi


Patched on kernel 5.10.7:

mt7601u: check the status of device in calibration
mt7601u: process URBs in status EPROTO properly
0001-mt7601u-fix-kernel-crash-unplugging-the-device.patch

Everything working!
Thank you
Comment 23 Moises Lima 2021-01-18 03:07:18 UTC
Crash after some minutes of normal usage

Testing with:
mt7601u: check the status of device in calibration
mt7601u: process URBs in status EPROTO properly
0001-mt7601u-fix-kernel-crash-unplugging-the-device
mt7601u: fix rx buffer refcounting
Comment 24 Lorenzo Bianconi 2021-01-18 07:02:38 UTC
(In reply to Moises Lima from comment #23)
> Crash after some minutes of normal usage
> 
> Testing with:
> mt7601u: check the status of device in calibration
> mt7601u: process URBs in status EPROTO properly
> 0001-mt7601u-fix-kernel-crash-unplugging-the-device
> mt7601u: fix rx buffer refcounting

Can you provide the crashlog?
Comment 25 Lorenzo Bianconi 2021-01-18 15:43:24 UTC
(In reply to Moises Lima from comment #23)
> Crash after some minutes of normal usage
> 
> Testing with:
> mt7601u: check the status of device in calibration
> mt7601u: process URBs in status EPROTO properly
> 0001-mt7601u-fix-kernel-crash-unplugging-the-device
> mt7601u: fix rx buffer refcounting

Hi Moises,

I ran the wireless-drivers-next tree + mt7601u fixes [0] for ~6h with bidirectional TCP traffic without any crash. Can you please give it a whirl?

[0] https://github.com/LorenzoBianconi/wireless-drivers-next/tree/mt7601u_fixes
Comment 26 Moises Lima 2021-01-18 20:00:54 UTC
Created attachment 294743 [details]
journal-patch-calibration-EPROTO-0001unplug

Journal of the crash with patched kernel 5.10.7
mt7601u: check the status of device in calibration
mt7601u: process URBs in status EPROTO properly
0001-mt7601u-fix-kernel-crash-unplugging-the-device.patch
Comment 27 Moises Lima 2021-01-18 20:24:19 UTC
Hi Lorenzo,

Tested with (kernel 5.10.7) patched:
mt7601u: check the status of device in calibration (wireless-drivers-next tree)
mt7601u: process URBs in status EPROTO properly
0001-mt7601u-fix-kernel-crash-unplugging-the-device
mt7601u: fix rx buffer refcounting

No crash on normal usage or unplugging the device (since 2021-01-18 03:07:18 UTC)

Should I test the next version too?
Comment 28 Lorenzo Bianconi 2021-01-18 20:50:01 UTC
(In reply to Moises Lima from comment #27)
> Hi Lorenzo,
> 
> Tested with (kernel 5.10.7) patched:
> mt7601u: check the status of device in calibration (wireless-drivers-next
> tree)
> mt7601u: process URBs in status EPROTO properly
> 0001-mt7601u-fix-kernel-crash-unplugging-the-device
> mt7601u: fix rx buffer refcounting
> 
> No crash on normal usage or unplugging the device (since 2021-01-18 03:07:18
> UTC)
> 
> Should I test the next version too?

Hi Moises,

So there are no more issues with those patches applied, correct? If so, no more tests are needed.
Comment 29 Moises Lima 2021-01-18 22:03:13 UTC
Hi Lorenzo,

Yes, everything is fixed

Thank you!
Comment 30 Matthias 2021-01-20 08:07:52 UTC
👍🎉🥳
