Bug 202241 - AMD RYZEN: IO_PAGE_FAULT when loading mt76x0u driver
Summary: AMD RYZEN: IO_PAGE_FAULT when loading mt76x0u driver
Status: RESOLVED DUPLICATE of bug 202673
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_iommu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-12 10:01 UTC by Michael
Modified: 2019-03-28 13:30 UTC

See Also:
Kernel Version: 4.20
Tree: Mainline
Regression: No


Attachments
0001-mt76x02-usb-mcu-limit-sg-length.patch (1.13 KB, text/plain)
2019-02-14 15:12 UTC, Stanislaw Gruszka
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch (5.65 KB, text/plain)
2019-02-14 17:31 UTC, Stanislaw Gruszka
0002-mt76usb-do-not-use-page_frag_alloc.patch (1.66 KB, text/plain)
2019-02-14 17:32 UTC, Stanislaw Gruszka
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch (892 bytes, text/plain)
2019-02-14 18:53 UTC, Stanislaw Gruszka
amd_iommu.patch (1.06 KB, text/plain)
2019-02-28 11:13 UTC, Stanislaw Gruszka

Description Michael 2019-01-12 10:01:22 UTC
Loading the WiFi driver (mt76x0u) causes an IO_PAGE_FAULT on an AMD RYZEN 1700 system.

Expected behaviour when the TP-LINK Archer T2UH is plugged in (INTEL system):
$ dmesg
[ 132.682145] usb 1-3: new high-speed USB device number 7 using xhci_hcd
[ 132.838532] usb 1-3: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 132.838543] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 132.838550] usb 1-3: Product: WiFi
[ 132.838557] usb 1-3: Manufacturer: MediaTek
[ 132.838563] usb 1-3: SerialNumber: 1.0
[ 133.406147] usb 1-3: reset high-speed USB device number 7 using xhci_hcd
[ 133.557810] mt76x0u 1-3:1.0: ASIC revision: 76100002 MAC revision: 76502000
[ 134.589471] mt76x0u 1-3:1.0: EEPROM ver:02 fae:01
[ 134.593977] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[ 134.594305] usbcore: registered new interface driver mt76x0u
[ 134.627225] mt76x0u 1-3:1.0 wlp0s20f0u3: renamed from wlan0

Behaviour on AMD RYZEN 1700 system:
$ dmesg
[ 32.126605] usb 5-4.1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 32.126610] usb 5-4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 32.126613] usb 5-4.1: Product: WiFi
[ 32.126615] usb 5-4.1: Manufacturer: MediaTek
[ 32.126617] usb 5-4.1: SerialNumber: 1.0
[ 32.236956] usb 5-4.1: reset high-speed USB device number 4 using xhci_hcd
[ 32.333111] mt76x0u 5-4.1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[ 32.675686] xhci_hcd 0000:27:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000ffb50308 flags=0x0000]
[ 33.676816] mt76x0u 5-4.1:1.0: firmware upload timed out

Remarks:
Running Arch Linux on both systems
4.20.0-arch1-1-ARCH
linux-firmware 20181218.0f22c85-1

I encountered this issue during tests of a WiFi USB dongle: TP-LINK Archer T2UH 
Read more here:
https://github.com/ZerBea/hcxdumptool/issues/42#issuecomment-453109280
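
The AMD-Vi line in the failing log carries the PCI device, the domain, the faulting I/O address and the flags. A minimal Python sketch for pulling those fields out of saved dmesg output might look like this. The function name `parse_io_page_faults` and the regular expression are illustrative assumptions, fitted only to the log format shown in this report; other kernel versions may print the event differently.

```python
import re

# Pattern fitted to the AMD-Vi line format seen in this report (an assumption;
# other kernel versions may format the event differently).
FAULT_RE = re.compile(
    r"(?P<driver>\S+) (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_io_page_faults(dmesg_text):
    """Return one dict per IO_PAGE_FAULT event found in dmesg_text."""
    events = []
    for m in FAULT_RE.finditer(dmesg_text):
        events.append({
            "driver": m.group("driver"),
            "pci": m.group("pci"),
            "domain": int(m.group("domain"), 16),
            "address": int(m.group("address"), 16),
            "flags": int(m.group("flags"), 16),
        })
    return events


if __name__ == "__main__":
    # Line copied from the failing log above.
    sample = ("[ 32.675686] xhci_hcd 0000:27:00.3: AMD-Vi: Event logged "
              "[IO_PAGE_FAULT domain=0x0000 address=0x00000000ffb50308 flags=0x0000]")
    for event in parse_io_page_faults(sample):
        print(event["pci"], hex(event["address"]))
```

Feeding it the complete output of dmesg returns one entry per logged fault, which makes it easier to compare faulting addresses between runs.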
Comment 1 Michael 2019-01-12 18:08:02 UTC
I'm sure this issue is caused by the IOMMU.
Booting the AMD RYZEN 1700 system with iommu=off, the driver loads without the fault:

$ dmesg
[   19.250313] usb 5-4.1: new high-speed USB device number 4 using xhci_hcd
[   19.356704] usb 5-4.1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[   19.356708] usb 5-4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   19.356711] usb 5-4.1: Product: WiFi
[   19.356713] usb 5-4.1: Manufacturer: MediaTek
[   19.356715] usb 5-4.1: SerialNumber: 1.0
[   19.510473] usb 5-4.1: reset high-speed USB device number 4 using xhci_hcd
[   19.606128] mt76x0u 5-4.1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[   20.606483] mt76x0u 5-4.1:1.0: EEPROM ver:02 fae:01
[   20.627104] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   20.627568] usbcore: registered new interface driver mt76x0u
[   20.635820] mt76x0u 5-4.1:1.0 wlp39s0f3u4u1: renamed from wlan0

BTW:
This issue is not related to the following bug:
https://bugzilla.kernel.org/show_bug.cgi?id=202243
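
To double-check which IOMMU-related options a given boot actually used, the kernel command line can be inspected. The sketch below is a hypothetical helper (the name `iommu_options` is made up for illustration) that picks the `iommu=` and `amd_iommu=` parameters out of a command-line string such as the contents of /proc/cmdline.

```python
def iommu_options(cmdline):
    """Return the iommu= / amd_iommu= parameters found in a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("iommu", "amd_iommu"):
            found[key] = value
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    try:
        with open("/proc/cmdline") as f:
            print(iommu_options(f.read()))
    except OSError:
        print("no /proc/cmdline on this system")
```

An empty result means neither parameter was passed, so the IOMMU runs with the kernel's defaults.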
Comment 2 Stanislaw Gruszka 2019-02-14 15:12:56 UTC
Created attachment 281135 [details]
0001-mt76x02-usb-mcu-limit-sg-length.patch

Could you check the attached patch with AMD IOMMU enabled? I would like to get confirmation that this is the actual bug.
Comment 3 Michael 2019-02-14 15:50:18 UTC
Running kernel 4.20.7-arch1-1-ARCH with the patch applied:

TP-LINK Archer T2UH:
[29963.432568] usb 1-1: new high-speed USB device number 4 using xhci_hcd
[29963.679764] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[29963.679768] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[29963.679770] usb 1-1: Product: WiFi
[29963.679772] usb 1-1: Manufacturer: MediaTek
[29963.679774] usb 1-1: SerialNumber: 1.0
[29963.968330] usb 1-1: reset high-speed USB device number 4 using xhci_hcd
[29964.238240] mt76x0u 1-1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[29964.645325] xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000f4280308 flags=0x0000]
[29965.669604] mt76x0u 1-1:1.0: firmware upload timed out
[29968.883711] mt76x0u 1-1:1.0: vendor request req:06 off:0800 failed:-110

ASUS USB AC51:
[29963.679764] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[29963.679768] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[29963.679770] usb 1-1: Product: WiFi
[29963.679772] usb 1-1: Manufacturer: MediaTek
[29963.679774] usb 1-1: SerialNumber: 1.0
[29963.968330] usb 1-1: reset high-speed USB device number 4 using xhci_hcd
[29964.238240] mt76x0u 1-1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[29964.645325] xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000f4280308 flags=0x0000]
[29965.669604] mt76x0u 1-1:1.0: firmware upload timed out
[29968.883711] mt76x0u 1-1:1.0: vendor request req:06 off:0800 failed:-110
[29972.083920] mt76x0u 1-1:1.0: vendor request req:07 off:0080 failed:-110
[29975.283813] mt76x0u 1-1:1.0: vendor request req:06 off:0080 failed:-110
[29978.483610] mt76x0u 1-1:1.0: vendor request req:06 off:0080 failed:-110
[29978.524682] mt76x0u: probe of 1-1:1.0 failed with error -110
[29978.524720] usbcore: registered new interface driver mt76x0u
[30128.385876] usb 1-1: USB disconnect, device number 4
[30134.484290] usb 1-1: new high-speed USB device number 5 using xhci_hcd


We still run into this IOMMU issue.
Comment 4 Stanislaw Gruszka 2019-02-14 16:42:50 UTC
Thanks for testing. I will have two more patches to try, to narrow down the problem.

Have you reported this to the AMD IOMMU developers?
Comment 5 Michael 2019-02-14 16:54:25 UTC
No problem. If I can help, I'll run the tests.
No, the issue has only been reported here.

It looks like the driver crashed the whole xHCI subsystem: it is no longer possible to connect another USB device. I tried this with a Prolific PL2303 straight after the mt76x0 tests.

[30134.484290] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[30139.767504] usb 1-1: device descriptor read/64, error -110
[30155.340694] usb 1-1: device descriptor read/64, error -110
[30155.637273] usb 1-1: new high-speed USB device number 6 using xhci_hcd
[30160.887286] usb 1-1: device descriptor read/64, error -110
[30176.460425] usb 1-1: device descriptor read/64, error -110
[30176.567104] usb usb1-port1: attempt power cycle
[30177.210414] usb 1-1: new high-speed USB device number 7 using xhci_hcd
[30182.433690] usb 1-1: device descriptor read/64, error -110
[30198.006837] usb 1-1: device descriptor read/64, error -110
[30198.303451] usb 1-1: new high-speed USB device number 8 using xhci_hcd
[30203.553439] usb 1-1: device descriptor read/64, error -110
[30219.126632] usb 1-1: device descriptor read/64, error -110
[30219.233298] usb usb1-port1: unable to enumerate USB device
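
The -110 in these messages is a negated Linux errno value: 110 is ETIMEDOUT on Linux, which matches the "firmware upload timed out" message from the driver. A small sketch for annotating such codes in a saved log follows. The table is deliberately limited to the one code seen in this report, and `annotate_errors` is an illustrative name, not part of any tool mentioned here.

```python
import re

# Linux errno values (include/uapi/asm-generic/errno.h); only the code that
# appears in this report is listed.
ERRNO_NAMES = {110: "ETIMEDOUT"}

# Matches the two spellings seen in the logs: "error -110" and "failed:-110".
ERR_RE = re.compile(r"(?:error |failed:)-(\d+)")


def annotate_errors(line):
    """Append the symbolic errno name to a log line that carries a known code."""
    m = ERR_RE.search(line)
    if not m:
        return line
    name = ERRNO_NAMES.get(int(m.group(1)))
    return f"{line}  ({name})" if name else line


if __name__ == "__main__":
    print(annotate_errors("usb 1-1: device descriptor read/64, error -110"))
```

Lines without a recognised code are returned unchanged, so the helper can be applied to every line of a log.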
Comment 6 Stanislaw Gruszka 2019-02-14 17:31:45 UTC
Created attachment 281137 [details]
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
Comment 7 Stanislaw Gruszka 2019-02-14 17:32:26 UTC
Created attachment 281139 [details]
0002-mt76usb-do-not-use-page_frag_alloc.patch

Please test these two patches.
Comment 8 Michael 2019-02-14 18:34:41 UTC
Removed this patch:
https://bugzilla.kernel.org/attachment.cgi?id=281135

Updated kernel to:
4.20.8-arch1-1-ARCH

Applied both patches and it works:
[  526.505013] usb 1-1: new high-speed USB device number 4 using xhci_hcd
[  526.752620] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[  526.752624] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  526.752627] usb 1-1: Product: WiFi
[  526.752630] usb 1-1: Manufacturer: MediaTek
[  526.752632] usb 1-1: SerialNumber: 1.0
[  527.043679] usb 1-1: reset high-speed USB device number 4 using xhci_hcd
[  527.311709] mt76x0u 1-1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[  529.596580] mt76x0u 1-1:1.0: EEPROM ver:02 fae:01
[  529.668154] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[  529.668571] usbcore: registered new interface driver mt76x0u
[  529.675943] mt76x0u 1-1:1.0 wlp3s0f0u1: renamed from wlan0

running iw 5.0.1
phy#0
	Interface wlp3s0f0u1
		ifindex 3
		wdev 0x1
		addr 50:3e:aa:a0:8f:6f
		type managed
		txpower 20.00 dBm
		multicast TXQ:
			qsz-byt	qsz-pkt	flows	drops	marks	overlmt	hashcol	tx-bytes	tx-packets

$ hcxdumptool -I
wlan interfaces:
503eaaa08f6f wlp3s0f0u1 (mt76x0u)

hcxdumptool -i wlp3s0f0u1 -C
initialization...
available channels:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,36,40,44,48,52,56,60,64,100,104,108,112,116,120,124,128,132,136,140,149,153,157,161,165 
terminated...

You did it. Everything is working as expected. Thanks a lot for your effort.
Comment 9 Stanislaw Gruszka 2019-02-14 18:53:36 UTC
Created attachment 281141 [details]
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

Cool! But the second patch was only meant to check where the problem is. Please retest with the changed second patch, i.e. with these two:

0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
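
The patch itself is not quoted in this thread, so the following is only an illustration of the distinction its title points at, under stated assumptions: a 4 KiB PAGE_SIZE and a buffer that lives somewhere inside a multi-page (compound) allocation. Describing that buffer relative to the head page can give an offset larger than one page, while describing it relative to the page that actually contains it keeps the offset below PAGE_SIZE. The helper names and the example offset are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86-64


def head_relative(byte_offset):
    """Describe the buffer as (page index, offset) relative to the head page."""
    return 0, byte_offset


def page_relative(byte_offset):
    """Describe the buffer relative to the page that actually contains it."""
    return divmod(byte_offset, PAGE_SIZE)


if __name__ == "__main__":
    # Hypothetical buffer starting 5000 bytes into a compound allocation.
    buf = 5000
    print(head_relative(buf))   # offset exceeds PAGE_SIZE
    print(page_relative(buf))   # offset stays below PAGE_SIZE
```

Both descriptions name the same byte; they differ only in whether the offset is allowed to reach past the first page.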
Comment 10 Michael 2019-02-14 19:35:39 UTC
Applied these two patches and tested the driver again:
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

Everything is working as expected.

BTW:
Tx power is still working fine with this driver. hcxdumptool rcascan shows 28 APs in range. I really hope we can fix the ugly Tx power issue in the latest driver, too. The TP-LINK Archer T2UH is a very sensitive and cheap device.
Comment 11 Stanislaw Gruszka 2019-02-15 06:52:07 UTC
Could you also check whether just one patch:
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
is sufficient to make AMD IOMMU work?

Re Tx power: I was not able to reproduce it via hcxdumptool because I had only 2 APs in my neighbourhood. I'll try in the office. However, it's unlikely the fix will be available before the 5.0 release.
Comment 12 Michael 2019-02-15 10:03:12 UTC
But you fixed all 4.20 driver-related issues, and that is really great. We can close this issue.

BTW:
With default settings, hcxdumptool is very, very aggressive. It is designed as an aggressive penetration tool and it is able to run massive attacks against the targets. Also, --do_rcascan is an active scan: we probe every AP from which we received a packet.
I don't like the idea of getting the RSSI from the device via the driver and radiotap header. During several physical measurements, I noticed that these values are mostly incorrect. Also, WiFi is packet-oriented. So if a strange AP or a strange CLIENT responds to our packet, it works, no matter which RSSI value we retrieved from the driver.
Comment 13 Stanislaw Gruszka 2019-02-15 10:09:31 UTC
The TX power fix for 5.0, once done, will be backported through -stable to this release.

Now I'm just trying to find out what needs to be backported to older kernel releases to fix this AMD IOMMU issue. Do we need just one patch:

0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

or both are needed:

0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

?
Comment 14 Stanislaw Gruszka 2019-02-28 11:13:19 UTC
Created attachment 281423 [details]
amd_iommu.patch

Could you test this patch, without any other patch applied, with AMD IOMMU enabled? I think it can address the root of the problem.
Comment 15 Stanislaw Gruszka 2019-02-28 12:15:48 UTC
amd_iommu.patch is wrong, don't test it. But I would still like to know if
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch alone makes things work.
Comment 16 Stanislaw Gruszka 2019-03-28 13:30:20 UTC

*** This bug has been marked as a duplicate of bug 202673 ***
