Loading the WiFi driver (mt76x0u) causes an IO_PAGE_FAULT on an AMD RYZEN 1700.

Expected behaviour when the TP-LINK Archer T2UH is plugged in (Intel system):

$ dmesg
[ 132.682145] usb 1-3: new high-speed USB device number 7 using xhci_hcd
[ 132.838532] usb 1-3: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 132.838543] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 132.838550] usb 1-3: Product: WiFi
[ 132.838557] usb 1-3: Manufacturer: MediaTek
[ 132.838563] usb 1-3: SerialNumber: 1.0
[ 133.406147] usb 1-3: reset high-speed USB device number 7 using xhci_hcd
[ 133.557810] mt76x0u 1-3:1.0: ASIC revision: 76100002 MAC revision: 76502000
[ 134.589471] mt76x0u 1-3:1.0: EEPROM ver:02 fae:01
[ 134.593977] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[ 134.594305] usbcore: registered new interface driver mt76x0u
[ 134.627225] mt76x0u 1-3:1.0 wlp0s20f0u3: renamed from wlan0

Behaviour on the AMD RYZEN 1700 system:

$ dmesg
[ 32.126605] usb 5-4.1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 32.126610] usb 5-4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 32.126613] usb 5-4.1: Product: WiFi
[ 32.126615] usb 5-4.1: Manufacturer: MediaTek
[ 32.126617] usb 5-4.1: SerialNumber: 1.0
[ 32.236956] usb 5-4.1: reset high-speed USB device number 4 using xhci_hcd
[ 32.333111] mt76x0u 5-4.1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[ 32.675686] xhci_hcd 0000:27:00.3: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000ffb50308 flags=0x0000]
[ 33.676816] mt76x0u 5-4.1:1.0: firmware upload timed out

Remarks:
Running Arch Linux on both systems
4.20.0-arch1-1-ARCH
linux-firmware 20181218.0f22c85-1

I encountered this issue while testing a WiFi USB dongle, the TP-LINK Archer T2UH.
Read more here: https://github.com/ZerBea/hcxdumptool/issues/42#issuecomment-453109280
I'm sure this issue is caused by the IOMMU. Booting the AMD RYZEN 1700 system with iommu=off:

$ dmesg
[ 19.250313] usb 5-4.1: new high-speed USB device number 4 using xhci_hcd
[ 19.356704] usb 5-4.1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 19.356708] usb 5-4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 19.356711] usb 5-4.1: Product: WiFi
[ 19.356713] usb 5-4.1: Manufacturer: MediaTek
[ 19.356715] usb 5-4.1: SerialNumber: 1.0
[ 19.510473] usb 5-4.1: reset high-speed USB device number 4 using xhci_hcd
[ 19.606128] mt76x0u 5-4.1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[ 20.606483] mt76x0u 5-4.1:1.0: EEPROM ver:02 fae:01
[ 20.627104] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[ 20.627568] usbcore: registered new interface driver mt76x0u
[ 20.635820] mt76x0u 5-4.1:1.0 wlp39s0f3u4u1: renamed from wlan0

BTW: this issue is not related to this one: https://bugzilla.kernel.org/show_bug.cgi?id=202243
Created attachment 281135: 0001-mt76x02-usb-mcu-limit-sg-length.patch

Could you check the attached patch with AMD IOMMU enabled? I would like to get confirmation that this is an actual bug.
Running kernel 4.20.7-arch1-1-ARCH with the patch applied:

TP-LINK Archer T2UH:
[29963.432568] usb 1-1: new high-speed USB device number 4 using xhci_hcd
[29963.679764] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[29963.679768] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[29963.679770] usb 1-1: Product: WiFi
[29963.679772] usb 1-1: Manufacturer: MediaTek
[29963.679774] usb 1-1: SerialNumber: 1.0
[29963.968330] usb 1-1: reset high-speed USB device number 4 using xhci_hcd
[29964.238240] mt76x0u 1-1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[29964.645325] xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000f4280308 flags=0x0000]
[29965.669604] mt76x0u 1-1:1.0: firmware upload timed out
[29968.883711] mt76x0u 1-1:1.0: vendor request req:06 off:0800 failed:-110

ASUS USB AC51:
[29963.679764] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[29963.679768] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[29963.679770] usb 1-1: Product: WiFi
[29963.679772] usb 1-1: Manufacturer: MediaTek
[29963.679774] usb 1-1: SerialNumber: 1.0
[29963.968330] usb 1-1: reset high-speed USB device number 4 using xhci_hcd
[29964.238240] mt76x0u 1-1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[29964.645325] xhci_hcd 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000f4280308 flags=0x0000]
[29965.669604] mt76x0u 1-1:1.0: firmware upload timed out
[29968.883711] mt76x0u 1-1:1.0: vendor request req:06 off:0800 failed:-110
[29972.083920] mt76x0u 1-1:1.0: vendor request req:07 off:0080 failed:-110
[29975.283813] mt76x0u 1-1:1.0: vendor request req:06 off:0080 failed:-110
[29978.483610] mt76x0u 1-1:1.0: vendor request req:06 off:0080 failed:-110
[29978.524682] mt76x0u: probe of 1-1:1.0 failed with error -110
[29978.524720] usbcore: registered new interface driver mt76x0u
[30128.385876] usb 1-1: USB disconnect, device number 4
[30134.484290] usb 1-1: new high-speed USB device number 5 using xhci_hcd

We still run into this IOMMU issue.
Thanks for testing. I will have two more patches to try to narrow down the problem. Have you reported this to the AMD IOMMU developers?
No problem; if I can help, I'll run the tests. No, the issue has only been reported here. It looks like the driver crashed the complete xhci subsystem: it is no longer possible to connect another USB device. I tried this with a Prolific pl2303 straight after the mt76x0 tests:

[30134.484290] usb 1-1: new high-speed USB device number 5 using xhci_hcd
[30139.767504] usb 1-1: device descriptor read/64, error -110
[30155.340694] usb 1-1: device descriptor read/64, error -110
[30155.637273] usb 1-1: new high-speed USB device number 6 using xhci_hcd
[30160.887286] usb 1-1: device descriptor read/64, error -110
[30176.460425] usb 1-1: device descriptor read/64, error -110
[30176.567104] usb usb1-port1: attempt power cycle
[30177.210414] usb 1-1: new high-speed USB device number 7 using xhci_hcd
[30182.433690] usb 1-1: device descriptor read/64, error -110
[30198.006837] usb 1-1: device descriptor read/64, error -110
[30198.303451] usb 1-1: new high-speed USB device number 8 using xhci_hcd
[30203.553439] usb 1-1: device descriptor read/64, error -110
[30219.126632] usb 1-1: device descriptor read/64, error -110
[30219.233298] usb usb1-port1: unable to enumerate USB device
Created attachment 281137: 0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
Created attachment 281139: 0002-mt76usb-do-not-use-page_frag_alloc.patch

Please test those two patches.
I removed this patch: https://bugzilla.kernel.org/attachment.cgi?id=281135
Updated the kernel to 4.20.8-arch1-1-ARCH.
Applied both patches, and it works:

[ 526.505013] usb 1-1: new high-speed USB device number 4 using xhci_hcd
[ 526.752620] usb 1-1: New USB device found, idVendor=148f, idProduct=761a, bcdDevice= 1.00
[ 526.752624] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 526.752627] usb 1-1: Product: WiFi
[ 526.752630] usb 1-1: Manufacturer: MediaTek
[ 526.752632] usb 1-1: SerialNumber: 1.0
[ 527.043679] usb 1-1: reset high-speed USB device number 4 using xhci_hcd
[ 527.311709] mt76x0u 1-1:1.0: ASIC revision: 76100002 MAC revision: 76502000
[ 529.596580] mt76x0u 1-1:1.0: EEPROM ver:02 fae:01
[ 529.668154] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[ 529.668571] usbcore: registered new interface driver mt76x0u
[ 529.675943] mt76x0u 1-1:1.0 wlp3s0f0u1: renamed from wlan0

Running iw 5.0.1:
phy#0
	Interface wlp3s0f0u1
		ifindex 3
		wdev 0x1
		addr 50:3e:aa:a0:8f:6f
		type managed
		txpower 20.00 dBm
		multicast TXQ:
			qsz-byt	qsz-pkt	flows	drops	marks	overlmt	hashcol	tx-bytes	tx-packets

$ hcxdumptool -I
wlan interfaces:
503eaaa08f6f wlp3s0f0u1 (mt76x0u)

$ hcxdumptool -i wlp3s0f0u1 -C
initialization...
available channels:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,36,40,44,48,52,56,60,64,100,104,108,112,116,120,124,128,132,136,140,149,153,157,161,165
terminated...

You did it. Everything is working as expected. Thanks a lot for your effort.
Created attachment 281141: 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

Cool! But the second patch was only meant to check where the problem is. Please retest with the changed second patch, i.e. with these two:
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
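A minimal user-space sketch of one plausible reading of the patch title "do not use compound head page for SG I/O", assuming 4 KiB pages. A scatter-gather entry can describe the same byte either as (head page of a compound page, large offset) or as (the subpage that actually holds the byte, offset below the page size). Presumably the IOMMU mapping path handles only the second form correctly. The struct sg_sketch and the helper functions are hypothetical illustrations, not the kernel scatterlist API or the actual mt76 code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for one scatter-gather entry: the page (counted
 * from the start of the buffer) and the byte offset inside that page. */
struct sg_sketch {
    size_t page_index;
    size_t offset;
};

/* Head-page form: always page 0 of the compound page, so the offset can
 * grow past SKETCH_PAGE_SIZE. */
static struct sg_sketch sg_from_head_page(size_t byte_pos)
{
    struct sg_sketch sg = { 0, byte_pos };
    return sg;
}

/* Subpage form: pick the page that actually holds the byte, so the
 * offset always stays below SKETCH_PAGE_SIZE. */
static struct sg_sketch sg_from_subpage(size_t byte_pos)
{
    struct sg_sketch sg;

    sg.page_index = byte_pos >> SKETCH_PAGE_SHIFT;
    sg.offset = byte_pos & (SKETCH_PAGE_SIZE - 1);
    assert(sg.offset < SKETCH_PAGE_SIZE); /* the invariant the fix restores */
    return sg;
}

/* Recover the byte position an entry describes; both forms must agree. */
static size_t sg_byte_pos(struct sg_sketch sg)
{
    return (sg.page_index << SKETCH_PAGE_SHIFT) + sg.offset;
}
```

For example, byte 0x2308 of a buffer is (page 0, offset 0x2308) in the head-page form and (page 2, offset 0x308) in the subpage form. Both forms address the same byte, but only the subpage form keeps the offset inside a single page.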
I applied these two patches and tested the driver again:
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch

Everything is working as expected.

BTW: Tx power is still working fine on this driver; hcxdumptool rcascan shows 28 APs in range. I really hope we can fix the ugly Tx power issue on the latest driver, too. The TP-LINK Archer T2UH is a very sensitive and cheap device.
Could you also check whether just one patch, 0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch, is sufficient to make AMD IOMMU work?

Re Tx power: I was not able to reproduce it via hcxdumptool, because I had only 2 APs in my neighbourhood. I'll try in the office. However, it's unlikely the fix will be available before the 5.0 release.
But you fixed all the 4.20 driver-related issues, and that is really great. We can close this issue.

BTW: with default settings, hcxdumptool is very, very aggressive. It is designed as an aggressive penetration tool and is able to run massive attacks against its targets. Also, --do_rcascan is an active scan: we probe every AP from which we receive a packet. I don't like the idea of getting RSSI from the device via the driver and the radiotap header. During several physical measurements, I noticed that these values are mostly incorrect. Also, WiFi is packet oriented: if an arbitrary AP or client responds to our packet, it works, no matter which RSSI value we retrieved from the driver.
The Tx power fix is for 5.0; once done, it will be backported through -stable to this release. Right now I'm just looking for information on what needs to be backported to older kernel releases to fix this AMD IOMMU issue. Do we need just one patch:
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
or are both needed:
0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch
0002-mt76usb-do-not-use-compound-head-page-for-SG-I-O.patch
?
Created attachment 281423: amd_iommu.patch

Could you test this patch, without any other patches, with AMD IOMMU enabled? I think it may address the root of the problem.
amd_iommu.patch is wrong, don't test it. But I would still like to know whether 0001-mt76x02u-use-usb_bulk_msg-to-upload-firmware.patch alone makes things work.
*** This bug has been marked as a duplicate of bug 202673 ***