Bug 215906 - DMAR fault when connected usb hub (xhci_hcd)
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Default virtual assignee for Drivers/USB
Reported: 2022-04-27 20:21 UTC by Piotr Piórkowski
Modified: 2022-06-20 08:40 UTC
Kernel Version: Tested on 5.15.0-27, 5.17.0-051700-generic (from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.17/)
Tree: Mainline
Regression: Yes

Description Piotr Piórkowski 2022-04-27 20:21:03 UTC
Since kernel 5.15 I have had a problem with my USB hub (with kernel 5.13 I see no problem). The device stops working shortly after the system starts.
In the dmesg log I see a DMAR fault on the USB controller:


[kwi27 22:03] usb 5-1.2: new high-speed USB device number 3 using xhci_hcd
[  +0,100440] usb 5-1.2: New USB device found, idVendor=1a40, idProduct=0101, bcdDevice= 1.11
[  +0,000004] usb 5-1.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[  +0,000002] usb 5-1.2: Product: USB 2.0 Hub
[  +0,001002] hub 5-1.2:1.0: USB hub found
[  +0,000133] hub 5-1.2:1.0: 4 ports detected
[  +0,702453] usb 5-1.2.2: new full-speed USB device number 4 using xhci_hcd
[  +0,471198] usb 5-1.2.2: New USB device found, idVendor=047f, idProduct=c025, bcdDevice= 1.35
[  +0,000004] usb 5-1.2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  +0,000002] usb 5-1.2.2: Product: Plantronics C320-M
[  +0,000001] usb 5-1.2.2: Manufacturer: Plantronics
[  +0,000001] usb 5-1.2.2: SerialNumber: B13D8BE491B04E73AEB4C95E162DBE2B
[  +0,255862] mc: Linux media interface: v0.10
[  +0,001057] input: Plantronics Plantronics C320-M as /devices/pci0000:00/0000:00:1c.5/0000:04:00.0/usb5/5-1/5-1.2/5-1.2.2/5-1.2.2:1.3/0003:047F:C025.0004/input/input21
[  +0,060275] plantronics 0003:047F:C025.0004: input,hiddev1,hidraw3: USB HID v1.11 Device [Plantronics Plantronics C320-M] on usb-0000:04:00.0-1.2.2/input3
[  +0,859655] usb 5-1.2.2: Warning! Unlikely big volume range (=8192), cval->res is probably wrong.
[  +0,000003] usb 5-1.2.2: [11] FU [Sidetone Playback Volume] ch = 1, val = 0/8192/1
[  +0,584234] usbcore: registered new interface driver snd-usb-audio
[  +0,229229] xhci_hcd 0000:04:00.0: WARNING: Host System Error
[  +0,000014] DMAR: DRHD: handling fault status reg 2
[  +0,000004] DMAR: [DMA Read NO_PASID] Request device [04:00.0] fault addr 0xfffca000 [fault reason 0x06] PTE Read access is not set
[  +0,031993] xhci_hcd 0000:04:00.0: Host halt failed, -110
[kwi27 22:04] xhci_hcd 0000:04:00.0: xHCI host not responding to stop endpoint command.
[  +0,000003] xhci_hcd 0000:04:00.0: USBSTS: HSE EINT
[  +0,032011] xhci_hcd 0000:04:00.0: Host halt failed, -110
[  +0,000002] xhci_hcd 0000:04:00.0: xHCI host controller not responding, assume dead
[  +0,000017] xhci_hcd 0000:04:00.0: HC died; cleaning up
[  +0,000042] usb 5-1: USB disconnect, device number 2
[  +0,000003] usb 5-1.2: USB disconnect, device number 3
[  +0,000002] usb 5-1.2.2: USB disconnect, device number 4
[  +0,000114] usb 5-1.2.2: 1:0: usb_set_interface failed (-110)
[  +0,000016] usb 5-1.2.2: 1:1: usb_set_interface failed (-19)
[  +0,000011] usb 5-1.2.2: 1:0: usb_set_interface failed (-19)

04:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01) (prog-if 30 [XHCI])
	Subsystem: Micro-Star International Co., Ltd. [MSI] VL805/806 xHCI USB 3.0 Controller
	Flags: bus master, fast devsel, latency 0, IRQ 31, IOMMU group 12
	Memory at f7100000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
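
The DMAR line in the log above carries the details that matter for triage: the request type, the PCI device that issued it, the faulting address and the fault reason. A minimal Python sketch for pulling those fields out of a dmesg line follows. The regular expression is modelled only on the single line quoted in this report, so treat it as an assumption; other kernel versions may format the message differently. The function name `parse_dmar_fault` is hypothetical.

```python
import re

# Pattern modelled on the one DMAR fault line quoted in this report.
# It is an assumption that other faults follow the same layout.
DMAR_FAULT = re.compile(
    r"DMAR: \[(?P<kind>DMA \w+)(?: NO_PASID)?\] "
    r"Request device \[(?P<bdf>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>0x[0-9a-f]+) "
    r"\[fault reason (?P<reason>0x[0-9a-f]+)\] (?P<text>.*)"
)


def parse_dmar_fault(line):
    """Return the fields of a DMAR fault line, or None if it does not match."""
    m = DMAR_FAULT.search(line)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),            # e.g. "DMA Read"
        "device": m.group("bdf"),           # PCI bus:device.function
        "addr": int(m.group("addr"), 16),   # faulting DMA address
        "reason": int(m.group("reason"), 16),
        "text": m.group("text").strip(),    # kernel's description of the reason
    }


if __name__ == "__main__":
    sample = ("[  +0,000004] DMAR: [DMA Read NO_PASID] Request device [04:00.0] "
              "fault addr 0xfffca000 [fault reason 0x06] PTE Read access is not set")
    print(parse_dmar_fault(sample))
```

For the line in this report, the device field is 04:00.0, which is the same address lspci shows for the VL805/806 controller.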
Comment 1 Royston Shufflebotham 2022-05-01 12:32:00 UTC
I just hit exactly the same issue when upgrading the kernel from v5.13.0-40 to v5.15.0-27. With no devices plugged in, the USB hub reports everything as ok. Plugging in a USB keyboard worked for a minute or two, and then I get exactly the same errors from [+0,229229] to [+0,000004] above.

Same USB controller chipset as OP by the looks of things. I've managed to list the Capabilities in case that's any help:
03:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01) (prog-if 30 [XHCI])
        Subsystem: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller
        Flags: bus master, fast devsel, latency 0, IRQ 28, IOMMU group 12
        Memory at e0a00000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Capabilities: [c4] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

Downgrading back to v5.13.0-40 fixes the problem.
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-05 09:40:06 UTC
I wanted to add this issue to the regression tracking and poke the maintainers, but noticed there is a patch that is being backported right now that might or might not be related (not my area of expertise):
https://lore.kernel.org/all/20220504153117.726462014@linuxfoundation.org/

It's already in 5.18-rc5; could somebody please give it a quick try before I proceed with my initial plan?
Comment 3 Piotr Piórkowski 2022-05-05 12:30:34 UTC
I built kernel 5.18-rc5 myself (with the Ubuntu default config), but the problem still exists.
Comment 4 Piotr Piórkowski 2022-05-06 15:49:39 UTC
I misled you a bit by saying that the bug didn't occur on the 5.13 kernel. I tried bisecting on the upstream kernel, and it turns out that the problem also occurs on 5.13 when I build it with the Ubuntu default config from kernel 5.15.0-27.

So far, the only kernel build on which I haven't seen the problem (excluding the 5.4 kernels from Ubuntu 20.04 LTS) is kernel 5.13.0-28-generic from Ubuntu.

Interestingly, I found the sources of this kernel in git on kernel.ubuntu.com and built it with the config from kernel 5.15, and the problem occurred there as well.

Only when I built this kernel with its own default config did I stop seeing the problem.
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-09 05:42:03 UTC
Sorry, this is starting to get confusing and hard to follow.

If there is something that used to work with an Ubuntu kernel and stops working there, you might want to report it to the Ubuntu developers, but not here.

This bug tracker cares mainly about the upstream kernel (see front page), so what happens with a kernel built from the Ubuntu sources (which are known to be modified a lot) is irrelevant, and even just mentioning that makes things hard to follow. :-/

Regarding your problem: I'm not familiar with the code that might cause this, but to me it looks a lot like Ubuntu switched on a kernel configuration option that is causing this. If that's the case, the problem doesn't qualify as a regression, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

The developers nevertheless might be interested in fixing this, but might need more details from you (like the config option that is causing this).
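
One way to narrow down such a config option is to compare the config of a build that works with the config of a build that fails. The Python sketch below is only an illustration under stated assumptions: the helper names `parse_config` and `diff_configs` are hypothetical, and restricting the output to option names containing "IOMMU" or "DMAR" is a guess based on the DMAR fault in the log, not a confirmed cause.

```python
def parse_config(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = "n"
    return options


def diff_configs(good, bad, keywords=("IOMMU", "DMAR")):
    """Return {name: (good_value, bad_value)} for differing options.

    Only options whose name contains one of `keywords` are reported.
    The default keywords are an assumption, not a known culprit list.
    An option missing from one config is treated as 'n'.
    """
    changed = {}
    for name in sorted(set(good) | set(bad)):
        a, b = good.get(name, "n"), bad.get(name, "n")
        if a != b and any(k in name for k in keywords):
            changed[name] = (a, b)
    return changed
```

Usage would look like `diff_configs(parse_config(open(good_path).read()), parse_config(open(bad_path).read()))`, where `good_path` and `bad_path` are placeholders for the two config files. Passing `keywords=("",)` reports every differing option.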
Comment 6 Mathias Nyman 2022-05-09 08:11:22 UTC
[  +0,229229] xhci_hcd 0000:04:00.0: WARNING: Host System Error

The xHC controller reports a catastrophic error and sets the HSE bit.

For PCI xHC controllers the spec lists the possible causes as:
host controller PCI parity error, PCI Master Abort, PCI Target Abort.
But DMA issues are also a possible cause, especially as the log shows DMAR
problems right after this.
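
For reference, the "USBSTS: HSE EINT" line in the log is a decoded view of the controller's status register. A minimal Python sketch of that decoding is below. The bit positions follow the USBSTS layout in the xHCI specification as I understand it. The raw value 0x0c is an illustrative assumption chosen to match the flags in the log, since the log does not print the register value itself.

```python
# USBSTS bit positions per the xHCI specification (bits not listed are reserved).
USBSTS_BITS = {
    0: "HCH",    # HC Halted
    2: "HSE",    # Host System Error
    3: "EINT",   # Event Interrupt
    4: "PCD",    # Port Change Detect
    8: "SSS",    # Save State Status
    9: "RSS",    # Restore State Status
    10: "SRE",   # Save/Restore Error
    11: "CNR",   # Controller Not Ready
    12: "HCE",   # Host Controller Error
}


def decode_usbsts(value):
    """Return the names of the USBSTS flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(USBSTS_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    # 0x0c has bits 2 and 3 set (illustrative value, not taken from the log).
    print(decode_usbsts(0x0C))
```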

Any chance you could bisect this on upstream kernel?
Comment 7 Piotr Piórkowski 2022-05-09 09:00:24 UTC
@Thorsten Leemhuis sorry for misleading you, but when I filed this bug here I didn't know it wasn't an upstream regression. At first glance it looked that way, as I also observed the problem on upstream.

So far we only know that the problem does not occur with one of the kernel configurations, but this does not mean that the problem does not exist.

> Any chance you could bisect this on upstream kernel?

I'll try to do it this week
Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-06-20 08:40:02 UTC
(In reply to Piotr Piórkowski from comment #7)
>
> > Any chance you could bisect this on upstream kernel?
> I'll try to do it this week

Any news? Was the issue maybe fixed in the meantime?
