Bug 215906
| Summary: | DMAR fault when connected usb hub (xhci_hcd) | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Piotr Piórkowski (qba100) |
| Component: | USB | Assignee: | Default virtual assignee for Drivers/USB (drivers_usb) |
| Status: | NEW | | |
| Severity: | normal | CC: | mathias.nyman, regressions, royston |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | Tested on 5.15.0-27, 5.17.0-051700-generic (from https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.17/) | Tree: | Mainline |
| Regression: | Yes | | |
Description
Piotr Piórkowski
2022-04-27 20:21:03 UTC
I just hit exactly the same issue when upgrading the kernel from v5.13.0-40 to v5.15.0-27. With no devices plugged in, the USB hub reports everything as ok. Plugging in a USB keyboard worked for a minute or two, and then I got exactly the same errors from [+0,229229] to [+0,000004] above. Same USB controller chipset as OP by the looks of things. I've managed to list the Capabilities in case that's any help:

03:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01) (prog-if 30 [XHCI])
    Subsystem: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller
    Flags: bus master, fast devsel, latency 0, IRQ 28, IOMMU group 12
    Memory at e0a00000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: [80] Power Management version 3
    Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+
    Capabilities: [c4] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: xhci_hcd
    Kernel modules: xhci_pci

Downgrading back to v5.13.0-40 fixes the problem.

I wanted to add this issue to the regression tracking and poke the maintainers, but noticed there is a patch that is being backported right now that might or might not be related (not my area of expertise): https://lore.kernel.org/all/20220504153117.726462014@linuxfoundation.org/ It's already in 5.18-rc5; could somebody please give it a quick try before I proceed with my initial plan?

I built kernel 5.18-rc5 myself (with the Ubuntu default config), but the problem still exists.

I've misled you a bit by saying that the bug didn't occur on the 5.13 kernel. I tried bisecting on the upstream kernel and it turns out that the problem also occurs on 5.13; I built it using the Ubuntu default config from kernel 5.15.0-27. So far, the only kernel build I haven't noticed a problem with (excluding the 5.4 kernels from Ubuntu 20.04 LTS) is kernel 5.13.0-28-generic from Ubuntu.
Interestingly, I found the sources of this kernel on git kernel.ubuntu.com, built it using the config from kernel 5.15, and the problem also occurred. It was only when I built this kernel using its own default config that I stopped seeing the problem.

Sorry, this is starting to get confusing and hard to follow. If there is something that used to work with an Ubuntu kernel and stops working there, you might want to report it to the Ubuntu developers, but not here. This bug tracker cares mainly about the upstream kernel (see front page), so what happens with a kernel built from the Ubuntu sources (which are known to be modified a lot) is irrelevant, and even just mentioning that makes things hard to follow. :-/

Regarding your problem: I'm not familiar with the code that might cause this, but to me it looks a lot like Ubuntu switched on a kernel configuration option that is causing this. If that's the case, the problem doesn't qualify as a regression, as explained here: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html The developers nevertheless might be interested in fixing this, but might need more details from you (like the config option that is causing this).

[ +0,229229] xhci_hcd 0000:04:00.0: WARNING: Host System Error

The xHC controller reports a catastrophic error and sets the HSE bit. For PCI xHC controllers the spec lists possible causes as: host controller PCI parity error, PCI Master Abort, PCI Target Abort. But DMA issues are also a possible cause, especially as the log shows DMAR problems right after this.

Any chance you could bisect this on upstream kernel?

@Thorsten Leemhuis sorry for misleading you, but when adding this bug here, I didn't know it wasn't an upstream regression. At first glance it looked that way, as I also observed the problem on the upstream kernel.
So far we only know that in one of the kernel configurations the problem does not occur - but this does not mean that the problem does not exist.
> Any chance you could bisect this on upstream kernel?
I'll try to do it this week
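
Since the thread suspects that a configuration option is what separates the working build from the failing one, comparing the two kernel configs may narrow things down before or alongside a bisect. The following is only a minimal sketch: the helper names `parse_config` and `diff_configs` are made up for illustration, and the sample config text is hypothetical rather than taken from the reporter's kernels.

```python
import re

# Line formats used by kernel .config files:
#   CONFIG_FOO=y            (option set to y, m, a number or a string)
#   # CONFIG_FOO is not set (option explicitly disabled)
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return a dict mapping option name to value; disabled options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(good, bad, keywords=()):
    """Return {option: (good_value, bad_value)} for options that differ.

    An option missing from one config is treated as 'n'. If keywords are
    given, only options whose name contains one of them are reported.
    """
    changed = {}
    for name in sorted(set(good) | set(bad)):
        good_value = good.get(name, "n")
        bad_value = bad.get(name, "n")
        if good_value == bad_value:
            continue
        if keywords and not any(key in name for key in keywords):
            continue
        changed[name] = (good_value, bad_value)
    return changed


# Hypothetical sample data, NOT the configs from this bug report.
GOOD_SAMPLE = "CONFIG_USB_XHCI_HCD=y\n# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set\n"
BAD_SAMPLE = "CONFIG_USB_XHCI_HCD=y\nCONFIG_INTEL_IOMMU_DEFAULT_ON=y\n"

if __name__ == "__main__":
    good = parse_config(GOOD_SAMPLE)
    bad = parse_config(BAD_SAMPLE)
    for name, (g, b) in diff_configs(good, bad, keywords=("IOMMU", "XHCI")).items():
        print(f"{name}: good={g} bad={b}")
```

For a real comparison, the two strings would be read from the config files of the working and failing builds (on Ubuntu these are typically installed as `/boot/config-<version>`). Filtering on IOMMU- and xHCI-related names is just a guess at where to look first, based on the DMAR and xhci_hcd messages in the log.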
(In reply to Piotr Piórkowski from comment #7)
> > Any chance you could bisect this on upstream kernel?
> I'll try to do it this week

Any news? Was the issue maybe fixed in the meantime?