Bug 220016
Description
Joe Doss
2025-04-16 00:25:05 UTC
Created attachment 307968 [details]
6.13.7-200.fc41.x86_64 journalctl -k output
Would be great if you could bisect: https://docs.kernel.org/admin-guide/bug-bisect.html

What if you disconnect the hub and connect an ordinary device in its place? This log looks like several USB ports on your computer should be completely dead. And some SATA ports too.

This looks like devm_request_mem_region() is failing in usb_hcd_pci_probe(), and maybe pcim_iomap() in ahci (SATA). So something to do with PCI, ACPI or other lower level stuff.

You may try your luck reassigning this bug to drivers/PCI, they seem to be responsive to bugzilla reports.

(In reply to Michał Pecio from comment #3)
> What if you disconnect the hub and connect an ordinary device in its place?
> This log looks like several USB ports on your computer should be completely
> dead. And some SATA ports too.

I just swapped my mouse into the USB A 3.0 port that my hub was in and it is dead too.

> This looks like devm_request_mem_region() is failing in usb_hcd_pci_probe(),
> and maybe pcim_iomap() in ahci (SATA). So something to do with PCI, ACPI or
> other lower level stuff.
>
> You may try your luck reassigning this bug to drivers/PCI, they seem to be
> responsive to bugzilla reports.

OK sounds good. I will reassign this to drivers/PCI to see if those folks have any ideas what is going on here.

Created attachment 307975 [details]
Good boot with 6.13 kernel
Created attachment 307976 [details]
Bad boot with 6.14 kernel
I appear to be having the same problem, and I've attached journal logs from a good boot and a bad boot.

In my case I have a mouse and keyboard directly plugged into the motherboard. The keyboard was not working once I upgraded to kernel 6.14 but the mouse was working. I stepped back to kernel 6.13, and both mouse and keyboard were working. I tried switching which connectors the mouse and keyboard were plugged into, and at that point the mouse wasn't working but the keyboard was working. So with kernel 6.14 several of the motherboard jacks were not functional.

I looked at the pci information, and the usb devices are:

usb1/2   0000:23:00.0  ASMedia Technology Inc. ASM3242 USB 3.2 Host Controller
usb3/4   0000:28:00.1  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
usb5/6   0000:28:00.3  Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
usb7/8   0000:2d:00.3  Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller
usb9/10  0000:05:00.3  Advanced Micro Devices, Inc. [AMD] Starship USB 3.0 Host Controller

Apparently, with 6.14 one of the Matisse controllers was recognized, but the other was not. And neither of the Starship controllers was recognized.

I am now able to build a working 6.14 kernel. The only thing I had to change was to turn off CONFIG_PCI_REALLOC_ENABLE_AUTO. With CONFIG_PCI_REALLOC_ENABLE_AUTO off, all my USB controllers work. With CONFIG_PCI_REALLOC_ENABLE_AUTO=y, some of my USB controllers don't initialize.

Setting pci=realloc=off in the kernel cmdline works as well.

The question now is whether this is a kernel bug or a motherboard bug. If it is a mobo bug then I guess a quirk would be appropriate.

Does 6.13 also fail if pci=realloc is on the kernel command line? It's not clear to me if PCI resource reallocation got enabled along with the kernel upgrade, that is, if Fedora packaging e.g. decided to enable CONFIG_PCI_REALLOC_ENABLE_AUTO in the kernel's .config, which is orthogonal to kernel version changes.
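For anyone else trying to answer the packaging question on their own machine, here is a minimal sketch in Python that inspects a kernel command line string and a kernel config text. The function names are made up for illustration. It assumes the usual comma-separated pci= option syntax, and that plain pci=realloc means the same as pci=realloc=on. As far as I understand it, CONFIG_PCI_REALLOC_ENABLE_AUTO=y selects an automatic detection mode rather than forcing reallocation on, so treat the second function as a hint only.

```python
def realloc_from_cmdline(cmdline):
    """Return 'on', 'off', or None if the cmdline carries no pci=realloc option."""
    result = None
    for token in cmdline.split():
        if not token.startswith("pci="):
            continue
        # pci= takes a comma-separated list, e.g. pci=realloc,min_bridge_window
        for opt in token[len("pci="):].split(","):
            if opt == "realloc" or opt.startswith("realloc=on"):
                result = "on"
            elif opt.startswith("realloc=off"):
                result = "off"
    return result


def auto_realloc_in_config(config_text):
    """True if the config text sets CONFIG_PCI_REALLOC_ENABLE_AUTO=y."""
    return any(
        line.strip() == "CONFIG_PCI_REALLOC_ENABLE_AUTO=y"
        for line in config_text.splitlines()
    )
```

To use it, feed the first function the contents of /proc/cmdline and the second the contents of the kernel config file. On Fedora that is normally /boot/config-<kernel release>, but the path is an assumption and may differ on other distributions.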
Please provide /proc/iomem from both the working and the non-working configuration, preferably with an otherwise identical kernel version. If you can't get it from the non-working configuration at all due to lack of keyboard, I'll manage with the log from the working one only (things can be inferred from dmesg but it's a very, very tedious process).

Could you also clarify what you meant with this: "This issue also seems to be present in 6.15.0 as well."? Kernel version 6.15 is not released yet, 6.15 cycle is only at -rc2 at this point.

I'll try to look more into this next week.

Created attachment 307986 [details]
6.13.10 with realloc=on, no keyboard
Created attachment 307987 [details]
6.13.10 with realloc=off, good keyboard
Created attachment 307988 [details]
6.14.2 with realloc=on, no keyboard
Created attachment 307989 [details]
6.14.2 with realloc=off, good keyboard
I ran the tests you requested. I attached /proc/iomem for 6.13.10 both with realloc=off and realloc=on. I repeated the tests with 6.14.2. For both kernel versions, realloc=on results in a non-functional USB keyboard. For both kernels, realloc=off works properly.

I had been running Fedora 41 with their 6.13 kernel. It was compiled with realloc=off. I then upgraded to Fedora 42 with their 6.14 kernel. It was compiled with realloc=on. Therefore, this is not a new problem with 6.14; it is just that the kernel option was changed in Fedora 42.

I cannot comment about 6.15 but I think Joe was the one who tried it. Perhaps he can comment on that. However, since both 6.13 and 6.14 don't work if realloc=on, it probably doesn't matter.

Please let me know if there is any other data you need.

I just realized that I need to read /proc/iomem as root. I'll recreate the attachments - please give me a few minutes.

Created attachment 307990 [details]
6.13.10 with realloc=on, no keyboard
Created attachment 307991 [details]
6.13.10 with realloc=off, good keyboard
Created attachment 307992 [details]
6.14.2 with realloc=on, no keyboard
Created attachment 307993 [details]
6.14.2 with realloc=off, good keyboard
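Since an unprivileged read of /proc/iomem shows every address range as zeros, a dump can be sanity-checked before it is attached. A minimal sketch in Python, assuming the usual "start-end : name" line layout with two spaces of indentation per nesting level (the helper names are made up for illustration):

```python
import re

# One /proc/iomem line: optional indentation, hex start-end, " : ", resource name.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples; 'end' is inclusive."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a resource line
        indent, start, end, name = m.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def looks_unprivileged(entries):
    """True if every range is 0-0, which is what a non-root read produces."""
    return bool(entries) and all(s == 0 and e == 0 for _, s, e, _ in entries)
```

If looks_unprivileged() returns True for a saved dump, it was most likely captured without root and needs to be redone, e.g. with sudo.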
I updated the attachments.

(In reply to Ilpo Järvinen from comment #11)
> Could you also clarify what you meant with this: "This issue also seems to
> be present in 6.15.0 as well."? Kernel version 6.15 is not released yet,
> 6.15 cycle is only at -rc2 at this point.

I was just trying the kernels that were built on Fedora's build system to see if the issues persisted with different versions. Steven is correct that this doesn't really matter at this point.

Created attachment 308020 [details]
Add min_bridge_window cmdline parameter
It seems the FW left zero extra tail room in the root iomem windows, which makes it a bit hard for pci=realloc to work. It doesn't work well with the default sizing algorithm, which is greedy and tries to reserve more space than is really required.
Maybe the attached patch helps more devices to assign their resources:
So please test with pci=realloc,min_bridge_window
It could still lead to a failure or some failures because it seems there simply is not enough space to fit everything even without pci=realloc reserving that extra empty space.
Please also add dyndbg="file drivers/pci/*.c +p" on kernel cmdline to get a bit more information.
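To illustrate what "zero extra tail room" means for a root window, here is a minimal sketch in Python. It takes "tail room" to be the unused bytes between the end of the highest child resource and the end of the parent window. That is my reading of the comment above, not a definition taken from the kernel. The function name and all addresses are hypothetical.

```python
def tail_room(window, children):
    """Bytes left free at the top of 'window' above its highest child.

    'window' and each child are (start, end) pairs with inclusive ends,
    matching the ranges printed in /proc/iomem.
    """
    w_start, w_end = window
    # Only consider children that lie fully inside the window.
    ends = [end for start, end in children if w_start <= start and end <= w_end]
    last_used = max(ends, default=w_start - 1)
    return w_end - last_used


# Hypothetical root window whose children fill it right up to the end.
full = tail_room((0xE0000000, 0xEFFFFFFF),
                 [(0xE0000000, 0xE7FFFFFF), (0xE8000000, 0xEFFFFFFF)])

# Same window, but the last child stops 16 MiB short of the end.
roomy = tail_room((0xE0000000, 0xEFFFFFFF),
                  [(0xE0000000, 0xE7FFFFFF), (0xE8000000, 0xEEFFFFFF)])
```

In the first case there is nothing left to grow a bridge window into, so any sizing pass that asks for more than the firmware originally assigned has to fail or push something else out.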
Created attachment 308021 [details]
Test with min_bridge_window patch
I've run the experiment you requested and attached a log file. This did not help with the problem - I still had no keyboard - but hopefully the extra logging will give you more information.

For some background, CONFIG_PCI_REALLOC_ENABLE_AUTO was turned on for the first time with the 6.14 kernels in Fedora; prior to those it was off. I am turning it off again because it seems to cause a number of issues. That said, I do think those issues need to be fixed, so while new kernels (6.14.5+ and 6.15-rc4+) in Fedora will not reproduce this problem, it is likely worth debugging with realloc=on.

Thanks Justin. I'll continue to work with Ilpo if there are any proposed fixes or additional tests that I can help with.

Created attachment 308058 [details]
Resource reset logging patch (only for 6.15+ kernels)
Justin, sure, my intention is to try to fix these issues now that we've become aware of them. I'm just not sure if that can be done for pre-6.15 kernels as touching the resource fitting / assignment algorithm is pretty precarious.

Steven, could you try with 6.15-rc based kernels with that min bridge window patch + pci=realloc,min_bridge_window? Please also include the reset debugging patch I just attached and dyndbg on the command line.

There are a number of things in the 6.14 log I don't understand and I suspect I might have fixed at least one of those in my recent changes to the resource fitting algorithm. The failure to shrink the bridge window size for 0000:22:02 from 0x500000 to 0x100000, as IOV resources should be optional, is particularly perplexing. Instead, the non-64bit windows vanish from many bridges after bridge windows are released, which could be due to improper reset of a resource after one of the steps which the later steps fail to undo for some reason.

Created attachment 308061 [details]
Test with 6.15 rc plus two patches
I ran the experiment and attached the journal log.