Created attachment 298255 [details] output of lsusb -v We found some USB devices (keyboard, storage) will fail to be detected on kernel 5.13 during boot, but it's working OK after re-plug the device. After looking into the dmesg during boot, there's an error message as down below [ 39.350435] hub 1-0:1.0: USB hub found [ 39.398835] hub 1-0:1.0: 12 ports detected [ 39.622744] usb usb1-port3: couldn't allocate usb_device Then we try to reproduce it on older kernel (5.4 and 5.8), there's no such problem. After bisecting, we found the commit https://github.com/torvalds/linux/commit/90d28fb53d4a51299ff324dede015d5cb11b88a2 makes the difference. It seems that hub_power_on_good_delay is not long enough for this hub. The bPwrOn2PwrGood in the Hub Descriptor (please refer to the attached lsusb-v.txt) is 20ms, which seems pretty normal when I check the same thing on other hubs. Can we totally trust this? Please suggest what we should do for this kind of hub. Thanks
Created attachment 298257 [details] output of lsusb -t
Forgot to mention that this problem only happens on the high-speed hub in the attached file. Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
You can try changing the bPwrOn2PwrGood value in the kernel source to see if that will make any difference. The line to change is in drivers/usb/host/xhci-hub.c in the function xhci_common_hub_descriptor(). It's at the start of the function: desc->bPwrOn2PwrGood = 10; /* xhci section 5.4.9 says 20ms max */ The value gets multiplied by two, so if you change it to 50 then the delay will become 100 ms. However I suspect that the length of this delay is not the cause of the problem you see. The actual error occurs in the usb_alloc_dev() function in drivers/usb/core/usb.c. You can try adding some printk statements in there to see which of the "return NULL" statements is the one you're getting.
The patch author Chunfeng Yun also suggest to change the bPwrOn2PwrGood. It still fails even I set the value to 20 and 40. Then I set to a 100 to see if it makes difference. The value 100 works. I'll add some printk to get more information. Thanks
I add printks to trace where the error happens. The dmesg show as down below [ 44.488980] xhci_hcd 0000:00:14.0: FIXME: allocate a command ring segment [ 44.488981] usb_alloc_dev: hcd driver alloc dev exist but return 0 [ 44.488982] usb usb1-port4: couldn't allocate usb_device The first line show for the "xhci_dbg(xhci, "FIXME: allocate a command ring segment\n");" in xhci_alloc_dev() The second line shows the printk I add in usb_alloc_dev() /* Root hubs aren't true devices, so don't allocate HCD resources */ if (usb_hcd->driver->alloc_dev && parent && !usb_hcd->driver->alloc_dev(usb_hcd, dev)) { + printk("%s: hcd driver alloc dev exist but return 0", __func__); usb_put_hcd(bus_to_hcd(bus)); kfree(dev); return NULL; } The usb_hcd->driver->alloc_dev is equal to xhci_alloc_dev and fails the xhci_queue_slot_control(). Does that mean anything regarding this?
If the controller doesn't work, there will be log "xHCI dying or halted, can't queue_command"? printed in queue_command()
This means the problem is somewhere below xhci_alloc_dev. In other words, the problem is in the xhci-hcd driver, and it has no connection to the commit you found. Except perhaps that increasing the time delay manages to avoid the bug somehow. You can try enabling dynamic debugging for xhci-hcd. On the boot command line, add: dyndbg="module xhci_hcd =p" and see what shows up in the dmesg log.
After lots of experiments, I agree the commit is not the root cause. Since the commit applied on 5.11 has no problem, but it happens when applying this commit on 5.4 base. The confusing part is, I can't reproduce it by enabling the dynamic debug on xhci-hcd even with the kernel 5.4 base. I'm still trying to find out what exactly commit cause the difference.
Kishon reported this same issue. xhci_alloc_dev()->xhci_queue_slot_control()->queue_command() fails because xhci->xhc_state still had a XHCI_STATE_HALTED bit set. This is possible because the xhci driver will first add the high-speed hcd/roothub, then the super-speed hcd/roothub. xHC controller is started and xhci->xhc_state cleared only when the second, super-speed hcd is added. So if we are unlucky the first hs hcd will create the hs roothub device, hub driver binds to it, hub inits, and hub thread start enumerating devies before xHC is running. xHC port registers can be read when xHC is powered, before it's is running, so hub thread is able to see the connected devices early.
Created attachment 298391 [details] Delay registration of secondary root hub Would something like the attached patch fix the problem? It delays registration of the secondary root hub until after the primary hcd is started, just before the primary root hub is registered. Chris, can you test this patch?
(In reply to Alan Stern from comment #10) > > Would something like the attached patch fix the problem? It delays > registration of the secondary root hub until after the primary hcd is > started, just before the primary root hub is registered. If I understood the problem correctly we would need to delay the registration of the primary roothub until secondary hcd usb_hcd_add() reaches hcd->driver->start() xhci pci driver will first create and add primary hcd, then create and add secondary hcd. xhci_pci_probe() /* create AND add primary hcd in one go */ usb_hcd_pci_probe(dev, id, &hc_driver) primary_hcd = dev_get_drvdata(&dev->dev) /* create secondary hcd */ secondary_hcd = usb_create_shared_hcd(...,primary_hcd) /* add secondary hcd */ usb_add_hcd(secondary_hcd) If we force xhci driver to always first create both hcds, and then add them, we could maybe do somethinh like this: usb_hcd_add() ... if ((is_primary_hcd(hcd) && !hcd->shared_hcd) || !is_primary_hcd) register_roothub(primary_hcd); if(!is_primary_hcd(hcd)) register_roothub(hcd); If this makes any sense I'll turn this into real code.
I may have gotten the primary and secondary mixed up. Anyway, the general idea is simple: Don't register the first root hub along with the first hcd; instead register both root hubs along with the second hcd. The only problem is how to tell usb_hcd_add that the first hcd is going to be shared. You'll probably need to modify xhci-hcd slightly to make this possible. Maybe add a new flag.
Mathias, thanks for the analysis since I've ran out of ideas for debugging this. In my case, even a single line printk in either xhci-hub.c or hub.c before hub_power_on() can make difference (undetected -> detected). But I can always reproduce with the clean code. Let me know if there's any patch I can help test. Thanks
Thanks to the upstream patch from Kishon. I've tested and it works on my kernel 5.4 base which used to have problem.
> Thanks to the upstream patch from Kishon. Which upstream patch are you talking about? I am running into this issue and I was wondering if you could point to the actual patch. Thanks!
Any progress on this? This is getting fixed/circumvented and rebroken repeatedly in downstream releases like Ubuntu. A definitive fix would be highly appreciated.
Created attachment 300715 [details] output of journalctl -b | grep -E "(xhci|\: usb)"
(In reply to stygarfield from comment #17) > Created attachment 300715 [details] > output of journalctl -b | grep -E "(xhci|\: usb)" I'm having the same issue I believe, and am hoping that there is a fix out there. This started happening fairly recently.
The roothub deferring patches fixing this triggered a race, and were reverted. That race should now be fixed in 5.16-rc3 and in stable kernels with patches: 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race") 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex") I'm planning on reapplying the roothub deferring patches but would need some help testing them, making sure they work and not trigger any new issues. Patches attached (based on 5.18-rc1). 0001-usb-core-hcd-Add-support-for-deferring-roothub-regis.patch 0002-xhci-Set-HCD-flag-to-defer-primary-roothub-registrat.patch Thanks
Created attachment 300723 [details] Patch 0001 usb core hcd: Add support for deferring roothub
Created attachment 300724 [details] PAtch 0002 xhci: set hcd flag to defer primary roothub registration
Hi, everyone. What's the status on this?
(In reply to Mathias Nyman from comment #21) > Created attachment 300724 [details] > PAtch 0002 xhci: set hcd flag to defer primary roothub registration Reapplied on linux-next? Haven't seen this on linux-next.
(In reply to Hongyu Xie from comment #23) > Reapplied on linux-next? Haven't seen this on linux-next. As I said earlier, I'd like to get these patches tested and verified by someone who can trigger this issue before applying them. Can you help me test them?
Created attachment 300800 [details] patch20220425.tar.gz On 2022/4/25 18:08, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=214021 > > --- Comment #24 from Mathias Nyman (mathias.nyman@linux.intel.com) --- > (In reply to Hongyu Xie from comment #23) > >> Reapplied on linux-next? Haven't seen this on linux-next. > As I said earlier, I'd like to get these patches tested and verified by > someone > who can trigger this issue before applying them. > > Can you help me test them? About to do so. But my kernel is little old, 5.4.x. So I had to changed it a little bit. You can see them in the attachment. >
(In reply to Mathias Nyman from comment #24) > (In reply to Hongyu Xie from comment #23) > > > Reapplied on linux-next? Haven't seen this on linux-next. > > As I said earlier, I'd like to get these patches tested and verified by > someone who can trigger this issue before applying them. > > Can you help me test them? It works.
The patches are already in GregKH tree https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/commit/?h=usb-testing&id=a44623d9279086c89f631201d993aa332f7c9e66 https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/commit/?h=usb-testing&id=b7a4f9b5d0e4b6dd937678c546c0b322dd1a4054