Bug 215095 - USB ehci error -110 and soft lockup on kernel 5.15.4 as kvm guest
Summary: USB ehci error -110 and soft lockup on kernel 5.15.4 as kvm guest
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-22 01:28 UTC by wangyugui@e16-tech.com
Modified: 2021-11-26 00:12 UTC (History)
0 users

See Also:
Kernel Version: 5.15.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description wangyugui@e16-tech.com 2021-11-22 01:28:18 UTC
USB ehci error -110 and soft lockup on kernel 5.15.4/5.15.3 as kvm guest.

It happen when host kernel 5.15.4/5.15.3, but not happen when host kernel 5.10.76.

OS/KVM: RHEL 7.9 / qemu-kvm-1.5.3-175.el7_9.4.x86_64

[    1.009986] usb 1-1: new high-speed USB device number 2 using ehci-pci
...
[    2.696524] cdrom: Uniform CD-ROM driver Revision: 3.20
[   17.880021] usb 1-1: device not accepting address 2, error -110
[   17.995018] usb 1-1: new high-speed USB device number 3 using ehci-pci
[   34.776032] usb 1-1: device not accepting address 3, error -110
[   34.777331] usb usb1-port1: attempt power cycle
[   34.956027] usb 1-1: new high-speed USB device number 4 using ehci-pci
[   46.040013] usb 1-1: device not accepting address 4, error -110
[   46.154997] usb 1-1: new high-speed USB device number 5 using ehci-pci
[   57.304016] usb 1-1: device not accepting address 5, error -110
[   57.306286] usb usb1-port1: unable to enumerate USB device
[  OK  ] Started udev Wait for Complete Device Initialization.
         Starting Device-Mapper Multipath Device Controller...
[  OK  ] Started dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting dracut pre-mount hook...
[  OK  ] Started Device-Mapper Multipath Device Controller.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Basic System.
[  OK  ] Started dracut pre-mount hook.
         Starting File System Check on /dev/disk/by-label/OS_USB...
[  OK  ] Started File System Check on /dev/disk/by-label/OS_USB.
         Mounting /sysroot...
[   57.388906] BTRFS info (device vda1): flagging fs with big metadata feature
[   57.389868] BTRFS info (device vda1): enabling tiering(tier=auto)
[   57.390720] BTRFS info (device vda1): has skinny extents
[   57.391494] BTRFS info (device vda1): disk space caching is enabled
[*     ] A start job is running for /sysroot (1min 22s / 2min 25s)[   84.295876] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1:7]
[   84.296909] Modules linked in: sr_mod cdrom sg ata_generic btrfs xor raid6_pq zstd_compress cirrus zstd_decompress drm_kms_helper nvme_tcp syscopyarea sysfillrect sysimgblt nvme_fabrics fb_sys_fops cec nvme ata_piix nvme_core crct10dif_pclmul crc32_pclmul t10_pi crc32c_intel drm libata virtio_net ghash_clmulni_intel virtio_console net_failover serio_raw virtio_blk failover floppy dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod i2c_dev
[   84.300771] CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.15.4-1.el7.x86_64 #1
[   84.301502] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   84.302153] Workqueue: pm pm_runtime_work
[   84.302689] RIP: 0010:__synchronize_hardirq+0x7d/0xb0
[   84.303291] Code: 00 0f 95 44 24 07 75 17 40 84 ed 74 12 48 8d 54 24 07 be 01 00 00 00 4c 89 e7 e8 4e ff ff ff 4c 89 f6 48 89 df e8 d3 d4 92 00 <80> 7c 24 07 00 75 ad 48 8b 44 24 08 65 48 33 04 25 28 00 00 00 75
[   84.305017] RSP: 0018:ffffb3ee80047b88 EFLAGS: 00000206
[   84.305623] RAX: 0000000000000001 RBX: ffff95ec0012e6a4 RCX: 0000000000000000
[   84.306362] RDX: 0200000000000000 RSI: 0000000000000246 RDI: ffff95ec0012e6a4
[   84.307096] RBP: 0000000000000001 R08: ffff95ec00400510 R09: ffff95ec00400490
[   84.307813] R10: 0000000000000000 R11: ffffffffb5e66a28 R12: ffff95ec0012e628
[   84.308541] R13: ffff95ec0012e600 R14: 0000000000000246 R15: 0000000000000000
[   84.309264] FS:  0000000000000000(0000) GS:ffff95f31f800000(0000) knlGS:0000000000000000
[   84.310052] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.310693] CR2: 00007f44e8271000 CR3: 000000024e410002 CR4: 00000000007706f0
[   84.311408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   84.312132] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   84.312875] PKRU: 55555554
[   84.313347] Call Trace:
[   84.313804]  <TASK>
[   84.314234]  synchronize_irq+0x35/0xa0
[   84.314766]  ? preempt_schedule_common+0xa/0x20
[   84.315350]  ? __cond_resched+0x1d/0x30
[   84.315852]  ? usb_kill_urb+0x4c/0xc0
[   84.316357]  ehci_halt+0x66/0xe0
[   84.316826]  ehci_bus_suspend+0x235/0x480
[   84.317345]  hcd_bus_suspend+0x77/0x240
[   84.317849]  usb_generic_driver_suspend+0x3d/0x40
[   84.318405]  usb_suspend_both+0x199/0x230
[   84.318920]  ? usb_probe_interface+0x280/0x280
[   84.319462]  usb_runtime_suspend+0x2a/0x70
[   84.319984]  __rpm_callback+0x44/0x110
[   84.320479]  rpm_callback+0x59/0x60
[   84.320965]  ? usb_probe_interface+0x280/0x280
[   84.321532]  rpm_suspend.part.11+0x10d/0x590
[   84.322111]  ? rpm_idle+0x2f3/0x320
[   84.322607]  __pm_runtime_suspend+0x55/0x170
[   84.323154]  ? usb_runtime_resume+0x20/0x20
[   84.323689]  usb_runtime_idle+0x2d/0x40
[   84.324211]  __rpm_callback+0x44/0x110
[   84.324720]  rpm_idle+0x23a/0x320
[   84.325208]  pm_runtime_work+0x7a/0x190
[   84.325710]  process_one_work+0x1cb/0x370
[   84.326203]  worker_thread+0x30/0x380
[   84.326653]  ? process_one_work+0x370/0x370
[   84.327132]  kthread+0x118/0x140
[   84.327544]  ? set_kthread_struct+0x40/0x40
[   84.328010]  ret_from_fork+0x1f/0x30
[   84.328437]  </TASK>
[***   ] A start job is running for /sysroot (1min 37s / 2min 25s)
Comment 1 wangyugui@e16-tech.com 2021-11-22 04:24:29 UTC
althought we can work around this issue by revert 90d28fb53d4a(usb: core: reduce power-on-good delay time of root hub), but we still need to fix this soft lockup?
Comment 2 wangyugui@e16-tech.com 2021-11-22 05:33:39 UTC
more test show that this issue is not fully walked around by revert 90d28fb53d4a(usb: core: reduce power-on-good delay time of root hub).

still some chance(>20%) to happen.
Comment 3 Alan Stern 2021-11-22 15:30:52 UTC
Since this happens only when the kernel is running as a kvm guest but never happens when running on the bare metal, I would think that this is most likely a bug in kvm, not a bug in the kernel.

However, if you want to track it down further, you could try different kernel versions between 5.10 and 5.15 to see at what point the trouble begins.  But doing this requires that you get reliable testing results, which might not be easy if the problem occurs only 20% of the time.
Comment 4 wangyugui@e16-tech.com 2021-11-23 00:20:24 UTC
there are 2 problems.
1) USB ehci error -110(timeout)
  this is hard to happen on the bare metal.
2) soft lockup after USB ehci error -110

In fact, we do not care about 1) on KVM, because the USB device on KVM is not really used.

maybe a problem of KVM, maybe a problem of USB. difficult to troubleshoot.
Comment 5 Alan Stern 2021-11-23 02:44:57 UTC
Here's something you can do to help diagnose the soft lockup.  When the lockup occurs, go to the /sys/kernel/debug/usb/ehci/* directory (fill in the "*" with the pathname for the EHCI controller that's locking up), make copies of the files you find there, and attach those copies to this bug report.

Another thing you can do: Before the problem occurs, start a process doing

   cat /sys/kernel/debug/usb/usbmon/1u >mon.out

and after the lockup happens, kill the "cat" process.  Then attach the mon.out file to the bug report.

There may be additional debugging options available in kvm, but I don't know what they are.
Comment 6 wangyugui@e16-tech.com 2021-11-26 00:12:47 UTC
This problem yet not happen on kvm/host 5.15.5. so let's mark it as kvm/host problem.

although I checked the patches of 5.15.5, but failed to find out the patch which matched this problem directly.

Note You need to log in before you can comment on or make changes to this bug.