Bug 201919

Summary:	[Intel GFX CI] "xhci_hcd 0000:6c:00.0: disabling already-disabled device" when suspending
Product:	Drivers	Reporter:	Martin Peres (martin.peres)
Component:	USB	Assignee:	Greg Kroah-Hartman (greg)
Status:	RESOLVED UNREPRODUCIBLE
Severity:	normal
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	4.20.0-rc5	Subsystem:
Regression:	No	Bisected commit-id:

Description Martin Peres 2018-12-07 11:29:05 UTC

When suspending Intel GFX CI's fi-cfl-8109u[1], the following warning happens from time to time:

<6> [130.718642] xhci_hcd 0000:6c:00.0: USB bus 3 deregistered
<4> [130.719037] ------------[ cut here ]------------
<4> [130.719040] xhci_hcd 0000:6c:00.0: disabling already-disabled device
<4> [130.719052] WARNING: CPU: 0 PID: 154 at drivers/pci/pci.c:1870 pci_disable_device+0x90/0xb0
<4> [130.719054] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal snd_hda_intel coretemp snd_hda_codec crct10dif_pclmul btusb snd_hwdep snd_hda_core btrtl btbcm btintel crc32_pclmul bluetooth snd_pcm ghash_clmulni_intel e1000e i2c_i801 ecdh_generic mei_me mei prime_numbers
<4> [130.719073] CPU: 0 PID: 154 Comm: irq/123-pciehp Tainted: G     U            4.20.0-rc5-CI-CI_DRM_5271+ #1
<4> [130.719075] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018
<4> [130.719078] RIP: 0010:pci_disable_device+0x90/0xb0
<4> [130.719080] Code: c6 05 cf f6 df 00 01 48 85 ed 74 34 48 8d bb b0 00 00 00 e8 12 75 12 00 48 89 ea 48 89 c6 48 c7 c7 f0 d8 0d 82 e8 60 9b b7 ff <0f> 0b eb 84 48 89 df e8 f4 fe ff ff 80 a3 b9 07 00 00 fb 5b 5d c3
<4> [130.719083] RSP: 0018:ffffc90000e53cf8 EFLAGS: 00010282
<4> [130.719085] RAX: 0000000000000000 RBX: ffff8882b336a2a8 RCX: 0000000000000006
<4> [130.719087] RDX: 0000000000000006 RSI: ffffffff821298aa RDI: ffffffff820d7ca7
<4> [130.719089] RBP: ffff8882b3357768 R08: 00000000fe7982de R09: 0000000000000000
<4> [130.719091] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff822c2c80
<4> [130.719092] R13: ffffffff822c2cf0 R14: 0000000000000060 R15: ffff8882b1a9d040
<4> [130.719095] FS:  0000000000000000(0000) GS:ffff8882b5a00000(0000) knlGS:0000000000000000
<4> [130.719097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [130.719099] CR2: 000055b6046265b8 CR3: 0000000005210003 CR4: 00000000003606f0
<4> [130.719100] Call Trace:
<4> [130.719106]  pci_device_remove+0x36/0xb0
<4> [130.719110]  device_release_driver_internal+0x185/0x240
<4> [130.719116]  ? irq_forced_thread_fn+0x70/0x70
<4> [130.719119]  pci_stop_bus_device+0x5f/0x80
<4> [130.719123]  pci_stop_bus_device+0x26/0x80
<4> [130.719127]  pci_stop_bus_device+0x26/0x80
<4> [130.719131]  pci_stop_and_remove_bus_device+0x9/0x20
<4> [130.719134]  pciehp_unconfigure_device+0x7f/0x130
<4> [130.719140]  pciehp_disable_slot+0x66/0x110
<4> [130.719144]  pciehp_handle_presence_or_link_change+0x78/0x440
<4> [130.719152]  pciehp_ist+0x19c/0x1c0
<4> [130.719156]  irq_thread_fn+0x1c/0x60
<4> [130.719159]  ? irq_thread+0xa6/0x1e0
<4> [130.719161]  irq_thread+0x163/0x1e0
<4> [130.719164]  ? irq_thread_check_affinity.part.4+0x70/0x70
<4> [130.719175]  ? irq_thread_dtor+0xb0/0xb0
<4> [130.719177]  kthread+0x119/0x130
<4> [130.719179]  ? kthread_park+0x80/0x80
<4> [130.719184]  ret_from_fork+0x3a/0x50
<4> [130.719195] irq event stamp: 8322
<4> [130.719197] hardirqs last  enabled at (8321): [<ffffffff810fc764>] vprintk_emit+0x124/0x320
<4> [130.719200] hardirqs last disabled at (8322): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [130.719202] softirqs last  enabled at (8246): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
<4> [130.719204] softirqs last disabled at (8239): [<ffffffff8108e6a9>] irq_exit+0xa9/0xc0
<4> [130.719214] WARNING: CPU: 0 PID: 154 at drivers/pci/pci.c:1870 pci_disable_device+0x90/0xb0
<4> [130.719215] ---[ end trace 21c238dfc083f85e ]---

The full information about this failure can be seen at https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5271/fi-cfl-8109u/igt%40gem_exec_suspend%40basic-s3.html

[1] https://intel-gfx-ci.01.org/hardware/fi-cfl-8109u/

Comment 1 Greg Kroah-Hartman 2018-12-07 12:54:17 UTC

On Fri, Dec 07, 2018 at 11:29:05AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=201919
> 
>             Bug ID: 201919
>            Summary: [Intel GFX CI] "xhci_hcd 0000:6c:00.0: disabling
>                     already-disabled device" when suspending
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 4.20.0-rc5

All USB bugs should be sent to the linux-usb@vger.kernel.org mailing
list, and not entered into bugzilla.  Please bring this issue up there,
if it is still a problem in the latest kernel release.

Comment 2 Martin Peres 2018-12-12 16:46:22 UTC

Thanks Greg, I'll send an email.

In the mean time, just adding that this is likely related:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_151/fi-cfl-8109u/igt%40gem_ctx_isolation%40vcs1-s3.html

<7> [268.406326] [drm:drm_helper_hpd_irq_event] [CONNECTOR:98:HDMI-A-1] status updated from disconnected to disconnected
<6> [268.530784] acpi LNXPOWER:04: Turning OFF
<6> [268.531473] OOM killer enabled.
<6> [268.531477] Restarting tasks ... done.
<4> [268.565866] 
<4> [268.565876] ======================================================
<4> [268.565884] WARNING: possible circular locking dependency detected
<4> [268.565894] 4.20.0-rc4-gb3a448de6828-drmtip_151+ #1 Tainted: G     U           
<4> [268.565903] ------------------------------------------------------
<4> [268.565911] kworker/u8:18/1278 is trying to acquire lock:
<4> [268.565920] 000000007bc56ea1 (&ctrl->reset_lock){.+.+}, at: pciehp_check_presence+0x1d/0x70
<4> [268.565936] \x0abut task is already holding lock:
<4> [268.565943] 00000000f6ea0a25 (pci_rescan_remove_lock){+.+.}, at: acpiphp_hotplug_notify+0x57/0x1f0
<4> [268.565955] \x0awhich lock already depends on the new lock.\x0a
<4> [268.565964] \x0athe existing dependency chain (in reverse order) is:
<4> [268.565973] \x0a-> #1 (pci_rescan_remove_lock){+.+.}:
<4> [268.565975]        pciehp_unconfigure_device+0x44/0x130
<4> [268.565975]        pciehp_disable_slot+0x66/0x110
<4> [268.565975]        pciehp_handle_presence_or_link_change+0x78/0x440
<4> [268.565975]        pciehp_ist+0x19c/0x1c0
<4> [268.565975]        irq_thread_fn+0x1c/0x60
<4> [268.565975]        irq_thread+0x163/0x1e0
<4> [268.565975]        kthread+0x119/0x130
<4> [268.565975]        ret_from_fork+0x3a/0x50
<4> [268.565975] \x0a-> #0 (&ctrl->reset_lock){.+.+}:
<4> [268.565975]        down_read+0x36/0x60
<4> [268.565975]        pciehp_check_presence+0x1d/0x70
<4> [268.565975]        pciehp_resume+0x22/0x30
<4> [268.565975]        device_for_each_child+0x4f/0x90
<4> [268.565975]        pcie_port_device_runtime_resume+0x2b/0x50
<4> [268.565975]        pci_pm_runtime_resume+0x73/0xa0
<4> [268.565975]        __rpm_callback+0xb3/0x1b0
<4> [268.565975]        rpm_callback+0x1a/0x70
<4> [268.565975]        rpm_resume+0x533/0x830
<4> [268.565975]        __pm_runtime_resume+0x42/0x80
<4> [268.565975]        trim_stale_devices+0x8c/0x1a0
<4> [268.565975]        trim_stale_devices+0xb0/0x1a0
<4> [268.565975]        acpiphp_check_bridge.part.4+0xf9/0x140
<4> [268.565975]        acpiphp_hotplug_notify+0x159/0x1f0
<4> [268.565975]        acpi_device_hotplug+0x9b/0x460
<4> [268.565975]        acpi_hotplug_work_fn+0x15/0x20
<4> [268.565975]        process_one_work+0x262/0x630
<4> [268.565975]        worker_thread+0x37/0x380
<4> [268.565975]        kthread+0x119/0x130
<4> [268.565975]        ret_from_fork+0x3a/0x50
<4> [268.565975] \x0aother info that might help us debug this:\x0a
<4> [268.565975]  Possible unsafe locking scenario:\x0a
<4> [268.565975]        CPU0                    CPU1
<4> [268.565975]        ----                    ----
<4> [268.565975]   lock(pci_rescan_remove_lock);
<4> [268.565975]                                lock(&ctrl->reset_lock);
<4> [268.565975]                                lock(pci_rescan_remove_lock);
<4> [268.565975]   lock(&ctrl->reset_lock);
<4> [268.565975] \x0a *** DEADLOCK ***\x0a
<4> [268.565975] 5 locks held by kworker/u8:18/1278:
<4> [268.565975]  #0: 00000000bde3e755 ((wq_completion)"kacpi_hotplug"){+.+.}, at: process_one_work+0x1d6/0x630
<4> [268.565975]  #1: 0000000047b1a369 ((work_completion)(&hpw->work)){+.+.}, at: process_one_work+0x1d6/0x630
<4> [268.565975]  #2: 0000000079a0f774 (device_hotplug_lock){+.+.}, at: acpi_device_hotplug+0x2b/0x460
<4> [268.565975]  #3: 00000000b377c98e (acpi_scan_lock){+.+.}, at: acpi_device_hotplug+0x39/0x460
<4> [268.565975]  #4: 00000000f6ea0a25 (pci_rescan_remove_lock){+.+.}, at: acpiphp_hotplug_notify+0x57/0x1f0
<4> [268.565975] \x0astack backtrace:
<4> [268.565975] CPU: 3 PID: 1278 Comm: kworker/u8:18 Tainted: G     U            4.20.0-rc4-gb3a448de6828-drmtip_151+ #1
<4> [268.565975] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018
<4> [268.565975] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
<4> [268.565975] Call Trace:
<4> [268.565975]  dump_stack+0x67/0x9b
<4> [268.565975]  print_circular_bug.isra.16+0x1c8/0x2b0
<4> [268.565975]  __lock_acquire+0x183a/0x1b00
<4> [268.565975]  ? pcie_do_write_cmd+0xe2/0x130
<4> [268.565975]  ? lock_acquire+0xa6/0x1c0
<4> [268.565975]  lock_acquire+0xa6/0x1c0
<4> [268.565975]  ? pciehp_check_presence+0x1d/0x70
<4> [268.565975]  ? pci_restore_standard_config+0x40/0x40
<4> [268.565975]  down_read+0x36/0x60
<4> [268.565975]  ? pciehp_check_presence+0x1d/0x70
<4> [268.565975]  pciehp_check_presence+0x1d/0x70
<4> [268.565975]  ? pci_idt_bus_quirk+0xf0/0xf0
<4> [268.565975]  pciehp_resume+0x22/0x30
<4> [268.565975]  device_for_each_child+0x4f/0x90
<4> [268.565975]  pcie_port_device_runtime_resume+0x2b/0x50
<4> [268.565975]  pci_pm_runtime_resume+0x73/0xa0
<4> [268.565975]  __rpm_callback+0xb3/0x1b0
<4> [268.565975]  rpm_callback+0x1a/0x70
<4> [268.565975]  ? pci_restore_standard_config+0x40/0x40
<4> [268.565975]  rpm_resume+0x533/0x830
<4> [268.565975]  ? wait_woken+0xa0/0xa0
<4> [268.565975]  __pm_runtime_resume+0x42/0x80
<4> [268.565975]  trim_stale_devices+0x8c/0x1a0
<4> [268.565975]  trim_stale_devices+0xb0/0x1a0
<4> [268.565975]  ? get_slot_status+0x9a/0x100
<4> [268.565975]  acpiphp_check_bridge.part.4+0xf9/0x140
<4> [268.565975]  acpiphp_hotplug_notify+0x159/0x1f0
<4> [268.565975]  ? acpiphp_post_dock_fixup+0xd0/0xd0
<4> [268.565975]  acpi_device_hotplug+0x9b/0x460
<4> [268.565975]  acpi_hotplug_work_fn+0x15/0x20
<4> [268.565975]  process_one_work+0x262/0x630
<4> [268.565975]  worker_thread+0x37/0x380
<4> [268.565975]  ? process_one_work+0x630/0x630
<4> [268.565975]  kthread+0x119/0x130
<4> [268.565975]  ? kthread_park+0x80/0x80
<4> [268.565975]  ret_from_fork+0x3a/0x50
<6> [268.584191] pcieport 0000:00:1c.4: PCI bridge to [bus 02-6c]
<6> [268.584217] pcieport 0000:00:1c.4:   bridge window [io  0x6000-0x8fff]
<6> [268.584240] pcieport 0000:00:1c.4:   bridge window [mem 0x40000000-0x6e0fffff]
<6> [268.584264] pcieport 0000:00:1c.4:   bridge window [mem 0x6000000000-0x6049ffffff 64bit pref]
<6> [268.590285] pcieport 0000:00:1d.6: PCI bridge to [bus 6e]
<6> [268.590308] pcieport 0000:00:1d.6:   bridge window [io  0x3000-0x3fff]
<6> [268.590332] pcieport 0000:00:1d.6:   bridge window [mem 0x6e100000-0x6eafffff]
<6> [268.590356] pcieport 0000:00:1d.6:   bridge window [mem 0x604a100000-0x604aafffff 64bit pref]
<6> [268.590719] pcieport 0000:00:1c.4: PCI bridge to [bus 02-6c]
<6> [268.590739] pcieport 0000:00:1c.4:   bridge window [io  0x6000-0x8fff]
<6> [268.590762] pcieport 0000:00:1c.4:   bridge window [mem 0x40000000-0x6e0fffff]
<6> [268.590785] pcieport 0000:00:1c.4:   bridge window [mem 0x6000000000-0x6049ffffff 64bit pref]
<6> [268.632250] video LNXVIDEO:00: Restoring backlight state
<4> [268.632252] ------------[ cut here ]------------
<4> [268.632255] downgrading a read lock
<4> [268.632266] WARNING: CPU: 1 PID: 1841 at kernel/locking/lockdep.c:3556 lock_downgrade+0x158/0x1e0
<4> [268.632268] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core btusb btrtl btbcm btintel snd_pcm bluetooth e1000e ecdh_generic i2c_i801 mei_me mei prime_numbers
<6> [268.632301] PM: suspend exit
<4> [268.632325] CPU: 1 PID: 1841 Comm: systemd-udevd Tainted: G     U            4.20.0-rc4-gb3a448de6828-drmtip_151+ #1
<4> [268.632339] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018
<4> [268.632351] RIP: 0010:lock_downgrade+0x158/0x1e0
<4> [268.632358] Code: ff e9 23 ff ff ff 4c 89 ea 4c 89 f6 48 89 df e8 2e bd ff ff 85 c0 74 aa eb 9a 48 c7 c7 b2 fe 06 91 48 89 04 24 e8 08 d2 f9 ff <0f> 0b 8b 54 24 0c 48 8b 04 24 e9 46 ff ff ff e8 74 70 3a 00 85 c0
<4> [268.632374] RSP: 0018:ffffb5b880a2be38 EFLAGS: 00010082
<4> [268.632382] RAX: 0000000000000000 RBX: ffff9ca265f30040 RCX: 0000000000000000
<4> [268.632389] RDX: ffffffff9010059b RSI: 0000000000000001 RDI: ffffffff901005b0
<4> [268.632397] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
<4> [268.632405] R10: ffff9ca26f3cce60 R11: 0000000000062b68 R12: 0000000000000246
<4> [268.632413] R13: ffffffff901e71a3 R14: ffff9ca26f1f4288 R15: ffff9ca26f3ceec0
<4> [268.632422] FS:  00007ff9e6d62680(0000) GS:ffff9ca275a80000(0000) knlGS:0000000000000000
<4> [268.632431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [268.632438] CR2: 000055593aa539e0 CR3: 00000002a3efc003 CR4: 00000000003606e0
<4> [268.632445] Call Trace:
<4> [268.632454]  downgrade_write+0x12/0x80
<4> [268.632462]  __do_munmap+0x393/0x400
<4> [268.632471]  __vm_munmap+0x6e/0xc0
<4> [268.632479]  __x64_sys_munmap+0x12/0x20
<4> [268.632486]  do_syscall_64+0x55/0x190
<4> [268.632495]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [268.632502] RIP: 0033:0x7ff9e6882ab7
<4> [268.632509] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 c9 f3 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 f3 2c 00 f7 d8 64 89 01 48
<4> [268.632524] RSP: 002b:00007ffeb58cb548 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
<4> [268.632533] RAX: ffffffffffffffda RBX: 000055593a97cf70 RCX: 00007ff9e6882ab7
<4> [268.632541] RDX: 0000000000000080 RSI: 00000000008f0036 RDI: 00007ff9e4b41000
<4> [268.632549] RBP: 000055593a97a4d8 R08: 0000000000000048 R09: 0000000000000010
<4> [268.632557] R10: 00000000ffffffff R11: 0000000000000206 R12: 000055593a97cf50
<4> [268.632564] R13: 00007ffeb58cb650 R14: 00007ffeb58cb700 R15: 000055593a97ce90
<4> [268.632575] irq event stamp: 970
<4> [268.632582] hardirqs last  enabled at (969): [<ffffffff90001994>] trace_hardirqs_on_thunk+0x1a/0x1c
<4> [268.632593] hardirqs last disabled at (970): [<ffffffff909717fa>] __schedule+0xaa/0xb50
<4> [268.632604] softirqs last  enabled at (836): [<ffffffff90c0033a>] __do_softirq+0x33a/0x4b9
<4> [268.632614] softirqs last disabled at (829): [<ffffffff900929f9>] irq_exit+0xa9/0xc0
<4> [268.632625] WARNING: CPU: 1 PID: 1841 at kernel/locking/lockdep.c:3556 lock_downgrade+0x158/0x1e0
<4> [268.632634] ---[ end trace f4ab6f68f1e4fe03 ]---
<5> [268.651832] Setting dangerous option reset - tainting kernel
<6> [268.657275] [IGT] gem_ctx_isolation: exiting, ret=0

Comment 3 Martin Peres 2019-03-08 16:56:42 UTC

Used to happen every 2.8 days in average, then nothing for the past 2 months and 3 weeks. Closing!