When suspending Intel GFX CI's fi-cfl-8109u[1], the following warning happens from time to time: <6> [130.718642] xhci_hcd 0000:6c:00.0: USB bus 3 deregistered <4> [130.719037] ------------[ cut here ]------------ <4> [130.719040] xhci_hcd 0000:6c:00.0: disabling already-disabled device <4> [130.719052] WARNING: CPU: 0 PID: 154 at drivers/pci/pci.c:1870 pci_disable_device+0x90/0xb0 <4> [130.719054] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal snd_hda_intel coretemp snd_hda_codec crct10dif_pclmul btusb snd_hwdep snd_hda_core btrtl btbcm btintel crc32_pclmul bluetooth snd_pcm ghash_clmulni_intel e1000e i2c_i801 ecdh_generic mei_me mei prime_numbers <4> [130.719073] CPU: 0 PID: 154 Comm: irq/123-pciehp Tainted: G U 4.20.0-rc5-CI-CI_DRM_5271+ #1 <4> [130.719075] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018 <4> [130.719078] RIP: 0010:pci_disable_device+0x90/0xb0 <4> [130.719080] Code: c6 05 cf f6 df 00 01 48 85 ed 74 34 48 8d bb b0 00 00 00 e8 12 75 12 00 48 89 ea 48 89 c6 48 c7 c7 f0 d8 0d 82 e8 60 9b b7 ff <0f> 0b eb 84 48 89 df e8 f4 fe ff ff 80 a3 b9 07 00 00 fb 5b 5d c3 <4> [130.719083] RSP: 0018:ffffc90000e53cf8 EFLAGS: 00010282 <4> [130.719085] RAX: 0000000000000000 RBX: ffff8882b336a2a8 RCX: 0000000000000006 <4> [130.719087] RDX: 0000000000000006 RSI: ffffffff821298aa RDI: ffffffff820d7ca7 <4> [130.719089] RBP: ffff8882b3357768 R08: 00000000fe7982de R09: 0000000000000000 <4> [130.719091] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff822c2c80 <4> [130.719092] R13: ffffffff822c2cf0 R14: 0000000000000060 R15: ffff8882b1a9d040 <4> [130.719095] FS: 0000000000000000(0000) GS:ffff8882b5a00000(0000) knlGS:0000000000000000 <4> [130.719097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [130.719099] CR2: 000055b6046265b8 CR3: 0000000005210003 CR4: 00000000003606f0 <4> [130.719100] Call Trace: <4> [130.719106] pci_device_remove+0x36/0xb0 <4> [130.719110] device_release_driver_internal+0x185/0x240 <4> [130.719116] ? irq_forced_thread_fn+0x70/0x70 <4> [130.719119] pci_stop_bus_device+0x5f/0x80 <4> [130.719123] pci_stop_bus_device+0x26/0x80 <4> [130.719127] pci_stop_bus_device+0x26/0x80 <4> [130.719131] pci_stop_and_remove_bus_device+0x9/0x20 <4> [130.719134] pciehp_unconfigure_device+0x7f/0x130 <4> [130.719140] pciehp_disable_slot+0x66/0x110 <4> [130.719144] pciehp_handle_presence_or_link_change+0x78/0x440 <4> [130.719152] pciehp_ist+0x19c/0x1c0 <4> [130.719156] irq_thread_fn+0x1c/0x60 <4> [130.719159] ? irq_thread+0xa6/0x1e0 <4> [130.719161] irq_thread+0x163/0x1e0 <4> [130.719164] ? irq_thread_check_affinity.part.4+0x70/0x70 <4> [130.719175] ? irq_thread_dtor+0xb0/0xb0 <4> [130.719177] kthread+0x119/0x130 <4> [130.719179] ? kthread_park+0x80/0x80 <4> [130.719184] ret_from_fork+0x3a/0x50 <4> [130.719195] irq event stamp: 8322 <4> [130.719197] hardirqs last enabled at (8321): [<ffffffff810fc764>] vprintk_emit+0x124/0x320 <4> [130.719200] hardirqs last disabled at (8322): [<ffffffff810019b0>] trace_hardirqs_off_thunk+0x1a/0x1c <4> [130.719202] softirqs last enabled at (8246): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9 <4> [130.719204] softirqs last disabled at (8239): [<ffffffff8108e6a9>] irq_exit+0xa9/0xc0 <4> [130.719214] WARNING: CPU: 0 PID: 154 at drivers/pci/pci.c:1870 pci_disable_device+0x90/0xb0 <4> [130.719215] ---[ end trace 21c238dfc083f85e ]--- The full information about this failure can be seen at https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5271/fi-cfl-8109u/igt%40gem_exec_suspend%40basic-s3.html [1] https://intel-gfx-ci.01.org/hardware/fi-cfl-8109u/
On Fri, Dec 07, 2018 at 11:29:05AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=201919 > > Bug ID: 201919 > Summary: [Intel GFX CI] "xhci_hcd 0000:6c:00.0: disabling > already-disabled device" when suspending > Product: Drivers > Version: 2.5 > Kernel Version: 4.20.0-rc5 All USB bugs should be sent to the linux-usb@vger.kernel.org mailing list, and not entered into bugzilla. Please bring this issue up there, if it is still a problem in the latest kernel release.
Thanks Greg, I'll send an email. In the mean time, just adding that this is likely related: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_151/fi-cfl-8109u/igt%40gem_ctx_isolation%40vcs1-s3.html <7> [268.406326] [drm:drm_helper_hpd_irq_event] [CONNECTOR:98:HDMI-A-1] status updated from disconnected to disconnected <6> [268.530784] acpi LNXPOWER:04: Turning OFF <6> [268.531473] OOM killer enabled. <6> [268.531477] Restarting tasks ... done. <4> [268.565866] <4> [268.565876] ====================================================== <4> [268.565884] WARNING: possible circular locking dependency detected <4> [268.565894] 4.20.0-rc4-gb3a448de6828-drmtip_151+ #1 Tainted: G U <4> [268.565903] ------------------------------------------------------ <4> [268.565911] kworker/u8:18/1278 is trying to acquire lock: <4> [268.565920] 000000007bc56ea1 (&ctrl->reset_lock){.+.+}, at: pciehp_check_presence+0x1d/0x70 <4> [268.565936] \x0abut task is already holding lock: <4> [268.565943] 00000000f6ea0a25 (pci_rescan_remove_lock){+.+.}, at: acpiphp_hotplug_notify+0x57/0x1f0 <4> [268.565955] \x0awhich lock already depends on the new lock.\x0a <4> [268.565964] \x0athe existing dependency chain (in reverse order) is: <4> [268.565973] \x0a-> #1 (pci_rescan_remove_lock){+.+.}: <4> [268.565975] pciehp_unconfigure_device+0x44/0x130 <4> [268.565975] pciehp_disable_slot+0x66/0x110 <4> [268.565975] pciehp_handle_presence_or_link_change+0x78/0x440 <4> [268.565975] pciehp_ist+0x19c/0x1c0 <4> [268.565975] irq_thread_fn+0x1c/0x60 <4> [268.565975] irq_thread+0x163/0x1e0 <4> [268.565975] kthread+0x119/0x130 <4> [268.565975] ret_from_fork+0x3a/0x50 <4> [268.565975] \x0a-> #0 (&ctrl->reset_lock){.+.+}: <4> [268.565975] down_read+0x36/0x60 <4> [268.565975] pciehp_check_presence+0x1d/0x70 <4> [268.565975] pciehp_resume+0x22/0x30 <4> [268.565975] device_for_each_child+0x4f/0x90 <4> [268.565975] pcie_port_device_runtime_resume+0x2b/0x50 <4> [268.565975] pci_pm_runtime_resume+0x73/0xa0 <4> [268.565975] __rpm_callback+0xb3/0x1b0 <4> [268.565975] rpm_callback+0x1a/0x70 <4> [268.565975] rpm_resume+0x533/0x830 <4> [268.565975] __pm_runtime_resume+0x42/0x80 <4> [268.565975] trim_stale_devices+0x8c/0x1a0 <4> [268.565975] trim_stale_devices+0xb0/0x1a0 <4> [268.565975] acpiphp_check_bridge.part.4+0xf9/0x140 <4> [268.565975] acpiphp_hotplug_notify+0x159/0x1f0 <4> [268.565975] acpi_device_hotplug+0x9b/0x460 <4> [268.565975] acpi_hotplug_work_fn+0x15/0x20 <4> [268.565975] process_one_work+0x262/0x630 <4> [268.565975] worker_thread+0x37/0x380 <4> [268.565975] kthread+0x119/0x130 <4> [268.565975] ret_from_fork+0x3a/0x50 <4> [268.565975] \x0aother info that might help us debug this:\x0a <4> [268.565975] Possible unsafe locking scenario:\x0a <4> [268.565975] CPU0 CPU1 <4> [268.565975] ---- ---- <4> [268.565975] lock(pci_rescan_remove_lock); <4> [268.565975] lock(&ctrl->reset_lock); <4> [268.565975] lock(pci_rescan_remove_lock); <4> [268.565975] lock(&ctrl->reset_lock); <4> [268.565975] \x0a *** DEADLOCK ***\x0a <4> [268.565975] 5 locks held by kworker/u8:18/1278: <4> [268.565975] #0: 00000000bde3e755 ((wq_completion)"kacpi_hotplug"){+.+.}, at: process_one_work+0x1d6/0x630 <4> [268.565975] #1: 0000000047b1a369 ((work_completion)(&hpw->work)){+.+.}, at: process_one_work+0x1d6/0x630 <4> [268.565975] #2: 0000000079a0f774 (device_hotplug_lock){+.+.}, at: acpi_device_hotplug+0x2b/0x460 <4> [268.565975] #3: 00000000b377c98e (acpi_scan_lock){+.+.}, at: acpi_device_hotplug+0x39/0x460 <4> [268.565975] #4: 00000000f6ea0a25 (pci_rescan_remove_lock){+.+.}, at: acpiphp_hotplug_notify+0x57/0x1f0 <4> [268.565975] \x0astack backtrace: <4> [268.565975] CPU: 3 PID: 1278 Comm: kworker/u8:18 Tainted: G U 4.20.0-rc4-gb3a448de6828-drmtip_151+ #1 <4> [268.565975] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018 <4> [268.565975] Workqueue: kacpi_hotplug acpi_hotplug_work_fn <4> [268.565975] Call Trace: <4> [268.565975] dump_stack+0x67/0x9b <4> [268.565975] print_circular_bug.isra.16+0x1c8/0x2b0 <4> [268.565975] __lock_acquire+0x183a/0x1b00 <4> [268.565975] ? pcie_do_write_cmd+0xe2/0x130 <4> [268.565975] ? lock_acquire+0xa6/0x1c0 <4> [268.565975] lock_acquire+0xa6/0x1c0 <4> [268.565975] ? pciehp_check_presence+0x1d/0x70 <4> [268.565975] ? pci_restore_standard_config+0x40/0x40 <4> [268.565975] down_read+0x36/0x60 <4> [268.565975] ? pciehp_check_presence+0x1d/0x70 <4> [268.565975] pciehp_check_presence+0x1d/0x70 <4> [268.565975] ? pci_idt_bus_quirk+0xf0/0xf0 <4> [268.565975] pciehp_resume+0x22/0x30 <4> [268.565975] device_for_each_child+0x4f/0x90 <4> [268.565975] pcie_port_device_runtime_resume+0x2b/0x50 <4> [268.565975] pci_pm_runtime_resume+0x73/0xa0 <4> [268.565975] __rpm_callback+0xb3/0x1b0 <4> [268.565975] rpm_callback+0x1a/0x70 <4> [268.565975] ? pci_restore_standard_config+0x40/0x40 <4> [268.565975] rpm_resume+0x533/0x830 <4> [268.565975] ? wait_woken+0xa0/0xa0 <4> [268.565975] __pm_runtime_resume+0x42/0x80 <4> [268.565975] trim_stale_devices+0x8c/0x1a0 <4> [268.565975] trim_stale_devices+0xb0/0x1a0 <4> [268.565975] ? get_slot_status+0x9a/0x100 <4> [268.565975] acpiphp_check_bridge.part.4+0xf9/0x140 <4> [268.565975] acpiphp_hotplug_notify+0x159/0x1f0 <4> [268.565975] ? acpiphp_post_dock_fixup+0xd0/0xd0 <4> [268.565975] acpi_device_hotplug+0x9b/0x460 <4> [268.565975] acpi_hotplug_work_fn+0x15/0x20 <4> [268.565975] process_one_work+0x262/0x630 <4> [268.565975] worker_thread+0x37/0x380 <4> [268.565975] ? process_one_work+0x630/0x630 <4> [268.565975] kthread+0x119/0x130 <4> [268.565975] ? kthread_park+0x80/0x80 <4> [268.565975] ret_from_fork+0x3a/0x50 <6> [268.584191] pcieport 0000:00:1c.4: PCI bridge to [bus 02-6c] <6> [268.584217] pcieport 0000:00:1c.4: bridge window [io 0x6000-0x8fff] <6> [268.584240] pcieport 0000:00:1c.4: bridge window [mem 0x40000000-0x6e0fffff] <6> [268.584264] pcieport 0000:00:1c.4: bridge window [mem 0x6000000000-0x6049ffffff 64bit pref] <6> [268.590285] pcieport 0000:00:1d.6: PCI bridge to [bus 6e] <6> [268.590308] pcieport 0000:00:1d.6: bridge window [io 0x3000-0x3fff] <6> [268.590332] pcieport 0000:00:1d.6: bridge window [mem 0x6e100000-0x6eafffff] <6> [268.590356] pcieport 0000:00:1d.6: bridge window [mem 0x604a100000-0x604aafffff 64bit pref] <6> [268.590719] pcieport 0000:00:1c.4: PCI bridge to [bus 02-6c] <6> [268.590739] pcieport 0000:00:1c.4: bridge window [io 0x6000-0x8fff] <6> [268.590762] pcieport 0000:00:1c.4: bridge window [mem 0x40000000-0x6e0fffff] <6> [268.590785] pcieport 0000:00:1c.4: bridge window [mem 0x6000000000-0x6049ffffff 64bit pref] <6> [268.632250] video LNXVIDEO:00: Restoring backlight state <4> [268.632252] ------------[ cut here ]------------ <4> [268.632255] downgrading a read lock <4> [268.632266] WARNING: CPU: 1 PID: 1841 at kernel/locking/lockdep.c:3556 lock_downgrade+0x158/0x1e0 <4> [268.632268] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core btusb btrtl btbcm btintel snd_pcm bluetooth e1000e ecdh_generic i2c_i801 mei_me mei prime_numbers <6> [268.632301] PM: suspend exit <4> [268.632325] CPU: 1 PID: 1841 Comm: systemd-udevd Tainted: G U 4.20.0-rc4-gb3a448de6828-drmtip_151+ #1 <4> [268.632339] Hardware name: Intel Corporation NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0037.2018.0614.2204 06/14/2018 <4> [268.632351] RIP: 0010:lock_downgrade+0x158/0x1e0 <4> [268.632358] Code: ff e9 23 ff ff ff 4c 89 ea 4c 89 f6 48 89 df e8 2e bd ff ff 85 c0 74 aa eb 9a 48 c7 c7 b2 fe 06 91 48 89 04 24 e8 08 d2 f9 ff <0f> 0b 8b 54 24 0c 48 8b 04 24 e9 46 ff ff ff e8 74 70 3a 00 85 c0 <4> [268.632374] RSP: 0018:ffffb5b880a2be38 EFLAGS: 00010082 <4> [268.632382] RAX: 0000000000000000 RBX: ffff9ca265f30040 RCX: 0000000000000000 <4> [268.632389] RDX: ffffffff9010059b RSI: 0000000000000001 RDI: ffffffff901005b0 <4> [268.632397] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 <4> [268.632405] R10: ffff9ca26f3cce60 R11: 0000000000062b68 R12: 0000000000000246 <4> [268.632413] R13: ffffffff901e71a3 R14: ffff9ca26f1f4288 R15: ffff9ca26f3ceec0 <4> [268.632422] FS: 00007ff9e6d62680(0000) GS:ffff9ca275a80000(0000) knlGS:0000000000000000 <4> [268.632431] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [268.632438] CR2: 000055593aa539e0 CR3: 00000002a3efc003 CR4: 00000000003606e0 <4> [268.632445] Call Trace: <4> [268.632454] downgrade_write+0x12/0x80 <4> [268.632462] __do_munmap+0x393/0x400 <4> [268.632471] __vm_munmap+0x6e/0xc0 <4> [268.632479] __x64_sys_munmap+0x12/0x20 <4> [268.632486] do_syscall_64+0x55/0x190 <4> [268.632495] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [268.632502] RIP: 0033:0x7ff9e6882ab7 <4> [268.632509] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 c9 f3 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 f3 2c 00 f7 d8 64 89 01 48 <4> [268.632524] RSP: 002b:00007ffeb58cb548 EFLAGS: 00000206 ORIG_RAX: 000000000000000b <4> [268.632533] RAX: ffffffffffffffda RBX: 000055593a97cf70 RCX: 00007ff9e6882ab7 <4> [268.632541] RDX: 0000000000000080 RSI: 00000000008f0036 RDI: 00007ff9e4b41000 <4> [268.632549] RBP: 000055593a97a4d8 R08: 0000000000000048 R09: 0000000000000010 <4> [268.632557] R10: 00000000ffffffff R11: 0000000000000206 R12: 000055593a97cf50 <4> [268.632564] R13: 00007ffeb58cb650 R14: 00007ffeb58cb700 R15: 000055593a97ce90 <4> [268.632575] irq event stamp: 970 <4> [268.632582] hardirqs last enabled at (969): [<ffffffff90001994>] trace_hardirqs_on_thunk+0x1a/0x1c <4> [268.632593] hardirqs last disabled at (970): [<ffffffff909717fa>] __schedule+0xaa/0xb50 <4> [268.632604] softirqs last enabled at (836): [<ffffffff90c0033a>] __do_softirq+0x33a/0x4b9 <4> [268.632614] softirqs last disabled at (829): [<ffffffff900929f9>] irq_exit+0xa9/0xc0 <4> [268.632625] WARNING: CPU: 1 PID: 1841 at kernel/locking/lockdep.c:3556 lock_downgrade+0x158/0x1e0 <4> [268.632634] ---[ end trace f4ab6f68f1e4fe03 ]--- <5> [268.651832] Setting dangerous option reset - tainting kernel <6> [268.657275] [IGT] gem_ctx_isolation: exiting, ret=0
Used to happen every 2.8 days in average, then nothing for the past 2 months and 3 weeks. Closing!