I'm attempting to use a Plugable USB-C docking station with my laptop, a Dell XPS 13 (9350). The docking station provides a USB hub, Ethernet-over-USB, DisplayPort via USB-C Alternate Mode, and USB Power Delivery. If the docking station is plugged in at startup, everything works correctly. However, if it is unplugged and plugged back in, the USB devices are not detected; lspci shows only "!!! Unknown header type 7f" for the USB controller.
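For anyone wanting to check for the same symptom, here is a minimal sketch that scans lspci output for devices reporting an unknown header type. That message usually means reads from the device's configuration space are returning all ones, i.e. the device is not responding. The function name find_unresponsive_pci is my own invention, and the sketch assumes the usual lspci layout of one unindented header line per device followed by indented detail lines.

```shell
#!/bin/sh
# Sketch only: print the address of every device whose lspci entry
# contains "Unknown header type". Reads lspci output on stdin.
# find_unresponsive_pci is a hypothetical helper name.
find_unresponsive_pci() {
    awk '
        # A device header line starts at column 0 with the address.
        /^[0-9a-f]/ { dev = $1 }
        # Detail lines are indented; report the device they belong to.
        /Unknown header type/ { print dev }
    '
}
```

Typical use would be `lspci -vv | find_unresponsive_pci`, which on my machine after a hotplug should list the USB controller.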
Created attachment 227331 [details] lspci -vv when the device is present on boot. USB controller is 05:00.0.
Created attachment 227341 [details] lspci -vv after hotplug
Created attachment 227351 [details] dmesg
dmesg showing a disconnect/connect cycle. The device was disconnected at 843s; the WARNING is concerning but appears to relate to the alternate-mode display (which seems to work fine) rather than the USB subsystem. The xhci_hcd errors at 848s may be more relevant. The device was reconnected at 879s.
(I've tried updating to the latest BIOS/firmware from Dell. Hotplugging appears to work correctly under Windows.)
Anything I can do to help investigate this?
I've noticed that the USB devices start working correctly if I force a rescan of the PCI devices:

    echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
    echo 1 > /sys/bus/pci/devices/0000:00:1c.0/rescan
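In case it helps anyone reproduce the workaround, here is a rough sketch of the same remove-then-rescan sequence as a helper function. The function name pci_remove_and_rescan and the SYSFS_PCI variable are hypothetical; the device addresses in the usage note are the ones from my machine and will differ elsewhere. It needs to run as root against the real sysfs.

```shell
#!/bin/sh
# Sketch only: remove a stale PCI device, then rescan below its parent port.
# SYSFS_PCI is a hypothetical override so the function can be dry-run
# against a scratch directory instead of the real sysfs.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

pci_remove_and_rescan() {
    stale="$1"    # device to remove, e.g. 0000:01:00.0
    parent="$2"   # upstream port to rescan, e.g. 0000:00:1c.0

    # Only remove the stale device if it is still listed.
    if [ -e "$SYSFS_PCI/$stale/remove" ]; then
        echo 1 > "$SYSFS_PCI/$stale/remove" || return 1
    fi
    # Ask the kernel to re-enumerate everything below the parent port.
    echo 1 > "$SYSFS_PCI/$parent/rescan"
}
```

On my system the call would be `pci_remove_and_rescan 0000:01:00.0 0000:00:1c.0`.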
I think only USB hotplug is involved here, not PCI hotplug, right? Reassigning to USB on that assumption.
On Tue, Aug 23, 2016 at 10:15:58PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=151261

Please send this to the linux-usb@vger.kernel.org mailing list.
Should I send it to the linux-usb@vger.kernel.org mailing list?
Feedback from the USB mailing list confirms my suspicion that this is a PCI issue. There is a chain of three PCI bridges (00:1c.0, 01:00.0, 02:02.0) before a PCI->USB bridge (05:00.0 or 39:00.0); it is the PCI bridges that fail to appear automatically after a hotplug (they do appear if I force a rescan). I've also raised this on the linux-pci mailing list.
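For reference, the bridge chain can be read straight out of sysfs, because the resolved path of a PCI device nests each device under its upstream bridge. A minimal sketch follows; the function name pci_chain_from_path is hypothetical, and the example path in the usage note is what I would expect on my machine given the addresses above, not something copied from a log.

```shell
#!/bin/sh
# Sketch only: list the PCI addresses along a resolved sysfs device path,
# from the root port down to the device itself.
# pci_chain_from_path is a hypothetical helper name.
pci_chain_from_path() {
    # Split the path on "/" and keep only components shaped like
    # domain:bus:device.function (e.g. 0000:00:1c.0).
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
}
```

Usage would be `pci_chain_from_path "$(readlink -f /sys/bus/pci/devices/0000:05:00.0)"`, which for a path such as /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:02.0/0000:05:00.0 prints the three bridges followed by the USB controller, one per line.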
It turned out that all that was needed to fix this was a kernel with CONFIG_HOTPLUG_PCI=y. Thanks to those who offered pointers.
Strictly speaking, I think you need both of the following:

CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_PCIE=y
> Strictly speaking, I think you need both of the following:
>
> CONFIG_HOTPLUG_PCI=y
> CONFIG_HOTPLUG_PCI_PCIE=y

Apparently not, for me; though having both certainly seems a sensible precaution.
Oh, your system must use acpiphp then? Do you have CONFIG_HOTPLUG_PCI_ACPI=y? Can you attach a complete dmesg log showing a hotplug event?
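A quick way to answer this kind of question is to grep the kernel config for the three hotplug options mentioned in this thread. The sketch below assumes an uncompressed config file; the function name check_hotplug_config is hypothetical, and where the config lives varies by distribution (many ship /boot/config-$(uname -r), and kernels built with CONFIG_IKCONFIG_PROC expose a compressed copy at /proc/config.gz).

```shell
#!/bin/sh
# Sketch only: report whether each PCI hotplug option from this thread
# is set to "y" in a given kernel config file.
# check_hotplug_config is a hypothetical helper name.
check_hotplug_config() {
    cfg="$1"   # path to an uncompressed kernel config
    for opt in CONFIG_HOTPLUG_PCI CONFIG_HOTPLUG_PCI_PCIE CONFIG_HOTPLUG_PCI_ACPI; do
        # Anchor both ends so CONFIG_HOTPLUG_PCI does not match the longer names.
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt=y"
        else
            echo "$opt is not set to y"
        fi
    done
}
```

For example: `check_hotplug_config "/boot/config-$(uname -r)"`.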
Created attachment 230211 [details] Working kernel config
Created attachment 230221 [details] dmesg showing hotplug event
Hotplug starts at 76.965131.
(In reply to Bjorn Helgaas from comment #49)
> Oh, your system must use acpiphp then? Do you have
> CONFIG_HOTPLUG_PCI_ACPI=y?

Apparently so. Complete config attached.

> Can you attach a complete dmesg log showing a
> hotplug event?

Done.

It seems I'm not quite out of the woods yet - unplugging the dock just now caused a kernel panic. Nothing useful in the logs, sadly; if it repeats I'll try to collect some more information and report elsewhere.
Adding Andreas. My guess is that your system uses acpiphp so the platform can do Thunderbolt magic when the hotplug happens. Your dmesg shows some ACPI errors when you do the hotplug, so maybe there's some issue there. I don't really know how to debug that, but if I were to try, an acpidump might be useful.
Created attachment 230241 [details] acpidump
The panic turns out to be quite repeatable. It begins (as collected over a serial console):

[ 44.010232] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 44.010274] IP: [<ffffffff81630316>] usb_hc_died+0x16/0xc0
[ 44.010292] PGD 0
[ 44.010300] Oops: 0000 [#1] SMP
[ 44.010310] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables rfcomm bnep snd_hda_codec_hdmi snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core snd_hda_codec_realtek binfmt_misc snd_compress snd_hda_codec_generic i2c_designware_platform dcdbas ac97_bus i2c_des

Is it useful to file this, along with more details, as a separate bug?
Looks like usb does not handle surprise removal that well. Were you able to collect the whole stacktrace? usb_hc_died is called from a bunch of places.
On Fri, Aug 26, 2016 at 11:44:49AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> Looks like usb does not handle surprise removal that well.

It's "xhci doesn't handle surprise removal of its device" that well, it's not a USB core issue, it's the host controller driver issue. And people are working on it...

thanks,

greg k-h
In the interest of trying to avoid confusion with the original bug, I've filed the panic as https://bugzilla.kernel.org/show_bug.cgi?id=155541. Andreas: I captured a dump via kdump and extracted a stacktrace, which I have attached to that bug. Apologies if it's duplicating already-known problems, but I thought it might be helpful to record some pointers in case anyone else is seeing the same symptoms.