Created attachment 99541 [details]
full dmesg

Usually I see "EHCI: BIOS handoff failed" and "xHCI: BIOS handoff failed" in dmesg, and the laptop hangs after the "Power down" message. However, when I'm lucky and the "handoff failed" messages don't appear, the laptop suspends and turns off without any problems. I remember that this didn't happen with old kernels (3.2 maybe). With 3.8 it happens almost always. On the same laptop, Windows 7 runs fine. However, it looks like there is a long pause during boot.
Maybe this is a clearer description: When I boot my laptop (HP Pavilion dv6-6179er), I see these messages in dmesg:

[ 3.141630] pci 0000:00:1a.0: EHCI: BIOS handoff failed (BIOS bug?) 01010001
[ 4.474539] pci 0000:00:1d.0: EHCI: BIOS handoff failed (BIOS bug?) 01010001
[ 4.481509] pci 0000:19:00.0: xHCI BIOS handoff failed (BIOS bug ?) 00010401

If I attempt a power-off, suspend or hibernate, I see the "Power down." message (or just a black screen in the case of suspend), but the laptop doesn't actually turn off. However, it seems that at least the hard drive and the ethernet adapter are turned off. Sometimes I don't see the "BIOS handoff failed" messages, and then power-off, suspend and hibernation work as expected. Reboot always works.
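For readers trying to interpret the hex values at the end of those messages, here is a minimal user-space sketch that splits a legacy-support register value into its fields. The bit positions follow the EHCI_USBLEGSUP_BIOS (1 << 16) and EHCI_USBLEGSUP_OS (1 << 24) definitions in drivers/usb/host/pci-quirks.c; the sketch assumes the xHCI value uses the same layout. The struct and function names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LEGSUP_BIOS_OWNED (1u << 16)  /* firmware still claims the controller */
#define LEGSUP_OS_OWNED   (1u << 24)  /* OS has requested ownership */

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct legsup {
    unsigned cap_id;    /* bits 7:0, 1 = legacy support capability */
    unsigned next_ptr;  /* bits 15:8, link to the next capability */
    int bios_owned;
    int os_owned;
};

/* Split a raw register value, as printed by the kernel, into fields. */
struct legsup decode_legsup(uint32_t val)
{
    struct legsup d;

    d.cap_id = val & 0xff;
    d.next_ptr = (val >> 8) & 0xff;
    d.bios_owned = (val & LEGSUP_BIOS_OWNED) != 0;
    d.os_owned = (val & LEGSUP_OS_OWNED) != 0;
    return d;
}

/* Print one decoded value in a single line. */
void print_legsup(const char *label, uint32_t val)
{
    struct legsup d = decode_legsup(val);

    assert(label != NULL);
    printf("%s %08x: cap_id=%u next=0x%02x bios_owned=%d os_owned=%d\n",
           label, (unsigned)val, d.cap_id, d.next_ptr,
           d.bios_owned, d.os_owned);
}
```

Calling print_legsup("EHCI", 0x01010001) and print_legsup("xHCI", 0x00010401) shows that the BIOS-owned bit is still set in both reported values, which is what "handoff failed" means here. The decoding says nothing about why the BIOS kept ownership.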
I assume this is a USB issue, so reassigning there. Bounce it back if you disagree.
I downgraded the BIOS and don't experience this problem anymore. However, Windows works without problems even with the latest BIOS. Should this bug be closed?
It sounds like Linux works correctly with the downgraded BIOS (unknown version), but not with the original BIOS (F.1B 10/05/2011). Windows works correctly with both. Changing BIOS versions is not an acceptable bug resolution, so I think this is still a valid bug report and should not be closed. I don't know how much the USB folks use bugzilla, but I added a couple experts to the CC: list.
Created attachment 103581 [details]
Increase time delay for BIOS EHCI handoff

Maybe increasing the delay time for the EHCI handoff will help. The fact that Windows takes a long time to boot points in this direction, at least. The attached patch will increase the delay to 10 seconds (10000 ms). You can try even larger values if you want.
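To make the reasoning behind a longer delay concrete, here is a minimal user-space simulation of a poll-with-timeout handoff: the OS sets its ownership bit, then polls until the BIOS clears its own bit or the time budget runs out. Everything here is an assumption for illustration: the fake_bios type, the release time, and the 1000 ms comparison value are invented, and the real loop lives in ehci_bios_handoff() and reads PCI config space rather than a struct.

```c
#include <assert.h>
#include <stdint.h>

#define LEGSUP_BIOS_OWNED (1u << 16)
#define LEGSUP_OS_OWNED   (1u << 24)

/* Hypothetical stand-in for firmware: it gives up ownership once
 * release_after_ms of polling has elapsed, or never if negative. */
struct fake_bios {
    uint32_t legsup;
    int release_after_ms;
    int elapsed_ms;
};

/* Simulated register read; the firmware reacts as time passes. */
uint32_t fake_read(struct fake_bios *b)
{
    if (b->release_after_ms >= 0 && b->elapsed_ms >= b->release_after_ms)
        b->legsup &= ~LEGSUP_BIOS_OWNED;
    return b->legsup;
}

/* Request ownership and poll. Returns 1 on success, 0 on timeout. */
int try_handoff(struct fake_bios *b, int timeout_ms, int step_ms)
{
    uint32_t cap;
    int left = timeout_ms;

    assert(step_ms > 0);
    b->legsup |= LEGSUP_OS_OWNED;      /* tell the BIOS the OS wants it */
    cap = fake_read(b);
    while ((cap & LEGSUP_BIOS_OWNED) && left > 0) {
        b->elapsed_ms += step_ms;      /* stands in for msleep(step_ms) */
        left -= step_ms;
        cap = fake_read(b);
    }
    return !(cap & LEGSUP_BIOS_OWNED);
}
```

With a simulated BIOS that needs 3000 ms, try_handoff() fails with a 1000 ms budget and succeeds with 10000 ms. If the BIOS never releases the controller, no budget helps, so a longer delay is only useful when the failure is purely a matter of timing.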
Does your BIOS have an option to "disable" EHCI and xHCI? Some BIOSes will have a USB mode that is something like "Enabled", "Smart auto", or "Disabled". For xHCI, that should be set to "Smart Auto" or "Enabled". If the EHCI to xHCI port switchover is working properly, all of your USB devices should show up under xHCI instead of EHCI. What does `sudo lsusb -t` show in the different failure cases? E.g. when the xHCI BIOS handoff fails and when it succeeds, but the EHCI BIOS handoff fails? Some BIOSes will not expect EHCI to be used if the ports are switched over to xHCI. That may be why the EHCI handoff fails. Increasing the handoff delay is unlikely to help in that case.
I'm not sure why power off would cause the system to hang though. Perhaps the system needs the XHCI_SPURIOUS_REBOOT quirk, so that ports are switched back to EHCI on shutdown?

commit e95829f474f0db3a4d940cae1423783edd966027
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Mon Jul 23 18:59:30 2012 +0300

    xhci: Switch PPT ports to EHCI on shutdown.

    The Intel desktop boards DH77EB and DH77DF have a hardware issue that can be worked around by BIOS. If the USB ports are switched to xHCI on shutdown, the xHCI host will send a spurious interrupt, which will wake the system. Some BIOS will work around this, but not all.

    The bug can be avoided if the USB ports are switched back to EHCI on shutdown. The Intel Windows driver switches the ports back to EHCI, so change the Linux xHCI driver to do the same.

    Unfortunately, we can't tell the two effected boards apart from other working motherboards, because the vendors will change the DMI strings for the DH77EB and DH77DF boards to their own custom names. One example is Compulab's mini-desktop, the Intense-PC. Instead, key off the Panther Point xHCI host PCI vendor and device ID, and switch the ports over for all PPT xHCI hosts.

    The only impact this will have on non-effected boards is to add a couple hundred milliseconds delay on boot when the BIOS has to switch the ports over from EHCI to xHCI.

    This patch should be backported to kernels as old as 3.0, that contain the commit 69e848c2090aebba5698a1620604c7dccb448684 "Intel xhci: Support EHCI/xHCI port switching."

    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Reported-by: Denis Turischev <denis@compulab.co.il>
    Tested-by: Denis Turischev <denis@compulab.co.il>
    Cc: stable@vger.kernel.org

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index df0828c..c5e9e4a 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -800,6 +800,13 @@ void usb_enable_xhci_ports(struct pci_dev *xhci_pdev)
 }
 EXPORT_SYMBOL_GPL(usb_enable_xhci_ports);
 
+void usb_disable_xhci_ports(struct pci_dev *xhci_pdev)
+{
+	pci_write_config_dword(xhci_pdev, USB_INTEL_USB3_PSSEN, 0x0);
+	pci_write_config_dword(xhci_pdev, USB_INTEL_XUSB2PR, 0x0);
+}
+EXPORT_SYMBOL_GPL(usb_disable_xhci_ports);
+
 /**
  * PCI Quirks for xHCI.
  *
diff --git a/drivers/usb/host/pci-quirks.h b/drivers/usb/host/pci-quirks.h
index b1002a8..ef004a5 100644
--- a/drivers/usb/host/pci-quirks.h
+++ b/drivers/usb/host/pci-quirks.h
@@ -10,6 +10,7 @@ void usb_amd_quirk_pll_disable(void);
 void usb_amd_quirk_pll_enable(void);
 bool usb_is_intel_switchable_xhci(struct pci_dev *pdev);
 void usb_enable_xhci_ports(struct pci_dev *xhci_pdev);
+void usb_disable_xhci_ports(struct pci_dev *xhci_pdev);
 #else
 static inline void usb_amd_quirk_pll_disable(void) {}
 static inline void usb_amd_quirk_pll_enable(void) {}
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 92eaff6..9bfd4ca11 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -94,6 +94,15 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 		xhci->quirks |= XHCI_EP_LIMIT_QUIRK;
 		xhci->limit_active_eps = 64;
 		xhci->quirks |= XHCI_SW_BW_CHECKING;
+		/*
+		 * PPT desktop boards DH77EB and DH77DF will power back on after
+		 * a few seconds of being shutdown. The fix for this is to
+		 * switch the ports from xHCI to EHCI on shutdown. We can't use
+		 * DMI information to find those particular boards (since each
+		 * vendor will change the board name), so we have to key off all
+		 * PPT chipsets.
+		 */
+		xhci->quirks |= XHCI_SPURIOUS_REBOOT;
 	}
 	if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
 			pdev->device == PCI_DEVICE_ID_ASROCK_P67) {
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 5c3a3f7..c59d5b5 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -659,6 +659,9 @@ void xhci_shutdown(struct usb_hcd *hcd)
 {
 	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
 
+	if (xhci->quirks && XHCI_SPURIOUS_REBOOT)
+		usb_disable_xhci_ports(to_pci_dev(hcd->self.controller));
+
 	spin_lock_irq(&xhci->lock);
 	xhci_halt(xhci);
 	spin_unlock_irq(&xhci->lock);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 96f49db..c713256 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1494,6 +1494,7 @@ struct xhci_hcd {
 #define XHCI_TRUST_TX_LENGTH	(1 << 10)
 #define XHCI_LPM_SUPPORT	(1 << 11)
 #define XHCI_INTEL_HOST		(1 << 12)
+#define XHCI_SPURIOUS_REBOOT	(1 << 13)
 	unsigned int		num_active_eps;
 	unsigned int		limit_active_eps;
 	/* There are two roothubs to keep track of bus suspend info for */
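One detail in the quoted xhci_shutdown() hunk is worth noting when experimenting with this quirk: the test is written with `&&` (logical AND), not `&` (bitwise AND). In C the logical form is true whenever xhci->quirks is nonzero, regardless of whether the XHCI_SPURIOUS_REBOOT bit is set. The following user-space sketch only illustrates that language-level difference; the two helper functions are hypothetical, and whether the difference matters on a given machine depends on which other quirk bits its host sets.

```c
#include <assert.h>

#define XHCI_INTEL_HOST      (1 << 12)
#define XHCI_SPURIOUS_REBOOT (1 << 13)

/* Logical AND, as in the quoted hunk: true if both operands are nonzero. */
int quirk_test_logical(unsigned int quirks, unsigned int flag)
{
    return quirks && flag;
}

/* Bitwise AND: true only if the specific flag bit is set in quirks. */
int quirk_test_bitwise(unsigned int quirks, unsigned int flag)
{
    return (quirks & flag) != 0;
}
```

For a host that has only XHCI_INTEL_HOST set, quirk_test_logical() reports the quirk as present while quirk_test_bitwise() does not. So with the code as quoted, the ports may be switched back on shutdown for hosts that never had XHCI_SPURIOUS_REBOOT added explicitly.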
You mentioned that earlier kernels (3.2?) worked okay. Did they display the various "BIOS handoff failed" messages? The fact that those messages don't always appear indicates that the problem may be related to timing; that's one of the reasons why I suggested increasing the EHCI handoff delay time. Is it possible to set up the BIOS to disable xHCI while leaving EHCI enabled? If you can do that, does the system then power-off correctly? Since EHCI is generally simpler than xHCI, this ought to be an easier scenario to debug. Once it is working, maybe the same solution will work with xHCI.
The problem returned, even though I didn't upgrade the BIOS. Increasing the timeout doesn't help.
(In reply to comment #7)
> I'm not sure why power off would cause the system to hang though. Perhaps the
> system needs the XHCI_SPURIOUS_REBOOT quirk, so that ports are switched back to
> EHCI on shutdown?

I'm using kernel 3.9.4, so this patch is already applied. I tried replacing "pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI" with "(pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI || pdev->device == 0x1c26)" (0x1c26 is my device id). It doesn't help. And I can't disable XHCI in BIOS.
Sorry, that was the ID of the EHCI controller. However, adding the quirk in the right place (NEC vendor and device 0x0194) still doesn't help.
Seems that this patch fixes the problem. Just tried because "why not?". After applying it, I didn't see "handoff failed" for two days. I need to test it for longer time to be sure.

--- pci-quirks.c.orig 2013-06-13 11:07:39.879228805 +0700
+++ pci-quirks.c 2013-06-12 19:27:15.325444795 +0700
@@ -571,6 +571,8 @@
 		try_handoff = 0;
 	}
 
+	pci_enable_device(pdev);
+
 	if (try_handoff && (cap & EHCI_USBLEGSUP_BIOS)) {
 		dev_dbg(&pdev->dev, "EHCI: BIOS handoff\n");
That's what the mmio_resource_enabled() call at the start of quirk_usb_disable_ehci() is supposed to check for. In addition, there's already a call to pci_enable_device() inside quirk_usb_early_handoff() at the end of the source file. So I don't see how this can make any difference.
I haven't seen these errors since I added this line (and the laptop turns off without problems), and when I removed it I got the errors again. I have a buggy BIOS, and this somehow kicks it. This line affects only EHCI devices, but without it the handoff fails on xHCI too.
Can you try moving the new pci_enable_device() call earlier, and see if it still works? For example, does it work if you move it to the start of ehci_bios_handoff()? What about quirk_usb_disable_ehci() just before the "while" loop? Or just after the mmio_resource_enabled() check? Or just before the check? Or even in quirk_usb_early_handoff() just before the call to quirk_usb_disable_ehci()? Or even just before the multi-branching "if" statement? There's not much point in moving it earlier than that, because that's where the original pci_enable_device() call is.
Seems that the earliest place where it works is

--- drivers/usb/host/pci-quirks.c.orig 2013-06-20 10:44:12.348533318 +0700
+++ drivers/usb/host/pci-quirks.c 2013-06-19 21:41:28.717175243 +0700
@@ -651,6 +651,7 @@
 	offset = (hcc_params >> 8) & 0xff;
 	while (offset && --count) {
 		pci_read_config_dword(pdev, offset, &cap);
+		pci_enable_device(pdev);
 
 		switch (cap & 0xff) {
 		case 1:

If I move pci_enable_device before pci_read_config_dword or move it outside loop, I get "handoff failed" on next boot.
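For context on where that line sits: the loop being patched walks the EHCI extended capability list, starting at the offset stored in bits 15:8 of hcc_params and following each capability's next pointer until it reaches the legacy-support capability (ID 1). Below is a minimal user-space sketch of that walk over a simulated config space. The cfg_read32(), cfg_write32() and find_ext_cap() helpers are invented for this example; the kernel uses pci_read_config_dword() on the real device instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SIZE 256  /* size of the simulated PCI config space */

/* Read a little-endian 32-bit value from the simulated config space. */
uint32_t cfg_read32(const uint8_t *cfg, unsigned offset)
{
    return (uint32_t)cfg[offset] |
           ((uint32_t)cfg[offset + 1] << 8) |
           ((uint32_t)cfg[offset + 2] << 16) |
           ((uint32_t)cfg[offset + 3] << 24);
}

/* Store a little-endian 32-bit value into the simulated config space. */
void cfg_write32(uint8_t *cfg, unsigned offset, uint32_t val)
{
    cfg[offset] = val & 0xff;
    cfg[offset + 1] = (val >> 8) & 0xff;
    cfg[offset + 2] = (val >> 16) & 0xff;
    cfg[offset + 3] = (val >> 24) & 0xff;
}

/* Return the offset of the capability with ID want_id, or 0 if absent. */
unsigned find_ext_cap(const uint8_t *cfg, uint32_t hcc_params,
                      unsigned want_id)
{
    unsigned offset = (hcc_params >> 8) & 0xff;  /* first capability */
    int count = 256 / 4;                         /* guards against loops */

    assert(cfg != NULL);
    while (offset && --count) {
        uint32_t cap;

        if (offset > CFG_SIZE - 4)
            return 0;                    /* pointer runs off the end */
        cap = cfg_read32(cfg, offset);
        if ((cap & 0xff) == want_id)
            return offset;
        offset = (cap >> 8) & 0xff;      /* follow the next pointer */
    }
    return 0;
}
```

The count limit matters because a buggy BIOS could publish a list that points back to itself; without it the walk would never terminate.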
That's really weird. Okay, given your results, I suggest you call pci_enable_device() immediately before the call to ehci_bios_handoff() and then call pci_disable_device() immediately after (each enable is supposed to be balanced by a disable). If that works, you can send in a patch or I can submit one for you. Be sure to add a comment explaining why pci_enable_device() is needed; otherwise in the future somebody will think it is redundant and will remove it.
pci_enable_device doesn't help anymore. This was just a coincidence (that lasted for a week). Disabling and then enabling the device, or resetting it, doesn't help either.
Created attachment 105641 [details]
Dmesg with additional debug output

I've enabled debug output for usb, pci and pci-related things in ACPI.
Created attachment 105651 [details] Dmesg when handoff didn't fail
Since the problem doesn't really show up until you try to suspend or shut down the computer, that's where the interesting events occur. However, because they occur after the OS has given control back to the BIOS, they won't generate any messages in the log. There's one other thing you can try. A section of code in ehci_bios_handoff() is disabled with "#if 0". If you enable it again, does it help? (You'll have to declare "val" as a u32.) Is it still true that the problem doesn't occur with the 3.2 kernel? If it is, maybe you can find the earliest kernel version where the problem does occur.
I tried removing "#if 0". It doesn't help either. Maybe non-working suspend/shutdown is only one of the symptoms, and the interesting things happen early in boot, when the BIOS gets into a bad state? Handoff always either fails for all hosts or succeeds for all of them. Suspend/shutdown doesn't work if and only if handoff failed. Also, it seems that udev settle takes much longer when handoff has failed. I'll test kernel 3.2 tomorrow.
Oops, I forgot about this bug, but received an email notification about it today. I can't say for sure whether this bug is present in a specific kernel version, as it happens only from time to time. But it looks like it is present in all the kernels I have used.