Bug 56931 - "E/xHCI: BIOS handoff failed" breaks poweroff
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P1 normal
Assignee: Greg Kroah-Hartman
 
Reported: 2013-04-21 17:41 UTC by Aleksandr Mezin
Modified: 2014-03-09 12:56 UTC

Kernel Version: 3.8.8
Tree: Mainline
Regression: Yes


Attachments
full dmesg (50.89 KB, text/plain)
2013-04-21 17:41 UTC, Aleksandr Mezin
Increase time delay for BIOS EHCI handoff (585 bytes, patch)
2013-06-05 17:57 UTC, Alan Stern
Dmesg with additional debug output (89.21 KB, application/octet-stream)
2013-06-21 14:01 UTC, Aleksandr Mezin
Dmesg when handoff didn't fail (89.01 KB, application/octet-stream)
2013-06-21 14:03 UTC, Aleksandr Mezin

Description Aleksandr Mezin 2013-04-21 17:41:34 UTC
Created attachment 99541
full dmesg

Usually I see "EHCI: BIOS handoff failed" and "xHCI: BIOS handoff failed" in dmesg, and the laptop hangs after the "Power down" message. However, when I am lucky and the "handoff failed" messages do not appear, the laptop suspends and powers off without any problems.

As far as I remember, this did not happen with older kernels (3.2, maybe). With 3.8 it happens almost always.

On the same laptop, Windows 7 runs fine. However, there seems to be a long pause during boot.
Comment 1 Aleksandr Mezin 2013-04-23 15:08:31 UTC
Maybe this is a clearer description:

When I boot my laptop (HP Pavilion dv6-6179er), I see these messages in dmesg:
[    3.141630] pci 0000:00:1a.0: EHCI: BIOS handoff failed (BIOS bug?) 01010001
[    4.474539] pci 0000:00:1d.0: EHCI: BIOS handoff failed (BIOS bug?) 01010001
[    4.481509] pci 0000:19:00.0: xHCI BIOS handoff failed (BIOS bug ?) 00010401

If I attempt a power-off, suspend or hibernate, I see the "Power down." message (or just a black screen in the case of suspend), but the laptop doesn't actually turn off. However, it seems that at least the hard drive and the Ethernet adapter are turned off.

Sometimes I don't see the "BIOS handoff failed" messages, and then power-off, suspend and hibernation work as expected. Reboot always works.
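
For reference, the hex value at the end of each "handoff failed" line is the controller's USB legacy support register as the kernel read it. The sketch below decodes the two values from the log above. It is a userspace illustration only: the struct and macro names are local to the sketch, and the bit positions (BIOS semaphore at bit 16, OS semaphore at bit 24, capability ID in the low byte) are assumed from the usual EHCI/xHCI legacy support layout rather than taken from this report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local names for this sketch; bit positions assumed from the EHCI/xHCI
 * "USB legacy support" capability layout. */
#define LEGSUP_BIOS_OWNED (UINT32_C(1) << 16) /* BIOS ownership semaphore */
#define LEGSUP_OS_OWNED   (UINT32_C(1) << 24) /* OS ownership semaphore */

struct legsup {
	unsigned int cap_id;   /* bits 7:0, 1 means "legacy support" */
	unsigned int next;     /* bits 15:8, pointer to the next capability */
	int bios_owned;
	int os_owned;
};

/* Split a raw register value into its fields. */
static struct legsup decode_legsup(uint32_t v)
{
	struct legsup d;

	d.cap_id = v & 0xff;
	d.next = (v >> 8) & 0xff;
	d.bios_owned = (v & LEGSUP_BIOS_OWNED) != 0;
	d.os_owned = (v & LEGSUP_OS_OWNED) != 0;
	return d;
}

/* Print one decoded value in a single line. */
static void print_legsup(const char *label, uint32_t v)
{
	struct legsup d = decode_legsup(v);

	assert(label != NULL);
	printf("%s %08x: cap_id=%u next=0x%02x bios_owned=%d os_owned=%d\n",
	       label, (unsigned int)v, d.cap_id, d.next, d.bios_owned, d.os_owned);
}
```

Under those assumptions, 01010001 (both EHCI hosts) decodes to "OS has requested ownership, BIOS still holds it", and 00010401 (xHCI) has the BIOS semaphore set with the OS semaphore clear. Whether the xHCI value was captured before or after the OS bit was written depends on the kernel's pci-quirks.c and should be checked there.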
Comment 2 Bjorn Helgaas 2013-05-08 19:54:40 UTC
I assume this is a USB issue, so reassigning there.  Bounce it back if you disagree.
Comment 3 Aleksandr Mezin 2013-05-26 10:36:22 UTC
I downgraded the BIOS and don't experience this problem anymore. However, Windows works without problems even with the latest BIOS. Should this bug be closed?
Comment 4 Bjorn Helgaas 2013-06-05 17:37:35 UTC
It sounds like Linux works correctly with the downgraded BIOS (unknown version), but not with the original BIOS (F.1B 10/05/2011).  Windows works correctly with both.

Changing BIOS versions is not an acceptable bug resolution, so I think this is still a valid bug report and should not be closed.

I don't know how much the USB folks use bugzilla, but I added a couple experts to the CC: list.
Comment 5 Alan Stern 2013-06-05 17:57:16 UTC
Created attachment 103581
Increase time delay for BIOS EHCI handoff

Maybe increasing the delay time for the EHCI handoff will help.  The fact that Windows takes a long time to boot points in this direction, at least.

The attached patch will increase the delay to 10 seconds (10000 ms).  You can try even larger values if you want.
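
The idea behind a longer delay can be shown with a small userspace model of the wait loop: poll the legacy support register at a fixed interval until the BIOS-owned bit clears or the time budget runs out. This is a sketch, not the kernel code. `fake_bios`, `fake_read()` and `wait_for_handoff()` are hypothetical names, the register is simulated, no real sleeping is done, and the 1000 ms stock budget used in the usage note is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define LEGSUP_BIOS_OWNED (UINT32_C(1) << 16) /* assumed BIOS semaphore bit */

/* Hypothetical stand-in for the firmware: it drops the BIOS semaphore on the
 * Nth register read, or never if polls_until_release is not positive. */
struct fake_bios {
	uint32_t reg;
	int polls_until_release;
};

static uint32_t fake_read(struct fake_bios *b)
{
	if (b->polls_until_release > 0 && --b->polls_until_release == 0)
		b->reg &= ~LEGSUP_BIOS_OWNED;
	return b->reg;
}

/* Poll every step_ms until the BIOS bit clears or budget_ms has elapsed.
 * Returns the simulated elapsed time in ms, or -1 on timeout. */
static int wait_for_handoff(struct fake_bios *b, int budget_ms, int step_ms)
{
	int elapsed = 0;
	uint32_t cap;

	assert(step_ms > 0);
	cap = fake_read(b);
	while ((cap & LEGSUP_BIOS_OWNED) && elapsed < budget_ms) {
		elapsed += step_ms;	/* the kernel would sleep here */
		cap = fake_read(b);
	}
	return (cap & LEGSUP_BIOS_OWNED) ? -1 : elapsed;
}
```

In this model, a firmware that releases the semaphore on its 150th read times out with a 1000 ms budget and 10 ms steps, but succeeds with a 10000 ms budget. A firmware that never releases it fails with any budget, which is the case a longer delay cannot fix.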
Comment 6 Sarah Sharp 2013-06-05 18:12:21 UTC
Does your BIOS have an option to "disable" EHCI and xHCI?  Some BIOSes will have a USB mode that is something like "Enabled", "Smart auto", or "Disabled".  For xHCI, that should be set to "Smart Auto" or "Enabled".

If the EHCI to xHCI port switchover is working properly, all of your USB devices should show up under xHCI instead of EHCI.  What does `sudo lsusb -t` show in the different failure cases?  E.g. when the xHCI BIOS handoff fails and when it succeeds, but the EHCI BIOS handoff fails?

Some BIOSes will not expect EHCI to be used if the ports are switched over to xHCI.  That may be why the EHCI handoff fails.  Increasing the handoff delay is unlikely to help in that case.
Comment 7 Sarah Sharp 2013-06-05 18:13:12 UTC
I'm not sure why power off would cause the system to hang though.  Perhaps the system needs the XHCI_SPURIOUS_REBOOT quirk, so that ports are switched back to EHCI on shutdown?

commit e95829f474f0db3a4d940cae1423783edd966027
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Mon Jul 23 18:59:30 2012 +0300

    xhci: Switch PPT ports to EHCI on shutdown.
    
    The Intel desktop boards DH77EB and DH77DF have a hardware issue that
    can be worked around by BIOS.  If the USB ports are switched to xHCI on
    shutdown, the xHCI host will send a spurious interrupt, which will wake
    the system.  Some BIOS will work around this, but not all.
    
    The bug can be avoided if the USB ports are switched back to EHCI on
    shutdown.  The Intel Windows driver switches the ports back to EHCI, so
    change the Linux xHCI driver to do the same.
    
    Unfortunately, we can't tell the two effected boards apart from other
    working motherboards, because the vendors will change the DMI strings
    for the DH77EB and DH77DF boards to their own custom names.  One example
    is Compulab's mini-desktop, the Intense-PC.  Instead, key off the
    Panther Point xHCI host PCI vendor and device ID, and switch the ports
    over for all PPT xHCI hosts.
    
    The only impact this will have on non-effected boards is to add a couple
    hundred milliseconds delay on boot when the BIOS has to switch the ports
    over from EHCI to xHCI.
    
    This patch should be backported to kernels as old as 3.0, that contain
    the commit 69e848c2090aebba5698a1620604c7dccb448684 "Intel xhci: Support
    EHCI/xHCI port switching."
    
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Reported-by: Denis Turischev <denis@compulab.co.il>
    Tested-by: Denis Turischev <denis@compulab.co.il>
    Cc: stable@vger.kernel.org

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index df0828c..c5e9e4a 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -800,6 +800,13 @@ void usb_enable_xhci_ports(struct pci_dev *xhci_pdev)
 }
 EXPORT_SYMBOL_GPL(usb_enable_xhci_ports);
 
+void usb_disable_xhci_ports(struct pci_dev *xhci_pdev)
+{
+       pci_write_config_dword(xhci_pdev, USB_INTEL_USB3_PSSEN, 0x0);
+       pci_write_config_dword(xhci_pdev, USB_INTEL_XUSB2PR, 0x0);
+}
+EXPORT_SYMBOL_GPL(usb_disable_xhci_ports);
+
 /**
  * PCI Quirks for xHCI.
  *
diff --git a/drivers/usb/host/pci-quirks.h b/drivers/usb/host/pci-quirks.h
index b1002a8..ef004a5 100644
--- a/drivers/usb/host/pci-quirks.h
+++ b/drivers/usb/host/pci-quirks.h
@@ -10,6 +10,7 @@ void usb_amd_quirk_pll_disable(void);
 void usb_amd_quirk_pll_enable(void);
 bool usb_is_intel_switchable_xhci(struct pci_dev *pdev);
 void usb_enable_xhci_ports(struct pci_dev *xhci_pdev);
+void usb_disable_xhci_ports(struct pci_dev *xhci_pdev);
 #else
 static inline void usb_amd_quirk_pll_disable(void) {}
 static inline void usb_amd_quirk_pll_enable(void) {}
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 92eaff6..9bfd4ca11 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -94,6 +94,15 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
                xhci->quirks |= XHCI_EP_LIMIT_QUIRK;
                xhci->limit_active_eps = 64;
                xhci->quirks |= XHCI_SW_BW_CHECKING;
+               /*
+                * PPT desktop boards DH77EB and DH77DF will power back on after
+                * a few seconds of being shutdown.  The fix for this is to
+                * switch the ports from xHCI to EHCI on shutdown.  We can't use
+                * DMI information to find those particular boards (since each
+                * vendor will change the board name), so we have to key off all
+                * PPT chipsets.
+                */
+               xhci->quirks |= XHCI_SPURIOUS_REBOOT;
        }
        if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
                        pdev->device == PCI_DEVICE_ID_ASROCK_P67) {
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 5c3a3f7..c59d5b5 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -659,6 +659,9 @@ void xhci_shutdown(struct usb_hcd *hcd)
 {
        struct xhci_hcd *xhci = hcd_to_xhci(hcd);
 
+       if (xhci->quirks && XHCI_SPURIOUS_REBOOT)
+               usb_disable_xhci_ports(to_pci_dev(hcd->self.controller));
+
        spin_lock_irq(&xhci->lock);
        xhci_halt(xhci);
        spin_unlock_irq(&xhci->lock);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 96f49db..c713256 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1494,6 +1494,7 @@ struct xhci_hcd {
 #define XHCI_TRUST_TX_LENGTH   (1 << 10)
 #define XHCI_LPM_SUPPORT       (1 << 11)
 #define XHCI_INTEL_HOST                (1 << 12)
+#define XHCI_SPURIOUS_REBOOT   (1 << 13)
        unsigned int            num_active_eps;
        unsigned int            limit_active_eps;
        /* There are two roothubs to keep track of bus suspend info for */
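
One detail in the xhci_shutdown() hunk as quoted above: the quirk test is written with the logical operator `&&`, not the bitwise `&`. In C, `quirks && FLAG` is true whenever `quirks` is non-zero and `FLAG` is non-zero, regardless of whether that particular bit is set. The sketch below shows the difference in userspace. The two flag values are copied from the xhci.h hunk; the helper names are made up for the illustration, and whether the logical form was intended in the commit is not something this report settles.

```c
#include <assert.h>
#include <stdbool.h>

#define XHCI_INTEL_HOST      (1u << 12)
#define XHCI_SPURIOUS_REBOOT (1u << 13)

/* Logical AND: true if both operands are non-zero, whatever bits they hold. */
static bool quirk_set_logical(unsigned int quirks, unsigned int flag)
{
	return quirks && flag;
}

/* Bitwise AND: true only if the given single-bit flag is set in quirks. */
static bool quirk_set_bitwise(unsigned int quirks, unsigned int flag)
{
	assert(flag != 0 && (flag & (flag - 1)) == 0);	/* exactly one bit */
	return (quirks & flag) != 0;
}
```

With `quirks` equal to `XHCI_INTEL_HOST` only, the logical form reports XHCI_SPURIOUS_REBOOT as set while the bitwise form does not. The two forms agree when `quirks` is zero or when the flag bit really is set.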
Comment 8 Alan Stern 2013-06-05 18:55:44 UTC
You mentioned that earlier kernels (3.2?) worked okay.  Did they display the various "BIOS handoff failed" messages?  The fact that those messages don't always appear indicates that the problem may be related to timing; that's one of the reasons why I suggested increasing the EHCI handoff delay time.

Is it possible to set up the BIOS to disable xHCI while leaving EHCI enabled?  If you can do that, does the system then power-off correctly?

Since EHCI is generally simpler than xHCI, this ought to be an easier scenario to debug.  Once it is working, maybe the same solution will work with xHCI.
Comment 9 Aleksandr Mezin 2013-06-07 05:01:00 UTC
The problem returned, even though I didn't upgrade the BIOS.
Increasing the timeout doesn't help.
Comment 10 Aleksandr Mezin 2013-06-07 05:27:19 UTC
(In reply to comment #7)
> I'm not sure why power off would cause the system to hang though.  Perhaps the
> system needs the XHCI_SPURIOUS_REBOOT quirk, so that ports are switched back to
> EHCI on shutdown?
>
I'm using kernel 3.9.4, so this patch is already applied.

I tried replacing "pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI" with "(pdev->device == PCI_DEVICE_ID_INTEL_PANTHERPOINT_XHCI || pdev->device == 0x1c26)"
(0x1c26 is my device id). It doesn't help.

And I can't disable XHCI in BIOS.
Comment 11 Aleksandr Mezin 2013-06-07 06:06:50 UTC
Sorry, this was the ID for EHCI. However, adding the quirk in the right place (NEC vendor and device 0x0194) still doesn't help.
Comment 12 Aleksandr Mezin 2013-06-13 04:11:42 UTC
It seems that this patch fixes the problem. I tried it just because "why not?". After applying it, I haven't seen "handoff failed" for two days. I need to test it for longer to be sure.

--- pci-quirks.c.orig   2013-06-13 11:07:39.879228805 +0700
+++ pci-quirks.c        2013-06-12 19:27:15.325444795 +0700
@@ -571,6 +571,8 @@
                        try_handoff = 0;
        }
 
+       pci_enable_device(pdev);
+
        if (try_handoff && (cap & EHCI_USBLEGSUP_BIOS)) {
                dev_dbg(&pdev->dev, "EHCI: BIOS handoff\n");
Comment 13 Alan Stern 2013-06-13 17:45:07 UTC
That's what the mmio_resource_enabled() call at the start of quirk_usb_disable_ehci() is supposed to check for.  In addition, there's already a call to pci_enable_device() inside quirk_usb_early_handoff() at the end of the source file.  So I don't see how this can make any difference.
Comment 14 Aleksandr Mezin 2013-06-17 07:29:00 UTC
I haven't seen these errors since I added this line (and the laptop turns off without problems), and when I removed it I got the errors again.

I have a buggy BIOS, and this somehow kicks it. This line affects only EHCI devices, but without it the handoff fails on xHCI too.
Comment 15 Alan Stern 2013-06-17 15:14:42 UTC
Can you try moving the new pci_enable_device() call earlier, and see if it still works?

For example, does it work if you move it to the start of ehci_bios_handoff()?

What about quirk_usb_disable_ehci() just before the "while" loop?  Or just after the mmio_resource_enabled() check?  Or just before the check?

Or even in quirk_usb_early_handoff() just before the call to quirk_usb_disable_ehci()?  Or even just before the multi-branching "if" statement?

There's not much point in moving it earlier than that, because that's where the original pci_enable_device() call is.
Comment 16 Aleksandr Mezin 2013-06-20 03:48:20 UTC
It seems that the earliest place where it works is:

--- drivers/usb/host/pci-quirks.c.orig  2013-06-20 10:44:12.348533318 +0700
+++ drivers/usb/host/pci-quirks.c       2013-06-19 21:41:28.717175243 +0700
@@ -651,6 +651,7 @@
        offset = (hcc_params >> 8) & 0xff;
        while (offset && --count) {
                pci_read_config_dword(pdev, offset, &cap);
+               pci_enable_device(pdev);
 
                switch (cap & 0xff) {
                case 1:

If I move pci_enable_device() before pci_read_config_dword() or move it outside the loop, I get "handoff failed" on the next boot.
Comment 17 Alan Stern 2013-06-20 17:02:25 UTC
That's really weird.

Okay, given your results, I suggest you call pci_enable_device() immediately before the call to ehci_bios_handoff() and then call pci_disable_device() immediately after (each enable is supposed to be balanced by a disable).  If that works, you can send in a patch or I can submit one for you.  Be sure to add a comment explaining why pci_enable_device() is needed; otherwise in the future somebody will think it is redundant and will remove it.
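
The balanced enable/disable pattern suggested here can be sketched with a small userspace model. Everything in it is hypothetical: `fake_pci_dev` and the `fake_*` functions only imitate a reference-counted enable, on the assumption that the real pci_enable_device() and pci_disable_device() keep a per-device enable count in a similar way.

```c
#include <assert.h>

/* Hypothetical model of a PCI device with a reference-counted enable. */
struct fake_pci_dev {
	int enable_cnt;		/* number of outstanding enables */
	int hw_enabled;		/* 1 while at least one enable is held */
};

static int handoff_calls;	/* counts simulated handoffs */

static int fake_enable(struct fake_pci_dev *d)
{
	if (d->enable_cnt++ == 0)
		d->hw_enabled = 1;	/* first user turns the device on */
	return 0;
}

static void fake_disable(struct fake_pci_dev *d)
{
	assert(d->enable_cnt > 0);	/* every disable needs a prior enable */
	if (--d->enable_cnt == 0)
		d->hw_enabled = 0;	/* last user turns the device off */
}

static void fake_bios_handoff(struct fake_pci_dev *d)
{
	assert(d->hw_enabled);		/* handoff expects an enabled device */
	handoff_calls++;
}

/*
 * Extra enable around the handoff. It looks redundant when the caller has
 * already enabled the device, so the reason for it belongs in a comment.
 */
static void handoff_with_enable(struct fake_pci_dev *d)
{
	if (fake_enable(d) != 0)
		return;
	fake_bios_handoff(d);
	fake_disable(d);		/* balance the enable above */
}
```

Because each enable is paired with a disable, the count and the device state after handoff_with_enable() are the same as before the call, whether or not the caller had already enabled the device.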
Comment 18 Aleksandr Mezin 2013-06-21 13:59:57 UTC
pci_enable_device() doesn't help anymore. It was just a coincidence (that lasted for a week). Disabling and then re-enabling the device, or resetting it, doesn't help either.
Comment 19 Aleksandr Mezin 2013-06-21 14:01:49 UTC
Created attachment 105641
Dmesg with additional debug output

I've enabled debug output for usb, pci and pci-related things in ACPI.
Comment 20 Aleksandr Mezin 2013-06-21 14:03:25 UTC
Created attachment 105651
Dmesg when handoff didn't fail
Comment 21 Alan Stern 2013-06-21 15:57:47 UTC
Since the problem doesn't really show up until you try to suspend or shut down the computer, that's where the interesting events occur.  However, because they occur after the OS has given control back to the BIOS, they won't generate any messages in the log.

There's one other thing you can try.  A section of code in ehci_bios_handoff() is disabled with "#if 0".  If you enable it again, does it help?  (You'll have to declare "val" as a u32.)

Is it still true that the problem doesn't occur with the 3.2 kernel?  If it is, maybe you can find the earliest kernel version where the problem does occur.
Comment 22 Aleksandr Mezin 2013-06-21 16:31:38 UTC
I tried removing the "#if 0". It doesn't help either.

Maybe the non-working suspend/shutdown is only one of the symptoms, and the interesting things happen early during boot, when the BIOS goes into a bad state? Handoff either fails for all hosts or succeeds for all of them. Suspend/shutdown fails if and only if handoff failed. Also, it seems that udev settle takes much longer when handoff has failed.

I'll test kernel 3.2 tomorrow.
Comment 23 Aleksandr Mezin 2014-03-09 12:56:12 UTC
Oops, I forgot about this bug, but received an email notification about it today.
I can't say for sure whether this bug is present in a specific kernel version, as it happens only from time to time. But it looks like it is present in all kernels I have used.
