Bug 42728

Summary:	ultrabook not able to turn on after failed suspend, EHCI workaround - Asus Zenbook UX31E
Product:	ACPI	Reporter:	Oleksij Rempel (fishor) (bug-track)
Component:	Power-Sleep-Wake	Assignee:	Rafael J. Wysocki (rjw)
Status:	CLOSED CODE_FIX
Severity:	normal	CC:	alan, dev.johnan, ernstp, florian, fougner89, grawity, james, jbarnes, jrnieder, lenb, michiel.beijen, ncoghlan, petter, rap, rjw, sjharms, srivatsa, stern, timo.jyrinki, wengxt
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:		Subsystem:
Regression:	No	Bisected commit-id:
Attachments:	dmesg after failed suspend and broken power-on
	normal dmesg
	normal lspci
	normal intel_reg_dump
	broken lspci
	broken intel_reg_dump
	oops trace
	debug port
	dmesg after processors > pm_test
	Keep EHCI controllers in D0 during suspend on ASUS
	lspci -vvvv from Comment #48 notebook
	dmesg from Comment #48 notebook, across suspend/resume cycle with workaround script
	ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI
	ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI
	ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI
	ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI

Description Oleksij Rempel (fishor) 2012-02-04 18:03:57 UTC

Hardware:
Asus Zenbook UX31E, CPU i7-2677M, QS67,  4G RAM

kernel version:
I use current git 3.3.0-rc2-00110-gd125666, but I was also able to reproduce it with 3.2.0.
 
Steps to reproduce:
- Suspend it, either through the GUI and distribution wrappers or directly with: "echo mem > /sys/power/state"

- It freezes with a black screen and a non-blinking cursor at the top left. I am also unable to reach it over ssh.

- Force power off: hold the power button until it turns off.

- Try to turn it on again. This is where most of the problems appear. Neither Linux nor Windows can start: Windows gets a BSOD, Linux gets random kernel panics. memtest shows many corrupted memory blocks. Even the hard drive controller seems to be corrupted (there is an SSD inside).

The battery is built in, which is why I can't remove power completely.

Sometimes I can get Linux to start, but I get some graphics artifacts. Only after I get Windows to start does everything go back to normal: memtest shows no more corrupted memory blocks, and Linux works just fine.

Please do not ask me to bisect. It is too hard to get the machine working again. If I get some magic kernel parameter or patch that helps me restore a working configuration, I can start testing.

Comment 1 Arne Woerner 2012-02-04 19:58:01 UTC

these
1. https://bugs.freedesktop.org/show_bug.cgi?id=40241
2. https://bugzilla.kernel.org/show_bug.cgi?id=42691
seem to be related... -arne

Comment 2 Oleksij Rempel (fishor) 2012-02-05 13:00:29 UTC

Hmm...
I just found this workaround for my model, and with it I got suspend without crashes:

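#!/bin/sh
# Unbind the EHCI controllers before sleeping and rebind them on wake-up.
# $1 is the sleep action: hibernate, suspend, resume or thaw.
EHCI_BUSES="0000:00:1a.0 0000:00:1d.0"
case "${1}" in
    hibernate|suspend)
        # Switch USB buses off
        for bus in $EHCI_BUSES; do
            printf '%s' "$bus" > /sys/bus/pci/drivers/ehci_hcd/unbind
        done
        ;;
    resume|thaw)
        # Switch USB buses back on
        for bus in $EHCI_BUSES; do
            printf '%s' "$bus" > /sys/bus/pci/drivers/ehci_hcd/bind
        done
        ;;
esac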

What I still really care about is the memory corruption after a failed suspend. Why is it still present after a power off/on cycle? The only way to fix it is to start Windows and suspend it (just starting it does not help), or to open the laptop's case and remove the battery.
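
The two addresses hard-coded in EHCI_BUSES are specific to this model. A minimal sketch of how the list could be read from sysfs instead, assuming the driver directory is /sys/bus/pci/drivers/ehci_hcd as in the script above. The function name list_ehci_devices and its optional directory argument are made up for illustration:

```shell
#!/bin/sh
# Print the PCI addresses of all devices currently bound to a PCI driver.
# Bound devices show up in the driver directory as entries named after
# their address (domain:bus:device.function); the other entries there
# (bind, unbind, ...) do not match the pattern and are skipped.
# list_ehci_devices is a hypothetical helper, not part of any existing tool.
list_ehci_devices() {
    driver_dir="${1:-/sys/bus/pci/drivers/ehci_hcd}"
    for entry in "$driver_dir"/*; do
        name="${entry##*/}"
        case "$name" in
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7])
                echo "$name"
                ;;
        esac
    done
}
```

With this, the assignment in the workaround could become EHCI_BUSES="$(list_ehci_devices)". The list has to be captured before unbinding, since unbound controllers no longer appear in the driver directory.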

Comment 3 Oleksij Rempel (fishor) 2012-02-05 13:01:39 UTC

Created attachment 72290 [details]
dmesg after failed suspend and broken power-on

Comment 4 Oleksij Rempel (fishor) 2012-02-05 13:01:57 UTC

Created attachment 72291 [details]
normal dmesg

Comment 5 Oleksij Rempel (fishor) 2012-02-05 13:02:32 UTC

Created attachment 72292 [details]
normal lspci

Comment 6 Oleksij Rempel (fishor) 2012-02-05 13:02:59 UTC

Created attachment 72293 [details]
normal intel_reg_dump

Comment 7 Oleksij Rempel (fishor) 2012-02-05 13:03:19 UTC

Created attachment 72294 [details]
broken lspci

Comment 8 Oleksij Rempel (fishor) 2012-02-05 13:03:40 UTC

Created attachment 72295 [details]
broken intel_reg_dump

Comment 9 Len Brown 2012-02-07 04:04:56 UTC

did this work with a previous version of Linux?
If so, which version?

Unclear what you mean by "suspend without crashes"
when using the EHCI workaround.  Exactly what does
the suspend/resume failure look like when the EHCI
workaround is used?

Comment 10 Oleksij Rempel (fishor) 2012-02-07 07:20:12 UTC

It is not a regression. Older kernels, for example 3.0, need the EHCI + xHCI workaround. So the current kernel is in better shape! :)

I mean that with the workaround the computer is able to suspend and resume without noticeable problems. I will do some more memory checks, but it looks promising now. I have noticed no spontaneous crashes or segfaults.

You probably mean: what does the failure look like if the EHCI workaround is _not_ used?
In the first place it can't suspend; the laptop freezes at some stage with a black screen and a non-blinking cursor at the top left.
I also tried to access this laptop over ssh, without luck. But there is no built-in Ethernet, so I use a USB LAN dongle, and there is no guarantee that some part isn't still alive rather than completely frozen.
After a failed suspend I usually need to power off the laptop by holding the power button. After this the laptop is almost _unusable_: the system crashes... Memtest86+ reports many corrupted memory blocks, starting at ~1GB.

Reboots or longer power-offs do not help. There are only two ways to "repair" this laptop afterwards:
- somehow start Windows (avoiding the BSODs) and suspend-resume it,
- or open the laptop's case and detach the built-in battery. I mean _not_ the BIOS battery.

Comment 11 Alan Stern 2012-02-07 16:18:38 UTC

Have you tried booting with no_console_suspend on the command line and CONFIG_USB_DEBUG and CONFIG_DEBUG_DRIVER enabled?

It sounds like the suspend never finishes.  Instead the computer is left in some invalid intermediate state, with devices partially configured.  Then when the power is turned back on, the BIOS does not go through a full reset.

Comment 12 Oleksij Rempel (fishor) 2012-02-08 08:55:05 UTC

Created attachment 72316 [details]
oops trace

This trace was made with "nomodeset no_console_suspend vga=791 oops=panic". I also tried to suspend with the same settings + workaround, but I got a similar trace. The reason for this oops is probably nomodeset. Without "nomodeset" I get only a black screen, with no way to get a trace.

Comment 13 Alan Stern 2012-02-08 16:00:44 UTC

Jesse, can you look at this?  A suspend problem related to "nomodeset" is preventing us from debugging a different suspend problem!

Comment 14 Jesse Barnes 2012-02-08 17:08:57 UTC

Unfortunately, we don't really support nomodeset on recent hardware.  If you want to stay in VGA mode to debug suspend w/usb, it would be best to blacklist the i915 driver altogether...

Comment 15 Oleksij Rempel (fishor) 2012-02-09 10:37:00 UTC

Status update:
I blacklisted agp* and i915 to use plain VGA. From dmesg I can confirm that agp and i915 are not used.
With this configuration I was still not able to get an oops trace. I get a black screen with a cursor.
Workaround + blacklist is able to suspend.

I also tried removing uvcvideo and btusb instead of using the workaround, but this did not fix the issue.

If nomodeset + suspend is no longer supported on new devices, then modeset + suspend + no_console_suspend should be fixed. Jesse?

Comment 16 Oleksij Rempel (fishor) 2012-02-10 08:53:12 UTC

Created attachment 72351 [details]
debug port

There is some kind of debug port on this device. Maybe it is just a normal serial port, but I do not know where to find this kind of connector or what words I should google.

Comment 17 Len Brown 2012-03-06 02:58:52 UTC

Hmm, i wonder how the EHCI workaround could be related
to the i915 oops, anybody?

Comment 18 Alan Stern 2012-03-06 15:57:08 UTC

I doubt they are related at all.  But then, I have no idea what the underlying problem is, so what do I know?

The best approach is probably to boot without nomodeset and to use i915.  Also, for testing it will help to do "echo 0 >/sys/power/pm_async" first.  If you enable CONFIG_DEBUG_DRIVER and switch to a VT and set the console log level to 8 before suspending, you'll get a lot of debug info on the screen during the suspend.

What happens if you do "echo devices >/sys/power/pm_test" before suspending?  Or any of the other possible pm_test options (freezer, platform, processors, or core)?

Finally, if necessary you can slow down the suspend procedure by adding something like "msleep(100);" to drivers/base/power/main.c:dpm_run_callback().

Comment 19 Oleksij Rempel (fishor) 2012-03-07 06:21:21 UTC

I will do the test this weekend. If you have some more test suggestions, keep writing.

Comment 20 Johan Nenzén 2012-03-08 10:01:54 UTC

hi
I have the memory corruption problem: BSOD in Windows, random kernel panics in Ubuntu and corrupt memory reported by Memtest86. I tried the "repair" advice but none of it worked: BSOD when booting Windows, and I pulled the battery but the memory is still corrupt. Any advice before I try to send it in? Something you want me to test?

Comment 21 Oleksij Rempel (fishor) 2012-03-08 13:53:37 UTC

Just curious. This device has "Intel Anti-Theft", which should make the computer useless if some special configuration is set. Maybe we hit some configuration bit at suspend time?

Comment 22 Oleksij Rempel (fishor) 2012-03-08 15:03:27 UTC

I tested the different pm_test levels with mem > state. All tests were OK: it suspended the devices, ... and then came back. No freezes. But then I made a discovery. After power off/on, memory was corrupted...! After some more testing I found that devices > pm_test is OK, with no memory corruption. After processors > pm_test, there is memory corruption.

Steps to reproduce:
- echo 0 > pm_async
- echo processors > pm_test
- echo mem > state
(here the display goes off, the USB mouse is powered off too, and after one second it returns to the console .... no freezes)
- poweroff
After power-on, start memtest .... here I get lots of bad memory blocks.
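
The steps above can be written as a small script. This is only a sketch: run_pm_test and its second argument (the directory holding the pm_async, pm_test and state files, /sys/power by default) are names chosen here, and it has to run as root on a kernel that exposes pm_test. Unlike the manual steps, it also sets pm_test back to "none" afterwards so that later suspends are real ones.

```shell
#!/bin/sh
# Run one suspend test cycle at the given pm_test level.
# run_pm_test is a hypothetical helper name used for illustration.
run_pm_test() {
    level="$1"                    # freezer, devices, platform, processors or core
    power_dir="${2:-/sys/power}"  # assumption: overridable so it can be tried on a scratch directory

    # Disable asynchronous suspend/resume of devices.
    echo 0 > "$power_dir/pm_async" || return 1
    # Select how far the suspend sequence goes before it is rolled back.
    echo "$level" > "$power_dir/pm_test" || return 1
    # Start the test cycle; the write returns once the system is back.
    echo mem > "$power_dir/state"
    status=$?
    # Restore normal suspend behaviour.
    echo none > "$power_dir/pm_test"
    return "$status"
}
```

After "run_pm_test processors", the remaining steps are unchanged: power off, power on and start memtest.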

Comment 23 Oleksij Rempel (fishor) 2012-03-08 15:08:17 UTC

Created attachment 72555 [details]
dmesg after processors > pm_test

This dmesg was grabbed after processors > pm_test; mem > state.

Comment 24 Alan Stern 2012-03-08 15:24:06 UTC

If memtest shows bad memory blocks then there's probably something wrong with the firmware or the hardware on the computer's motherboard.  Have you tried updating the BIOS?

Comment 25 Oleksij Rempel (fishor) 2012-03-08 15:44:21 UTC

The BIOS is up to date. And:
- memtest shows bad memory blocks _only_ after a bad suspend.
- there are no bad memory blocks after a complete power-off (battery and AC both disconnected)
- after bad suspend + freeze + power-off, the laptop remains in some kind of suspend mode. For example, after power-off (with AC still connected) the power indicator may keep blinking (which means suspend mode). This could be the reason for the bad memory blocks.
- suspend works perfectly with Windows.

This processor has a built-in memory controller... maybe this is the reason?

Comment 26 Alan Stern 2012-03-08 16:25:04 UTC

Before you said that memtest shows bad memory blocks after a _good_ pm_test=processors suspend test followed by power-off, power-on.  No freeze involved.

Of course the memory controller _could_ be the reason.  I have no way to tell.  Maybe you should try posting a message on linux-pm@vger.kernel.org and perhaps also linux-kernel@vger.kernel.org, to see if anybody else has a good idea.  Include the information in comment #22.

Comment 27 Oleksij Rempel (fishor) 2012-03-08 16:33:30 UTC

This bug is starting to get too confusing. pm_test=devices is good. pm_test=processors is bad.

Comment 28 Timo Jyrinki 2012-03-22 11:07:17 UTC

For me, I didn't get memory corruption even though I did a suspend or two before adding the workaround from comment #2, so the memory corruption might not be completely related to the resume failure. The resuming just failed.

With the workaround everything seems fine, but would be of course nicer to have it work out-of-the-box.

BIOS version 210.

Comment 29 Oleksij Rempel (fishor) 2012-03-25 07:26:48 UTC

Hi Timo,
what CPU is in your laptop? RAM? Mine is an i7-2677M with 4GB RAM.
How did you check for memory corruption?
In my comment #22 I said that I can reproduce the corruption even after
pm_test=processors. It seems the USB workaround works because the USB memory mapping on my laptop is in the affected memory range.

Timo, can you please run the following test:
- cd /sys/power/
- echo 0 > pm_async
- echo processors > pm_test
- echo mem > state
- poweroff
- after poweron, start memtest86+

Comment 30 Timo Jyrinki 2012-03-27 08:05:50 UTC

I have an i5-2557M model with 4GB RAM. I let memtest86+ run for one full pass and then some, without errors. I haven't experienced hangups either.

However, I'm using the laptop for work, so I wouldn't want to experiment with the memory corruption right now. Instructions for removing the battery could be useful anyway: there appear to be 8 screws at the bottom. Confirming that only those need to be unscrewed, and that the battery can then be safely removed and reattached, could help encourage me or someone else to experiment with this. Obviously testing will be needed at some point, whether or not someone comes up with a potential patch.

Comment 31 Ernst Persson 2012-04-13 06:21:02 UTC

I got the memory corruption also. I left memtest running until the battery was completely drained. Then I plugged it in and started Windows and did a suspend/resume as fast as I could, and now I think it's back to normal.
Just letting the battery drain and then starting Linux wasn't enough, the corruption came back after a few minutes (??).

Comment 32 Alan Stern 2012-04-23 16:52:48 UTC

Everybody suffering from this problem: Please add a comment containing the output from "lspci -nv -s 1d.0".  No need to attach it, the output will be small enough to paste it in directly.

Are the affected machines all ASUS, like Oleksij's?

Comment 33 Timo Jyrinki 2012-04-23 18:21:49 UTC

Alan:
$ lspci -nv -s 1d.0
00:1d.0 0c03: 8086:1c26 (rev 05) (prog-if 20 [EHCI])
	Subsystem: 1043:1427
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at dfe07000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci_hcd

AFAIK this is a Asus Zenbook UX21/UX31 specific problem.

Comment 34 Oleksij Rempel (fishor) 2012-04-23 18:24:16 UTC

00:1d.0 0c03: 8086:1c26 (rev 05) (prog-if 20 [EHCI])
	Subsystem: 1043:1427
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at dfe07000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

Comment 35 Ernst Persson 2012-04-23 19:53:46 UTC

sudo lspci -nv -s 1d.0
00:1d.0 0c03: 8086:1c26 (rev 05) (prog-if 20 [EHCI])
	Subsystem: 1043:1427
	Flags: bus master, medium devsel, latency 0, IRQ 23
	Memory at dfe07000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Capabilities: [98] PCI Advanced Features
	Kernel driver in use: ehci_hcd

Comment 36 Alan Stern 2012-04-23 20:16:48 UTC

Created attachment 73049 [details]
Keep EHCI controllers in D0 during suspend on ASUS

The numbers are all the same, so this patch ought to work for all of you.  It prevents the EHCI controllers on ASUS computers using Intel's 6 Series/C200 Series chipset from being put in low power during system sleep.  Evidently that is causing some people's systems to crash, and quite likely it's causing the memory corruption on yours.

Anyway, please try the patch without using the script mentioned in comment #2.  It was written against 3.4-rc4, so it may need slight adjustments to apply to your kernel sources.
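
The "numbers" being compared in the lspci output are the vendor:device ID on the device line and the subsystem ID on the line below it (8086 is Intel's PCI vendor ID, 1043 is ASUSTeK's). A small sketch for pulling just those two values out of "lspci -nv" output so they can be compared across machines; pci_ids is a helper name invented here, and it assumes the output layout shown in the comments above:

```shell
#!/bin/sh
# Read `lspci -nv` output on stdin and print "<vendor:device> <subsystem>"
# for every device that has a Subsystem line.
# pci_ids is a hypothetical helper name used for illustration.
pci_ids() {
    awk '
        # Device line, e.g. "00:1d.0 0c03: 8086:1c26 (rev 05) (prog-if 20 [EHCI])":
        # the third field is the vendor:device ID.
        /^[0-9a-f]/ { id = $3 }
        # Indented line that follows it, e.g. "Subsystem: 1043:1427".
        $1 == "Subsystem:" { print id, $2 }
    '
}
```

For example, "lspci -nv -s 1d.0 | pci_ids" on the machines reported above would print "8086:1c26 1043:1427".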

Comment 37 Oleksij Rempel (fishor) 2012-04-24 07:09:28 UTC

Cool! It works, just with echo mem > /sys/power/state.
Memtest does not show any errors.

Just to confirm your suggestion, I checked what happens with Windows 7. EHCI uses the MS driver _and_ AiCharger.sys. If the laptop is suspended in Windows, EHCI continues to provide power. With this patch Linux does the same.

Comment 38 Oleksij Rempel (fishor) 2012-04-25 16:58:59 UTC

Now that we have found what exactly the bug is, I will try to reproduce it on Windows. I will probably need to remove some of the Asus software. If that is possible, it would be good to put some pressure on Asus. They support only Windows, so if it is broken there, they will need to fix it.
Stay tuned!

Comment 39 Oleksij Rempel (fishor) 2012-04-26 16:30:12 UTC

I did some Windows testing with AiCharger removed. Here is some more info about this software:
http://event.asus.com/mb/2010/ai_charger/

Surprisingly, it didn't make any difference. After suspend, the USB ports still provide power, so I assume they are in the D0 state. Maybe the firmware provides correct values?

I dug a bit in the DSDT and found this part:
  Device (EHC1)
  {
      Name (_ADR, 0x001D0000)
      OperationRegion (PWKE, PCI_Config, 0x62, 0x04)
....
      Method (_S3D, 0, NotSerialized)
      {
          Return (0x02)
      }

      Method (_S4D, 0, NotSerialized)
      {
          Return (0x02)
      }


If I understand correctly, EHC1 is "00:1d.0", and _S3D is one of the "Methods that return the lowest D-state values". That means it returns D2, not D3. Is that correct? Does the kernel use this method to get the D-state for EHCI?
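
For anyone who wants to check the same methods on another machine, here is a rough sketch that lists the _SxD methods in a decompiled DSDT together with the constant each one returns. sxd_values is a name made up here. It only handles the simple form shown above (a method whose body is a single Return of a constant) and does not report which device a method belongs to:

```shell
#!/bin/sh
# Read decompiled ASL on stdin and print each _SxD method name with the
# value it returns, one pair per line.
# sxd_values is a hypothetical helper name used for illustration.
sxd_values() {
    awk '
        /Method \(_S[0-9]D,/ {
            name = $2                  # e.g. "(_S3D,"
            gsub(/[(,]/, "", name)     # strip the punctuation around the name
            next
        }
        name != "" && /Return \(/ {
            value = $2                 # e.g. "(0x02)"
            gsub(/[()]/, "", value)
            print name, value
            name = ""
        }
    '
}
```

One way to get the input, as root: copy /sys/firmware/acpi/tables/DSDT to dsdt.dat, disassemble it with "iasl -d dsdt.dat", then run "sxd_values < dsdt.dsl". For the fragment quoted above it prints "_S3D 0x02" and "_S4D 0x02".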

Comment 40 Oleksij Rempel (fishor) 2012-04-26 16:43:48 UTC

For the same device there is one more method:
  Method (_PSW, 1, NotSerialized)
  {
      If (Arg0)
      {
          Store (Ones, PWUC)
      }
      Else
      {
          Store (0x00, PWUC)
      }
  }
This method is deprecated as of ACPI 3.0, but it is responsible for waking the laptop from the S3 state on a USB event.
I mention it because the workaround description says:
"However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep.  Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs."

Comment 41 Alan Stern 2012-04-26 17:16:36 UTC

The USB ports continue to be powered even in D3, because many USB devices need bus power in order to maintain their state and to report wakeup events.

I don't know exactly how the settings are managed.  Maybe Linux's PCI core uses both the native PCI PM methods and the ACPI methods.

Comment 42 Oleksij Rempel (fishor) 2012-04-26 18:41:36 UTC

Hmm... I just made this patch to find the D-state used for suspend:
+       /*
+        * Some systems crash if an EHCI controller is in D3 during
+        * a sleep transition.  We have to leave such controllers in D0.
+        */
+       if (hcd->broken_pci_sleep) {
+               retval = pci_prepare_to_sleep(pci_dev);
+               printk("TTTTT, %i\n", retval);
+               dev_dbg(dev, "Staying in PCI D0\n");
+               return retval;
+       }

But the system froze, the same way as before, with memory corruption. It looks like pci_prepare_to_sleep() is the cause of this issue.

Comment 43 Alan Stern 2012-04-26 19:54:37 UTC

No, the cause is the fact that the controller is in D3 when the system suspends.  However you are correct that pci_prepare_to_sleep() calls pci_set_power_state(), which calls pci_raw_set_power_state(), which puts the controller into D3.

Comment 44 Oleksij Rempel (fishor) 2012-04-27 09:37:03 UTC

All this power management stuff is new to me, so please be patient with me.
I did some more digging.
pci_prepare_to_sleep() calls pci_target_state() to get the supported state.
pci_target_state() checks platform_pci_power_manageable(), which in turn checks whether struct pci_platform_pm exists. Since it does not exist, it tries to calculate the supported state by itself.
The problem is that acpi_pci_init() does call pci_set_platform_pm(). ACPI _is_ used for PM, but not for EHCI. Why?

Comment 45 Oleksij Rempel (fishor) 2012-04-27 10:29:12 UTC

Hmm... platform_pci_power_manageable() returns pci_platform_pm->is_manageable(dev) = false for all devices.
Is it possible that pci_platform_pm is supported and loaded, but none of the devices is_manageable?

Comment 46 Steven Harms 2012-04-28 23:52:35 UTC

I am able to reproduce this on an Acer 4830TG-6808 also.  Using the work around from comment #2 resolves this.

Comment 47 Alan Stern 2012-04-29 02:05:52 UTC

Steven, does the patch attached to this email message:

  http://marc.info/?l=linux-pm&m=133563455220196&w=2

fix your problem?  If not, can you provide the output from "lspci -nv -s 1d.0"?

Comment 48 James Ettle 2012-05-08 21:19:11 UTC

I have a non-Asus notebook with similar symptoms --- suspend-to-RAM hangs, but I do *not* see the memory corruption issue. Exactly the same workaround script as in Comment #2 makes things work properly.

I note that after resuming,

  [   52.899028] usb 1-1: clear tt 1 (9042) error -19

appears in dmesg.

I'm currently using kernel 3.3.5, which as I understand it includes a patch to fix this on Asus machines. Is there a way I could test this machine to see if the patch should be broadened? (I've attached my lspci and dmesg below.)

Comment 49 James Ettle 2012-05-08 21:20:05 UTC

Created attachment 73224 [details]
lspci -vvvv from Comment #48 notebook

Comment 50 James Ettle 2012-05-08 21:20:48 UTC

Created attachment 73225 [details]
dmesg from Comment #48 notebook, across suspend/resume cycle with workaround script

Comment 51 Alan Stern 2012-05-09 17:21:45 UTC

You should test this patch instead:

    http://marc.info/?l=linux-kernel&m=133582194609000&w=2

Comment 52 James Ettle 2012-05-10 19:08:54 UTC

(In reply to comment #51)
> You should test this patch instead:
> 
>     http://marc.info/?l=linux-kernel&m=133582194609000&w=2

Just tested it, built against 3.3.5 from Fedora sources -- seems to work! Thanks! Any chance this will make it into the 3.3 series?

Comment 53 Alan Stern 2012-05-10 19:25:07 UTC

It will appear in a 3.3-stable release in the not-too-distant future.  Probably either the next one or the one after that.

Comment 54 Ernst Persson 2012-05-12 08:29:59 UTC

(In reply to comment #48)
> I'm currently using kernel 3.3.5, which as I understand it includes a patch
> to
> fix this on Asus machines. Is there a way I could test this machine to see if
> the patch should be broadened? (I've attached my lspci and dmesg below.)

Oh this has been integrated and released? Why didn't anyone say so? :-)
Would be good to have in 3.2.x also.

This commit right? http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=1ce9245f5aff46201fa81fdd3f796a6c9f3ad1ab

(Guess it's not NEEDINFO anymore then!)

Comment 55 Rafael J. Wysocki 2012-05-25 20:02:18 UTC

Created attachment 73392 [details]
ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI

Can anyone please check if this patch is sufficient to fix the problem?

Comment 56 Rafael J. Wysocki 2012-05-25 20:16:40 UTC

Created attachment 73394 [details]
ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI

Sorry, please try this one instead.

Comment 57 Rafael J. Wysocki 2012-05-25 21:36:01 UTC

Created attachment 73397 [details]
ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI

One more adjustment.  Please test this one.

Comment 58 Rafael J. Wysocki 2012-05-25 22:48:57 UTC

To be precise, please revert commit 151b61284776 from the 3.4.0 kernel, apply the patch from comment #57 and see if it still works.

Unfortunately, commit 151b61284776 has introduced a regression reported in bug #43278 and we need to find an alternative approach to address this issue.

Comment 59 James Ettle 2012-05-25 22:49:45 UTC

(In reply to comment #57)
> Created an attachment (id=73397) [details]
> ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't
> managed
> by ACPI
> 
> One more adjustment.  Please test this one.

Tried it built against 3.4 (Fedora 17 sources, kernel-3.4.0-1.fc17), didn't work.

(The patch from Comment #52 works for me, when used in a 3.3-series kernel. I could try it in 3.4 later on to see if it still works.)

Comment 60 James Ettle 2012-05-25 22:54:26 UTC

(In reply to comment #59)

Note that Comment #59 applies to a kernel without commit 151b61284776 reverted. I'm working on rolling a reverted one now...

Comment 61 Rafael J. Wysocki 2012-05-25 22:58:26 UTC

Created attachment 73399 [details]
ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't managed by ACPI

Hmm.  What about this one?

Comment 62 James Ettle 2012-05-25 23:00:14 UTC

(In reply to comment #61)
> Created an attachment (id=73399) [details]
> ACPI / PCI / PM: Check _SxD/_SxW for devices whose power states aren't
> managed
> by ACPI
> 
> Hmm.  What about this one?

With or without 151b61284776 reverted?

Comment 63 Rafael J. Wysocki 2012-05-25 23:02:24 UTC

With commit 151b61284776 reverted, please.

Comment 64 James Ettle 2012-05-26 00:14:08 UTC

reverting 151b61284776 + Patch of Comment #61 works.

Comment 65 Oleksij Rempel (fishor) 2012-05-26 06:11:22 UTC

OK, this DSDT method looks like the one in the Asus UX31E, but it triggers some new bug. Hm..
According to this DSDT, if we enter the S3 sleep state we should put the USB controller in the acpi_d2 state. Normally we assume acpi_d2 == pci_d2; at least the current code does. But according to the controller documentation (for the Asus UX31E, not for the hardware in this case) the USB controller does not support the D2 state. On my hardware I can pass pci_D2 and it is just ignored; the controller stays in pci_D0. What if this controller does not support pci_D2 and that causes this problem?

Comment 66 Oleksij Rempel (fishor) 2012-05-26 06:13:42 UTC

Oops, posted it to the wrong bug report.

Comment 67 Rafael J. Wysocki 2012-05-26 11:44:00 UTC

OK, thanks for testing!

Comment 68 James Ettle 2012-05-26 11:48:31 UTC

(In reply to comment #67)
> OK, thanks for testing!

Just want to point out Comment #48, to avoid any confusion -- my notebook's not an Asus, but I think it does have the same USB chips and fails to suspend in similar circumstances without either (revert + above patch) or (workarounds script). [I could file a separate bug report if appropriate.]

Comment 69 Rafael J. Wysocki 2012-05-26 11:50:06 UTC

That's OK.  Let's keep things in one place.

Comment 70 fougner89 2012-07-04 12:51:43 UTC

*** Bug 43064 has been marked as a duplicate of this bug. ***

Comment 71 Alan Stern 2012-07-24 15:05:20 UTC

Commit dbf0e4c7257f8d684ec1a3c919853464293de66e (PCI: EHCI: fix crash during suspend on ASUS computers) should be the final, correct fix for this problem.  This bug report can be closed out.

Comment 72 Nick Coghlan 2012-08-05 03:46:26 UTC

The 3.5 kernel just landed in Fedora 17, including the fix for this problem.

The hibernate situation is *much* improved on my UX31E, but doesn't appear to be 100% fixed yet.

Old behaviour:
1. Attempt to hibernate hangs with the screen in console mode and a couple of messages about ALSA shutting down
2. CPU fan spins up to full speed
3. System stays in that state until I press and hold the power button
4. On restart, system cold boots instead of returning from hibernate

New behaviour:
1. Attempt to hibernate hangs with the screen in console mode and a couple of messages about ALSA shutting down
2. System stays in that state until I press and hold the power button
3. On restart, system returns from hibernate as expected

It appears the system state is now getting saved properly, but there's still something going wrong where the actual "power down now" command isn't being issued properly.

I'm not sure if that's a completely separate bug or a continuation of this one, though - happy to file a new report if you prefer.

Comment 73 Alan Stern 2012-08-06 15:11:34 UTC

It's almost certainly a separate bug.  You're better off starting a new bug report.

Comment 74 Oleksij Rempel (fishor) 2012-08-10 18:18:44 UTC

Hi Alan,
these problems seem to be related. See bug #45811: I was able to get suspend to disk working just by unbinding ehci_hcd, as in this bug.

Comment 75 Florian Mickler 2012-08-12 09:28:19 UTC

A patch referencing this bug report has been merged in Linux v3.4-rc5:

commit 151b61284776be2d6f02d48c23c3625678960b97
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Tue Apr 24 14:07:22 2012 -0400

    USB: EHCI: fix crash during suspend on ASUS computers

Comment 76 Florian Mickler 2012-08-12 09:43:37 UTC

A patch referencing a commit referencing this bug report has been merged in Linux v3.5-rc3:

commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Wed Jun 13 11:20:19 2012 -0400

    USB: add NO_D3_DURING_SLEEP flag and revert 151b61284776be2

Comment 77 Len Brown 2013-02-08 18:25:33 UTC

please re-open if problem still present in Linux 3.5 or later.

Comment 78 Weng Xuetian 2013-03-06 08:05:03 UTC

I find that Linux 3.7.6 has the same problem on the UX31E, but 3.6.6 is OK for me.

Comment 79 Alan Stern 2013-03-06 15:52:05 UTC

If you have a new problem then maybe you can use git bisect to determine what commit is responsible.

Comment 80 Weng Xuetian 2013-03-06 19:53:05 UTC

#79, just like the original post... the memory gets corrupted if I hit the bug (sometimes grub just crashes due to memory corruption), and I have to leave the machine alone for more than 30 minutes to bring it back to life. (The UX31E battery is not removable.)

I guess it's kind of impossible for me to bisect... I'm willing to test a single patch.

My lspci output is the same as the others'.

Comment 81 Alan Stern 2013-03-06 20:12:17 UTC

Why is it impossible for you to bisect?  It might be slow but it ought to work.  You already know that 3.6.6 is okay and 3.7.6 isn't.  What about 3.6 and 3.7?  What about 3.7-rc1?  What about the patch in comment #36?