Bug 42977 - fail to resume after suspend - Toshiba Tecra R840
Summary: fail to resume after suspend - Toshiba Tecra R840
Status: ASSIGNED
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 high
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-03-22 18:08 UTC by Artur
Modified: 2015-09-19 12:55 UTC
CC List: 19 users

See Also:
Kernel Version: 3.2.0
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
debug patch: check if outb helps (630 bytes, patch)
2013-04-09 01:32 UTC, Zhang Rui
debug patch 2: check if any function call helps (903 bytes, patch)
2013-04-09 01:38 UTC, Zhang Rui
x86: Remove wbinvd from trampoline_64.S (437 bytes, patch)
2013-04-09 10:35 UTC, Rafael J. Wysocki
DMI for Toshiba Portege Z830 (12.06 KB, application/octet-stream)
2013-05-30 03:25 UTC, Aaron Lu
dmidecode output Sandybridge (12.34 KB, application/octet-stream)
2013-06-18 14:22 UTC, Jamin W. Collins

Description Artur 2012-03-22 18:08:51 UTC
I opened a bug report in Ubuntu, but they suggested that I should open a bug here. The bug in Launchpad is: https://bugs.launchpad.net/bugs/962142

I have a Toshiba Tecra R840, and with Ubuntu 11.10 suspend/resume worked fine. The last kernel I used was 3.0.0-16-generic-pae.

After upgrading to 12.04 beta, resume stopped working. The laptop enters the suspend state, but I am unable to resume. The PC becomes unresponsive; only the cooling fan runs, at maximum speed.

I tested it with 3.2.0 and with the mainline kernel 3.3.0 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-precise/. The symptoms are the same.

Thanks,

Artur
Comment 1 Zhang Rui 2012-03-27 01:10:20 UTC
Is /sys/power/pm_test available in your new kernel?
If not, please rebuild the kernel with CONFIG_PM_DEBUG set.
If so, please try
"echo processors > /sys/power/pm_test; echo mem > /sys/power/state"
Does the system come back after about 10s?
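
For anyone repeating this, the same check can be stepped through every pm_test level described in Documentation/power/basic-pm-debugging.txt. The following is only a rough sketch and not something from the kernel tree: PM_DIR and run_pm_tests are names made up here, and PM_DIR exists only so the loop can be dry-run against a scratch directory instead of the real /sys/power.

```shell
#!/bin/sh
# Walk the pm_test levels from the shallowest to the deepest.
# PM_DIR is a hypothetical override so the loop can be tried on a scratch directory.
PM_DIR="${PM_DIR:-/sys/power}"

run_pm_tests() {
    for level in freezer devices platform processors core; do
        echo "testing pm_test level: $level"
        echo "$level" > "$PM_DIR/pm_test" || return 1
        # With pm_test set, the kernel is expected to come back by itself
        # after a short wait instead of staying suspended.
        echo mem > "$PM_DIR/state" || return 1
    done
    # Restore normal behaviour so the next suspend is a real one.
    echo none > "$PM_DIR/pm_test"
}

# Run as root on the affected machine:
#   run_pm_tests
```

If one level hangs, the last "testing pm_test level" line printed shows roughly how far the suspend path got.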
Comment 2 Len Brown 2012-03-27 01:50:09 UTC
Per the above, please try the steps noted in the source tree:

Documentation/power/basic-pm-debugging.txt
Comment 3 Artur 2012-03-27 09:09:52 UTC
Hi,

I ran the test you asked for, and the system came back with no problem.

I repeated the test for every level from freezer to core, and all of them resumed OK.

The only thing I noticed in dmesg is:

[ 2582.577508] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
[ 2582.577546] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD] (Node ffff880136a47cd0), AE_NOT_FOUND (20110623/psparse-536)
[ 2582.577656] ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20110623/psargs-359)
[ 2582.577666] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF] (Node ffff880136a47cf8), AE_NOT_FOUND (20110623/psparse-536)
Comment 4 Artur 2012-03-27 09:50:13 UTC
I booted the system with init=/bin/bash, suspended with s2ram and tried to resume, but the system became unresponsive.

I tried to boot with "init=/bin/bash acpi=off", but the system seems to freeze after something like "Probing PCI devices". The Caps Lock LED does not blink. The system just freezes: no disk activity, no response to the keyboard, no panic...
Comment 5 Artur 2012-03-28 09:37:46 UTC
Hi,

Last night I tried s2disk (I hadn't tried it before) and it worked: suspend to disk and resume both succeeded.

I don't know if this helps.
Comment 6 Artur 2012-04-03 09:40:31 UTC
Hi,

I've tried kernel 3.3.1 and 3.4.0-rc1.

In 3.3.1 the problem remains.

In 3.4.0-rc1 the computer reboots after resuming.
Comment 7 Lan Tianyu 2012-04-23 07:15:09 UTC
After resuming from s2ram, is there any output?

Please add the "no_console_suspend" kernel parameter and test again to see
whether any output appears.
Comment 8 Artur 2012-04-23 13:14:41 UTC
No, the display never turns on.

There is no output with "no_console_suspend". I tried with a normal boot and with init=/bin/bash.
Comment 9 Artur 2012-06-11 12:20:45 UTC
Last night I tried the latest Ubuntu live CD, 32-bit version. Suspend/resume works OK.

All the versions I tested before, which failed, were 64-bit. My laptop suspends and resumes without problems with Ubuntu 11.10 32-bit.

So I think this is not related to the kernel version, but rather to whether the kernel is 32-bit or 64-bit.
Comment 10 Steven Noonan 2012-09-15 05:16:50 UTC
I'm encountering the same problem as described above, but on a Toshiba Portege Z830. The test described in comment #1 succeeds on this machine as well, but resuming from suspend is broken. When I trigger a resume (tapping a key), the power light changes from amber to green, but there is *zero* disk activity and the screen stays off.

Anything I can do to help debug the issue?
Comment 11 Steven Noonan 2012-09-15 05:19:35 UTC
Oops, should mention I'm running v3.5.3.
Comment 12 Artur 2012-12-11 15:33:40 UTC
Hi,

Today I decided that I needed to debug this, because I need the 64-bit kernel.

I tried acpi_sleep=s3_beep to see if that can give me some clue, and... surprise, my laptop resumed without problem.

I ran several suspend/resume cycles, and all worked correctly. I tried kernels 3.2.0-24 and 3.7.0-999 from the Ubuntu repository, and both work.

Then I tried all other acpi_sleep options (removing acpi_sleep=s3_beep), but none worked.
Comment 13 Steven Noonan 2012-12-11 16:17:58 UTC
Fascinating. I just tested on the Portege Z830 with Linux 3.6.9, and I can confirm that acpi_sleep=s3_beep somehow makes things resume just fine.
Comment 14 Steven Noonan 2012-12-11 16:28:25 UTC
Looking at the code path for s3_beep, I am guessing that the actual code in s3_beep isn't directly affecting suspend/resume. The code doesn't really do anything too different from the normal resume path (enable speakers, beep for some time, disable speakers, and so on). I think the real impact here is the udelay() calls in there.

I'm about to head to my office, but can someone try something like this and see if the resume works without s3_beep enabled?

diff --git a/arch/x86/realmode/rm/wakemain.c b/arch/x86/realmode/rm/wakemain.c
index 91405d5..bc9bdc9 100644
--- a/arch/x86/realmode/rm/wakemain.c
+++ b/arch/x86/realmode/rm/wakemain.c
@@ -71,6 +71,8 @@ void main(void)
        if (wakeup_header.realmode_flags & 4)
                send_morse("...-");
 
+       udelay(US_PER_DOT * 20);
+
        if (wakeup_header.realmode_flags & 1)
                asm volatile("lcallw   $0xc000,$3");
Comment 15 Steven Noonan 2012-12-11 17:19:32 UTC
Just tested locally with the above patch. The udelay() makes resume take ~1630ms, but it -does- successfully resume. Anyone have any ideas why the delay is needed?
Comment 16 Aaron Lu 2013-03-12 06:08:09 UTC
Hi Artur and Steven,

Thanks for the findings! Can you please test if the latest upstream kernel still has this problem?

According to Artur, v3.0 doesn't have this problem while 3.2+ all have it. What about v3.1? And Steven, is it the same for you?

I ask because I want to collect more information for the x86 code maintainers, so that they can better understand the problem and then fix it. Thanks.
Comment 17 Artur 2013-03-12 09:48:49 UTC
Hi Aaron,

Maybe I didn't explain properly. With every x86 32-bit kernel I was able to resume.

The problem is with amd64. All the kernels I tried (3.2, 3.3, 3.4 and 3.7) have the same problem: they fail to resume.

I haven't tried this lately because I worked around the problem by passing "acpi_sleep=s3_beep" at boot.

I'm going to try the latest upstream kernel from the Ubuntu Mainline Kernels Archive (3.9-rc2).

Thanks

Artur
Comment 18 Steven Noonan 2013-03-12 14:27:44 UTC
I haven't used my Toshiba machine for ages solely because of this bug, but I'll re-image it and see how it behaves on the latest.
Comment 19 Steven Noonan 2013-03-13 17:58:11 UTC
I just tested Linux 3.9.0-rc2-00188-g6c23cbb on my Toshiba Portege Z830. It still fails to resume from suspend without that extra udelay thrown in.
Comment 20 Artur 2013-03-13 18:29:42 UTC
Hi,

I also tested Linux 3.9.0-rc2.

Resume fails without acpi_sleep=s3_beep.

Resume works with acpi_sleep=s3_beep.
Comment 21 Aaron Lu 2013-03-14 01:53:22 UTC
Thanks for your test.

So this only occurs on 64 bit kernels, right?

The kernel versions I mention below are all 64-bit. Let's forget 32-bit kernels for now, since they don't have any problem :-)

According to Artur, v3.2+ all fail and v3.0 resumes OK. Is this correct? If so, what about v3.1? I hope we can find the first failing kernel and the last working kernel if possible; that would help people find the problem more quickly. Thanks.
Comment 22 Steven Noonan 2013-03-14 03:17:50 UTC
Just tested 3.0.68. Same failure to resume.
Comment 23 Aaron Lu 2013-03-14 03:29:22 UTC
(In reply to comment #22)
> Just tested 3.0.68. Same failure to resume.

Thanks Steven. So this means there is no known working 64-bit kernel.
Let's see what Artur's situation is (I hope it's the same :-).
Comment 24 Artur 2013-03-14 10:20:53 UTC
Hi,

All the 32-bit kernels I've tested resume OK.

All the 64-bit kernels fail.

I didn't notice this at the beginning of this thread. Initially I thought the problem was related to kernel 3.2, but that's not true. The real difference is between 32-bit and 64-bit kernels.
Comment 25 Aaron Lu 2013-03-18 06:20:44 UTC
Adding more people.

A brief description of the problem:
Artur and Steven are experiencing an S3 resume problem on 64-bit kernels only: they have to boot with acpi_sleep=s3_beep to make resume work, or the system hangs on resume. Steven also tried using a delay instead of s3_beep, as shown in comment #14, and that worked too. There are no working 64-bit kernels for them.
Comment 26 Aaron Lu 2013-03-27 14:07:32 UTC
Hello x86 experts,

Any suggestions about this bug? Thanks.
Comment 27 Aaron Lu 2013-04-07 08:01:35 UTC
Hi Artur and Steven,

Perhaps we should raise this question on the kernel mailing list, since nobody else seems to be looking at this bug page. Can you please send an email to linux-kernel@vger.kernel.org?
Comment 28 Zhang Rui 2013-04-09 01:32:50 UTC
Created attachment 97771 [details]
debug patch: check if outb helps

Please try the attached debug patch, boot without any s3_xxx options, and see if it helps.
Comment 29 Zhang Rui 2013-04-09 01:38:55 UTC
Created attachment 97781 [details]
debug patch 2: check if any function call helps

Please try this patch. Boot without any s3_xxx options, and see if it helps.
Comment 30 Rafael J. Wysocki 2013-04-09 01:43:19 UTC
Well.  The delay added in comment #14 is in the real mode code that should be the same for 32-bit and 64-bit kernels, so it's really puzzling.

I wonder if the amount of delay added actually matters.
Comment 31 H. Peter Anvin 2013-04-09 03:47:52 UTC
The 64-bit kernel does a WBINVD (which I have no idea why it's there) at the top of trampoline_64.S.  This is a very slow instruction, and might have the effect of a delay.
Comment 32 Rafael J. Wysocki 2013-04-09 10:20:43 UTC
On Tuesday, April 09, 2013 03:47:53 AM bugzilla-daemon@bugzilla.kernel.org wrote:
>
> --- Comment #31 from H. Peter Anvin <hpa@zytor.com>  2013-04-09 03:47:52 ---
> The 64-bit kernel does a WBINVD (which I have no idea why it's there) at the
> top of trampoline_64.S.

Interesting.  I have no idea why it's there either.

> This is a very slow instruction, and might have the effect of a delay.

I wonder what happens if we remove that instruction?
Comment 33 Rafael J. Wysocki 2013-04-09 10:35:16 UTC
Created attachment 97811 [details]
x86: Remove wbinvd from trampoline_64.S

Whoever can reproduce this problem, can you please test the attached patch too (apply without any previous patches/workarounds from this bug entry)?
Comment 34 Rafael J. Wysocki 2013-04-09 10:43:55 UTC
Well, there are some more apparently arbitrary differences between trampoline_64.S and trampoline_32.S.  For example, the 64-bit trampoline sets up the stack at rm_stack_end, while the 32-bit one doesn't do that.  Moreover, the 32-bit trampoline doesn't touch the stack segment.

Not to mention the ordering differences.
Comment 35 Rafael J. Wysocki 2013-04-09 10:56:44 UTC
Actually, the 32-bit trampoline executes the wbinvd too.
Comment 36 H. Peter Anvin 2013-04-09 15:40:04 UTC
Some of those differences aren't arbitrary at all, rather they are a reflection of the inherent differences between the 32- and 64-bit environments.  That being said, the differences are probably bigger than they need to be.

The uses of the trampolines are also different; the 32-bit trampoline isn't used *at all* for the BSP during resume for example (the APs will still use it, of course.)
Comment 37 Rafael J. Wysocki 2013-04-09 21:33:56 UTC
This is interesting, because APs are not resumed.  They are turned on via CPU online (hotplug), so the BSP is the only CPU that executes the real mode resume code (the main() function in wakemain.c in particular).

Also, I wonder if checking cpuid in the 64-bit trampoline is actually necessary?  It looks like we could do without it.
Comment 38 H. Peter Anvin 2013-04-09 21:36:36 UTC
The APs go through the 32-bit trampoline exactly because they are not resumed.

The checking of CPUID in the 64-bit trampoline isn't necessary for resume, but is important for AP bringup: we can't transfer to a set of NX-containing page tables until we have made sure NX is enabled, for example.
Comment 39 Rafael J. Wysocki 2013-04-09 21:48:17 UTC
Well, perhaps the 64-bit BSP resume should follow the 32-bit variant, then?
Comment 40 H. Peter Anvin 2013-04-09 21:50:30 UTC
That would require a lot more complexity.
Comment 41 Rafael J. Wysocki 2013-04-09 22:00:52 UTC
OK

Artur, Steven, can you please check the debug patches from comments #28, #29, and #33?
Comment 42 Steven Noonan 2013-04-10 04:41:56 UTC
Tested all three patches (#28, #29, #33). None had any discernible impact on resume behavior.
Comment 43 Jamin W. Collins 2013-04-25 18:32:31 UTC
I too am experiencing this issue on a Portege Z830. As far as I can see all suggested patches have been tried without success.  Any further suggestions?  Any further data I can gather?
Comment 44 Peshko 2013-05-18 15:50:18 UTC
Same here. I have a Toshiba R840. I installed the latest kernel, 3.8.0-21, and still no luck: same problem. Is there a potential fix, and in which kernel or timeframe?
Comment 45 Zhang Rui 2013-05-20 16:07:09 UTC
Can you please try this patch
https://patchwork.kernel.org/patch/2593741/
and see if it helps?

Note that this is probably not a fix, according to HPA's comments. But let's see whether this is the same problem as the one addressed by that patch.
Comment 46 Peshko 2013-05-21 00:25:13 UTC
I tried the patch from comment #45. I downloaded the latest stable kernel, 3.9.3, patched it and recompiled it. No effect; same problem. So it didn't help.
Comment 47 Aaron Lu 2013-05-30 03:25:52 UTC
Created attachment 102941 [details]
DMI for Toshiba Portege Z830

Today I received a Toshiba Portege Z830. I tested v3.2 as shipped with Debian and v3.9 as shipped with Fedora, both 64-bit kernels, and both resumed fine. The DMI data for this laptop model is attached.
Comment 48 Rafael J. Wysocki 2013-06-09 22:35:54 UTC
Aaron, can you please compare .configs?
Comment 49 Aaron Lu 2013-06-14 08:20:09 UTC
Hi Steven & Jamin,

I suspect we have different processors in our Z830s. What model is yours: Sandy Bridge or Ivy Bridge? Thanks.
Comment 50 Steven Noonan 2013-06-14 12:01:32 UTC
It's a Sandy Bridge.
Comment 51 Jamin W. Collins 2013-06-14 21:57:24 UTC
I'll happily upload any details, dumps, diagnostic information anyone is interested in seeing from the unit.
Comment 52 Aaron Lu 2013-06-17 08:45:51 UTC
(In reply to comment #51)
> I'll happily upload any details, dumps, diagnostic information anyone is
> interested in seeing from the unit.

Output of dmidecode please, thanks.

(In reply to comment #50)
> It's a Sandy Bridge.

Looks like this is related to processor.
Comment 53 Jamin W. Collins 2013-06-18 14:22:46 UTC
Created attachment 105211 [details]
dmidecode output Sandybridge

I'm attaching the requested dmidecode output from the system.  Please let me know if there is anything else you would like or other testing that I can perform.
Comment 54 Aaron Lu 2013-06-19 13:46:48 UTC
(In reply to comment #53)
> Created an attachment (id=105211) [details]
> dmidecode output Sandybridge
> 
> I'm attaching the requested dmidecode output from the system.  Please let me
> know if there is anything else you would like or other testing that I can
> perform.

So looks like this only occurs on Sandybridge CPUs.
Comment 55 Aaron Lu 2013-07-04 06:41:03 UTC
It may be worth trying the nox2apic kernel command line option: x2apic is only available on x86_64 and, according to Artur's dmesg, x2apic is enabled.
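
To see whether a given machine is even a candidate for this test, the CPU flags can be checked for x2apic support. This is a minimal sketch; cpu_has_flag is a helper name made up here, and note that the flag only says the CPU supports x2apic, not that the kernel actually enabled it (dmesg is still the place to look for that).

```shell
#!/bin/sh
# Return success if the first "flags" line of a cpuinfo-style file lists the flag.
# $1 = flag name (e.g. x2apic), $2 = file to read (defaults to /proc/cpuinfo)
cpu_has_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" \
        | tr ' \t' '\n\n' \
        | grep -qx "$1"   # -x: match the whole word, so "apic" does not match "x2apic"
}

# Example:
#   cpu_has_flag x2apic && echo "CPU advertises x2apic"
```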
Comment 56 Aaron Lu 2013-07-04 07:41:42 UTC
Please also test intremap=off, which disables interrupt remapping in addition to x2apic, thanks.
Comment 57 Aaron Lu 2013-07-09 06:24:11 UTC
Anyone?
Comment 58 Artur 2013-07-10 00:20:56 UTC
Hi Aaron,

sorry for the late answer...

I tested nox2apic and have done several suspend/resume cycles successfully.

I haven't tested intremap=off. Do you still want me to test it?

Thank you very much for your help.

Artur
Comment 59 Aaron Lu 2013-07-10 00:41:12 UTC
(In reply to Artur from comment #58)
> Hi Aaron,
> 
> sorry for the late answer...
> 
> I tested nox2apic and have done several suspend/resume cycles successfully.
> 
> I haven't tested intremap=off. Do you still want me to test it?

Yes, if possible, but the fact that nox2apic works is already a good hint. Thanks a lot for your test.
Comment 60 Artur 2013-07-10 23:07:17 UTC
Using intremap=off without nox2apic also allows suspend/resume with success.
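
For anyone who wants to keep one of these parameters across reboots, it has to be added to the boot loader's kernel command line. The helper below is only a sketch for building that string; append_kernel_param is a name made up here, and the mention of GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub assumes an Ubuntu/Debian-style GRUB setup, which may not match other distributions.

```shell
#!/bin/sh
# Print the command line with the parameter appended, unless it is already there.
# $1 = existing command line, $2 = parameter to add (e.g. nox2apic)
append_kernel_param() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;            # already present: unchanged
        *)        printf '%s\n' "${1:+$1 }$2" ;;   # append, with a space if needed
    esac
}

# Example (assumed Ubuntu/Debian layout): put the printed value into
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, run update-grub, reboot.
#   append_kernel_param "quiet splash" nox2apic
```

After rebooting, /proc/cmdline should show the parameter if it was picked up.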
Comment 61 Aaron Lu 2013-07-16 02:55:26 UTC
Adding Youquan and David.

Some users have reported that their systems don't resume on 64-bit kernels, while 32-bit kernels are OK. Adding acpi_sleep=s3_beep to the command line is a workaround, and what actually helps is adding a small delay early in resume. It turned out that x2apic is enabled on these systems; if nox2apic is used, no workaround is needed and resume works fine. Any ideas? Thanks.
Comment 62 Jamin W. Collins 2013-07-17 13:31:57 UTC
Sorry for the late update, but I can confirm that using nox2apic allows for a functional suspend/resume.
Comment 63 David Morgado 2013-10-26 02:58:40 UTC
Hello, is there any update on this bug? I also have a Toshiba Tecra R840 with this problem; I tested nox2apic and it works. The latest kernels, 3.11 and 3.12, still have this problem.

Thanks
