Bug 10260 - Suspend to RAM reboots on resume with S3
Status: CLOSED DUPLICATE of bug 10927
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assigned To: Rafael J. Wysocki
Depends on:
Blocks: 7216
Reported: 2008-03-16 05:39 UTC by Rafael J. Wysocki
Modified: 2009-04-02 04:56 UTC

See Also:
Kernel Version: all up to 2.6.25-rc5
Tree: Mainline
Regression: No


Attachments
lspci output (926 bytes, text/plain), 2008-03-16 06:38 UTC, Sitsofe Wheeler
2.6.25rc5 dmesg output (58.90 KB, text/plain), 2008-03-16 06:39 UTC, Sitsofe Wheeler
acpidump (49.56 KB, text/plain), 2008-03-16 06:43 UTC, Sitsofe Wheeler
lspci -vvxxx output (16.15 KB, text/plain), 2008-03-16 06:45 UTC, Sitsofe Wheeler
dmesg output of core suspend (9.04 KB, text/plain), 2008-03-16 07:13 UTC, Sitsofe Wheeler
/proc/cpuinfo output (424 bytes, text/plain), 2008-04-17 01:58 UTC, Sitsofe Wheeler

Description Rafael J. Wysocki 2008-03-16 05:39:44 UTC
Subject    : Suspend to RAM reboots on resume with S3
Submitter  : Sitsofe Wheeler <sitsofe@yahoo.com>
Date       : 2008-03-16 00:24
References : http://lkml.org/lkml/2008/3/15/127
Comment 1 Sitsofe Wheeler 2008-03-16 06:38:25 UTC
Created attachment 15287 [details]
lspci output
Comment 2 Sitsofe Wheeler 2008-03-16 06:39:01 UTC
Created attachment 15288 [details]
2.6.25rc5 dmesg output
Comment 3 Sitsofe Wheeler 2008-03-16 06:43:39 UTC
Created attachment 15289 [details]
acpidump
Comment 4 Sitsofe Wheeler 2008-03-16 06:45:42 UTC
Created attachment 15290 [details]
lspci -vvxxx output
Comment 5 Sitsofe Wheeler 2008-03-16 07:13:13 UTC
Created attachment 15291 [details]
dmesg output of core suspend

Doing:
echo 8 > /proc/sys/kernel/printk
echo core > /sys/power/pm_test
echo mem > /sys/power/state

at runlevel 1 works correctly and the system resumes (dmesg output attached). Graphics card is an AGP NVIDIA GeForce 1.
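
The three commands above can be wrapped in a small helper for repeated runs. This is only a minimal sketch: the function name run_pm_test and the PM_ROOT / PRINTK_FILE overrides are hypothetical (they are not kernel interfaces) and exist so the steps can be dry-run against scratch files. The final write of "none" switches test mode off again.

```shell
#!/bin/sh
# Sketch: run one suspend test at a given pm_test level.
# PM_ROOT and PRINTK_FILE are hypothetical overrides (not kernel
# interfaces) so the function can be exercised on scratch files.
run_pm_test() {
    level=$1
    pm_root=${PM_ROOT:-/sys/power}
    printk_file=${PRINTK_FILE:-/proc/sys/kernel/printk}

    # Raise the console log level so every kernel message is shown.
    echo 8 > "$printk_file" || return 1
    # Select how far the suspend sequence goes before it is rolled back.
    echo "$level" > "$pm_root/pm_test" || return 1
    # Start the (test) suspend; on real sysfs this returns after resume.
    echo mem > "$pm_root/state" || return 1
    # Switch test mode off again so later suspends are real ones.
    echo none > "$pm_root/pm_test"
}
```

As root at runlevel 1, the test from this comment would then be: run_pm_test core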
Comment 6 Rafael J. Wysocki 2008-03-16 10:38:49 UTC
The problem is apparently related to the BIOS.

I have no idea how it could be fixed in the kernel.  Perhaps s2ram might help
(see Bug #7225).
Comment 7 Sitsofe Wheeler 2008-03-16 14:18:11 UTC
Bah. How did you know? Is there an easy test to tell that you've got a bad BIOS? How does Windows 98 work around it (or is it that the BIOS just does something different when it spots the Windows 98 OS?).

I'm sceptical that I will be able to get s2ram to work around the problem as I have already tried all the suggestions on http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-advanced.html (which seems like s2ram but with more complexity) as mentioned in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/86099 .
Comment 8 Rafael J. Wysocki 2008-03-16 14:44:59 UTC
(In reply to comment #7)
> Bah. How did you know? Is there an easy test to tell that you've got a bad
> BIOS?

In some cases there is.  You can use acpidump and analyse its output, for example, but that's tedious and does not always turn anything up.  Also, I'm not saying the BIOS is buggy.  Rather, it does something we don't know about and therefore we don't handle it.

> How does Windows 98 work around it (or is it that the BIOS just does
> something different when it spots the Windows 98 OS?).

It is possible.

> I'm sceptical that I will be able to get s2ram to work around the problem
> as I have already tried all the suggestions on
> http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-advanced.html
> (which seems like s2ram but with more complexity) as mentioned in
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/86099 .

No, in fact s2ram allows you to run some additional quirks from the user land which help many systems (including mine) to actually suspend and resume.
Comment 9 Sitsofe Wheeler 2008-03-17 00:23:27 UTC
(In reply to comment #8)
> No, in fact s2ram allows you to run some additional quirks from the user land
> which help many systems (including mine) to actually suspend and resume.

You're right, s2ram does completely different things to the suggestions on the quirk-suspend page. So far I've tried the first three -a options on s2ram but it has made no difference. I'll report back when I have tried all the ones suggested on http://en.opensuse.org/S2ram (and with their -v variant).

I guess if none of those work then this will have to be resolved WONTFIX :(
Comment 10 Sitsofe Wheeler 2008-03-29 08:55:53 UTC
None of the following s2ram options made a difference...
s2ram -f -a 1
s2ram -f -a 2
s2ram -f -a 3
s2ram -f -p -m
s2ram -f -p -s *
s2ram -f -m
s2ram -f -s
s2ram -f -p
s2ram -f -a 1 -m
s2ram -f -a 1 -s 

s2ram -f -a 1 -v
s2ram -f -a 2 -v
s2ram -f -a 3 -v *
s2ram -f -p -m -v
s2ram -f -p -s -v
s2ram -f -m -v
s2ram -f -s -v
s2ram -f -p -v
s2ram -f -a 1 -m -v
s2ram -f -a 1 -s -v
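
For anyone repeating this sweep, the list above can be generated rather than typed out. A minimal sketch, assuming only the s2ram options already shown in this comment; the function name s2ram_matrix is made up, and the asterisks next to two entries are not reproduced. It prints the command lines instead of running them, since each failed resume reboots the machine.

```shell
#!/bin/sh
# Sketch: print every s2ram invocation from the list above, first
# each option set as-is, then the same set with -v appended.
# Nothing is executed; the commands are only echoed.
s2ram_matrix() {
    for suffix in "" " -v"; do
        for opts in "-a 1" "-a 2" "-a 3" "-p -m" "-p -s" \
                    "-m" "-s" "-p" "-a 1 -m" "-a 1 -s"; do
            echo "s2ram -f $opts$suffix"
        done
    done
}
```

Calling s2ram_matrix prints the 20 command lines in the order listed, one per line, so they can be tried one boot at a time.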
Comment 11 Rafael J. Wysocki 2008-03-29 11:31:14 UTC
You might also try booting with acpi_new_pts_ordering on the kernel command line.  I doubt it will help, but well.
Comment 12 Sitsofe Wheeler 2008-04-04 08:07:24 UTC
Which kernel (or git tree) contains the acpi_new_pts_ordering option?
Comment 13 Rafael J. Wysocki 2008-04-06 12:33:49 UTC
It was in 2.6.25-rc up to 2.6.25-rc7.  It was removed in -rc8 and the 2.6.24 suspend code ordering was restored, so you can just try the latest -git.
Comment 14 Sitsofe Wheeler 2008-04-13 03:29:44 UTC
Same problem with a git 2.6.25-rc8 kernel.
Comment 15 Sitsofe Wheeler 2008-04-15 04:27:23 UTC
What might not be obvious (unless you dig into the attachments) is that the machine seeing this problem has a VIA KT133 motherboard. After trawling Bugzilla, I think this issue might be related to bugs like bug #3691 or bug #3586 ...
Comment 16 Sitsofe Wheeler 2008-04-17 01:57:03 UTC
My problem seems to be very similar to bug #3586. Commenting out
        pushl  $0               # Kill any dangerous flags
        popfl

at line 52 of arch/x86/kernel/acpi/wakeup_32.S lets the system resume all the way to the console.
Comment 17 Sitsofe Wheeler 2008-04-17 01:58:19 UTC
Created attachment 15783 [details]
/proc/cpuinfo output
Comment 18 Rafael J. Wysocki 2008-04-19 14:54:48 UTC
Hm, that's interesting.  Can you try to replace the popfl with 'popl %eax' and retest?
Comment 19 H. Peter Anvin 2008-04-19 15:02:33 UTC
Okay, so either the stack is buggered, or there is something fishy in the flags.
Comment 20 Pavel Machek 2008-04-20 00:59:26 UTC
I have seen stacks with %esp=0x12345678 (in real mode!); then our initialization would break because we only init low 16 bits.
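
The effect can be shown as plain arithmetic: writing a 16-bit value into the low half of a 32-bit register leaves the upper half as it was. A minimal sketch; the helper name set_low16 and the new value 0x1000 are made up for illustration, only 0x12345678 comes from the case described above.

```shell
#!/bin/sh
# Sketch: model a 16-bit write into the low half of a 32-bit register.
# $1: current 32-bit register value, $2: value written to the low 16 bits.
set_low16() {
    # Keep the upper 16 bits of $1, replace only the lower 16 bits.
    printf '0x%08x\n' $(( ($1 & 0xffff0000) | ($2 & 0xffff) ))
}
```

With these numbers, set_low16 0x12345678 0x1000 prints 0x12341000: the stale upper half 0x1234 is still there after the low 16 bits have been initialized.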

2.6.25-latest-git has this rewritten, and this quirk fixed. Can you try it?
Comment 21 Sitsofe Wheeler 2008-04-20 04:08:36 UTC
I have tried the latest mainline git (pulled on Sun Apr 20) and the rebooting problem is still there. Further, whereas before, booting with the acpi_sleep=s3_beep kernel parameter produced a beep before the system rebooted on resume, now the system is completely silent.

Commenting out pushl   $0; popfl from arch/x86/kernel/acpi/realmode/wakeup.S no longer stops the reboot.
Comment 22 H. Peter Anvin 2008-04-20 05:12:18 UTC
In other words, the stack pointer should be considered uninitialized...
Comment 23 Sitsofe Wheeler 2008-04-20 06:53:47 UTC
After messing about with a BEEP patch, I found that the line it reboots on is pushl $0. Putting the BEEP after this line does not produce any sound.

You can read a thread from 2005 which looks related at http://thread.gmane.org/gmane.linux.kernel/308114/focus=310174 but unfortunately it petered out without a conclusion.
Comment 24 Sitsofe Wheeler 2008-05-05 13:17:36 UTC
Is there any more information I can provide to help solve this issue?
Comment 25 Rafael J. Wysocki 2008-05-05 13:27:52 UTC
Yes.  Please do what I asked for in Comment #18 with that kernel.
Comment 26 Sitsofe Wheeler 2008-05-05 16:15:55 UTC
I tried and the system reboots. The suggestion sounds similar to Pavel's suggestion over in http://thread.gmane.org/gmane.linux.kernel/308114/focus=310338 ...
Comment 27 Rafael J. Wysocki 2008-05-06 06:07:21 UTC
This probably means the stack is in the wrong place.

I'll look at this when I have some time, not sure when exactly.
Comment 28 Sitsofe Wheeler 2008-07-10 15:50:24 UTC
Rafael:
When did you fix this? I have just tried 2.6.26-rc9-next-20080710 and this problem seems to have been completely resolved. Mysteriously the last kernel I tried (2.6.26-rc6-next-20080618) did show the problem...
Comment 29 Sitsofe Wheeler 2008-07-10 16:09:34 UTC
Could Bug #10927 have been related to this one?
Comment 30 Rafael J. Wysocki 2008-07-11 03:21:42 UTC
Yes, it could.

Please try 2.6.26 final when it's out and close the bug if resume works for you.
Comment 31 Sitsofe Wheeler 2008-07-13 03:50:42 UTC
I've just done some more testing. The problem is there in next-20080624 and gone in next-20080625 which is the first version carrying the commit ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4b4f7280d7fd1feeff134c2cf2db32fd583b6c29 ) mentioned in Bug #10927 . I'm happy for this to be resolved as fixed or even a duplicate.

The only question I have is what could I have done better? It was lucky that a regression for someone else led to a solution for me but was there something in my original filing missed that could have helped? Were the links to old bugs/threads any use or do people not have the time to look at them/too misleading?
Comment 32 Sitsofe Wheeler 2008-07-13 03:52:30 UTC
(PS: I don't have permission to change the status of this bug - it was "reported" by Rafael)
Comment 33 Rafael J. Wysocki 2008-07-13 10:48:20 UTC
(In reply to comment #31)
> The only question I have is what could I have done better? It was lucky that
> a regression for someone else led to a solution for me but was there
> something in my original filing missed that could have helped? Were the links
> to old bugs/threads any use or do people not have the time to look at
> them/too misleading?

In Bug 10927 the problem appeared after the patch changing the resume code to C.  That helped a lot, because we had some more people looking into the problem as a result.

Also, while debugging Bug 10927 we had an idea how to find the exact point of resume breakage and that led to a solution.


*** This bug has been marked as a duplicate of bug 10927 ***
