Subject : Suspend to RAM reboots on resume with S3
Submitter : Sitsofe Wheeler <firstname.lastname@example.org>
Date : 2008-03-16 00:24
References : http://lkml.org/lkml/2008/3/15/127
Created attachment 15287 [details]
Created attachment 15288 [details]
2.6.25rc5 dmesg output
Created attachment 15289 [details]
Created attachment 15290 [details]
lspci -vvxxx output
Created attachment 15291 [details]
dmesg output of core suspend
echo 8 > /proc/sys/kernel/printk
echo core > /sys/power/pm_test
echo mem > /sys/power/state
at runlevel 1 works correctly and the system resumes (dmesg output attached). Graphics card is an AGP NVIDIA GeForce 1.
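The pm_test run above can be repeated for the other test levels to narrow down where suspend or resume breaks. A minimal sketch, assuming the level names listed in the kernel's Documentation/power/basic-pm-debugging.txt (freezer, devices, platform, processors, core); the helper names pm_test_step and pm_test_plan are hypothetical, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Assumed pm_test levels, shallowest to deepest (from the kernel PM debugging docs).
PM_TEST_LEVELS="freezer devices platform processors core"

pm_test_step() {
    # $1 = pm_test level; prints the three commands used in the report above.
    printf '%s\n' \
        "echo 8 > /proc/sys/kernel/printk" \
        "echo $1 > /sys/power/pm_test" \
        "echo mem > /sys/power/state"
}

pm_test_plan() {
    # Print the sequence for every level, then switch test mode off again.
    for level in $PM_TEST_LEVELS; do
        pm_test_step "$level"
    done
    echo "echo none > /sys/power/pm_test"
}

pm_test_plan
```

Each printed group would be run as root, checking dmesg after every resume; the first level that fails points at the stage to look into.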
The problem is apparently related to the BIOS.
I have no idea how it could be fixed in the kernel. Perhaps s2ram might help
(see Bug #7225).
Bah. How did you know? Is there an easy test to tell that you've got a bad BIOS? How does Windows 98 work around it (or is it that the BIOS just does something different when it spots the Windows 98 OS?).
I'm sceptical that I will be able to get s2ram to work around the problem as I have already tried all the suggestions on http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-advanced.html (which seems like s2ram but with more complexity) as mentioned in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/86099 .
(In reply to comment #7)
> Bah. How did you know? Is there an easy test to tell that you've got a bad
> BIOS?
In some cases there is. You can use acpidump and analyse its output, for example, but that's tedious and does not always let you find anything. Also, I'm not saying the BIOS is buggy. Rather, it does something we don't know about and therefore we don't handle it.
> How does Windows 98 work around it (or is it that the BIOS just does
> something different when it spots the Windows 98 OS?).
It is possible.
> I'm sceptical that I will be able to get s2ram to work around the problem
> as I have already tried all the suggestions on
> http://people.freedesktop.org/~hughsient/quirk/quirk-suspend-advanced.html
> (which seems like s2ram but with more complexity) as mentioned in
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/86099 .
No, in fact s2ram allows you to run some additional quirks from the user land which help many systems (including mine) to actually suspend and resume.
(In reply to comment #8)
> No, in fact s2ram allows you to run some additional quirks from the user land
> which help many systems (including mine) to actually suspend and resume.
You're right, s2ram does completely different things to the suggestions on the quirk-suspend page. So far I've tried the first three -a options on s2ram but it has made no difference. I'll report back when I have tried all the ones suggested on http://en.opensuse.org/S2ram (and with their -v variant).
I guess if none of those work then this will have to be resolved WONTFIX :(
None of the following s2ram options made a difference...
s2ram -f -a 1
s2ram -f -a 2
s2ram -f -a 3
s2ram -f -p -m
s2ram -f -p -s *
s2ram -f -m
s2ram -f -s
s2ram -f -p
s2ram -f -a 1 -m
s2ram -f -a 1 -s
s2ram -f -a 1 -v
s2ram -f -a 2 -v
s2ram -f -a 3 -v *
s2ram -f -p -m -v
s2ram -f -p -s -v
s2ram -f -m -v
s2ram -f -s -v
s2ram -f -p -v
s2ram -f -a 1 -m -v
s2ram -f -a 1 -s -v
You might also try booting with acpi_new_pts_ordering on the kernel command line. I doubt it will help, but well.
Which kernel (or git tree) is the option acpi_new_pts_ordering within?
It was in 2.6.25-rc up to 2.6.25-rc7. It was removed in -rc8 and the 2.6.24 suspend code ordering was restored, so you can just try the latest -git.
Same problem with a git 2.6.25-rc8 kernel.
What might not be obvious (unless you dig into the attachments) is that the machine seeing this problem has a VIA KT133 motherboard. After trawling bugzilla, it looks like this issue might be related to bugs like bug #3691 or bug #3586 ...
My problem seems to be very similar to bug #3586 - commenting out
pushl $0 # Kill any dangerous flags
from line 52 of arch/x86/kernel/acpi/wakeup_32.S resumes all the way to the console.
Created attachment 15783 [details]
Hm, that's interesting. Can you try to replace the popfl with 'popl %eax' and retest?
Okay, so either the stack is buggered, or there is something fishy in the flags.
I have seen stacks with %esp=0x12345678 (in real mode!); then our initialization would break because we only init low 16 bits.
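To illustrate the point about only the low 16 bits being initialised: if the wakeup code loads a 16-bit %sp while the upper half of %esp still holds leftovers, the full 32-bit value stays far away from the intended stack. A small arithmetic sketch; the function name set_low16 and the value 0x1000 are made up for illustration, and only 0x12345678 comes from the comment above:

```shell
#!/bin/sh
set_low16() {
    # $1 = stale 32-bit register value, $2 = new 16-bit value.
    # Keep the upper 16 bits of $1 and replace only the lower 16 bits.
    printf '0x%08x\n' $(( ($1 & 0xFFFF0000) | ($2 & 0xFFFF) ))
}

set_low16 0x12345678 0x1000   # prints 0x12341000
```

Any code that later uses the full 32-bit register would therefore see 0x12341000 rather than 0x00001000.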
2.6.25-latest-git has this rewritten, and this quirk fixed. Can you try it?
I have tried the latest mainline git (pulled on Sun Apr 20) and the rebooting problem is still there. Further, whereas booting with the acpi_sleep=s3_beep kernel parameter used to produce a beep before the system rebooted on resume, the system is now completely silent.
Commenting out pushl $0; popfl from arch/x86/kernel/acpi/realmode/wakeup.S no longer stops the reboot.
In other words, the stack pointer should be considered uninitialized...
After messing about with a BEEP patch, I found that the line it reboots on is pushl $0 . Putting the BEEP after this line does not result in any sound.
You can read a thread from 2005 which looks related at http://thread.gmane.org/gmane.linux.kernel/308114/focus=310174 but unfortunately it petered out without a conclusion.
Is there any more information I can provide to help solve this issue?
Yes. Please do what I asked for in Comment #18 with that kernel.
I tried and the system reboots. The suggestion sounds similar to Pavel's suggestion over in http://thread.gmane.org/gmane.linux.kernel/308114/focus=310338 ...
This means the stack is at a wrong place, probably.
I'll look at this when I have some time, not sure when exactly.
When did you fix this? I have just tried 2.6.26-rc9-next-20080710 and this problem seems to have been completely resolved. Mysteriously the last kernel I tried (2.6.26-rc6-next-20080618) did show the problem...
Could Bug #10927 have been related to this one?
Yes, it could.
Please try 2.6.26 final when it's out and close the bug if resume works for you.
I've just done some more testing. The problem is there in next-20080624 and gone in next-20080625 which is the first version carrying the commit ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4b4f7280d7fd1feeff134c2cf2db32fd583b6c29 ) mentioned in Bug #10927 . I'm happy for this to be resolved as fixed or even a duplicate.
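The narrowing-down between linux-next snapshots described above amounts to a binary search over an ordered list of snapshots. A rough sketch, assuming the newest snapshot in the list is known to resume correctly; find_first_fixed and recorded_result are hypothetical helpers, and in practice the check would mean building and booting each snapshot by hand:

```shell
#!/bin/sh
recorded_result() {
    # Succeeds for the snapshots reported in this bug as resuming correctly.
    case $1 in
        next-20080625|next-20080710) return 0 ;;
        *) return 1 ;;
    esac
}

find_first_fixed() {
    # $1 = check command, remaining arguments = snapshots, oldest first.
    check=$1; shift
    lo=1; hi=$#
    # Invariant: snapshot number hi is fixed, everything before lo is broken.
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "tag=\${$mid}"
        if "$check" "$tag"; then hi=$mid; else lo=$(( mid + 1 )); fi
    done
    eval "echo \${$lo}"
}

find_first_fixed recorded_result \
    next-20080618 next-20080624 next-20080625 next-20080710
# prints next-20080625
```

Once the first fixed snapshot is known, the commits that are new in it can be checked against other open bugs, which is how the link to Bug #10927 showed up here.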
The only question I have is what could I have done better? It was lucky that a regression for someone else led to a solution for me but was there something in my original filing missed that could have helped? Were the links to old bugs/threads any use or do people not have the time to look at them/too misleading?
(PS: I don't have permission to change the status of this bug - it was "reported" by Rafael)
(In reply to comment #31)
> The only question I have is what could I have done better? It was lucky that
> a regression for someone else led to a solution for me but was there
> something in my original filing missed that could have helped? Were the links
> to old bugs/threads any use or do people not have the time to look at
> them/too misleading?
In Bug 10927 the problem appeared after the patch changing the resume code to C. That helped a lot, because we had some more people looking into the problem as a result.
Also, while debugging Bug 10927 we had an idea how to find the exact point of resume breakage and that led to a solution.
*** This bug has been marked as a duplicate of bug 10927 ***