Kernel Bug Tracker – Bug 13931
second resume after s2ram fails
Last modified: 2010-07-07 02:50:45 UTC
Acer Aspire 5715Z notebook, BIOS v1.45 (latest), kernel 220.127.116.11 without any closed-source drivers. It suspends and resumes once but does not resume after the second suspend. After a brief burst of HD activity, everything seems to be dead: no reaction to CapsLock/NumLock, screen off, etc.
Please set CONFIG_PM_DEBUG, rebuild, and run:
1. echo core > /sys/power/pm_test
2. echo mem > /sys/power/state
Does the system come back in about 10 seconds?
If yes, is the system still alive when you run the above test several times in a row?
Please attach the dmesg output after the first S3 resume.
Please attach the acpidump output.
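The two steps above can be scripted so that the core test is repeated several times in a row. A minimal Python sketch, assuming root and a kernel built with CONFIG_PM_DEBUG; the function name run_pm_core_test and its power_dir parameter are hypothetical, and power_dir exists only so the logic can be tried against an ordinary directory:

```python
from pathlib import Path


def run_pm_core_test(power_dir="/sys/power", cycles=5):
    """Repeat the pm_test 'core' suspend/resume cycle and count the cycles.

    Assumes root and a kernel built with CONFIG_PM_DEBUG. On real hardware
    each write of "mem" to the state file returns only after the simulated
    resume has finished, so a hang shows up as this loop never returning.
    """
    power = Path(power_dir)
    pm_test = power / "pm_test"
    state = power / "state"
    completed = 0
    try:
        # Step 1: stop the suspend sequence at the "core" test level.
        pm_test.write_text("core")
        for _ in range(cycles):
            # Step 2: run one suspend-to-RAM test cycle.
            state.write_text("mem")
            completed += 1
    finally:
        # Switch test mode off again so that later suspends are real ones.
        pm_test.write_text("none")
    return completed


if __name__ == "__main__":
    print(f"{run_pm_core_test()} test cycles completed")
```

Resetting pm_test to "none" in the finally clause matters: otherwise every later suspend would also stop at the test level instead of really entering S3.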
(In reply to comment #1)
> please set CONFIG_PM_DEBUG,
> rebuild and run
> 1. echo core > /sys/power/pm_test
> 2. echo mem > /sys/power/state
> does the system come back in about 10 seconds?
yes, it does.
> if yes, does the system still alive when you run the above test for several
Yes, I repeated those steps 4-5 times in a row and it came back in about 10 seconds every time; it seemed to function normally.
> please attach the dmesg out after the first S3 resume.
> please attach the acpidump output.
Created attachment 22657 [details]
acpidump output for acer aspire 5715z w/ Bios v1.45
Created attachment 22658 [details]
dmesg output after first S3 resume
is this a regression? i.e. did the s2ram work in any earlier kernel?
(In reply to comment #5)
> is this a regression? i.e. did the s2ram work in any earlier kernel?
No, it's not a regression. I've been testing Linux on this notebook for ~15 months (tried every BIOS from 1.29 on) and have never been able to resume from suspend successfully.
This happens on my notebook too, and I pretty much tried everything.
The root cause of this problem is that the BIOS doesn't pass control to Linux's real-mode resume handler.
It chokes on something.
I wish I had a contact on their BIOS team. For them it would be a piece of cake to see why it hangs...
(In reply to comment #8)
> I wish I had contact with their bios team. For them it is piece of cake to see
> why it hangs...
The BIOS changelog mentions Linux several times, so I think they're not totally unmindful of it. Maybe we just need to draw their attention to this issue.
what if you boot with boot option "nosmp" or "maxcpus=1"?
Just in case, I tested this suggestion; nothing changed.
Maxim, what's the model name of your laptop?
Can Koy, can you try the boot option "nosmp"?
(In reply to comment #10)
> what if you boot with boot option "nosmp" or "maxcpus=1"?
I just tried with nosmp (observed switching to UP code in dmesg output).
It made no difference, unfortunately.
Hi. I have the same problem as Maxim and practically the same machine (his is a 5720G). My machine info, as reported by s2ram, is:
sys_vendor = "Acer "
sys_product = "Aspire 5720 "
sys_version = "V1.14"
bios_version = "V1.14"
Will you please try the following test?
a. Kill the process that is using /proc/acpi/event (find it with "lsof /proc/acpi/event").
b. echo mem > /sys/power/state; dmesg >dmesg_after1; sync; sleep 10; echo mem > /sys/power/state; dmesg >dmesg_after2; sync;
c. Press the power button to do the first resume.
d. Wait for a while, then press the power button again to see whether the box resumes.
e. Reboot the system and check whether the files dmesg_after1 and dmesg_after2 were created.
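Step b can also be written as a small script. This is only a sketch under stated assumptions: it needs root, it does not cover step a (killing readers of /proc/acpi/event), and the helper names save_kernel_log, suspend_to_ram and double_suspend_test are hypothetical. The suspend and logging steps are passed in as parameters so that the ordering can be checked without actually suspending:

```python
import os
import subprocess
import time
from pathlib import Path


def save_kernel_log(dest):
    """Write the current kernel log (the output of `dmesg`) to `dest`."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    Path(dest).write_text(result.stdout)


def suspend_to_ram(state_file="/sys/power/state"):
    """Enter S3. On a real system the write returns once the box has resumed."""
    Path(state_file).write_text("mem")


def double_suspend_test(suspend=suspend_to_ram, save_log=save_kernel_log,
                        pause=10, sleep=time.sleep):
    """Step b: suspend, log, wait, suspend again, log."""
    suspend()                  # first S3 cycle (known to resume fine)
    save_log("dmesg_after1")
    os.sync()                  # flush to disk in case the next resume hangs
    sleep(pause)
    suspend()                  # second S3 cycle (the one that fails)
    save_log("dmesg_after2")
    os.sync()


if __name__ == "__main__":
    double_suspend_test()
```

The os.sync() calls play the same role as sync in the one-liner: if the second resume hangs and the machine has to be power-cycled, dmesg_after1 should still be on disk.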
It's almost a thought experiment, but anyway.
I don't have /proc/acpi/event; I disabled this interface in the kernel config. I don't have acpid running either.
And the outcome of the experiment is as usual: the first resume works perfectly, the second one fails. dmesg_after1 is created and contains nothing interesting.
(In reply to comment #15)
> Will you please try the following test?
The second resume did not succeed, and dmesg_after2 was not created.
However, I observed the following lines on console after first resume:
[ 360.124067] ata1: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x9 t4
[ 360.124070] ata1: irq_stat 0x00400040, connection status changed
dmesg_after1 is attached fyi.
Created attachment 24005 [details]
dmesg_after1 per comment #17
I can confirm the presence of this bug. I still haven't found a solution.
My machine is a 5720 (no Z) with BIOS 1.42.
Can we write a letter to Acer together about this issue?
This problem affects me too. Acer Aspire 5720Z. Firmware 1.42. Kernel 2.6.32 (Ubuntu 10.04 2.6.32-22-generic).
You know, I tell everyone that there is no problem I can't fix.
There is just no upper bound on how much time it will take me to do that.
Well, it took me almost 2 years to fix this problem.
Created attachment 26442 [details]
The fix (workaround)
This makes Linux save and restore the BIOS NVS (non-volatile storage) area during suspend to RAM.
Since memory is preserved during suspend to RAM, that shouldn't be necessary, but maybe the Other OS restores that area?
And then maybe the BIOS scribbles over that area on resume (or suspend...), so...
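The idea behind the workaround can be illustrated with a toy model: copy each NVS region before suspend and write the copy back after resume, so anything the firmware changed in between is undone. This Python sketch is not the kernel patch; a bytearray stands in for physical memory, and the class NvsRegions and its (start, length) region list are hypothetical:

```python
class NvsRegions:
    """Toy model of saving/restoring firmware NVS regions across S3.

    `memory` stands in for physical RAM; `regions` is a list of
    (start, length) pairs that the firmware reports as ACPI NVS.
    """

    def __init__(self, memory, regions):
        self.memory = memory
        self.regions = list(regions)
        self.saved = []

    def save(self):
        # Copy each region before entering the sleep state.
        self.saved = [bytes(self.memory[s:s + n]) for s, n in self.regions]

    def restore(self):
        # Write the saved copies back after resume, undoing whatever the
        # firmware changed while the machine was asleep or waking up.
        for (start, length), data in zip(self.regions, self.saved):
            self.memory[start:start + length] = data


if __name__ == "__main__":
    ram = bytearray(32)
    nvs = NvsRegions(ram, [(16, 8)])
    nvs.save()
    ram[16] = 0xFF          # simulate the firmware corrupting one byte
    nvs.restore()
    print(hex(ram[16]))     # back to 0x0
```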
Created attachment 26443 [details]
acpidump of my system (acer aspire 5720G)
Maxim: Thanks for working on that! Let's hope a maintainer will have a look at your patch... Else, you may want to send a mail to LKML.
Created attachment 26459 [details]
Can you run some tests using this code? Revert your patch first, then build this code and, as root, run "(appname) read output". That will create a 99-byte file called output. Suspend and resume, then run "(appname) read output2". Now run "(appname) write output" and see whether you can suspend and resume again. If so, attach output and output2.
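The test app's source isn't reproduced here. As an assumption based only on the description above, it reads a fixed-size block of physical memory into a file and can later write that file back. A rough Python equivalent with a hypothetical CLI (read|write FILE NVS_ADDRESS); the NVS start address is machine-specific (the BIOS-e820 lines marked ACPI NVS in dmesg show the region), and a kernel built with CONFIG_STRICT_DEVMEM may refuse some /dev/mem accesses:

```python
import sys

DUMP_SIZE = 0x63  # 99 bytes, the size the test app dumps


def read_region(path, offset, size=DUMP_SIZE):
    """Return `size` bytes starting at byte `offset` of `path` (e.g. /dev/mem)."""
    # Unbuffered access so that only the requested bytes are read.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        data = f.read(size)
    if len(data) != size:
        raise OSError(f"short read: got {len(data)} of {size} bytes")
    return data


def write_region(path, offset, data):
    """Write `data` back starting at byte `offset` of `path`."""
    with open(path, "r+b", buffering=0) as f:
        f.seek(offset)
        f.write(data)


def main(argv):
    # Hypothetical CLI: nvsdump.py read|write FILE NVS_ADDRESS
    if len(argv) != 4 or argv[1] not in ("read", "write"):
        sys.exit("usage: nvsdump.py read|write FILE NVS_ADDRESS")
    mode, filename, addr = argv[1], argv[2], int(argv[3], 0)
    if mode == "read":
        with open(filename, "wb") as out:
            out.write(read_region("/dev/mem", addr))
    else:
        with open(filename, "rb") as inp:
            write_region("/dev/mem", addr, inp.read())


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the dump size in one named constant avoids the kind of size mismatch between the read count and the buffer that is discussed later in the thread.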
Created attachment 26462 [details]
dump of NVS before first s2ram attempt
Created attachment 26463 [details]
Dump of NVS after first s2ram
The irony is that although the NVS area contains several changes, only one of them matters for s2ram: a single byte makes the BIOS hang.
It is at 0x7fe99010.
The change is always 0x00 (good) -> 0xFF (bad).
I verified that this memory change happens just after resume: with the code in acpi_enter_sleep_state that finally puts the system into suspend commented out, the 00->FF change is not triggered. I also know that right after resume the system is already in a *bad* state (from many checks I did before).
I also suspect that 0x00 is the only valid value, because writing 0x10 to that offset produced a hang.
In addition, setting this byte to 0xFF is enough to trigger a hang on the *first* resume, so I am confident that this byte alone crashes the BIOS on resume.
I guess that is all I could reverse engineer for now.
If I ever install Windows on that system, I'll check whether this byte stays clear there.
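Finding the guilty byte starts with comparing the before/after dumps. A small Python sketch that lists every differing byte with its physical address; the base address of the attached dumps is not stated in the thread, so base is an input you supply, and the names changed_bytes and nvsdiff.py are hypothetical. Narrowing the candidates down to one byte still takes the manual restore-and-test work described above:

```python
import sys


def changed_bytes(before, after, base=0):
    """List (address, old, new) for every byte that differs between two dumps.

    `base` is the physical address the dumps start at, so the result can be
    compared with addresses such as 0x7fe99010.
    """
    if len(before) != len(after):
        raise ValueError("dumps must be the same length")
    return [(base + i, old, new)
            for i, (old, new) in enumerate(zip(before, after))
            if old != new]


if __name__ == "__main__":
    # Hypothetical usage: nvsdiff.py BEFORE AFTER BASE_ADDRESS
    with open(sys.argv[1], "rb") as f:
        before = f.read()
    with open(sys.argv[2], "rb") as f:
        after = f.read()
    for addr, old, new in changed_bytes(before, after, int(sys.argv[3], 0)):
        print(f"{addr:#x}: {old:02X} -> {new:02X}")
```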
(In reply to comment #26)
> Created an attachment (id=26459) [details]
> Test app
This code has a buffer overflow: it reads 0x63 bytes into a 63-byte buffer.
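The overflow is presumably a hex/decimal mix-up: the read count 0x63 is hexadecimal, while the buffer was sized as 63 decimal, so the read runs past the end of the buffer by:

```latex
\[
  \mathtt{0x63} = 6 \cdot 16 + 3 = 99, \qquad 99 - 63 = 36 \text{ bytes}
\]
```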
Ah, sorry for the confusion, this patch applies on top of
I don't know why it is not upstream yet...
Created attachment 26531 [details]
Store NVS over S3
Can you confirm that this patch works as well as your version? I haven't made any progress in figuring out what the BIOS is actually doing there, but I have verified that Windows dumps and restores the NVS region.
I'll test this on Thursday.
I am very short on time, though, and won't be able to do much more testing.
I am quite sure you forgot to 'git add' the new version of 'hibernate_nvs.c', because the patch only removes it and adds nothing in its place.
Also note that for some reason this helps only on my computer.
So far, the answers I've received from other users are negative.
This Thursday I'll try to figure out whether some other unintended change also contributed to making suspend finally work.
(Note that suspend now works 100% of the time on my notebook.)
Created attachment 26559 [details]
Original patch missed the nvs.c file. Try this one.
As expected, this patch works.
Note that now
needs several changes.
Should we now post both patches to the ACPI mailing list?
Note that I did many tests and came to the conclusion that restoring NVS works all the way back to 2.6.28, and that this doesn't depend on anything else (like .config).
It even works with the Ubuntu kernel if I restore that byte from userspace.
If for some reason you can't compile a kernel and/or my patch doesn't seem to work, try the following userspace workaround:
Compile writereg.c (gcc -o writereg writereg.c), put it somewhere in $PATH (/usr/local/bin, for example), and then run the attached script (make_s2ram_work) after the first resume.
If you see something like this (note the FF):
writing 0 to <address> (current value = FF)
then my workaround will likely work for you.
Try to suspend to RAM again; if that works, run this script after each resume so that the next resume works (create a script in /etc/pm/sleep.d to do that automatically).
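Neither writereg.c nor make_s2ram_work is reproduced in this thread. Going only by the message quoted above, a Python equivalent might read the byte, report its value, and write 0 back. This is a sketch, not the attached tools: the address 0x7fe99010 is the one found on one Acer 5720G in this thread and will likely differ on other machines, the name clear_nvs_byte is hypothetical, and it assumes root plus a kernel whose /dev/mem allows access to the NVS region. pm-utils calls hooks in /etc/pm/sleep.d with "resume" as the first argument after suspend to RAM, which the main block uses:

```python
import sys

NVS_BYTE_ADDR = 0x7fe99010  # found on one Aspire 5720G; machine-specific


def clear_nvs_byte(dev="/dev/mem", addr=NVS_BYTE_ADDR):
    """Reset the byte that hangs the BIOS on resume to 0x00; return its old value."""
    with open(dev, "r+b", buffering=0) as mem:
        mem.seek(addr)
        current = mem.read(1)[0]
        print(f"writing 0 to {addr:#x} (current value = {current:02X})")
        mem.seek(addr)
        mem.write(b"\x00")
    return current


if __name__ == "__main__":
    # Run by hand with no arguments, or from a pm-utils hook in
    # /etc/pm/sleep.d, which receives "resume" as $1 after suspend to RAM.
    if len(sys.argv) < 2 or sys.argv[1] == "resume":
        clear_nvs_byte()
```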
Created attachment 26593 [details]
Created attachment 26594 [details]
Fix is upstream
shipped in linux-2.6.35-rc4
Author: Matthew Garrett
Date: Fri May 28 16:32:15 2010 -0400
ACPI: Store NVS state even when entering suspend to RAM
Author: Matthew Garrett
Date: Fri May 28 16:32:14 2010 -0400
suspend: Move NVS save/restore code to generic suspend functionality