Bug 16396

Summary: [bisected] resume from suspend freezes system
Product: ACPI Reporter: tomas m (tmezzadra)
Component: Power-Sleep-Wake Assignee: acpi_power-sleep-wake
Status: CLOSED CODE_FIX    
Severity: blocking CC: acpi-bugzilla, arne, bjorn.helgaas, claudiomkd, eric.valette, florian, hugh, jmgh87, lenb, maciej.rutecki, maximlevitsky, rjw, rui.zhang, shawn.starr
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35-rc series Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 16055    
Attachments: dmidecode for the everex stepnote sa2053t
acpidump of an everex stepnote sa2053t
iomem from an everex stepnote sa2053t
lspnp -vv from everex stepnote sa2053t
ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
lspnp -vv for fujitsus-siemens amilo xi 3650
cat /proc/iomem
kernel log saver
dmi for fujitsu XI 3650
PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs
PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs
PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs
dmidecode from Averatec AV1020-ED2, a system that needs to be blacklisted
ACPI / PM: Blacklist Averatec machine known to require acpi_sleep=nonvs
dmidecode output of Sony Vaio VGN-SR5 to be blacklisted
dmidecode output for affected laptop Sony VAIO VGN-SR26GN
ACPI / PM: Blacklist Sony Vaio machine known to require acpi_sleep=nonvs

Description tomas m 2010-07-15 02:32:17 UTC
bisected to commit 2a6b69765ad794389f2fc3e14a0afa1a995221c2

ACPI: Store NVS state even when entering suspend to RAM


reverted commit on 2.6.35-rc5 and the system can suspend/resume correctly again

im not sure what kind of hardware details i must report for this.
Comment 1 Rafael J. Wysocki 2010-07-15 08:53:52 UTC
Please attach the output of dmidecode from the affected box.
Comment 2 tomas m 2010-07-15 10:39:30 UTC
Created attachment 27112 [details]
dmidecode for the everex stepnote sa2053t
Comment 3 Matthew Garrett 2010-07-15 12:17:11 UTC
Hm. Safest thing might be to revert this for now. I'll try to figure out what the trigger is.

Tomas, could you please attach the output of the acpidump command?
Comment 4 tomas m 2010-07-15 15:58:48 UTC
Created attachment 27121 [details]
acpidump of an everex stepnote sa2053t
Comment 5 Maxim Levitsky 2010-07-18 23:24:41 UTC
@tomas m

Could you explain in detail how system is failing to suspend?

Does it fail on first suspend cycle?
Does system hang on suspend or resume?

Is system completely dead when it hangs?

What windows version was shipped on this system?


@Matthew Garrett

Now I am by no means a Linux zealot or MS hater, but things like that I think can make anyone hate this company.

I mean they break the spec, break suspend on my laptop (I was without suspend for 2 years), now we try to replicate their bug, and the worst thing happens: it breaks this system....
Comment 6 tomas m 2010-07-19 00:40:06 UTC
(In reply to comment #5)
> @tomas m
> 
> Could you explain in detail how system is failing to suspend?

system suspends correctly (power button blinks as it should).

upon resume, the display does not light up (no backlight). 
sysrq-reisub ignored. no panic, nothing.

> 
> Does it fail on first suspend cycle?

yes, first suspend cycle, 100% of the time.

> 
> Is system completely dead when it hangs?

Yes, completely dead. although the power button lights up as it should when it resumes from suspend.

 
> What windows version was shipped on this system?

vista.



what i may add, this is a notebook that suffers from a crazy bios issue where some reserved space is not being reserved (or something like that), check bug 9905.

to get this working, i had to add to the kernel boot parameters the command reserve=0xffb00000,0x100000

this might have something to do with it. i will try and test a broken kernel with troublesome modules removed before suspending, and will report back.
Comment 7 tomas m 2010-07-19 02:05:53 UTC
> 
> this might have something to do with it. i will try and test a broken kernel
> with troublesome modules removed before suspending, and will report back.

and the results:

kernel 2.6.35-rc5 + commit

without sdhci or 8139too (offending modules), the system  can suspend/resume correctly.

with sdhci or 8139too loaded, the system hangs upon resume.


so what is the appropriate course of action here? am i expected to deal with this myself (remove modules before suspend) or is this still a valid bug and a fix should be pursued?

is there a way to store the reserved memory map during suspend? im not quite sure i understand what this commit does.
Comment 8 Maxim Levitsky 2010-07-19 12:36:58 UTC
First, sure this is a valid bug.

Then, the 'crazy issue with bios' is a very good clue, it almost explains things.

Please give output of cat /proc/iomem and lspnp -vv
Comment 9 Maxim Levitsky 2010-07-19 12:58:43 UTC
Other question, does suspend to disk work?
Did it work in 2.6.34?
Comment 10 tomas m 2010-07-19 15:28:16 UTC
(In reply to comment #8)
> First, sure this is a valid bug.
> 
> Then, the 'crazy issue with bios' is a very good clue, it almost explains
> things.
> 
> Please give output of cat /proc/iomem and lspnp -vv

in reply to both of your messages,

no, it cannot hibernate. i had never tested that before, it hangs right after loading ram contents.


im attaching the contents of /proc/iomem

i cannot find lspnp within my distribution. do you have a place where i can download this?
Comment 11 tomas m 2010-07-19 15:29:05 UTC
Created attachment 27154 [details]
iomem from an everex stepnote sa2053t
Comment 12 Matthew Garrett 2010-07-19 15:32:39 UTC
Ok, I think bug 9905 needs to be fixed before we can consider this a problem.
Comment 13 tomas m 2010-07-19 15:54:23 UTC
(In reply to comment #12)
> Ok, I think bug 9905 needs to be fixed before we can consider this a problem.

take a swing at it.. im willing to test whatever you may think necessary.

i think the people with the right knowledge isnt interested in it anymore.

there is a patch flying around but it was considered an ugly hack back then.
Comment 14 Maxim Levitsky 2010-07-19 18:00:59 UTC
Could you add "acpi_sleep=s4_nonvs" and try suspend to disk?
Comment 15 Maxim Levitsky 2010-07-19 18:01:26 UTC
I mean add to kernel command line
Comment 16 Maxim Levitsky 2010-07-19 18:04:15 UTC
And about lspnp, it is indeed not installed by default (here on ubuntu).
You should install 'pnputils' or something similar.

Here on ubuntu,
sudo apt-get install pnputils
Comment 17 tomas m 2010-07-19 18:28:52 UTC
(In reply to comment #14)
> Could you add "acpi_sleep=s4_nonvs" and try suspend to disk?

yes, after adding this line, i could resume from disk successfully. is it advised to have this line permanently there?

this was tested with 2.6.34.1 should i test a broken 2.6.35?
Comment 18 Matthew Garrett 2010-07-19 18:32:43 UTC
Ok. I think it's clear that this patch isn't the problem. Once 9905 is fixed things ought to work fine.
Comment 19 tomas m 2010-07-19 18:34:24 UTC
Created attachment 27157 [details]
lspnp -vv from everex stepnote sa2053t
Comment 20 tomas m 2010-07-19 18:39:40 UTC
(In reply to comment #18)
> Ok. I think it's clear that this patch isn't the problem. Once 9905 is fixed
> things ought to work fine.

so, is there any way i can draw attention back to 9905? it appears to have been a dead report for over a year..
Comment 21 Maxim Levitsky 2010-07-19 18:40:17 UTC
Thanks!

@Matthew Garrett , I think for now we can just make sure that 'acpi_sleep=s4_nonvs' disables NVS restore on suspend to ram.

So, the problem is quite clear:

The dimwit BIOS programs the sdhci and NIC device BARs into a region that is used by the chipset itself, but reserves that region in the ACPI tables.

Windows 'fixes' that problem by relocating the BARs out of the reserved area.
'reserve=0xffb00000,0x100000' achieves the same effect.

However, the NVS region restore somehow undoes the fix.

I would suggest to do following test:

Boot into latest git kernel without the patch reverted, unload both sdhci, and 8139too

save output of 'sudo lspci -H1 -vvvxxx', then do suspend to ram cycle, then save the output of former command again, and post it.
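
The overlap described in this comment can be checked against a /proc/iomem dump with a few lines of Python. This is only a sketch: the sample text and the device names in it are made up for illustration (they are not taken from the attached iomem file), and the only value from this report is the reserve=0xffb00000,0x100000 range.

```python
import re

# Matches lines such as "  ffb00000-ffb000ff : some-device" from /proc/iomem.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def parse_iomem(text):
    """Return (start, end, name) tuples; 'end' is inclusive, as in /proc/iomem."""
    regions = []
    for line in text.splitlines():
        match = IOMEM_LINE.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            regions.append((start, end, match.group(3)))
    return regions


def reserve_range(base, size):
    """Turn a reserve=<base>,<size> pair into an inclusive (lo, hi) range."""
    return base, base + size - 1


def overlapping(regions, lo, hi):
    """Return the regions that intersect the inclusive range [lo, hi]."""
    return [r for r in regions if r[0] <= hi and r[1] >= lo]


# Hypothetical sample, NOT the real iomem of the affected notebook.
SAMPLE = """\
ffa00000-ffafffff : hypothetical PCI Bus
  ffb00000-ffb000ff : hypothetical-sdhci
ffc00000-ffc00fff : hypothetical-nic
"""

lo, hi = reserve_range(0xFFB00000, 0x100000)
hits = overlapping(parse_iomem(SAMPLE), lo, hi)
for start, end, name in hits:
    print(f"{start:08x}-{end:08x} : {name} overlaps the reserved range")
```

To use it on a real system, the contents of /proc/iomem (read as root, otherwise the addresses are shown as zeros on newer kernels) would replace SAMPLE.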
Comment 22 tomas m 2010-07-19 18:58:20 UTC
hmmm, tried to do just that, i cant seem to be able to resume from suspend with the modules removed anymore...

i might have screwed up before :(
Comment 23 tomas m 2010-07-19 21:06:50 UTC
ive been thinking a bit about this, and how i could have swapped and booted into a working kernel....but it cant be i screwed up, i dont know what changed from the previous test and this one. i used the same kernel build im using right now.

i did manage to suspend and resume removing the modules as described in comment 7, and after inserting the module, the second suspend hung the system. so. the question lies...

WTF?! ... im lost.
Comment 24 Rafael J. Wysocki 2010-07-20 15:16:11 UTC
(In reply to comment #12)
> Ok, I think bug 9905 needs to be fixed before we can consider this a problem.

I'm not sure.  Apparently, suspend/resume worked on the box successfully before commit 2a6b69765ad794389f2fc3e14a0afa1a995221c2 the bug #9905 problem notwithstanding, so to be fair, we should at least provide a workaround for this issue.

tomas, I'll try to prepare a patch for you to try later today.
Comment 25 Rafael J. Wysocki 2010-07-20 19:39:51 UTC
Created attachment 27170 [details]
ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM

tomas, please try if suspend works for you with this patch applied when you add the acpi_sleep=nonvs option to the kernel command line (hibernation should work with this option as well).
Comment 26 Rafael J. Wysocki 2010-07-20 19:43:15 UTC
The patch is on top of the current mainline (2.6.35-rc5 + later updates), BTW.
Comment 27 tomas m 2010-07-20 22:07:08 UTC
(In reply to comment #25)
> Created an attachment (id=27170) [details]
> ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
> 
> tomas, please try if suspend works for you with this patch applied when you
> add
> the acpi_sleep=nonvs option to the kernel command line (hibernation should
> work
> with this option as well).

yes, this patch works.

tested without and with xorg. twice to be sure this time ;)
Comment 28 Rafael J. Wysocki 2010-07-20 22:30:04 UTC
Thanks for the confirmation, I'm going to send the patch upstream shortly.
Comment 29 Bjorn Helgaas 2010-07-21 14:37:09 UTC
I don't understand this, Rafael.  Are you saying that this bug is resolved because the user can avoid the hang by booting with "acpi_sleep=nonvs"?

That doesn't seem like a great solution to me.  I think we need something
that works automatically, with no special options.  We can't expect users
to experience a hang, research it, and figure out an arcane option that
works around it.  But maybe I'm missing something.
Comment 30 Rafael J. Wysocki 2010-07-21 19:40:52 UTC
First, the system has a known-broken BIOS (we're talking about systems that most probably don't work with Windows here).

Second, the other option would be to change the default to 'nosave' for suspend to RAM, but then some other users would need to use a command line switch to make their systems work.  That actually would be backwards compatible, but then the chance a user would discover the switch to use would be minimal and the default behavior would be different for S3 and S4.

I think that would be suboptimal.
Comment 31 Rafael J. Wysocki 2010-07-21 20:22:50 UTC
Patch : https://patchwork.kernel.org/patch/113108/
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Comment 32 Bjorn Helgaas 2010-07-26 17:11:43 UTC
I'm still hopeful that we can find a generic solution that doesn't require
a kernel argument, so here are some breadcrumbs for future investigation.

Arne Fitzenreiter reports in bug 9905 comment 108 that he has a similar
laptop (Averatec 2400), and suspend to disk and ram works correctly under
Windows XP.  With Ubuntu 10.04, suspend to ram works but suspend to
disk fails.

I think bug 9905 is caused by the fact that the BIOS put some PCI devices
at addresses that conflict with a PNP0C02 motherboard device.  In this case,
Windows moves the PCI devices, but Linux currently does not (see bug 9905
comment 110).  The Linux workaround is to boot with "reserve=", which
forces us to move the PCI devices.

I hoped that a fix for bug 9905 would also fix this bug, but based on
the following observations by Tomas, that doesn't seem likely:

  <2.6.34 "reserve=0xffb00000,0x100000 acpi_sleep=nonvs" suspends OK
  <2.6.34 "pci=use_crs acpi_sleep=nonvs" suspends OK
   2.6.34 "reserve=0xffb00000,0x100000" suspends OK
   2.6.35 "reserve=0xffb00000,0x100000" hangs during resume
Comment 33 Len Brown 2010-07-26 17:15:54 UTC
the acpi_sleep=nonvs patch from comment #31 is now upstream,
after 2.6.35-rc6

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=72ad5d77fb981963edae15eee8196c80238f5ed0
Comment 34 Eric Valette 2010-07-29 14:13:46 UTC
See also different reports from bug 16311. I also find that requiring a special kernel command option for users that used to have a working sleep/suspend without option is NOT a solution. How many people will be hurt by the problem compared with those that require the NVS store/restore?

It's a regression, so fix the new code, or make previous behavior the default until you have a working solution.
Comment 35 tomas m 2010-07-29 14:20:40 UTC
(In reply to comment #34)
> See also different reports from bug 16311. I also find that requiring a
> special kernel command option for users that used to have a working
> sleep/suspend without option is NOT a solution. How many people will be hurt
> by the problem compared with those that require the NVS store/restore?
> 
> It's a regression, so fix the new code, or make previous behavior the default
> until you have a working solution.


i thought my notebook was the only one being affected by the store NVS code.

if its more widespread, then i can only agree with you
Comment 36 Eric Valette 2010-07-29 14:55:17 UTC
Created attachment 27296 [details]
lspnp -vv for fujitsus-siemens amilo xi 3650
Comment 37 Eric Valette 2010-07-29 14:56:18 UTC
Created attachment 27297 [details]
cat /proc/iomem
Comment 38 Rafael J. Wysocki 2010-07-29 22:55:22 UTC
*** Bug 16311 has been marked as a duplicate of this bug. ***
Comment 39 Rafael J. Wysocki 2010-07-29 23:59:00 UTC
(In reply to comment #34)
> See also different reports from bug 16311. I also find that requiring a
> special kernel command option for users that used to have a working
> sleep/suspend without option is NOT a solution. How many people will be hurt
> by the problem compared with those that require the NVS store/restore?

Since the new behavior is what Windows does, there likely are more systems that require the NVS store/restore, because that's what hardware vendors test.  The problem is if we go back to the old behavior, we'll never know.

BTW, does your system require acpi_sleep=nonvs to hibernate correctly?

> It's a regression, so fix the new code, or make previous behavior the default
> until you have a working solution.

The only "solution" we can use at this point is to choose the default and blacklist the systems that need to be handled differently.
Comment 40 Eric Valette 2010-07-30 08:31:26 UTC
1) You marked bug 16311 as a duplicate of this one although, given its number, it is older. Fine, but at least take care of the information bug 16311 contains,
2) I explicitly state in bug 16311 that acpi_sleep=nonvs fixes my suspend problem,
3) We currently have 3 brands (IBM, HP, Fujitsu) that break with the new code. I bet you will get a lot more as soon as people try the official 2.6.35 kernel,
4) If you find a way to make it work when enabled, fine; if not, you know that you will release a kernel with a known severe regression for laptop users. You have no clue about the percentage of failing systems compared to the percentage of people that will benefit from it. So why explicitly allow a known regression with no obvious benefit?
5) When you say it's what Windows does, maybe it does it in a different/better way (e.g. remapping the wrong BIOS IO region, as done on boot, if I read the IO region problem correctly)
Comment 41 Maxim Levitsky 2010-07-30 14:05:21 UTC
I want to note that at least on 'tomas m''s system, NVS save/restore is broken regardless of it being performed during suspend to disk or suspend to ram.

Do you know a system where suspend to disk works (without acpi_sleep=s4_nonvs of course), but suspend to ram is broken?

Save/Restore of NVS during suspend to disk *is* ACPI compliant.

This commit just exposed more systems that break on NVS save/restore, and this probably just means that skipping the NVS save/restore accidentally works around some Linux or BIOS bug.
Comment 42 Maxim Levitsky 2010-07-30 14:05:59 UTC
'Do you know a system where suspend to disk works (without acpi_sleep=s4_nonvs of
course), but suspend to ram is broken?'

I mean broken by the commit in question.
Comment 43 Eric Valette 2010-07-30 19:03:19 UTC
If the ACPI specification says NVS has to be saved and restored for suspend to disk, fine with me.

I just complain that extending this behavior to suspend to RAM: 
1) just because Windows does it,
2) in a post-RC6 time frame, 
3) when it obviously breaks the suspend to RAM feature on many computers,

does not sound reasonable to me. And BTW, reading the patch, the default value for S4 and S3 can be different...


BTW, I know how to make it work for my case, so don't take it as a selfish comment.
Comment 44 Maxim Levitsky 2010-07-31 12:14:48 UTC
It does sound selfish :-)

@Eric Valette
Let's actually do some constructive work instead of complaining, ok?
(Don't forget that I didn't have s2ram for 2 years, and I didn't have any workarounds to make it work...)

What I want to know is:

1) What systems you know are affected.
2) DMI information from affected systems
3) Test if hibernation works on these systems
4) If no, test if 'acpi_sleep=s4_nonvs' 'fixes' the hibernation.
5) Explanation (if possible) how exactly s2ram and s2disk don't work.
(For example does it hang on first suspend, first resume, second suspend, etc..)


@thomas:

If you have spare time, it would be great if you could do some debugging on why the system hangs if NVS is saved/restored. This might give us some insight into what is going on.

This is what I want you to do:

1) Don't use acpi_sleep=nonvs
2) Verify that s2ram fails

3) Verify that the following patch 'fixes' the s2ram problem.
It is nothing to get excited about, but it would prove that restoration of NVS
is the problem.


diff --git a/kernel/power/nvs.c b/kernel/power/nvs.c
index 1836db6..0a4a54a 100644
--- a/kernel/power/nvs.c
+++ b/kernel/power/nvs.c
@@ -127,6 +127,7 @@ void suspend_nvs_save(void)
 void suspend_nvs_restore(void)
 {
        struct nvs_page *entry;
+       return;
 
        printk(KERN_INFO "PM: Restoring platform NVS memory\n");
 


4) Apply my printk blackbox patch I attach.
I have updated it today to work with no configuration at all.
It allows you to look at kernel log of crashed kernel.
It is a bit hackish, and I don't have the time and will power
to make it acceptable upstream...

turn on in kernel config the
CONFIG_DEBUG_FS
CONFIG_HWMEM_PRINTK
(and leave HWMEM_PRINTK_DEFAULT_ADDRESS to default)


5) In kernel configuration, turn on:
CONFIG_DETECT_SOFTLOCKUP
CONFIG_DETECT_HUNG_TASK
(These turn on mechanisms to detect hangs)

Create file '/etc/sysctl.d/90-local.conf' with contents:

kernel.panic = 20
kernel.panic_on_oops = 1
kernel.softlockup_panic = 1
kernel.hung_task_panic = 1
kernel.hung_task_timeout_secs = 30

These make sure that the system reboots as soon as it hangs/panics/oopses...
(There is still a delay of about a minute until the system reboots)

Also, to make hang detection even better, add 'nmi_watchdog=lapic' to the kernel command line.

6) Compile the kernel & boot into it

7) Verify that you did everything right:
(Except first command, output should be exactly the same)
maxim@maxim-laptop:~$ cat /proc/cmdline | grep nmi
BOOT_IMAGE=/boot/vmlinuz-2.6.35-rc6+ root=UUID=52341b68-74f3-4c96-aaf8-7586a06c4b4e ro splash vga=791 nmi_watchdog=lapic

maxim@maxim-laptop:~$ cat /proc/sys/kernel/panic
20
maxim@maxim-laptop:~$ cat /proc/sys/kernel/panic_on_oops 
1
maxim@maxim-laptop:~$ cat /proc/sys/kernel/softlockup_panic 
1
maxim@maxim-laptop:~$ cat /proc/sys/kernel/hung_task_panic 
1
maxim@maxim-laptop:~$ cat /proc/sys/kernel/hung_task_timeout_secs 
30
maxim@maxim-laptop:~$ dmesg | grep Logging
[    0.592411] Logging kernel messages into HW memory at 0x03c00000
maxim@maxim-laptop:~$ 
maxim@maxim-laptop:~$ ls /sys/kernel/debug/printk/crash_dmesg 
/sys/kernel/debug/printk/crash_dmesg
maxim@maxim-laptop:~$ 


8) Now suspend/resume the system.
Wait till the system reboots (wait about 1.5 minutes; if it doesn't, then this strategy failed)

9) As soon as the system reboots, make sure it boots into the same kernel (and be sure not to power the system off, or boot into another kernel)

10) now get the results:

sudo cat /sys/kernel/debug/printk/crash_dmesg | strings > $HOME/old_dmesg


I have put that into a script:

maxim@maxim-laptop:~$ cat /home/maxim/bin/kernel/blackbox 
#! /bin/bash

sudo cat /sys/kernel/debug/printk/crash_dmesg | strings > /home/maxim/old_dmesg
cat /home/maxim/old_dmesg


11) If output looks reasonable, post it.
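
The manual checks in step 7 can also be scripted. The following is a minimal sketch, assuming the values from the '/etc/sysctl.d/90-local.conf' file above; the function name and the 'root' parameter are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path

# Expected values, copied from the sysctl settings listed in step 5.
EXPECTED = {
    "panic": "20",
    "panic_on_oops": "1",
    "softlockup_panic": "1",
    "hung_task_panic": "1",
    "hung_task_timeout_secs": "30",
}


def check_sysctls(root="/proc/sys/kernel", expected=EXPECTED):
    """Return {name: (wanted, found)} for every setting that does not match.

    'found' is None when the file is missing or unreadable, which is what
    happens if the corresponding kernel config option is not enabled.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = (Path(root) / name).read_text().strip()
        except OSError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling check_sysctls() with no arguments reads the live values; an empty dict means every setting from step 5 is in effect. It does not check the nmi_watchdog option or the blackbox debugfs file, which still need the commands from step 7.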
Comment 45 Maxim Levitsky 2010-07-31 12:17:15 UTC
Created attachment 27305 [details]
kernel log saver

this is what I call a kernel blackbox...
Comment 46 Eric Valette 2010-07-31 12:56:21 UTC
If you do not do your own part of the work, do not be surprised if I complain!
Did you ever read the comments that are in the bug that is marked as duplicate (bug 16311)?

The bug is not in the restore path, as the laptop does not even finish suspending. That's the case for at least two of the three laptop users that have complained so far, and I dunno for the third case. I have been unable to find anything in the log when I force a hard reboot by pressing the power button for 5 s.

Concerning laptops that do not work, again, read the bug. You will get two laptops there. Then do a grep in the kernel mailing list to find a third, where acpi_sleep=nonvs is involved: <http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01172.html>
Comment 47 tomas m 2010-07-31 12:59:27 UTC
Eric, this is not a support forum, its a bug report, please treat it as such

@Maxim: im working on the reports. the nvs patch failed to apply for some obscure reason, did apply it manually.

and btw, its Tomas, without an h ;)
Comment 48 Eric Valette 2010-07-31 13:01:25 UTC
Created attachment 27307 [details]
dmi for fujitsu XI 3650

dmi as requested.
Comment 49 Eric Valette 2010-07-31 13:04:49 UTC
(In reply to comment #47)
> Eric, this is not a support forum, its a bug report, please treat it as such

I do not need support. I have a working solution. I just complain about closing a bug when the proposed solution will introduce an unknown regression for an unknown number of users.
Comment 50 Maxim Levitsky 2010-07-31 13:12:56 UTC
Eric Valette, thanks for report.

Note that this isn't related at all because acpi_sleep=nonvs was just suggested to try and didn't help.
'http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01172.html'
Comment 51 Maxim Levitsky 2010-07-31 13:18:09 UTC
@thomas, small note, you probably noticed already that after step 3, you need to remove the nvs one line test. Its just small test for step 3 only.

I also just copypasted it from terminal, so probably something  mangled it.
But you get the idea: we save NVS region on suspend, and restore it on resume.
I want to skip the restoration on resume to see if it causes problem on your system.

(I do realize that on another system the 'store' of the NVS region hangs the system, which is pretty weird...)
Comment 52 Eric Valette 2010-07-31 13:22:02 UTC
(In reply to comment #50)
> Eric Valette, thanks for report.
> 
> Note that this isn't related at all because acpi_sleep=nonvs was just
> suggested to try and didn't help.
> 'http://lkml.indiana.edu/hypermail/linux/kernel/1007.3/01172.html'

That is not the way I read it. He says that console=tty0 is set by default on his system due to the fact that kubuntu uses "quiet splash", and that indeed *adding* acpi_sleep=nonvs fixes the problem.

What he also says is that without console=tty0 it does not suspend, but as that was not the default for his system, the problem may have existed before or may be unrelated.
Comment 53 Maxim Levitsky 2010-07-31 13:34:14 UTC
ok, you are right here.
Comment 54 tomas m 2010-07-31 14:02:18 UTC
after step 3)

it did suspend / resume correctly, so it does hang after restoring the nvs state.

applying the debug info:

during resume. the kernel panics (keyboard leds flashing)

after about 30 secs, the keyboard leds stay on, and the system does not reboot.

i did check all the debug stuff got applied in /proc/sys/kernel/ , dmesg and friends...


suggestions?
Comment 55 Maxim Levitsky 2010-07-31 14:12:23 UTC
You did test

maxim@maxim-laptop:~$ cat /proc/sys/kernel/panic
20
maxim@maxim-laptop:~$ cat /proc/sys/kernel/panic_on_oops 
1
maxim@maxim-laptop:~$ cat /proc/sys/kernel/softlockup_panic 
1
maxim@maxim-laptop:~$ cat /proc/sys/kernel/hung_task_panic 
1
maxim@maxim-laptop:~$ cat /proc/sys/kernel/hung_task_timeout_secs 
30
Comment 56 tomas m 2010-07-31 14:14:31 UTC
(In reply to comment #55)
> You did test
> 
> maxim@maxim-laptop:~$ cat /proc/sys/kernel/panic
> 20
> maxim@maxim-laptop:~$ cat /proc/sys/kernel/panic_on_oops 
> 1
> maxim@maxim-laptop:~$ cat /proc/sys/kernel/softlockup_panic 
> 1
> maxim@maxim-laptop:~$ cat /proc/sys/kernel/hung_task_panic 
> 1
> maxim@maxim-laptop:~$ cat /proc/sys/kernel/hung_task_timeout_secs 
> 30

yes, thats what i meant by

> i did check all the debug stuff got applied in /proc/sys/kernel/ , dmesg and
friends...
Comment 57 Eric Valette 2010-07-31 14:19:32 UTC
So your non-working case is different from mine: my laptop locks up before completing the suspend path, apparently in exactly the same way as in the kernel thread already mentioned above.
Comment 58 Maxim Levitsky 2010-07-31 14:22:05 UTC
weird, but what else to expect from debugging. Still we are close.
I assume you don't have a reset button on your system
If you do just hit it.

try to add 'reboot=pci' and then 'reboot=acpi' and 'reboot=kbd' to kernel command line, just a guess, maybe will help.
Also wait a bit longer after panic.

The procedure I described here is what I use.
There are other ways to get kernel log, I'll look at them soon.
Comment 59 tomas m 2010-07-31 14:25:10 UTC
(In reply to comment #58)
> weird, but what else to expect from debugging. Still we are close.
> I assume you don't have a reset button on your system
> If you do just hit it.
> 
> try to add 'reboot=pci' and then 'reboot=acpi' and 'reboot=kbd' to kernel
> command line, just a guess, maybe will help.
> Also wait a bit longer after panic.
> 
> The procedure I described here is what I use.
> There are other ways to get kernel log, I'll look at them soon.

i thought about that already, but no, i dont have a reset button.. gonna check if i can change the power button's behaviour through the bios, but i dont think its possible..
Comment 60 Eric Valette 2010-07-31 16:59:31 UTC
As your mail suggested a way to catch software lockups, I added CONFIG_DETECT_SOFTLOCKUP and all (except the ones that are related to the blackbox patch) and used sysfs to set the suggested values for timeout, and it fixes the lockup during suspend!!! And I can also resume without problem.

Grrr. Will not be easy to debug! But at least it probably makes a BIOS/ACPI related bug less likely. It probably means either a race somewhere that the additional code closes, or that the change in the code/data mapping makes the problem vanish.
Comment 61 Eric Valette 2010-07-31 17:45:19 UTC
Double checked, I removed only DETECT_SOFTLOCKUP and BOOTPARAM_SOFTLOCKUP_PANIC and the bug reappears. So its clearly an implementation bug...

Again, make nonvs the default for suspend to ram until we know how to fix the regression/bug when enabled...
Comment 62 Maxim Levitsky 2010-07-31 19:50:47 UTC
@Eric Valette

I would suggest you throw in my blackbox patch, disable the softlockup detector, and then as soon as the system hangs try to reboot the system with the sysrq key
(and before that print locked tasks.... ctrl+alt+sysrq+w, ctrl+alt+sysrq+s, ctrl+alt+sysrq+b)

What surprised me is that softlockup detection is arch agnostic, and pretty much doesn't touch hardware.

@thomas, you could try that too.

Best regards,
        Maxim Levitsky
Comment 63 Eric Valette 2010-07-31 21:45:16 UTC
Unfortunately, while sysrq works correctly on the tty on my machine (tried explicitly on a healthy system), it does not seem to work when the system is locked in the suspend path with a blinking cursor on top of the newly switched to tty.

have to press 4 keys (printscreen needs the FN key) but anyway, tried and succeeded in healthy case not in the hung case.
Comment 64 tomas m 2010-07-31 22:19:27 UTC
(In reply to comment #62)

> 
> @thomas, you could try that too.
> 

i dont think it will make a difference. Where Eric can use the sysrq or even make things work without nonvs, i cant. once the system locks.. NOTHING works here.

i could try and add a loop ignore during the nvs restore assuming list_for_each_entry(entry, &nvs_list, node) is actually a loop and might prove helpful.

otherwise, just ignore what i said.
Comment 65 Eric Valette 2010-08-01 09:57:18 UTC
apparently, when hung I can *only* do a hardware reset (5s on power button is managed entirely by hardware as far as I know).
Comment 66 Rafael J. Wysocki 2010-08-01 13:10:50 UTC
Ignore-Patch : https://patchwork.kernel.org/patch/113108/
Comment 67 Shawn Starr 2010-08-02 02:28:42 UTC
I will note that in this kernel: 2.6.35-0.57.rc6.git1.fc14.x86_64 I am able to suspend to ram/resume.
Comment 68 Shawn Starr 2010-08-02 02:38:46 UTC
Correction, it doesn't always work. And when I just tried the moon flashed on/off yet I was able to continue using laptop for the most part.

I have not tried acpi_sleep=nonvs yet.
Comment 69 tomas m 2010-08-02 18:31:23 UTC
eric,

since your laptop fails to suspend, i thought this thread might be of relevance: 

https://kerneltrap.org/mailarchive/linux-kernel/2010/8/1/4600853
Comment 70 Eric Valette 2010-08-02 20:01:39 UTC
Yes, especially because I do use an Intel chipset (at least as long as hybrid graphics is broken or I need to plug in the HDMI output and use the nvidia one). 

However, I just found time to try 2.6.35, and suspend to ram works (at least several times in a row). This bug is very sensitive either to timing or code/data layout, or was not solely related to the NVS save/restore but had mixed causes (e.g. a timing change due to the additional time taken to store the NVS content and the likelihood of having a certain irq/hardware event in the middle)!!!

As the hang in my case happens when switching from the lockscreen to the tty, of course i915 driver is a very good suspect, especially because some fixes went in before 2.6.35 and the last version I tested.

Anyway thanks for the pointer.
Comment 71 Eric Valette 2010-08-03 20:20:57 UTC
Apparently the bug is still there. It failed to suspend already once...

Will try the suggested fix/workaround.
Comment 72 tomas m 2010-08-08 21:03:39 UTC
ive hit a similar hang twice since using acpi_sleep=nonvs.

out of hundreds of suspends/resume cycles. the system would not come back from suspend. (it tries to, but screen stays black, and alt-sysrq will not work) just like it did when i first posted the bug.

could there be a race condition somewhere we are missing which is being triggered by nvs saves? and highly unlikely without the saves? 

is there any way i could test this out? for example, instead of restoring the nvs states, simply delay a bit and see if it hangs that way? 

how could i implement such a delay?


thanks
Comment 73 Rafael J. Wysocki 2010-09-12 18:46:36 UTC
That most likely is a separate issue.

Well, I guess we're not going to really fix this problem, so please use the
acpi_sleep=nonvs workaround if necessary.

We may want to create a DMI-based list of systems that require it, so please
drop your system's DMI information here if you are affected.
Comment 74 Zhang Rui 2010-09-14 01:47:42 UTC
Bug closed as the commit below is in the upstream kernel.
Laptops with this bug need the kernel parameter acpi_sleep=nonvs as a workaround.

commit 72ad5d77fb981963edae15eee8196c80238f5ed0
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Fri Jul 23 22:59:09 2010 +0200

    ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
    
    Commit 2a6b69765ad794389f2fc3e14a0afa1a995221c2
    (ACPI: Store NVS state even when entering suspend to RAM) caused the
    ACPI suspend code save the NVS area during suspend and restore it
    during resume unconditionally, although it is known that some systems
    need to use acpi_sleep=s4_nonvs for hibernation to work.  To allow
    the affected systems to avoid saving and restoring the NVS area
    during suspend to RAM and resume, introduce kernel command line
    option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
    alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
    file).
    
    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .
    
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
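
To check whether a given boot actually requested the workaround, the kernel command line can be inspected from userspace. This is a rough sketch, not the kernel's own parser: it assumes acpi_sleep= takes a comma-separated option list and uses exact string comparison, and the function name is hypothetical. It treats s4_nonvs as an alias of nonvs, as the commit message above describes.

```python
# Values of acpi_sleep= that, per the commit message above, skip NVS save/restore.
# "s4_nonvs" is the temporary alias of "nonvs".
NVS_SKIP_OPTIONS = {"nonvs", "s4_nonvs"}


def nvs_saving_skipped(cmdline):
    """Return True if the command line asks the kernel to skip NVS save/restore."""
    for param in cmdline.split():
        key, sep, value = param.partition("=")
        if key != "acpi_sleep" or not sep:
            continue
        # Assumption: options are given as a comma-separated list.
        if NVS_SKIP_OPTIONS & set(value.split(",")):
            return True
    return False
```

On a running system the input would typically be the contents of /proc/cmdline. The result reflects only the command line, not any built-in list of affected machines.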
Comment 75 Rafael J. Wysocki 2010-09-20 17:27:59 UTC
Created attachment 30782 [details]
PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs

This patch blacklists some systems known to require acpi_sleep=nonvs to resume correctly, so that it's not necessary to add acpi_sleep=nonvs on them to the kernel command line.

I added the dmidecode information for the everex stepnote sa2053t and fujitsu XI 3650 to it.

Please test it and let me know if it works for you.
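
For anyone wondering what a DMI-based blacklist amounts to, the idea can be sketched in userspace as follows. This is an illustration only, not the patch itself: the table entries are made-up placeholders (the real strings have to come from the dmidecode attachments), the function names are hypothetical, and substring matching is an assumption about how such entries are compared.

```python
from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")

# Hypothetical entries for illustration only; real vendor/product strings
# must be taken from dmidecode output, not from this sketch.
NVS_NOSAVE_TABLE = [
    {"sys_vendor": "Example Vendor A", "product_name": "Example Model 1"},
    {"sys_vendor": "Example Vendor B", "product_name": "Example Model 2"},
]


def read_dmi(fields=("sys_vendor", "product_name"), base=DMI_DIR):
    """Read the requested DMI fields from sysfs; missing fields become ''."""
    info = {}
    for name in fields:
        path = Path(base) / name
        info[name] = path.read_text().strip() if path.exists() else ""
    return info


def needs_nonvs(dmi, table=NVS_NOSAVE_TABLE):
    """Return True if some entry matches: each wanted string is a substring
    of the corresponding DMI field (assumed matching rule)."""
    return any(
        all(wanted in dmi.get(field, "") for field, wanted in entry.items())
        for entry in table
    )
```

Calling needs_nonvs(read_dmi()) on a machine then says whether it would be covered by the placeholder table.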
Comment 76 Rafael J. Wysocki 2010-09-20 17:29:15 UTC
*** Bug 18572 has been marked as a duplicate of this bug. ***
Comment 77 Eric Valette 2010-09-20 18:45:40 UTC
Please remove the fujitsu from the blacklist, as it works well now with 2.6.36-rc4-git4, unless someone with an up-to-date kernel still has the problem.
Comment 78 tomas m 2010-09-20 19:10:54 UTC
patch works for me (everex stepnote)
Comment 79 Rafael J. Wysocki 2010-09-20 19:53:50 UTC
Created attachment 30822 [details]
PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs

Thanks for the testing and feedback. An updated patch is attached.
Comment 80 Rafael J. Wysocki 2010-09-20 19:55:07 UTC
Ignore-Patch : https://patchwork.kernel.org/patch/113108/
Patch : https://bugzilla.kernel.org/attachment.cgi?id=30822
Comment 81 Rafael J. Wysocki 2010-09-24 20:47:00 UTC
Created attachment 31422 [details]
PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs
Comment 83 Len Brown 2010-09-29 01:28:52 UTC
commit 539986482b0db07b7164ab086d167ab99b4d3061
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Fri Sep 24 16:46:14 2010 -0400

    PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs

is in acpi-release, staged for upstream
Comment 84 Len Brown 2010-10-02 02:07:28 UTC
commit 539986482b0db07b7164ab086d167ab99b4d3061
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Fri Sep 24 16:46:14 2010 -0400

    PM / ACPI: Blacklist systems known to require acpi_sleep=nonvs

shipped in Linux-2.6.36-rc6-git2
closed
Comment 85 D. Hugh Redelmeier 2010-12-13 06:04:28 UTC
My Averatec AV1020-ED2 notebook does not resume from sleep under Ubuntu 10.10 or Fedora 14.  It does resume from sleep under Ubuntu 9.10.

With acpi_sleep=nonvs, resume from sleep does work under Fedora 14.  This is with Fedora's kernel-2.6.35.9-64.fc14.i686 kernel.  I haven't tested with Ubuntu 10.10.

Hibernate works on Fedora 14 with or without the acpi_sleep=nonvs parameter.  I haven't tested with Ubuntu 10.10.

I will attach dmidecode output so that this system can be blacklisted.

Thanks.  This problem has taken a while for me to figure out.  It would be a kindness to others if you could add the blacklisting.
Comment 86 D. Hugh Redelmeier 2010-12-13 06:08:30 UTC
Created attachment 39952 [details]
dmidecode from Averatec AV1020-ED2, a system that needs to be blacklisted

see comment #85 for context.
Comment 87 Rafael J. Wysocki 2010-12-13 22:05:31 UTC
Created attachment 40072 [details]
ACPI / PM: Blacklist Averatec machine known to require acpi_sleep=nonvs

Please verify if the attached patch works for you (on top of
https://patchwork.kernel.org/patch/401512/).
Comment 88 Len Brown 2011-02-10 19:35:57 UTC
commit 7b330707dddab1ad772898c1c82516342a551173
(ACPI / PM: Blacklist Averatec machine known to require acpi_sleep=nonvs)
shipped in 2.6.38-rc1
Comment 89 Jesus Gonzalez 2011-05-19 15:39:50 UTC
Created attachment 58572 [details]
dmidecode output of Sony Vaio VGN-SR5 to be blacklisted
Comment 90 Jesus Gonzalez 2011-05-19 15:41:47 UTC
I'm a bit of a noob in bug reporting, especially for important stuff, so I don't know if this should be here or if there is already a newer bug report where to put it, but since it's the same as above, here it goes.

As already pointed out, same problem and solution: I have a Sony Vaio SR, and couldn't suspend. Long story short, after months of not caring about it, I decided to do some tests and found the problem to be in the kernel. I'm not skilled enough to find out WHAT it was exactly, so after googling around, here I am.

acpi_sleep=nonvs also works for me, so it would be good if it's possible to add this laptop to the blacklist. dmidecode output is attached above (I'm still learning to use diff and patch properly, sorry for plain output)
Comment 91 Arief M Utama 2011-07-25 18:13:04 UTC
Created attachment 66582 [details]
dmidecode output for affected laptop Sony VAIO VGN-SR26GN

Confirmed that the Sony VAIO VGN-SR26GN is also affected by this; acpi_sleep=nonvs enables the system to suspend and resume.

Attached is dmidecode output from the system.
Comment 92 Rafael J. Wysocki 2011-08-14 21:06:58 UTC
Created attachment 68802 [details]
ACPI / PM: Blacklist Sony Vaio machine known to require acpi_sleep=nonvs

Please verify that this patch fixes the problem for you.
Comment 93 Florian Mickler 2012-01-12 21:23:43 UTC
A patch referencing this bug report has been merged in Linux v3.2-rc1:

commit 89e8ea1278fb3b237159a1ca193002ef5c8652d8
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Thu Oct 6 20:35:03 2011 +0200

    PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs
Comment 94 Florian Mickler 2012-01-12 21:27:02 UTC
A patch referencing this bug report has been merged in Linux v3.2-rc1:

commit 731b25a4ad3c27b44f3447382da18b59167eb7a1
Author: Bogdan Radulescu <bogdan@nimblex.net>
Date:   Thu Oct 6 20:35:12 2011 +0200

    PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs