Subject : IDE/ACPI related hibernation regression: Second attempt fails
Submitter : Mikko Vinni <email@example.com>
Date : 2007-12-31 13:27
References : http://lkml.org/lkml/2007/12/31/135
Handled-By : Andreas Mohr <firstname.lastname@example.org>
Author: Shaohua Li <email@example.com>
Date: Thu Oct 11 23:53:58 2007 +0200
ide: hook ACPI _PSx method to IDE power on/off
Created attachment 14250
dmesg from 2.6.24-rc6+ (e697789d64...)
dmesg after one suspend/resume. A second suspend-to-disk attempt on this kernel hangs the computer without leaving any usable messages on the screen or in the logs, as far as I can tell.
Created attachment 14251
dmesg with 5e32132bef... reverted
Despite the odd version number, this is also from the current 2.6.24-rc6+ git tree, with only commit 5e32132bef... reverted. I put the machine to sleep three times for this dmesg: twice from runlevel 1 and a third time after starting X. All three succeeded.
Created attachment 14252
Created attachment 14253
lspci -xxx output after one hibernation on the failing kernel
What seems to happen when I try to hibernate the machine a second time is that the hard disk powers down quite early in the process (I hear it spin down, if I'm not mistaken) and does not turn back on.
There is a _PS3 method for the IDE device. According to the AML code, the method calls into SMI and does something we don't know about. The _PS0 method is a stub and does nothing.
Could the IDE core's resume method power on the disk?
Do you mean complete reset/power-on processing plus device re-discovery, or something else? I have a bunch of device-probing fixes pending for the IDE tree which should make it relatively easy (provided somebody wants to work on it). Unfortunately, it doesn't look like a viable solution for 2.6.24 this late in the release cycle.
What really puzzles me is why it survives one complete suspend/resume cycle and breaks on the second one.
Yes, it's strange that the first attempt doesn't fail. The _PS3 method calls into SMM, and nobody knows what the BIOS does there. Maybe we should blacklist the system. What do you think?
I suspect suspend-to-disk doesn't save/restore the ACPI NVS region, even though the ACPI specification clearly states that it should be saved and restored across S4. Maybe the _PS3 method uses some information stored in this region. The last time I tried to push a patch to save/restore the region, it failed because it caused a regression.
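To make the NVS hypothesis concrete, here is a toy model of snapshotting a firmware-owned memory range before the image is written and copying it back after resume. It assumes the region is a plain memory range the firmware (for example an SMI handler reached from _PS3) relies on; `nvs_save()`/`nvs_restore()` are hypothetical names, not kernel functions, and a real implementation would take the region's location from the firmware memory map.

```c
/*
 * Toy model of saving/restoring an ACPI NVS region around S4.
 * The "firmware region" is a plain buffer; all names are hypothetical.
 */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct nvs_region {
	unsigned char *live;	/* memory the firmware relies on */
	unsigned char *copy;	/* snapshot taken before the image is written */
	size_t len;
};

/* Snapshot the region; returns 0 on success, -1 on allocation failure. */
static int nvs_save(struct nvs_region *r)
{
	assert(r != NULL && r->live != NULL);
	r->copy = malloc(r->len);
	if (r->copy == NULL)
		return -1;
	memcpy(r->copy, r->live, r->len);
	return 0;
}

/* Write the snapshot back after resume, then release it. */
static void nvs_restore(struct nvs_region *r)
{
	assert(r != NULL);
	if (r->copy == NULL)
		return;
	memcpy(r->live, r->copy, r->len);
	free(r->copy);
	r->copy = NULL;
}
```

Under this hypothesis, the restore would presumably have to happen before any firmware method such as _PS3 runs again; otherwise the method would see whatever the boot kernel or firmware left in the region.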
For 2.6.24 blacklisting sounds fine, but I think we should at least check whether ACPI NVS save/restore fixes the issue (so that when ACPI NVS saving finally hits the kernel, we can remove the blacklist)...
I have the saving of ACPI NVS on my todo list, but we need to be extremely cautious with that, because of its potential for breaking systems. Post 2.6.25 material, I'd say.
Mikko, can you please attach the output of the 'dmidecode' command?
Created attachment 14284
Created attachment 14317
workaround for the issue
Workaround for 2.6.24; please try it. We should revisit this issue with ACPI NVS save/restore later.
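A DMI-based blacklist of the kind discussed above can be sketched in userspace as a table of match strings checked once at init, setting a flag that disables the _PSx calls. This is not the actual patch: the struct names, the flag `noacpi_psx`, and the match strings ("Hewlett-Packard", "nx9005") are placeholders; real entries would be taken from the dmidecode output. The sketch uses substring matching, and its log line mirrors the message seen in the test results later in this thread.

```c
/*
 * Userspace sketch of a DMI quirk table. All names and match strings
 * are hypothetical placeholders, not kernel API or real DMI values.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the DMI strings the kernel reads from the firmware. */
struct dmi_info {
	const char *sys_vendor;
	const char *product_name;
};

/* One quirk entry: every non-NULL field must appear as a substring. */
struct quirk_entry {
	const char *ident;
	const char *sys_vendor;
	const char *product_name;
};

/* Placeholder match strings; build real ones from dmidecode output. */
static const struct quirk_entry psx_quirks[] = {
	{ "HP nx9005", "Hewlett-Packard", "nx9005" },
	{ NULL, NULL, NULL }	/* terminator */
};

/* When true, the IDE code would skip calling _PS0/_PS3. */
static bool noacpi_psx;

static bool field_matches(const char *want, const char *have)
{
	return want == NULL || (have != NULL && strstr(have, want) != NULL);
}

/* Returns the number of matching entries; sets the flag on a match. */
static int check_psx_quirks(const struct dmi_info *dmi)
{
	int hits = 0;

	assert(dmi != NULL);
	for (const struct quirk_entry *q = psx_quirks; q->ident != NULL; q++) {
		if (field_matches(q->sys_vendor, dmi->sys_vendor) &&
		    field_matches(q->product_name, dmi->product_name)) {
			noacpi_psx = true;
			printf("%s detected - disable ACPI _PSx.\n", q->ident);
			hits++;
		}
	}
	return hits;
}
```

The choice of DMI fields matters: matching on the product name keeps the quirk active after a BIOS update, while matching on a BIOS version string targets only the firmware known to misbehave but stops matching once that BIOS is updated.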
Created attachment 14320
LD [M] drivers/ide/ide-core.o
drivers/ide/ide-acpi.o: In function `init_module':
(.init.text+0x0): multiple definition of `init_module'
drivers/ide/ide.o:(.init.text+0x190): first defined here
make: *** [drivers/ide/ide-core.o] Error 1
make: *** [drivers/ide] Error 2
make: *** [drivers] Error 2
Compilation failed :( I suppose building IDE into the kernel would not change the result.
Created attachment 14340
workaround for the issue
Sorry, I should have tried compiling it as a module. Please try the new one.
Yes, that one seems to work:
$ dmesg | egrep 'nx|Image'
[ 19.821868] HP nx9005 detected - disable ACPI _PSx.
[ 100.956784] PM: Image restored successfully.
[ 104.302034] PM: Image restored successfully.
[ 97.658460] PM: Image restored successfully.
Thank you. Hopefully a more generic solution will be possible in the future, so that if HP, for example, releases an updated BIOS for this machine, the quirk won't break (if it is still needed).
I'm happy to hear this. :-) Thanks for reporting the bug and testing the patches.
David, thanks for fixing this regression. The patch looks fine; please submit the final version (with a patch description and a Signed-off-by: line) to the linux-ide mailing list (perhaps also cc:-ing linux-kernel) so I can apply it.
The fix is now commit 90494893b5d2bf7533fb65accbfd8cbd6b51b9c3 in Linus' tree.