Bug 9673 - IDE/ACPI related hibernation regression: Second attempt fails
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE
Hardware: All Linux
Importance: P1 normal
Assignee: Shaohua
URL:
Keywords:
Depends on:
Blocks: 7216 9243
Reported: 2008-01-01 12:04 UTC by Rafael J. Wysocki
Modified: 2008-01-11 08:24 UTC
CC List: 5 users

See Also:
Kernel Version: commit 5e32132befa5d2cefadf3141fee0bbb40cd11f0e
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg from 2.6.24-rc6+ (e697789d64...) (81.25 KB, text/plain)
2008-01-01 13:34 UTC, Mikko Vinni
dmesg with 5e32132bef... reverted (34.41 KB, text/plain)
2008-01-01 13:43 UTC, Mikko Vinni
DSDT (Disassembled) (222.72 KB, text/plain)
2008-01-01 13:47 UTC, Mikko Vinni
lspci -xxx output after one hibernation on the failing kernel (11.60 KB, text/plain)
2008-01-01 13:54 UTC, Mikko Vinni
dmidecode output (12.04 KB, text/plain)
2008-01-04 11:57 UTC, Mikko Vinni
workaround for the issue (1.48 KB, patch)
2008-01-06 18:11 UTC, Shaohua
.config (63.26 KB, text/plain)
2008-01-07 01:29 UTC, Mikko Vinni
workaround for the issue (1.76 KB, text/x-patch)
2008-01-07 17:12 UTC, Shaohua

Description Rafael J. Wysocki 2008-01-01 12:04:29 UTC
Subject         : IDE/ACPI related hibernation regression: Second attempt fails
Submitter       : Mikko Vinni <mmvinni@yahoo.com>
Date            : 2007-12-31 13:27
References      : http://lkml.org/lkml/2007/12/31/135
Handled-By      : Andreas Mohr <andi@lisas.de>

Caused by:

commit 5e32132befa5d2cefadf3141fee0bbb40cd11f0e
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Thu Oct 11 23:53:58 2007 +0200

    ide: hook ACPI _PSx method to IDE power on/off

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5e32132befa5d2cefadf3141fee0bbb40cd11f0e
Comment 1 Mikko Vinni 2008-01-01 13:34:45 UTC
Created attachment 14250 [details]
dmesg from 2.6.24-rc6+ (e697789d64...)

dmesg after one suspend/resume cycle. A second suspend-to-disk attempt on this kernel hangs the computer without leaving any usable messages on the screen or in the logs, as far as I can tell.
Comment 2 Mikko Vinni 2008-01-01 13:43:55 UTC
Created attachment 14251 [details]
dmesg with 5e32132bef... reverted

Despite the weird version number, this is also from the current 2.6.24-rc6+ git tree, with just commit 5e32132bef... reverted. I put the machine to sleep three times for this dmesg: twice from runlevel 1 and a third time after starting X. All three succeeded.
Comment 3 Mikko Vinni 2008-01-01 13:47:10 UTC
Created attachment 14252 [details]
DSDT (Disassembled)
Comment 4 Mikko Vinni 2008-01-01 13:54:13 UTC
Created attachment 14253 [details]
lspci -xxx output after one hibernation on the failing kernel
Comment 5 Mikko Vinni 2008-01-01 14:03:59 UTC
What seems to happen when I try to hibernate the machine for the second time is that the hard disk powers down quite early in the process (I hear it spin down, if I'm not totally mistaken) and doesn't turn back on.
Comment 6 Tejun Heo 2008-01-01 19:12:08 UTC
cc'ing Bartlomiej.
Comment 7 Shaohua 2008-01-01 19:33:04 UTC
There is a _PS3 method for the IDE device. According to the AML code, the method calls into SMI and does something we don't know about. The _PS0 method is fake and does nothing.
Could the IDE core resume method power on a disk?
Comment 8 Bartlomiej Zolnierkiewicz 2008-01-02 14:11:23 UTC
Do you mean complete reset/power-on processing + device re-discovery or something else?  I have a bunch of device probing fixes pending for the IDE tree which should make it relatively easy (provided somebody would like to work on it). Unfortunately it doesn't seem like a possible solution for 2.6.24 this late in the release cycle.

What really puzzles me is why it survives one complete suspend/resume cycle and breaks on the second one.
Comment 9 Shaohua 2008-01-02 17:41:15 UTC
Yes, it's strange that the first attempt doesn't fail. The _PS3 method calls into SMM, and nobody knows what the BIOS does there. Maybe we should blacklist the system. What do you think?

I somehow suspect that suspend-to-disk doesn't save/restore the ACPI NVS region, which the ACPI spec clearly says should be saved/restored across S4. Maybe the _PS3 method uses some info in this region. The last time I tried to push a patch to save/restore the region, I failed because it caused a regression.
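
As an illustration of the save/restore idea raised here (not code from this thread or from any kernel patch), the following is a minimal userspace sketch in C. The names `nvs_region`, `nvs_save` and `nvs_restore` are hypothetical, and an ordinary buffer stands in for the firmware-owned memory; the point is only that the region's contents are copied aside before the hibernation image is written and copied back on resume.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a firmware-owned region that must survive S4. */
struct nvs_region {
    unsigned char *base;   /* live memory the firmware reads and writes */
    size_t size;           /* length of the region in bytes */
    unsigned char *saved;  /* copy taken before the image is written */
};

/* Copy the live region aside; returns 0 on success, -1 on allocation failure. */
static int nvs_save(struct nvs_region *r)
{
    assert(r != NULL && r->base != NULL);
    free(r->saved);                 /* drop any copy from an earlier cycle */
    r->saved = malloc(r->size);
    if (r->saved == NULL)
        return -1;
    memcpy(r->saved, r->base, r->size);
    return 0;
}

/* Write the saved copy back over the live region; -1 if nothing was saved. */
static int nvs_restore(struct nvs_region *r)
{
    assert(r != NULL && r->base != NULL);
    if (r->saved == NULL)
        return -1;
    memcpy(r->base, r->saved, r->size);
    return 0;
}
```

In this model, a hibernation path would call `nvs_save()` before snapshotting memory and `nvs_restore()` after the image is loaded, so that state the firmware keeps in the region matches what it expects. Whether that would actually cure this bug is not established in the thread.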
Comment 10 Bartlomiej Zolnierkiewicz 2008-01-03 04:33:10 UTC
For 2.6.24 blacklisting sounds fine, but I think that we should at least check whether ACPI NVS save/restore fixes the issue (so that when ACPI NVS save/restore finally hits the kernel we can remove the blacklist)...
Comment 11 Rafael J. Wysocki 2008-01-03 15:47:56 UTC
I have the saving of ACPI NVS on my todo list, but we need to be extremely cautious with that, because of its potential for breaking systems.  Post 2.6.25 material, I'd say.
Comment 12 Shaohua 2008-01-03 17:12:58 UTC
Mikko, can you please attach the output of the 'dmidecode' command?
Comment 13 Mikko Vinni 2008-01-04 11:57:26 UTC
Created attachment 14284 [details]
dmidecode output
Comment 14 Shaohua 2008-01-06 18:11:21 UTC
Created attachment 14317 [details]
workaround for the issue

Workaround for 2.6.24, please try it. We should revisit this issue with ACPI NVS save/restore later.
Comment 15 Mikko Vinni 2008-01-07 01:29:04 UTC
Created attachment 14320 [details]
.config

  LD [M]  drivers/ide/ide-core.o
drivers/ide/ide-acpi.o: In function `init_module':
(.init.text+0x0): multiple definition of `init_module'
drivers/ide/ide.o:(.init.text+0x190): first defined here
make[3]: *** [drivers/ide/ide-core.o] Error 1
make[2]: *** [drivers/ide] Error 2
make[1]: *** [drivers] Error 2


Compilation failed :(  I suppose building IDE into the kernel would not change the result.
Comment 16 Shaohua 2008-01-07 17:12:22 UTC
Created attachment 14340 [details]
workaround for the issue

Sorry, I should have tried to compile it as a module. Please try the new one.
Comment 17 Mikko Vinni 2008-01-07 23:39:28 UTC
Yes, that one seems to work:

$ dmesg | egrep 'nx|Image'
[   19.821868] HP nx9005 detected - disable ACPI _PSx.
[  100.956784] PM: Image restored successfully.
[  104.302034] PM: Image restored successfully.
[   97.658460] PM: Image restored successfully.

Thank you. Hopefully there will be a more generic solution possible in the future, so that if HP for example releases an updated BIOS for this machine, it won't break the quirk (if it is still necessary).
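
For readers without access to the attachment, a machine-specific quirk of this kind can be modelled roughly as below. This is a userspace sketch, not the attached patch: the table layout, the function `check_no_psx_quirk`, the flag `ide_noacpi_psx` and the vendor match string are all assumptions made for illustration. Only the log message format is taken from the dmesg output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model of a DMI-based quirk table. */
struct dmi_quirk {
    const char *vendor_substr;   /* substring expected in the system vendor */
    const char *product_substr;  /* substring expected in the product name */
    const char *ident;           /* name printed in the log message */
};

/* Assumed match strings; real ones would be taken from dmidecode output. */
static const struct dmi_quirk no_psx_quirks[] = {
    { "Hewlett-Packard", "nx9005", "HP nx9005" },
};

/* Set when a quirk matches; power code would then skip _PS0/_PS3 calls. */
static bool ide_noacpi_psx = false;

/* Returns true and sets ide_noacpi_psx if the machine is in the table. */
static bool check_no_psx_quirk(const char *vendor, const char *product)
{
    size_t n = sizeof(no_psx_quirks) / sizeof(no_psx_quirks[0]);

    assert(vendor != NULL && product != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct dmi_quirk *q = &no_psx_quirks[i];

        if (strstr(vendor, q->vendor_substr) != NULL &&
            strstr(product, q->product_substr) != NULL) {
            printf("%s detected - disable ACPI _PSx.\n", q->ident);
            ide_noacpi_psx = true;
            return true;
        }
    }
    return false;
}
```

Matching on substrings of the vendor and product name keeps the entry small, but it also means the quirk keeps applying regardless of BIOS version, which is the limitation mentioned above.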
Comment 18 Bartlomiej Zolnierkiewicz 2008-01-09 11:31:36 UTC
I'm happy to hear this. :-)  Thanks for reporting the bug and testing the patches.

David, thanks for fixing this regression.  The patch looks fine; please submit the final version (updated with a patch description and Signed-off-by: line) to the linux-ide mailing list (maybe also cc:-ing linux-kernel) so I can apply it.
Comment 19 Adrian Bunk 2008-01-11 08:24:59 UTC
The fix is now commit 90494893b5d2bf7533fb65accbfd8cbd6b51b9c3 in Linus' tree.
