Distribution: Debian unstable
Hardware Environment: Core 2 Duo
Software Environment: init=/bin/sh
Problem Description: When suspending the machine to RAM, the CPU becomes fully loaded, the fan kicks on at full speed, and after several minutes, the machine successfully suspends. See the thread on suspend-devel: http://thread.gmane.org/gmane.linux.kernel.suspend.devel/2264
Steps to reproduce:
1. Boot with "debug noapic nosmp vga=0 init=/bin/sh" boot options
2. "echo mem > /sys/power/state"
3. Note that the CPU is hot and the fan is on full
4. After several minutes the machine suspends
I would suggest that you try the Linux firmware kit at http://www.linuxfirmwarekit.org/ and test S3 with that tool.
https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/90771 https://launchpad.net/ubuntu/+source/linux-source-2.6.17/+bug/82996 https://wiki.ubuntu.com/LaptopTestingTeam/ToshibaSatelliteU200-163
The machine suspended immediately and perfectly with LFDK. I will attach the output of the tests, but it didn't look like they said much. What is different about the LFDK that allows the machine to suspend?
Created attachment 11461 [details] DSDT.aml Output file from LFDK
Created attachment 11462 [details] DSDT.dat Output file from LFDK
Created attachment 11463 [details] DSDT.dsl Output file from LFDK
Created attachment 11464 [details] resources.css Output file from LFDK
Created attachment 11465 [details] resources.xml Output file from LFDK
Created attachment 11466 [details] results.css Output file from LFDK
Created attachment 11467 [details] results.txt Output file from LFDK
Created attachment 11468 [details] results.xml Output file from LFDK
I have also tried the LFDK CD and my machine also suspended right away.
I have been trying to hunt this bug down, adding many printk calls to my kernel to see where the time is being spent. So far I see the following call sequence:
enter_state in kernel/power/main.c calls
suspend_enter in kernel/power/main.c, which calls
device_power_up in drivers/base/power/resume, which calls
sysdev_resume in drivers/base/sys.c, which starts calling
__sysdev_resume in the same file.
Resuming type 'cpu' works OK, but when __sysdev_resume is called to resume type 'timer' (actually timer0) it seems to take too long. See the related portion of my messages.log below.
Is this the expected behavior? Should I try to debug deeper in the calling sequence?
Best,
Paulo
--- extract of my messages.log file ---
May 10 16:25:00 trinity kernel: [ 85.124000] **** Calling device_power_up.
May 10 16:25:00 trinity kernel: [ 85.124000] **** In device_power_up.
May 10 16:25:00 trinity kernel: [ 85.124000] **** Calling sysdev_resume.
May 10 16:25:00 trinity kernel: [ 85.124000] *** Entering sysdev resume
May 10 16:25:00 trinity kernel: [ 85.124000] **** Resuming type 'cpu':
May 10 16:25:00 trinity kernel: [ 85.124000] **** cpu0
May 10 16:25:00 trinity kernel: [ 85.124000] **** In __sysdev_resume, calling class specific resume.
May 10 16:25:00 trinity kernel: [ 85.124000] **** Done
May 10 16:25:00 trinity kernel: [ 85.124000] **** Aux:
May 10 16:25:00 trinity kernel: [ 85.124000] **** Done
May 10 16:25:00 trinity kernel: [ 85.124000] ****Done cpu0
May 10 16:25:00 trinity kernel: [ 85.124000] **** cpu1
May 10 16:25:00 trinity kernel: [ 85.124000] **** In __sysdev_resume, calling class specific resume.
May 10 16:25:00 trinity kernel: [ 85.124000] **** Done
May 10 16:25:00 trinity kernel: [ 85.124000] **** Aux:
May 10 16:25:00 trinity kernel: [ 85.124000] **** Done
May 10 16:25:00 trinity kernel: [ 85.124000] ****Done cpu1
May 10 16:25:00 trinity kernel: [ 85.124000] **** Resuming type 'timer':
May 10 16:25:00 trinity kernel: [ 85.124000] **** timer0
May 10 16:25:00 trinity kernel: [ 85.124000] **** In __sysdev_resume, calling class specific resume.
May 10 16:25:00 trinity kernel: [ 9002.124000] **** Done
May 10 16:25:00 trinity kernel: [ 9002.124000] ****Done timer0
May 10 16:25:00 trinity kernel: [ 9002.124000] **** Resuming type 'i8259':
May 10 16:25:00 trinity kernel: [ 9002.124000] **** i82590
May 10 16:25:00 trinity kernel: [ 9002.124000] **** In __sysdev_resume, calling class specific resume.
May 10 16:25:00 trinity kernel: [ 9002.124000] **** Done
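For anyone repeating this kind of printk tracing, a small script can point at the place where the bracketed kernel timestamp jumps. The following is only a sketch: the function name `largest_gap` and the three sample lines (copied from the extract above) are illustrative, and a timestamp jump across the timer resume may reflect the clock being adjusted rather than time actually spent there, so treat the result as a hint about where to look.

```python
import re

# Matches the bracketed printk timestamp, e.g. "[ 85.124000]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the biggest timestamp jump."""
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines that carry no printk timestamp
        ts = float(match.group(1))
        if prev_ts is not None and ts - prev_ts > best[0]:
            best = (ts - prev_ts, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


# Three lines taken from the messages.log extract in the comment above.
sample = [
    "May 10 16:25:00 trinity kernel: [ 85.124000] **** timer0",
    "May 10 16:25:00 trinity kernel: [ 85.124000] **** In __sysdev_resume, calling class specific resume.",
    "May 10 16:25:00 trinity kernel: [ 9002.124000] **** Done",
]

gap, before, after = largest_gap(sample)
print(f"largest jump: {gap:.3f}")
print(f"  before: {before}")
print(f"  after:  {after}")
```

To run it against a full log, pass the lines of the file instead of `sample`, for example `largest_gap(open("messages.log"))`, where the file name is whatever your syslog setup uses.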
Hello, I have found the problematic module that doesn't allow suspend to RAM to work on the Toshiba U205 S5067 laptop: it is the SATA driver. This laptop has an Intel ICH7 Family SATA controller that uses the libata + ata_piix modules. I had a hint that the SATA module could be the problem after Ross told us that the LFDK CD kernel could suspend to RAM. I have now confirmed this by recompiling my distribution kernel (Ubuntu Feisty Fawn, based on kernel 2.6.20) with the following changes to the kernel configuration (in menuconfig):
1) Compile the following options as built-in (not as modules):
Device Drivers -> ATA/ATAPI/MFM/RLL Support -> Include IDE/ATA-2 DISK support
Device Drivers -> ATA/ATAPI/MFM/RLL Support -> generic IDE chipset support
(This is necessary to allow the laptop to access the HD without the SATA driver. The HD then works without DMA and hence is very, very slow, so this is not a real workaround, only a way to do the test. With this driver the HD appears as /dev/hda (and not as /dev/sda), which may require some changes to the GRUB boot options. In Ubuntu no changes to GRUB are necessary.)
2) Turn off (exclude) the ata_piix module:
Device Drivers -> Serial ATA (prod) and Parallel ATA (experimental) drivers -> Intel ESB, ICH, PIIX3, PIIX4 PATA/SATA support
After these changes I recompiled the kernel and rebooted into the newly compiled version. The laptop is slow due to the DMA-less HD, but I am able to suspend to RAM and come back, just as with the LFDK CD.
OK, thanks for the work done. I'll see if I can redirect this bug to the SATA people.
What more can we do from here to help?
Well, let's try to let Tejun Heo know about the problem.
ata_piix should behave nicely on shutdown and resume. I use it all the time on my notebook. With ata_piix, you said that the machine does enter suspend mode after several minutes. Can it wake up from the suspend? If so, dmesg should contain what was going on during those minutes. Can you post it?
Created attachment 11519 [details] Dmesg information with libsata + ata_piix loaded Here is the requested dmesg information. Note the long delay that follows the "Back to C!" line.
Created attachment 11520 [details] Dmesg information *without* libsata + ata_piix loaded Dmesg information when I do not load libsata + ata_piix and use the generic ide driver instead.
The log indicates that the ACPI sleep routines are burning the CPU cycles, not the libata code. It might be that ata_piix puts the controller into a different mode than the ACPI suspend routine expects, and it could also have something to do with libata not executing ATA ACPI callbacks. The generic driver doesn't know how to program the controller. That's probably why the ACPI suspend routine is happy with it. A proper implementation of ATA-ACPI is currently pending review and will probably make it into 2.6.23. Can you test a git tree?
Yes, I can. Please just send me the tree URL. It may take a couple of days though.
Okay, the tree is at... http://htj.dyndns.org/git/?p=libata-tj.git;a=shortlog;h=acpi git://htj.dyndns.org/libata-tj acpi
I checked out the tree you mentioned and tried a kernel compiled from it. I booted with init=/bin/sh, tried "echo mem > /sys/power/state" and had the same problem: it took several minutes and a very hot CPU before it finally went to sleep. It resumed just fine. I'll also attach the dmesg output for that attempt.
Created attachment 11528 [details] dmesg output from suspend resume cycle with the git tree suggested
I forgot to ask what I/we can do next to help?
I'm out of ideas. The ata_piix driver is putting the controller and/or the drives into a different mode than the ACPI suspend routine is expecting, and executing the ATA ACPI routines doesn't seem to help much. Does the BIOS of the machine allow putting the controller into a different mode - say, AHCI or RAID?
No, I'm afraid the BIOS doesn't provide any options for putting the card in a different mode. Do you have any suggestions where else or who else we might want to draw attention to this bug?
It definitely looks like an ACPI problem. CC'ing my two ACPI acquaintances - Thomas Renninger and Kristen Carlson Accardi. Hello, guys. Does this ring any bells? The ACPI suspend routine seems to expect the ATA controller to be in a certain state and burns CPU cycles for a while if it isn't.
I can confirm Ross's findings. On my laptop the git tree kernel also displays the same behavior: sleep takes very long. I would also like to state that I am, like Ross, willing to work on solving this bug. I am not a kernel hacker but I am an experienced programmer (C, Python, C++) and I am not afraid to edit my kernel sources and do whatever you guys tell me to try. I can also arrange to give the kernel hackers remote access to my laptop if this can help. My laptop is not my production machine, hence anything that doesn't destroy the hardware is valid :-)
Ditto, also if someone in the Bay Area wants to meet to get a chance to play around on one of these laptops, I can provide.
Hope I read this bug halfway right: the problem is that S2Ram works, but takes a few minutes (hangs somewhere shortly before suspending), right? It's not working better with or without the recent ACPI SATA patches (why do you think it's SATA anyway?).
Hmm, maybe it's just some driver-specific suspend/resume function. You could try booting with init=/bin/bash and try without any driver loaded (Rafael, is "echo mem >/sys/power/state" OK for triggering S2Ram there?). If this works, try to find out which module is the bad one, either by loading the modules from init=/bin/bash or by rmmodding them from a fully booted system.
If you don't find out anything, you can try the following:
Can you pls attach acpidump output.
Compile a kernel with PM_DEBUG (or similar) and ACPI_DEBUG set to y. Then increase the ACPI debug level (cat /proc/acpi/debug_level) carefully (MBs of syslog output could happen...). E.g. 0x1F, or also adding execute and I/O accesses, could reveal something.
The bug is probably wrongly assigned to Tejun, let's wait for a response and then reassign it ...
Thomas, please read comments #18 and #19. The cpu burning delay disappears if no ATA driver is loaded or the generic one is used, so it's somehow related to ATA.
Created attachment 11584 [details] dont-freeze-on-suspend This is a wild guess but does this change anything?
No, it doesn't change anything. I applied that patch to the git tree and recompiled. I booted with the new kernel and captured the output of dmesg and acpidump both before and after suspend. Suspend behaved as before, several minutes and a very hot CPU later, the machine finally suspends. I'll attach those dmesg and acpidump logs now.
Created attachment 11592 [details] dmesg from before suspend
Created attachment 11593 [details] dmesg after suspend
Created attachment 11594 [details] acpidump before suspend
Created attachment 11595 [details] acpidump after suspend
Aieeeee... so my wild guess goes only so far. I guess it's Thomas's turn. :-)
I can confirm this bug using the 2.6.21 kernel on a Toshiba U205-S5002 laptop. I'll play with the BIOS settings to see if I can work something out.
Pls attach acpidump output (best with a short description of the laptop/model). You can compile the kernel with ACPI_DEBUG=y. When done, increase the debug level, e.g.: echo 0x21F >/proc/acpi/debug_level (cat /proc/acpi/debug_level shows you the available flags). Then trigger the suspend, have a look at where it hangs, and send syslog output with the hang marked. Hope that works...
Also, the 2.6.22-rc3 kernel contains an update that might improve the suspend. Please try it.
Created attachment 11618 [details] my dmesg using 2.6.22-rc3 highest acpi log level
Some comments from a guy who knows more about ATA/SATA drivers than me: it could be that a cache sync occurred while the disk was already suspended. What I think is a bit suspicious is that _STM and friends (ACPI methods) are not invoked on resume in the ata_piix resume handler?
Hector: the highest ACPI debug level is not a good idea in general; 0x421F (values+exec+info+default) should be enough to see whether an ACPI function got invoked. But no ACPI things are executed at all (the ones you see at the beginning are battery funcs).
Hector: your dmesg output is quite long. Does it really hang exactly (before/after?) at "Back to C!"? If not, could you copy out five lines and mark the exact hang with a dashed line pls.
Sorry, I doubt I can help much here. Maybe we could try whether Hannes's ACPI patches help and have a look at what is different? Maybe you still have them?
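Two debug_level masks have been suggested in this thread, 0x21F and 0x421F. As a purely arithmetic aid, the sketch below lists which bit positions each mask sets and how the two differ. It makes no assumption about what each bit means; the flag names are whatever `cat /proc/acpi/debug_level` reports on the running kernel. The helper name `set_bits` is illustrative.

```python
def set_bits(mask):
    """Return the positions of the bits set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


earlier = 0x21F  # mask suggested earlier in this thread
later = 0x421F   # mask suggested in the comment above

print("0x21F  sets bits:", set_bits(earlier))
print("0x421F sets bits:", set_bits(later))
print("difference mask: ", hex(earlier ^ later))
```

The only difference between the two values is one additional bit, so switching from one to the other enables exactly one more flag.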
I was trying today to reproduce Hector's test on my laptop (U205 S5067) and my dmesg output is very different. I don't understand the output very well, but I feel like the ACPI code is being called in my case. Actually, even using a kernel buffer of 128 KB the output is still too long to fit. Is there a way to increase the dmesg buffer size? Unfortunately, my screen goes blank very quickly after I trigger suspend, so I don't see where the system hangs. I tried setting the kernel option to keep the console on during suspend, but it still goes blank before hanging. Any tricks that can make the screen remain on? I'll attach my dmesg next. Note: this is a Core 2 Duo laptop with a T7200 processor, 2 GB of RAM and an Intel chipset. If you need any extra description of the hardware, like an lspci, let me know.
Created attachment 11653 [details] dmesg -c output before calling suspend with a 2.6.22-rc3 kernel
Created attachment 11654 [details] dmesg output after triggering suspend and coming back (kernel 2.6.22-rc3)
I think this may be related to the $subject problem: http://bugzilla.kernel.org/show_bug.cgi?id=7780#c23
Please post the result of 'dmidecode' and 'lspci -nnv'.
(In reply to comment #49)
> Please post the result of 'dmidecode' and 'lspci -nnv'.
I assume you're asking Paulo. :-)
Hmm... I have another reason to think that the bugs may be related. I recently tested enabling the AHCI mode of my laptop's SATA controller using the following patch: http://mjg59.livejournal.com/76062.html The patch works and my HD is controlled by the AHCI driver (I left out the original driver just to be sure). However, the behavior is the same when I try to suspend to RAM (long delay, high CPU). By the way, the idea of using the AHCI controller without depending on the BIOS is neat! However, the patch still has problems: for example, I can suspend to RAM but I can't wake up anymore. This may indicate that the problem is in the general libata code that both drivers use. I will get the dmidecode and lspci output now and attach them.
Created attachment 11796 [details] dmidecode output
Created attachment 11797 [details] lspci -nnv output
Created attachment 11809 [details] skip-pci-disable-and-suspend-on-tecra-m5-and-satellite-U205.patch Please test this patch and report the result.
Bingo! Good work Tejun! The laptop suspends very fast with your patch! I tested on an official kernel, version 2.6.22-rc3. I am writing from my laptop after suspending to RAM and waking up from both console and X.
However there is a gotcha: resume takes longer. Without the patch resume takes around 3s, with the patch it takes 15s. (However, with the patch sleep takes 3s, against some minutes without the patch.)
As a side note: just before seeing your patch, I had already tried commenting out the call to pci_set_power_state in the ata_pci_device_do_suspend function in drivers/ata/libata-core.c, in both the official 2.6.22-rc3 kernel and an Ubuntu gutsy development kernel. It worked on both. When I came to the bug page to report the news I saw your patch and decided to try it too, as it looked much more polished than the "simply comment out the bad call" approach.
Looking at your patch more closely, in the case of my laptop it only calls pci_save_state, while simply commenting out the call to pci_set_power_state in pci_device_suspend would call pci_save_state *and* pci_disable_device. I don't know if the call to pci_disable_device should make any difference, but I thought I should call your attention to it. (It doesn't make any difference in the resume time.)
Another point is the question of whether the patch should live in the ata_piix.c file or in the general libata-core.c. If in the future the same machines start using the AHCI driver, due to a BIOS update or Matthew's patch to enable AHCI without using the BIOS, we would start to see the same problem again.
Thanks for the good work!
Oops... There is an error in my last comment. When I said that waking up was taking longer with the patch, I was comparing the regular Ubuntu feisty kernel (which is based on the 2.6.20 kernel) with the official 2.6.22-rc3 kernel. I have just tested the "simply comment out pci_set_power_state" approach in the Ubuntu kernel (the patch failed to apply) and waking up is still very fast (3s). Hence, it seems that there is some difference in the Ubuntu kernel or in the configuration that makes waking up take longer in the official kernel. I am now in the process of manually applying Tejun's patch to my Ubuntu feisty (based on 2.6.20) kernel to confirm it will still work. I am also going to see what happens with the new Ubuntu gutsy kernel (which is based on the pre-releases of 2.6.22). I'll post an update as soon as I get the kernels patched, compiled and tested. Sorry for the misinformation.
Yeah, the increased resume time is a small regression introduced during the 2.6.22-rc1 merge. Can you give 2.6.22-rc5 a shot? It might behave better. Tecra M5 has a similar problem (bug 7780, thanks Rafael) and doesn't like pci_disable_device(), so... I'll attach an updated patch. Please test it on top of 2.6.22-rc5 and report how it works and how long it takes to go to sleep and wake up. Please also test STD (sleep-to-disk, not the horrible thing).
Created attachment 11824 [details] skip-pci-disable-and-suspend-on-tecra-m5-and-satellite-U205-take3.patch
Just tested with new 2.6.22-rc5
S2disk: works
S2ram: works (both suspend and wake up in around 3 to 4 seconds).
Good job! (Actually I have both tried S2disk and s2ram while writing this message and I could finish editing it :-)
(In reply to comment #58)
> Created an attachment (id=11824) [details]
> skip-pci-disable-and-suspend-on-tecra-m5-and-satellite-U205-take3.patch
Do you consider the patch as the final one or are you going to update it?
Tejun, two comments:
1) Probably the Toshiba U200 (the older brother of my U205) suffers from the same problem. See the following bug report in Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/90771 I have posted a message to this bug report asking U200 owners to try your patch. If the U200 is also problematic you may need to include it in the patch.
2) I have just discovered that 2.6.22-rc5 has a regression in the suspend-to-disk behavior: I cannot wake up if I use the Ubuntu splash screen. Note that this problem is not related to your patch, as the original, unpatched kernel already shows the problem. I am only commenting on it here because you asked me about the suspend-to-disk behavior.
There's another related bug which take 3 doesn't solve: bug 7780. I'd like to merge a patch which can fix both, so I'll probably ask for another round of testing. I'm currently a bit sidetracked, please give me a day or two. Thanks.
Ah... also please lemme know if the U200 case is fixed by the same patch (with dmi entry added of course).
Satellite U205 is named Portege M500 in some regions in Asia, and I am experiencing the exact same problem. Please add it to the table:
Manufacturer: TOSHIBA
Product Name: PORTEGE M500
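To illustrate the idea of a per-machine table as discussed in this thread, here is a minimal user-space sketch. It is not the kernel patch: the names `QUIRK_TABLE`, `needs_quirk` and `suspend_steps` are hypothetical, substring matching is an assumption, and only the PORTEGE M500 strings are quoted in this thread (the other two product strings are guesses based on the patch file name). The step names are the PCI calls mentioned in the earlier comments.

```python
# Hypothetical quirk table: (vendor substring, product substring).
QUIRK_TABLE = [
    ("TOSHIBA", "TECRA M5"),        # assumed DMI product string
    ("TOSHIBA", "Satellite U205"),  # assumed DMI product string
    ("TOSHIBA", "PORTEGE M500"),    # strings reported in this thread
]


def needs_quirk(vendor, product, table=QUIRK_TABLE):
    """True if the machine matches any table entry (substring match assumed)."""
    return any(v in vendor and p in product for v, p in table)


def suspend_steps(vendor, product):
    """List the PCI calls the suspend path would make for this machine."""
    steps = ["pci_save_state"]
    if not needs_quirk(vendor, product):
        # Machines not in the table keep the normal suspend sequence.
        steps += ["pci_disable_device", "pci_set_power_state"]
    return steps


print(suspend_steps("TOSHIBA", "PORTEGE M500"))
print(suspend_steps("Example Vendor", "Example Laptop"))
```

Adding a machine then amounts to appending one entry to the table, which is why an exact vendor and product string from `dmidecode` is the useful thing to report.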
Created attachment 11930 [details] skip-pci-disable-and-suspend-on-tecra-m5-and-satellite-U205-take4.patch Please test this one on top of 2.6.22-rc7. Thanks.
Tried take4. Works perfectly on my M500.
I tried it on my U205 laptop (S5067) and it didn't work :-( Can anyone else using a U205 confirm that take 4 does not work?
Yes, just tried it with my U205-5067 and take 4 didn't work for either s2ram or s2disk. Take 3 works for s2ram.
Created attachment 11940 [details] skip-pci-disable-and-suspend-on-tecra-m5-and-satellite-U205-take5.patch Okay, I made a stupid mistake with the dmi table initializer. Please test this one. Thanks.
Also, let's move over to bug 7780. It's the same problem. Thanks.
*** This bug has been marked as a duplicate of bug 7780 ***