Bug 7988 - Toshiba A100 doesn't wake after suspend
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: i386 Linux
Importance: P2 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks: 7216
Reported: 2007-02-11 15:39 UTC by Mitch Davis
Modified: 2008-03-02 11:34 UTC

See Also:
Kernel Version: kernel.org 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Linux Firmware Kit test results (11.05 KB, text/plain), 2007-05-29 14:37 UTC, Julian Sikorski
DSDT disassembly (303.61 KB, text/plain), 2007-05-29 14:38 UTC, Julian Sikorski
mitch-DSDT.dsl (116.92 KB, text/plain), 2007-05-31 08:16 UTC, Mitch Davis
mitch-results.txt (11.19 KB, text/plain), 2007-05-31 08:18 UTC, Mitch Davis
julian-mitch-results-txt.diff (9.92 KB, patch), 2007-05-31 08:20 UTC, Mitch Davis
lspci of A100-847 (2.01 KB, text/plain), 2007-06-10 03:33 UTC, Julian Sikorski

Description Mitch Davis 2007-02-11 15:39:56 UTC
Most recent kernel where this bug did *NOT* occur: n/a

Distribution: initramfs (bins from FC6)

Hardware Environment: Toshiba Satellite A100 laptop, Celeron M 1.4GHz, 512M RAM,
SIL SATA HD

Software Environment: virtually no dev drivers.  no net, no blk, no usb, no
anything except for VGA console, keyboard and RTC

Hello,

Thanks for the work you guys do with ACPI.

I have a Toshiba A100-105 laptop which has trouble resuming after
suspend-to-RAM.  The BIOS says the laptop is an "ATI RC410MB+SB450 (Goldfish)".

After Matthew Garrett's great talk at LCA2007, I grabbed 2.6.20-rc5 (now 2.6.20)
from kernel.org, turned on CONFIG_PM_TRACE, and turned off as much other stuff
as I could.  There are no block devices, networking, USB, SATA, anything.  I also
made an initramfs with bash and a few utilities from FC6 in it.

After compiling with CONFIG_PM_TRACE and booting into the initramfs, I'm trying
to suspend with echo -n mem > /sys/power/state.
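
A minimal sketch of a pre-check for that step, assuming the kernel exposes its supported sleep states as a space-separated list in /sys/power/state. The function name supports_state is hypothetical and only for illustration; the file path is a parameter so the check can be tried against a copy of the file.

```shell
# Succeeds if the given sleep state is listed in the state file.
# supports_state is a hypothetical helper name, not an existing tool.
supports_state() {
    state=$1
    state_file=${2:-/sys/power/state}
    # The file holds a space-separated list such as "standby mem disk".
    for s in $(cat "$state_file"); do
        if [ "$s" = "$state" ]; then
            return 0
        fi
    done
    return 1
}

# Example: only attempt suspend-to-RAM when the kernel offers it.
#   supports_state mem && echo -n mem > /sys/power/state
```

If "mem" is missing from the list, the suspend attempt would fail before reaching the BIOS at all, which is a different failure from the one described here.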

The message "stopping tasks" flashes briefly on the screen, and the power LED
(normally solid green) turns orange and starts doing a rolling fade in/fade out.
 Power to everything seems off.

When I press the power button, the LED goes back to solid green, the backlight
and CD go on, the hard disk LED goes on momentarily, but there is no BIOS screen
and pressing Caps Lock doesn't toggle the LED.  Holding down the power button
causes the machine to power off again (power LED is off) but otherwise there's no
response from the machine.

This continues as long as there's a battery or AC: Turning the laptop on just
turns on the LED and CD.  Removing battery and AC allows the laptop to boot again.

When I reboot, the date and time haven't changed, which seems to indicate that a
trace value was never written to CMOS.  So I replaced the if() test in
TRACE_RESUME() with if(1).

When I tried again, I found that the date/time was still not changed, which
tells me that the kernel isn't getting as far as resuming any devices.

I modified my kernel to also do a TRACE_RESUME just as it's about to suspend.
(For the call to TRACE_DEVICE(), I wrote a function to look up the "zero" device).

@@ -192,13 +194,22 @@
                goto Unlock;
        }
 
+       zero = find_zero_device();
+       if (zero == 0)
+               goto Unlock;
+       TRACE_DEVICE(zero);
+       TRACE_RESUME(0);
        pr_debug("PM: Preparing system for %s sleep\n", pm_states[state]);
        if ((error = suspend_prepare(state)))
                goto Unlock;

+       TRACE_DEVICE(zero);
+       TRACE_RESUME(1);
        pr_debug("PM: Entering %s sleep\n", pm_states[state]);
        error = suspend_enter(state);

+       TRACE_DEVICE(zero);
+       TRACE_RESUME(2);
        pr_debug("PM: Finishing wakeup.\n");
        suspend_finish(state);
  Unlock:

This time the date and time are modified, and the TRACE report on reboot points
to the TRACE_RESUME(1) call.  So I think my laptop is suspending, but never
leaving suspend_enter().
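
For anyone repeating this, a small sketch of how the TRACE report can be pulled out of the kernel log after the reboot. It assumes the pm_trace code prints lines containing "Magic number" and "hash matches" at boot, as described in the kernel's s2ram documentation; the exact wording may differ between kernel versions. The function name pm_trace_lines is hypothetical.

```shell
# Filter a kernel log (read from stdin) down to the lines that the
# pm_trace code is assumed to print on the boot after a failed resume.
# pm_trace_lines is a hypothetical helper name.
pm_trace_lines() {
    grep -E 'Magic number|hash matches'
}

# Typical use right after rebooting from the hang:
#   dmesg | pm_trace_lines
```

A "hash matches" line names the source file and line of the last trace point that was reached, which is how the report above can point at one specific TRACE_RESUME() call.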

I am running FC6, but Matthew Garrett suggested I try Ubuntu.  I did, and Ubuntu
6.10 has the same problem.  I also tried the patch for VIA southbridges
mentioned here, but it doesn't help:

  http://lca2007.linux.org.au/profile/53

I've also updated the BIOS to Toshiba's latest version, but it hasn't helped.

Here are some dumps from my laptop.  Most were taken under FC6, not my initramfs.

  http://www.afork.com/a100/acpidump.txt
  http://www.afork.com/a100/dmidecode.txt
  http://www.afork.com/a100/intr.txt
  http://www.afork.com/a100/lspci.txt
  http://www.afork.com/a100/mjd-trace.diff

I haven't included a dmesg or serial log, as I don't have a serial port and I
have no way of saving the output.  Would that be useful?

Can anyone suggest what's happening here?  Is more information needed?  What can
I do to further diagnose the problem?

Any help you can give me would be most appreciated.

Thanks,

Mitch.
Comment 1 Mitch Davis 2007-02-14 21:26:27 UTC
Here is the .config for my kernel:

  http://www.afork.com/a100/.config

Here is the same file, with the inactive options removed (shorter):

  http://www.afork.com/a100/.config-filtered
Comment 2 Rafael J. Wysocki 2007-02-21 10:05:37 UTC
Can you try the latest -mm kernel (but please do not compile it with 
CONFIG_PREEMPT set)?
Comment 3 Mitch Davis 2007-02-27 14:38:26 UTC
I have now tried 2.6.20-mm2, and the result is exactly the same: The laptop
suspends but never gets to first base on resume.

Please, any ideas?
Comment 4 Julian Sikorski 2007-05-25 09:27:22 UTC
I am afraid I might be hitting the same problem here with Satellite A100-847.
More info on Red Hat bug #229464.
Comment 5 Julian Sikorski 2007-05-29 14:37:43 UTC
Created attachment 11613 [details]
Linux Firmware Kit test results
Comment 6 Julian Sikorski 2007-05-29 14:38:41 UTC
Created attachment 11614 [details]
DSDT disassembly
Comment 7 Rafael J. Wysocki 2007-05-30 10:36:45 UTC
Mitch, Julian, can you both please test 2.6.22-rc3 or the latest -git ?
Comment 8 Julian Sikorski 2007-05-30 11:16:48 UTC
Not until it gets into the Fedora development repo; I don't think I'll be able to
compile a kernel from scratch. I have tested using 2.6.20.3, to be precise. Also,
I suspect that the 5.00 and later BIOS updates have made things worse: I tried
rolling back to a kernel that used to work, and it does not work anymore.
Comment 9 Julian Sikorski 2007-05-30 11:25:46 UTC
I mean 2.6.21.3
Comment 10 Julian Sikorski 2007-05-30 11:33:28 UTC
OK, found 2.6.22-rc3 in fedora cvs. I'll give it a go.
Comment 11 Julian Sikorski 2007-05-30 13:38:06 UTC
2.6.22-rc3 (more precisely, the oddly versioned 2.6.21-1.3193.fc8) does not work.
Tried with init=/bin/bash.
Comment 12 Mitch Davis 2007-05-31 08:16:53 UTC
Created attachment 11619 [details]
mitch-DSDT.dsl

Dump of DSDT from running Linuxtoolkit r2 on my Toshiba
Comment 13 Mitch Davis 2007-05-31 08:18:17 UTC
Created attachment 11620 [details]
mitch-results.txt

Results of Linuxtoolkit r2 run on my Toshiba.
Comment 14 Mitch Davis 2007-05-31 08:20:16 UTC
Created attachment 11621 [details]
julian-mitch-results-txt.diff

Julian and I both ran Linuxtoolkit r2.  Nevertheless, the ordering of sections
in our results.txt files was different.  I reordered the sections in his file
to match mine, then diffed the two.  Here is the result.
Comment 15 Mitch Davis 2007-05-31 08:27:25 UTC
> Mitch, Julian, can you both please test 2.6.22-rc3 or the latest -git ?

Hi Rafael, thanks for grabbing this bug.

I just tried 2.6.22-rc3 again, with the same minimal config I had when I
reported the bug.  The problem and symptoms are the same :-(

Like Julian, I ran Linuxtoolkit r2, and I've attached my results.  It certainly
seems like his machine and mine are related, although his CPU is faster and the
DSDT seems to have been compiled with the Intel compiler not the Microsoft one.

Doing Linuxtoolkit's suspend/resume fails in the same way - power light goes
green but no response.

Any ideas now?  I have been talking with Matthew Garrett.  I'll point this bug
out to him.  Meanwhile, is there anything else I can do for further diagnosis?

Thanks, Mitch.
Comment 16 Mitch Davis 2007-05-31 08:44:26 UTC
Julian, I'm compiling my kernel with a minimal config (no devices except console
and RTC), and providing a ramdisk with some system commands from FC6 in it.  If
you'd like a copy (3Mb), you can grab it here:

  http://www.afork.com/a100/bug-7988-bzImage

To boot it, I added two lines to my /boot/grub/grub.conf:

  title My kernel
    kernel (hd0,5)/tosh/src/linux/compile/arch/i386/boot/bzImage

Then it shows up in my boot menu.  (You'd have to use a different path)
Comment 17 Rafael J. Wysocki 2007-05-31 09:42:57 UTC
Referring to Comment #15:

I'll have to look at your DSDT and Linuxtoolkit r2 results more thoroughly, but
that'll take some time.
Comment 18 Mitch Davis 2007-06-01 06:34:32 UTC
Might this help?

  http://article.gmane.org/gmane.linux.acpi.devel/23326
  [PATCH]: ACPI: preserve the ebx value in acpi_copy_wakeup_routine

I'll try it...
Comment 19 Mitch Davis 2007-06-01 07:13:58 UTC
> [PATCH]: ACPI: preserve the ebx value in acpi_copy_wakeup_routine
> 
> I'll try it...

It didn't help.  Same failure to resume, code never leaves suspend_enter().
Comment 20 Andrey Borzenkov 2007-06-02 04:43:13 UTC
This one may have similar cause (at least it is also Toshiba): 
http://bugzilla.kernel.org/show_bug.cgi?id=7499. If someone could try if 
compiling the kernel before mentioned commit 
(http://bugzilla.kernel.org/show_bug.cgi?id=7499#c13) fixes it for you too, it 
would be nice. 
Comment 21 Mitch Davis 2007-06-02 16:09:44 UTC
Hi Andrey,

> This one may have similar cause (at least it is also Toshiba): 

Toshiba sources its laptops from a number of places.  My laptop mainboard is made
by (and has chips from) ATI.  Another person with similar symptoms has a Toshiba
that was made by Quanta and has Intel chips.  And yours seems to have ALi chips.

> http://bugzilla.kernel.org/show_bug.cgi?id=7499.
> If someone could try if compiling the kernel before mentioned commit

I will try it, but it appears in your case that the BIOS at least passes control
back to Linux!  So it seems that your problem and mine are quite different.

Thanks, Mitch.
Comment 22 Julian Sikorski 2007-06-02 16:12:44 UTC
Just to clarify: A100-847 has an Intel Core 2 Duo T7200 CPU, GeForce Go 7600 GPU
and Intel 945PM chipset.
Comment 23 Mitch Davis 2007-06-03 01:07:32 UTC
> Just to clarify: A100-847 has an Intel Core 2 Duo T7200 CPU,
> GeForce Go 7600 GPU and Intel 945PM chipset.

Yes.  Yours has a number of significant hardware differences to mine, but I
think the BIOS is closely related, as are the symptoms.  I think Andrey's and
Olaf Dietsche's laptops (see below) are significantly different.

  http://article.gmane.org/gmane.linux.acpi.devel/23340
Comment 24 Mitch Davis 2007-06-08 07:46:23 UTC
Hi Rafael,

> I'll have to look at your DSDT and Linuxtoolkit r2 results more thoroughly,
> but that'll take some time.

Have you had a chance to have a look?  Alternatively, is there something else
Julian or I could do that would help debug the problem?

Thank you!
Comment 25 Rafael J. Wysocki 2007-06-08 13:43:05 UTC
No, unfortunately I haven't.

I think the problems are related to the graphics adapters.  Please have a look 
at this thread on LKML: http://lkml.org/lkml/2007/6/8/226
Comment 26 Julian Sikorski 2007-06-08 15:54:58 UTC
That thread prompted me to check the nvidia driver. Indeed, I upgraded it from
9361 to 9746 around the time resume stopped working. It did not help, though.
Moreover, I checked that nvidia.ko is not loaded in minimal mode, so it does not
seem to be related.
Comment 27 Rafael J. Wysocki 2007-06-09 04:15:30 UTC
So, suspend doesn't work either with or without the NVidia driver loaded?
Comment 28 Julian Sikorski 2007-06-09 07:50:02 UTC
Yes. And in the past (my wild guess is that with an earlier BIOS) it worked with
nvidia.ko loaded.
Comment 29 Mitch Davis 2007-06-10 03:02:40 UTC
Julian, can you post the results of lspci please?  (I have an ATI graphics chip,
not NV)
Comment 30 Julian Sikorski 2007-06-10 03:33:10 UTC
Created attachment 11723 [details]
lspci of A100-847
Comment 31 Mitch Davis 2007-06-10 04:30:04 UTC
Hi Julian,

The symptoms we're seeing sound similar, and I thought our LinuxToolkit results
looked similar (esp BIOS), but it seems there's a huge difference in hardware
between the two machines.

The majority of chips in your laptop are from Intel, whereas in mine, lspci
doesn't have any.  The majority of mine appear to be ATI.

So, it may be that we have two separate problems.

Rafael, what do you suggest we do next?
Comment 32 Rafael J. Wysocki 2007-06-11 12:29:27 UTC
To check if the kernel real-mode resume code is executed, you can use
Pavel's beeping patch available at: http://lkml.org/lkml/2007/6/9/80
(uncomment the BEEP before 'movw $0xb800, %ax' to enable the beeping).

Then, if this code turns out to be executed, you can use PM_TRACE to check 
where exactly it fails.  That will be tedious, I'm afraid, but I don't see what 
else can be done at this point.
Comment 33 Julian Sikorski 2007-06-11 12:39:15 UTC
Unlucky, I have x64 here :(. But I doubt it will beep, since PM_TRACE does not
change the clock at all. Still may be worth trying.
Comment 34 Mitch Davis 2007-06-12 06:37:27 UTC
I tried the beep patch (thank you) but I didn't get a beep after trying to arouse my laptop from its slumber.  I will also try to put the beep patch into the early kernel startup (also in real mode) to verify that the beep code works ok on my machine.
Comment 35 Rafael J. Wysocki 2007-06-21 10:12:23 UTC
(In reply to comment #33)
> Unlucky, I have x64 here :(. But I doubt it will beep, since PM_TRACE does not
> change the clock at all. Still may be worth trying.

x86_64 version is available now (thanks to Nigel), so you can try it:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc5/patches/29-Optional-Beeping-During-Resume-From-RAM.patch
Comment 36 Julian Sikorski 2007-07-11 10:29:09 UTC
Compiling the patch into Fedora's 2.6.22 right now. My limited coding skills show that it should beep provided I enable pm_trace, right?

P.S.
Sorry it took so long. Dreaded real life syndrome.
Comment 37 Julian Sikorski 2007-07-11 14:27:56 UTC
OK, here is what I did: 
booted into patched kernel in minimal mode;
echo 1 > /sys/power/s2ram_beep;
pm-suspend;
resume
I got no beep, but I have to confirm if this box has a pc speaker it could beep with at all.
Comment 38 Julian Sikorski 2007-07-18 15:11:46 UTC
Unlucky, this laptop does not beep.
Comment 39 Julian Sikorski 2007-07-18 15:13:08 UTC
And I checked 2.6.22.1, still no luck.
Comment 40 Julian Sikorski 2007-07-18 16:41:31 UTC
Looks like the kernel dies _really_ early. I am slowly starting to lose hope it will ever work.
One idea came to my mind, though. Our lovely vendor has started to enable intel VT in bioses, it was disabled in the past. Unfortunately, there is no option to switch it. Can this be related?
Comment 41 Julian Sikorski 2007-07-22 11:35:17 UTC
What is maybe even more strange, I find the clock going totally bonkers when I try to resume *without* PM_TRACE, while enabling it does not change it at all. Weird. For the record, I tried:
pm-suspend --quirk-s3-bios
pm-suspend --quirk-s3-mode
pm-suspend --quirk-s3-bios --quirk-s3-mode
earlier today under the init=/bin/bash boot, which ended up shifting the clock 4 hours forward. And I can't use NTP here.
Comment 42 Julian Sikorski 2007-07-22 12:05:18 UTC
I did some more examination on this and found that the failed resume sequence™ (pm-suspend, resume that fails, holding down the power button to shut down and reboot) shifts the clock 1 hour forward if invoked in minimal boot (init=/bin/bash), but leaves it intact when used with fully-booted system.
Comment 43 Rafael J. Wysocki 2007-08-20 10:01:08 UTC
Can you try if this patch changes anything:
http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc3/patches/26-s2ram-kill-old-debugging-junk.patch
Comment 44 Mitch Davis 2007-08-21 08:04:28 UTC
Hi Rafael,

Thanks for looking at this bug.

I tried 2.6.23-rc3 with this patch and the same thing happened as before: On suspend, the steady green power light goes to orange fading in and out.  After pressing the power button to reawaken it, the power light goes to green, and the batt/disk/power leds all go on, but no beep, no caps lock led, no anything.

Obviously S-T-R works under Windows.  But why?  Is there anything I can do under Windows that would help?  Is it possible to use VMware or QEMU or something to trap what Windows is doing with the hardware?  Maybe it's using ACPI in a way Linux isn't.  Is there some sort of spy program I can run to find out?
Comment 45 Rafael J. Wysocki 2007-08-21 12:17:31 UTC
Well, I'm still thinking that this problem is related to video.

Have you ever tried s2ram (http://en.opensuse.org/s2ram)?
Comment 46 Julian Sikorski 2007-08-24 11:10:04 UTC
The patch has no influence here either, *but* kernel-2.6.23-0.124.rc3.git2.fc8 resumes ok. So the patch does not make things worse. Well, now at least there is caps lock/find response; I need to figure out the correct quirk for the display.
Looks like we were having separate problems indeed, Mitch. Sorry for the noise. 
Comment 47 Julian Sikorski 2007-08-24 13:16:33 UTC
OK, with nvidia blob installed, system wakes up perfectly with no quirks at all. 
Comment 48 Rafael J. Wysocki 2007-09-11 14:15:05 UTC
Hm, I'm not sure what to do with this bug.

The problem is evidently graphics-related and I don't think we can fix it in the kernel.
Comment 49 Mitch Davis 2007-09-11 18:55:38 UTC
Hi,

I have an ATI graphics chip, not nVidia.  I'm trying the suspend from a text console of a cut-down kernel.  Not X and not framebuffer.

We still may be seeing two problems create similar symptoms, and my problem is different to the other reporter's problem.  Apart from s2ram, is there a test you'd like me to do in order to show it's graphics related?

Nevertheless, I will try s2ram tonight, and I will pay special attention to the ATI-specific s2ram notes.

Thanks,

Mitch.
Comment 50 Mitch Davis 2007-09-12 09:22:49 UTC
Hi Rafael,

I just tried s2ram from suspend-0.7.  With -n, it gives this info:

Machine matched entry 258:
    sys_vendor   = 'TOSHIBA'
    sys_product  = 'Satellite A100'
    sys_version  = ''
    bios_version = ''
Fixes: 0x3  S3_BIOS S3_MODE
This machine can be identified by:
    sys_vendor   = 'TOSHIBA'
    sys_product  = 'Satellite A100'
    sys_version  = 'PSAA2A-03501N'
    bios_version = '1.90'

Without any args, s2ram puts the laptop to sleep in the same way as previously indicated, and the laptop fails to wake as previously indicated.  No caps lock, no beep.  That's in both FC6 (kernel-2.6.22.2) and from a bare kernel+initrd (kernel-2.6.23.rc3)

I tried various combos of the recommended flags, and I got the message:
Switching from vt1 to vt1
/proc/sys/kernel/acpi_video_flags does not exist; you need a kernel >= 2.6.16.
switching back to vt1

(and suspend does not happen)

I am compiling the kernel with CONFIG_ACPI turned on.  Do you know what else I need to turn on for s2ram with arguments to work?

Alternatively, any other test I can do?

(I don't think it's the same problem as the laptop with the nVidia chip.  Wish it was)
Comment 51 Rafael J. Wysocki 2007-09-12 12:52:31 UTC
(In reply to comment #50)
> Hi Rafael,
> 
> I just tried s2ram from suspend-0.7.  With -n, it gives this info:
> 
> Machine matched entry 258:
>     sys_vendor   = 'TOSHIBA'
>     sys_product  = 'Satellite A100'
>     sys_version  = ''
>     bios_version = ''
> Fixes: 0x3  S3_BIOS S3_MODE
> This machine can be identified by:
>     sys_vendor   = 'TOSHIBA'
>     sys_product  = 'Satellite A100'
>     sys_version  = 'PSAA2A-03501N'
>     bios_version = '1.90'
> 
> Without any args, s2ram puts the laptop to sleep in the same way as previously
> indicated, and the laptop fails to wake as previously indicated.  No caps lock,
> no beep.  That's in both FC6 (kernel-2.6.22.2) and from a bare kernel+initrd
> (kernel-2.6.23.rc3)
> 
> I tried various combos of the recommended flags, and I got the message:
> Switching from vt1 to vt1
> /proc/sys/kernel/acpi_video_flags does not exist; you need a kernel >= 2.6.16.
> switching back to vt1
> 
> (and suspend does not happen)

That's why the quirks don't work for you.

> I am compiling the kernel with CONFIG_ACPI turned on.  Do you know what else I
> need to turn on for s2ram with arguments to work?

CONFIG_SUSPEND in the power management menu.
Comment 52 Mitch Davis 2007-09-13 03:34:42 UTC
CONFIG_SUSPEND is already on.

  http://www.afork.com/a100/dot-config-s2ram.txt

Is there a more specific define?

Also, is it part of the video module, and how would I turn that module on?

Many thanks, Mitch.
Comment 53 Rafael J. Wysocki 2007-09-13 05:32:30 UTC
(In reply to comment #52)
> CONFIG_SUSPEND is already on.
> 
>   http://www.afork.com/a100/dot-config-s2ram.txt

First, you can try to unset CONFIG_DISABLE_CONSOLE_SUSPEND.

> Is there a more specific define?

No.

> Also, is it part of the video module, and how would I turn that module on?

No, it's not.

Can you boot the 2.6.23-rc3 and see if the file /proc/sys/kernel/acpi_video_flags is present?
Comment 54 Mitch Davis 2007-09-21 07:08:12 UTC
Greetings.

I found why I didn't have that file: Needed CONFIG_PROC_SYSCTL.  I recompiled.

I tried s2ram (no longer complains), and the behaviour was identical to before: Suspends, but does not resume properly.  No caps lock, no beep, no nothing.  The computer will resuspend, but needs a disconnection of power and battery to get out of stuck state.
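
A rough sketch of a check that would have caught the missing option earlier. It lists which of the options named in this report are not set to "y" in a kernel config file. The function name missing_kconfig is hypothetical, and the path in the usage line is an assumption; point it at the .config the kernel was actually built from.

```shell
# Print every option from the argument list that is not set to "y"
# in the given kernel config file.
# missing_kconfig is a hypothetical helper name.
missing_kconfig() {
    config_file=$1
    shift
    for opt in "$@"; do
        grep -q "^${opt}=y\$" "$config_file" || printf '%s\n' "$opt"
    done
}

# Example, using the options that came up in this report:
#   missing_kconfig .config CONFIG_ACPI CONFIG_SUSPEND CONFIG_PROC_SYSCTL \
#       CONFIG_PM_TRACE CONFIG_PM_DEBUG
```

An empty result means all the listed options are built in; anything printed is a candidate for why a file such as /proc/sys/kernel/acpi_video_flags is absent.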

I'm glad s2ram helped Julian.  I think my A100 must be different enough inside that the fix listed in the whitelist doesn't apply.  (This is similar to Linksys WRT-54s or some DLink wireless cards, which have the same model number but can have different hardware).

Is there any other diagnostic I can do?  Out of Windows and Linux, can I run one OS under another in a virtual machine, and capture how they access portspace when suspending?  Might there be a secret port write (Matthew Garrett found such a port for ATI southbridges)?  Anything else wacky we can try?

Any help you could give would be much appreciated.
Comment 55 Rafael J. Wysocki 2007-09-21 07:27:11 UTC
Well, there are some important suspend-related fixes in the current Linus' tree.  If you can, please test it.
Comment 56 Mitch Davis 2007-09-23 01:51:13 UTC
I tried 2.6.23-rc7-git2 and the result was practically the same.
The laptop still won't resume (orange power LED turns to green but otherwise stuck). 

There are two differences as far as I can see, both relevant to that stuck state.  One is that the backlight doesn't come back on.  The second is that it's no longer possible to hold down the power button and turn it off.  Not sure if that's relevant (doesn't seem useful) but it is a change.

Do you have any ideas of what I can try next?  No idea too silly.
Comment 57 Rafael J. Wysocki 2007-09-23 03:10:33 UTC
Well, I'm not sure if it applies on top of 2.6.23-rc7-git2, but you can try this patch:
http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.23-rc7/patches/23-s2ram-kill-old-debugging-junk.patch
(if it doesn't apply, please let me know).
Comment 58 Rafael J. Wysocki 2007-09-23 05:00:01 UTC
Also, what kind of hard disk drive is there in your box?  IDE or SATA?
Comment 59 Mitch Davis 2007-09-23 05:46:36 UTC
> 23-s2ram-kill-old-debugging-junk.patch

I'll try it soon.

> what kind of hard disk drive is there in your box?

It's SATA.  Controller is Silicon Image 3112, and the driver is sata_sil.  It has given people a fair bit of trouble, although Tejun Heo's work has helped a lot and is now more or less reliable.

Is it worth making a bootable USB so that the SATA drive is never touched?
Comment 60 Rafael J. Wysocki 2007-09-23 05:53:08 UTC
(In reply to comment #59)
> > 23-s2ram-kill-old-debugging-junk.patch
> 
> I'll try it soon.

OK

> > what kind of hard disk drive is there in your box?
> 
> It's SATA.  Controller is Silicon Image 3112, and the driver is sata_sil.  It
> has given people a fair bit of trouble, although Tejun Heo's work has helped a
> lot and is now more or less reliable.

Please try if booting the kernel with "libata.noacpi=0" in the command line helps.

> Is it worth making a bootable USB so that the SATA drive is never touched?

Well, that may cause other sorts of problems to appear, but of course you can try. ;-)
 
Comment 61 Mitch Davis 2007-09-30 16:26:58 UTC
> Please try if booting the kernel with "libata.noacpi=0"
> in the command line helps.

It beeps!  With libata.noacpi=0, it beeps!  

Still not there yet, but I'll do some more digging.  Thank you!
Comment 62 Rafael J. Wysocki 2007-10-06 08:43:33 UTC
Now, I think, you can use PM_TRACE to identify the place where it fails.
Comment 63 Mitch Davis 2007-10-06 16:08:17 UTC
Oh that's a good idea, thanks.  I'll do that.
Comment 64 Rafael J. Wysocki 2007-12-12 16:53:47 UTC
Can you test the current mainline kernel, please?
Comment 65 Mitch Davis 2007-12-13 21:44:19 UTC
Hello Rafael,

In comment #61, I mentioned that I got a beep with libata.noacpi=0.  I only got it once, and I've never been able to reproduce it since. :-(

I just tried 2.6.24-rc5-git3, and the behaviour is still the same - the power light turns back to green, but otherwise no response.

I added acpi_sleep=s3_beep, and I tried with and without libata.noacpi=0, but I'm not getting a post-resume response or beep.

Thank you for your continued help.  Is there anything else I can try?
Comment 66 Rafael J. Wysocki 2007-12-25 11:24:23 UTC
FYI, there are some fixes related to libata suspend in the current mainline.

Also, I have some patches that could help us debug the issue a bit further, but they only apply on top of 2.6.24-rc6 (or a later kernel).
Comment 67 Mitch Davis 2007-12-26 14:39:53 UTC
Hi Rafael, Merry Christmas.

> there are some fixes related to libata suspend in the current mainline.

No support for block devices, network or USB is compiled in.  I have tried to make the kernel as simple as possible.

> I have some patches that could help us debug the issue a bit further,
> but they only apply on top of 2.6.24-rc6 (or a later kernel).

Thanks, I would like to try those patches.

Mitch.
Comment 68 Rafael J. Wysocki 2007-12-27 14:27:49 UTC
Please apply the patch series from:
http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.24-rc6/patches/
on top of vanilla 2.6.24-rc6 (or the current -git), compile the kernel with CONFIG_PM_DEBUG set, try to do:

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state

and see if that works (it won't actually suspend your system, but it will busy wait for 5 seconds after executing the suspend sequence and it will execute the resume sequence after that).
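
The same test can be stepped through one level at a time, from the shallowest to the deepest. This is only a sketch: the level names (freezer, devices, platform, processors, core) are the ones described in the kernel's basic PM debugging documentation, and whether all of them exist with this patch series is an assumption; reading /sys/power/pm_test shows what the running kernel supports. The function name run_pm_tests is hypothetical, and the two paths are parameters so the loop can be dry-run against ordinary files.

```shell
# Walk the pm_test levels from shallowest to deepest, triggering a
# test suspend at each one. Stops at the first write that fails.
# run_pm_tests is a hypothetical helper name.
run_pm_tests() {
    pm_test=${1:-/sys/power/pm_test}
    state=${2:-/sys/power/state}
    for level in freezer devices platform processors core; do
        echo "testing level: $level"
        echo "$level" > "$pm_test" || return 1
        echo mem > "$state" || return 1
    done
    # Restore normal behaviour so the next suspend is a real one.
    echo none > "$pm_test"
}

# Example (as root, on a kernel built with CONFIG_PM_DEBUG):
#   run_pm_tests
```

If every level comes back, the kernel's own suspend and resume paths are working, and the remaining suspect is whatever happens between entering the sleep state and the firmware handing control back.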
Comment 69 Mitch Davis 2008-01-27 18:05:18 UTC
Hello, reporting to you live from LCA :-)

I have tried 2.6.24, with the patches in http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.24/snapshot-080126.tgz, and following your instructions.

On the second echo, I get suspend-type messages, then a delay of several seconds, then everything comes back again.

I really think the BIOS isn't handing control back to Linux, on resume.
Comment 70 Rafael J. Wysocki 2008-01-28 13:06:17 UTC
Well, I assume you've tried the plain suspend with these patches too ...
Comment 71 Mitch Davis 2008-01-28 15:55:11 UTC
Yes.  I have tried with and without libata.noacpi=0.

  echo 1 > /sys/power/pm_trace
  echo mem > /sys/power/state

The results are the same as before.  Suspends, but when power is pressed to resume, light turns green, but no other response.  The date and time is not altered.
Comment 72 Rafael J. Wysocki 2008-01-28 16:00:45 UTC
Hmm.  You might be right that the control doesn't reach the kernel on resume.  I don't know how to verify that, though.
Comment 73 Mitch Davis 2008-01-28 16:49:32 UTC
Hi Rafael,

Somehow the BIOS is able to jump back into Windows ok.  So I think it's either that some piece of hardware isn't being set properly by Linux, or it could be that some data the BIOS cares about is not set properly.  (For an example of the first, see here)

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=709cf5ea7a8bea1b956d361ee7cef1945423200c

Here are some ideas.

 - Is there something to verify the basic suspend/resume mechanism in the BIOS, without all the other Linux stuff around it?

 - Is there some way I can dump ACPI-related state in Windows, and dump state under Linux pre-suspend, to see if there's any difference?

 - Is there a way to run the Linux->BIOS->Linux cycle under an emulator, such as QEMU?

Any help gratefully received.
Comment 74 Mitch Davis 2008-01-28 19:33:12 UTC
Hi,

I've been working with Matthew Garrett.  We've found that if we resume immediately after suspending (like, 1/3 second), then the resume comes back.  No video, but the caps lock works.

It's possibly related to DRAM self-refresh: If self-refresh is not being activated, then a longer delay will lead to corruption of the contents of memory, and suspend won't work.

Any ideas?
Comment 75 Mitch Davis 2008-02-29 07:05:35 UTC
Hello,

I'm proposing we close this bug.

The problem is certainly real, and we seem to have narrowed it down to the self-refresh not being set on the RAM prior to suspend (thanks Matthew G), but I seem to be the only one having the problem (or at least, the only reporter), and I don't know how to get the information from ATI/AMD on how to enable it manually via blacklist.

Rafael, thank you for your continuing help.
Comment 76 Rafael J. Wysocki 2008-03-02 11:34:33 UTC
Okay, I've closed it with "Insufficient data" as the resolution.  Please reopen if necessary.
