Bug 13795 - abnormal boot and no suspend due to 'async' (fastboot)
Status: CLOSED UNREPRODUCIBLE
Product: ACPI
Classification: Unclassified
Component: Power-Off
Hardware: All Linux
Importance: P1 normal
Assigned To: acpi_power-off
Depends on:
Blocks: 7216 13070
Reported: 2009-07-18 17:19 UTC by Rafal Kaczynski
Modified: 2011-02-05 21:55 UTC

See Also:
Kernel Version: 2.6.30/32-rc5
Tree: Mainline
Regression: Yes


Attachments
pm debug dmesgs (27.76 KB, application/octet-stream)
2009-07-22 18:26 UTC, Rafal Kaczynski
config (57.57 KB, text/plain)
2009-07-22 18:27 UTC, Rafal Kaczynski
config (53.35 KB, text/plain)
2009-07-22 18:30 UTC, Rafal Kaczynski
disable the async probe for multiple ATA ports in one ATA controller (643 bytes, patch)
2009-07-23 07:42 UTC, ykzhao

Description Rafal Kaczynski 2009-07-18 17:19:12 UTC
Hardware: Fujitsu-Siemens V5505 notebook
Kernel version: 2.6.30, 2.6.31-rc3

Symptom: After 'shutdown -h', the next power-on does not proceed normally. After about 2 seconds of LED/disk spin-up activity the system stops completely. About 2 seconds later the startup runs normally: BIOS splash screen, grub, and then the system boots.
If I disconnect the power supply and the battery after 'shutdown -h', the issue disappears.
This also breaks suspend to RAM.

Bisecting found this:
9710794383ee5008d67f1a6613a4717bf6de47bc is first bad commit
commit 9710794383ee5008d67f1a6613a4717bf6de47bc
Author: Arjan van de Ven <arjan@linux.intel.com>
Date:   Sun Mar 15 11:11:44 2009 -0700

    async: remove the temporary (2.6.29) "async is off by default" code

    Now that everyone has been able to test the async code (and it's being used
    in the Moblin betas by default), we can enable it by default.
    The various fixes needed have gone into 2.6.29 already.

    [With an important bugfix from Stefan Richter]

    Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

Reverting this patch on top of 2.6.31-rc3 fixes the issue for me.
Comment 1 Arjan van de Ven 2009-07-18 19:02:44 UTC
Can you be more specific about how far you get before things get stuck?
(Make sure to also remove "quiet" from the kernel command line if it is present; it hides all the useful messages.)
Comment 2 Rafal Kaczynski 2009-07-18 19:18:43 UTC
Sorry. The symptoms are a bit difficult to explain. The issue is actually visible before kernel boots. Or even before BIOS splash screen appears.
If I shutdown a system running a kernel with this patch (with 'async') the system will shutdown normally but then later when I want to start the computer it won't start normally. I press the power button, the LEDs will turn on and the disk will spin up for about two seconds. Then the LEDs turn off and the disk spins down (as if the computer turns itself off). Then the LEDs turn on again and the computer starts normally.

The same cycle happens when I resume from suspend to ram. But this actually breaks the resume.

If I shutdown the computer and disconnect power supply and battery it will start up normally.

So it seems that 'async' is causing something in the kernel to leave a little mess in...BIOS maybe?

I don't see any errors from kernel either during startup or shutdown. I don't use 'quiet'.
Comment 3 Arjan van de Ven 2009-07-18 19:43:03 UTC
bugzilla-daemon@bugzilla.kernel.org wrote:

> 
> --- Comment #2 from Rafal Kaczynski <fscnoboot@wp.pl>  2009-07-18 19:18:43 ---
> Sorry. The symptoms are a bit difficult to explain. The issue is actually
> visible before kernel boots. Or even before BIOS splash screen appears.
> If I shutdown a system running a kernel with this patch (with 'async') the
> system will shutdown normally but then later when I want to start the computer
> it won't start normally. I press the power button, the LEDs will turn on and
> the disk will spin up for about two seconds. Then the LEDs turn off and the
> disk spins down (as if the computer turns itself off). Then the LEDs turn on
> again and the computer starts normally.
> 

Humm. Funky.

Now the problem with the bisect is that it pointed out the commit that flipped a default,
not which patch actually introduced a problem. In order to chase this down, if you
are up for this, you need to add the kernel parameter to enable this even for earlier kernels,
and redo the bisect. You can start with the git ID that introduces the flip of the switch,
and bisect back to 2.6.29 release....
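The re-bisect suggested above only works if every kernel under test is booted with async enabled, so that the pass/fail check stays the same from one round to the next. As a rough illustration of how the search narrows, here is a minimal Python model of bisection over a linear history. It is only a sketch: `first_bad` and the `is_bad` callback are hypothetical names, and real `git bisect` also has to deal with merges.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of kernels tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    history is linear. This models the idea only, not git's implementation.
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: earliest known bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        # is_bad stands for the manual check: build, boot with async
        # enabled, shut down, then watch the next power-on.
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], steps


# Hypothetical history of nine commits where the fifth one broke things.
history = [f"c{i}" for i in range(9)]
culprit, tested = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit, tested)
```

Each round halves the remaining range, so a range of a few thousand commits takes about a dozen build-and-boot rounds.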
Comment 4 Rafal Kaczynski 2009-07-19 12:28:26 UTC
So I've started to bisect between 2.6.28 and 2.6.29-rc1. Bisect told me:
67acd8b4b7a3f1b183ae358e1dfdb8a80e170736 is first bad commit
This is:
    Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async

    * git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async:
      async: don't do the initcall stuff post boot
      bootchart: improve output based on Dave Jones' feedback
      async: make the final inode deletion an asynchronous event
      fastboot: Make libata initialization even more async
      fastboot: make the libata port scan asynchronous
      fastboot: make scsi probes asynchronous
      async: Asynchronous function calls to speed up kernel boot

I feel a bit 'git-lost'. Should I try to revert any particular patch of this merge?
Comment 5 ykzhao 2009-07-21 06:58:30 UTC
Hi, Rafal
    Do you mean that the suspend is affected by the async fastboot? How does this happen? 
    Do you mean that the second boot is abnormal after the box is powered off with "shutdown -h"? It seems that the second boot remembers what happened in the first boot, which is hard to understand.
    Will you please double check it again? Please confirm whether the issue still exists if you wait for some time after shutting down the box.
    
    Thanks.
Comment 6 Rafal Kaczynski 2009-07-21 10:32:56 UTC
Dear Yakui,
Yes, fastboot makes the system boot abnormally and breaks resume after suspend to RAM.

Let me try to define the issue this way:
A - normal boot sequence
1) power LED turns on, hdd spins up
2) BIOS splash screen
3) grub
4) kernel boot

B - abnormal boot sequence after shutdown of a kernel with 'fastboot' (kernel does not show any errors during shutdown, battery is installed)
1) power LED turns on, hdd spins up
2) power LED turns off, hdd spins down (no activity for about two seconds)
3) power LED turns on, hdd spins up
4) BIOS splash screen
5) grub
6) kernel boot

Kernel v2.6.30 or 2.6.30-rc3 - issue exists - case B
If battery is disconnected after shutdown: no issue - case A
Kernel v2.6.30-rc3 with commit 9710794383ee5008d67f1a6613a4717bf6de47bc reverted - no issue - case A
Kernel v2.6.29 (no parameters) - no issue - case A
Kernel v2.6.29 (fastboot kernel parameter) - issue exists - case B
While bisecting between v2.6.28 and v2.6.29 with fastboot parameter - bisect told me the first bad commit is 67acd8b4b7a3f1b183ae358e1dfdb8a80e170736.

If I shut down in the evening and keep the battery attached, it will still boot abnormally (case B) in the morning.

I'll be glad to test any patch... just please don't ask me to go through 13 bisects again ;-)

I've tried to look at the original fastboot merge. I thought I could try to disable particular areas (scsi, libata...) but this is too complex for me.
Thank you!
Comment 7 ykzhao 2009-07-22 01:12:33 UTC
Hi, Rafal
    Thanks for the response. 
    Can suspend/resume work well if the commit is reverted?
    If so, will you please enable "CONFIG_PM_DEBUG" in kernel configuration and do the following test?
    a. kill the process using /proc/acpi/event
    b. echo freezer > /sys/power/pm_test
    c. echo mem > /sys/power/state ; dmesg > dmesg_freezer 
    d. please wait for five seconds and see whether it can be resumed

    Please echo "devices/platform/core/cpu" > /sys/power/pm_test and do the above test.
    Thanks.
Comment 8 Rafal Kaczynski 2009-07-22 18:25:20 UTC
Hi Yakui,
Yes, in every case where I can boot normally after shutdown I can also resume from suspend to RAM. In every case where I get the power LEDs on-off-on at startup, it also does not resume.
I've switched to v2.6.29 because of the fastboot parameter there.
I couldn't echo "devices/platform/core/cpu" > /sys/power/pm_test. Echo to pm_test would only work if I specify a single value.
So I've run a series of tests for each value separately with fastboot parameter and without it. I'm attaching the results in pm_debug_dmesgs.tar.gz and also config file used.

In every case where something was echoed to pm_test I could resume without problem (although the system didn't suspend fully... the LEDs were still on... but I guess this is expected when pm_test is used). Of course, with fastboot and pm_test set to none it wouldn't resume.

Hope it helps...
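For anyone repeating the test series described above, it can be scripted. The Python sketch below writes one pm_test level at a time (as noted, pm_test accepts only a single value per write), triggers a suspend test cycle, and saves dmesg for each level. The level names are assumed from the kernel's PM debugging documentation; the function names and the `power_dir`, `out_dir`, and `read_log` parameters are hypothetical and exist only so the logic can be tried against a scratch directory. Running it against the real /sys/power needs root and a kernel built with CONFIG_PM_DEBUG.

```python
import subprocess
from pathlib import Path

# Test levels, shallowest first (assumed from the kernel PM debugging docs).
PM_TEST_LEVELS = ["freezer", "devices", "platform", "processors", "core"]


def read_dmesg():
    """Return the current kernel log as text."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout


def run_pm_test(level, power_dir="/sys/power", out_dir=".", read_log=read_dmesg):
    """Run one suspend test cycle at a single pm_test level and save the log."""
    power = Path(power_dir)
    (power / "pm_test").write_text(level)  # one value per write
    (power / "state").write_text("mem")    # on real sysfs this blocks until resume
    log_path = Path(out_dir) / f"dmesg_{level}"
    log_path.write_text(read_log())
    return log_path


def run_series(power_dir="/sys/power", out_dir=".", read_log=read_dmesg):
    """Run every level in turn, then switch pm_test back to 'none'."""
    logs = [run_pm_test(level, power_dir, out_dir, read_log)
            for level in PM_TEST_LEVELS]
    (Path(power_dir) / "pm_test").write_text("none")
    return logs
```

Calling `run_series()` as root would leave one `dmesg_<level>` file per level in the current directory. The final reset to "none" matters, because a later real suspend would otherwise still run in test mode.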
Comment 9 Rafal Kaczynski 2009-07-22 18:26:52 UTC
Created attachment 22445 [details]
pm debug dmesgs
Comment 10 Rafal Kaczynski 2009-07-22 18:27:52 UTC
Created attachment 22446 [details]
config
Comment 11 Rafal Kaczynski 2009-07-22 18:30:41 UTC
Created attachment 22447 [details]
config
Comment 12 ykzhao 2009-07-23 07:42:09 UTC
Created attachment 22464 [details]
disable the async probe for multiple ATA ports in one ATA controller

Will you please try the debug patch and see whether the issue still exists?
Thanks.
Comment 13 Rafal Kaczynski 2009-07-23 20:43:48 UTC
(In reply to comment #12)
> Created an attachment (id=22464) [details]
> disable the async probe for multiple ATA ports in one ATA controller
> 
> Will you please try the debug patch and see whether the issue still exists?
> Thanks.

This patch helps a little bit. With the patched kernel I still get LEDs on-off-on on startup after shutdown. But I can now resume from suspend-to-RAM about 50% of the time. Sometimes it resumes normally, sometimes it goes into this LEDs on-off-on again. Even if it resumes normally, on the second suspend attempt it will not resume.
Comment 14 Rafal Kaczynski 2009-08-26 18:17:55 UTC
Checked 2.6.30-rc7. No difference.
Comment 15 Florian Mickler 2009-09-25 16:50:17 UTC
ok, i'm a completely clueless bystander, but to me it looks as if the system shutdown is borked somehow: when you turn your system back on, the bios continues to shut down... maybe a race between two 'shut the machine off' mechanisms that got uncovered by running (something) async?


a) bios gets asked to shut down and needs a while (now async)
b) in parallel, some other shut-off mechanism gets called which is very fast (something like an acpi low-power-state fallback?)

if you now press the power button to turn the machine on, the bios is still in the process of shutting down via a)

but this is pure speculation.

anyway, i added some cc's...
Comment 16 Florian Mickler 2009-09-25 16:52:24 UTC
hm, i can't add linux-acpi@vger.kernel.org ..
Comment 17 Florian Mickler 2009-09-27 13:41:10 UTC
maybe you could try another reboot mechanism for your machine. (boot with reboot=pci or reboot=acpi or reboot=force on the grub kernel cmdline)



from arch/x86/kernel/reboot.c:

/* reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] | p[ci]
   warm   Don't set the cold reboot flag
   cold   Set the cold reboot flag
   bios   Reboot by jumping through the BIOS (only for X86_32)
   smp    Reboot by executing reset on BSP or other CPU (only for X86_32)
   triple Force a triple fault (init)
   kbd    Use the keyboard controller. cold reset (default)
   acpi   Use the RESET_REG in the FADT
   efi    Use efi reset_system runtime service
   pci    Use the so-called "PCI reset register", CF9
   force  Avoid anything that could hang.
 */
Comment 18 Rafal Kaczynski 2009-09-29 20:46:40 UTC
Florian,
Thanks for your comments and ideas!
I've tested on 2.6.29 with fastboot. Tried both the S3 suspend and the shutdown scenarios with reboot=pci, reboot=acpi, reboot=force, and reboot=bios.
There is no difference.
Comment 19 Rafal Kaczynski 2009-10-24 10:04:17 UTC
Tried 2.6.32-rc5 - the issue remains.
Comment 20 Florian Mickler 2010-09-03 06:27:09 UTC
Does this issue still persist?
Comment 21 Len Brown 2011-01-18 07:02:07 UTC
still an issue with 2.6.37?
Comment 22 Florian Mickler 2011-02-05 21:55:04 UTC
I'm closing this as unreproducible for now.
