Bug 12936

Summary: Fastboot Breaks Resume from S3.
Product: ACPI Reporter: Dennis Jansen (dennis.jansen)
Component: Other Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX    
Severity: high CC: acpi-bugzilla, arjan, rjw, shaohua.li, tj, yakui.zhao
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 12398    
Attachments: dmesg 2.6.29 without fastboot after resume
dmesg 2.6.29 with fastboot
lspci -vvnn
patch: probe ata ports synchronously
dmesg 2.6.29 with fastboot, patch applied
dmesg 2.6.29, patch applied, includes the trace data
dmesg 2.6.29 fastboot, patch not applied, no trace found.
patch: probe scsi disk synchronously
resume diag script
dmesg 2.6.29 fastboot, both patches applied, resumes fine
patch: probe scsi disk after ata port probe is done
lspci normal boot
lspci fastboot -vvxxx
lspci diff -C10 normal fastboot
lspci normal boot from init=/bin/bash
fix shipping in 2.6.30-rc1-git7

Description Dennis Jansen 2009-03-25 10:26:54 UTC
Created attachment 20668 [details]
dmesg 2.6.29 without fastboot after resume

If I enable fastboot, my machine doesn't resume from S3; it hangs and then reboots. I'm not sure whether it's too early to file bugs against fastboot.

I'll attach a few logs. If you need anything, please let me know.

I'm running the Ubuntu precompiled kernel from here: http://kernel.ubuntu.com/%7Ekernel-ppa/mainline/
Comment 1 Dennis Jansen 2009-03-25 10:29:04 UTC
Created attachment 20669 [details]
dmesg 2.6.29 with fastboot

I think the fastboot dmesg is cut off at the end, probably because of the crash. I'll post a full one if you wish.
Comment 2 Dennis Jansen 2009-03-25 10:29:42 UTC
Created attachment 20670 [details]
lspci -vvnn
Comment 3 Zhang Rui 2009-03-26 02:19:03 UTC
Created attachment 20683 [details]
patch: probe ata ports synchronously

please apply this debug patch on top of 2.6.29 and see if the problem still happens.
Comment 4 Zhang Rui 2009-03-26 03:10:55 UTC
(In reply to comment #1)
> Created an attachment (id=20669) [details]
> dmesg 2.6.29 with fastboot
> 
> The fastboot dmesg is cut off at the end I think. Must've been the crash.
> I'll post a full one if you wish.

it would be great if you could attach the full dmesg.

please set CONFIG_PM_DEBUG and run
echo devices > /sys/power/pm_test;
echo mem > /sys/power/state;
can the system come back after a few seconds?
if yes, please attach the dmesg after this test.
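The test above can be wrapped in a small script so it is run the same way every time. A minimal Python sketch, assuming root privileges and a kernel built with CONFIG_PM_DEBUG (which provides /sys/power/pm_test); `run_pm_test` and its `sysfs_power` parameter, which allows a dry run against a scratch directory, are hypothetical and not taken from the diag script attached later:

```python
from pathlib import Path


def run_pm_test(level: str = "devices", state: str = "mem",
                sysfs_power: Path = Path("/sys/power")) -> None:
    """Select a pm_test level, then request the sleep state (hypothetical helper).

    With pm_test set to "devices", the kernel freezes tasks and suspends
    devices, waits a few seconds and resumes them instead of entering S3,
    so a machine that never comes back points at a device suspend/resume problem.
    """
    # Same as: echo devices > /sys/power/pm_test
    (sysfs_power / "pm_test").write_text(level + "\n")
    # Same as: echo mem > /sys/power/state (on real hardware this blocks until resume)
    (sysfs_power / "state").write_text(state + "\n")
```

After it returns, `dmesg > dmesg_after` saves the log, as asked in comment #9.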
Comment 5 Dennis Jansen 2009-03-27 13:48:44 UTC
The system does not come back at all. Without the patch it turns itself off and on again twice and then ends up back in the BIOS.

With the patch the system starts to resume but then hangs. This time the hang can be traced:
[   11.607544]   Magic number: 0:77:396
[   11.607597]   hash matches drivers/base/power/main.c:390
[   11.607741] rtc_cmos 00:09: setting system clock to 2024-03-02 20:22:30 UTC (1709410950)

I will attach the full dmesgs.
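The "Magic number" and "hash matches" lines come from the kernel's PM trace support (CONFIG_PM_TRACE_RTC), which saves a hash of the last resume trace point in the RTC and decodes it on the next boot; that is also why the clock above is set to a bogus 2024 date. A sketch for pulling the two values out of a saved dmesg; `parse_pm_trace` is hypothetical and the regular expressions are based only on the lines quoted here:

```python
import re
from typing import Optional, Tuple

# Patterns modeled on the two trace lines quoted in comment #5 (assumption).
MAGIC_RE = re.compile(r"Magic number:\s*(\d+):(\d+):(\d+)")
HASH_RE = re.compile(r"hash matches\s+(\S+):(\d+)")


def parse_pm_trace(dmesg: str) -> Optional[Tuple[str, str, int]]:
    """Return (magic, source_file, line) from a dmesg dump, or None if absent."""
    magic = MAGIC_RE.search(dmesg)
    match = HASH_RE.search(dmesg)
    if magic is None or match is None:
        return None
    return ":".join(magic.groups()), match.group(1), int(match.group(2))
```

Applied to the three log lines above, it returns `("0:77:396", "drivers/base/power/main.c", 390)`.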
Comment 6 Dennis Jansen 2009-03-27 13:51:13 UTC
Created attachment 20702 [details]
dmesg 2.6.29 with fastboot, patch applied

dmesg 2.6.29 with fastboot, patch applied: the system starts fine, then hangs on resume without even initializing the display.
Comment 7 Dennis Jansen 2009-03-27 13:52:56 UTC
Created attachment 20703 [details]
dmesg 2.6.29, patch applied, includes the trace data
Comment 8 Dennis Jansen 2009-03-27 13:53:42 UTC
Created attachment 20704 [details]
dmesg 2.6.29 fastboot, patch not applied, no trace found.
Comment 9 ykzhao 2009-03-30 03:26:37 UTC
Hi, Dennis
    Will you please do the test as required in comment #4 and attach the output of dmesg_after?
      >echo devices > /sys/power/pm_test;
      > echo mem > /sys/power/state; dmesg >dmesg_after;
   Thanks.
Comment 10 Zhang Rui 2009-03-30 03:29:51 UTC
so this also happens in the test in comment #4?
Comment 11 Dennis Jansen 2009-03-30 09:11:32 UTC
Hi,

yes, you are right, Zhang Rui. I *did* the test exactly as in #4,
once with the patch and once without it.
I already described my results above: the system did not return
from suspend to RAM in either case, so there is no dmesg_after, only the magic number, which I could retrieve with the patch (see #5). The word "traced" may be misleading; it doesn't mean that the system came back from suspend.
Comment 12 Dennis Jansen 2009-03-30 09:15:05 UTC
Oh, and just to be very clear: it's not just the display that isn't initialized. The system hangs completely. The only way to reset it is to hold the power button for a few seconds until it turns off (SysRq can't reboot it either). I did these tests from single-user mode.
Comment 13 Zhang Rui 2009-03-31 03:43:57 UTC
Created attachment 20750 [details]
patch: probe scsi disk synchronously

please apply this patch on top of the previous one,
the problem goes away this time, right?
Comment 14 Dennis Jansen 2009-03-31 14:04:52 UTC
Created attachment 20757 [details]
resume diag script

I have used this script for the resume tests.
Comment 15 Dennis Jansen 2009-03-31 14:07:59 UTC
Created attachment 20758 [details]
dmesg 2.6.29 fastboot, both patches applied, resumes fine

Yes, with both patches applied the resume works fine. For some reason there is a
 WARNING: at kernel/power/main.c:176 suspend_test_finish+0x80/0x90() 
in the kernel I use to test resuming. It's not in the kernel I originally discovered the problem in, though. All logs here except #20668 are from my test kernel.
Comment 16 Zhang Rui 2009-04-03 01:39:03 UTC
Created attachment 20784 [details]
patch: probe scsi disk after ata port probe is done

please revert the previous two debug patches, apply this patch and see if it helps.
Comment 17 Zhang Rui 2009-04-03 01:51:31 UTC
from the dmesg, we can see that
when fastboot is used
1. ahci driver probes port 0
2. sd driver probes the disk in port 0
3. ahci driver probes port 1 and 2
when fastboot is not used,
1. ahci driver probes port 0, 1, 2
2. sd driver probes the disk in port 0

I don't know whether this change in ordering causes the problem,
but the patch in comment #16 should fix this.

as the scsi/ata device is broken in this bug, cc Tejun.
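The ordering difference, and the idea behind the comment #16 patch (let the sd probe run only after every ATA port probe has finished), can be illustrated with a toy Python model. It is not the patch itself, whose contents are not quoted here; the kernel code is C, and its async machinery (`async_schedule()` / `async_synchronize_full()`) is only roughly analogous to the thread pool and `wait()` below. `probe_in_order` is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from threading import Lock
from typing import List


def probe_in_order(ports: List[int], disk_port: int) -> List[str]:
    """Toy model: probe all ATA ports in parallel, then probe the disk.

    Returns the order in which the probe steps completed. Waiting for every
    port probe first gives the non-fastboot order from comment #17.
    """
    events: List[str] = []
    lock = Lock()

    def probe_port(port: int) -> None:
        with lock:
            events.append(f"ahci: port {port}")

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(probe_port, p) for p in ports]
        # Barrier: the disk probe may not start until all port probes are done.
        wait(futures)
        events.append(f"sd: disk on port {disk_port}")
    return events
```

Without the `wait()` barrier, the disk step could run as soon as port 0 is done, which matches the fastboot order seen in the dmesg.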
Comment 18 Dennis Jansen 2009-04-04 10:56:53 UTC
The patch from #16 does not work at all for me. The behavior on resume is the same as with vanilla, absolutely no change. As with vanilla, I can't even get a "magic number".
Comment 19 Shaohua 2009-04-08 02:44:26 UTC
How about boot option 'libata.noacpi'?
Comment 20 Dennis Jansen 2009-04-08 06:14:11 UTC
With which kernel? Unpatched, or one of the patches?
Comment 21 Shaohua 2009-04-08 06:35:39 UTC
unpatched kernel please.
Comment 22 Dennis Jansen 2009-04-08 07:13:24 UTC
Does not help. Maybe I missed part of the parameter?

[    0.000000] Kernel command line: root=UUID=4349ad91-9018-4e72-817b-46624e9a6c27 ro quiet splash fastboot libata.noacpi single
[    0.000000] Booting kernel: `' invalid for parameter `libata.noacpi'
Comment 23 Shaohua 2009-04-08 07:22:25 UTC
sorry, it should be libata.noacpi=1
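Comment #22 shows the kernel rejecting a bare `libata.noacpi` (the "`' invalid for parameter" message), because the parameter expects a value. A minimal sketch for checking a command line, such as the contents of /proc/cmdline, before a test run; `parse_cmdline` is hypothetical and, unlike the kernel's own parser, ignores quoted values:

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplified: splits on whitespace and on the first '=' only.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params
```

For the command line in comment #22, `parse_cmdline(...)["libata.noacpi"]` is `None`, which flags the missing `=1`.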
Comment 24 Zhang Rui 2009-04-08 07:39:10 UTC
please attach the lspci -vvxxx output both w/ and w/o "fastboot" parameter
Comment 25 Dennis Jansen 2009-04-08 07:47:24 UTC
#23: No, doesn't help at all. Behavior is the same as without the parameter.
Comment 26 Dennis Jansen 2009-04-08 07:49:34 UTC
Created attachment 20879 [details]
lspci normal boot
Comment 27 Dennis Jansen 2009-04-08 08:17:57 UTC
Created attachment 20881 [details]
lspci fastboot -vvxxx
Comment 28 Dennis Jansen 2009-04-08 08:28:15 UTC
Created attachment 20882 [details]
lspci diff -C10 normal fastboot
Comment 29 Dennis Jansen 2009-04-08 08:31:32 UTC
Created attachment 20883 [details]
lspci normal boot from init=/bin/bash

The first "lspci normal boot" dump was taken from the fully running system, not from init=/bin/bash; that's fixed now. The diff is much smaller and, I hope, more usable. The fastboot dump was also taken from init=/bin/bash.
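For anyone repeating this comparison, the same kind of context diff can be computed from two saved dumps with Python's standard `difflib`. A sketch assuming both `lspci -vvxxx` outputs were saved as text; `diff_lspci` is a hypothetical helper, and `n=10` mirrors the `-C10` used for the attached diff:

```python
import difflib
from typing import List


def diff_lspci(normal: str, fastboot: str, context: int = 10) -> List[str]:
    """Context diff of two saved `lspci -vvxxx` dumps, similar to `diff -C10`."""
    return list(difflib.context_diff(
        normal.splitlines(keepends=True),
        fastboot.splitlines(keepends=True),
        fromfile="normal",
        tofile="fastboot",
        n=context,
    ))
```

Changed lines are prefixed with `! `, as in `diff -C`, and identical dumps produce an empty list.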
Comment 30 Dennis Jansen 2009-04-09 11:16:39 UTC
Please remove the NEEDINFO tag as it's incorrect.
Comment 31 Len Brown 2009-04-14 15:35:28 UTC
Created attachment 20974 [details]
fix shipping in 2.6.30-rc1-git7

patch from Linus shipping in 2.6.30-rc1-git7
that should fix this bug.
Comment 32 Dennis Jansen 2009-04-15 16:29:36 UTC
Fix confirmed.

2.6.30-rc1 did not fix the problem; in fact, it appeared without the fastboot parameter as well. 2.6.30-rc2, however, is fixed with or without the fastboot parameter.

Nice Work!

Power-Off_Retract_Count is up
Comment 33 Dennis Jansen 2009-04-15 16:30:36 UTC
to 23 from all the testing... I hope this one won't break like my last one did a month ago.