Bug 13233 - 2.6.27 regression - boot delays on battery, but not on A/C - Compaq Presario F756NR
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All Linux
Importance: P1 blocking
Assignee: Shaohua
URL:
Keywords:
Duplicates: 13234
Depends on:
Blocks:
 
Reported: 2009-05-03 20:08 UTC by Dmitry Lyzhyn
Modified: 2009-06-01 19:50 UTC (History)
CC List: 5 users

See Also:
Kernel Version: 2.6.27.(all), 2.6.28.(all), 2.6.29.(all), 2.6.30rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (20.48 KB, application/octet-stream)
2009-05-03 20:08 UTC, Dmitry Lyzhyn
output of #lspci -vxxx (31.00 KB, application/octet-stream)
2009-05-04 14:35 UTC, Dmitry Lyzhyn
acpidump (152.43 KB, text/plain)
2009-05-04 14:36 UTC, Dmitry Lyzhyn
dmesg of 2.6.29.2 (29.60 KB, application/octet-stream)
2009-05-04 14:37 UTC, Dmitry Lyzhyn
try the debug patch, in which the one-shot mode is disabled for the local APIC timer (667 bytes, patch)
2009-05-12 03:37 UTC, ykzhao
as per Comment #11 From Len Brown, request: dmesg -s 64000 (22.68 KB, application/octet-stream)
2009-05-12 11:11 UTC, Dmitry Lyzhyn
debug patch (1.05 KB, patch)
2009-05-18 09:26 UTC, Shaohua
new debug patch (1.20 KB, patch)
2009-05-19 00:46 UTC, Shaohua

Description Dmitry Lyzhyn 2009-05-03 20:08:46 UTC
Created attachment 21200 [details]
dmesg

Hello,

So far I have found that this issue affects the Compaq Presario F756NR laptop.
Whenever the laptop is powered on without the power adapter connected, running on battery only, the kernel detects hardware devices with long delays between "steps". It appears to get stuck while detecting devices. If you wait about 3-4 minutes, the kernel continues until it gets stuck at the next "step". This happens over and over, approximately 10-15 times while the system boots. If you don't wait and press any key, booting continues immediately, but only until the next stop. So you press a key after each stop and the system keeps loading. However, if I plug in the power supply during one of these delays, the system continues loading with no further delays. If the power supply is plugged in the whole time, the system boots normally without any delays.
This issue never occurs with kernels 2.6.26 and below.

The only way to avoid this issue on newer kernels is to use the boot option nolapic or idle=poll.
But as you know, nolapic prevents the use of SMP and idle=poll heats up the CPU.

Here is the ACPI kernel configuration:

# Power management options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION="/dev/sda3"
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
# CONFIG_ACPI_BAY is not set
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_WMI=m
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_SBS=y
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
# CONFIG_CPU_FREQ_DEBUG is not set
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=y
# CONFIG_X86_POWERNOW_K6 is not set
# CONFIG_X86_POWERNOW_K7 is not set
CONFIG_X86_POWERNOW_K8=y
CONFIG_X86_POWERNOW_K8_ACPI=y
# CONFIG_X86_GX_SUSPMOD is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_SPEEDSTEP_ICH is not set
# CONFIG_X86_SPEEDSTEP_SMI is not set
# CONFIG_X86_P4_CLOCKMOD is not set
CONFIG_X86_CPUFREQ_NFORCE2=m
# CONFIG_X86_LONGRUN is not set
# CONFIG_X86_LONGHAUL is not set
# CONFIG_X86_E_POWERSAVER is not set 

Thanks in advance!
Comment 1 ykzhao 2009-05-04 01:36:35 UTC
Will you please attach the output of acpidump, lspci -vxxx?
Will you please try the following boot option and see whether the issue still exists?
   a. processor.max_cstate=1
   b. nolapic_timer
   c. hpet=disable

   Thanks.
Comment 2 ykzhao 2009-05-04 01:44:39 UTC
*** Bug 13234 has been marked as a duplicate of this bug. ***
Comment 3 Dmitry Lyzhyn 2009-05-04 14:35:13 UTC
Created attachment 21209 [details]
output of #lspci -vxxx
Comment 4 Dmitry Lyzhyn 2009-05-04 14:36:09 UTC
Created attachment 21210 [details]
acpidump
Comment 5 Dmitry Lyzhyn 2009-05-04 14:37:09 UTC
Created attachment 21211 [details]
dmesg of 2.6.29.2
Comment 6 Dmitry Lyzhyn 2009-05-04 15:16:25 UTC
(In reply to comment #1)
> Will you please attach the output of acpidump, lspci -vxxx?
> Will you please try the following boot option and see whether the issue still
> exists?
>    a. processor.max_cstate=1
>    b. nolapic_timer
>    c. hpet=disable
> 
>    Thanks.

 Hello Yakui,

 Thanks for your quick response.

 As per your request, the outputs of acpidump and lspci -vxxx have been attached to this bug. I also attached the dmesg output for kernel 2.6.29.2, which I used for the tests (in case you need it).

 Using kernel 2.6.29.2, I tested the suggested boot options.
Here are the results:

a. processor.max_cstate=1

The issue is still present, but booting gets stuck at fewer "steps": I had to press a key only 4 times to force booting to continue (sorry, I don't know if that makes any sense to you). Just to remind you, without processor.max_cstate=1 it got stuck up to 10-15 times while loading.

b. nolapic_timer
 With this option it works perfectly fine. I was able to test the system for 2 hours and didn't catch any issues. However, I don't quite understand what the nolapic_timer option does and how it may affect a running system. Will it affect performance or power saving?

c. hpet=disable
 With this option booting works fine, with no stops. But a similar issue occurs on hibernation: the hibernation process stops and continues only once I plug in the power adapter or press the Power Button (funny, isn't it? :)). The same happens on resume: it starts and loads fine, and then at the very last step it gets stuck until I plug in the power adapter or press the Power Button.
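
For anyone repeating these tests, it helps to confirm which workaround options the running kernel was really booted with before recording a result. The following is a minimal Python sketch, not part of the original report; `WORKAROUND_OPTIONS` and `active_workarounds` are illustrative names, and the option list is simply the set of options mentioned in this bug.

```python
# Sketch: report which timer/idle workaround options appear on a kernel
# command line. The names below are illustrative, not kernel interfaces.

WORKAROUND_OPTIONS = (
    "processor.max_cstate",
    "nolapic_timer",
    "nolapic",
    "hpet",
    "idle",
    "highres",
    "nohz",
)


def active_workarounds(cmdline: str) -> dict:
    """Map each known workaround option found in cmdline to its value.

    Options given without '=value' (for example nolapic_timer) map to None.
    """
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name in WORKAROUND_OPTIONS:
            found[name] = value if sep else None
    return found
```

On a running system the input would typically be the contents of /proc/cmdline, read into a string and passed to `active_workarounds`.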


 Please let me know if there is any additional info I can provide.

I highly appreciate your assistance!

Cheers,
Dmitry.
Comment 7 Zhang Rui 2009-05-05 02:15:54 UTC
please verify if this is a duplicate of bug #13071
Comment 8 Dmitry Lyzhyn 2009-05-05 12:23:06 UTC
(In reply to comment #7)
> please verify if this is a duplicate of bug #13071

It looks similar, but I can't say for sure. It would be better to check with the reporter of that bug.
 nolapic_timer works for me but doesn't work well in his case.

Can you tell me how nolapic_timer affects my system? It's not clear to me.

At boot time I always get this message:

Clockevents: could not switch to one-shot mode: lapic is not functional.
Could not switch to high resolution mode on CPU 0
Clockevents: could not switch to one-shot mode: lapic is not functional.
Could not switch to high resolution mode on CPU 1

No matter what kernel or boot options I use.
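
To compare kernels or boot options, the message above can be picked out of a saved dmesg log mechanically. This is a small Python sketch under the assumption that the log contains the exact wording quoted above; `cpus_without_high_res` is a hypothetical helper name.

```python
import re

# Matches the per-CPU line quoted in this comment.
NO_HRES = re.compile(r"Could not switch to high resolution mode on CPU (\d+)")


def cpus_without_high_res(dmesg_text: str) -> list:
    """Return the sorted CPU numbers that reported the high-res failure."""
    return sorted({int(cpu) for cpu in NO_HRES.findall(dmesg_text)})
```

An empty list means the message did not appear in the given log text; it does not by itself prove that high resolution mode is active.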
Does this mean that the kernel cannot work with the LAPIC on AMD Turion64 X2 processors? And how come kernels 2.6.26 and below worked without forcing the nolapic_timer option, while starting from 2.6.27 nolapic_timer seems to be required?

Can we accept using nolapic_timer as the solution in my case?

Thank you,
Dmitry
Comment 9 Shaohua 2009-05-06 07:19:11 UTC
How about the boot options 'highres=off nohz=off'? If that works, please try 'nohz=off' alone. I suspect the lapic timer doesn't work in one-shot mode when the CPU can enter C-states on AMD CPUs.
Comment 10 Dmitry Lyzhyn 2009-05-06 11:12:06 UTC
(In reply to comment #9)
> how about boot option 'highres=off nohz=off'? if it works, please try boot
> option 'nohz=off'. I suspect lapic timer doesn't work in one shot mode when
> cpu
> can enter c-state in AMD cpu.

'highres=off nohz=off' - works with no issues

 'nohz=off' - doesn't work, the same delays on booting

Thank you
Comment 11 Len Brown 2009-05-12 02:02:26 UTC
if 2.6.26 worked "out of the box" without any command-line workarounds,
and later kernels do not, then this is a regression -- marking it as such.

Can you still boot 2.6.26 on battery, and show us

cat /proc/acpi/processor/*/power ?

grep . /sys/devices/system/clocksource/*/*

dmesg -s 64000
----
acpidump shows that this system uses no CST, but instead has FADT C-states.
It is possible that we are using C2 or C3 when we should not be...
Comment 12 ykzhao 2009-05-12 03:35:51 UTC
Hi, Dmitry
    I agree with what Shaohua said. When the C1E feature is detected on an AMD box, the max C-state is limited to 1, and in that case the local APIC timer can't work in one-shot mode. When the broadcast timer is used instead, the box works well.
    
    From the test in comment #10 we know that the local APIC timer works in periodic mode with "nohz=off highres=off". But when only "nohz=off" is added, the local APIC timer still works in one-shot mode, and the box doesn't boot properly.
   
Thanks.
Comment 13 ykzhao 2009-05-12 03:37:43 UTC
Created attachment 21311 [details]
try the debug patch, in which the one-shot mode is disabled for the local APIC timer

Will you please try the debug patch on the latest kernel and see whether the box can be booted?
   In the debug patch the one-shot mode is disabled for the local APIC timer.
   thanks.
Comment 14 Dmitry Lyzhyn 2009-05-12 11:11:50 UTC
Created attachment 21315 [details]
as per Comment #11 From  Len Brown, request: dmesg -s 64000
Comment 15 Dmitry Lyzhyn 2009-05-12 11:24:55 UTC
(In reply to comment #11)
> if 2.6.26 worked "out of the box" without any command-line workarounds,
> and later kernels do not, then this is a regression -- marking it as such.
> 
> Can you still boot 2.6.26 on battery, and show us
> 
> cat proc/acpi/processor/*/power ?
> 
> grep . /sys/devices/system/clocksource/*/*
> 
> dmesg -s 64000
> ----
> acpidump shows that this system uses no CST, but instead has FADT C-states.
> It is possible that we are using C2 or C3 when we should not be...

Thank you Len,

I do appreciate your interest in my issue.

Kernel 2.6.26 and all earlier versions work perfectly fine, as you said, "out of the box" without any command-line workarounds :) Kernel 2.6.27 and ALL later kernels have this issue. So it really does look like a regression.

As per your request, I booted 2.6.26.8, and here are the outputs you are interested in:

root@dmitry:/home/dima/work/bug1# cat /proc/acpi/processor/*/power
active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 3000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00172803] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[005] usage[00000000] duration[00000000000000000000]
    C3:                  type[C3] promotion[--] demotion[--] latency[020] usage[00000000] duration[00000000000000000000]
active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 3000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00142512] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[005] usage[00000000] duration[00000000000000000000]
    C3:                  type[C3] promotion[--] demotion[--] latency[020] usage[00000000] duration[00000000000000000000]

root@dmitry:/home/dima/work/bug1# grep . /sys/devices/system/clocksource/*/*
/sys/devices/system/clocksource/clocksource0/available_clocksource:hpet acpi_pm jiffies tsc
/sys/devices/system/clocksource/clocksource0/current_clocksource:hpet
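
The power output above shows non-zero usage only for C1 on both CPUs under 2.6.26.8. To make that comparison across kernels less error-prone, here is a minimal Python sketch that extracts the usage counters from text in the format shown above. `cstate_usage` is a hypothetical helper name, and the parser assumes each processor block starts with an "active state:" line, as it does in this paste.

```python
import re

# Matches state lines such as:
#   C2:  type[C2] promotion[--] demotion[--] latency[005] usage[00000000] ...
STATE_LINE = re.compile(r"^\s*(C\d+):\s+type\[C\d+\].*?usage\[(\d+)\]")


def cstate_usage(power_text: str) -> list:
    """Return one {state_name: usage_count} dict per processor block."""
    cpus = []
    for line in power_text.splitlines():
        if line.startswith("active state:"):
            # A new processor block begins here.
            cpus.append({})
            continue
        match = STATE_LINE.match(line)
        if match and cpus:
            cpus[-1][match.group(1)] = int(match.group(2))
    return cpus
```

Applied to the two blocks above, the C2 and C3 counters come out as zero, which matches what can be read off the paste by eye.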

Also, please see the attached output of dmesg -s 64000.

Please let me know if there is any other info I can provide.

Cheers!

Dmitry
Comment 16 Dmitry Lyzhyn 2009-05-12 11:39:08 UTC
(In reply to comment #13)
> Created an attachment (id=21311) [details]
> try the debug patch, in which the one-shot mode is disabled for the local
> APIC
> timer
> 
> Will you please try the debug patch on the latest kernel and see whether the
> box can be booted?
>    In the debug patch the one-shot mode is disabled for the local APIC timer.
>    thanks.


Hi Yakui,

 First of all, thank you for your patch and all your efforts on this issue. I appreciate it!

As per your advice I used the latest vanilla kernel, which is 2.6.29.3, but I found that the source path to apic.c differs.
In your patch: arch/x86/kernel/apic/apic.c
In "my" source tree: arch/x86/kernel/apic.c

The line numbers are different too. In your patch: @@ -164,7 +164,7 @@
For me it was line 152.


However, I was able to apply the change manually, and now my apic.c has:

dima@dmitry:/usr/src/linux-2.6.29.3/arch/x86/kernel$ diff apic.c~ apic.c
152c152
<       .features       = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT
---
>       .features       = CLOCK_EVT_FEAT_PERIODIC
dima@dmitry:/usr/src/linux-2.6.29.3/arch/x86/kernel$

Result: The kernel doesn't boot at all. There are no panics; it just stops at the step "PCI: Using ACPI for IRQ routing".
The keyboard doesn't work and only the Power Button helps. To be able to see the output I had to switch off the frame buffer (option vga=normal); otherwise the screen is black and the laptop hangs.

Once again, thank you. I'm looking forward to getting this resolved.

Cheers!

Dmitry :)
Comment 17 Shaohua 2009-05-18 09:26:04 UTC
Created attachment 21401 [details]
debug patch

can you please try attached patch?
Comment 18 Dmitry Lyzhyn 2009-05-18 22:25:31 UTC
(In reply to comment #17)
> Created an attachment (id=21401) [details]
> debug patch
> 
> can you please try attached patch?

 Applied, and now it works perfectly fine. The system boots on battery with no delays.

 However, I found another issue. I'm not sure whether it is related to the one under discussion. The system doesn't return from s2ram until the Power Button is pressed. It looks strange: the laptop suspends to RAM OK, then I wake it up and it looks fine, but it stops in the middle of something :) until I hit the Power Button, and then all is good. Should I open another bug about this for kernel 2.6.30?

Please advise.
Comment 19 Shaohua 2009-05-19 00:46:01 UTC
Created attachment 21418 [details]
new debug patch

then how about the new one?
Comment 20 Dmitry Lyzhyn 2009-05-19 07:59:48 UTC
(In reply to comment #19)
> Created an attachment (id=21418) [details]
> new debug patch
> 
> then how about the new one?


EXCELLENT!

All works GREAT!

Now I even have high resolution mode ON, which I never had before.

dima@dmitry:~/work$ cat /proc/timer_list | grep hres_active
  .hres_active    : 1
  .hres_active    : 1
dima@dmitry:~/work$
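
The same check can be scripted when testing several kernels in a row. This is a minimal Python sketch that reads the `.hres_active` lines from text in the /proc/timer_list format shown above; `hres_active_flags` and `all_cpus_high_res` are hypothetical helper names.

```python
import re

# Matches lines such as "  .hres_active    : 1".
HRES_LINE = re.compile(r"^\s*\.hres_active\s*:\s*(\d+)\s*$")


def hres_active_flags(timer_list_text: str) -> list:
    """Return the .hres_active value for each CPU entry, in file order."""
    flags = []
    for line in timer_list_text.splitlines():
        match = HRES_LINE.match(line)
        if match:
            flags.append(int(match.group(1)))
    return flags


def all_cpus_high_res(timer_list_text: str) -> bool:
    """True only if at least one entry was found and every entry is 1."""
    flags = hres_active_flags(timer_list_text)
    return bool(flags) and all(flag == 1 for flag in flags)
```

Requiring at least one entry avoids reporting success on empty or unrelated input.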

Before, it was always off.

 So, I spent some time playing with different things and couldn't find any issues.

All works 100%. I appreciate your efforts!

Dmitry:)

P.S. now I'm looking forward to 2.6.30 release:)
Comment 21 Zhang Rui 2009-05-21 08:08:24 UTC
Shaohua,
is this patch targeted for upstream?
Comment 22 Shaohua 2009-05-21 08:15:39 UTC
The patch has been sent out for merge. Marking this as resolved for Len.
Comment 23 Len Brown 2009-06-01 19:50:28 UTC
87ad57bacb25c3f24c54f142ef445f68277705f0
cpuidle: makes AMD C1E work in acpi_idle

7d60e8ab0d5507229dfbdf456501cc378610fa01
cpuidle: fix AMD C1E suspend hang

are shipping upstream after 2.6.30-rc7-git4,
so they should be present in -rc8

However, as this broke at 2.6.27, we should
publish back-ported versions of these patches
for 2.6.27..2.6.29
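
As a rough aid for deciding whether a given kernel falls in the range named above, here is a minimal Python sketch. It only encodes the 2.6.27..2.6.29 range stated in this comment; it cannot tell whether a particular stable release already carries the back-ported patches. `kernel_series` and `in_backport_range` are hypothetical helper names.

```python
import re

# Range taken from this comment: broken at 2.6.27, fixed upstream in 2.6.30.
AFFECTED_FIRST = (2, 6, 27)
AFFECTED_LAST = (2, 6, 29)


def kernel_series(version: str) -> tuple:
    """Return the first three numeric components, e.g. '2.6.29.2' -> (2, 6, 29)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return tuple(int(part) for part in match.groups())


def in_backport_range(version: str) -> bool:
    """True if the version belongs to a series that would need the back-port."""
    return AFFECTED_FIRST <= kernel_series(version) <= AFFECTED_LAST
```

For example, `in_backport_range("2.6.29.2")` is true, while 2.6.26.8 and 2.6.30-rc8 fall outside the range.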

closed.
