Bug 12304 - random freeze unless acpi=off - NB Clevo M570TU
Summary: random freeze unless acpi=off - NB Clevo M570TU
Status: REJECTED DOCUMENTED
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All
OS: Linux
Importance: P1 blocking
Assignee: ykzhao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-26 08:11 UTC by Martin Prosicky
Modified: 2009-09-29 06:20 UTC
5 users

See Also:
Kernel Version: 2.6.28-10
Subsystem:
Regression: No
Bisected commit-id:


Attachments
from normal system without any kernel par. (103.59 KB, application/octet-stream)
2008-12-26 08:14 UTC, Martin Prosicky
acpi_enforce_resources=strict (51.79 KB, text/plain)
2008-12-26 08:16 UTC, Martin Prosicky
last 1000 lines of dmesg output with full ACPI messages turned on (92.42 KB, text/plain)
2008-12-26 08:17 UTC, Martin Prosicky
errors reported by linuxfirmwarekit (12.40 KB, text/plain)
2008-12-26 08:18 UTC, Martin Prosicky
lspci (2.23 KB, text/plain)
2008-12-26 08:19 UTC, Martin Prosicky
acpidump of newest bios (152.33 KB, text/plain)
2008-12-26 08:21 UTC, Martin Prosicky
typical dmesg after login (40.23 KB, text/plain)
2008-12-26 08:23 UTC, Martin Prosicky
acpidumps (10.00 KB, application/octet-stream)
2008-12-29 14:02 UTC, Martin Prosicky
lspci -vxxx (35.33 KB, text/plain)
2008-12-29 14:03 UTC, Martin Prosicky
add the mwait c-state mask to avoid overflow (1.72 KB, patch)
2008-12-30 19:13 UTC, ykzhao
dmesg with errors from Ubuntu 8.10 64bit CD (110.00 KB, application/x-tar)
2009-01-14 00:25 UTC, Martin Prosicky

Description Martin Prosicky 2008-12-26 08:11:07 UTC
Latest working kernel version: none
Earliest failing kernel version: 2.6.25.5, but probably even earlier
Distribution: OpenSuse 11
Hardware Environment: NB Clevo M570TU
Software Environment:
Problem Description: The system freezes at random (no HDD activity, the screen shows a frozen "screenshot" ...)

Steps to reproduce: run any Linux

/***************************************************************************/
Details:

It started freezing one month after the OpenSuse 11 installation, for no apparent reason (no hardware or software upgrade).
It is ACPI-related, because it works with acpi=off.

System: 

"VisionBook M570TU" it is "Clevo M570TU"
C2D T9400
2*2GB DDRIII
320GB/7200
GF9800GT

Linux OpenSuse 11 64bit

kernel 2.6.28-10
latest BIOS 1.00.10

Basic facts:
Before the first freeze I had been using the Linux system on this NB for 1 month without a single freeze.
Now it freezes approx. 45 minutes after boot, although the extremes have been a few seconds and 7 hours.
It freezes with the nvidia driver, the nv driver, or without graphics.
It started freezing for no reason known to me. (I restored a backup from the time when it was OK, but that froze too.)
I also have Vista 32-bit (preinstalled) and XP on the NB. XP has been freezing from the very first moment, but Vista is OK.
This NB certainly has some ACPI issues. I cannot use sleep mode, I can change the brightness only with the "acpi_osi=" kernel parameter, and the DSDT is not perfect....
The only way I can run the NB for an unlimited time is to use acpi=off or acpi=ht (with these parameters my mouse becomes a "slow-motion mouse"; the touchpad works fine).
I also tried another distribution (Mandriva live CD), which freezes the same way, so it is not distribution-related.

I have tried:

I have tried every combination of kernel boot parameters, but only acpi=off helps.


The message 

<6>i801_smbus 0000:00:1f.3: PCI INT C -> GSI 19 (level, low) -> IRQ 19
<4>ACPI: I/O resource 0000:00:1f.3 [0x1c00-0x1c1f] conflicts with ACPI region SMBI [0x1c00-0x1c0f]
<6>ACPI: Device needs an ACPI driver


in my boot.msg looked promising, but after I got rid of it with "acpi_enforce_resources=strict", which produced this:

<6>i801_smbus 0000:00:1f.3: PCI INT C -> GSI 19 (level, low) -> IRQ 19
<3>ACPI: I/O resource 0000:00:1f.3 [0x1c00-0x1c1f] conflicts with ACPI region SMBI [0x1c00-0x1c0f]
<6>ACPI: Device needs an ACPI driver
<4>i801_smbus: probe of 0000:00:1f.3 failed with error -16

the system kept freezing.

Possibly the messages "SMART Usage Attribute: 194 Temperature_Celsius changed from 103 to 101" are also related to this; nothing in the NB is that hot.


I ran linuxfirmwarekit with bad results: SUMMARY:  5 Fails, 4 Warns, 11 Pass, 20 Total
See linuxfirmwarekit.txt for details. Among other things, the DSDT is buggy, but I think those errors are not dangerous. I put

Store("ErrorMartin!!! - Method ATIF", Debug)
Sleep(2000)

into the control path that does not return, and it seems that the system never reaches that point.

I don't know how dangerous the other errors and warnings are.


I also turned on ACPI debugging messages and kept writing the dmesg output every 0.1 s (see the last 1000 lines in dmesg.ACPIdebug.txt), but either I missed some error/warning or there was none.

After many days of testing I have reached the point where I have no other idea what causes this behaviour, so any idea will be greatly appreciated.

Martin
Comment 1 Martin Prosicky 2008-12-26 08:14:33 UTC
Created attachment 19490
from normal system without any kernel par.
Comment 2 Martin Prosicky 2008-12-26 08:16:01 UTC
Created attachment 19491
acpi_enforce_resources=strict
Comment 3 Martin Prosicky 2008-12-26 08:17:40 UTC
Created attachment 19492
last 1000 lines of dmesg output with full ACPI messages turned on
Comment 4 Martin Prosicky 2008-12-26 08:18:44 UTC
Created attachment 19493
errors reported by linuxfirmwarekit
Comment 5 Martin Prosicky 2008-12-26 08:19:20 UTC
Created attachment 19494
lspci
Comment 6 Martin Prosicky 2008-12-26 08:21:31 UTC
Created attachment 19495
acpidump of newest bios
Comment 7 Martin Prosicky 2008-12-26 08:23:17 UTC
Created attachment 19496
typical dmesg after login
Comment 8 ykzhao 2008-12-28 17:02:32 UTC
Will you please try the following boot options and see whether the system still freezes?
   a. idle=poll
   b. processor.max_cstate=1 (the processor driver should be built into the kernel rather than compiled as a module)
   c. nolapic_timer

   Will you please attach the output of the following commands?
   ./acpidump --addr 0xBFD1AC20 --length 0x0265 -o cpu0ist
   ./acpidump --addr 0xBFD18620 --length 0x0549 -o cpu0cst
   ./acpidump --addr 0xbfd19CA0 --length 0x1cf -o cpu1ist
   ./acpidump --addr 0xbfd19f20 --length 0x08d -o cpu1cst
   lspci -vxxx
 
   Thanks.
   
Comment 9 Martin Prosicky 2008-12-29 14:02:06 UTC
Created attachment 19531
acpidumps

acpidump --addr 0xBFD1AC20 --length 0x0265 -o cpu0ist
acpidump --addr 0xBFD18620 --length 0x0549 -o cpu0cst
acpidump --addr 0xbfd19CA0 --length 0x1cf -o cpu1ist
acpidump --addr 0xbfd19f20 --length 0x08d -o cpu1cst
Comment 10 Martin Prosicky 2008-12-29 14:03:59 UTC
Created attachment 19532
lspci -vxxx
Comment 11 Martin Prosicky 2008-12-29 14:09:57 UTC
(In reply to comment #8)
> Will you please try the following boot option and see whether the system still freezes?
>    a. idle=poll
>    b. processor.max_cstate=1 (the processor should be compiled as built-in kernel module)
>    c. nolapic_timer
> 

This will take some time, since all these parameters prolong the time until the freeze.
So far I can say that with "processor.max_cstate=1" the system froze after 4.5 hours.
The other parameters are being tested.

>    Will you please attach the following output?
>    ./acpidump --addr 0xBFD1AC20 --length 0x0265 -o cpu0ist
>    ./acpidump --addr 0xBFD18620 --length 0x0549 -o cpu0cst
>    ./acpidump --addr 0xbfd19CA0 --length 0x1cf -o cpu1ist
>    ./acpidump --addr 0xbfd19f20 --length 0x08d -o cpu1cst
>    lspci -vxxx
>    Thanks.
> 
> 

Output attached.

Martin
Comment 12 ykzhao 2008-12-29 17:52:15 UTC
Will you please also try the boot option "idle=halt" on the latest kernel (2.6.28-rc8)?
Thanks.
Comment 13 Len Brown 2008-12-29 18:49:26 UTC
> I also have Vista 32b (preinstaled) and XP on the NB.
> XP have been freezing from the very first momemt but Vista is OK.

So 32-bit Windows XP suffers the same intermittent hangs
as Linux does?

Can you run memtest overnight to see if it catches any memory errors?
Comment 14 Len Brown 2008-12-29 18:58:50 UTC
ACPI: EC: GPE storm detected, transactions will use polling mode

this is not a good sign, particularly since this appears to be
a brand-new state-of-the-art laptop by Clevo, which will
probably appear in many product lines...  But it is unclear whether
this is related to the actual failure, which seems extremely
intermittent.
Comment 15 Len Brown 2008-12-29 19:13:04 UTC
> errors reported by linuxfirmwarekit

blech:

[INFO]-4 CPU frequency steps supported

 Frequency | Speed 
-----------+---------
  2.58 Ghz | 100.0 %
  2.58 Ghz |  95.4 %
  1.65 Ghz |  60.3 %
   800 Mhz |  30.1 %


[FAIL]-Processors are set to SW_ANY
Comment 16 Martin Prosicky 2008-12-30 12:25:46 UTC
(In reply to comment #12)
> Will you please also try the boot option of "idle=halt" on the latest kernel(2.6.28-rc8)?
> Thanks.
> 

On the latest stable kernel (2.6.28) I get the following results for these kernel parameters:

idle=processor.max_cstate=1
   freeze (with 100% certainty - froze after 4.5 hours)

idle=poll
   OK (with 90% certainty - without freeze for 15 hours)

idle=halt
   OK (with 90% certainty - without freeze for 15 hours)

idle=nolapic_timer
   TODO (but did not freeze for 3.5 hours)

Should I test "nolapic_timer", or increase the confidence for, e.g., "idle=halt"?
Comment 17 Martin Prosicky 2008-12-30 12:34:07 UTC
(In reply to comment #13)
> > I also have Vista 32b (preinstaled) and XP on the NB.
> > XP have been freezing from the very first momemt but Vista is OK.
> 
> So 32-bit Windows XP suffers the same intermittent hangs
> as Linux does?
> 
> Can you run memtest overnight to see if it catches any memory errors?
> 

OK, I will devote this evening to memtest to make sure. But I have run a few single-pass memtests in the past with no errors.
Comment 18 Martin Prosicky 2008-12-30 13:37:03 UTC
(In reply to comment #16)

> (In reply to comment #12)
> > Will you please also try the boot option of "idle=halt" on the latest kernel(2.6.28-rc8)?
> > Thanks.
> > 
> 
> On the latest stable kernel (2.6.28) I get following results for kernel parameters:
> 
> idle=processor.max_cstate=1
>    freeze (with 100% certainty - freezed after 4.5 hours)
> 
> idle=poll
>    OK (with 90% certainty - without freeze for 15 hours)
> 
> idle=halt
>    OK (with 90% certainty - without freeze for 15 hours)

Sadly, after 16 h with "idle=halt" it froze again.
This time it was different: I was still able to move the mouse cursor.

> 
> idle=nolapic_timer
>    TODO (but did not freeze for 3.5 hours)
> 
> Should I test the "nolapic_timer" or increase the confidence for for example "idle=halt"?
> 
Comment 19 ykzhao 2008-12-30 17:34:43 UTC
Hi, Mprosicky
   Thanks for the test. From the tests it seems that there is no freeze when "idle=poll/halt" is added.
   There is no boot option "idle=nolapic_timer"; the option is simply "nolapic_timer".
   
   When "idle=halt" is added, only C1 is used, and "halt" is used to enter C1. When "processor.max_cstate=1" is added, only C1 is used as well, but the difference is that "mwait" is used to enter C1.
   What about the boot option "idle=nomwait"?
   Thanks.
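A typo such as "idle=nolapic_timer" (instead of plain "nolapic_timer") is easy to miss when juggling many boot options. A minimal Python sketch for checking what the running kernel actually received: it splits the command line (normally read from /proc/cmdline) into options and flags any "idle=" value outside an assumed set of recognised values. The KNOWN_IDLE_VALUES set and all function names are illustrative assumptions, not something taken from this report; verify the set against the kernel-parameters documentation for the kernel in use.

```python
from __future__ import annotations

# Assumed set of values accepted by the x86 "idle=" option on kernels of the
# 2.6.28 era; verify against Documentation/kernel-parameters.txt for your kernel.
KNOWN_IDLE_VALUES = {"poll", "mwait", "halt", "nomwait"}


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {option: value}; bare flags map to None.

    Only the first '=' separates key and value, so "idle=processor.max_cstate=1"
    yields key "idle" with value "processor.max_cstate=1". Quoted values that
    contain spaces are not handled in this sketch.
    """
    options: dict[str, str | None] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def unrecognised_idle_value(cmdline: str) -> str | None:
    """Return the idle= value if it is not in KNOWN_IDLE_VALUES, else None."""
    value = parse_cmdline(cmdline).get("idle")
    if value is not None and value not in KNOWN_IDLE_VALUES:
        return value
    return None
```

For example, unrecognised_idle_value(read_cmdline()) returns "nolapic_timer" on a system booted with idle=nolapic_timer, and None when only idle=halt (or no idle= option) is present.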
Comment 20 ykzhao 2008-12-30 19:11:08 UTC
Hi, Mprosicky
    Sorry that I didn't pay attention to what you said in comment #18. The system still froze after 16 hours with the boot option "idle=halt", but it seems that this case is different: the mouse cursor could still be moved. Can you confirm whether the OS is still alive? Can you log in to the system via ssh?
   Thanks.
    
Comment 21 ykzhao 2008-12-30 19:13:49 UTC
Created attachment 19553
add the mwait c-state mask to avoid overflow

Will you please try the debug patch and see whether the system still freezes when the following boot options are added?
    a. idle=halt
    b. processor.max_cstate=1
    
    Thanks.
Comment 22 Martin Prosicky 2008-12-31 13:13:12 UTC
(In reply to comment #19)

>    There is no boot option of "idle=nolapic_timer". Instead it is "nolapic_timer".

Yes, you are right, that was a typo. I used "nolapic_timer".
Comment 23 Martin Prosicky 2008-12-31 13:38:39 UTC
(In reply to comment #21)
> Created an attachment (id=19553)
> add the mwait c-state mask to avoid overflow
> 
> Will you please try the debug patch and see whether the system still freezes
> when the following boot option is added?
>     a. idle=halt
>     b. processor.max_cstate=1
> 
>     Thanks.
> 

Re a.
   Froze after 10 hours. No activity, ssh does not work, no ping, no HDD activity, the processors run at max frequency and the fans are at max.

Re b.
   will try now

Maybe I can then run the system with "idle=poll" to see whether it freezes in the longer term. (It ran for a total of 16 h without a freeze with "idle=poll", but it froze after 16 h with "idle=halt"...)
Comment 24 Martin Prosicky 2008-12-31 13:45:28 UTC
(In reply to comment #13)
> > I also have Vista 32b (preinstaled) and XP on the NB.
> > XP have been freezing from the very first momemt but Vista is OK.
> 
> So 32-bit Windows XP suffers the same intermittent hangs
> as Linux does?
> 
> Can you run memtest overnight to see if it catches any memory errors?
> 

RAM is in good shape.
memtest: 12 passes -> Errors=0
Comment 25 Martin Prosicky 2009-01-01 14:53:05 UTC
I tried to check how "processor.max_cstate=1" works and found something I find quite interesting:
The following results are for a lightly loaded system.


*** Output from PowerTOP for kernel 2.6.28 without any boot par.:

     PowerTOP version 1.9       (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 1.5%)         2.54 Ghz     3.2%
C1                0.0ms ( 0.0%)         2.54 Ghz     0.0%
C2                9.4ms (98.5%)         1.60 Ghz     1.0%
                                         800 Mhz    95.9%

*** Output from PowerTOP for kernel 2.6.28 with "processor.max_cstate=1" boot par.:

     PowerTOP version 1.9       (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (100.0%)        2.54 Ghz     0.8%
C1                0.0ms ( 0.0%)         2.54 Ghz     0.0%
C2                0.0ms ( 0.0%)         1.60 Ghz     0.3%
                                         800 Mhz    98.9%
----------------------------------------

Does that mean that I am not using C1 states at all?







Output from /proc/acpi/processor/CPU0/power follows:

*** Kernel 2.6.28 without any boot parameters:
martin@linux-bs3f:~> cat /proc/acpi/processor/CPU0/power
active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 16000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[001] usage[00004247] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[057] usage[00041399] duration[00000000000897622890]

*** Kernel 2.6.28 with the "processor.max_cstate=1" boot parameter:
martin@linux-bs3f:~> cat /proc/acpi/processor/CPU0/power
active state:            C0
max_cstate:              C1
bus master activity:     00000000
maximum allowed latency: 16000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[001] usage[00034937] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[057] usage[00000000] duration[00000000000000000000]
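To compare the two dumps above numerically, a minimal Python sketch that parses the per-state lines of /proc/acpi/processor/CPUn/power, assuming exactly the line format quoted above (the regular expression and function names are illustrative, not an existing tool). In the quoted data the C1 usage counter is non-zero (34937) with "processor.max_cstate=1", so C1 does appear to be entered; the zero C1 duration, like PowerTOP's 0.0% C1 residency, may simply mean that C1 time is not being accounted. That is an interpretation, not something confirmed in this report.

```python
from __future__ import annotations

import re

# One per-state line as quoted above, e.g.
#   C1: type[C1] promotion[--] demotion[--] latency[001] usage[00004247] duration[000...]
# An optional leading '*' is tolerated in case the active state is marked.
STATE_LINE = re.compile(
    r"^\s*\*?(C\d+):\s+type\[C\d+\].*?"
    r"latency\[(\d+)\]\s+usage\[(\d+)\]\s+duration\[(\d+)\]"
)


def parse_power_states(text: str) -> dict[str, dict[str, int]]:
    """Map state name -> {"latency", "usage", "duration"} for each state line."""
    states: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        m = STATE_LINE.match(line)
        if m is None:
            continue  # header lines such as "active state:" or "max_cstate:"
        name, latency, usage, duration = m.groups()
        states[name] = {
            "latency": int(latency),
            "usage": int(usage),
            "duration": int(duration),
        }
    return states


def usage_share(states: dict[str, dict[str, int]]) -> dict[str, float]:
    """Fraction of idle entries per state, based on the usage counters only."""
    total = sum(s["usage"] for s in states.values())
    if total == 0:
        return {name: 0.0 for name in states}
    return {name: s["usage"] / total for name, s in states.items()}
```

Feeding it the text of `cat /proc/acpi/processor/CPU0/power` from each boot gives the counters side by side; the usage share is only a count of idle entries, not time spent in each state.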
Comment 26 Martin Prosicky 2009-01-01 14:56:02 UTC
Little summary:

*** kernel 2.6.28

idle=processor.max_cstate=1
   freeze (with 100% certainty - after 4.5 hours)

idle=poll
   OK (with 90% certainty - without freeze for 20 hours)

idle=halt
   freeze (with 100% certainty - after 16 hours)



idle=nolapic_timer
   TODO (but did not freeze for 3.5 hours)

*** kernel 2.6.28 + debug patch (id=19553) from comment #21 (cstate.c)

idle=halt
   freeze (with 100% certainty - after 10 hours)

processor.max_cstate=1
   freeze (with 100% certainty - after 7 hours)


I also tried:
*** kernel 2.6.28 with CONFIG_NO_HZ=n

idle=poll
   freeze (with 100% certainty - after 14 hours)
----------------------------------------

So the only working setup (20 h) is the tickless kernel with idle=poll.
Comment 27 Martin Prosicky 2009-01-04 10:06:40 UTC
"nolapic_timer" works!!! (with 95% certainty) - no freeze in 36 hours.

With other settings, including "idle=poll", the NB usually froze within approx. 10 h.
The only slight annoyance is that I can't enter a processor C-state deeper than C1.

ykzhao, is there anything I can do to find out why my local APIC timer is not reliable?
Comment 28 Martin Prosicky 2009-01-04 12:46:07 UTC
(In reply to comment #27)
> "nolapic_timer" works!!! (with 95% certainty) - no freeze in 36 hours.
> 
> With other setting including "idle=poll" NB usually freezed in cca 10h.
> The only slight annoyance is that I can't enter processor state higher than C1.
> 
> ykzhao, Is there anything I can do to find out why is my local apic timer not reliable? 
> 

It doesn't work anymore. It froze 1 h later. I have tried everything...
Comment 29 Martin Prosicky 2009-01-05 09:20:58 UTC
It seems that we were trying to make the impossible possible.

I did not know about the pact between M$ and Phoenix (the maker of my NB's BIOS):
http://www.linuxquestions.org/questions/general-10/warning-there-is-windows-in-my..-bios-544779/#post2705017
http://blog.linuxoss.com/2007/04/phoenix-bios-locks-out-all-oss-except-vista/

The situation described there fits mine perfectly: I can run only Vista on the NB. Now I can just sell this shi..
Thank you for your exceptional support guys. Next time I will avoid Phoenix bios at any price.
M
Comment 30 ykzhao 2009-01-05 17:30:08 UTC
    From the description in comment #26 it is very strange that the system still freezes even when adding the option of "idle=poll" if the tickless feature is disabled. But if tickless is enabled, there is no freeze when adding the option of "idle=poll".
    
    Worse still, the system freezes even with the option "idle=halt". With "idle=halt" the CPU is halted whenever the system is idle, but this still doesn't work well.
   
Hi, Venki
    Any idea about this bug?

Thanks.
Comment 31 Venkatesh Pallipadi 2009-01-05 17:51:00 UTC
This may be related to the timer. Can you please try the default kernel (with no idle= parameters) but with the "hpet=disable" boot option, with and without tickless? We have had an issue with HPET that shows up once in a while and, AFAIK, is not resolved yet....
Comment 32 Martin Prosicky 2009-01-05 20:41:08 UTC
(In reply to comment #30)
>     From the description in comment #26 it is very strange that the system still freezes even when adding the option of "idle=poll" if the tickless feature is disabled. But if tickless is enabled, there is no freeze when adding

Sorry if I forgot to report some freezes, but basically the NB froze with every combination of kernel and parameters.


> the option of "idle=poll".
> 
>     More worse is that the system still freezes even when adding the option of "idle=halt". After adding the option of "idle=halt", the system will halt when it is idle. But this still can't work well.
> 
> Hi, Venki
>     Any idea about this bug?
> 
> Thanks.
Comment 33 ykzhao 2009-01-05 21:25:47 UTC
Did you try the boot option "hpet=disable" suggested by Venki?
   Thanks.
Comment 34 Martin Prosicky 2009-01-06 08:40:41 UTC
(In reply to comment #31)
> This may be related to timer. Can you please try default kernel (with no idle= parameters) but with "hpet=disable" boot option, with and without tickless. We have had some issue with HPET that shows up once in a while and is not resolved as yet AFAIK....
> 

This was pretty fast.
With "hpet=disable", both the tickless and the non-tickless kernel froze within 10 minutes.

So, to summarize the effect of the boot parameters:

without any parameter: 
->   freeze in approx. 1h

with "acpi=off": 
->   does not freeze

with parameters like
   nolapic_timer (record holder with 40 hours)
   idle=poll
   idle=halt
   idle=processor.max_cstate=1
-> freeze in approx. 10h

The times are not statistically valid because I usually tried each setup only once; they just show trends. It seems that the parameters in the third group do not solve the problem, they only suppress the symptoms.
Comment 35 Martin Prosicky 2009-01-11 06:38:21 UTC
I can trigger the freeze now:

1. Start the NB with Linux (OpenSuse 11 64bit, kernel 2.6.28, KDE3.5) with the AC cable unplugged (battery supply).

While running on battery I have not encountered any freeze (but the operating time on battery is quite limited).

2. Plug in the AC cable -> the freeze happens after approx. 5 s.

----------------------------------------------------------
Other info:
When I am running the NB on battery I can use the C3 state; with the AC cable plugged in I can use only the C2 state. Maybe this is normal?
----------------------------------------------------------

Any ideas on what I can try now to reveal the cause of this?
Comment 36 Martin Prosicky 2009-01-11 12:40:42 UTC
(In reply to comment #35)
> I can trigger the freeze now:
> 
> 1. Start the NB with linux (OpenSuse 11 64bit, kernel 2.6.28, KDE3.5) with AC cable unplugged (battery supply).
> 
> When I was running on battery I have not encountered any freeze (but the operating period when on battery is quite limited)
> 
> 2. Plug the AC cable -> freeze happens after approx. 5s.

The above worked three times, but no longer. I can't reproduce it now.
It's really hard to fix this NB when it is one big RANDOM engine.

One more curiosity: XP installed on this NB started behaving differently. It does not freeze during normal operation, but it freezes every time I finish burning a DVD. Vista is still stable.
> 
> ----------------------------------------------------------
> Otjer info:
> When I am running the NB on battery I can use C3 state, with AC cable pluged
> in
> I can use only C2 state. Maybe this is normal?
> ----------------------------------------------------------
> 
> Any ideas on what I can try now to reveal the cause of this?
> 
Comment 37 ykzhao 2009-01-11 19:38:46 UTC
> Other info:
> When I am running the NB on battery I can use C3 state, with AC cable plugged in I can use only C2 state. Maybe this is normal?
This is normal. From the acpidump it seems that the C3 state can be entered while running on battery, and the deepest C-state is C2 while the AC adapter is plugged in. This is controlled by the BIOS.
Comment 38 ykzhao 2009-01-11 19:44:00 UTC
Hi, Martin
    From comment #26 it seems that the system freezes even with the boot option "idle=poll" if the tickless feature is disabled.
    Will you please double check it again?
    Thanks.
Comment 39 Martin Prosicky 2009-01-11 23:32:19 UTC
(In reply to comment #38)
> Hi, Martin
>     From the comment #26 it seems that the system will freeze even when adding the boot option of "idle=poll" if the tickless feature is disabled.
>     Will you please double check it again?
>     Thanks.

Yes, so far it has frozen with everything except "acpi=off" (I should recheck that one; it's been a long time since I last tried it).
According to my records, the unpatched 2.6.28 kernel without the tickless feature froze with "idle=poll", but I will check it.
Comment 40 Martin Prosicky 2009-01-14 00:25:21 UTC
Created attachment 19783
dmesg with errors from Ubuntu 8.10 64bit CD

I have some new error messages that could be helpful.

With the Ubuntu 8.10 64-bit CD I am getting these errors:

ACPI: EC: missing confirmations, switch off interrupt mode.

<- I also get this one on my OpenSuse, but in Ubuntu it is immediately followed by:

ACPI Exception (evregion-0419): AE_TIME, Returned by Handler for [EmbeddedControl] [20080609]
ACPI Error (psparse-0530): Method parse/execution failed [\_SB_.BAT0._BST] (Node ffff88013fa398c0), AE_TIME
ACPI Exception (battery-0360): AE_TIME, Evaluating _BST [20080609]

These messages appear at the end of Ubuntu startup or any time later. It seems to me that I don't get them when starting on battery (not statistically sure; I must check).

Maybe the behaviour reported in comment #36 relates to this?

If anyone knows that these errors are not dangerous, please let me know, so I will not waste my time (to ykzhao: I am still checking the tick kernel with idle=poll).

In the attached tar you can find two dmesg outputs from Ubuntu. In the first, the message appears during startup (then I restarted). In the second it appears later, and after a few hours Ubuntu freezes.
Comment 41 ykzhao 2009-01-14 01:29:32 UTC
Hi, Martin
    Please ignore the warning message related to the EC. It is harmless.
    As the system runs in 64-bit mode, please use the boot option "noapictimer". Sorry, I gave you the incorrect boot option "nolapic_timer".
    
    Thanks.
Comment 42 Martin Prosicky 2009-01-16 14:07:44 UTC
So it seems that my Clevo M570TU can't tolerate any hard drive (external or internal) being connected.


I removed the HDD from the NB and booted from the Ubuntu 8.10 64-bit CD - no freeze.
When I run the CD with the HDD connected - freeze.
When I run the CD without the HDD and then connect another external HDD through USB - freeze.

Am I right that the message I get
"Driver 'sd' needs updating - please use bus_type methods"
is harmless?
Comment 43 Martin Prosicky 2009-01-18 03:51:47 UTC
(In reply to comment #42)

It turns out that disconnecting the disks only delayed the freeze; it froze after 12 h...

> So it seems that my Clevo m570tu can't stand any hard drive (external or internal) connected.
> 
> 
> I removed my HDD from NB and booted from Ubuntu 8.10 64 bit CD - without freeze.
> When I run the CD with HDD connected - freeze.
> When I run the CD without HDD and then connect another external HDD through USB - freeze.
> 
> Am I right that the message I get
> "Driver 'sd' needs updating - please use bus_type methods"
> is harmless ?
> 
Comment 44 Martin Prosicky 2009-01-19 11:41:34 UTC
Finally I have found the reason why my Clevo M570TU freezes.

Short answer: 800 MHz

Long answer:
During all my tests in Linux, I was using the "Dynamic" "CPU Frequency Policy" in powersave (800, 1600 and 2534 MHz).
I tried switching to "Performance" (only 2534 MHz) and ... no freeze for 50 hours (I will keep testing...).
Then I tried "Powersave" (800 and 1600 MHz) ... freeze as usual.

So I started to look into why Vista, and also XP (comment #36), do not freeze. According to CPU-Z, the M$ OSes do not use 800 MHz at all (only 1600 and 2534 MHz, at their respective voltages).

Maybe the problem is the voltage the BIOS sets up when running at 800 MHz; I don't know. Does anyone know of a tool (other than a voltmeter) I can use to read the CPU voltage under Linux?

------------------------
Other info:
1.
When on battery and using 800 and 1600 MHz there has been no freeze. I tried this many times, but only for a limited time (2 h). I think this also points to a voltage problem (lower voltage on battery => no freeze?). 
2.
martin@linux-bs3f:~> cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2534000 2533000 1600000 800000

What is 2533000 doing there?
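The list above is in kHz. A minimal Python sketch that parses it, converts it to MHz, and flags any entry sitting exactly 1 MHz above another listed entry. The helper names are hypothetical, and reading the "+1 MHz" entry as the turbo-mode frequency is an assumption based on ykzhao's explanation in comment #45, not something the sysfs file itself states.

```python
from __future__ import annotations


def parse_available_frequencies(text: str) -> list[int]:
    """Parse scaling_available_frequencies (kHz values) and sort highest first."""
    return sorted((int(tok) for tok in text.split()), reverse=True)


def to_mhz(freqs_khz: list[int]) -> list[float]:
    """Convert kHz values to MHz."""
    return [f / 1000 for f in freqs_khz]


def plus_one_mhz_entries(freqs_khz: list[int]) -> list[int]:
    """Entries exactly 1000 kHz above another listed entry.

    Assumption: such an entry is the turbo-mode frequency advertised as
    nominal + 1 MHz (here 2534000 next to the nominal 2533000).
    """
    present = set(freqs_khz)
    return [f for f in freqs_khz if f - 1000 in present]
```

On the line quoted above ("2534000 2533000 1600000 800000"), plus_one_mhz_entries returns [2534000].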
Comment 45 ykzhao 2009-01-19 17:34:47 UTC
Hi, Martin
    Thanks for the test. 
    What happens if the userspace governor is used and the CPU frequency is set to 800 or 1600 MHz?
    2533000 is also a regular CPU frequency, and 2534000 is the turbo-mode CPU frequency, which gives higher performance. 
    Thanks.
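For the test requested here, a minimal Python sketch of pinning a CPU to one fixed frequency through the cpufreq sysfs files that the userspace governor uses (scaling_governor, scaling_setspeed, scaling_available_frequencies, scaling_cur_freq). It assumes the userspace governor is available in the running kernel and that the script runs as root; the function names and the root parameter (added so the logic can be tried against a scratch directory) are illustrative.

```python
from __future__ import annotations

from pathlib import Path

# Standard location of the per-CPU cpufreq directories.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def pin_frequency(cpu: int, khz: int, root: Path = CPUFREQ_ROOT) -> None:
    """Switch one CPU to the userspace governor and request a fixed frequency.

    khz must be one of the values in scaling_available_frequencies.
    Writing to the real sysfs files requires root.
    """
    cpufreq = root / f"cpu{cpu}" / "cpufreq"
    available = (cpufreq / "scaling_available_frequencies").read_text().split()
    if str(khz) not in available:
        raise ValueError(f"{khz} kHz is not one of {available}")
    # The governor must be "userspace" before scaling_setspeed accepts a value.
    (cpufreq / "scaling_governor").write_text("userspace\n")
    (cpufreq / "scaling_setspeed").write_text(f"{khz}\n")


def current_khz(cpu: int, root: Path = CPUFREQ_ROOT) -> int:
    """Read back the frequency the kernel reports for this CPU (kHz)."""
    return int((root / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq").read_text())
```

For example, pin_frequency(0, 800000) followed by pin_frequency(1, 800000) requests 800 MHz on both cores; when the cores share one cpufreq policy, the second call just rewrites the same values. A power-management tool such as Kpowersave may switch the governor back, so it should not be managing the CPU frequency during such a test.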
Comment 46 Martin Prosicky 2009-01-20 20:21:12 UTC
(In reply to comment #45)
> Hi, Martin
>     Thanks for the test. 
>     How about this issue if the userspace governor is used and the CPU
>     How about this issue if the userspace governor is used and the CPU frequency is set to 800 or 1600MHz?
I did what you asked and it froze at both frequencies: at 1600 MHz it froze 3 minutes after setting it up, and at 800 MHz after a few hours. To tell the truth, the 1600 MHz result surprised me. How can it freeze with one OS and be OK with the other?

2534 MHz is still OK.



>     The 2533000 is also one CPU frequency. And the 2534000 is the turbo mode CPU frequency, which will give the high performance. 
>     Thanks.
> 
Comment 47 Martin Prosicky 2009-01-23 08:57:35 UTC
Conclusion:
'Intel(R) Core(TM)2 Duo CPU     T9400  @ 2.53GHz'

*** Linux (e.g. 2.6.29-rc1-10) ***

With AC plugged

800 MHz ......... freeze
1600 MHz ........ freeze
2534(2533) MHz .. OK

With battery

no freeze with AC unplugged (800 and 1600 MHz), but the operating time is limited by the battery.

*** M$ OSs: ***

800 MHz ......... not used
1600 MHz ........ OK
2534(2533) MHz .. OK

ykzhao,
is there anything else I can test? To me it looks like a problem with an "overvolted" processor when the NB is on AC power, but I'm not sure, because I found no way to measure the voltage from Linux.
What component controls the CPU core voltage? Is it controlled at the OS level (acpi_cpufreq) or independently of the OS?
Comment 48 ykzhao 2009-01-23 19:39:11 UTC
> ykzhao,
> is there anything else I can test? To me, it seems like problem with "overvoltaged" processor when NB is AC plugged. But not sure because I did not find a way how to measure it from Linux.
> What component controls the CPU core voltage? Is it on OS level (acpi_cpufreq) or is it independent of OS?
This is not controlled by the OS and is transparent to it. It is controlled by the BIOS.
> 
From the tests it seems that C-states work well at high frequency. If the system runs in tick mode (tickless disabled), the issue is easier to reproduce.
Even with the boot option "idle=poll", the system still freezes if the tickless feature is disabled.
And if the local APIC timer is disabled, the system still works after 36 hours (on the 32-bit kernel "nolapic_timer" is used; on the 64-bit kernel "noapictimer"). Right?
    Will you please double-check whether it still works well with the local APIC timer disabled?
    If so, maybe this issue is related to the local APIC timer.
    
     thanks.

    
Comment 49 Martin Prosicky 2009-01-24 02:53:43 UTC
(In reply to comment #48)
> And if the local APIC timer is disabled, the system still works after 36hours.(On the 32-bit kernel the "nolapic_timer" is added. On the 64-bit kernel the "noapictimer" is added). Right?
>     Will you please double check whether it still can work well if the local APIC timer is disabled? 
>     If so, maybe this issue is related with the local APIC timer.

ykzhao,
I checked my records: OpenSuse 64-bit with kernel 2.6.28 (# CONFIG_NO_HZ is not set) and "noapictimer" froze on 14.01.2009 after 30 minutes. Sorry, I tried it right after your comment #41 but forgot to report it.
Surprisingly, the "placebo" parameter nolapic_timer (with the 64-bit kernel) worked for 36 h (comment #27), but then it froze (comment #28).
Comment 50 ykzhao 2009-02-11 19:19:16 UTC
Hi, Martin
    Sorry for the late response.
    Do you mean that it still freezes when "nolapic_timer" is added? Will you please double-check it again?
    
Comment 51 Martin Prosicky 2009-02-15 11:31:01 UTC
(In reply to comment #50)
> Hi, Martin
>     Sorry for the late response.

No problem.
I have had quite a restful time. With the processor running at max frequency (Kpowersave "Performance" mode) it did not freeze once, and with C2 states it does not heat up too much. Linux froze only when I did not log into KDE quickly enough (Kpowersave probably had not taken control of the CPU frequency yet). The second freeze happened when I was trying the "Silent button", which makes the frequency drop to 1600 MHz and lowers the voltage compared with normal mode at that frequency. This second freeze probably ruins my assumption that the problem is caused by too high a CPU voltage.


>     Do you mean that it still freezes when the "nolapic_timer" is added? will you please double check it again?
> 
> 

I'm trying a 2.6.28.5 tickless 64-bit kernel with the "nolapic_timer" parameter (not "noapictimer", cf. comment #41). It has now been running for 55 h without a freeze (in the past it froze after 37 h, comment #28). This parameter definitely has some positive effect. I also tried the kernel without any parameter, and it froze as usual.

Martin
Comment 52 ykzhao 2009-03-05 00:29:44 UTC
Hi, Martin
    Thanks for the test.
    From comment #51 it seems that the freeze is difficult to reproduce when the boot option "nolapic_timer" is added to the tickless kernel. Without any boot option, it freezes as usual.
    Maybe this issue is related to one of the following:
    a. the local APIC timer
    b. the combination of C-states and low CPU frequency. This would be BIOS-related.

Hi, Venki
    Any idea about this issue?

    Thanks.
Comment 53 Martin Prosicky 2009-03-05 14:31:54 UTC
Hi ykzhao,
since my last report I have run the kernel with nolapic_timer (comment #51) for roughly another 40 hours in total, and it has not frozen once.
So I can now choose between using low frequencies or C-states to save processor power. I prefer C-states (processor at max frequency, without a single freeze in more than a month), but it would be nice to have both at once.


(In reply to comment #52)
> Hi, Martin
>     Thanks for the test.
>     From the comment #51 it seems that the freeze is difficult to reproduce if the boot option of "nolapic_timer" is added for the tickless kernel. If there is no any boot option, it will freeze as usual.
>     Maybe this issue is related with the following :
>     a. local APIC timer
>     b. the combination between C-state and low CPU frequency. This should be related with BIOS.
> 
> Hi, Venki
>     Any idea about this issue?
> 
>     Thanks.
> 
Comment 54 ykzhao 2009-06-19 01:51:33 UTC
Hi, Venki
    Any idea about this bug?
    It seems that the box works normally when the boot option "nolapic_timer" is added. It still freezes even when only C1 is used. Of course, there is no freeze when the box runs at high CPU frequency. 
    Is this issue related to the local APIC timer?
Comment 55 ykzhao 2009-09-29 06:18:20 UTC
Hi, Martin
    It seems that the issue is related to the local APIC timer. So please always add the boot option "nolapic_timer" on this box to avoid the random freeze. 
    
thanks.
Comment 56 ykzhao 2009-09-29 06:20:02 UTC
As this issue is related to the local APIC timer, the bug will be rejected.

To avoid the random freeze, please always add the boot option of "nolapic_timer".
Thanks.
