Bug 11993 - SMP boot crash if C-states enabled - LG E500 V.APSCG
Summary: SMP boot crash if C-states enabled - LG E500 V.APSCG
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All Linux
Importance: P1 normal
Assignee: Venkatesh Pallipadi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-09 00:10 UTC by Pascal Häußler
Modified: 2009-12-22 02:57 UTC

See Also:
Kernel Version: 2.6.24, 2.6.27, 2.6.28, 2.6.29
Subsystem:
Regression: No
Bisected commit-id:


Attachments
screen shot of system in crashed state (449.99 KB, image/jpeg)
2008-11-09 23:43 UTC, Pascal Häußler
acpidump output (binary) (18.69 KB, application/octet-stream)
2008-11-09 23:46 UTC, Pascal Häußler
screen shot of system in crashed state - processor.maxcstate=1 (451.57 KB, image/jpeg)
2008-11-10 04:15 UTC, Pascal Häußler
screen shot of system in crashed state - idle=nomwait (449.60 KB, image/jpeg)
2008-11-10 04:16 UTC, Pascal Häußler
acpidump as requested (1) (506 bytes, application/octet-stream)
2008-11-10 04:17 UTC, Pascal Häußler
acpidump as requested (2) (1.39 KB, application/octet-stream)
2008-11-10 04:18 UTC, Pascal Häußler
output of dmesg after booting with idle=poll (36.30 KB, text/plain)
2008-11-10 04:18 UTC, Pascal Häußler
acpidump.out (87.00 KB, application/octet-stream)
2008-11-24 09:03 UTC, Pascal Häußler
content of /proc/cpuinfo (1.38 KB, text/plain)
2008-11-25 10:11 UTC, Pascal Häußler
screen shot of system in crashed state - highres=off (471.48 KB, image/jpeg)
2008-12-01 22:07 UTC, Pascal Häußler

Description Pascal Häußler 2008-11-09 00:10:03 UTC
Latest working kernel version: none

Earliest failing kernel version: 2.6.24

Distribution: Ubuntu 8.04 / 8.10 64 bit and 32 bit

Hardware Environment: Laptop LG E500 V.APSCG, Intel Core2Duo T8100 @2.1GHz, 2GB RAM, ATI Mobility Radeon

Software Environment: Currently Ubuntu 8.10 "Intrepid" 64 bit (same problem with other distributions and also with 32 bit)

Problem Description: The boot process crashes at a very early stage (before the root file system is mounted). I figured out that the problem vanishes as soon as I compile a kernel with the ACPI_PROCESSOR option turned off. Furthermore, the system also boots with ACPI_PROCESSOR enabled as soon as I turn off dual-core support in the AMI BIOS, i.e. boot on a single core.

I tried to fix a DSDT table issue by using iasl, but this did not help. I am not sure whether it is a pure kernel problem or a mixed kernel/DSDT or SSDT table problem.

Since the crash occurs that early I cannot provide much helpful information. I could send you a photo of the screen in crashed state and of course also the complete ACPI table infos extracted with acpidump.

Steps to reproduce: Take an LG E500 V.APSCG and simply insert a live CD of Ubuntu 8.04/8.10, or of another distribution.

Note: I did not try older distributions. The oldest kernel I tried was 2.6.24.
Comment 1 ykzhao 2008-11-09 05:13:34 UTC
Will you please attach the output of acpidump? It would be great if you could also attach a screenshot taken when the system crashes.
Will you please try the latest kernel (for example, 2.6.27 or 2.6.26) and see whether the problem still exists?
Will you please try the following boot options with ACPI_PROCESSOR enabled and see whether the problem still exists? (Of course, the processor driver should be built into the kernel.)
    a. idle=poll
    b. processor.max_cstate=1
    
 
Comment 2 Pascal Häußler 2008-11-09 23:43:24 UTC
Created attachment 18771
screen shot of system in crashed state

As requested, here is a screen shot of the system in the crashed state.
Comment 3 Pascal Häußler 2008-11-09 23:46:07 UTC
Created attachment 18772
acpidump output (binary)

As requested, here is the binary output of acpidump:

acpidump --binary
Comment 4 Pascal Häußler 2008-11-09 23:52:38 UTC
Hi yakui zhao,

I tried the boot options:

a) idle=poll does indeed let the system boot in dual-core mode. Nevertheless, it seems that significant parts of the ACPI power and thermal management do not work correctly.

b) When I use processor.max_cstate=1 the system still hangs. However, it hangs at a different point with different screen output, although again at a very early stage. If you wish, I can also attach a screen shot of this hang.

A comment on option a) idle=poll: As I said, the system boots when I use this option. Nevertheless, when I look into /proc/acpi/processor/P001/info I see this:

pascal@laptignio:/proc/acpi/processor/P001$ cat info 
processor id:            0
acpi id:                 1
bus mastering control:   yes
power management:        no
throttling control:      no
limit interface:         no

Furthermore, in /proc/acpi/thermal_zone/THRM I see this:

pascal@laptignio:/proc/acpi/thermal_zone/THRM$ cat cooling_mode 
<setting not supported>
pascal@laptignio:/proc/acpi/thermal_zone/THRM$ cat trip_points 
critical (S5):           100 C

--> It seems to me that some essential information from the system's ACPI tables (I guess from the SSDTs) is not processed successfully.
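
The `info` output quoted in this comment can also be checked programmatically. The following is only a minimal sketch: the function name `parse_acpi_info` is hypothetical, the sample text is copied from the comment above, and the /proc/acpi/processor interface is assumed to exist as it did on kernels of this era (it may be absent on later kernels).

```python
def parse_acpi_info(text):
    """Parse 'key: value' lines (as printed by the info file) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


# Sample copied from the comment above (booted with idle=poll).
SAMPLE = """\
processor id:            0
acpi id:                 1
bus mastering control:   yes
power management:        no
throttling control:      no
limit interface:         no
"""

info = parse_acpi_info(SAMPLE)
# Features the kernel reports as unavailable for this processor.
disabled = [name for name, value in info.items() if value == "no"]
print(disabled)
```

On a live system the same function could be fed the contents of /proc/acpi/processor/P001/info instead of `SAMPLE`.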
Comment 5 Pascal Häußler 2008-11-09 23:54:20 UTC
Sorry, I forgot to mention that I am using kernel 2.6.27.4.
Comment 6 ykzhao 2008-11-10 00:29:48 UTC
Will you please attach the output of acpidump? (Not the binary format.)

At the same time the following output is also required.
   acpidump --addr 0x7ffce190 --length 0x1FA -o cpu0ist
   acpidump --addr 0x7ffce420 --length 0x594 -o cpu0cst

Thanks.
    
Comment 7 ykzhao 2008-11-10 00:35:03 UTC
Will you please try the boot option of "idle=nomwait" and see whether the system can be booted?

Will you please attach the screenshot when the system hangs with the boot option of "processor.max_cstate=1"?

Thanks.
Comment 8 ykzhao 2008-11-10 00:36:00 UTC
As there is no working kernel, clear the regression flag.
Comment 9 Pascal Häußler 2008-11-10 04:15:31 UTC
Created attachment 18775
screen shot of system in crashed state - processor.maxcstate=1
Comment 10 Pascal Häußler 2008-11-10 04:16:27 UTC
Created attachment 18776
screen shot of system in crashed state - idle=nomwait
Comment 11 Pascal Häußler 2008-11-10 04:17:49 UTC
Created attachment 18777
acpidump as requested (1)

acpidump --addr 0x7ffce190 --length 0x1FA -o cpu0ist
Comment 12 Pascal Häußler 2008-11-10 04:18:17 UTC
Created attachment 18778
acpidump as requested (2)

acpidump --addr 0x7ffce420 --length 0x594 -o cpu0cst
Comment 13 Pascal Häußler 2008-11-10 04:18:50 UTC
Created attachment 18779
output of dmesg after booting with idle=poll
Comment 14 Pascal Häußler 2008-11-10 04:21:00 UTC
Hi,

the boot attempt with idle=nomwait failed. I attached a screen shot of this test.

In addition, I attached the requested screen shot of the processor.maxcstate=1 test.

Furthermore, I attached:
1) the two requested acpidump outputs
2) the output of dmesg, which may contain helpful information.

Greetings,
Pascal
Comment 15 Len Brown 2008-11-11 20:50:03 UTC
just for grins, can you verify that booting with "thermal.off=1"
makes no difference (or build with CONFIG_THERMAL=n
and CONFIG_ACPI_THERMAL=n)?

also, can you verify that booting with just "maxcpus=1" works fine?
Comment 16 Pascal Häußler 2008-11-11 22:12:37 UTC
Hi,

here are the test results:

1) Tested boot with thermal.off=1  
   Result: The system crashes (the screen looks the same as if I omit this option).

2) Tested boot with maxcpus=1
   Result: The system boots. 

best regards,
Pascal
Comment 17 Pascal Häußler 2008-11-21 03:51:07 UTC
Hi,

I've just seen that the state of this bug is still NEEDINFO. Is there anything more you need from me at this stage?

best regards,
Pascal
Comment 18 ykzhao 2008-11-23 05:43:11 UTC
Please attach the output of acpidump.
Sorry that I didn't update the status of the bug in time.
Comment 19 Pascal Häußler 2008-11-23 22:18:40 UTC
Hi,

can you give me more detail about what additional output I should generate with acpidump? I have already attached the two acpidump outputs you have requested in your comment #6 (see below).

If you need additional acpidump output, could you please specify the respective command line arguments for me?

Thanks in advance,
Pascal
Comment 20 ykzhao 2008-11-23 23:09:26 UTC
Hi, Pascal
    You attached two files in comments #11 and #12. The two files were obtained by using the following commands:
   acpidump --addr 0x7ffce190 --length 0x1FA -o cpu0ist
   acpidump --addr 0x7ffce420 --length 0x594 -o cpu0cst

    In fact we expect that you attach the output of acpidump by using the following command:
    ./acpidump > acpidump.out
    
    Sorry that I didn't describe it very clearly.

    Thanks.
Comment 21 Pascal Häußler 2008-11-24 09:03:30 UTC
Created attachment 19003
acpidump.out


Output of the following command:

acpidump > acpidump.out
Comment 22 ykzhao 2008-11-25 00:00:03 UTC
Hi, Pascal
    Thanks for the info.
    From the acpidump it seems that C2/C3 is supported on MP systems.
     >C2 works on MP system : 1
    But according to the problem description the system can't be booted in MP mode if C-states are enabled.
    More seriously, the MP system still can't be booted even when only the C1 state is enabled.
    Will you please try the following boot options and see whether the system can be booted?
    a. nolapic
    b. nolapic_timer
    c. idle=halt

    Will you please attach the output of /proc/cpuinfo?

    Thanks.
Comment 23 ykzhao 2008-11-25 00:03:55 UTC
Hi, Venki
    Do you have time to look at this issue?
    thanks.
Comment 24 Pascal Häußler 2008-11-25 10:09:56 UTC
Hi,

here we go with the latest experiment results:

a) boot option: nolapic
   Result: System boots but detects and uses only one processor core.

b) boot option: nolapic_timer
   Result: System crashes as without any boot option.

c) boot option: idle=halt
   Result: System boots in dual core mode.

Greetings,
Pascal
Comment 25 Pascal Häußler 2008-11-25 10:11:17 UTC
Created attachment 19015
content of /proc/cpuinfo

The file contains the content of /proc/cpuinfo after booting the system with idle=halt
Comment 26 Venkatesh Pallipadi 2008-12-01 16:32:40 UTC
Looks like some nohz and idle interaction bug.
Does "nohz=off" also make the system boot? And how about "highres=off"?

Thanks.
Comment 27 Pascal Häußler 2008-12-01 22:04:06 UTC
Hi,

here are the test results:

1) nohz=off
   --> The system boots using both CPU cores.

2) highres=off
   --> The system crashes in early boot. The crash screen looks 
       slightly different with this boot option. I will attach a 
       screen shot.

Greetings
Pascal
Comment 28 Pascal Häußler 2008-12-01 22:07:05 UTC
Created attachment 19097
screen shot of system in crashed state - highres=off
Comment 29 ykzhao 2009-02-04 20:04:27 UTC
Hi, Pascal
     Sorry for the late response.
     Will you please try the boot option "nolapic_timer" on the latest kernel (2.6.29-rc2/rc3) and see whether the box can be booted?
     From the problem description it seems that the box still can't be booted in the following case:
     a. processor.max_cstate=1. In this case only C1 is used, and the local APIC timer is used while the CPU is in C1. It seems that the system can't exit the C1 state via the local APIC timer interrupt. (In fact, the difference between "idle=poll" and "processor.max_cstate=1" is that with "idle=poll" the CPU polls when it is idle.)

     From the test in comment #27 it seems that the box can be booted with the boot option "nohz=off". In that case the system works in tick mode. After the processor module is loaded, the HPET/PIT timer is used instead of the local APIC timer, which means that the local APIC timer won't be used again.

     Maybe this issue is related to the local APIC timer. (The "nolapic_timer" option is not available for the 64-bit platform on kernel 2.6.27-7.)
    
    Thanks.
Comment 30 ykzhao 2009-03-04 22:25:29 UTC
Hi, Pascal
    Have you had an opportunity to try the boot option "nolapic_timer" on the latest kernel and see whether the box can be booted normally?
    Will you please also try the boot option "idle=nomwait processor.max_cstate=1" and see whether the box can be booted normally?

Hi, Rui
    As there has been no response for more than one month, the bug will be rejected. If the problem still exists, please try the boot options as suggested and then reopen the bug.
    Thanks.
Comment 31 Zhang Rui 2009-03-23 02:10:51 UTC
ping Pascal.
Comment 32 Pascal Häußler 2009-03-23 02:43:27 UTC
Hi all,

sorry for my late reply. I have been (and still am) very busy since I became a daddy: my wife and I had our first child, Paul. There was no time for any kernel testing in the meantime, but I learned how to change diapers quickly ;-)

I hope I can run the tests you mentioned above. My biggest problem is that I am not very deep into testing kernels. Is there some sort of "howto" that I could consult to learn how to install rc kernels (which I do not get as a complete Ubuntu package)?

Best regards,
Pascal
Comment 33 Zhang Rui 2009-03-24 18:30:39 UTC
Congrats, Pascal. :)

Now you just need to try kernel 2.6.29.

you can get the source code tarball linux-2.6.29.tar.bz2 at
http://www.kernel.org/pub/linux/kernel/v2.6/
Comment 34 Pascal Häußler 2009-03-27 08:33:41 UTC
Hi,

thank you :-)

Finally I installed 2.6.29 and tested these options:

1) nolapic_timer
2) idle=nomwait processor.max_cstate=1

In both cases the system boots successfully. Without option 1) or 2) the kernel 2.6.29 also hangs as the older ones did.

I hope this information is helpful for you.

Greetings,
Pascal
Comment 35 ykzhao 2009-03-30 03:05:11 UTC
Hi, Pascal
    Thanks for testing.
    Now it is very clear that the issue is related to the local APIC timer. If the boot option "nolapic_timer" is added, the system can be booted successfully.
    
    Do you mean that the box can be booted only when both boot options "idle=nomwait" and "processor.max_cstate=1" are added together?
    How about using the boot option "idle=nomwait" or "processor.max_cstate=1" on its own?
    Thanks.
Comment 36 Pascal Häußler 2009-03-30 21:04:39 UTC
Hi,

sorry for the confusion. I thought I was expected to test the two parameters explicitly in combination. Here we go with the individual tests:

1) idle=nomwait - system does *not* boot.

2) processor.max_cstate=1 - system does boot successfully.

Just for me to understand: The problem we investigate here is only there when "core multi processing" is enabled in the BIOS. When I switch it off (single core mode) the system boots without any trouble. Is it nevertheless possible that it is related to the local APIC timer?

Best regards,
Pascal
Comment 37 Pascal Häußler 2009-04-29 05:10:51 UTC
Hi,

I just saw that the status is still NEEDINFO. Did I miss anything? Do I need to do another test?

Best regards,

Pascal
Comment 38 ykzhao 2009-05-06 02:14:17 UTC
Hi, Pascal
    Sorry for the slow response.
    From comment #36 it seems that the box can be booted with "processor.max_cstate=1". But from comment #9 it seems that the box can't be booted with the same boot option. These results are contradictory.

    Will you please double-check it?
    It would be great if you could also double-check the boot option "idle=halt".
    thanks.
Comment 39 Pascal Häußler 2009-05-06 04:06:03 UTC
Hi,

sorry for the confusion. Maybe I made a mistake in the test referred to in comment #9. As you can see in my comment I typed "maxcstate" instead of "max_cstate". Maybe I made the same mistake in the boot option...

ok here we go with the double check:

processor.max_cstate=1 - system boots fine
idle=halt - system boots fine

So the information I delivered in comment #9 is wrong.

Best regards
Pascal

P.S.: added 2.6.28 and 2.6.29 to field "Kernel Version"
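
The spelling mix-up described in this comment can be illustrated with a small check of a kernel command line. This is only a sketch: the helper name `get_param` and the `root=/dev/sda1 ro` prefix are hypothetical, and only the two spellings of the processor option come from this thread.

```python
def get_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag without '=' yields an empty string.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == name:
            return value
    return None


# Hypothetical command lines; only the processor.* tokens are from the thread.
good = "root=/dev/sda1 ro processor.max_cstate=1"
typo = "root=/dev/sda1 ro processor.maxcstate=1"

print(get_param(good, "processor.max_cstate"))  # the intended option is present
print(get_param(typo, "processor.max_cstate"))  # the misspelled one does not match
```

On a running system the command line that was actually used can be read from /proc/cmdline and passed to the same helper, which makes it easy to confirm what was typed at the boot prompt.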
Comment 40 ykzhao 2009-06-08 09:30:29 UTC
Hi, Pascal
    Sorry for the late response, and thanks for confirming again.
    From the tests in comments #34 and #39 it seems that the box can be booted with the following options:
   a. nolapic_timer
   b. processor.max_cstate=1 (the processor can be woken up from C1 by the local APIC timer interrupt).
   In fact, when the boot option "nolapic_timer" is used, the local APIC timer is replaced by the broadcast timer, and the CPU can be woken up from C1 by the broadcast timer. Unfortunately, it can't be woken up from C2/C3 by the broadcast timer.
  
   Will you please try the following boot options and see whether the box can be booted?
    1. hpet=disable
    2. acpi_skip_timer_override
    thanks.
Comment 41 Pascal Häußler 2009-06-12 19:37:47 UTC
Hi,

here we go with the test results:

1) hpet=disable  -->  system boot fails.
2) acpi_skip_timer_override  -->  system boot fails.
3) hpet=disable + acpi_skip_timer_override  -->  system boot fails.

The system does not boot with any of the parameters. I also tried the combination of both but no success.

Best regards,

Pascal
Comment 42 ykzhao 2009-06-18 09:50:29 UTC
Hi, Pascal
    Thanks for the confirmation.
    It seems that the box can be booted normally only when just C1 is used or in single-core mode.
    If the CPU can enter C2/C3, it can't be woken up from C2/C3 by the broadcast timer.
    I have no idea why the box can't be woken up from C2/C3 by the broadcast timer.
    Will you please attach the output of dmidecode, so that we can restrict your box to C1 only?
 

Hi, Venki
     Any idea about this issue?
 

     Thanks.
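
The boot-option tests reported so far can be collected in one place. The sketch below only tabulates results stated in the comments above; the structure and names (`RESULTS`, `smp_ok`, `failed`) are hypothetical. The tests span kernels 2.6.27.4 and 2.6.29, the "nolapic_timer" entry is the 2.6.29 result from comment #34, and the "processor.max_cstate=1" entry is the corrected result from comments #36 and #39.

```python
# option -> (boots, uses_both_cores, comment number reporting the result)
# uses_both_cores is None where the system did not boot at all.
RESULTS = {
    "idle=poll": (True, True, 4),
    "idle=nomwait": (False, None, 36),
    "thermal.off=1": (False, None, 16),
    "maxcpus=1": (True, False, 16),
    "nolapic": (True, False, 24),
    "idle=halt": (True, True, 39),
    "nohz=off": (True, True, 27),
    "highres=off": (False, None, 27),
    "nolapic_timer": (True, True, 34),
    "processor.max_cstate=1": (True, True, 39),
    "hpet=disable": (False, None, 41),
    "acpi_skip_timer_override": (False, None, 41),
}

# Options with which the system boots and still uses both cores.
smp_ok = sorted(opt for opt, (boots, smp, _) in RESULTS.items() if boots and smp)
# Options with which the boot still fails.
failed = sorted(opt for opt, (boots, _, _) in RESULTS.items() if not boots)

print("boots with SMP:", smp_ok)
print("boot fails:   ", failed)
```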
Comment 43 Pascal Häußler 2009-11-03 08:37:06 UTC
Hi,

sorry for the long time without any new information. I have news to report:

The system works fine with SMP and C-states enabled when I use kernel 2.6.31. I tried the recently released Ubuntu 9.10 distribution and the kernel works out of the box with standard settings.

It seems that the CPU power management (hibernate, ...) still does not work correctly. Nevertheless, the system is running in SMP mode and I do not have to turn off ACPI.

Best regards,

Pascal
Comment 44 ykzhao 2009-11-18 07:25:23 UTC
Thanks for the report.
    It is good news that the system works well on the latest upstream kernel.
    It would be great if we could identify the commit that fixed this issue.

Anyway,
    the issue is fixed.

Hi, Rui
    Can you close this bug now?

thanks.
