Bug 11471

Summary: GPE storm detected, kernel freezes
Product: ACPI
Reporter: George Gibbs (Vash63)
Component: ACPICA-Core
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED CODE_FIX
Severity: high
CC: acpi-bugzilla, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11167
Attachments: acpidump log, from 2.6.26
             dmesg with nohz=off

Description George Gibbs 2008-08-31 22:00:20 UTC
Latest working kernel version: 2.6.26
Earliest failing kernel version: 2.6.27-rc5 (haven't tested earlier RCs yet)
Distribution: Gentoo, using the mainline kernel, not the one from the Gentoo repository.
Hardware Environment: HP dv2225nr laptop, Nvidia chipset, Turion X2, 2GB
Software Environment: x86_64, gcc-4.3.1
Problem Description: At initial boot, the kernel says:

ACPI: EC: GPE storm detected, disabling EC GPE

and then freezes; the system does not boot. Adding 'acpi=off' fixes it.

Steps to reproduce:
1. Compile the kernel with ACPI enabled
2. Boot the new kernel
Comment 1 Zhang Rui 2008-08-31 22:51:27 UTC
Could you please check whether reverting commit fa95ba04e6ba11d71e1b87becd054b38faf546c8 helps?
Comment 2 George Gibbs 2008-08-31 22:57:22 UTC
Sure, but how?
Comment 3 George Gibbs 2008-09-01 02:06:26 UTC
OK, that got rid of the error message, but it still doesn't boot. It still boots with acpi=off, though. There are some other messages of interest in the dmesg:

[    8.446128] powernow-k8: Your BIOS does not provide ACPI _PSS objects in a way that Linux understands. Please report this to the Linux ACPI maintainers and complain to your BIOS vendor.

HP has never released a single update for my BIOS, though, so I doubt complaining to them will help much. The full dmesg with "acpi=off" is here:

http://rafb.net/p/Q7DHV988.html
Comment 4 George Gibbs 2008-09-01 02:37:53 UTC
Oh, and to add to that: 2.6.26 also prints the same error message, but then it continues booting with no issues. So I'm no longer sure the freeze is caused by the GPE storm, but it's definitely ACPI-related, since acpi=off fixes it.
Comment 5 Rafael J. Wysocki 2008-09-01 04:57:07 UTC
Handled-By : Zhang Rui <rui.zhang@intel.com>
Comment 6 Zhang Rui 2008-09-01 18:04:27 UTC
Please attach the full dmesg output of 2.6.26 with ACPI turned on.
It would also be great if you could get a screenshot when the system hangs.
Comment 7 Zhang Rui 2008-09-01 18:55:30 UTC
Does the boot option "processor.max_cstate=1" help?
Comment 8 ykzhao 2008-09-01 20:30:28 UTC
Hi, George
    When using the boot option "processor.max_cstate=1", you should set CONFIG_ACPI_PROCESSOR=y in the kernel configuration.
    Please attach the output of acpidump.
    Thanks.
Comment 9 George Gibbs 2008-09-01 22:52:14 UTC
http://rafb.net/p/UEIN6921.html

Here's a dmesg of 2.6.26 with ACPI on. Sorry, it's not the vanilla kernel (it has the Gentoo patchset), but my 2.6.27 tests have all been straight from Linus' git tree. I'll try max_cstate in a few minutes.
Comment 10 George Gibbs 2008-09-01 23:24:34 UTC
Created attachment 17567
acpidump log, from 2.6.26

This is from 2.6.26, as I can't get the system to boot with ACPI on in the 2.6.27 RCs.
Comment 11 George Gibbs 2008-09-04 02:00:51 UTC
Oh, and if I wasn't clear earlier, processor.max_cstate=1 in the boot options did not help.
Comment 12 Rafael J. Wysocki 2008-09-07 14:54:52 UTC
On Sunday, 7 of September 2008, Vash63 wrote:
> Well, I have the issue on 2.6.27 rc1-5 and not on 2.6.26, so yeah.
> 
> On Sat, Sep 6, 2008 at 2:30 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=11471
> > Subject         : GPE storm detected, kernel freezes
> > Submitter       : George Gibbs <Vash63@gmail.com>
> > Date            : 2008-08-31 22:00 (7 days old)
> > Handled-By      : Zhang Rui <rui.zhang@intel.com>
Comment 13 ykzhao 2008-09-07 23:46:19 UTC
Hi, George
    Do you mean that the system still can't boot even when the boot option "processor.max_cstate=1" is added?

    Please try each of the following boot options and see whether the system boots:
    a. add the boot option "idle=poll"
    b. add the boot option "nolapic_timer"
    Thanks.
Comment 14 George Gibbs 2008-09-08 23:46:29 UTC
Nice, idle=poll fixed it. I think it printed some errors afterwards, but it boots now. nolapic_timer doesn't help at all.
Comment 15 ykzhao 2008-09-09 02:44:41 UTC
Hi, George
   Thanks for the test.
   From the acpidump it seems that there is no _CST object and only C1 is supported on your system:
     >   C2 Latency : FFFF  // greater than 100, so C2 is not supported
     >   C3 Latency : FFFF  // greater than 1000, so C3 is not supported
    Interestingly, the system still can't boot with the "nolapic_timer" boot option, yet it can boot with C-states disabled (idle=poll).

    By contrast, the laptop in bug 11101/11240 can boot with the "nolapic_timer" boot option.

    Please add the boot option "nohz=off" (which disables the tickless feature) and see whether the system boots.
    Thanks.
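The inference in this comment is the ACPI fallback rule used when a machine has no _CST object: the FADT fields P_LVL2_LAT and P_LVL3_LAT give the worst-case C2 and C3 entry latencies in microseconds, and values above 100 and 1000 respectively mean the state is unsupported. A minimal Python sketch of that rule follows; fadt_cstates is a hypothetical helper written for illustration, not a kernel or ACPICA function.

```python
from typing import List

# Hypothetical helper (not kernel code): which C-states the FADT allows when
# there is no _CST object. Latency fields are worst-case values in microseconds.
C2_MAX_LATENCY_US = 100   # P_LVL2_LAT above this means "C2 not supported"
C3_MAX_LATENCY_US = 1000  # P_LVL3_LAT above this means "C3 not supported"


def fadt_cstates(p_lvl2_lat: int, p_lvl3_lat: int) -> List[str]:
    """Return the C-states implied by the FADT C2/C3 latency fields."""
    states = ["C1"]  # C1 is mandatory and not governed by these fields
    if p_lvl2_lat <= C2_MAX_LATENCY_US:
        states.append("C2")
    if p_lvl3_lat <= C3_MAX_LATENCY_US:
        states.append("C3")
    return states


# Values from this laptop's acpidump: both latencies are 0xFFFF.
print(fadt_cstates(0xFFFF, 0xFFFF))  # ['C1']
```

Fed this laptop's 0xFFFF values, it leaves only C1, matching the comment. It deliberately ignores _CST, which takes precedence over these FADT fields when present.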
Comment 16 George Gibbs 2008-09-09 03:27:04 UTC
OK, nohz=off also fixes it, and a lot more elegantly than idle=poll. It seems faster that way, too. And I hope your previous post wasn't saying that my L2 cache isn't being used... what is C2?
Comment 17 ykzhao 2008-09-09 19:30:18 UTC
Hi, George
    Thanks for the test. It seems that the system works well when the tickless feature is disabled.
    Could you please use git bisect to identify which commit causes the regression? Of course, the tickless feature should stay enabled while bisecting.
    Thanks.
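git bisect, requested here, is a binary search over the commits between the last good kernel (2.6.26) and a known bad one; each step means building and booting one kernel, so roughly log2(N) test boots are needed for N candidate commits. The sketch below models only that search in Python, assuming a linear history; first_bad_commit and the is_bad predicate (standing in for "build this commit, boot it with tickless enabled, and check for the hang") are hypothetical. With real git, "git bisect start", "git bisect bad" and "git bisect good v2.6.26" start the same loop, and each tested kernel is then marked with "git bisect good" or "git bisect bad".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the culprit is also bad (the property git bisect relies on).
    """
    good, bad = 0, len(commits) - 1  # indices of known-good / known-bad commits
    while bad - good > 1:
        mid = (good + bad) // 2
        # In real life: check out commits[mid], build it, and boot it with
        # tickless (CONFIG_NO_HZ) enabled and no nohz=off on the command line.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Toy history: the made-up commit "c3" introduces the hang.
history = ["v2.6.26", "c1", "c2", "c3", "c4", "c5", "v2.6.27-rc1"]
culprit = history.index("c3")
print(first_bad_commit(history, lambda c: history.index(c) >= culprit))  # c3
```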
Comment 18 Shaohua 2008-09-09 23:22:51 UTC
Can you post the dmesg of the system? This looks like the AMD C1E issue.
Comment 19 George Gibbs 2008-09-10 04:46:50 UTC
Created attachment 17712
dmesg with nohz=off
Comment 20 George Gibbs 2008-09-10 04:48:35 UTC
Dmesg posted. I'm not sure how to identify exactly which commit caused the regression; I haven't been building my own kernels like this for very long.
Comment 21 Chuck Ebbert 2008-09-12 16:08:07 UTC
From dmesg:
[    0.000999] using C1E aware idle routine

Can you retry with -rc6? A bunch of fixes for C1E went in recently.
Comment 22 George Gibbs 2008-09-13 03:25:24 UTC
OK, nice: -rc6 boots with no patches and without having to add nohz=off, so it seems to be fixed. Would this be a 'CODE_FIX' resolution?