Kernel Bug Tracker – Bug 11471
GPE storm detected, kernel freezes
Last modified: 2008-09-14 16:21:11 UTC
Latest working kernel version: 2.6.26
Earliest failing kernel version: 2.6.27-rc5 (haven't tested earlier RCs yet)
Distribution: Gentoo, using mainline kernel, not from gentoo repository.
Hardware Environment: HP dv2225nr laptop. Nvidia chipset, Turion X2, 2 GB RAM
Software Environment: x86_64, gcc-4.3.1
Problem Description: At initial boot, kernel says:
ACPI: EC: GPE storm detected, disabling EC GPE
and then freezes. System does not boot. Adding 'acpi=off' fixes it.
Steps to reproduce:
1. Compile kernel with ACPI
2. Boot to new kernel
Could you please check whether reverting commit fa95ba04e6ba11d71e1b87becd054b38faf546c8 helps?
Sure, but how?
Ok... that got rid of the error message, but it's still not booting. Still boots with acpi=off though. There are some other messages of interest in the dmesg...
[ 8.446128] powernow-k8: Your BIOS does not provide ACPI _PSS objects in a way that Linux understands. Please report this to the Linux ACPI maintainers and complain to your BIOS vendor.
HP hasn't had a single update for my BIOS ever though, so I doubt complaining to them is going to help much. Full dmesg with "acpi=off" is here:
Oh, and to add to that, 2.6.26 still gives me the same error message... but then it continues booting with no issues. I'm not so sure the freeze is due to the GPE storm now, but it's definitely ACPI related, as acpi=off fixes it.
Handled-By : Zhang Rui <firstname.lastname@example.org>
Please attach the full dmesg output of 2.6.26 with ACPI turned on.
It would also be great if you could get a screenshot when the system hangs.
Does the boot option "processor.max_cstate=1" help?
When the boot option "processor.max_cstate=1" is used, you should set CONFIG_ACPI_PROCESSOR=y in the kernel configuration.
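One way to check how CONFIG_ACPI_PROCESSOR is set is to look it up in the kernel's .config file. The following is a minimal sketch only: the helper name `config_value` and the sample configuration text are made up for illustration and are not taken from the reporter's system.

```python
def config_value(config_text, option):
    """Return the value of a kernel config option, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        # Set options look like CONFIG_FOO=y; unset ones appear only as
        # comments ("# CONFIG_FOO is not set") and are skipped here.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Hypothetical .config excerpt, for illustration only.
sample = """\
CONFIG_ACPI=y
CONFIG_ACPI_PROCESSOR=m
# CONFIG_ACPI_DEBUG is not set
"""

value = config_value(sample, "CONFIG_ACPI_PROCESSOR")
print("CONFIG_ACPI_PROCESSOR =", value)
print("built in:", value == "y")
```

In this made-up excerpt the processor driver is built as a module ("m"), which is the case the advice above is meant to rule out.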
Please attach the output of acpidump.
Here's a dmesg of 2.6.26 with ACPI on. It's not the vanilla kernel (Gentoo patchset), sorry, but my 2.6.27 tests have been straight from Linus' git. Trying the max_cstate option in a few minutes.
Created attachment 17567
acpidump log, from 2.6.26
This is from 2.6.26, as I can't get the system to boot with acpi on in 2.6.27 rc's.
Oh, and if I wasn't clear earlier, processor.max_cstate=1 in the boot options did not help.
On Sunday, 7 of September 2008, Vash63 wrote:
> Well, I have the issue on 2.6.27 rc1-5 and not on 2.6.26, so yeah.
> On Sat, Sep 6, 2008 at 2:30 PM, Rafael J. Wysocki <email@example.com> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > The following bug entry is on the current list of known regressions
> > from 2.6.26. Please verify if it still should be listed and let me know
> > (either way).
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11471
> > Subject : GPE storm detected, kernel freezes
> > Submitter : George Gibbs <Vash63@gmail.com>
> > Date : 2008-08-31 22:00 (7 days old)
> > Handled-By : Zhang Rui <firstname.lastname@example.org>
Do you mean that the system still can't boot even with the boot option "processor.max_cstate=1" added?
Could you please try each of the following boot options and see whether the system boots?
a. add the boot option of "idle=poll"
c. add the boot option of "nolapic_timer"
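When testing options like these, it can help to confirm that the running kernel actually received them. On Linux the active command line is exposed in /proc/cmdline. The sketch below parses such a line; the function name `parse_cmdline` and the sample command line are hypothetical, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of option -> value.

    Options without '=' (such as nolapic_timer) map to None.
    """
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


# Hypothetical command line; on a running system the real one could be
# read with open("/proc/cmdline").read().
sample = "root=/dev/sda3 ro idle=poll nolapic_timer"
opts = parse_cmdline(sample)

print("idle =", opts.get("idle"))
print("nolapic_timer passed:", "nolapic_timer" in opts)
```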
Nice, idle=poll fixed it. I think it put up some errors afterwards, but it boots now. nolapic_timer doesn't help at all.
Thanks for the test.
From the acpidump it seems that there is no _CST object and only C1 is supported on your system.
> C2 Latency : FFFF //greater than 100,C2 is not supported
> C3 Latency : FFFF //greater than 1000, C3 is not supported.
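The rule applied to the two latency values above can be written out as a small check. This is only a sketch: the function name `supported_cstates` is made up for illustration, and the thresholds (100 for C2, 1000 for C3) are the ones given in the comments on the quoted lines.

```python
def supported_cstates(c2_latency, c3_latency):
    """Return the C-states considered usable from the two latency fields.

    A C2 latency greater than 100 or a C3 latency greater than 1000
    means that state is not supported. C1 is assumed to be available.
    """
    states = ["C1"]
    if c2_latency <= 100:
        states.append("C2")
    if c3_latency <= 1000:
        states.append("C3")
    return states


# Both fields read FFFF in the acpidump quoted above.
print(supported_cstates(0xFFFF, 0xFFFF))
```

With both fields at FFFF, neither threshold is met, so only C1 remains.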
Interestingly, the system still does not boot with the boot option "nolapic_timer", but it does boot with C-states disabled.
On the laptop from bug 11101/11240, the system boots with the boot option "nolapic_timer".
Could you please add the boot option "nohz=off" (which disables the tickless feature) and see whether the system boots?
Ok, nohz=off fixes it also, and a lot more elegantly than idle=poll. Seems faster that way too. And I'm hoping that your previous post wasn't indicating that my L2 cache isn't being used... what is C2?
Thanks for the test. It seems that the system works well when the tickless feature is disabled.
Could you please use git bisect to identify which commit causes the regression? The tickless feature should of course stay enabled while bisecting.
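For anyone new to bisecting: the procedure starts with `git bisect start`, marks the failing kernel with `git bisect bad` and the last working one with `git bisect good`, and then repeatedly builds and boots the commit git checks out, marking each as good or bad. Underneath, this is a binary search. The sketch below shows only that idea on a simplified, strictly linear history; the function name, the commit labels, and the position of the regression are all made up, and real kernel history with merges is more involved.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. is_bad(commit) stands in for building and
    booting that commit and reporting whether it fails.
    """
    lo, hi = 0, len(commits) - 1   # lo: newest known good, hi: oldest known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical linear history c0..c9 with the regression introduced at c6.
history = ["c%d" % i for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 6)
print("first bad commit:", culprit)
```

Each test halves the remaining range, so even a few thousand commits between two releases need only around a dozen build-and-boot cycles.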
Can you post the dmesg of the system? This looks like the AMD C1E issue.
Created attachment 17712
dmesg with nohz=off
Dmesg posted. Not sure how to identify exactly which commit caused the regression... I haven't been working with my own kernels like this very long.
[ 0.000999] using C1E aware idle routine
Can you retry with -rc6? A bunch of fixes for C1E went in recently.
Ok, nice, -rc6 boots with no patches and without having to add nohz=off. So it seems it was fixed. Would this be a 'CODE_FIX' resolution?