Created attachment 21264 [details] acpidump output

Further details as requested...

Slight change of conditions - turned off the fglrx kernel module and compiled a 2.6.29.2 kernel. Experienced the same "storm". interrupts, dmesg and acpi information attached.

The following text is duplicated from the linux-acpi mailing list for reference:

Subject: System overloaded with ACPI interrupts

Hello,

I think I'm experiencing some kind of ACPI issue, but I haven't been able to identify a series of actions that causes it. I first noticed the problem when looking at 'top' - seeing kacpi at the top of the process list (followed by kacpi_notify). I have powertop installed, the dump is attached - acpi interrupts seem to be excessive.

The computer is perhaps not the best machine: Arima M620-DC. I think the same chassis may be used by a number of other manufacturers.

The system does not boot in this state - it occurs rather unpredictably after high CPU loads. I had thought this was a CPU temperature issue and replaced the thermal paste of the heatsink, which improved matters.

I have also blacklisted the following modules:

blacklist i2c_i801
blacklist yenta_socket

The kernel reports a conflict between ACPI SBUS and i2c_i801. I forget why I disabled yenta_socket, but I don't use the card reader.

The OS is openSuSE 11.1 with the 2.6.27.21-0.1-default kernel installed.

I've also tried booting the system with acpi=noirq. This gives an 'irq 5: nobody cared' and the wireless card (ipw2200) then doesn't work.

My normal kernel command line contains the following parameters:

hpet=force lapic vga=0x317

All the attached logs were generated with these parameters, and with them my system functions with all devices working.

I have also disassembled the ACPI tables. I can post them if required.

Hopefully someone can suggest a temporary solution or a permanent one?

Thanks,
Chris

The response:

hi,
please try a recent vanilla kernel and see if it's reproducible. 2.6.29 would be a good choice.
if the problem still exists in 2.6.29 kernel, please attach the output of dmesg and "grep . /sys/firmware/acpi/interrupts/*" after the interrupt storm occurs.
please also attach the acpidump of this laptop.
it would be great if you can open a new bug report at http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI and attach all the info there.
Created attachment 21265 [details] disassembly of acpi tables
Created attachment 21266 [details] dmesg output after storm
Created attachment 21267 [details] grep . /sys/firmware/acpi/interrupts/*
Hi, Christopher

From the info in comment #3 it seems that GPE0 is triggered very frequently. And from the acpidump it seems that this is caused by a bogus BIOS.

> Method (_L00, 0, NotSerialized) { }

When GPE0 is triggered, there is nothing to do in the _L00 method, so GPE0 is simply triggered again.

So IMO this is a BIOS issue, and it is best fixed by upgrading the BIOS.

thanks.
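To illustrate the reasoning above, here is a toy model of a level-triggered event whose handler never clears the source. It is only a sketch and not the kernel's GPE dispatch code; `count_gpe_events` and its parameters are hypothetical names made up for this example.

```python
def count_gpe_events(handler_clears_source, polls):
    """Count how often a level-triggered event fires over `polls` checks.

    Toy model only: the source starts asserted, and the event fires on
    every check for as long as the source stays asserted.
    """
    source_asserted = True
    events = 0
    for _ in range(polls):
        if not source_asserted:
            continue
        events += 1  # status bit is set, so the handler method runs
        if handler_clears_source:
            source_asserted = False  # a working handler removes the cause
    return events


# An empty handler (like the empty _L00 method) never clears the source.
empty_handler_events = count_gpe_events(handler_clears_source=False, polls=1000)
working_handler_events = count_gpe_events(handler_clears_source=True, polls=1000)
print(empty_handler_events, working_handler_events)
```

With an empty handler the event count grows with the number of checks, which is the "storm" pattern; a handler that clears the cause fires once.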
the problem happens in every kernel release that you have tried, right? please run "echo disable > /sys/firmware/acpi/interrupts/gpe00" before the interrupt storm and see if it helps.
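To confirm which counter is storming, two snapshots of the "grep . /sys/firmware/acpi/interrupts/*" output can be compared. The sketch below assumes each line has the form "path: count [status]", with the count as the first field after the colon; the sample text is made up for illustration and is not taken from the attachments, and the function names are hypothetical.

```python
def parse_counters(grep_output):
    """Map counter name (e.g. 'gpe00') to its integer count."""
    counters = {}
    for line in grep_output.splitlines():
        path, sep, value = line.partition(":")
        fields = value.split()
        # Skip lines without a colon or without a leading numeric count.
        if not sep or not fields or not fields[0].isdigit():
            continue
        counters[path.rsplit("/", 1)[-1]] = int(fields[0])
    return counters


def busiest_counter(before, after, seconds):
    """Return (name, events_per_second) for the fastest-growing GPE."""
    rates = {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before and name.startswith("gpe") and name != "gpe_all"
    }
    name = max(rates, key=rates.get)
    return name, rates[name]


# Made-up sample snapshots, assumed to be taken 10 seconds apart.
SAMPLE_BEFORE = """\
/sys/firmware/acpi/interrupts/gpe00:    1000   enabled
/sys/firmware/acpi/interrupts/gpe1D:       4   enabled
/sys/firmware/acpi/interrupts/gpe_all:    1004
/sys/firmware/acpi/interrupts/sci:    1004
"""
SAMPLE_AFTER = """\
/sys/firmware/acpi/interrupts/gpe00:   51000   enabled
/sys/firmware/acpi/interrupts/gpe1D:       4   enabled
/sys/firmware/acpi/interrupts/gpe_all:   51004
/sys/firmware/acpi/interrupts/sci:   51004
"""

name, rate = busiest_counter(
    parse_counters(SAMPLE_BEFORE), parse_counters(SAMPLE_AFTER), 10.0
)
print(name, rate)
```

On a real system the two snapshots would come from running the grep command twice with a known delay in between.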
Created attachment 21269 [details] try the custom DSDT

Will you please try the custom DSDT and see whether the issue still exists?
In the custom DSDT the polarity of THRM_POL is inverted.
Instructions for using a custom DSDT can be found at:
http://www.lesswatts.org/projects/acpi/faq.php

Note: As the DSDT.hex is already attached, the first four steps can be skipped.

Thanks.
(In reply to comment #4)
> Hi, Christopher
> So IMO this is a BIOS issue. And it had better be fixed by upgrading BIOS.

Quite possibly a BIOS issue! However, I've requested a newer BIOS a couple of times before and been told that there isn't one. So there's not much I can do unless you know another source.

Chris
(In reply to comment #5)
> the problem happens in every kernel release that you have tried, right?

yes - for as long as I can remember (I can remember as far back as SuSE 10.1, but can't remember what kernel that was running - it's probable it was happening before that too).

> please run "echo disable > /sys/firmware/acpi/interrupts/gpe00" before the
> interrupt storm and see if it helps.

I tried this and I think it helped - at least I tried to provoke the problem and it didn't appear in about 45 mins of trying.

Thanks!

What practical impact does disabling gpe00 have (other than solving my problem)?
(In reply to comment #6)
> Created an attachment (id=21269) [details]
> try the custom DSDT
>
> Will you please try the custom and see whether the issue still exists?
> In the custom DSDT the polarity of THRM_POL will be inverted.
> How to use the custom DSDT can be found in :
> http://www.lesswatts.org/projects/acpi/faq.php
>
> Note: As the DSDT.hex is already attached, the first four steps can be
> skipped.
> Thanks.

I recompiled the kernel, installed it and booted the system. I still get the interrupt storm with the patched DSDT - logs attached. The interrupt count is lower, but the system wasn't running as long as last time, so this is probably just proportional to the difference in time.

Chris
Created attachment 21279 [details] kernel log with customized DSDT
Created attachment 21280 [details] interrupts with custom DSDT, still gpe00
Method (_L00, 0, NotSerialized) { }

This is taken from the acpidump you attached. We can see that nothing is done in the GPE00 handler. So IMO, GPE00 is a nop to the Linux kernel, i.e. disabling this GPE is harmless. And "echo disable > /sys/firmware/acpi/interrupts/gpe00" is the command to disable GPE00.

Then my questions are:
1. does this problem exist in every kernel you've tried?
2. does this happen from the beginning, or is it caused at runtime by some specific actions?
Hi, Rui

As there is a GPE storm on GPE00, it can't be disabled with the command "echo disable > /sys/firmware/acpi/interrupts/gpe00". And the problem is related to the bogus GPE _L00 method.

From the ICH4 chipset spec it seems that GPE00 is driven by the THRM signal, and whether GPE00_STS is set is controlled by the THRM_POL bit. In the custom DSDT the polarity of the THRM_POL bit is inverted. But from the log it seems that the problem still exists even when the custom DSDT is used.

Thanks.
(In reply to comment #12)
> then my question is that,
> 1. does this problem exist in every kernel you've tried?

Every 2.6 series kernel.

> 2. does this happen from the beginning, or it's caused at runtime by some
> specific actions?

The system normally starts in a stable state - unless rebooting after the interrupt storm, in which case the storm sometimes continues (I think turning off for a few minutes normally resets everything).

When the system is in a stable state

echo disable > /sys/firmware/acpi/interrupts/gpe00

is effective. I have now added this to the boot.local script, and so far I have had no more interrupt storms.

Normally I can cause it by running some graphically intensive websites (with lots of CSS and flash on the pages). It seems to be independent of the graphics driver in use with X (I've tried ATI's and the open source radeon driver). I think it's in some way related to CPU load.

It's impossible to give an exact scenario which will initiate the interrupt storm; sometimes it won't happen.

Thanks for your help!

Chris
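For anyone who prefers a script over the one-line echo in boot.local, the same write can be sketched in Python. This is only an illustration: `disable_gpe` and its `base` parameter are hypothetical names, writing to the real sysfs file needs root, and the demo below deliberately targets a temporary directory instead of /sys.

```python
import os
import tempfile


def disable_gpe(name, base="/sys/firmware/acpi/interrupts"):
    """Write 'disable' to the counter file for `name`, like the echo command."""
    path = os.path.join(base, name)
    with open(path, "w") as f:
        f.write("disable\n")
    return path


# Demo against a scratch directory so nothing under /sys is touched.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "gpe00"), "w").close()

written = disable_gpe("gpe00", base=demo_dir)
with open(written) as f:
    demo_contents = f.read()
print(repr(demo_contents))
```

On the real machine the call would be disable_gpe("gpe00") with the default base path, run early in boot while the system is still in the stable state.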
(In reply to comment #14)
> (In reply to comment #12)
> > then my question is that,
> > 1. does this problem exist in every kernel you've tried?
>
> Every 2.6 series kernel.
>
> > 2. does this happen from the beginning, or it's caused at runtime by some
> > specific actions?
>
> The system normally starts in a stable state - unless rebooting after the
> interrupt storm. In which case the storm sometimes continues (I think
> turning off for a few minutes normally resets everything).
>
> When the system is in a stable state
>
> echo disable > /sys/firmware/acpi/interrupts/gpe00

Right. When the system is in the stable state, GPE00 can be disabled by "echo disable > /sys/firmware/acpi/interrupts/gpe00".

> is effective. I have now added this to the boot.local script, and so far I
> have had no more interrupt storms.
>
> Normally I can cause it by running some graphically intensive websites (with
> lots of CSS and flash on the pages). It seems to be independent of the
> graphics driver in use with X (I've tried ATI's and the open source radeon
> driver). I think it's in some way related to CPU load.

From the acpidump and the ICH4 spec we know that GPE00 is related to thermal events. When the CPU temperature rises, the GPE00 interrupt is triggered, but nothing is done in the _L00 method. Then the interrupt storm happens.

> It's impossible to give an exact scenario which will initiate the interrupt
> storm, sometimes it won't happen.

From the ICH4 spec, GPE00_STS can be controlled via the polarity of the THRM_POL bit. But even though the custom DSDT inverts the polarity of the THRM_POL bit in the _L00 method, the interrupt storm still occurs.

In fact IMO this is a BIOS bug (the bogus GPE00 method), and it is best fixed by upgrading the BIOS.
(In reply to comment #15)
> From the ICH4 spec the GPE00_STS can be controlled via the polarity of
> THRM_POL bit. But in the custom DSDT the polarity of THRM_POL bit is
> inverted in the _L00 method, there still exists the interrupt storm.
>
> In fact IMO this is a BIOS bug.(The bogus GPE00 method). And it had better be
> fixed by upgrading BIOS.

right, but we still need to make sure that this happens on Windows as well.
But I don't know how to verify an interrupt storm on Windows, does anyone have any ideas?
for windows, run perfmon and add a counter for interrupts/sec ?
(In reply to comment #17)
> for windows, run perfmon
> and add a counter for interrupts/sec ?

I reinstalled Windows into Virtual Box and no longer have a non-Virtual Box installation. I've not really had a dual boot machine for about 5 years, so it's not very easy to test this.

Since disabling gpe00 in the boot scripts, I've not encountered this issue again (to date).

If I could get a BIOS update that would be great - but I have no idea where to look. I investigated once before without success. It's a bit difficult to find what you want when you don't know where to look (Arima may have sold part of their business, the OEM I bought through went bust and the BIOS manufacturer doesn't seem to have an updates website). Anyone, correct me if I'm wrong - you might know other places to look, or understand the .tw website.
If we can prove that Windows figures out how to work properly in the face of this BIOS bug, then it justifies spending the effort to make Linux handle the same bug. I don't know what "virtual box" is, but if windows isn't talking to the real hardware, then that isn't interesting. I'm closing this bug as "documented" at this point, as a workaround is documented that gets you going. If you can show Windows on the hardware works, or we run into other systems with the same issue, we can re-open and investigate further.