Bug 3686 - Thermal related GPE keep firing: kacpid eats 80% CPU w/ processor events when compiling
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: i386 Linux
Importance: P2 high
Assignee: Alexey Starikovskiy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-11-02 08:06 UTC by Curzio Basso
Modified: 2007-10-25 13:45 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.10
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
/proc/interrupts after problem starts (455 bytes, text/plain)
2004-11-04 03:02 UTC, Curzio Basso
kernel config (26.02 KB, text/plain)
2004-11-08 02:19 UTC, Curzio Basso
kernel config file, ACPI processor disabled (26.02 KB, text/plain)
2004-11-08 03:28 UTC, Curzio Basso
Do not do acpi_thermal_check recursively/in parallel (1.38 KB, patch)
2007-06-08 05:20 UTC, Alexey Starikovskiy

Description Curzio Basso 2004-11-02 08:06:43 UTC
Distribution: 
Gentoo 1.4.16

Hardware Environment: 
Compaq notebook with PIII Mobile 1GHz

Software Environment: 
Autoconf: autoconf-2.59-r5
Automake: automake-1.8.5-r1
Binutils: binutils-2.14.90.0.8-r1
Libtools: libtool-1.5.2-r5

Problem Description:
Some time after a compilation is started (I tested this with different packages,
GIMP among them), kacpid starts to use ~80% of the CPU time.
Killing the compilation does not help, and I have to reboot the system.
I checked that this also happens when I boot into single-user mode, to rule out
any influence of the KDE power-management system.
Booting with acpi=off of course 'solves' the problem.
No problem is reported at boot.

Steps to reproduce:
Comment 1 Len Brown 2004-11-04 01:12:02 UTC
Please attach /proc/interrupts after this badness starts.

Also, please kill acpid (the user-land one)
and see if "cat /proc/acpi/event" shows any clues.
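Because the interesting number in /proc/interrupts is how fast the `acpi` (SCI) counter grows, it helps to turn two snapshots into a rate. The following is a minimal C sketch, not part of any existing tool: the function names are hypothetical, and it assumes the usual layout of an IRQ number and colon, one counter per CPU, then the controller type and handler name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return nonzero if `line`, ignoring trailing whitespace, ends with the
   whole word `name` (e.g. "acpi"). */
static int ends_with_word(const char *line, const char *name)
{
    size_t len = strlen(line);
    size_t n = strlen(name);

    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    if (n == 0 || len < n || strncmp(line + len - n, name, n) != 0)
        return 0;
    return len == n || isspace((unsigned char)line[len - n - 1]);
}

/* Sum the per-CPU counters on one /proc/interrupts line whose last word
   is `name`. Returns -1 if the line belongs to another handler. */
long long irq_line_total(const char *line, const char *name)
{
    const char *p = strchr(line, ':');
    long long total = 0;

    if (p == NULL || !ends_with_word(line, name))
        return -1;
    for (p++;;) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* reached the controller type */
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}

/* Interrupts per second between two samples taken `seconds` apart. */
double irq_rate(long long before, long long after, double seconds)
{
    if (seconds <= 0.0 || after < before)
        return -1.0;
    return (double)(after - before) / seconds;
}
```

Reading the file twice with fgets(), one second apart, and passing the two totals for the `acpi` line to irq_rate() gives the SCI rate in interrupts per second.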
Comment 2 Curzio Basso 2004-11-04 03:02:59 UTC
Created attachment 3956
/proc/interrupts after problem starts
Comment 3 Curzio Basso 2004-11-04 03:28:15 UTC
First of all I have to make a correction: some time after I interrupt the
compilation, kacpid goes back to normal CPU usage. When I start the
compilation again, the problem comes back.

When the problem starts, 'cat /proc/acpi/event' shows a stream of
"processor CPU0 00000080 00000000"
until kacpid goes back to normal.
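Each /proc/acpi/event line carries four fields: device class, bus ID, event type and event data, the last two printed as 8-digit hex. The line above is therefore a processor-class event of type 0x80 on CPU0; as we read the ACPI specification, Notify value 0x80 on a processor object signals a change in its performance present capabilities (_PPC). A minimal C sketch for parsing such lines and spotting repeats follows; the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* One line of /proc/acpi/event: "<class> <bus id> <type> <data>". */
struct acpi_event_line {
    char device_class[32];
    char bus_id[32];
    unsigned int type;
    unsigned int data;
};

/* Parse one event line; returns 0 on success, -1 if malformed. */
int parse_acpi_event(const char *line, struct acpi_event_line *ev)
{
    int n = sscanf(line, "%31s %31s %x %x",
                   ev->device_class, ev->bus_id, &ev->type, &ev->data);
    return n == 4 ? 0 : -1;
}

/* Nonzero if two parsed events are identical, e.g. when counting how
   often the same notification repeats during a storm. */
int same_acpi_event(const struct acpi_event_line *a,
                    const struct acpi_event_line *b)
{
    return strcmp(a->device_class, b->device_class) == 0 &&
           strcmp(a->bus_id, b->bus_id) == 0 &&
           a->type == b->type && a->data == b->data;
}
```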

Regarding the userspace acpid: I am now doing all the tests in single mode, and
no acpid is running (I mean, 'ps aux | grep acpi' does not show anything apart
from kacpid). AFAIK, a userspace daemon is started only when I use KDE, but as I
said, the problem does not depend on that.
Comment 4 Len Brown 2004-11-05 19:09:43 UTC
Please confirm that if you
# rmmod processor
before the test, or build the kernel with CONFIG_ACPI_PROCESSOR=n
that this issue goes away.

Please attach your .config.
Comment 5 Curzio Basso 2004-11-08 02:19:27 UTC
Created attachment 3983
kernel config
Comment 6 Curzio Basso 2004-11-08 03:28:41 UTC
Created attachment 3984
kernel config file, ACPI processor disabled
Comment 7 Curzio Basso 2004-11-08 03:30:42 UTC
I still get the same problem with CONFIG_ACPI_PROCESSOR=n.
Comment 8 Venkatesh Pallipadi 2004-11-09 07:09:30 UTC

It may be due to thermal events. Could you try unloading the thermal module, or
disabling ACPI -> Thermal in the kernel config, and check whether that makes
any difference?
Comment 9 Curzio Basso 2004-11-10 05:01:38 UTC
The problem does NOT depend on the 'processor' or the 'thermal' modules:
I can unload them, or not compile them at all, and the problem is still there.
Comment 10 Luming Yu 2004-11-10 07:50:17 UTC
  9:     239543          XT-PIC  acpi

This looks like an SCI storm. To verify it, please apply the following patches
one by one, and see which patch makes the symptom disappear:

--- edited/drivers/acpi/ec.0    2004-11-11 00:01:30.000000000 +0800
+++ edited/drivers/acpi/ec.c    2004-11-11 00:03:13.000000000 +0800
@@ -395,9 +395,14 @@

        acpi_disable_gpe(NULL, ec->gpe_bit, ACPI_ISR);

+#if 0
        status = acpi_os_queue_for_execution(OSD_PRIORITY_GPE,
                acpi_ec_gpe_query, ec);

+#endif
+       status = AE_OK;
+
+
        if (status == AE_OK)
                return ACPI_INTERRUPT_HANDLED;
        else


--- edited/drivers/acpi/events/evgpe.c.1        2004-11-11 00:10:44.000000000 +0800
+++ edited/drivers/acpi/events/evgpe.c  2004-11-11 00:09:28.000000000 +0800
@@ -668,6 +668,7 @@
                 * Execute the method associated with the GPE
                 * NOTE: Level-triggered GPEs are cleared after the method completes.
                 */
+#if 0
                if (ACPI_FAILURE (acpi_os_queue_for_execution (OSD_PRIORITY_GPE,
                                 acpi_ev_asynch_execute_gpe_method,
                                 gpe_event_info))) {
@@ -675,6 +676,7 @@
                                "acpi_ev_gpe_dispatch: Unable to queue handler for GPE[%2X], event is disabled\n",
                                gpe_number));
                }
+#endif
                break;

        default:


Comment 11 Curzio Basso 2004-11-12 04:35:35 UTC
The second patch is working. 
I compiled a good amount of packages with ACPI support compiled in the kernel,
and processor and thermal modules loaded, and kacpid always behaved nicely.

Is the patch going into the kernel, or should I keep it somewhere and apply it
to every new kernel?
Comment 12 Luming Yu 2004-11-15 08:42:56 UTC
Re comment #11:
No, that patch is only for debugging. For your problem, we need a policy
for thermal events. The problem here is that your thermal GPE keeps firing.
That is unnecessary, because this kind of event is a request for cooling.
Whether or not the cooling method works, the thermal GPE needs to be disabled
for a reasonable duration. Firing thermal events continuously is meaningless.
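The policy described above (once a thermal GPE has been handled, keep it quiet for a while whether or not cooling helped) can be sketched as a time-based hold-off. This is a hypothetical user-space illustration, not kernel code: the struct, the function names and the time source (seconds supplied by the caller) are assumptions, and a real implementation would disable the GPE and re-enable it from a timer.

```c
#include <stdbool.h>

/* Hypothetical hold-off state for one thermal GPE. */
struct gpe_holdoff {
    double holdoff_sec;   /* how long to ignore the GPE after handling it */
    double last_handled;  /* time of last handled event; < 0 means never */
};

void gpe_holdoff_init(struct gpe_holdoff *h, double holdoff_sec)
{
    h->holdoff_sec = holdoff_sec;
    h->last_handled = -1.0;
}

/* Decide whether an event arriving at time `now` (seconds) should be
   handled. Events inside the hold-off window are dropped, which is the
   user-space analogue of leaving the GPE disabled. */
bool gpe_holdoff_should_handle(struct gpe_holdoff *h, double now)
{
    if (h->last_handled >= 0.0 && now - h->last_handled < h->holdoff_sec)
        return false;
    h->last_handled = now;
    return true;
}
```

With a 10-second hold-off, a GPE firing 20 times per second leads to one evaluation of the thermal methods every 10 seconds instead of 20 per second.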
Comment 13 Luming Yu 2004-11-15 08:44:35 UTC
I'm considering that policy.
Comment 14 Guenther Thomsen 2005-04-03 20:42:08 UTC
My Gateway 2000 "server" (dual PIII, ServerWorks OSB4 chipset) experiences a
similar problem with 2.6.11 (which is actually the first 2.6 kernel that
finds the on-board SCSI HBA while ACPI is enabled, and hence the first to
complete the boot process on this machine, probably due to an outdated and
buggy BIOS).
 
kacpid eats a good share of the available CPU time (~65 to 100%) when the
machine is otherwise idle. No events are reported in /proc/acpi/event.
Comment 15 Guenther Thomsen 2005-04-03 21:28:45 UTC
I should add that the machine receives some 9000 interrupts per second while
idle.
 
--8<--
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa 
 1  0      4   9388  46116 461504    0    0     0     0 9289 16687  0 46 54  0 
 1  0      4   9388  46116 461504    0    0     0     0 9294 16693  0 46 54  0 
 0  0      4   9388  46128 461492    0    0     0    10 9286 16682  0 46 54  0 
 1  0      4   9388  46128 461492    0    0     0     0 9296 16651  0 46 54  0 
 1  0      4   9388  46128 461492    0    0     0     0 9305 16627  0 46 54  0 
 1  0      4   9388  46128 461492    0    0     0     0 9297 16618  0 46 54  0 
 1  0      4   9388  46128 461492    0    0     0     0 9297 16625  0 45 55  0 
 1  0      4   9388  46128 461492    0    0     0     0 9281 16600  0 46 54  0 
 1  0      4   9388  46128 461492    0    0     0     0 9279 16629  0 45 55  0 
 0  0      4   9388  46128 461492    0    0     0     0 9293 16701  0 45 55  0 
-->8-- 
 
--8<-- 
tho@paris:linux-2.6.11-acpi-patch>cat /proc/interrupts  
           CPU0       CPU1        
  0:   15170798          0  local-APIC-edge  timer 
  1:          8          0    IO-APIC-edge  i8042 
  4:         58          1    IO-APIC-edge  serial 
  7:          0          0    IO-APIC-edge  parport0 
  9:   41760388   41613523   IO-APIC-level  acpi 
 10:          0          0   IO-APIC-level  ohci_hcd 
 14:         36          8    IO-APIC-edge  ide0 
 15:         26         11    IO-APIC-edge  ide1 
 16:      24376      17396   IO-APIC-level  ide2 
 18:          4          3   IO-APIC-level  tmscsim 
 19:          0          0   IO-APIC-level  EMU10K1 
 20:      34115          0   IO-APIC-level  eth0 
 24:      41735      41701   IO-APIC-level  sym53c8xx 
 25:         15         15   IO-APIC-level  sym53c8xx 
NMI:          0          0  
LOC:   15171296   15171306  
ERR:          0 
MIS:          0 
-->8-- 
 
 
Comment 16 Roeland Merks 2005-08-09 05:26:22 UTC
I have the same problem. acpid is using 99% CPU and writing messages like the
following to /var/log/messages about 20 times per second.

I am using kernel 2.6.11.4-21.8-obj (installed by SuSE9.3).

Aug  9 14:24:41 millipede kernel:     ACPI-0615: *** Warning: Unable to turn cooling device [d6fe7820] 'on'
Aug  9 14:24:41 millipede kernel:     ACPI-0212: *** Warning: Device is not power manageable
Aug  9 14:24:41 millipede kernel:     ACPI-0615: *** Warning: Unable to turn cooling device [d6fe7820] 'on'
Aug  9 14:24:42 millipede kernel:     ACPI-0212: *** Warning: Device is not power manageable
Aug  9 14:24:42 millipede kernel:     ACPI-0212: *** Warning: Device is not power manageable

...

Comment 17 Venkatesh Pallipadi 2005-08-31 19:11:19 UTC
Do you still see the problem with 2.6.13 kernel?
Comment 18 Luming Yu 2005-09-04 23:03:18 UTC
Could you also try to reproduce it under Windows?

Another possible reason is that the associated methods don't work,
or that kacpid doesn't invoke the cooling method (or lower the frequency ...)
to decrease the temperature.

Comment 19 Adrian Bunk 2006-02-13 14:40:45 UTC
I'm assuming this issue is already fixed.

Please reopen this bug if it's still present in recent 2.6 kernels.
Comment 20 Ilya Gavrilov 2006-03-14 08:42:53 UTC
I have the same problem with 2.6.15
on my HP Compaq nc6220 laptop.
Comment 21 James Georgas 2006-07-20 18:56:07 UTC
I have a Tyan K8W that does something similar. kacpid sucks up one of the
processors, and /proc/interrupts shows acpi making lots and lots of interrupt
requests.

Also, the 'sensors' program (from the lm_sensors package) reports garbage
numbers for voltages and temperatures.

The kacpid thrashing happens whether or not I load the lm_sensors modules.

My kernel is 2.6.16.
Comment 22 Alexey Starikovskiy 2007-06-04 10:35:11 UTC
Could you please check whether the 2.6.21 kernel still has this problem?
Comment 23 Alexey Starikovskiy 2007-06-08 05:20:17 UTC
Created attachment 11710
Do not do acpi_thermal_check recursively/in parallel

Please try the latest kernel with and without the attached patch.
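The attached patch itself is not reproduced in this report. As a rough user-space illustration of the idea in its title, a thermal check that must not run recursively or in parallel can be guarded with a C11 atomic_flag, so that a caller finding a check already in progress simply skips it. All names below are hypothetical, and the nested call only simulates a notification arriving while the check is still running.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag check_in_progress = ATOMIC_FLAG_INIT;
static int checks_run;      /* how many checks actually executed */
static int checks_skipped;  /* how many callers found one in progress */

bool thermal_check_once(void);

/* Hypothetical stand-in for evaluating the thermal zone. It re-enters
   thermal_check_once() to simulate a notification that arrives while
   the check is still running. */
static void evaluate_thermal_zone(void)
{
    checks_run++;
    if (!thermal_check_once())
        checks_skipped++;
}

/* Run the check unless another one is already in progress.
   Returns true if this call performed the check. */
bool thermal_check_once(void)
{
    if (atomic_flag_test_and_set(&check_in_progress))
        return false;
    evaluate_thermal_zone();
    atomic_flag_clear(&check_in_progress);
    return true;
}
```

Skipping rather than blocking keeps a burst of notifications from piling up nested evaluations; whether the actual patch skips or serializes the checks is not stated in this report.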
Comment 24 Alexey Starikovskiy 2007-06-26 00:05:19 UTC
Please re-open the bug if the problem is still present.
Comment 25 Len Brown 2007-10-25 13:45:40 UTC
A patch based on comment #23,
6e2157858ac94530fddbf19dc59ab6b392baf1f3,
shipped in linux-2.6.24-rc1.
