Bug 3686
Summary: | Thermal related GPE keep firing: kacpid eats 80% CPU w/ processor events when compiling | ||
---|---|---|---|
Product: | ACPI | Reporter: | Curzio Basso (curzio.basso) |
Component: | Power-Processor | Assignee: | Alexey Starikovskiy (astarikovskiy) |
Status: | CLOSED INSUFFICIENT_DATA | ||
Severity: | high | CC: | acpi-bugzilla, bunk |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.10 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
/proc/interrupts after problem starts
kernel config
kernel config file, ACPI processor disabled
Do not do acpi_thermal_check recursively/in parallel |
Description
Curzio Basso
2004-11-02 08:06:43 UTC
Please attach /proc/interrupts after this badness occurs. Also, please kill acpid (the user-land one) and see if "cat /proc/acpi/event" shows any clues.

Created attachment 3956
/proc/interrupts after problem starts
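The diagnosis in this thread comes from watching the acpi line of /proc/interrupts grow over time. The following is a minimal C sketch, assuming the 2.6-era layout shown in this report ("IRQ:  count per CPU  controller type  device name"). It sums the per-CPU counters on one line for a named device and converts two samples into a rate. The helpers irq_line_total and irq_rate are hypothetical; they are not kernel or procps APIs.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if the last field of a /proc/interrupts line equals
 * `name`, return the sum of the per-CPU counters after the "IRQ:" prefix.
 * Returns -1 if the line belongs to a different device or has no colon.
 */
long long irq_line_total(const char *line, const char *name)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;

    /* Locate the last whitespace-separated field (the device name). */
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    size_t start = len;
    while (start > 0 && !isspace((unsigned char)line[start - 1]))
        start--;
    if (len - start != strlen(name) ||
        strncmp(line + start, name, len - start) != 0)
        return -1;

    /* Add up numeric columns until the controller type (e.g. "XT-PIC"). */
    long long total = 0;
    const char *p = colon + 1;
    for (;;) {
        char *end;
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;
        total += strtoll(p, &end, 10);
        p = end;
    }
    return total;
}

/* Interrupts per second between two totals sampled `seconds` apart. */
double irq_rate(long long before, long long after, double seconds)
{
    return seconds > 0 ? (double)(after - before) / seconds : 0.0;
}
```

To use it, read /proc/interrupts line by line (for example with fgets) and take the acpi total twice, one second apart. Pass both totals to irq_rate. A rate that stays high on an otherwise idle machine is what the thread later calls an SCI storm.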
First of all, I have to make a correction: some time after I interrupt the compilation, kacpid goes back to normal CPU usage. When I start the compilation again, the problem comes back. When the problem starts, 'cat /proc/acpi/event' shows a string of "processor CPU0 00000080 00000000" events until kacpid goes back to normal. Regarding the userspace acpid: I am now doing all the tests in single-user mode, and no acpid is running ('ps aux | grep acpi' shows nothing apart from kacpid). AFAIK, a userspace daemon is started only when I use KDE, but as I said, the problem does not depend on that.

Please confirm that this issue goes away if you run

# rmmod processor

before the test, or build the kernel with CONFIG_ACPI_PROCESSOR=n. Please attach your .config.

Created attachment 3983
kernel config
Created attachment 3984
kernel config file, ACPI processor disabled
I still get the same problem with CONFIG_ACPI_PROCESSOR=n.

It may be due to thermal events. Can you try unloading the thermal module, or deconfiguring ACPI -> Thermal in the kernel config, and check whether it makes any difference?

The problem does NOT depend on the 'processor' or the 'thermal' modules: I can unload them, or not compile them at all, and the problem is still there.

  9:     239543    XT-PIC  acpi

This looks like an SCI storm. To verify it, please apply the following patches one by one, and see which patch makes the symptom disappear:

--- edited/drivers/acpi/ec.0	2004-11-11 00:01:30.000000000 +0800
+++ edited/drivers/acpi/ec.c	2004-11-11 00:03:13.000000000 +0800
@@ -395,9 +395,14 @@
 	acpi_disable_gpe(NULL, ec->gpe_bit, ACPI_ISR);
+#if 0
 	status = acpi_os_queue_for_execution(OSD_PRIORITY_GPE,
 		acpi_ec_gpe_query, ec);
+#endif
+	status = AE_OK;
+
+
 	if (status == AE_OK)
 		return ACPI_INTERRUPT_HANDLED;
 	else

--- edited/drivers/acpi/events/evgpe.c.1	2004-11-11 00:10:44.000000000 +0800
+++ edited/drivers/acpi/events/evgpe.c	2004-11-11 00:09:28.000000000 +0800
@@ -668,6 +668,7 @@
 		 * Execute the method associated with the GPE
 		 * NOTE: Level-triggered GPEs are cleared after the method completes.
 		 */
+#if 0
 		if (ACPI_FAILURE (acpi_os_queue_for_execution (OSD_PRIORITY_GPE,
 				acpi_ev_asynch_execute_gpe_method,
 				gpe_event_info))) {
@@ -675,6 +676,7 @@
 				"acpi_ev_gpe_dispatch: Unable to queue handler for GPE[%2X], event is disabled\n",
 				gpe_number));
 		}
+#endif
 		break;

 	default:

The second patch works. I compiled a good number of packages with ACPI support built into the kernel and the processor and thermal modules loaded, and kacpid always behaved nicely. Is the patch going into the kernel, or should I keep it somewhere and apply it to every new kernel?

To comment #11: no, that patch is just for debugging. As for your problem, we need a policy for thermal events. The problem here is that your thermal GPE keeps firing. That is not necessary, because this kind of event is a request for cooling.
Whether or not the cooling method works, the thermal GPE needs to be disabled for a reasonable duration. Letting the thermal event keep firing is meaningless. I'm considering such a policy.

My Gateway 2000 "server" (dual PIII, ServerWorks OSB4 chipset) has a similar problem with 2.6.11. This is actually the first 2.6 kernel that finds the on-board SCSI HBA while ACPI is enabled, and hence the first to complete the boot process on this machine (probably due to an outdated and buggy BIOS). kacpid eats a good share of the available CPU time (~65 to 100%) when the machine is otherwise idle. No events are reported in /proc/acpi/event. I should add that the machine receives some 9000 interrupts per second while idle.

--8<--
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      4   9388  46116 461504    0    0     0     0 9289 16687  0 46 54  0
 1  0      4   9388  46116 461504    0    0     0     0 9294 16693  0 46 54  0
 0  0      4   9388  46128 461492    0    0     0    10 9286 16682  0 46 54  0
 1  0      4   9388  46128 461492    0    0     0     0 9296 16651  0 46 54  0
 1  0      4   9388  46128 461492    0    0     0     0 9305 16627  0 46 54  0
 1  0      4   9388  46128 461492    0    0     0     0 9297 16618  0 46 54  0
 1  0      4   9388  46128 461492    0    0     0     0 9297 16625  0 45 55  0
 1  0      4   9388  46128 461492    0    0     0     0 9281 16600  0 46 54  0
 1  0      4   9388  46128 461492    0    0     0     0 9279 16629  0 45 55  0
 0  0      4   9388  46128 461492    0    0     0     0 9293 16701  0 45 55  0
-->8--

--8<--
tho@paris:linux-2.6.11-acpi-patch>cat /proc/interrupts
           CPU0       CPU1
  0:   15170798          0    local-APIC-edge  timer
  1:          8          0       IO-APIC-edge  i8042
  4:         58          1       IO-APIC-edge  serial
  7:          0          0       IO-APIC-edge  parport0
  9:   41760388   41613523      IO-APIC-level  acpi
 10:          0          0      IO-APIC-level  ohci_hcd
 14:         36          8       IO-APIC-edge  ide0
 15:         26         11       IO-APIC-edge  ide1
 16:      24376      17396      IO-APIC-level  ide2
 18:          4          3      IO-APIC-level  tmscsim
 19:          0          0      IO-APIC-level  EMU10K1
 20:      34115          0      IO-APIC-level  eth0
 24:      41735      41701      IO-APIC-level  sym53c8xx
 25:         15         15      IO-APIC-level  sym53c8xx
NMI:          0          0
LOC:   15171296   15171306
ERR:          0
MIS:          0
-->8--

I have the same problem.
acpid is using 99% CPU and writing messages like the following to /var/log/messages about 20 times per second. I am using kernel 2.6.11.4-21.8-obj (installed by SuSE 9.3).

Aug 9 14:24:41 millipede kernel: ACPI-0615: *** Warning: Unable to turn cooling device [d6fe7820] 'on'
Aug 9 14:24:41 millipede kernel: ACPI-0212: *** Warning: Device is not power manageable
Aug 9 14:24:41 millipede kernel: ACPI-0615: *** Warning: Unable to turn cooling device [d6fe7820] 'on'
Aug 9 14:24:42 millipede kernel: ACPI-0212: *** Warning: Device is not power manageable
Aug 9 14:24:42 millipede kernel: ACPI-0212: *** Warning: Device is not power manageable
...

Do you still see the problem with the 2.6.13 kernel? Could you also try to reproduce it under Windows? Another possible reason is that the associated methods don't work, or that kacpid doesn't invoke the cooling method or lower the frequency ... to decrease the temperature...

I'm assuming this issue is already fixed. Please reopen this bug if it's still present in recent 2.6 kernels.

I have the same problem with 2.6.15 on my laptop, an HP Compaq nc6220.

I have a Tyan K8W that does something similar. kacpid consumes one of the processors, and /proc/interrupts shows acpi generating huge numbers of interrupts. Also, the 'sensors' program (from the lm_sensors package) reports garbage numbers for voltages and temperatures. The kacpid thrashing happens whether or not I load the lm_sensors modules. My kernel is 2.6.16.

Could you please check whether the 2.6.21 kernel still has this problem?

Created attachment 11710
Do not do acpi_thermal_check recursively/in parallel
Please try the latest kernel with and without the following patch.
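The patch title above describes keeping acpi_thermal_check from running recursively or in parallel. The following is a minimal C11 sketch of that general idea: a single atomic flag lets at most one check run at a time, and any overlapping request is simply dropped. It is not the actual kernel patch. guarded_thermal_check, thermal_check_body and the counters are hypothetical names, and C11 atomics stand in for whatever primitive the kernel code really uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard state: set while a check is running. */
static atomic_flag check_in_progress = ATOMIC_FLAG_INIT;
static atomic_int checks_run;      /* checks that actually executed */
static atomic_int checks_skipped;  /* requests dropped while one was running */

static void thermal_check_body(void);

/* Returns true if this call performed the check, false if it was dropped. */
bool guarded_thermal_check(void)
{
    /* test_and_set returns the previous value: true means "already running". */
    if (atomic_flag_test_and_set(&check_in_progress)) {
        atomic_fetch_add(&checks_skipped, 1);
        return false;
    }
    atomic_fetch_add(&checks_run, 1);
    thermal_check_body();
    atomic_flag_clear(&check_in_progress);
    return true;
}

/*
 * Stand-in for the real work. It re-enters the guard once to simulate a
 * check that triggers another check; the nested call is dropped.
 */
static void thermal_check_body(void)
{
    (void)guarded_thermal_check();
}
```

Dropping the nested request keeps the illustration simple. A real implementation might instead record that another check is pending and run it once the current one finishes.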
Please re-open the bug if the problem is still present.

Patch based on comment #23: 6e2157858ac94530fddbf19dc59ab6b392baf1f3, shipped in linux-2.6.24-rc1.
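Earlier in this thread, the maintainer proposed a policy for thermal events. Once a thermal GPE has fired, it should be disabled for a reasonable duration, whether or not the cooling method works. The following is a hypothetical C sketch of such a hold-off timer, not necessarily what was merged. struct gpe_holdoff and gpe_holdoff_event are invented names, and the hold-off length is an arbitrary parameter. Time is passed in explicitly so the policy can be tested without a clock. Kernel code would presumably disable the GPE (e.g. with acpi_disable_gpe, as in the debug patch) and re-enable it from a timer instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-GPE hold-off state. */
struct gpe_holdoff {
    uint64_t hold_off_ms;    /* how long to keep the GPE quiet after an event */
    uint64_t disabled_until; /* time (ms) at which events are accepted again */
    unsigned handled;        /* events dispatched to the handler */
    unsigned suppressed;     /* events dropped during the hold-off window */
};

/* Returns true if an event arriving at now_ms should be dispatched. */
bool gpe_holdoff_event(struct gpe_holdoff *g, uint64_t now_ms)
{
    if (now_ms < g->disabled_until) {
        g->suppressed++;
        return false;
    }
    g->handled++;
    g->disabled_until = now_ms + g->hold_off_ms;
    return true;
}
```

Suppose events arrive every 50 ms, roughly the 20-per-second rate in the log excerpt above. With a 1000 ms hold-off, 2 of the 40 events over two seconds are dispatched and 38 are dropped.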