Bug 9781 (runaway_kacpid)

Summary: kacpid uses 80% CPU - E4400/D945GCNL
Product: ACPI Reporter: Dennis (dennis.aldana)
Component: Config-InterruptsAssignee: ykzhao (yakui.zhao)
Status: CLOSED DUPLICATE    
Severity: high CC: acpi-bugzilla
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.23.11 Subsystem:
Regression: --- Bisected commit-id:
Attachments: atached result of acpidump
dmesg just after booting
/proc/acpi/interrupts just after booting
dmsg one minute after passing the commands
/proc/acpi/interrupts one minute after passing the commands
dmesg just after booting
/proc/acpi/interrupts just after booting
dmsg after
acpi interrupts after
lspci -vxxx (when the bug is present)
dmesg when the bug is not present
lspci -vxxx (when the bug is not present)
dmesg just after booting
/proc/acpi/interrupts just after booting
dmsg one minute after increasing debug layer and level
/proc/acpi/interrupts one minute after increasing debub layer and level
sorry uploaded the wrong file
dmesg when the bug is not present and the debug layer and level are set as you told me
use the attached tool to read from I/O port
defer the gpe enable
dmesg just after inserting 0 to the port
interrupts just after inserting zero to the port, when the bug stopped
dmsg after one minute
interrupts after one minute
try the debug patch
try the debug patch

Description Dennis 2008-01-20 12:38:36 UTC
Latest working kernel version: N/A
Earliest failing kernel version: 2.6.20
Distribution: Vanilla Kernel, tested in default kernels for: zenwalk, ubuntu, opensuse, fedora
Hardware Environment: Processor: Core 2 Duo E4400 @ 2.00 GHz Motherboard: Intel Desktop Board D945GCNL
Problem Description: Processes kacpid and kacpid_notify consume around 75% and 10% respectively of the first processor at all times. 

Workarround: Disable acpi at boot leaving hyper threading using acpi=ht, this however makes the computer not able to halt by itself, ie. must press the poweroff button.

Useful information:

$ top
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
   53 root      15  -5     0    0    0 R   76  0.0  13:31.75 kacpid             
   54 root      15  -5     0    0    0 R    9  0.0   1:50.00 kacpi_notify   

$ cat /proc/interrupts 
           CPU0       CPU1       
  0:    1234009          0   IO-APIC-edge      timer
  1:       3378          0   IO-APIC-edge      i8042
  7:          0          0   IO-APIC-edge      parport0
  8:          0          0   IO-APIC-edge      rtc
  9:   25224917          0   IO-APIC-fasteoi   acpi
 14:      13664          0   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 16:     121633          0   IO-APIC-fasteoi   uhci_hcd:usb4, i915@pci:0000:00:02.0
 17:       4562          0   IO-APIC-fasteoi   eth0
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 19:      56486          0   IO-APIC-fasteoi   uhci_hcd:usb2, libata
 22:        496          0   IO-APIC-fasteoi   HDA Intel
 23:          2          0   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb5
NMI:          0          0 
LOC:    1233974    1233892 
ERR:          0

Steps to reproduce: None needed, always present
Comment 1 Dennis 2008-01-20 12:41:15 UTC
Created attachment 14511 [details]
atached result of acpidump
Comment 2 Len Brown 2008-01-20 22:36:02 UTC
so this was a problem in 2.6.20 and is still a problem in 2.6.23.11.
Is there any version of ACPI-enabled linux where this was not a problem?
Comment 3 ykzhao 2008-01-21 00:13:29 UTC
Hi, Dennis
Will you please enable the ACPI debug function in kernel configuration and do the following test after the system is booted?
a. cat /proc/interrupts > interrupts_before and dmesg > dmesg_before
b. killall acpid 
c.echo 0x04 > /sys/module/acpi/parameters/debug_layer and echo 0x08000000 > /sys/module/acpi/parameters/debug_layer
d. cat /proc/acpi/event > acpi_event
e. After the system runs about 1 minutes, cat /proc/interrupts > interrupts_after and dmesg > dmesg_after

It will be great if you can attach the above outputs.
Thanks.
Comment 4 Dennis 2008-01-21 12:00:25 UTC
Len Brown: i know the bug is present in kernels from 2.6.20 to 2.6.23.11 because that is as far as i have tested. I cannot assure if its present in any older kernels or any newer.

ykzhao: i have the results of the commands you asked, just in b. i found no process called acpid, perhaps you ment kacpid, but still it cannot be killed. Also for step d. /proc/acpi/event is not readable by cat(?), the terminal just stays idle for some time. Ill try to see more about this. As for the last step, i left my system iddle for a minute afer step c. and while d. was working and then ran the command. I cant upload the files now, but ill repeat the steps and upload them later.
Comment 5 Dennis 2008-01-22 06:56:26 UTC
Created attachment 14532 [details]
dmesg just after booting
Comment 6 Dennis 2008-01-22 06:57:10 UTC
Created attachment 14533 [details]
/proc/acpi/interrupts just after booting
Comment 7 Dennis 2008-01-22 06:57:46 UTC
Created attachment 14534 [details]
dmsg one minute after passing the commands
Comment 8 Dennis 2008-01-22 06:58:35 UTC
Created attachment 14535 [details]
/proc/acpi/interrupts one minute after passing the commands
Comment 9 ykzhao 2008-01-23 00:53:21 UTC
Hi, Dennis
  Thanks for the info. But the info is not helpful .
  Will you please enable the ACPI debug function in kernel configuration and do the test as required in the comment #3? 
  How to cat /proc/acpi/event is listed in the following.
  a. use the command of "lsof /proc/acpi/event" to list the PID who is using the /proc/acpi/event. 
  b. kill -9 PID
  c. cat /proc/acpi/event > acpi_event
 
  After the test , please attach the above output.
Thanks.

   
Comment 10 Dennis 2008-01-24 09:17:27 UTC
i had enabled this in .config and compiled the kernel before running the test
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_DEBUG_FUNC_TRACE=y
what else do i need to enable acpi debug?
cat /proc/acpi/event stays waiting and reporting events untill i press Ctrl C, how long do i leave it working? it doesn't really report anything most of the time. 

Also i wanted to confirm the bug is present in a core 2 duo E4500 in the same motherboard, with kernel 2.6.23.12. 

Some additional info (tested in both desktops) if you boot to an ntfs system like vista or xp and hard reset the computer (press the reset button) and boot into Linux, the bug is not present and kacpid + kacpid_notify work normally, /proc/interrupts doesn't show so much calls to cpu0 either. This doesn't work if you hard reset Linux.  
Comment 11 ykzhao 2008-01-24 17:16:58 UTC
If you have enabled the following in .config, it is ok and acpi debug function is enabled.
>CONFIG_ACPI_DEBUG=y
>CONFIG_ACPI_DEBUG_FUNC_TRACE=y

cat /proc/acpi/event will stay waiting and only when some events happens, it will report the events. So it is necessary to wait for some time. 
After the system runs about 1 minute, Press Ctr+c to quit the command of cat /proc/acpi/event and then get the output of dmesg, /proc/interrupts. 

Will you please attach the output after finishing the test as required in comment #3? Of course the output of lspci -vxxx is helpful.

It will be great if you can attach the output of dmesg, /proc/interrupts , lspci -vxxx when the Linux can work normally(First boot into vista or XP and then into Linux). 
Comment 12 Dennis 2008-01-27 10:49:30 UTC
Created attachment 14606 [details]
dmesg just after booting
Comment 13 Dennis 2008-01-27 10:51:01 UTC
Created attachment 14607 [details]
/proc/acpi/interrupts just after booting
Comment 14 Dennis 2008-01-27 10:52:50 UTC
Created attachment 14608 [details]
dmsg after
Comment 15 Dennis 2008-01-27 10:53:25 UTC
Created attachment 14609 [details]
acpi interrupts after
Comment 16 Dennis 2008-01-27 10:54:09 UTC
Created attachment 14610 [details]
lspci -vxxx (when the bug is present)
Comment 17 Dennis 2008-01-27 11:03:47 UTC
Created attachment 14611 [details]
dmesg when the bug is not present
Comment 18 Dennis 2008-01-27 11:04:13 UTC
Created attachment 14612 [details]
lspci -vxxx (when the bug is not present)
Comment 19 Dennis 2008-01-27 11:06:46 UTC
ok, this time i tried to increase verbosity of the kernel and here are the results, i hope they are helpful, just cat /proc/acpi/event > acpi_event left a blank file so i didnt attached it
Comment 20 ykzhao 2008-01-27 22:54:14 UTC
Hi, Dennis
    thanks for the info. There is an error in the comment #3:
     > echo 0x04 > /sys/module/acpi/parameters/debug_layer and echo 0x08000000 >
/sys/module/acpi/parameters/debug_layer ( should be debug_level).
    Sorry, It is my fault.
    Will you please test your system again as required in the following?
    a. echo 0x04 > /sys/module/acpi/parameters/debug_layer and echo 0x08000000 >
/sys/module/acpi/parameters/debug_level 
    b. After the system runs about 1 minutes, cat /proc/interrupts >
interrupts_after and dmesg > dmesg_after

   I am very sorry for the fault in comment #3.
   Thanks.
   
    
    
Comment 21 Dennis 2008-01-28 08:48:38 UTC
never mind, we all make mistakes, i have the new results, with the debug level set so high both CPUs work 100%... and dmesg_after indeed shows something loop-like
Comment 22 Dennis 2008-01-28 08:49:24 UTC
Created attachment 14621 [details]
dmesg just after booting
Comment 23 Dennis 2008-01-28 08:50:11 UTC
Created attachment 14622 [details]
/proc/acpi/interrupts just after booting
Comment 24 Dennis 2008-01-28 08:51:55 UTC
Created attachment 14623 [details]
dmsg one minute after increasing debug layer and level

just the end of dmesg was shown by the command since it became too big, if you need the full dmesg tell me how to get it and ill upload it here.
Comment 25 Dennis 2008-01-28 08:52:39 UTC
Created attachment 14624 [details]
/proc/acpi/interrupts one minute after increasing debub layer and level
Comment 26 Dennis 2008-01-28 09:06:27 UTC
Created attachment 14625 [details]
sorry uploaded the wrong file
Comment 27 Dennis 2008-01-28 09:09:30 UTC
Created attachment 14626 [details]
dmesg when the bug is not present and the debug layer and level are set as you told me
Comment 28 Dennis 2008-01-28 09:12:02 UTC
Comment on attachment 14625 [details]
sorry uploaded the wrong file

this is the right dmesg after the commands
Comment 29 ykzhao 2008-01-28 18:52:49 UTC
Thanks for the info. 
Comment 30 ykzhao 2008-01-29 22:33:38 UTC
Created attachment 14645 [details]
use the attached tool to read from I/O port

Hi, Dennis
   Will you please use the attached tool to read the following I/O port and attach the output? 
   a. 0x504  16-bit access
   b. 0x50c  16-bit access
   c. 0x52c  16-bit access
   How to use the attached tool can be found in the readme.
   Thanks.
Comment 31 ykzhao 2008-01-29 22:51:08 UTC
Hi, Dennis
    From the log in comment #22 and #24 it seems that this is a problem of acpi interrupt floods, which is caused by GPE 0x1D .
    Will you please attach the output using the attached tool in comment #30? It will be great if you can do the test in two modes: hard reset and warm reset(first boot into Xp or vista and then into linux).
    
    Will you please do the following test after the system is booted(hard reset)?
    a. write 0x2000 into the 0x52C I/O port (16 bit access)
    b. cat /proc/interrupts > interrupts_before ; dmesg > dmesg_before;
    c. echo 0xa. echo 0x04 > /sys/module/acpi/parameters/debug_layer and echo 0x08000000> /sys/module/acpi/parameters/debug_level 
    d. After the system runs about 1 minutes, cat /proc/interrupts >
interrupts_after and dmesg > dmesg_after.  At the same time please confirm whether the problem disappear.
    
    Thanks.
Comment 32 Dennis 2008-01-30 05:56:44 UTC
Sure, happy to help. This are the results of the tests using the tool:

When the bug is present(normal linux function):
a. ./ior --addr 0x504 --width 16 
 the value of IO port 0x504 is fefb
b. ./ior --addr 0x50c --width 16 
 the value of IO port 0x50c is 1eda
c. ./ior --addr 0x52c --width 16 
 the value of IO port 0x52c is 3000

When the bug is not present (reseting xp)
a. ./ior --addr 0x504 --width 16 
 the value of IO port 0x504 is fefb
b. ./ior --addr 0x50c --width 16 
 the value of IO port 0x50c is 3eda
c. ./ior --addr 0x52c --width 16 
 the value of IO port 0x52c is 3000

as for the second set of tests, shall i make them when the bug is present, not present, or make them for both cases? also, step c, what do you mean by echo 0xa.?
Comment 33 ykzhao 2008-01-30 17:03:02 UTC
Hi, Dennis
   Thanks for the test. 
   > echo 0xa is a mistake. please ignore it. 
   You can test it when the bug is present.  And it is noted that if the bit 13 of 0x52c IO port returned value is 1, write 0x0000 into the 0x52c port(Otherwise please write 0x2000 into the 0x52C port).
   Thanks.
   
Comment 34 ykzhao 2008-01-30 19:40:01 UTC
Created attachment 14656 [details]
defer the gpe enable 

Hi, Dennis
   Will you please try the attached patch and see whether the bug can be fixed? 
   Thanks.
Comment 35 Dennis 2008-01-30 20:50:20 UTC
I did the tests in comment 31, since the value of IO port 0x52c is 3000, i wrote 0 to 0x52c and the bug disappeared!. Ill attach the results. Ill try the patch in a moment when i get the bug back, thanks for your help.  
Comment 36 Dennis 2008-01-30 20:51:25 UTC
Created attachment 14657 [details]
dmesg just after inserting 0 to the port
Comment 37 Dennis 2008-01-30 20:52:12 UTC
Created attachment 14658 [details]
interrupts just after inserting zero to the port, when the bug stopped
Comment 38 Dennis 2008-01-30 20:52:52 UTC
Created attachment 14659 [details]
dmsg after one minute
Comment 39 Dennis 2008-01-30 20:53:21 UTC
Created attachment 14660 [details]
interrupts after one minute
Comment 40 ykzhao 2008-01-30 21:31:28 UTC
Hi, Dennis
   Thanks for so quick response. 
   From the log it seems that the problem is caused by the GPE 0x1D(maybe it is related with the GPIO). And I will investigate it. 
   Will you please test the attched patch in comment #34 and see whether the bug can be fixed? 
   It will be great if you can attach the output of dmesg and /proc/interrupts with the boot option of "acpi.debug_layer=0x04 acpi.debug_level=0x08000010".
   Thanks.
Comment 41 Dennis 2008-01-30 22:27:32 UTC
Yes, i patched and recompiled the kernel but the bug was still present and acting in the same way. 
I couldn't boot the system with "acpi.debug_layer=0x04 acpi.debug_level=0x08000010" because of the heavy load of printing the debug messages, i left it some time but the screen just kept showing the same loop from comment 26.
Comment 42 ykzhao 2008-01-30 23:04:31 UTC
Thanks for the test. 
   It seems that the patch in the comment #34 can't solve the problem. I will investigate the GPE 0x1D issue. 
Thanks.
Comment 43 ykzhao 2008-03-11 02:46:33 UTC
Hi, Dennis
    The problem is caused by acpi interrupt floods, which is related with GPE 0x1D. 
    At the same time GPE 0x1D is shared by three ACPI device(SLEEP button, PS2K, PS2M). But from the log it seems that PS2M PNP device isn't registered and OS can't register the mouse device. 
   > PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
   
   Will you please try the boot option of "i8042.nopnp" and see whether the bug still exists? 
   thanks.
Comment 44 Dennis 2008-03-11 09:56:48 UTC
Hi! 
I tried i8042.nopnp but the bug still exists... 
    dmesg | grep PNP
    [   33.978863] i8042: PNP detection disabled
I also tried actually plug in a PS/2 mouse but still. 
Comment 45 ykzhao 2008-03-12 00:03:50 UTC
Thanks for the test.
It seems that the boot option of "i8042.nopnp" can't resolve the bug.

Will you please check whether there is the BIOS option that enables/disables the wakeup function of PS/2 KBD&Mouse?

Thanks.
Comment 46 ykzhao 2008-03-12 02:27:46 UTC
Created attachment 15223 [details]
try the debug patch

Will you please try the debug patch and see whether the problem still exists?
Comment 47 ykzhao 2008-03-12 02:40:44 UTC
Hi, Dennis
    It seems that this bug is related with the shared 0x1D GPE. It is a wakeup GPE and shared by three device : Sleep button, PS2K, PS2M. And GPE 0x1D will be configured as Run_Wake(Runtime wakeup) gpe in button driver. 
    According to ACPI spec the wake events should not be intermixed with non-wake (runtime) events on the same GPE input. In this desktop the wake events are intermixed with the non-wake(runtime) events on the same 0x1D gpe input.
   
    Please try the debug patch in comment #46. In this patch the GPE 0x1D is only configured as the wakeup gpe. 

    Thanks.

    
Comment 48 Dennis 2008-03-12 11:08:19 UTC
Searched the BIOS but couldn't find something related.
I compiled the patch with kernel 2.6.24 and the bug was solved!
Kacpid acts normally, theres no loop in dmesg after increasing the debug layer & level and /proc/interrupts doesn't show large numbers. I took dmesg before and after as before should i upload them?

Thanks for your efforts so far in this bug.
Comment 49 ykzhao 2008-03-13 22:17:36 UTC
Created attachment 15258 [details]
try the debug patch

Will you please try the debug patch and see whether the problem still exists?

In this patch OSPM will call _PSW method to disable the device's ability to wake up the system. 
Thanks.
Comment 50 Christopher Daley 2008-03-17 07:49:33 UTC
I've compiled with the patch in comment #49 and it fixed the bug for me!  Thank you all.  I'm using kernel 2.6.24.3.
Comment 51 ykzhao 2008-03-17 17:45:37 UTC
Thanks for the test.
It seems that the patch in comment #49 can fix the bug. I will try to send it to the upstream kernel.
At the same time the bug will be marked as the duplicated of bug 10224.

*** This bug has been marked as a duplicate of bug 10224 ***
Comment 52 Dennis 2008-08-14 10:40:16 UTC
hello, I origialy reported bug 9781 due to kacpid eating 80% of cpu. The patch was made and it was included in the upstream kernel. I've using the 2.6.25.4 kernel wich didn't show the bug and had been working well; however, today i realized kacpid was once again consuming a lot of cpu, less than before however (44%). 

# uname -a
Linux zenwalk 2.6.25.4 #1 SMP PREEMPT Fri May 16 14:10:46 CEST 2008 i686 Intel(R) Core(TM)2 Duo CPU     E4400  @ 2.00GHz GenuineIntel GNU/Linux

I'm pretty sure it was working ok before.

iow --addr 0x52c --width 16 --value 0

that still fixes the problem. 

Am I using the correct kernel version? I've been on this system for months without kacpid.
Comment 53 ykzhao 2008-08-14 18:07:19 UTC
The bug is fixed by the following commit:
   > commit 729b2bdbfa19dd9be98dbd49caf2773b3271cc24
   > Author: Zhao Yakui <yakui.zhao@intel.com>
   > Date:   Wed Mar 19 13:26:54 2008 +0800
   > ACPI : Disable the device's ability to wake the sleeping system in the boot phase
   But unfortunately this patch is not in the stable 2.6.25.4 kernel. Maybe you can apply the patch manually on the 2.6.25.4 kernel or try the latest stable kernel. 
   thanks.