Bug 13013

Summary: kacpid 100% cpu utilization
Product: ACPI Reporter: Ales Seifert (seifert)
Component: ACPICA-Core Assignee: Zhang Rui (rui.zhang)
Status: REJECTED INSUFFICIENT_DATA    
Severity: high CC: dzhonw, rui.zhang, yakui.zhao
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.30-24 Subsystem:
Regression: No Bisected commit-id:
Attachments: lspci -vxxx output
dmesg output
grep . /sys/firmware/acpi/interrupts/* output
acpidump output
l /proc/acpi/thermal_zone/*/* output
lspci -vxxx output
cat /proc/acpi/thermal_zone/*/*

Description Ales Seifert 2009-04-05 14:48:23 UTC
Created attachment 20817 [details]
lspci -vxxx output

2.6.29-59-default #1 SMP Sun Apr 5 12:34:54 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux - openSUSE 11.1

HP EliteBook 8730w


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   34 root      15  -5     0    0    0 R  101  0.0 196:35.70 kacpid

It is the same with or without X running.

Booting with acpi=off helps, but since I need all cores and power management, it is not a solution.

Reproducible always.
Comment 1 Ales Seifert 2009-04-05 14:49:07 UTC
Created attachment 20818 [details]
dmesg output
Comment 2 Ales Seifert 2009-04-05 14:53:41 UTC
Created attachment 20819 [details]
grep . /sys/firmware/acpi/interrupts/* output
Comment 3 Ales Seifert 2009-04-05 14:55:43 UTC
Created attachment 20820 [details]
acpidump output
Comment 4 ykzhao 2009-04-07 02:32:24 UTC
Will you please attach the output of /proc/acpi/thermal_zone/*/* when this issue happens?
    Will you please confirm whether the issue happens after the box is booted or when overheating?
    Thanks.
Comment 5 Zhang Rui 2009-04-07 03:27:27 UTC
Is this a regression? I mean, did any earlier kernel release work for you?
Comment 6 Ales Seifert 2009-04-07 05:00:54 UTC
Created attachment 20848 [details]
l /proc/acpi/thermal_zone/*/* output
Comment 7 Ales Seifert 2009-04-07 05:04:30 UTC
This issue always happens from the moment the notebook is booted.

Yes, this is probably a regression; 2.6.27.19 was the previous kernel I used, and it worked fine.
Comment 8 Ales Seifert 2009-04-07 05:20:49 UTC
Created attachment 20849 [details]
lspci -vxxx output
Comment 9 ykzhao 2009-04-08 05:56:36 UTC
Hi, Ales
    Sorry for my mistake. Please attach the output of "cat /proc/acpi/thermal_zone/*/*".
    From the description it seems that this is a regression. Will you please use git-bisect to identify the commit which causes the regression?
    Thanks.
Comment 10 Ales Seifert 2009-04-08 06:48:13 UTC
Created attachment 20877 [details]
cat /proc/acpi/thermal_zone/*/*
Comment 11 Ales Seifert 2009-04-09 07:31:48 UTC
I tested 2.6.27.19 again today and found it has the same bug; I just didn't notice before. So it no longer seems to be a regression. Is there anything I can provide to help you resolve the bug?
Comment 12 ykzhao 2009-04-10 15:27:18 UTC
Hi, Ales
    Thanks for the info.
    Does this issue also happen on earlier kernels, for example 2.6.26.xx or 2.6.17?
    And from the info in comment #10 it seems that the temperature is far lower than the threshold.
    Thanks.
Comment 13 Ales Seifert 2009-04-12 09:44:14 UTC
Will try earlier kernels as soon as I get them downloaded... :)

In the meantime I found that sometimes, when my notebook wakes up from hibernation, the problem goes away. Sometimes it comes back after the next hibernation cycle.

It definitely has nothing to do with the temperature.
Comment 14 Zhang Rui 2009-05-05 02:21:42 UTC
ping Ales.
Comment 15 Ales Seifert 2009-05-05 06:04:40 UTC
(In reply to comment #14)
> ping Ales.

Sorry for the delay. I have tried 2.6.25.5, with the same problem. Older kernels (tried 2.6.22, 2.6.19, 2.6.17) don't even boot correctly for me because of some disk partition problems, so I cannot test them.

Just to confirm comment #13, hibernation helps in 99% of cases. After hibernation I can work normally without any kacpid utilization until the next restart.
Comment 16 Ales Seifert 2009-05-11 05:57:40 UTC
Some more information: after wakeup from sleep (not hibernation) the problem comes back every time. After hibernation I can work normally without any kacpid utilization until the next restart or sleep.

Any idea what I could do to analyze where the problem could be?
Comment 17 Ales Seifert 2009-05-20 08:19:31 UTC
Hi, I tried to enable debugging and here is some output:

exregion-0290 [00] ex_system_io_space_han: System-IO (width 8)
R/W 0 Address=000000000000EF80

This is repeated endlessly, many times per second.
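To quantify how often each I/O address shows up in a captured debug log, the lines can be tallied. This is a minimal sketch that assumes the log keeps the `Address=<hex>` layout of the excerpt above; the function name `count_io_addresses` and the repeated sample text are illustrative only.

```python
import re
from collections import Counter

# Matches the "Address=<hex>" field as it appears in the debug excerpt.
ADDRESS_RE = re.compile(r"Address=([0-9A-Fa-f]+)")


def count_io_addresses(log_text):
    """Return a Counter mapping each address (as an int) to its hit count."""
    return Counter(int(m.group(1), 16) for m in ADDRESS_RE.finditer(log_text))


# Illustrative input: the two-line record from the excerpt, repeated 3 times.
sample = (
    "exregion-0290 [00] ex_system_io_space_han: System-IO (width 8)\n"
    "R/W 0 Address=000000000000EF80\n"
) * 3

for address, hits in count_io_addresses(sample).most_common():
    print(f"0x{address:04X}: {hits} accesses")
```

A single address dominating the tally would suggest one handler polling the same port over and over.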
Comment 18 Ales Seifert 2009-05-20 08:43:32 UTC
(In reply to comment #14)
> ping Ales.

ping  Zhang Rui  :)
Comment 19 Ales Seifert 2009-05-20 09:42:06 UTC
Some more info: since I updated to 2.6.29-163, hibernation doesn't help anymore; kacpid has 100% CPU utilization after resume.

The kernel boot parameter acpi=ht makes the system usable, but with limited ACPI functionality.
Comment 20 Zhang Rui 2009-05-21 01:57:52 UTC
Well, this bug looks like a duplicate of bug 13268 to me.
Please double-check.
Comment 21 Ales Seifert 2009-05-21 05:13:12 UTC
Hi, thanks for pointing me to that bug. The solution provided there works for me as well!

"echo disable > /sys/firmware/acpi/interrupts/gpe00"

But a few symptoms are different:

1. Temperature of my system is low
2. Problem occurs just after the boot even in single user mode

I will try to test it under Windows and will let you know if the same interrupt storm is there too.
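To see which GPE is storming before disabling anything, the counters from `grep . /sys/firmware/acpi/interrupts/*` can be ranked. This is a minimal sketch that assumes each line looks like `<path>: <count> <status>`; the sample values and the helper names are hypothetical, not taken from the attachments.

```python
def parse_interrupt_counters(grep_output):
    """Parse `grep . /sys/firmware/acpi/interrupts/*` output into {name: count}."""
    counters = {}
    for line in grep_output.splitlines():
        path, sep, value = line.partition(":")
        fields = value.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # skip lines that do not match the assumed layout
        counters[path.rsplit("/", 1)[-1]] = int(fields[0])
    return counters


def busiest_gpes(counters, top=3):
    """Return up to `top` individual GPE counters, highest count first."""
    gpes = [(name, count) for name, count in counters.items()
            if name.startswith("gpe") and name != "gpe_all"]
    return sorted(gpes, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical sample; real counts come from the sysfs files.
sample = """\
/sys/firmware/acpi/interrupts/gpe00: 1234567   enabled
/sys/firmware/acpi/interrupts/gpe01:       0   invalid
/sys/firmware/acpi/interrupts/sci: 1234570
"""

print(busiest_gpes(parse_interrupt_counters(sample)))
```

A GPE whose count is close to the total `sci` count is the likely source of the storm.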
Comment 22 Ales Seifert 2009-05-21 05:21:24 UTC
One more question...

Is disabling this GPE harmless? Its _L00 method is not empty, unlike in bug 13268.

        Method (_L00, 0, NotSerialized)
        {
            \_TZ.THEV ()
        }

        Method (THEV, 0, Serialized)
        {
            Store (\_SB.PCI0.LPCB.SMAB (0x19, 0x00, 0x00), Local0)
            Store (0x00, Local1)
            Store (0x38, Local3)
            Store (\_SB.PCI0.LPCB.SMAB (0x5D, 0x23, 0x00), Local0)
            If (LEqual (And (Local0, 0xFF00), 0x00))
            {
                If (And (Local0, 0x02))
                {
                    Store (\_SB.PCI0.LPCB.SMAB (0x5D, 0x25, 0x00), Local2)
                    If (LEqual (And (Local2, 0xFF00), 0x00))
                    {
                        If (And (Local2, 0x01))
                        {
                            Or (Local1, 0x20, Local1)
                            And (Local3, Not (0x20), Local3)
                        }

                        If (And (Local2, 0x02))
                        {
                            Or (Local1, 0x08, Local1)
                            And (Local3, Not (0x08), Local3)
                        }

                        If (And (Local2, 0x04))
                        {
                            Or (Local1, 0x10, Local1)
                            And (Local3, Not (0x10), Local3)
                        }
                    }
                }

                If (And (Local0, 0x04))
                {
                    Store (\_SB.PCI0.LPCB.SMAB (0x5D, 0x24, 0x00), Local2)
                    If (LEqual (And (Local2, 0xFF00), 0x00))
                    {
                        If (And (Local2, 0x01))
                        {
                            Or (Local1, 0x20, Local1)
                        }

                        If (And (Local2, 0x02))
                        {
                            Or (Local1, 0x08, Local1)
                        }

                        If (And (Local2, 0x04))
                        {
                            Or (Local1, 0x10, Local1)
                        }
                    }
                }
            }
            Else
            {
                Store (0x38, Local1)
            }

            Acquire (THER, 0xFFFF)
            Or (THSC, Local1, THSC)
            And (WHTR, Not (0x38), Local4)
            Or (Local4, Local3, WHTR)
            Release (THER)
            If (And (Local1, 0x20))
            {
                Notify (LOCZ, 0x80)
            }

            If (And (Local1, 0x08))
            {
                Notify (CPUZ, 0x80)
            }

            If (And (Local1, 0x10))
            {
                Notify (CP2Z, 0x80)
            }
        }
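The bit handling in THEV can be easier to follow as a small model. This is only a sketch: what `\_SB.PCI0.LPCB.SMAB` actually returns is not known from this report, so its three return values are passed in as plain integers (`reg23`, `reg25`, `reg24`, named after the second SMAB argument), and the locked THSC/WHTR updates are left out.

```python
# Bits of Local1 and the thermal zone each one notifies, in the ASL's order.
ZONE_BITS = {0x20: "LOCZ", 0x08: "CPUZ", 0x10: "CP2Z"}


def _zone_mask(status):
    """Translate status bits 0..2 into the 0x20/0x08/0x10 zone bits."""
    mask = 0
    if status & 0x01:
        mask |= 0x20
    if status & 0x02:
        mask |= 0x08
    if status & 0x04:
        mask |= 0x10
    return mask


def thev(reg23, reg25=0, reg24=0):
    """Model THEV; return (Local1, Local3, list of zones notified)."""
    local1, local3 = 0x00, 0x38
    if (reg23 & 0xFF00) == 0:
        if reg23 & 0x02 and (reg25 & 0xFF00) == 0:
            mask = _zone_mask(reg25)
            local1 |= mask
            local3 &= ~mask  # this branch also clears the bits in Local3
        if reg23 & 0x04 and (reg24 & 0xFF00) == 0:
            local1 |= _zone_mask(reg24)
    else:
        local1 = 0x38  # error in the high byte: notify every zone
    zones = [name for bit, name in ZONE_BITS.items() if local1 & bit]
    return local1, local3, zones


print(thev(0x0100))             # high byte set in the first status word
print(thev(0x02, reg25=0x01))   # bit 1 set, then bit 0 of the 0x25 read
```

In this model a nonzero high byte in the first status word makes the method notify all three zones, which is one way the handler could keep the thermal driver busy.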
Comment 23 Zhang Rui 2009-06-18 06:13:26 UTC
No, we can not disable GPE00 in this case.

I guess that:
1. GPE00 fires.
2. \_TZ.THEV sends a notification to the ACPI thermal driver.
3. The ACPI thermal driver receives the notification and re-evaluates the thermal zone temperature.
4. Re-evaluating the temperature triggers another GPE00 interrupt.
So this is an endless loop.
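The suspected loop can be pictured with a toy model. This is purely illustrative and is not kernel code: the queue, the `limit` cap, and the function name are assumptions made for the sketch.

```python
from collections import deque


def handled_events(reeval_raises_gpe, limit=1000):
    """Count GPE00 events handled before the queue drains or `limit` is hit."""
    pending = deque(["GPE00"])   # step 1: one initial GPE00
    handled = 0
    while pending and handled < limit:
        pending.popleft()        # steps 2-3: THEV notifies, driver re-reads
        handled += 1
        if reeval_raises_gpe:    # step 4: the re-read raises GPE00 again
            pending.append("GPE00")
    return handled


print(handled_events(False))  # queue drains after the first event
print(handled_events(True))   # only the cap stops the loop
```

If step 4 holds, the work never drains, which would match kacpid staying at 100% CPU.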

Please clear CONFIG_ACPI_THERMAL or blacklist the ACPI thermal driver and see if it helps.
Comment 24 Ales Seifert 2009-06-18 13:49:59 UTC
I've cleared CONFIG_ACPI_THERMAL (I'm now running 2.6.30-24).
After make and make install I got:
....
Kernel image:   /boot/vmlinuz-2.6.30-24
Initrd image:   /boot/initrd-2.6.30-24 
.
.
.
FATAL: Module thermal not found.
WARNING: no dependencies for kernel module 'thermal' found.
Kernel Modules: scsi_mod libata ahci hwmon thermal_sys processor fan jbd mbcache ext3 edd crc-t10dif sd_mod usbcore ohci-hcd uhci-hcd ehci-hcd hid usbhid
Features:       block usb resume.userspace resume.kernel
Bootsplash:     openSUSE (800x600)
48274 blocks

After reboot the system was OK, but after sleep (to RAM) and resume, kacpid utilization is back again. The only thing that helps is:

"echo disable > /sys/firmware/acpi/interrupts/gpe00"

Any idea? Do you think it is a BIOS bug or an ACPI bug?
Comment 25 Zhang Rui 2009-06-19 03:23:01 UTC
Please open the "/etc/modprobe.d/blacklist" file and add "blacklist thermal" at the end of this file.
Reboot and see if the problem still exists.
Comment 26 Ales Seifert 2009-06-19 07:53:38 UTC
Yes, the problem still exists; the thermal module gets loaded despite

"blacklist thermal" in "/etc/modprobe.d/blacklist"

Unloading it with "modprobe -r thermal" doesn't help.
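Whether the module is really loaded can be checked by exact name, which avoids confusing `thermal` with `thermal_sys`. This is a minimal sketch that assumes the usual `/proc/modules` layout with the module name as the first field; the sample text is hypothetical.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines() if line.strip()}


def is_loaded(name, path="/proc/modules"):
    """Check the running system for an exact module name match."""
    with open(path) as handle:
        return name in loaded_modules(handle.read())


# Hypothetical sample: thermal_sys is present, thermal itself is not.
sample = """\
thermal_sys 20000 2 processor,fan, Live 0xffffffffa0000000
processor 50000 1 - Live 0xffffffffa0010000
"""

print("thermal" in loaded_modules(sample))
print("thermal_sys" in loaded_modules(sample))
```

On the affected machine, `is_loaded("thermal")` would be called after each reboot or resume.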
Comment 27 Ales Seifert 2009-06-19 08:08:42 UTC
I edited /etc/sysconfig/kernel and removed "thermal" from the initrd, so thermal is no longer loaded after reboot. The system is OK, without kacpid utilization... I will try sleep and resume to see if it stays that way...
Comment 28 Ales Seifert 2009-06-19 08:12:25 UTC
After sleep and resume, kacpid utilization is back, and

"echo disable > /sys/firmware/acpi/interrupts/gpe00"

no longer seems to help...

"thermal" is still not loaded.
Comment 29 Ales Seifert 2009-06-19 08:14:45 UTC
"echo disable > /sys/firmware/acpi/interrupts/gpe00" works when out from dock... it is really alchemy
Comment 30 Zhang Rui 2009-06-22 08:48:34 UTC
This doesn't seem like a software problem to me.
It would be great if you could verify whether the problem still exists in Windows.
Comment 31 Jonathan 2009-07-19 21:16:10 UTC
This problem affects me when the lid is closed, using either a home-compiled kernel or Debian stock kernels. I filed it with Debian, without response: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=536240

Trying "echo disable > /sys/firmware/acpi/interrupts/gpe00" returns an error (echo: write error: invalid argument).

Getting rid of the thermal module doesn't change the behaviour.

Linux terek 2.6.30.1-nimloth #2 SMP Thu Jul 9 14:36:36 KGT 2009 i686 GNU/Linux
Comment 32 Zhang Rui 2009-07-20 01:49:09 UTC
Jonathan,
This is a different problem.
Please open a new bug and attach:
1. acpidump output
2. output of "grep . /sys/firmware/acpi/interrupts/*" both before and after closing the lid.
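The two snapshots requested above can be compared mechanically. This is a minimal sketch that assumes each line looks like `<path>: <count> <status>`; the helper names and sample values are hypothetical.

```python
def read_counts(grep_output):
    """Parse `grep . /sys/firmware/acpi/interrupts/*` output into {name: count}."""
    counts = {}
    for line in grep_output.splitlines():
        path, _, value = line.partition(":")
        fields = value.split()
        if fields and fields[0].isdigit():
            counts[path.rsplit("/", 1)[-1]] = int(fields[0])
    return counts


def counter_deltas(before, after):
    """Return {name: increase} for every counter that grew between snapshots."""
    old, new = read_counts(before), read_counts(after)
    return {name: new[name] - old.get(name, 0)
            for name in new if new[name] > old.get(name, 0)}


# Hypothetical snapshots taken before and after closing the lid.
before = """\
/sys/firmware/acpi/interrupts/gpe00:     100   enabled
/sys/firmware/acpi/interrupts/gpe01:       5   enabled
"""
after = """\
/sys/firmware/acpi/interrupts/gpe00:     100   enabled
/sys/firmware/acpi/interrupts/gpe01:   90005   enabled
"""

print(counter_deltas(before, after))
```

Only counters that grew are reported, so the GPE tied to the lid event stands out.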
Comment 33 Jonathan 2009-07-20 07:45:38 UTC
New bug posted at:
http://bugzilla.kernel.org/show_bug.cgi?id=13802

I wasn't sure what category of ACPI it belonged in, so that might have to be changed.
Comment 34 Zhang Rui 2009-08-12 06:21:44 UTC
Ales,
Can you please verify whether the problem still exists in Windows?
This doesn't look like a Linux kernel bug to me.
Comment 35 Zhang Rui 2009-09-03 06:55:17 UTC
No response from the bug reporter for more than two months.
Closing.