Bug 11976

Summary: 100% kacpid cpu usage unless BIOS fan throttling disabled - D945GCLF/D945GCLF2
Product: ACPI
Reporter: Marcel Greter (marcel.greter)
Component: BIOS
Assignee: Zhang Rui (rui.zhang)
Status: REJECTED WILL_NOT_FIX
Severity: high
CC: acpi-bugzilla, jirka, jyro215, marcel.greter
Priority: P1
Hardware: All
OS: Linux
Kernel Version: Gentoo 2.6.25-hardened-r9
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: acpidump with throtteling enabled
dmesg with throtteling enabled
interrupts/debug info, throttling enabled
Disable the global SMI on the Intel ICH chipset
try the custom DSDT in which the I/O address for SMbus controller is changed from 0x2000 to 0x3000

Description Marcel Greter 2008-11-07 14:50:25 UTC
Failing kernel version:
vmlinuz-2.6.25-hardened-r9

Distribution:
Gentoo AMD64 2008.0

Hardware Environment:
D945GCLF2

Software Environment:
GCC x86_64-pc-linux-gnu-3.4.6
sys-devel/binutils-2.18-r3
sys-libs/glibc-2.6.1

Problem Description:
When automatic fan throttling is enabled in the BIOS, kacpid hogs the CPU. On the dual-core version, it first takes one CPU, then the other. A soft shutdown is also no longer possible.

There are two threads on the net that explain the problem:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/254326
http://forums.gentoo.org/viewtopic-p-5274191.html#5274191

Steps to reproduce:
Use D945GCLF/D945GCLF2 hardware and enable automatic throttling in the BIOS.
Then put some load on the system (for example, install Gentoo) and kacpid will go into an "endless loop".

Solution:
Disable automatic throttling in the BIOS

I had the same problem with the Gentoo i686 and amd64 minimal install CDs. As far as I know, those boot CDs use the normal gentoo-sources kernel, so it is not a "hardened" problem. It also seems that D945GCLF boards are affected (on that board you could disable ACPI completely, but that doesn't make sense on the dual-core D945GCLF2).

This is my first kernel bug report, so I hope I did everything to ensure that this is a valid bug report. Not sure if this is fixable on the kernel side, but you may code an exception for those boards in the worst case.
Comment 1 Len Brown 2008-11-07 22:43:14 UTC
Please re-open when you can reproduce the issue using a recent
kernel.org kernel, e.g. 2.6.27.stable
Comment 2 Marcel Greter 2008-11-08 12:01:02 UTC
Compiled vanilla kernel 2.6.27.5; same result :(

# uname -r
2.6.27.5

# dmesg
BUG: soft lockup - CPU#0 stuck for 61s! [kacpid:98]
Modules linked in: snd_hda_intel snd_pcm snd_timer 8139too mii snd snd_page_alloc r8169
CPU 0:
Modules linked in: snd_hda_intel snd_pcm snd_timer 8139too mii snd snd_page_alloc r8169
Pid: 98, comm: kacpid Not tainted 2.6.27.5 #1
RIP: 0010:[<ffffffff8027f443>]  [<ffffffff8027f443>] kmem_cache_alloc+0x80/0xab
RSP: 0000:ffff88003e801d20  EFLAGS: 00000202
RAX: ffff880028c973a8 RBX: 00000000000080d0 RCX: 0000000000000000
RDX: ffff880028c973a8 RSI: 00000000000080d0 RDI: ffff88003f1c7b00
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880028d012d0
R10: ffff88003d8e8568 R11: ffff88003d8e8400 R12: 0000000000000000
R13: 0000000000000202 R14: 0000000000000001 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff80725a00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000047695c CR3: 0000000023c98000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [<ffffffff8037faf4>] acpi_ps_alloc_op+0x73/0x92
 [<ffffffff8037f17b>] acpi_ps_parse_loop+0x25b/0x854
 [<ffffffff8037eafd>] acpi_ps_parse_aml+0x77/0x29e
 [<ffffffff8037fd52>] acpi_ps_execute_method+0x124/0x1c6
 [<ffffffff8037caa5>] acpi_ns_evaluate+0x14d/0x1a4
 [<ffffffff803742b8>] acpi_ev_asynch_execute_gpe_method+0xbf/0x112
 [<ffffffff8036d25e>] acpi_os_execute_deferred+0x0/0x2c
 [<ffffffff8036d281>] acpi_os_execute_deferred+0x23/0x2c
 [<ffffffff8023f03c>] run_workqueue+0x7e/0xfa
 [<ffffffff8023f0b8>] worker_thread+0x0/0xec
 [<ffffffff8023f19a>] worker_thread+0xe2/0xec
 [<ffffffff80242169>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80242169>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80241bb0>] kthread+0x3d/0x63
 [<ffffffff8020c1f9>] child_rip+0xa/0x11
 [<ffffffff80241b73>] kthread+0x0/0x63
 [<ffffffff8020c1ef>] child_rip+0x0/0x11
Comment 3 ykzhao 2008-11-09 06:29:49 UTC
Will you please attach the output of acpidump and dmesg with automatic fan throttling enabled in the BIOS?
Thanks.
Comment 4 Marcel Greter 2008-11-09 10:09:46 UTC
Created attachment 18751 [details]
acpidump with throtteling enabled
Comment 5 Marcel Greter 2008-11-09 10:10:36 UTC
Created attachment 18752 [details]
dmesg with throtteling enabled
Comment 6 Marcel Greter 2008-11-09 10:16:40 UTC
There is no difference (in acpidump/dmesg) whether throttling is enabled or not, so I'm not sure if this is of any help. Sometimes the boot process doesn't get past "floppy0: no floppy controllers found". If you look at the dmesg, you see that a few lines above it the kernel initialises some ACPI stuff (but that's just a wild guess).

Just tell me when you need more info!
Comment 7 ykzhao 2008-11-09 22:51:52 UTC
Hi, Marcel
    From the acpidump we can see that GPE 0x1D is shared by several ACPI devices (SLPB, UAR2, UAR1).
    As SLPB is a run-time wakeup GPE, it will be enabled. If GPE 0x1D is triggered, the _L1D ACPI object is evaluated, and it calls the WAKE object. The following is the definition of the WAKE object.
    > Method (WAKE, 0, NotSerialized)
    > {
    >     If (And (PSTS, 0x01))
    >     {
    >         If (LAnd (And (PST1, 0x04, Local0), And (PEN1, 0x04, Local1)))
    >         {
    >             Store (Local0, PST1)
    >             Notify (\_SB.PCI0.LPC.UAR1, 0x02)
    >         }
    >         Store (PST1, PST1)
    >         Store (PSTS, PSTS)
    >     }
    At the same time we know that bits 2/3 of PEN1 are cleared when the _PSW object is called to disable the wake capability of the UAR1/UAR2 devices.
    > Method (_PSW, 1, NotSerialized)
    > {
    >     If (LEqual (Arg0, Zero))
    >     {
    >         And (PEN1, 0xFB, PEN1)
    >     }
    > }
    So the WAKE object does nothing except the following:
    > Store (PST1, PST1)
    > Store (PSTS, PSTS)
    In that case the trigger source of GPE 0x1D can't be cleared, and GPE 0x1D will be triggered indefinitely. So kacpid CPU usage will be almost 100%.

    But it is interesting that there is no problem if automatic fan throttling is disabled in the BIOS.
    Maybe this is caused by the BIOS and can't be fixed by the Linux kernel.
    Thanks.
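
The branch analysis above can be checked with a small Python sketch. It mirrors only the control flow of the quoted WAKE and _PSW excerpts; the function names are hypothetical, the register values are made-up inputs, and the hardware side effects of the Store operations (for example, whether a write clears a status bit) are deliberately not modeled.

```python
def psw_disable(pen1: int) -> int:
    """Mirror of `And (PEN1, 0xFB, PEN1)`: clear bit 2 of PEN1."""
    return pen1 & 0xFB


def wake_actions(psts: int, pst1: int, pen1: int) -> list:
    """Return the ASL statements WAKE would execute for the given values.

    Only the branch structure of the quoted excerpt is reproduced;
    register side effects are NOT modeled (assumption of this sketch).
    """
    actions = []
    if psts & 0x01:
        local0 = pst1 & 0x04  # And (PST1, 0x04, Local0)
        local1 = pen1 & 0x04  # And (PEN1, 0x04, Local1)
        if local0 and local1:
            actions.append("Store (Local0, PST1)")
            actions.append("Notify (UAR1, 0x02)")
        actions.append("Store (PST1, PST1)")
        actions.append("Store (PSTS, PSTS)")
    return actions


if __name__ == "__main__":
    # Hypothetical starting value: all enable bits set, then wake disabled.
    pen1 = psw_disable(0xFF)
    for statement in wake_actions(psts=0x01, pst1=0x04, pen1=pen1):
        print(statement)
```

With bit 2 of PEN1 cleared, the inner If is skipped and only the two trailing Store statements remain, which matches the description above.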
    
Comment 8 Len Brown 2008-11-11 21:22:25 UTC
Please verify that the board has the latest BIOS.
In BIOS SETUP, if you choose to set all the settings
to default, what is this feature set to?

Please paste the output from
grep acpi /proc/interrupts
grep . /sys/firmware/acpi/interrupts/*

I'm not sure how Zhao-Yakui concluded we're screaming
in _L1D, but the above will confirm it.

Yakui,
in addition to the above... _L1D also evaluates
Store (0x01, ILED)
which I would hope turns out to be something to turn off
the interrupt source.
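
The counters requested above can also be ranked with a short script, which makes a "screaming" GPE easy to spot. This is a minimal sketch, assuming each file under /sys/firmware/acpi/interrupts/ starts with an integer event count followed by status words; the helper names are hypothetical.

```python
from pathlib import Path


def parse_counter(text: str) -> int:
    """Return the leading event count from one sysfs counter file."""
    fields = text.split()
    return int(fields[0]) if fields else 0


def busiest_counters(directory="/sys/firmware/acpi/interrupts", top=5):
    """Return the `top` (name, count) pairs with the highest counts."""
    root = Path(directory)
    if not root.is_dir():
        return []  # e.g. ACPI disabled or sysfs not mounted
    counts = {}
    for entry in root.iterdir():
        try:
            counts[entry.name] = parse_counter(entry.read_text())
        except (OSError, ValueError):
            continue  # unreadable or unexpected format: skip the file
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    for name, count in busiest_counters():
        print(f"{name:12s} {count}")
```

Running it twice a few seconds apart and comparing the counts shows which GPE is still firing.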
Comment 9 Marcel Greter 2008-11-11 23:49:41 UTC
Created attachment 18814 [details]
interrupts/debug info, throttling enabled
Comment 10 Marcel Greter 2008-11-12 00:35:40 UTC
Shipped BIOS: LF94510J.86A.0099.2008.0731.0303
Current BIOS: LF94510J.86A.0103.2008.0814.1910

The upgrade didn't help. Still kacpid problems.
The debug above (id=18814) is from the old BIOS version.

The good news is that the feature is disabled by default.
Comment 11 ykzhao 2008-11-13 01:24:10 UTC
Hi, Marcel
    Thanks for the test. It seems that the issue can't be resolved by upgrading the BIOS. The problem disappears only when the feature is disabled in the BIOS.
    Will you please confirm whether Windows works well on this box when the feature is enabled in the BIOS? (Please check whether the CPU usage is very high.)
    Thanks.
    
Comment 12 Jiri Pirko 2008-11-22 13:25:48 UTC
I'm testing a D945GCLF2. I had the described trouble (100% kacpid CPU usage) with the Debian testing kernel 2.6.26-1. Now I'm running 2.6.28-rc6 and I no longer experience this issue. It seems to be solved.
Comment 13 Zhang Rui 2008-11-23 18:58:34 UTC
marcel, can you please try 2.6.28-rc6 to see if the problem has been fixed?
Comment 14 Jori Hardman 2008-12-02 15:13:51 UTC
I am experiencing a similar problem after ArchLinux recently upgraded its kernel to 2.6.27.  Mine is slightly different in that kacpid uses 20-50% of the cpu instead of all of it.  I installed the 2.6.28-rc6 kernel to see if the problem was fixed, but kacpid still takes over after running anything cpu intensive.  The problem occurs on my laptop, which is a Compal ifl90 with intel core2duo.
Comment 15 Zhang Rui 2008-12-02 16:54:51 UTC
Jori,
the problem only happens if the BIOS "auto fan throttling" option is enabled.
Please check whether you really have the same problem.
If it's not the same bug, please open a new bug report and attach the acpidump and the output of "grep . /sys/firmware/acpi/interrupts/*"
Comment 16 ykzhao 2008-12-03 01:21:19 UTC
Created attachment 19122 [details]
Disable the global SMI on the Intel ICH chipset

Will you please try the debug patch on the latest kernel and see whether the issue still exists? (Of course, fan throttling should be enabled in the BIOS.)
    In the debug patch the SMI is disabled.
    Thanks.
Comment 17 ykzhao 2008-12-03 01:36:49 UTC
Will you please attach the output of lspci -vxxx?
    From the dmesg in comment #5 it seems that the I/O address for the SMBus controller is 0x3000-0x301f.
    But from the acpidump it seems that the I/O address for the SMBus controller is
0x2000-0x2016, and the AML code accesses the SMBus controller.
    The two do not match, which is incorrect.
    
    
Comment 18 ykzhao 2008-12-03 01:39:58 UTC
Created attachment 19123 [details]
try the custom DSDT in which the I/O address for SMbus controller is changed from 0x2000 to 0x3000

Could someone try the custom DSDT and see whether the problem still exists? Of course, fan throttling should be enabled in the BIOS.
   In the custom DSDT the I/O address is changed from 0x2000 to 0x3000.
   Thanks.
Comment 19 ykzhao 2008-12-03 01:40:58 UTC
Instructions for using the custom DSDT can be found at
   http://www.lesswatts.org/projects/acpi/faq.php

Thanks.
Comment 20 Marcel Greter 2008-12-03 13:35:42 UTC
Sorry, I have been busy the last few weeks.
I'm currently running vanilla-sources-2.6.28_rc7

# uname -r
2.6.28-rc7

Unfortunately I still have kacpid problems.
I'm now gonna look into the custom DSDT.
Comment 21 Marcel Greter 2008-12-03 14:25:16 UTC
The custom DSDT seems to fix the problem. I'm not yet 100% certain, as the system has only been running for a few minutes so far (but normally I can trigger the problem by then). I still see kacpid in "top" from time to time, but it no longer hogs the CPU.

# uname -r
2.6.28-rc7

As a side note, you need sys-power/iasl on Gentoo. Download the patch above and save it as DSDT.dsl. Then simply follow the FAQ from step 4.

Can this change be implemented in the default kernel, or do we need to patch each kernel version ourselves? You may change the status at your discretion.
Comment 22 ykzhao 2008-12-03 17:14:00 UTC
Thanks for the quick response.
    The test result confirms that this issue is related to the BIOS. The box has an SMBus controller, but the I/O address defined in the PCI config space differs from the one in the AML code. The SMBus I/O address obtained from the PCI config space is 0x3000-0x301f. In the AML code the I/O address is 0x2000-0x201f.

   When fan throttling is enabled in the BIOS, GPE0 is triggered and the _L00 object accesses the SMBus I/O port. As the SMBus I/O port defined in the AML code is incorrect, it can't be accessed correctly. If fan throttling is disabled, maybe GPE0 won't be triggered and the SMBus I/O port won't be accessed.

   Based on the above analysis and test result, it is confirmed that this issue is related to the BIOS, and it is best fixed by a BIOS upgrade.
   Thanks.
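
The mismatch described above can be illustrated with a small Python sketch that checks whether the base address used by the AML code falls inside the I/O range the kernel actually assigned. It assumes /proc/ioports-style text ("start-end : label", hexadecimal without a 0x prefix); the sample text and the PCI address 0000:00:1f.3 used as the SMBus label are assumptions for illustration, and only the 0x3000-0x301f and 0x2000 values come from this report.

```python
# Hypothetical excerpt in /proc/ioports format; only the 3000-301f
# range is taken from the report, the labels are assumptions.
SAMPLE = """\
0000-0cf7 : PCI Bus 0000:00
  3000-301f : 0000:00:1f.3
"""


def parse_ioports(text):
    """Parse /proc/ioports-style lines into (start, end, label) tuples."""
    ranges = []
    for line in text.splitlines():
        span, sep, label = line.partition(" : ")
        if not sep:
            continue  # not a "range : label" line
        start, dash, end = span.strip().partition("-")
        if not dash:
            continue
        ranges.append((int(start, 16), int(end, 16), label.strip()))
    return ranges


def aml_base_matches(ranges, label_substring, aml_base):
    """True if aml_base lies inside a range whose label contains label_substring."""
    return any(
        start <= aml_base <= end
        for start, end, label in ranges
        if label_substring in label
    )


if __name__ == "__main__":
    ranges = parse_ioports(SAMPLE)
    for base in (0x2000, 0x3000):
        print(hex(base), aml_base_matches(ranges, "1f.3", base))
```

In this sample the 0x2000 base from the original DSDT lies outside the assigned range, while the 0x3000 base used in the custom DSDT lies inside it.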
   
Comment 23 Zhang Rui 2008-12-03 17:33:42 UTC
Yes, this is a BIOS bug that we cannot fix in Linux/ACPI.
You need to either upgrade your BIOS to see if it has been fixed, or use the customized DSDT with each version of your kernel.
Closing this bug.
Marcel, please re-open it if you can reproduce this problem with the customized DSDT.