Bug 12170 (Jori) - acpid takes 20-50% of cpu - Compal IFL90
Summary: acpid takes 20-50% of cpu - Compal IFL90
Status: CLOSED UNREPRODUCIBLE
Alias: Jori
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P1 high
Assignee: Alexey Starikovskiy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-06 10:33 UTC by Jori Hardman
Modified: 2009-08-13 03:03 UTC

See Also:
Kernel Version: 2.6.28
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
My acpid.log (471.48 KB, text/plain)
2008-12-06 10:34 UTC, Jori Hardman
Output of grep . /sys/firmware/acpi/interrupts/* (2.07 KB, text/plain)
2008-12-06 10:34 UTC, Jori Hardman
git bisect log output (1.16 KB, text/plain)
2009-01-07 11:47 UTC, Jori Hardman
dmesg output with '#define DEBUG' uncommented in ec.c (458.07 KB, text/plain)
2009-01-08 14:39 UTC, Jori Hardman
separate MSI delays (1.54 KB, patch)
2009-03-29 07:39 UTC, Alexey Starikovskiy

Description Jori Hardman 2008-12-06 10:33:20 UTC
Latest working kernel version: 2.6.25-9
Earliest failing kernel version: 2.6.27
Distribution: Arch
Hardware Environment: Compal IFL90

After a recent Arch kernel upgrade to 2.6.27, kacpid would take 20-50% of my CPU cycles once the computer was put under load.  I upgraded my kernel to 2.6.28-rc6 to see if the problem was still present, and it is.  Downgrading the kernel back to 2.6.25-9 has solved the problem for now, and everything works normally.  I will attach my acpid.log and also the output of grep . /sys/firmware/acpi/interrupts/*.
Comment 1 Jori Hardman 2008-12-06 10:34:05 UTC
Created attachment 19185
My acpid.log
Comment 2 Jori Hardman 2008-12-06 10:34:31 UTC
Created attachment 19186
Output of grep . /sys/firmware/acpi/interrupts/*
Comment 3 ykzhao 2008-12-07 05:40:18 UTC
From the info in /sys/firmware/acpi/interrupts/*, it seems that GPE 1C is being triggered at a very high frequency.
    Will you please attach the output of acpidump and lspci -vxxx?
    As this is a regression, will you please use git-bisect to identify which commit causes it?
    Thanks.
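
For anyone repeating this check, here is a minimal Python sketch of how the busiest GPE can be picked out of the output of grep . /sys/firmware/acpi/interrupts/*. It assumes each line has the form "path: count status"; the helper names and the sample numbers are made up for illustration and are not taken from the attachment.

```python
def parse_counters(text):
    """Map counter name -> count for lines shaped like 'path: count [status]'."""
    counters = {}
    for line in text.splitlines():
        path, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that does not look like "<path>: <integer> ..."
        if not sep or not fields or not fields[0].isdigit():
            continue
        name = path.rsplit("/", 1)[-1]
        counters[name] = int(fields[0])
    return counters


def busiest_gpes(counters, top=3):
    """Return the individual GPE counters with the highest counts."""
    individual = {
        name: count
        for name, count in counters.items()
        if name.startswith("gpe") and name != "gpe_all"  # drop the summary counter
    }
    return sorted(individual.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Invented sample data, only to show the expected layout.
SAMPLE = """\
/sys/firmware/acpi/interrupts/ff_pwr_btn:       0   enabled
/sys/firmware/acpi/interrupts/gpe01:       3   enabled
/sys/firmware/acpi/interrupts/gpe1C:  500000   enabled
/sys/firmware/acpi/interrupts/gpe1D:      12   enabled
/sys/firmware/acpi/interrupts/gpe_all:  500015
/sys/firmware/acpi/interrupts/sci:  500015
"""

if __name__ == "__main__":
    for name, count in busiest_gpes(parse_counters(SAMPLE)):
        print(name, count)
```

Running it twice a few seconds apart on real output and comparing the counts gives a rough interrupt rate for the suspect GPE.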
Comment 4 Len Brown 2008-12-08 19:25:08 UTC
just a wild guess, try bisecting with target drivers/acpi/ec.c
to see if the changes in that file were related to this regression...
Comment 5 Jori Hardman 2008-12-08 19:29:16 UTC
I apologize for my slow action on this.  I'm really busy this week and next week, but after that I should have the time to hunt down the change and provide the info needed for an actual fix.
Comment 6 Jori Hardman 2008-12-19 16:49:45 UTC
I've looked at the problem a little more and found kernel 2.6.27.7 to be the first kernel version with the bug.  I have yet to git bisect to find the commit, but I should be able to do that sometime within the next week.
Comment 7 Zhang Rui 2009-01-04 00:45:11 UTC
ping Jori. :)
Comment 8 Jori Hardman 2009-01-04 21:44:51 UTC
Sorry for my slowness!  I was much busier during the holiday season than I ever expected.  I am back to work bisecting as of now.
Comment 9 Jori Hardman 2009-01-06 11:23:51 UTC
I'm still new to the use of git bisect and I have a couple questions.  When I start bisecting, I use the current git tree as bad.  The bug arose between 2.6.27.6 and 2.6.27.7.  Is there any way to narrow my search to commits between just those two versions?  If not, what should I use as the first good version?
Comment 10 Zhang Rui 2009-01-06 17:46:19 UTC
Hmm, as you are using the stable kernel, you first need to git clone the stable tree.
Then run
git bisect start
git bisect bad v2.6.27.7
git bisect good v2.6.27.6
Comment 11 Jori Hardman 2009-01-07 11:47:08 UTC
Finally found the commit.  Thanks for bearing with me while I figured this out.  It turns out I had cloned the wrong tree.
Comment 12 Jori Hardman 2009-01-07 11:47:52 UTC
Created attachment 19704
git bisect log output
Comment 13 ykzhao 2009-01-07 17:07:07 UTC
Hi, Jori
  thanks for the git-bisect.
  From the output the last bad commit is :
  >commit d09277432f84ae0c8588032518e1ff7842ef5606
  >Author: Alexey Starikovskiy <astarikovskiy@suse.de>
  >Date:   Sun Nov 9 19:01:06 2008 +0300
    ACPI: EC: lower interrupt storm treshold
    
    http://bugzilla.kernel.org/show_bug.cgi?id=11892
   
  thanks for your work

Hi, Rui
   How about assigning this bug to EC category?

Thanks.
Comment 14 Zhang Rui 2009-01-07 17:10:39 UTC
right.
let's see if Alexey has some ideas. :)
Comment 15 Alexey Starikovskiy 2009-01-08 03:02:56 UTC
Guys, why is gpe_all != sci (comment #2)?
Jori, please enable '#define DEBUG' and post dmesg. 
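
To make the gpe_all vs. sci comparison concrete, here is a small Python sketch that pulls both counters out of grep . /sys/firmware/acpi/interrupts/* style output and reports the difference. The function names and sample values are hypothetical, and the sketch does not try to say what a mismatch means on this hardware.

```python
def read_counter(text, name):
    """Return the count for one named counter, or None if it is not present."""
    for line in text.splitlines():
        path, sep, rest = line.partition(":")
        fields = rest.split()
        if (sep and fields and fields[0].isdigit()
                and path.rsplit("/", 1)[-1] == name):
            return int(fields[0])
    return None


def sci_gap(text):
    """Return gpe_all - sci, or None when either counter is missing."""
    gpe_all = read_counter(text, "gpe_all")
    sci = read_counter(text, "sci")
    if gpe_all is None or sci is None:
        return None
    return gpe_all - sci


# Invented numbers, only to illustrate the comparison.
SAMPLE = """\
/sys/firmware/acpi/interrupts/gpe1C:  500000   enabled
/sys/firmware/acpi/interrupts/gpe_all:  500040
/sys/firmware/acpi/interrupts/sci:  500000
"""

if __name__ == "__main__":
    print("gpe_all - sci =", sci_gap(SAMPLE))
```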
Comment 16 Jori Hardman 2009-01-08 08:08:11 UTC
Alexey,
Can you please be a little more descriptive about how and where to enable '#define DEBUG'?  This is the first kernel bug I've worked on and I'm unfamiliar with a lot of this stuff.

Thanks.
Comment 17 Alexey Starikovskiy 2009-01-08 12:20:59 UTC
Please open <kernel-source>/drivers/acpi/ec.c in a text editor and uncomment the '#define DEBUG' statement at the beginning of the file. Save and compile the kernel as usual.
Comment 18 Jori Hardman 2009-01-08 14:39:38 UTC
Created attachment 19724
dmesg output with '#define DEBUG' uncommented in ec.c
Comment 19 Jori Hardman 2009-02-07 17:37:53 UTC
The laptop that experiences this problem died a couple of weeks ago, but I finally got it back today.  I compiled the latest kernel from Torvalds' git tree, and I still experience the same problem.  Has there been any progress toward finding a fix?  I'd really like to run a later kernel than 2.6.27.6.
Comment 20 Jori Hardman 2009-02-26 19:46:56 UTC
Is there anything else I can do to help find a fix?
Comment 21 Jori Hardman 2009-03-03 12:43:58 UTC
I have some additional info that may be of some use.  I recently booted the ubuntu jaunty alpha4 livecd to see if the same issue occurs.  Although the kernel used is 2.6.28, the problem isn't present.  Does the ubuntu kernel patchset change anything in ec.c?
Comment 22 Francisco Pina Martins 2009-03-26 09:58:11 UTC
Hello all!
I have the same laptop model and I'm thinking of switching from Ubuntu to arch, so I'm kind of concerned about this bug.
I'm currently running Ubuntu Hardy, but I should be switching to arch pretty soon (as soon as I get enough free time for a full migration).
I've been running Arch (2.6.28-ARCH) on virtualbox-ose and it doesn't seem to be an issue there, but well, it's virtualized...
Anyway, just let me know if there's anything I can do to help.
Comment 23 Alexey Starikovskiy 2009-03-26 13:01:56 UTC
Yes, one thing you could try is to check if changing udelay(...) in drivers/acpi/ec.c to msleep(1) helps. This was changed back and forth several times already, and always there was a reason... Now I start to think that MSI users (who require udelay()) should have different EC driver...
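
As a rough user-space analogy for the trade-off above (not kernel code, and not the actual ec.c change), the Python sketch below compares the CPU time consumed by a busy-wait, which behaves like udelay(), with a sleeping wait, which gives up the CPU the way msleep() does. The function names are made up for illustration.

```python
import time


def busy_wait(duration):
    """Spin until the deadline passes, like a udelay()-style delay."""
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass


def sleeping_wait(duration):
    """Give up the CPU until the time has elapsed, like an msleep()-style delay."""
    time.sleep(duration)


def cpu_cost(wait, duration):
    """Return the CPU seconds this process spent inside wait(duration)."""
    start = time.process_time()
    wait(duration)
    return time.process_time() - start


if __name__ == "__main__":
    print("busy-wait CPU seconds:", cpu_cost(busy_wait, 0.1))
    print("sleeping  CPU seconds:", cpu_cost(sleeping_wait, 0.1))
```

Both waits take about the same wall-clock time, but only the busy-wait keeps a CPU occupied while it waits, which is the kind of cost that adds up when a delay sits on a frequently triggered path.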
Comment 24 Jori Hardman 2009-03-27 00:31:00 UTC
Francisco,

What BIOS version are you using?  This bug only occurs for me when the CPU is under load.  Kacpid takes over the CPU as soon as GNOME loads, but for testing purposes I've just been using the stress program found in the Arch repositories to put load on the CPU.  I'm interested to see if the bug can be duplicated or if it's just a problem with my configuration, so please post as soon as you're able to test a full installation.
Comment 25 Jori Hardman 2009-03-29 01:26:44 UTC
Alexey,

I am happy to report that I tried your suggestion, and I have the 2.6.28 kernel running with no problem.  It seems we've found the fix.
Comment 26 Alexey Starikovskiy 2009-03-29 07:39:04 UTC
Created attachment 20719
separate MSI delays

Please check if this patch works.
Comment 27 Jori Hardman 2009-03-30 03:47:37 UTC
I applied the patch to the 2.6.29 kernel and it works like a charm.  Thanks for finding a solution.
Comment 28 Francisco Pina Martins 2009-03-30 21:21:13 UTC
Hello again.
I'm running Bios Version 1.13
I should be installing Arch this weekend, so by Sunday I should have something to report. I'm glad to hear that there is a solution.
Thank you for finding it!
Hope it gets merged soon.

Francisco
Comment 29 Francisco Pina Martins 2009-04-07 09:59:56 UTC
I have finally installed Arch.
I have everything configured under Gnome and got a 100% usable system.

My uname -a reads:

Linux MegalaptopII 2.6.28-ARCH #1 SMP PREEMPT Tue Mar 17 07:22:53 CET 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz GenuineIntel GNU/Linux

I am experiencing no issues with acpi or CPU usage.

I'm guessing it must be specific to your config...

Hope this info can be of some help.
Comment 30 Jori Hardman 2009-04-13 19:54:11 UTC
I just compiled the latest arch kernel sans patch, and I still experience the issue.  I wonder what the difference is between my machine and yours.

Alexey, any chance that this patch will be merged soon?
Comment 31 Alexey Starikovskiy 2009-04-13 20:17:51 UTC
Len Brown thinks it is 2.6.31 material...
Comment 32 Francisco Pina Martins 2009-04-13 21:03:55 UTC
Owo!
2.6.31 will still take a while, I guess...

I don't know the difference between our machines. You can compare your specs with the ones I posted. Maybe it's CPU specific?
Comment 33 Jori Hardman 2009-04-13 22:06:03 UTC
Linux arch 2.6.29-ARCH #1 SMP PREEMPT Sat Apr 4 20:53:21 CDT 2009 x86_64 Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz GenuineIntel GNU/Linux

The cpus are different, but it's only a clock speed difference.  I don't think that would affect this, but who knows?
Comment 34 Francisco Pina Martins 2009-04-15 21:03:06 UTC
It is indeed strange.
But it might in fact be the difference.
Maybe someone else with a Compal IFL90 could give some feedback too.
In the meantime I tried out the "stress" program in Arch you mentioned and I still have no CPU usage issues.
Using the latest kernel now (2.6.29-ARCH) and still no issues.
Btw, I'm using 'top' to check on the CPU usage.
Comment 35 Jori Hardman 2009-06-14 20:21:47 UTC
Now that 2.6.30 has been released, could this be patched for the next kernel release?
Comment 36 Zhang Rui 2009-06-15 02:06:12 UTC
Alexey, what's the status of this bug and the MSI patches?
Comment 37 Len Brown 2009-08-13 03:03:38 UTC
no activity in this bug report for 2 months.
please re-open if this is still an issue in the latest stable kernel.
