Bug 15924 - kacpid consumes ~100% CPU, system freezes randomly
kacpid consumes ~100% CPU, system freezes randomly
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: PCI
All Linux
: P1 high
Assigned To: drivers_pci@kernel-bugs.osdl.org
:
Depends on:
Blocks: 15310 56331
  Show dependency treegraph
 
Reported: 2010-05-06 21:12 UTC by Jaroslav Kameník
Modified: 2013-04-09 06:23 UTC (History)
12 users (show)

See Also:
Kernel Version: 2.6.34 RC6
Tree: Mainline
Regression: Yes


Attachments
acpidump (583.52 KB, text/plain)
2010-05-06 21:12 UTC, Jaroslav Kameník
Details
acpi interupts on 2.6.34-rc5 (3.78 KB, text/plain)
2010-05-06 21:14 UTC, Jaroslav Kameník
Details
acpidump with 2.6.34-rc6 (583.52 KB, application/octet-stream)
2010-05-07 08:00 UTC, Jaroslav Kameník
Details
acpi/wakeup with 2.6.33 (511 bytes, application/octet-stream)
2010-05-07 08:01 UTC, Jaroslav Kameník
Details
acpi/wakeup with 2.3.34.rc6 (511 bytes, application/octet-stream)
2010-05-07 08:02 UTC, Jaroslav Kameník
Details
lspci -vvxxx (63.38 KB, application/octet-stream)
2010-05-10 07:23 UTC, Jaroslav Kameník
Details
acpi interrupts in 2.6.33.3 kernel (3.78 KB, text/plain)
2010-05-10 08:34 UTC, Jaroslav Kameník
Details
dmesg for Comment #12 on RC6 (107.27 KB, text/plain)
2010-05-12 10:59 UTC, Jaroslav Kameník
Details
patch-1 (692 bytes, patch)
2010-05-13 02:05 UTC, Zhang Rui
Details | Diff
patch-2 (822 bytes, patch)
2010-05-13 02:06 UTC, Zhang Rui
Details | Diff
Make sure GPEs stay disabled (2.40 KB, patch)
2010-05-20 14:35 UTC, Matthew Garrett
Details | Diff
ACPI / ACPICA: Fix problems related to GPE reference counting (22.09 KB, patch)
2010-05-24 20:55 UTC, Rafael J. Wysocki
Details | Diff
Debug patch (1.53 KB, patch)
2010-06-09 22:53 UTC, Rafael J. Wysocki
Details | Diff
dmesg for comment #47 (80.30 KB, text/plain)
2010-06-10 18:30 UTC, Jaroslav Kameník
Details
Debug patch #2 (582 bytes, patch)
2010-06-10 22:37 UTC, Rafael J. Wysocki
Details | Diff
dsdt - f.08 (111.72 KB, application/octet-stream)
2010-06-11 08:42 UTC, Jaroslav Kameník
Details
dmesg for comment #51 (171.54 KB, application/octet-stream)
2010-06-11 08:44 UTC, Jaroslav Kameník
Details
Debug patch #3 (1.80 KB, patch)
2010-06-12 12:19 UTC, Rafael J. Wysocki
Details | Diff
Debug patch #4 (782 bytes, patch)
2010-06-12 19:59 UTC, Rafael J. Wysocki
Details | Diff
dmesg for Comment #55 (244.29 KB, text/plain)
2010-06-14 19:14 UTC, Jaroslav Kameník
Details
PCI / PM: Do not use native PCIe PME by default (3.16 KB, patch)
2010-06-17 20:13 UTC, Rafael J. Wysocki
Details | Diff

Description Jaroslav Kameník 2010-05-06 21:12:21 UTC
Created attachment 26268 [details]
acpidump

hardware: laptop HP 8540w (i7 620M, 8GB ram, BIOS F.06)

I found this problem on OpenSuse 11.3 M5 and M6 with kernels 2.6.34 RC3 and 2.6.34 RC5 but it persists even with original 2.6.34 RC6 from kernel.org. (We discussed it on OSS bugzilla - https://bugzilla.novell.com/show_bug.cgi?id=601270)

On OSS 11.3 M3 with older kernel 2.6.33.3 problem didn't occure.

I found that after dissabling USB port in BIOS kacpid stops to consume CPU. Laptop has USB3 ports, so maybe problem is there...
Comment 1 Jaroslav Kameník 2010-05-06 21:14:18 UTC
Created attachment 26269 [details]
acpi interupts on 2.6.34-rc5
Comment 2 Rafael J. Wysocki 2010-05-06 21:56:16 UTC
Please attach the output of acpidump and the contents of /proc/acpi/wakeup .
Comment 3 Jaroslav Kameník 2010-05-07 08:00:32 UTC
Created attachment 26271 [details]
acpidump with 2.6.34-rc6
Comment 4 Jaroslav Kameník 2010-05-07 08:01:43 UTC
Created attachment 26272 [details]
acpi/wakeup with 2.6.33
Comment 5 Jaroslav Kameník 2010-05-07 08:02:16 UTC
Created attachment 26273 [details]
acpi/wakeup with 2.3.34.rc6
Comment 6 Zhang Rui 2010-05-10 06:18:34 UTC
please attach the lspci -vvxxx output when the interrupt storms happens.
Comment 7 Len Brown 2010-05-10 06:59:39 UTC
> /sys/firmware/acpi/interrupts/gpe01:  246416	enabled

GPE01 appears to be the problem

Does the kacpid thrashing stop after this?

# echo disable > /sys/firmware/acpi/interrupts/gpe01
# cat /sys/firmware/acpi/interrupts/gpe01

> Device  S-state   Status   Sysfs node
> LANC      S0    *disabled  pci:0000:00:19.0
> HDEF      S3    *disabled  pci:0000:00:1b.0
> EHC1      S0    *disabled  pci:0000:00:1d.0
> EHC2      S0    *disabled  pci:0000:00:1a.0
> ...

All of the devices in /proc/acpi/wakeup have the
run_wake flag set in 2.6.34, while none had this set in 2.6.33

It should be possible to bisect which commit caused this regression,
particularly if the bisect is limited to the PCI and ACPI code.

GPE01's handler seems to be involved in PCI hotplug notifies:

        Method (_L01, 0, NotSerialized)
        {
            Add (L01C, 0x01, L01C)
            \_SB.PCI0.RP01.HPLG ()
            \_SB.PCI0.RP02.HPLG ()
            \_SB.PCI0.RP03.HPLG ()
            \_SB.PCI0.RP04.HPLG ()
            \_SB.PCI0.RP05.HPLG ()
            \_SB.PCI0.RP06.HPLG ()
        }

It isn't immediately clear from the AML how the handler
clears the event that caused the interrupt.

BTW. there seem to be checksum errors in the acpi tables,
please attach the full output from acpidump
Comment 8 Jaroslav Kameník 2010-05-10 07:23:45 UTC
Created attachment 26308 [details]
lspci -vvxxx
Comment 9 Jaroslav Kameník 2010-05-10 07:31:51 UTC
(In reply to comment #7)

# echo disable > /sys/firmware/acpi/interrupts/gpe01
# cat /sys/firmware/acpi/interrupts/gpe01

writes something as "  204208        enabled" (number is growing) and kacpid continues in eating cpu.


"acpidump with 2.6.34-rc6" that I attached is all acpidump command gave me (std+err output in one file) and I don't see nothing like verbose/.. flag there. What exactly you mean with full output?
Comment 10 Zhang Rui 2010-05-10 08:12:37 UTC
please attach the acpi interrupts in 2.6.33.3 kernel.
Comment 11 Jaroslav Kameník 2010-05-10 08:34:38 UTC
Created attachment 26311 [details]
acpi interrupts in 2.6.33.3 kernel
Comment 12 Zhang Rui 2010-05-12 07:34:44 UTC
(In reply to comment #9)
> (In reply to comment #7)
> 
> # echo disable > /sys/firmware/acpi/interrupts/gpe01
> # cat /sys/firmware/acpi/interrupts/gpe01
> 
> writes something as "  204208        enabled" (number is growing) and kacpid
> continues in eating cpu.
> 
this is weird. we should disable the gpe by running this command.
please run
"echo 0x44 > /sys/module/acpi/parameters/debug_layer &&
 echo 0x88000104 > /sys/module/acpi/parameters/debug_level &&
 dmesg -c &&
 echo disable > /sys/firmware/acpi/interrupts/gpe01 &&
 dmesg > /tmp/dmesg"
and attach the /tmp/dmesg file here.
Comment 13 Jaroslav Kameník 2010-05-12 10:59:19 UTC
Created attachment 26354 [details]
dmesg for Comment #12 on RC6
Comment 14 Zhang Rui 2010-05-13 02:05:39 UTC
Created attachment 26361 [details]
patch-1

please try this patch and do the same test as the one in comment #9.
Comment 15 Zhang Rui 2010-05-13 02:06:46 UTC
Created attachment 26362 [details]
patch-2

this patch-1 doesn't help, please convert that patch, apply this one and do the test in comment #9.
Comment 16 Jaroslav Kameník 2010-05-13 11:45:57 UTC
I applied second patch and it seems to be working;). Test #9 wrote disabled and kacpid stopped eating cpu.
Comment 17 Rafael J. Wysocki 2010-05-13 20:26:16 UTC
Just to clarify, do you mean the patch from comment #15?
Comment 18 Jaroslav Kameník 2010-05-13 21:49:22 UTC
Yes, the patch from #15 which changes the system.c file. Kacpid still eats cpu at startup but I can disable that bad gpe01 with commands from #9 now and stop it.

I understood #15 that I shouldn't try the patch 1 from #14, is it ok?
Comment 19 Rafael J. Wysocki 2010-05-13 21:50:58 UTC
Please the patch from comment #14 too, we'd like to know if it works anyway.
Comment 20 Jaroslav Kameník 2010-05-14 12:52:09 UTC
With both patches applied test from #9 writes " 0 disabled" but kacpid does not stop and there is still lots of interupts in 'grep . /sys/firmware/acpi/interrupts/*' (on the row gpe01 is disabled but number still growing).
Comment 21 Zhang Rui 2010-05-18 01:51:22 UTC
(In reply to comment #20)
> With both patches applied test from #9 writes " 0 disabled" but kacpid does not
> stop and there is still lots of interupts in 'grep .
> /sys/firmware/acpi/interrupts/*' (on the row gpe01 is disabled but number still
> growing).

do you mean the GPE storm can be disabled if you apply patch-2 only, but it can not be disabled if you apply both patch-1 and patch-2?
Comment 22 Antoni Grzymala 2010-05-18 12:07:17 UTC
Justo to confirm I have experienced the same regression on a similar machine:

product: HP EliteBook 8540p
version: 68CVD Ver. F.06 (02/25/2010)
i7 620M, 4GB RAM.

Previous BIOS release (one I got the machine shipped with) exhibited same behaviour.
Comment 23 Jaroslav Kameník 2010-05-18 15:20:10 UTC
(In reply to comment #21)

Yes. With patch-2 I was able to disable it with 'echo disable > /sys/firmware/acpi/interrupts/gpe01' and storm stopped, 'echo enable ...' started it again.

 With both patches it looks like it is disabled since boot but behaves as it is not. 

After boot 'cat /sys/firmware/acpi/interrupts/gpe01' writes ' somenumber disabled', number is still growing quickly.

After 'echo disable > /sys/firmware/acpi/interrupts/gpe01', it still writes disabled, number drops to one and grows quickly again.

short listing (.disable are commands from #9) with few secs between commands - 

cat /sys/firmware/acpi/interrupts/gpe01
   74437        disabled
cat /sys/firmware/acpi/interrupts/gpe01
   76697        disabled
cat /sys/firmware/acpi/interrupts/gpe01
   84399        disabled
./disable
       1        disabled
cat /sys/firmware/acpi/interrupts/gpe01
    2684        disabled
cat /sys/firmware/acpi/interrupts/gpe01
    6059        disabled
cat /sys/firmware/acpi/interrupts/gpe01
   34875        disabled
./disable
       1        disabled
cat /sys/firmware/acpi/interrupts/gpe01
    1603        disabled
cat /sys/firmware/acpi/interrupts/gpe01
    2673        disabled
Comment 24 Jaroslav Kameník 2010-05-18 19:24:16 UTC
I have recompiled it once again to be sure with patch-2 only and it works as I wrote in #23. The similar sequence of commands after boot gave

cat /sys/firmware/acpi/interrupts/gpe01
   85898        enabled
cat /sys/firmware/acpi/interrupts/gpe01
   87292        enabled
echo disable > /sys/firmware/acpi/interrupts/gpe01
cat /sys/firmware/acpi/interrupts/gpe01
  104925        disabled
cat /sys/firmware/acpi/interrupts/gpe01
  104925        disabled
echo enable > /sys/firmware/acpi/interrupts/gpe01
cat /sys/firmware/acpi/interrupts/gpe01
  111831        enabled
cat /sys/firmware/acpi/interrupts/gpe01
  113668        enabled
disable > /sys/firmware/acpi/interrupts/gpe01
cat /sys/firmware/acpi/interrupts/gpe01
  117682        disabled
cat /sys/firmware/acpi/interrupts/gpe01
  117682        disabled
Comment 25 Zhang Rui 2010-05-19 08:27:09 UTC
acpi_ev_asynch_enable_gpe enables the GPE if gpe_register_info->enable_for_run is not cleared, so I guess we can not disable the GPE storm because it's enabled by the deferred GPE handler.

IMO, this sounds like a problem in the gpe reference counting patch.
Say, we don't update the runtime_count/wakeup_count when invoking acpi_set_gpe.
so in order to make acpi_set_gpe work properly, we can not use runtime_count/wakeup_count to update the gpe_register_info->enable_for_run/enable_for_wake flag, as we do in acpi_ev_update_gpe_enable_masks currently.

But I'm still wondering why patch-1 breaks patch-2...
Comment 26 Matthew Garrett 2010-05-19 21:33:57 UTC
Hm. It's correct that acpi_set_gpe() doesn't change the refcounting - but we probably need to set a force_enable/disable flag that's tested as well as the runtime count. I'll look at this tomorrow.
Comment 27 Rafael J. Wysocki 2010-05-19 21:36:11 UTC
I'm not sure what you mean.

Do you think acpi_set_gpe() shouldn't use acpi_ev_enable_gpe(), but instead it
should call acpi_hw_clear_gpe() and acpi_hw_write_gpe_enable_reg() directly
(in the "enable" case)?
Comment 28 Matthew Garrett 2010-05-19 21:39:34 UTC
I think they're orthogonal. acpi_set_gpe() in this case should mean "Turn off the gpe even if the reference counting says it should be enabled", and then any codepath that tests the runtime reference count should also test that flag. But that doesn't answer the question of why this GPE is enabled in the first place.
Comment 29 Rafael J. Wysocki 2010-05-19 21:52:16 UTC
acpi_set_gpe() turns off the GPE unconditionally in the "disable" case.  Only the
"enable" case takes the enable mask read from gpe_event_info->register_info into
account.
Comment 30 Matthew Garrett 2010-05-19 21:53:39 UTC
But we have acpi_ev_asynch_enable_gpe which may then enable it again...
Comment 31 Zhang Rui 2010-05-20 02:25:59 UTC
(In reply to comment #28)
> I think they're orthogonal. acpi_set_gpe() in this case should mean "Turn off
> the gpe even if the reference counting says it should be enabled", and then any
> codepath that tests the runtime reference count should also test that flag.

I agree.

> But
> that doesn't answer the question of why this GPE is enabled in the first place.

sorry I don't understand.
what do you mean by "this GPE is enabled IN THE FIRST PLACE"?
Comment 32 Matthew Garrett 2010-05-20 14:35:33 UTC
Created attachment 26458 [details]
Make sure GPEs stay disabled

Can you test this without any other patches? It should result in disable working correctly when you write it. We still need to work out why this GPE is enabled at all, though.
Comment 33 Jaroslav Kameník 2010-05-20 19:37:42 UTC
With last patch only after boot

-- kacpid takes ~100%

#cat /sys/firmware/acpi/interrupts/gpe01
   62586        enabled
# cat /sys/firmware/acpi/interrupts/gpe01
   64478        enabled
# echo disable > /sys/firmware/acpi/interrupts/gpe01

-- kacpid is ok

# cat /sys/firmware/acpi/interrupts/gpe01
   73971        enabled
# cat /sys/firmware/acpi/interrupts/gpe01
   73971        enabled
# echo enable > /sys/firmware/acpi/interrupts/gpe01

-- kacpid is still ok

# cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
# cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
# echo disable > /sys/firmware/acpi/interrupts/gpe01
# cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
# cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
Comment 34 Rafael J. Wysocki 2010-05-20 22:54:44 UTC
(In reply to comment #32)
> Created an attachment (id=26458) [details]
> Make sure GPEs stay disabled

Well, it looks like we need to do something like this.
Comment 35 Rafael J. Wysocki 2010-05-20 23:09:31 UTC
Although it seems we can avoid adding a new flag by making acpi_set_gpe() claer the GPE's bit in its register's enable_for_run mask.
Comment 36 Rafael J. Wysocki 2010-05-20 23:11:58 UTC
Ah, that's not going to work without this patch: https://patchwork.kernel.org/patch/101264/.
Comment 37 Jaroslav Kameník 2010-05-21 13:21:28 UTC
(In reply to comment #36)

Should I try it alone or with earlier patches? And still against RC6 or newer kernel?
Comment 38 Rafael J. Wysocki 2010-05-21 20:37:59 UTC
I'm going to prepare a patch on top of 2.6.34 (final) for you.
Comment 39 Rafael J. Wysocki 2010-05-24 18:24:00 UTC
For the record, the Matthew's patch in comment #32 has a problem that it will set ACPI_GPE_DISABLED for all wakeup GPEs during suspend, which isn't correct.
Comment 40 Rafael J. Wysocki 2010-05-24 20:55:22 UTC
Created attachment 26532 [details]
ACPI / ACPICA: Fix problems related to GPE reference counting

Please check if this patch (on top of 2.6.34) changes anything.
Comment 41 Jaroslav Kameník 2010-05-26 18:35:08 UTC
I got similar result as in #33

after boot

-- kacpid takes ~100%

#cat /sys/firmware/acpi/interrupts/gpe01
  126811        enabled
#cat /sys/firmware/acpi/interrupts/gpe01
  129801        enabled
#echo disable > /sys/firmware/acpi/interrupts/gpe01

-- kacpid is ok

#cat /sys/firmware/acpi/interrupts/gpe01
  155255        enabled
#cat /sys/firmware/acpi/interrupts/gpe01
  155255        enabled
#echo enable > /sys/firmware/acpi/interrupts/gpe01

-- kacpid is still ok

#cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
#cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
#echo disable > /sys/firmware/acpi/interrupts/gpe01
#cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
#cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
Comment 42 Steven Rostedt 2010-05-28 19:28:20 UTC
I have an old box (Compaq Preserio) that I bought back in 2003. I'm currently installing Fedora 12 on it. The default .31 kernel could not handle the PCI wifi card, .32 (and .33) would lock up in X, and .34 which seemed to fix the previous two, would have kacpi_notify running at 100%.

I ran ftrace on it (workqueue and sched events along with tracing the acpi_ev_*). The kacpi_notify never goes to sleep. It is scheduled out via a "cond_resched" in the running state. This means on a non preempt, non voluntary preempt kernel, this would lock up the box.

Here's what I did:

# echo 'sched_switch' 'sched_wakeup' sched_wakeup_new 'workqueue' irq > /debug/tracing/set_event 
# echo acpi_os_execute_deferred >> /debug/tracing/set_ftrace_filter 
# echo 'acpi_ev_*' >> /debug/tracing/set_ftrace_filter
# echo function > /debug/tracing/current_tracer
# echo 1 > /debug/tracing/options/stacktrace 

Which would produce the following:
    kacpi_notify-17    [000]   807.509455: acpi_ev_is_notify_object <-acpi_ex_opcode_2A_0T_0R
    kacpi_notify-17    [000]   807.509456: acpi_ev_queue_notify_request <-acpi_ex_opcode_2A_0T_0R
    kacpi_notify-17    [000]   807.509462: workqueue_insertion: thread=kacpi_notify:17 func=acpi_os_execute_deferred
    kacpi_notify-17    [000]   807.509462: <stack trace>
 => ftrace_raw_event_workqueue_insertion
 => insert_work
 => __queue_work
 => queue_work_on
 => queue_work
 => __acpi_os_execute
 => acpi_os_execute
 => acpi_ev_queue_notify_request
    kacpi_notify-17    [000]   807.509485: acpi_ev_address_space_dispatch <-acpi_ex_access_region


I'm currently bisecting to find where the issue is. On a bad kernel, kacpi_notify takes up 100% of the CPU, on a good kernel, it does not even show up.
Comment 43 Steven Rostedt 2010-05-28 19:33:04 UTC
I forgot to say, I added the patch in comment 40 and still get the issue. The above comment included that patch.
Comment 44 Rafael J. Wysocki 2010-05-28 23:06:06 UTC
I think the problem went in with this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f0be67188c60ebf1b5d00354b44b4b24f5af313;hp=f517709d65beed95f52f021b43e3035b52ef791a
and the problem most likely is that one of the GPEs that wasn't enabled in the
previous kernels in now enabled.
Comment 46 Jaroslav Kameník 2010-06-09 19:51:02 UTC
Comment #45

I tried it with 2.6.34-git16, it fires lots of interrupts after boot but it is possible to disable it and enable again. I got this:

-- kacpid eats ~100%

# cat /sys/firmware/acpi/interrupts/gpe01
  123457        enabled
# cat /sys/firmware/acpi/interrupts/gpe01
  124981        enabled
# echo disable > /sys/firmware/acpi/interrupts/gpe01  

-- kacpid is ok

# cat /sys/firmware/acpi/interrupts/gpe01
  132234        disabled
# cat /sys/firmware/acpi/interrupts/gpe01
  132234        disabled
# echo enable > /sys/firmware/acpi/interrupts/gpe01

-- kacpid eats ~100%

# cat /sys/firmware/acpi/interrupts/gpe01
  143519        enabled
# cat /sys/firmware/acpi/interrupts/gpe01
  145585        enabled
# echo disable > /sys/firmware/acpi/interrupts/gpe01

-- kacpid is ok

# cat /sys/firmware/acpi/interrupts/gpe01
  149538        disabled
# cat /sys/firmware/acpi/interrupts/gpe01
  149538        disabled
Comment 47 Rafael J. Wysocki 2010-06-09 22:53:50 UTC
Created attachment 26705 [details]
Debug patch

Please apply this patch on top of the previous ones and attach dmesg output generated after a fresh boot.
Comment 48 Jaroslav Kameník 2010-06-10 18:30:16 UTC
Created attachment 26715 [details]
dmesg for comment #47
Comment 49 Rafael J. Wysocki 2010-06-10 21:08:59 UTC
It does seem that GPE01 is legitimately enabled in acpi_ev_initialize_gpe_block(),
so the question is what the old code did to disable it afterwards (or how did it know not to enable it).

Jaroslav, I cannot extract the DSDT from your output of acpidump, can you attach
the contents of /proc/acpi/dsdt from your system?
Comment 50 Rafael J. Wysocki 2010-06-10 22:35:02 UTC
Never mind, I managed to extract the DSDT after all.
Comment 51 Rafael J. Wysocki 2010-06-10 22:37:25 UTC
Created attachment 26721 [details]
Debug patch #2

Please apply this patch on top of the previous one and send the output of dmesg from a fresh boot.
Comment 52 Jaroslav Kameník 2010-06-11 08:42:54 UTC
Created attachment 26722 [details]
dsdt - f.08

HP released bios update (F.08), it did not help with kacpid but maybe there are changes..
Comment 53 Jaroslav Kameník 2010-06-11 08:44:49 UTC
Created attachment 26723 [details]
dmesg for comment #51
Comment 54 Rafael J. Wysocki 2010-06-11 15:30:32 UTC
Well, the last debug patch was too general, it seems, and we need some more precise information.  I'll do my best to prepare another debug patch for you in the next couple of days.

Apparently, GPE01 is legitimately enabled on your system, but we need to find out why it keeps firing.
Comment 55 Rafael J. Wysocki 2010-06-12 12:19:48 UTC
Created attachment 26741 [details]
Debug patch #3

Please apply this patch on top of the previous ones and attach the dmesg output (this can generate quite a lot of output).
Comment 56 Rafael J. Wysocki 2010-06-12 19:59:46 UTC
Created attachment 26746 [details]
Debug patch #4

Also please check if the problem is reproducible with the attached patch applied
(it shouldn't matter whether or not the previous patches are applied).
Comment 57 Jaroslav Kameník 2010-06-14 19:14:16 UTC
Created attachment 26770 [details]
dmesg for Comment #55
Comment 58 Rafael J. Wysocki 2010-06-14 20:06:44 UTC
This appears to be correct.

Please try the patch from comment #56, then.
Comment 59 Jaroslav Kameník 2010-06-14 20:14:20 UTC
(In reply to comment #56)

It seems to be OK with this patch applied;)

cat /sys/firmware/acpi/interrupts/gpe01 writes

       0        enabled

after boot, I can enable/disable it and it shows correct state and no interupt storm starts.
Comment 60 Rafael J. Wysocki 2010-06-14 23:40:38 UTC
OK, thanks for testing.  You can safely run with this patch applied.

The issue is that we call the BIOS to allow us to use the native PCIe PME
mechanism, which is coupled to the native PCIe hotplug.  Since the BIOS
allows us to use the registers of PCIe root ports directly, we do just that,
but then the BIOS gets confused and fires the hotplug notification.  So, in
fact we shouldn't try to use the native PCIe PME on your machine.

I'm not sure what the long-term solution of the problem will be chosen, but at
least we know what the problem _is_.
Comment 61 Rafael J. Wysocki 2010-06-15 19:59:52 UTC
Jaroslav, can you please test if the kernel works correctly on your machine,
without the patch from comment #56, if the pcie_pme=off kernel command line
switch is used?
Comment 62 Rafael J. Wysocki 2010-06-17 20:13:32 UTC
Created attachment 26830 [details]
PCI / PM: Do not use native PCIe PME by default

Please test the kernel with this patch applied without the previous patches
and report back.
Comment 63 Antoni Grzymala 2010-06-17 23:38:09 UTC
(In reply to comment #62)
> Created an attachment (id=26830) [details]
> PCI / PM: Do not use native PCIe PME by default
> 
> Please test the kernel with this patch applied without the previous patches
> and report back.

Works fine patched against the latest release candidate (2.6.35-rc3):

$ cat /sys/firmware/acpi/interrupts/gpe01
       0        enabled
Comment 64 Rafael J. Wysocki 2010-06-18 13:46:19 UTC
Thanks for the confirmation, I'll push the patch upstream shortly.
Comment 65 Zhang Rui 2010-06-21 07:00:15 UTC
Rafael, so this is introduced by the new pcie pme driver which is shipped in 2.6.34, right?
Comment 66 Rafael J. Wysocki 2010-06-21 18:09:20 UTC
Yes.  To be precise, by its initialization code.
Comment 68 Ozan Caglayan 2010-10-12 10:54:23 UTC
Hi,

I'm having a very similar problem with the same laptop running 2.6.36_rc6 but this time it's a kworker which consumes ~100% CPU.

I first tried rmmod'ing all the modules but it didn't help. Then I built the kernel with some ACPI/PM verbose flags turned on and found out an infinite number of the following message was written in kernel log:

scsi host1: __pm_runtime_resume() returns 1!

I then passed "on\n" to all power/control files in sysfs but the logging didn't appear. I disabled PM_RUNTIME to see if it was causing the kworker stuff, but nope, it still continues.

Then I found this report and tried the gpe01 thing. Disabling it stopped the kworker CPU consumation problem. When enabled the number of interrupts in /sys/firmware/acpi/interrupts/gpe01 increases very fast.

But the problem is not fixed with the pcie_pme=off switch so I think the main problem is different but the solution is the same.

Should I file a new bug report or can this be considered as a duplicate to this?

How can I debug to find out the real issue?

Thanks
Comment 69 Rafael J. Wysocki 2010-10-12 18:38:19 UTC
The original issue should be fixed in 2.6.36-rc, so I think that's
a different problem with the same symptom.  Please file a separate bug
report and add my address to the CC list in there.
Comment 70 Antoni Grzymala 2010-10-27 23:07:53 UTC
(In reply to comment #68)

> I'm having a very similar problem with the same laptop running 2.6.36_rc6 but
> this time it's a kworker which consumes ~100% CPU.

Same here, with 2.6.36 final. Will try to file a separate bug report ASAP.

Note You need to log in before you can comment on or make changes to this bug.