Kernel Bug Tracker – Bug 20232
kworker consumes ~100% CPU on HP Elitebook 8540w running 2.6.36_rc6-git4
Last modified: 2011-03-28 22:59:11 UTC
I'm having a serious CPU hogging problem with an HP Elitebook 8540w running 2.6.36_rc6. A kworker consumes ~100% CPU for the entire uptime, starting at boot.
I first tried rmmod'ing all the modules but it didn't help. Then I built the
kernel with some ACPI/PM verbose flags turned on and found that the following
message was being written to the kernel log endlessly:
scsi host1: __pm_runtime_resume() returns 1!
I then passed "on\n" to all power/control files in sysfs but the messages didn't
disappear. I disabled PM_RUNTIME to see if it was causing the kworker stuff, but
nope, it still continues.
Then I found a similar report and tried writing "disable" to /sys/firmware/acpi/interrupts/gpe01, and it stopped the kworker CPU consumption problem *although the load average doesn't drop under ~1.3*. When the GPE is enabled, the interrupt count in /sys/firmware/acpi/interrupts/gpe01 increases very fast.
The problem is not fixed with the pcie_pme=off trick suggested in the other bug report related to this laptop.
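One way to put a number on "increases very fast" is to read the GPE counter twice and divide by the elapsed time. The sketch below is only an illustration. It assumes the counter file starts with the interrupt count followed by a status word (the exact layout may differ between kernel versions), and the function names are made up for this example.

```python
import time
from pathlib import Path

# Path taken from the report above. The file layout (count first, status
# words after it) is an assumption and may vary between kernel versions.
GPE_FILE = Path("/sys/firmware/acpi/interrupts/gpe01")


def parse_gpe_count(text: str) -> int:
    """Return the leading integer of a GPE counter file, e.g. '  1234   enabled'."""
    return int(text.split()[0])


def gpe_rate(before: int, after: int, seconds: float) -> float:
    """Interrupts per second between two counter samples."""
    return (after - before) / seconds


def sample_rate(path: Path = GPE_FILE, interval: float = 1.0) -> float:
    """Read the counter twice, `interval` seconds apart, and return the rate."""
    first = parse_gpe_count(path.read_text())
    time.sleep(interval)
    second = parse_gpe_count(path.read_text())
    return gpe_rate(first, second, interval)
```

On an affected machine, `sample_rate()` would be expected to return a large value while the storm is active and something close to zero once gpe01 has been disabled.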
(CC'ing İbrahim, the owner of the laptop)
BTW, 22.214.171.124 doesn't have this symptom. We didn't have much time to bisect the issue but if we can't find out the cause without bisecting, I'll try to bisect too.
(In reply to comment #0)
> I'm having a serious CPU hogging problem with an HP Elitebook 8540w running
> 2.6.36_rc6. A kworker consumes ~100% CPU for the entire uptime, starting at boot.
what's the latest good kernel?
what's the earliest bad kernel?
> Then I found a similar report and tried writing "disable" to
> /sys/firmware/acpi/interrupts/gpe01 and it stopped the kworker CPU consumption
> problem *although the load average doesn't drop under ~1.3*. When enabled the
> number of interrupts in /sys/firmware/acpi/interrupts/gpe01 increases very
> fast.
> The problem is not fixed with the pcie_pme=off trick suggested in the other bug
> report related to this laptop.
hmm, seems an ACPI interrupt storm.
please attach the acpidump of this laptop.
please attach the dmesg output after boot for both the good and the bad kernel.
Created attachment 33442 [details]
Created attachment 33452 [details]
dmesg of bad kernel
I'll try to post the earliest bad and the latest good kernel as soon as possible.
okay 126.96.36.199 is good, 2.6.36_rc1 is bad. I'm attaching the good one's dmesg. Seems that it's broken by a commit during the merge window.
Created attachment 33472 [details]
dmesg of good kernel 188.8.131.52
Can you test 2.6.36-rc7 with pcie_ports=compat, please?
// Add me to CC
OK will try that and post the result here.
I didn't have time to rebuild rc8, but I tried that parameter on rc6 and it fixed the kworker issue. Even on a completely idle system, though, the load average doesn't drop below ~1.09. I'll post some details later.
Created attachment 33912 [details]
dmesg of 2.6.36_rc8-git4 with pcie_ports=compat
Ok, pcie_ports=compat still works under 2.6.36_rc8-git4, but there's an "Unexpected driver unregister!" backtrace in the dmesg that I've just attached. Also, the load average is still above 1; I don't know if it's related or not.
so with pcie_ports=compat, the interrupt storm goes away,
as indicated by the acpi interrupt in /proc/interrupts and
the gpe in /sys/firmware/acpi/interrupts no longer incrementing
quickly, but you still have something running on the cpu 100% of the time?
what does top(1) show?
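For reference, the ACPI interrupt count mentioned here can be pulled out of /proc/interrupts by summing the per-CPU columns on the line labelled "acpi". A minimal sketch follows. The sample text is invented for illustration and is not taken from the affected laptop, and the helper name is hypothetical.

```python
def acpi_irq_total(proc_interrupts: str) -> int:
    """Sum the per-CPU counts on the /proc/interrupts line labelled 'acpi'."""
    lines = proc_interrupts.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[-1] == "acpi":
            # fields[0] is the IRQ number ("9:"), then one count per CPU
            return sum(int(n) for n in fields[1:1 + ncpus])
    raise ValueError("no acpi line found")


# Invented sample, shaped like /proc/interrupts on a two-CPU machine.
SAMPLE = """\
           CPU0       CPU1
  0:         42          0   IO-APIC-edge      timer
  9:     120000      30000   IO-APIC-fasteoi   acpi
"""

total = acpi_irq_total(SAMPLE)
```

Calling the function on two snapshots taken a few seconds apart and comparing the totals shows whether the ACPI interrupt is still firing rapidly.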
There is a skew in the loadaverage due to a commit introduced in 2.6.36:
Author: Peter Zijlstra <email@example.com> 2010-04-22 21:50:19
Committer: Ingo Molnar <firstname.lastname@example.org> 2010-04-23 11:02:02
sched: Cure load average vs NO_HZ woes
Check https://bugzilla.kernel.org/show_bug.cgi?id=16525 and the
referenced threads for more details on that.
It is not clear to me, if that skew is accompanied by any harmful symptoms though...
p.s.: for convenience, I post the backtrace mentioned in comment #12:
[ 1.120742] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 1.120744] ------------[ cut here ]------------
[ 1.120749] WARNING: at drivers/base/driver.c:262 driver_unregister+0x36/0x6f()
[ 1.120751] Hardware name: HP EliteBook 8540w
[ 1.120752] Unexpected driver unregister!
[ 1.120753] Modules linked in:
[ 1.120756] Pid: 1, comm: swapper Not tainted 2.6.36_rc8-143 #1
[ 1.120757] Call Trace:
[ 1.120762] [<ffffffff81045ec4>] warn_slowpath_common+0x80/0x98
[ 1.120765] [<ffffffff81045f70>] warn_slowpath_fmt+0x41/0x43
[ 1.120769] [<ffffffff81b7666d>] ? pci_hotplug_init+0x0/0x4e
[ 1.120772] [<ffffffff812c0604>] driver_unregister+0x36/0x6f
[ 1.120775] [<ffffffff8123939c>] pcie_port_service_unregister+0xd/0xf
[ 1.120777] [<ffffffff81b76914>] pciehp_acpi_slot_detection_init+0x96/0x132
[ 1.120780] [<ffffffff81b766c9>] ? pcied_init+0x0/0x79
[ 1.120782] [<ffffffff81b766d7>] pcied_init+0xe/0x79
[ 1.120786] [<ffffffff81000348>] do_one_initcall+0x7a/0x132
[ 1.120789] [<ffffffff81b4fd54>] kernel_init+0x17d/0x20b
[ 1.120792] [<ffffffff81003924>] kernel_thread_helper+0x4/0x10
[ 1.120794] [<ffffffff81b4fbd7>] ? kernel_init+0x0/0x20b
[ 1.120796] [<ffffffff81003920>] ? kernel_thread_helper+0x0/0x10
[ 1.120803] ---[ end trace 6d450e935ee1897c ]---
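When reading a trace like the one above, it can help to separate the frames the unwinder considers reliable from those marked with "?", which are addresses found on the stack that may not be part of the actual call chain. The following is a rough sketch under that assumption; the regular expression is written for the line format shown above and the function name is made up.

```python
import re
from typing import List

# Matches "[<address>] name+0xoff/0xsize", with an optional "? " before the name.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace: str) -> List[str]:
    """Return the function names of frames that are not marked with '?'."""
    names = []
    for match in FRAME.finditer(trace):
        if match.group(1) is None:
            names.append(match.group(2))
    return names


# Four lines copied from the backtrace above.
SAMPLE = """\
[ 1.120769] [<ffffffff81b7666d>] ? pci_hotplug_init+0x0/0x4e
[ 1.120772] [<ffffffff812c0604>] driver_unregister+0x36/0x6f
[ 1.120775] [<ffffffff8123939c>] pcie_port_service_unregister+0xd/0xf
[ 1.120777] [<ffffffff81b76914>] pciehp_acpi_slot_detection_init+0x96/0x132
"""

frames = reliable_frames(SAMPLE)
```

For these four lines the result is the chain from `pciehp_acpi_slot_detection_init` through `pcie_port_service_unregister` into `driver_unregister`, which is where the warning fires.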
Well yes, the storm and the kworker that hogs the CPU go away with pcie_ports=compat. But even on a basic console login that stays idle for hours, the load average always stays above 1. The output of top doesn't show anything surprising: no task is using the CPU excessively. But I don't know why the load average doesn't converge to 0.
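As background for why an idle system is expected to converge to 0: the load average is an exponentially damped average of the number of runnable plus uninterruptible tasks, sampled about every five seconds. The sketch below uses floating point rather than the kernel's fixed-point arithmetic and does not model the NO_HZ accounting touched by the commit mentioned earlier in this thread; it only illustrates the expected decay.

```python
import math

SAMPLE_SECONDS = 5.0   # approximate sampling interval
WINDOW_SECONDS = 60.0  # time constant of the 1-minute average


def next_load(load: float, active: float) -> float:
    """One update step of the exponentially damped average."""
    decay = math.exp(-SAMPLE_SECONDS / WINDOW_SECONDS)
    return load * decay + active * (1.0 - decay)


def simulate(load: float, active: float, seconds: float) -> float:
    """Apply next_load() repeatedly with a constant number of active tasks."""
    for _ in range(int(seconds / SAMPLE_SECONDS)):
        load = next_load(load, active)
    return load


idle_after_10_min = simulate(1.0, 0.0, 600.0)   # nothing active: decays toward 0
busy_after_10_min = simulate(0.0, 1.0, 600.0)   # one task always active: approaches 1
```

With no active tasks the value falls well below 0.01 within ten minutes, so a load average that sits at about 1 for hours means the accounting keeps seeing roughly one active task at each sample, whether or not top shows it.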
I'm experiencing the same problem in my laptop. Two or four kworker processes constantly at the top of top, consuming between 1 and ~20% cpu, load average of ~1. The trackpad is unusably jerky in X. pcie_ports=compad didn't change anything.
Back to 184.108.40.206 for now (with 2.6.35.x I had a similar problem: several kslowd00x processes hogging my cpu and making my trackpad jerky. At least they don't appear with 2.6.36...).
(In reply to comment #16)
> I'm experiencing the same problem in my laptop. Two or four kworker processes
> constantly at the top of top, consuming between 1 and ~20% cpu, load average of
> ~1. The trackpad is unusably jerky in X. pcie_ports=compad didn't change
> anything.
That's pcie_ports=compat, not pcie_ports=compad. If the latter is what you have
tested, please retest and report back.
If pcie_ports=compat doesn't help on your machine, the problem you're seeing
is certainly different. In that case, please file a separate bug report
for that issue.
so what's the status of this bug? :)
It still continues with 2.6.36. pcie_ports=compat still fixes the issue, but the backtrace is still there.
The load average is unrelated to this bug. Check the patch in bug #16525 for that.
Len, that means: (Ozan, please correct me if I'm wrong):
(In reply to comment #13)
> so with pcie_ports=compat, the interrupt storm goes away,
> as indicated by the acpi interrupt in /proc/interrupts and
> the gpe in /sys/firmware/acpi/interrupts no longer incrementing
> quickly, but you still have something running on the cpu 100% of the time?
> what does top(1) show?
Answer: With pcie_ports=compat the kworker @100%cpu goes away, and everything is fine.
Not everything, the backtrace is still there, that needs to be fixed.
I'll take care of this shortly.
Created attachment 39752 [details]
PCI / Hotplug: Fix unexpected driver unregister in pciehp_acpi.c
This patch should fix the warning in comment #14, please verify.
Okay, I'll try the patch ASAP, but will users of this laptop have to pass pcie_ports=compat explicitly to fix the issue? If yes, this is bad. If there's some sort of DMI quirk list that will be patched, that's reasonable.
We're hoping to have a better fix than a DMI quirk, but not in 2.6.37,
so please use the command line workaround for now.
Ozan, can you please send the output of "ls /sys/bus/pci/drivers" and
"ls /sys/bus/pci_express/drivers" ?
Sorry, not this information. The output of "ls /sys/bus/pci/slots/".
Also please rmmod the pciehp module and modprobe acpiphp module instead.
Please check if the problem is reproducible with that in place.
Created attachment 41832 [details]
PCI / ACPI: Request _OSC control once for each root bridge
The attached patch may help, so please test it.
If it doesn't help, please send the output of "dmesg | grep _OSC" generated
right after a fresh boot.
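The requested filter can also be done programmatically if the grants need to be compared between two boots. The sketch below assumes the message format "pcieport <device>: ACPI _OSC control granted for <mask>", which is the form that appears in the dmesg diff later in this thread; the function name is hypothetical.

```python
import re
from typing import Dict

OSC_GRANTED = re.compile(
    r"pcieport (\S+): ACPI _OSC control granted for (0x[0-9a-fA-F]+)"
)


def osc_grants(dmesg: str) -> Dict[str, int]:
    """Map each PCIe port to the _OSC control mask it was granted."""
    grants = {}
    for line in dmesg.splitlines():
        if "_OSC" not in line:  # same selection as `grep _OSC`
            continue
        match = OSC_GRANTED.search(line)
        if match:
            grants[match.group(1)] = int(match.group(2), 16)
    return grants


SAMPLE = """\
io scheduler cfq registered (default)
pcieport 0000:00:01.0: ACPI _OSC control granted for 0x1c
pcieport 0000:00:01.0: setting latency timer to 64
pcieport 0000:00:1c.0: ACPI _OSC control granted for 0x1c
"""

grants = osc_grants(SAMPLE)
```

An empty result on a given boot would suggest that no _OSC control was granted to any port, at least in this message form.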
/sys/bus/pci/slots is empty.
As pciehp is built into the kernel image, I could not find any way to prevent it from loading, so I'll need time to recompile and try what you've suggested in #28 and #29.
The patch from comment #29 is on top of the current mainline (2.6.37-rc8 at the
moment).
Ok I'll try with that kernel.
Created attachment 41872 [details]
Good dmesg 2.6.37_rc7_git4
Well, I tried 2.6.37_rc7-git4 + the patch in #23 and did a normal reboot, i.e. without pcie_ports=compat, and the issue seems to be fixed. The new dmesg is attached.
Let's keep the bug report open until 2.6.37 gets released and I'll close this as fixed if 2.6.37 works OK.
Sorry for being late to switch to 2.6.37_rc*.
Do you mean that 2.6.37-rc7-git4 with the patch from comment #23 works for
you without pcie_ports=compat and without the patch from comment #29 ?
If so, 2.6.37-rc8 should work for you too (it contains the patch from
comment #23). Please confirm.
The 2.6.37_rc7-git4 + patch from comment #23 kernel that I tried was not vanilla at all. It carries a patch from upstream that seems related to the issue, so maybe it was this commit that fixed it:
Author: Matthew Garrett <email@example.com>
Date: Thu Dec 9 18:31:59 2010 -0500
PCI: _OSC "supported" field should contain supported features, not enabled ones
From testing with Windows, the call to the PCI root _OSC method includes
the full set of features supported by the operating system even if the
hardware has already indicated that it doesn't support ASPM or MSI.
https://bugzilla.redhat.com/show_bug.cgi?id=638912 is a case where making
the _OSC call will incorrectly configure the chipset unless the supported
field has bits 1, 2 and 4 set. Rework the functionality to ensure that
we match this behaviour.
Anyway, I'll try a vanilla 2.6.37_rc8 with and without the patch in comment #29 to see the outcome.
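The "bits 1, 2 and 4" in the quoted commit message translate into a bit mask as follows. This is a small sketch of the arithmetic only. The bit positions come from the commit message; reading them as ASPM, clock power management and MSI support is an assumption based on the kernel's _OSC support definitions and should be checked against the tree in use.

```python
# Bit positions named in the commit message quoted above.
REQUIRED_BITS = (1, 2, 4)


def mask_from_bits(bits) -> int:
    """Build an integer mask with the given bit positions set."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def has_required_support(support: int) -> bool:
    """True if every required bit is set in the _OSC 'supported' field."""
    required = mask_from_bits(REQUIRED_BITS)
    return support & required == required


required_mask = mask_from_bits(REQUIRED_BITS)
```

Here `required_mask` is 0x16, so a "supported" value of 0x1f passes the check while 0x10 (bit 4 alone) does not.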
First, what do you mean saying "upstream"?
Second, if the "PCI: _OSC "supported" field should contain supported features,
not enabled ones" patch helps, the patch from comment #29 rather won't help.
I'll attach a patch on top of the one from comment #29 that may help.
Created attachment 41892 [details]
PCI / ACPI: Pass all _OSC support bits to the BIOS simultaneously
Patch to test on top of the patch from comment #29.
Okay, I'll try to summarize what's going on, as I think I've caused a little bit of confusion:
- 2.6.36.x is still showing the issue on those laptops
- The problem goes away on 2.6.36.x with pcie_ports=compat, but this gives a backtrace while unregistering a driver (a patch to fix it is available in comment #23)
- A complete solution is offered within the patch in comment #29
Then I tried 2.6.37_rc7-git4 with the patch in comment #23 to see at least if the backtrace is fixed when booting with pcie_ports=compat.
A plain reboot (with no pcie_ports=compat) cured the kworker issue.
Either the switch to 2.6.37_rc* cured the issue, or the patch that I took from Fedora f-15, entitled "PCI: _OSC "supported" field should contain supported features, not enabled ones", did. That was the patch I misleadingly described as "from upstream", sorry.
Then last night, I switched to 2.6.37_rc8 which already contains your patch in comment #23. I also put the patch in comment #29 on top of it and dropped the "PCI: _OSC .." patch from Matthew Garrett.
But unfortunately a lot of machines broke while booting this kernel. I'll send the photo just after this comment.
Created attachment 41992 [details]
2.6.37_rc8+patch#29 ACPI trace
The crash seems to be caused by the patch from comment #29, which apparently
tries to parse the HEST table too early.
However, you appear to say that the patch from comment #29 on top of 2.6.36.y
works correctly. Is that also the case on machines that crash with
2.6.37-rc8 + the patch from comment #29 (I mean, if those machines are
booted with 2.6.36.y + patch from comment #29, do they boot correctly or
not)?
Can you please attach a dmesg output from vanilla 2.6.37-rc8 on one of the
machines that crash with the patch from comment #29 on top of that kernel?
No, I didn't say that 2.6.36.y + the patch from comment #29 booted correctly. As you said that the patch was against the top of the current mainline, I didn't even try to patch 2.6.36.y.
I'll send you the dmesg from vanilla 2.6.37_rc8-git1 tomorrow. Sorry I'm not an owner of this laptop so things are going slowly..
Created attachment 42132 [details]
PCI / ACPI: Request _OSC control once for each root bridge (v3)
In that case it's better if you test the attached patch when you have access
to the machine in question.
It is a replacement for the patch in comment #29 that should fix the problem
with HEST parsing attempted too early.
Created attachment 42312 [details]
dmesg of vanilla 2.6.37_rc8-git4
Okay, vanilla 2.6.37_rc8-git4 is still problematic. I've attached its dmesg.
Applying your v3 patch on top of it *seems* to fix the issue. I'm occasionally seeing a kworker in top with ~20-50% CPU usage, but at least it does not hog the CPU forever.
/sys/firmware/acpi/interrupts/gpe_all has stayed constant at 127 since boot and does not increase wildly over time.
Here's a diff between the vanilla and the patched dmesg's:
--- dmesg.vanilla 2011-01-04 10:13:47.694000464 +0200
+++ dmesg.patched 2011-01-04 10:13:58.577000494 +0200
@@ -247,6 +247,7 @@
ACPI: Power Resource [APPR] (off)
ACPI: Power Resource [LPP] (on)
ACPI: No dock devices found.
+HEST: Table not found.
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
pci_root PNP0A08:00: host bridge window [io 0x0000-0x0cf7]
@@ -405,7 +406,6 @@
ACPI: PCI Interrupt Link [LNKF] (IRQs 1 3 4 5 6 7 11 12 14 15) *10
ACPI: PCI Interrupt Link [LNKG] (IRQs 1 3 4 5 6 7 10 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 1 3 4 5 6 7 11 12 14 15) *0, disabled.
-HEST: Table is not found!
vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
SCSI subsystem initialized
@@ -635,39 +635,12 @@
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
-pcieport 0000:00:01.0: ACPI _OSC control granted for 0x1c
pcieport 0000:00:01.0: setting latency timer to 64
pcieport 0000:00:01.0: irq 40 for MSI/MSI-X
-pcieport 0000:00:1c.0: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.0: setting latency timer to 64
-pcieport 0000:00:1c.0: irq 41 for MSI/MSI-X
-pcieport 0000:00:1c.1: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.1: setting latency timer to 64
-pcieport 0000:00:1c.1: irq 42 for MSI/MSI-X
-pcieport 0000:00:1c.3: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.3: setting latency timer to 64
-pcieport 0000:00:1c.3: irq 43 for MSI/MSI-X
-pcieport 0000:00:1c.7: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.7: setting latency timer to 64
-pcieport 0000:00:1c.7: irq 44 for MSI/MSI-X
-pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
-pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
-pci 0000:01:00.1: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.0: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.0:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.1: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.1:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.3: Signaling PME through PCIe PME interrupt
-pci 0000:44:00.0: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.3:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.7: Signaling PME through PCIe PME interrupt
-pci 0000:45:00.0: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.7:pcie01: service driver pcie_pme loaded
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
pci-stub: invalid id string ""
@@ -676,35 +649,34 @@
ACPI: Power Button [PWRF]
ACPI: acpi_idle registered with cpuidle
Monitor-Mwait will be used to enter C-1 state
-Monitor-Mwait will be used to enter C-2 state
Monitor-Mwait will be used to enter C-3 state
thermal LNXTHERM:00: registered as thermal_zone0
OK, thanks for testing!
Apparently, with the patch from comment #44 _OSC is not executed on your
system, so it doesn't use native PCI Express services and that's why the
GPE storm doesn't appear any more (so the patch definitely helps).
Which appears to be fine, because your system doesn't support ASPM, as
indicated by the ACPI tables.
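To relate this to the vanilla dmesg above, the "control granted for 0x1c" mask can be decoded bit by bit. The sketch below assumes the bit assignments of the kernel's OSC_* control definitions (native hotplug, SHPC hotplug, PME, AER, capability structure, from bit 0 upward); the labels are informal and the mapping should be verified against include/linux/acpi.h in the tree being tested.

```python
from typing import List

# Assumed bit assignments for the _OSC control field; verify against the
# kernel headers before relying on them.
OSC_CONTROL_BITS = {
    0x01: "PCIe native hotplug",
    0x02: "SHPC native hotplug",
    0x04: "PCIe PME",
    0x08: "PCIe AER",
    0x10: "PCIe capability structure",
}


def decode_osc_control(mask: int) -> List[str]:
    """List the control features whose bits are set in `mask`."""
    return [name for bit, name in sorted(OSC_CONTROL_BITS.items()) if mask & bit]


granted = decode_osc_control(0x1C)
```

Under these assumptions 0x1c covers PME, AER and the capability structure, which fits the "Signaling PME through PCIe PME interrupt" lines that appear only in the vanilla log.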
Handled-By : Rafael J. Wysocki <firstname.lastname@example.org>
Patch : https://patchwork.kernel.org/patch/449231/
Thanks. BTW, if it is not too invasive for 2.6.36, it will be good to CC email@example.com.
Rafael, can you check the following screenshot? The user says that he gets this trace with 2.6.37 + your v3 patch. I'm not quite sure that he's booting the right kernel, but the trace seems to be a little different from the one caused by your v2 patch?
This is an entirely different bug.
It's the aer_service_init() code path that should be executed way after
acpi_pci_root_init() that calls acpi_hest_init() in the patch from
Apart from this, it looks like the user actually _has_ HEST.
Can you open a new bug entry for this one, please, and put the slide in there
along with (non-failing) boot log and the output of acpidump from the
merged in .38-rc1:
Author: Rafael J. Wysocki <firstname.lastname@example.org>
Date: Fri Jan 7 00:55:09 2011 +0100
PCI/ACPI: Request _OSC control once for each root bridge (v3)
Even though the patch fixes the initial problem, it reoccurs after a suspend to RAM / resume cycle.
On top of what kernel?
Linux ortwin-hp 2.6.37 #14 SMP PREEMPT Mon Feb 7 18:48:47 CET 2011 x86_64 Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz GenuineIntel GNU/Linux
I think you're seeing a different problem. Please file a separate bug for
it and put my address into the CC list.
Confirming unfortunately that the issue reappears after a suspend/resume cycle.
perf top after resume:
PerfTop: 1138 irqs/sec kernel:90.7% exact: 0.0% [1000Hz cycles], (all, 4 CPUs)
samples pcnt function DSO
_______ _____ ___________________________________ _________________________________
3873.00 33.6% __acpi_acquire_global_lock /lib/modules/2.6.37/build/vmlinux
1043.00 9.1% acpi_os_read_port /lib/modules/2.6.37/build/vmlinux
879.00 7.6% acpi_ns_search_one_scope /lib/modules/2.6.37/build/vmlinux
577.00 5.0% acpi_ns_lookup /lib/modules/2.6.37/build/vmlinux
474.00 4.1% acpi_ps_peek_opcode /lib/modules/2.6.37/build/vmlinux
367.00 3.2% acpi_ex_name_segment /lib/modules/2.6.37/build/vmlinux
324.00 2.8% __acpi_release_global_lock /lib/modules/2.6.37/build/vmlinux
303.00 2.6% acpi_ps_get_next_namestring /lib/modules/2.6.37/build/vmlinux
269.00 2.3% acpi_ex_system_memory_space_handler /lib/modules/2.6.37/build/vmlinux
216.00 1.9% pci_conf1_read /lib/modules/2.6.37/build/vmlinux
199.00 1.7% kmem_cache_free /lib/modules/2.6.37/build/vmlinux
188.00 1.6% __memset /lib/modules/2.6.37/build/vmlinux
181.00 1.6% acpi_ps_parse_loop /lib/modules/2.6.37/build/vmlinux
179.00 1.6% kmem_cache_alloc /lib/modules/2.6.37/build/vmlinux
139.00 1.2% acpi_os_write_port /lib/modules/2.6.37/build/vmlinux
120.00 1.0% acpi_ps_get_next_package_end /lib/modules/2.6.37/build/vmlinux
100.00 0.9% acpi_ex_get_name_string /lib/modules/2.6.37/build/vmlinux
90.00 0.8% add_preempt_count /lib/modules/2.6.37/build/vmlinux
83.00 0.7% acpi_ut_create_generic_state /lib/modules/2.6.37/build/vmlinux
78.00 0.7% _raw_spin_lock_irqsave /lib/modules/2.6.37/build/vmlinux
75.00 0.7% acpi_ps_get_opcode_info /lib/modules/2.6.37/build/vmlinux
54.00 0.5% acpi_ps_get_next_simple_arg /lib/modules/2.6.37/build/vmlinux
54.00 0.5% _raw_spin_unlock_irqrestore /lib/modules/2.6.37/build/vmlinux
53.00 0.5% acpi_ds_exec_end_op /lib/modules/2.6.37/build/vmlinux
50.00 0.4% kfree /lib/modules/2.6.37/build/vmlinux
50.00 0.4% acpi_ps_append_arg /lib/modules/2.6.37/build/vmlinux
46.00 0.4% acpi_ut_update_object_reference /lib/modules/2.6.37/build/vmlinux
44.00 0.4% sub_preempt_count /lib/modules/2.6.37/build/vmlinux
38.00 0.3% acpi_ex_extract_from_field /lib/modules/2.6.37/build/vmlinux
38.00 0.3% acpi_ds_exec_begin_op /lib/modules/2.6.37/build/vmlinux
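To summarize a listing like this, the percentages of all ACPI symbols can be added up. A minimal sketch, assuming rows have the form "samples pcnt% function DSO" and that ACPI code is identified by an `acpi_` prefix after any leading underscores; the sample is three rows copied from the output above, not the full table.

```python
def acpi_share(perf_top: str) -> float:
    """Sum the pcnt column over rows whose function name is an acpi_ symbol."""
    total = 0.0
    for line in perf_top.splitlines():
        fields = line.split()
        # Data rows look like: samples  pcnt%  function  DSO
        if len(fields) >= 3 and fields[1].endswith("%"):
            if fields[2].lstrip("_").startswith("acpi_"):
                total += float(fields[1].rstrip("%"))
    return total


SAMPLE = """\
samples pcnt function DSO
3873.00 33.6% __acpi_acquire_global_lock /lib/modules/2.6.37/build/vmlinux
1043.00 9.1% acpi_os_read_port /lib/modules/2.6.37/build/vmlinux
216.00 1.9% pci_conf1_read /lib/modules/2.6.37/build/vmlinux
"""

share = acpi_share(SAMPLE)
```

For the three sample rows the ACPI share is 33.6 + 9.1 = 42.7 percent; running it over the whole listing gives the figure for the full profile.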
Please open a new bug.
Or please let me know its number in case you've done it already.
I have now opened #29722
A patch referencing this bug report has been merged in v2.6.38-8569-g16c29da:
Author: Rafael J. Wysocki <email@example.com>
Date: Sat Mar 5 13:21:51 2011 +0100
PCI/ACPI: Report ASPM support to BIOS if not disabled from command line