Bug 20232 - kworker consumes ~100% CPU on HP Elitebook 8540w running 2.6.36_rc6-git4
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: All Linux
Importance: P1 normal
Assigned To: Rafael J. Wysocki
Depends on:
Blocks: 7216 16444
Reported: 2010-10-13 06:13 UTC by Ozan Caglayan
Modified: 2011-03-28 22:59 UTC (History)
CC List: 11 users

See Also:
Kernel Version: 2.6.36_rc6-git4
Tree: Mainline
Regression: Yes


Attachments
acpidump (597.05 KB, text/plain)
2010-10-13 09:06 UTC, Ozan Caglayan
dmesg of bad kernel (65.52 KB, text/plain)
2010-10-13 09:06 UTC, Ozan Caglayan
dmesg of good kernel 2.6.35.7 (60.80 KB, text/plain)
2010-10-13 13:03 UTC, Ozan Caglayan
dmesg of 2.6.36_rc8-git4 with pcie_ports=compat (67.51 KB, text/plain)
2010-10-18 07:00 UTC, Ozan Caglayan
PCI / Hotplug: Fix unexpected driver unregister in pciehp_acpi.c (1.09 KB, patch)
2010-12-11 01:24 UTC, Rafael J. Wysocki
PCI / ACPI: Request _OSC control once for each root bridge (6.52 KB, patch)
2010-12-29 00:53 UTC, Rafael J. Wysocki
Good dmesg 2.6.37_rc7_git4 (69.99 KB, text/plain)
2010-12-29 14:35 UTC, Ozan Caglayan
PCI / ACPI: Pass all _OSC support bits to the BIOS simultaneously (1.81 KB, patch)
2010-12-29 20:17 UTC, Rafael J. Wysocki
2.6.37_rc8+patch#29 ACPI trace (257.70 KB, image/jpeg)
2010-12-31 08:01 UTC, Ozan Caglayan
PCI / ACPI: Request _OSC control once for each root bridge (v3) (9.62 KB, patch)
2011-01-02 14:54 UTC, Rafael J. Wysocki
dmesg of vanilla 2.6.37_rc8-git4 (53.73 KB, text/plain)
2011-01-04 08:19 UTC, Ozan Caglayan

Description Ozan Caglayan 2010-10-13 06:13:38 UTC
I'm having a serious CPU hogging problem with an HP Elitebook 8540w running 2.6.36_rc6. A kworker consumes ~100% CPU for the entire uptime since boot.

I first tried rmmod'ing all the modules but it didn't help. Then I built the
kernel with some ACPI/PM verbose flags turned on and found that the following
message was being written to the kernel log endlessly:

scsi host1: __pm_runtime_resume() returns 1!

I then passed "on\n" to all power/control files in sysfs but the messages didn't
disappear. I disabled PM_RUNTIME to see if it was causing the kworker stuff, but
nope, it still continues.

Then I found a similar report and tried writing "disable" to /sys/firmware/acpi/interrupts/gpe01, which stopped the kworker CPU consumption problem *although the load average doesn't drop below ~1.3*. When the GPE is enabled, the interrupt count in /sys/firmware/acpi/interrupts/gpe01 increases very fast.
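
To put a number on "increases very fast", the counter can be sampled twice and turned into a rate. The following is only a rough sketch: the path is the one from this report, but the file layout (an event count followed by status words such as "enabled") is an assumption, and the function names are made up for illustration.

```python
import time
from pathlib import Path
from typing import Tuple

# Path taken from the report; the "<count> <status>" layout is an assumption.
GPE_PATH = Path("/sys/firmware/acpi/interrupts/gpe01")


def parse_gpe(text: str) -> Tuple[int, str]:
    """Split a counter line such as '  1500   enabled' into (count, status)."""
    fields = text.split()
    return int(fields[0]), " ".join(fields[1:])


def events_per_second(before: int, after: int, seconds: float) -> float:
    """Average number of GPE events per second between two samples."""
    return (after - before) / seconds


def sample_rate(path: Path = GPE_PATH, seconds: float = 1.0) -> float:
    """Read the counter twice, `seconds` apart, and return the event rate."""
    before, _ = parse_gpe(path.read_text())
    time.sleep(seconds)
    after, _ = parse_gpe(path.read_text())
    return events_per_second(before, after, seconds)


def disable_gpe(path: Path = GPE_PATH) -> None:
    """Mask the GPE by writing "disable", as done in the report (needs root)."""
    path.write_text("disable")
```

Calling `sample_rate()` on the affected machine should give a rough events-per-second figure; on a healthy system the rate would be expected to stay close to zero.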

The problem is not fixed with the pcie_pme=off trick suggested in the other bug report related to this laptop.

(CC'ing İbrahim, the owner of the laptop)

Thanks
Comment 1 Ozan Caglayan 2010-10-13 06:15:05 UTC
BTW, 2.6.32.24 doesn't have this symptom. We didn't have much time to bisect the issue but if we can't find out the cause without bisecting, I'll try to bisect too.
Comment 2 Zhang Rui 2010-10-13 06:52:34 UTC
(In reply to comment #0)
> I'm having a serious CPU hogging problem with an HP Elitebook 8540w running
> 2.6.36_rc6. A kworker consumes ~100% CPU for the entire uptime since boot.
> 
what's the latest good kernel?
what's the earliest bad kernel?

> 
> Then I found a similar report and tried writing "disable" to
> /sys/firmware/acpi/interrupts/gpe01 and it stopped the kworker CPU consumption
> problem *although the load average doesn't drop under ~1.3*. When enabled the
> number of interrupts in /sys/firmware/acpi/interrupts/gpe01 increases very
> fast.
> 
> The problem is not fixed with the pcie_pme=off trick suggested in the other bug
> report related to this laptop.
>
hmm, this seems to be an ACPI interrupt storm.
please attach the acpidump of this laptop.
please attach the dmesg output after boot for both the good and the bad kernel.
Comment 3 Ozan Caglayan 2010-10-13 09:06:01 UTC
Created attachment 33442
acpidump
Comment 4 Ozan Caglayan 2010-10-13 09:06:31 UTC
Created attachment 33452
dmesg of bad kernel
Comment 5 Ozan Caglayan 2010-10-13 09:07:03 UTC
I'll try to post the earliest bad and the latest good kernel as soon as possible.
Comment 6 Ozan Caglayan 2010-10-13 13:03:23 UTC
Okay, 2.6.35.7 is good and 2.6.36_rc1 is bad. I'm attaching the good one's dmesg. It seems this was broken by a commit during the merge window.
Comment 7 Ozan Caglayan 2010-10-13 13:03:46 UTC
Created attachment 33472
dmesg of good kernel 2.6.35.7
Comment 8 Rafael J. Wysocki 2010-10-13 20:00:20 UTC
Can you test 2.6.36-rc7 with pcie_ports=compat, please?
Comment 9 Ozan Caglayan 2010-10-14 09:07:27 UTC
// Add me to CC

OK will try that and post the result here.
Comment 10 Ozan Caglayan 2010-10-15 09:02:13 UTC
I didn't have time to rebuild rc8, but I tried that parameter on rc6 and it fixed the kworker issue. However, even on a completely idle system the load average doesn't drop below ~1.09. I'll post some details later.
Comment 11 Ozan Caglayan 2010-10-18 07:00:10 UTC
Created attachment 33912
dmesg of 2.6.36_rc8-git4 with pcie_ports=compat
Comment 12 Ozan Caglayan 2010-10-18 07:17:42 UTC
OK, pcie_ports=compat still works under 2.6.36_rc8-git4, but there's an "unexpected driver unregister!" backtrace in the dmesg that I've just attached. Also, the load average is still above 1; I don't know whether that's related.
Comment 13 Len Brown 2010-10-19 01:44:50 UTC
so with pcie_ports=compat, the interrupt storm goes away,
as indicated by the acpi interrupt in /proc/interrupts and
the gpe in /sys/firmware/acpi/interrupts no longer incrementing
quickly, but you still have something running on the cpu 100% of the time?
what does top(1) show?
Comment 14 Florian Mickler 2010-10-19 07:11:19 UTC
There is a skew in the loadaverage due to a commit introduced in 2.6.36:

commit 74f5187ac873042f502227701ed1727e7c5fbfa9
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>  2010-04-22 21:50:19
Committer: Ingo Molnar <mingo@elte.hu>  2010-04-23 11:02:02

    sched: Cure load average vs NO_HZ woes


Check https://bugzilla.kernel.org/show_bug.cgi?id=16525 and the
referenced threads for more details on that. 

It is not clear to me whether that skew is accompanied by any harmful symptoms, though...

Regards,
Flo

p.s.: for convenience, I post the backtrace mentioned in comment #12:

[    1.120742] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    1.120744] ------------[ cut here ]------------
[    1.120749] WARNING: at drivers/base/driver.c:262 driver_unregister+0x36/0x6f()
[    1.120751] Hardware name: HP EliteBook 8540w
[    1.120752] Unexpected driver unregister!
[    1.120753] Modules linked in:
[    1.120756] Pid: 1, comm: swapper Not tainted 2.6.36_rc8-143 #1
[    1.120757] Call Trace:
[    1.120762]  [<ffffffff81045ec4>] warn_slowpath_common+0x80/0x98
[    1.120765]  [<ffffffff81045f70>] warn_slowpath_fmt+0x41/0x43
[    1.120769]  [<ffffffff81b7666d>] ? pci_hotplug_init+0x0/0x4e
[    1.120772]  [<ffffffff812c0604>] driver_unregister+0x36/0x6f
[    1.120775]  [<ffffffff8123939c>] pcie_port_service_unregister+0xd/0xf
[    1.120777]  [<ffffffff81b76914>] pciehp_acpi_slot_detection_init+0x96/0x132
[    1.120780]  [<ffffffff81b766c9>] ? pcied_init+0x0/0x79
[    1.120782]  [<ffffffff81b766d7>] pcied_init+0xe/0x79
[    1.120786]  [<ffffffff81000348>] do_one_initcall+0x7a/0x132
[    1.120789]  [<ffffffff81b4fd54>] kernel_init+0x17d/0x20b
[    1.120792]  [<ffffffff81003924>] kernel_thread_helper+0x4/0x10
[    1.120794]  [<ffffffff81b4fbd7>] ? kernel_init+0x0/0x20b
[    1.120796]  [<ffffffff81003920>] ? kernel_thread_helper+0x0/0x10
[    1.120803] ---[ end trace 6d450e935ee1897c ]---
Comment 15 Ozan Caglayan 2010-10-19 22:31:31 UTC
Well, yes, the storm and the kworker that hogs the CPU go away with pcie_ports=compat. But even on a basic console login that stays idle for hours, the load average always stays above 1. The output of top doesn't show anything surprising: no task uses the CPU excessively. I don't know why the load average doesn't converge to 0.
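
For background on what a load average "stuck" near 1 means numerically, here is a small sketch of the exponentially damped average that the kernel uses for the 1-minute figure. The fixed-point constants (FSHIFT, FIXED_1, EXP_1) are quoted from memory of the kernel sources of that era and should be treated as assumptions; the sketch says nothing about why the count is skewed on this machine (see the commit mentioned in comment #14 for that).

```python
FSHIFT = 11              # bits of fixed-point precision (assumed kernel value)
FIXED_1 = 1 << FSHIFT    # 1.0 in fixed point
EXP_1 = 1884             # 1/exp(5s/1min) in fixed point (assumed kernel value)


def calc_load(load: int, exp: int, active: int) -> int:
    """One update step: blend the old average with the current active count."""
    load *= exp
    load += active * (FIXED_1 - exp)
    return load >> FSHIFT


def simulate(active_tasks: int, samples: int) -> float:
    """1-minute load average after `samples` updates with a constant task count."""
    load = 0
    for _ in range(samples):
        load = calc_load(load, EXP_1, active_tasks * FIXED_1)
    return load / FIXED_1
```

With one task counted as active at every sample, `simulate(1, 1000)` settles just below 1.0, and with none it stays at 0. A reading that never drops below 1 on an idle system is therefore consistent with something being counted as active at each sample even though top shows nothing busy.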
Comment 16 Francisco Vazquez 2010-10-24 09:04:23 UTC
I'm experiencing the same problem in my laptop. Two or four kworker processes constantly at the top of top, consuming between 1 and ~20% cpu, load average of ~1. The trackpad is unusably jerky in X. pcie_ports=compad didn't change anything.

Back to 2.6.34.7 for now (with 2.6.35.x I had a similar problem: several kslowd00x processes hogging my cpu and making my trackpad jerky. At least they don't appear with 2.6.36...).
Comment 17 Rafael J. Wysocki 2010-10-24 14:00:21 UTC
(In reply to comment #16)
> I'm experiencing the same problem in my laptop. Two or four kworker processes
> constantly at the top of top, consuming between 1 and ~20% cpu, load average of
> ~1. The trackpad is unusably jerky in X. pcie_ports=compad didn't change
> anything.

That's pcie_ports=compat, not pcie_ports=compad.  If the latter is what you have
tested, please retest and report back.

If pcie_ports=compat doesn't help on your machine, the problem you're seeing
is certainly different.  In that case, please file a separate bug report
for that issue.
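
A misspelled value is easy to miss, so it may help to check what the running kernel was actually booted with. The sketch below parses a command line string (for example the contents of /proc/cmdline) and reports the pcie_ports= value. The set of recognised values (auto, native, compat) is an assumption based on the kernel parameter documentation as I remember it, and the function names are hypothetical.

```python
from typing import Optional

# Values believed to be accepted by pcie_ports= (assumption, not from this report).
KNOWN_PCIE_PORTS = {"auto", "native", "compat"}


def pcie_ports_setting(cmdline: str) -> Optional[str]:
    """Return the value given to pcie_ports= on the command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pcie_ports" and sep:
            return value
    return None


def is_recognised(value: str) -> bool:
    """True if the value is one of the assumed known settings."""
    return value in KNOWN_PCIE_PORTS
```

For example, `pcie_ports_setting(open("/proc/cmdline").read())` returning "compad" would show that the workaround was never in effect during that test.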
Comment 18 Zhang Rui 2010-11-08 02:28:31 UTC
so what's the status of this bug? :)
Comment 19 Ozan Caglayan 2010-11-10 06:33:35 UTC
The problem still occurs with 2.6.36. pcie_ports=compat still fixes the issue, but the backtrace is still there.
Comment 20 Ozan Caglayan 2010-12-01 16:57:50 UTC
any suggestions?
Comment 21 Florian Mickler 2010-12-01 17:53:05 UTC
The load average is unrelated to this bug. Check the patch in bug #16525 for that. 

Len, that means: (Ozan, please correct me if I'm wrong):

(In reply to comment #13)
> so with pcie_ports=compat, the interrupt storm goes away,
> as indicated by the acpi interrupt in /proc/interrupts and
> the gpe in /sys/firmware/acpi/interrupts no longer incrementing
> quickly, but you still have something running on the cpu 100% of the time?
> what does top(1) show?

Answer: With pcie_ports=compat the kworker @100%cpu goes away, and everything is fine.
Comment 22 Rafael J. Wysocki 2010-12-01 20:37:20 UTC
Not everything, the backtrace is still there, that needs to be fixed.

I'll take care of this shortly.
Comment 23 Rafael J. Wysocki 2010-12-11 01:24:51 UTC
Created attachment 39752
PCI / Hotplug: Fix unexpected driver unregister in pciehp_acpi.c

This patch should fix the warning in comment #14, please verify.
Comment 24 Ozan Caglayan 2010-12-19 13:10:54 UTC
Okay, I'll try the patch ASAP, but will users of this laptop have to pass pcie_ports=compat explicitly to fix the issue? If yes, this is bad. If there's some sort of DMI quirk list that will be patched, that's reasonable.
Comment 25 Rafael J. Wysocki 2010-12-19 13:33:39 UTC
We're hoping to have a better fix than a DMI quirk, but not in 2.6.37,
so please use the command line workaround for now.
Comment 26 Rafael J. Wysocki 2010-12-23 23:45:51 UTC
Ozan, can you please send the output of "ls /sys/bus/pci/drivers" and
"ls /sys/bus/pci_express/drivers" ?
Comment 27 Rafael J. Wysocki 2010-12-24 00:12:55 UTC
Sorry, not this information.  The output of "ls /sys/bus/pci/slots/".
Comment 28 Rafael J. Wysocki 2010-12-24 00:15:34 UTC
Also please rmmod the pciehp module and modprobe acpiphp module instead.
Please check if the problem is reproducible with that in place.
Comment 29 Rafael J. Wysocki 2010-12-29 00:53:22 UTC
Created attachment 41832
PCI / ACPI: Request _OSC control once for each root bridge

The attached patch may help, so please test it.

If it doesn't help, please send the output of "dmesg | grep _OSC" generated
right after a fresh boot.
Comment 30 Ozan Caglayan 2010-12-29 06:59:05 UTC
On 2.6.36.1:

/sys/bus/pci/slots is empty.
/sys/bus/pci_express/drivers contains
 pciehp
 pci_pme
 aer

As pciehp is built into the kernel image, I could not find any way to prevent it from loading, so I'll need time to recompile and try what you've suggested in #28 and #29.
Comment 31 Rafael J. Wysocki 2010-12-29 10:02:23 UTC
The patch from comment #29 is on top of the current mainline (2.6.37-rc8 at the
moment).
Comment 32 Ozan Caglayan 2010-12-29 10:19:33 UTC
Ok I'll try with that kernel.
Comment 33 Ozan Caglayan 2010-12-29 14:35:52 UTC
Created attachment 41872
Good dmesg 2.6.37_rc7_git4
Comment 34 Ozan Caglayan 2010-12-29 14:37:25 UTC
Well, I tried 2.6.37_rc7-git4 plus the patch in #23 and did a normal reboot, i.e. without pcie_ports=compat, and the issue seems to be fixed. The new dmesg is attached.

Let's keep the bug report open until 2.6.37 gets released and I'll close this as fixed if 2.6.37 works OK.

Sorry for being late to switch to 2.6.37_rc*.
Comment 35 Rafael J. Wysocki 2010-12-29 14:59:09 UTC
Do you mean that 2.6.37-rc7-git4 with the patch from comment #23 works for
you without pcie_ports=compat and without the patch from comment #29 ?

If so, 2.6.37-rc8 should work for you too (it contains the patch from
comment #23).  Please confirm.
Comment 36 Ozan Caglayan 2010-12-29 15:02:14 UTC
Yes exactly.

But,

What I tried as 2.6.37_rc7-git4 + the patch in comment #23 was not vanilla at all. It carries a patch from upstream that seems related to the issue, so maybe it was this commit that fixed the issue:

commit 885c252ffb059dc493200bdb981bdd21cabe4442
Author: Matthew Garrett <mjg@redhat.com>
Date:   Thu Dec 9 18:31:59 2010 -0500

    PCI: _OSC "supported" field should contain supported features, not enabled ones

    From testing with Windows, the call to the PCI root _OSC method includes
    the full set of features supported by the operating system even if the
    hardware has already indicated that it doesn't support ASPM or MSI.
    https://bugzilla.redhat.com/show_bug.cgi?id=638912 is a case where making
    the _OSC call will incorrectly configure the chipset unless the supported
    field has bits 1, 2 and 4 set. Rework the functionality to ensure that
    we match this behaviour.

Anyway, I'll try a vanilla 2.6.37_rc8 with and without the patch in comment #29 to see the outcome.
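
The commit message quoted above refers to "bits 1, 2 and 4" of the _OSC "supported" field. As a quick illustration of what mask that corresponds to, here is a small sketch. The bit names are my reading of the PCI firmware interface and do not come from this report (the commit message itself only mentions ASPM and MSI), so treat them as assumptions.

```python
from typing import Iterable, List

# Bit positions in the _OSC "supported" dword; names are assumptions.
OSC_SUPPORT_BITS = {
    0: "extended PCI config operation regions",
    1: "ASPM",
    2: "clock power management",
    3: "PCI segment groups",
    4: "MSI",
}


def mask_from_bits(bits: Iterable[int]) -> int:
    """Build the integer mask that has exactly the given bit positions set."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def describe(mask: int) -> List[str]:
    """List the (assumed) feature names whose bits are set in the mask."""
    return [name for bit, name in sorted(OSC_SUPPORT_BITS.items())
            if mask & (1 << bit)]
```

Under these assumptions, `mask_from_bits([1, 2, 4])` is 0x16, which covers ASPM, clock power management and MSI.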
Comment 37 Rafael J. Wysocki 2010-12-29 19:01:15 UTC
First, what do you mean by "upstream"?

Second, if the "PCI: _OSC "supported" field should contain supported features,
not enabled ones" patch helps, the patch from comment #29 probably won't help.

I'll attach a patch on top of the one from comment #29 that may help.
Comment 38 Rafael J. Wysocki 2010-12-29 20:17:46 UTC
Created attachment 41892
PCI / ACPI: Pass all _OSC support bits to the BIOS simultaneously

Patch to test on top of the patch from comment #29.
Comment 39 Ozan Caglayan 2010-12-31 07:55:39 UTC
Okay, I'll try to summarize what's going on, as I think I've caused a little bit of confusion:

- 2.6.36.x is still showing the issue on those laptops
- The problem goes away on 2.6.36.x with pcie_ports=compat, but this gives a backtrace while unregistering a driver (a patch to fix it is available in comment #23)
- A complete solution is offered within the patch in comment #29

Then I tried 2.6.37_rc7-git4 with the patch in comment #23 to see at least if the backtrace is fixed when booting with pcie_ports=compat.

A plain reboot (with no pcie_ports=compat) cured the kworker issue.

Either the switch to 2.6.37_rc* cured the issue, or it was the patch I took from Fedora f-15 entitled "PCI: _OSC "supported" field should contain supported features, not enabled ones". That was the patch I misleadingly described as "from upstream", sorry.

Then last night, I switched to 2.6.37_rc8 which already contains your patch in comment #23. I also put the patch in comment #29 on top of it and dropped the "PCI: _OSC .." patch from Matthew Garrett.

But unfortunately a lot of machines broke while booting this kernel. I'll send the photo just after this comment.
Comment 40 Ozan Caglayan 2010-12-31 08:01:57 UTC
Created attachment 41992
2.6.37_rc8+patch#29 ACPI trace
Comment 41 Rafael J. Wysocki 2011-01-02 00:26:34 UTC
The crash seems to be caused by the patch from comment #29, which apparently
tries to parse the HEST table too early.

However, you appear to say that the patch from comment #29 on top of 2.6.36.y
works correctly.  Is that also the case on machines that crash with
2.6.37-rc8 + the patch from comment #29 (I mean, if those machines are
booted with 2.6.36.y + patch from comment #29, do they boot correctly or
crash)?
Comment 42 Rafael J. Wysocki 2011-01-02 00:37:09 UTC
Can you please attach a dmesg output from vanilla 2.6.37-rc8 on one of the
machines that crash with the patch from comment #29 on top of that kernel?
Comment 43 Ozan Caglayan 2011-01-02 09:32:38 UTC
No, I didn't say that 2.6.36.y + the patch from comment #29 booted correctly. You said the patch was against the top of the current mainline, so I didn't even try to patch 2.6.36.y.

I'll send you the dmesg from vanilla 2.6.37_rc8-git1 tomorrow. Sorry, I'm not the owner of this laptop, so things are going slowly.
Comment 44 Rafael J. Wysocki 2011-01-02 14:54:43 UTC
Created attachment 42132
PCI / ACPI: Request _OSC control once for each root bridge (v3)

In that case it's better if you test the attached patch when you have access
to the machine in question.

It is a replacement for the patch in comment #29 that should fix the problem
with HEST parsing attempted too early.
Comment 45 Ozan Caglayan 2011-01-04 08:19:31 UTC
Created attachment 42312
dmesg of vanilla 2.6.37_rc8-git4
Comment 46 Ozan Caglayan 2011-01-04 08:24:00 UTC
Okay, vanilla 2.6.37_rc8-git4 is still problematic. I've attached its dmesg.

Applying your v3 patch on top of it *seems* to fix the issue. I occasionally see a kworker in top with ~20-50% CPU usage, but at least it does not hog the CPU indefinitely.

/sys/firmware/acpi/interrupts/gpe_all has stayed constant at 127 since boot and no longer increments wildly over time.

Here's a diff between the vanilla and the patched dmesg's:

--- dmesg.vanilla       2011-01-04 10:13:47.694000464 +0200
+++ dmesg.patched       2011-01-04 10:13:58.577000494 +0200
@@ -247,6 +247,7 @@
 ACPI: Power Resource [APPR] (off)
 ACPI: Power Resource [LPP] (on)
 ACPI: No dock devices found.
+HEST: Table not found.
 PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
 ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
 pci_root PNP0A08:00: host bridge window [io  0x0000-0x0cf7]
@@ -405,7 +406,6 @@
 ACPI: PCI Interrupt Link [LNKF] (IRQs 1 3 4 5 6 7 11 12 14 15) *10
 ACPI: PCI Interrupt Link [LNKG] (IRQs 1 3 4 5 6 7 10 12 14 15) *0, disabled.
 ACPI: PCI Interrupt Link [LNKH] (IRQs 1 3 4 5 6 7 11 12 14 15) *0, disabled.
-HEST: Table is not found!
 vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
 vgaarb: loaded
 SCSI subsystem initialized
@@ -635,39 +635,12 @@
 io scheduler noop registered
 io scheduler deadline registered
 io scheduler cfq registered (default)
-pcieport 0000:00:01.0: ACPI _OSC control granted for 0x1c
 pcieport 0000:00:01.0: setting latency timer to 64
 pcieport 0000:00:01.0: irq 40 for MSI/MSI-X
-pcieport 0000:00:1c.0: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.0: setting latency timer to 64
-pcieport 0000:00:1c.0: irq 41 for MSI/MSI-X
-pcieport 0000:00:1c.1: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.1: setting latency timer to 64
-pcieport 0000:00:1c.1: irq 42 for MSI/MSI-X
-pcieport 0000:00:1c.3: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.3: setting latency timer to 64
-pcieport 0000:00:1c.3: irq 43 for MSI/MSI-X
-pcieport 0000:00:1c.7: ACPI _OSC control granted for 0x1c
-pcieport 0000:00:1c.7: setting latency timer to 64
-pcieport 0000:00:1c.7: irq 44 for MSI/MSI-X
-pcieport 0000:00:01.0: Signaling PME through PCIe PME interrupt
-pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
-pci 0000:01:00.1: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:01.0:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.0: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.0:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.1: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.1:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.3: Signaling PME through PCIe PME interrupt
-pci 0000:44:00.0: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.3:pcie01: service driver pcie_pme loaded
-pcieport 0000:00:1c.7: Signaling PME through PCIe PME interrupt
-pci 0000:45:00.0: Signaling PME through PCIe PME interrupt
-pcie_pme 0000:00:1c.7:pcie01: service driver pcie_pme loaded
 pci_hotplug: PCI Hot Plug PCI Core version: 0.5
 pciehp: PCI Express Hot Plug Controller Driver version: 0.4
 pci-stub: invalid id string ""
@@ -676,35 +649,34 @@
 ACPI: Power Button [PWRF]
 ACPI: acpi_idle registered with cpuidle
 Monitor-Mwait will be used to enter C-1 state
-Monitor-Mwait will be used to enter C-2 state
 Monitor-Mwait will be used to enter C-3 state
 thermal LNXTHERM:00: registered as thermal_zone0
Comment 47 Rafael J. Wysocki 2011-01-04 22:43:43 UTC
OK, thanks for testing!

Apparently, with the patch from comment #44 _OSC is not executed on your
system, so it doesn't use native PCI Express services and that's why the
GPE storm doesn't appear any more (so the patch definitely helps).

Which appears to be fine, because your system doesn't support ASPM, as
indicated by the ACPI tables.
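
The vanilla dmesg in comment #46 shows "ACPI _OSC control granted for 0x1c" for each port, and those lines are gone with the patch. To see which native services that mask stands for, the value can be decoded as sketched below. The bit names are assumptions drawn from my understanding of the _OSC control field, not from this report, and the regular expression only targets the exact message wording seen in the dmesg above.

```python
import re
from typing import Dict, Iterable, List

# Bits of the _OSC "control" dword; names are assumptions, not from the report.
OSC_CONTROL_BITS = {
    0x01: "PCIe native hotplug",
    0x02: "SHPC native hotplug",
    0x04: "PME",
    0x08: "AER",
    0x10: "PCIe capability structure",
}

# Matches e.g. "pcieport 0000:00:01.0: ACPI _OSC control granted for 0x1c".
GRANTED = re.compile(r"(\S+): ACPI _OSC control granted for (0x[0-9a-fA-F]+)")


def decode_control(mask: int) -> List[str]:
    """Names of the control bits set in the mask, lowest bit first."""
    return [name for bit, name in sorted(OSC_CONTROL_BITS.items())
            if mask & bit]


def granted_controls(dmesg_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each device that logged a grant to its decoded control list."""
    result = {}
    for line in dmesg_lines:
        match = GRANTED.search(line)
        if match:
            result[match.group(1)] = decode_control(int(match.group(2), 16))
    return result
```

Under these assumptions, 0x1c decodes to PME, AER and the PCIe capability structure, which fits the "Signaling PME through PCIe PME interrupt" lines that appear only in the vanilla log.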

Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch : https://patchwork.kernel.org/patch/449231/
Comment 48 Ozan Caglayan 2011-01-05 06:47:00 UTC
Thanks. BTW, if it is not too invasive for 2.6.36, it would be good to CC stable@kernel.org.
Comment 49 Ozan Caglayan 2011-01-11 09:58:35 UTC
Rafael, can you check the following screenshot? The user says he gets this trace with 2.6.37 + your v3 patch. I'm not quite sure he's booting the right kernel, but the trace seems to be a little different from the one caused by your v2 patch.

http://bugs.pardus.org.tr/attachment.cgi?id=6374

Thanks,
Comment 50 Rafael J. Wysocki 2011-01-11 19:52:07 UTC
This is an entirely different bug.

It's the aer_service_init() code path that should be executed way after
acpi_pci_root_init() that calls acpi_hest_init() in the patch from
comment #47.

Apart from this, it looks like the user actually _has_ HEST.

Can you open a new bug entry for this one, please, and put the slide in there
along with (non-failing) boot log and the output of acpidump from the
affected machine?
Comment 51 Florian Mickler 2011-01-22 12:36:00 UTC
merged in .38-rc1:

commit 415e12b2379239973feab91850b0dce985c6058a
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Fri Jan 7 00:55:09 2011 +0100

    PCI/ACPI: Request _OSC control once for each root bridge (v3)
Comment 52 Ortwin Glück 2011-02-10 08:10:52 UTC
Even though the patch fixes the initial problem, it reoccurs after a suspend to RAM / resume cycle.
Comment 53 Rafael J. Wysocki 2011-02-10 15:00:35 UTC
On top of what kernel?
Comment 54 Ortwin Glück 2011-02-10 15:11:11 UTC
Linux ortwin-hp 2.6.37 #14 SMP PREEMPT Mon Feb 7 18:48:47 CET 2011 x86_64 Intel(R) Core(TM) i7 CPU M 620 @ 2.67GHz GenuineIntel GNU/Linux
Comment 55 Rafael J. Wysocki 2011-02-10 15:18:33 UTC
I think you're seeing a different problem.  Please file a separate bug for
it and put my address into the CC list.
Comment 56 Ozan Caglayan 2011-02-11 06:20:01 UTC
Unfortunately, I can confirm that the issue reappears after a suspend/resume cycle.
Comment 57 Ortwin Glück 2011-02-11 07:56:50 UTC
perf top after resume:
-------------------------------------------------------------------------------------------------------------------
   PerfTop:    1138 irqs/sec  kernel:90.7%  exact:  0.0% [1000Hz cycles],  (all, 4 CPUs)
-------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                            DSO
             _______ _____ ___________________________________ _________________________________

             3873.00 33.6% __acpi_acquire_global_lock          /lib/modules/2.6.37/build/vmlinux
             1043.00  9.1% acpi_os_read_port                   /lib/modules/2.6.37/build/vmlinux
              879.00  7.6% acpi_ns_search_one_scope            /lib/modules/2.6.37/build/vmlinux
              577.00  5.0% acpi_ns_lookup                      /lib/modules/2.6.37/build/vmlinux
              474.00  4.1% acpi_ps_peek_opcode                 /lib/modules/2.6.37/build/vmlinux
              367.00  3.2% acpi_ex_name_segment                /lib/modules/2.6.37/build/vmlinux
              324.00  2.8% __acpi_release_global_lock          /lib/modules/2.6.37/build/vmlinux
              303.00  2.6% acpi_ps_get_next_namestring         /lib/modules/2.6.37/build/vmlinux
              269.00  2.3% acpi_ex_system_memory_space_handler /lib/modules/2.6.37/build/vmlinux
              216.00  1.9% pci_conf1_read                      /lib/modules/2.6.37/build/vmlinux
              199.00  1.7% kmem_cache_free                     /lib/modules/2.6.37/build/vmlinux
              188.00  1.6% __memset                            /lib/modules/2.6.37/build/vmlinux
              181.00  1.6% acpi_ps_parse_loop                  /lib/modules/2.6.37/build/vmlinux
              179.00  1.6% kmem_cache_alloc                    /lib/modules/2.6.37/build/vmlinux
              139.00  1.2% acpi_os_write_port                  /lib/modules/2.6.37/build/vmlinux
              120.00  1.0% acpi_ps_get_next_package_end        /lib/modules/2.6.37/build/vmlinux
              100.00  0.9% acpi_ex_get_name_string             /lib/modules/2.6.37/build/vmlinux
               90.00  0.8% add_preempt_count                   /lib/modules/2.6.37/build/vmlinux
               83.00  0.7% acpi_ut_create_generic_state        /lib/modules/2.6.37/build/vmlinux
               78.00  0.7% _raw_spin_lock_irqsave              /lib/modules/2.6.37/build/vmlinux
               75.00  0.7% acpi_ps_get_opcode_info             /lib/modules/2.6.37/build/vmlinux
               54.00  0.5% acpi_ps_get_next_simple_arg         /lib/modules/2.6.37/build/vmlinux
               54.00  0.5% _raw_spin_unlock_irqrestore         /lib/modules/2.6.37/build/vmlinux
               53.00  0.5% acpi_ds_exec_end_op                 /lib/modules/2.6.37/build/vmlinux
               50.00  0.4% kfree                               /lib/modules/2.6.37/build/vmlinux
               50.00  0.4% acpi_ps_append_arg                  /lib/modules/2.6.37/build/vmlinux
               46.00  0.4% acpi_ut_update_object_reference     /lib/modules/2.6.37/build/vmlinux
               44.00  0.4% sub_preempt_count                   /lib/modules/2.6.37/build/vmlinux
               38.00  0.3% acpi_ex_extract_from_field          /lib/modules/2.6.37/build/vmlinux
               38.00  0.3% acpi_ds_exec_begin_op               /lib/modules/2.6.37/build/vmlinux
Comment 58 Rafael J. Wysocki 2011-02-11 19:04:54 UTC
Please open a new bug.
Comment 59 Rafael J. Wysocki 2011-02-22 22:57:08 UTC
Or please let me know its number in case you've done it already.
Comment 60 Ortwin Glück 2011-02-23 11:51:29 UTC
I have now opened #29722
Comment 61 Florian Mickler 2011-03-28 22:59:11 UTC
A patch referencing this bug report has been merged in v2.6.38-8569-g16c29da:

commit 8b8bae901ce23addbdcdb54fa1696fb2d049feb5
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Sat Mar 5 13:21:51 2011 +0100

    PCI/ACPI: Report ASPM support to BIOS if not disabled from command line
