Bug 12530 - ACPI Exceptions and EC GPE storm - Apple MacBook Pro 1,1
Summary: ACPI Exceptions and EC GPE storm - Apple MacBook Pro 1,1
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P1 high
Assignee: Alexey Starikovskiy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-23 16:49 UTC by Javier Marcet
Modified: 2009-08-13 03:06 UTC

See Also:
Kernel Version: 2.6.28.1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg of my current system, still going on after a GPE storm (41.77 KB, text/plain)
2009-01-23 16:56 UTC, Javier Marcet

Description Javier Marcet 2009-01-23 16:49:12 UTC
Latest working kernel version: 2.6.24

Earliest failing kernel version: 2.6.25

Distribution: Gentoo

Hardware Environment: Apple MacBook Pro 1,1 15"

Software Environment: 
Linux-2.6.28.1-i686-Genuine_Intel-R-_CPU_T2600_@_2.16GHz-with-glibc2.0
binutils: 2.19
gcc: 4.3.2
glibc: 2.9_p20081201
libtool: 2.2.6a
os-headers: 2.6.27-r2
xf86-video-ati: 6.10.0
xorg-server: 1.5.3

Problem Description:

Up to 2.6.24 I compiled my kernels with MMConfig as PCI access mode and everything worked, with perfect stability. I could have uptimes of several days/weeks with various suspend/resume cycles.

After 2.6.24 (I'm not 100% sure which version was the first to fail), I began getting ACPI Exceptions during boot, followed later by a GPE Storm and eventually a system freeze.

I discovered that compiling the kernel with Any as PCI access mode got rid of the ACPI Exceptions, although the GPE Storms still happened.

Right now, with 2.6.28.1, the GPE Storm takes longer to appear, but it still does. After that the system otherwise keeps working fine, but it can freeze at any time.


Compiled with Any as PCI access mode I see this during boot:

[    0.156983] ACPI: bus type pci registered
[    0.160012] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[    0.160128] PCI: MCFG area at e0000000 reserved in E820
[    0.160241] PCI: Using MMCONFIG for extended config space
[    0.160353] PCI: Using configuration type 1 for base access
[    0.160584] ACPI: EC: EC description table is found, configuring boot EC
[    0.160806] ACPI: EC: non-query interrupt received, switching to interrupt mode
[    0.164476] ACPI: BIOS _OSI(Linux) query ignored via DMI
[    0.166883] ACPI: Interpreter enabled
[    0.166997] ACPI: (supports S0 S3 S4 S5)
[    0.167527] ACPI: Using IOAPIC for interrupt routing
[    0.183448] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    0.183515] ACPI: EC: driver started in interrupt mode
[    0.183638] ACPI: No dock devices found.
[    0.186678] ACPI: PCI Root Bridge [PCI0] (0000:00)

Whereas with MMConfig:

[    0.156986] ACPI: bus type pci registered
[    0.156986] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[    0.160002] PCI: Not using MMCONFIG.
[    0.160113] PCI: Fatal: No config space access function found
[    0.160584] ACPI: EC: EC description table is found, configuring boot EC
[    0.160803] ACPI: EC: non-query interrupt received, switching to interrupt mode
[    0.164466] ACPI: BIOS _OSI(Linux) query ignored
[    0.166888] ACPI: Interpreter enabled
[    0.167003] ACPI: (supports S0 S3 S4 S5)
[    0.167531] ACPI: Using IOAPIC for interrupt routing
[    0.167681] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255
[    0.168027] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.170002] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKA._STA] (Node f7011ba0), AE_ERROR
[    0.170451] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKA._STA] (Node f7011ba0), AE_ERROR
[    0.170887] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.171213] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKB._STA] (Node f7011c60), AE_ERROR
[    0.171661] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKB._STA] (Node f7011c60), AE_ERROR
[    0.173450] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.173776] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKC._STA] (Node f7011d20), AE_ERROR
[    0.174223] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKC._STA] (Node f7011d20), AE_ERROR
[    0.174658] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.174983] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKD._STA] (Node f7011de0), AE_ERROR
[    0.175431] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKD._STA] (Node f7011de0), AE_ERROR
[    0.175866] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.176190] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKE._STA] (Node f7011ea0), AE_ERROR
[    0.176674] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKE._STA] (Node f7011ea0), AE_ERROR
[    0.177109] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.177435] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKF._STA] (Node f7011f60), AE_ERROR
[    0.177882] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKF._STA] (Node f7011f60), AE_ERROR
[    0.178317] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.178644] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKG._STA] (Node f7013030), AE_ERROR
[    0.179091] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKG._STA] (Node f7013030), AE_ERROR
[    0.179526] ACPI Exception (evregion-0419): AE_ERROR, Returned by Handler for [PCI_Config] [20080926]
[    0.179853] ACPI Error (psparse-0524): Method parse/execution failed [\_SB_.PCI0.LPCB.LNKH._STA] (Node f70130f0), AE_ERROR
[    0.180239] ACPI Error (uteval-0232): Method execution failed [\_SB_.PCI0.LPCB.LNKH._STA] (Node f70130f0), AE_ERROR
[    0.183465] PCI: MCFG area at e0000000 reserved in ACPI motherboard resources
[    0.183580] PCI: Using MMCONFIG for extended config space
[    0.200120] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    0.200180] ACPI: EC: driver started in interrupt mode
[    0.200303] ACPI: No dock devices found.
[    0.200303] ACPI: PCI Root Bridge [PCI0] (0000:00)
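
A minimal sketch, not part of the original report, for condensing the repeated ACPI Error lines in the MMConfig boot log above into a per-method count. The function name `summarize_acpi_errors` and the regular expression are assumptions made for illustration; the pattern is derived only from the lines quoted here and may not match other ACPICA message formats.

```python
import re
from collections import Counter

# Matches lines such as:
#   ACPI Error (psparse-0524): Method parse/execution failed
#   [\_SB_.PCI0.LPCB.LNKA._STA] (Node f7011ba0), AE_ERROR
# The "ACPI Exception (...)" lines are deliberately not matched.
ACPI_ERROR_RE = re.compile(
    r"ACPI Error \((?P<site>[\w-]+)\): .*?"
    r"\[(?P<method>\\[^\]]+)\].*?(?P<status>AE_\w+)"
)


def summarize_acpi_errors(dmesg_text):
    """Count ACPI Error lines per (method path, status code) pair."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ACPI_ERROR_RE.search(line)
        if match:
            counts[(match.group("method"), match.group("status"))] += 1
    return counts
```

Applied to a saved copy of the log (for example `summarize_acpi_errors(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file name), this should make it easier to see that the failures are all `_STA` evaluations of the `LNKA` to `LNKH` link devices.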


One way or the other, while using the system I see messages like these:
[ 1238.512942] CE: hpet increasing min_delta_ns to 15000 nsec
[ 1238.513012] CE: hpet increasing min_delta_ns to 22500 nsec
[ 1238.513081] CE: hpet increasing min_delta_ns to 33750 nsec
[ 1285.219964] CE: hpet increasing min_delta_ns to 50624 nsec
[ 2107.230042] CE: hpet increasing min_delta_ns to 75936 nsec
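
A small sketch, added for illustration, that extracts the `min_delta_ns` values from messages like the ones above and computes the ratio between consecutive values. In the lines quoted here each value is roughly 1.5 times the previous one. The function names `min_delta_steps` and `growth_ratios` are hypothetical, and the regular expression is based only on the message format shown in this report.

```python
import re

# Matches lines such as:
#   [ 1238.512942] CE: hpet increasing min_delta_ns to 15000 nsec
MIN_DELTA_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] CE: (?P<dev>\w+) "
    r"increasing min_delta_ns to (?P<ns>\d+) nsec"
)


def min_delta_steps(dmesg_text):
    """Return (timestamp_seconds, device, min_delta_ns) per message."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("ns")))
        for m in MIN_DELTA_RE.finditer(dmesg_text)
    ]


def growth_ratios(steps):
    """Ratio of each min_delta_ns value to the one before it."""
    values = [ns for _, _, ns in steps]
    return [later / earlier for earlier, later in zip(values, values[1:])]
```

The timestamps returned by `min_delta_steps` also show how unevenly the increases are spaced, which may be useful when filing the separate timer bug.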

Eventually:
[ 5487.896178] ACPI: EC: GPE storm detected, transactions will use polling mode
[ 6363.446910] ACPI: EC: missing confirmations, switch off interrupt mode.

This only happens under X11, i.e., I could not trigger the GPE storm without using X11, although the 'hpet increasing min_delta_ns' messages still show up.
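
As a rough aid for comparing runs, the sketch below pulls the two EC messages quoted above out of a dmesg capture, together with their timestamps, so the uptime at which the storm was detected can be compared across kernels or configurations. The function names `ec_events` and `first_storm_uptime` are assumptions for illustration, and the pattern covers only the two message texts shown in this report.

```python
import re

# Matches the two EC messages quoted in this report, e.g.:
#   [ 5487.896178] ACPI: EC: GPE storm detected, transactions will use polling mode
#   [ 6363.446910] ACPI: EC: missing confirmations, switch off interrupt mode.
EC_EVENT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] ACPI: EC: "
    r"(?P<msg>GPE storm detected|missing confirmations)"
)


def ec_events(dmesg_text):
    """Return (timestamp_seconds, message) for each matching EC line."""
    return [
        (float(m.group("ts")), m.group("msg"))
        for m in EC_EVENT_RE.finditer(dmesg_text)
    ]


def first_storm_uptime(dmesg_text):
    """Seconds since boot at the first detected GPE storm, or None."""
    for timestamp, message in ec_events(dmesg_text):
        if message == "GPE storm detected":
            return timestamp
    return None
```

Running this over logs taken with and without X11 would give a quick check of whether a storm was ever detected in a console-only session.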
Comment 1 Javier Marcet 2009-01-23 16:56:04 UTC
Created attachment 19977
dmesg of my current system, still going on after a GPE storm
Comment 2 Javier Marcet 2009-01-24 03:47:53 UTC
I found this report http://bugzilla.kernel.org/show_bug.cgi?id=12250
of which this could well be a dup.

I can confirm that 2.6.24.7 works fine, without errors.
2.6.25 OTOH says it can't find the MMCFG space, although there aren't any ACPI exceptions thrown. However reverting b6ce068a1285a24185b01be8a49021827516b3e1 seems to fix it.
So far I couldn't port the revert to any 2.6.26+ kernel successfully, hence I'm testing 2.6.25.20 with the above commit reverted to see how it behaves under load.

So far so good. The only thing which has come up is:
[  560.582772] CE: hpet increasing min_delta_ns to 15000 nsec
[  563.807043] CE: hpet increasing min_delta_ns to 22500 nsec

That, and so far it hasn't survived a suspend/resume. Other than that it seems quite stable. I've already stress tested it a little (make -j3 of the kernel while watching a 720p x264 mkv read over wifi from an NFS share) and haven't had any problem, which I could not say of any 2.6.26+ kernel, or of 2.6.25 without reverting the mentioned commit.
Comment 3 Len Brown 2009-01-27 08:55:56 UTC
There are 4 issues mentioned in this bug report.

1. PCI config space warnings

It seems backwards that building with Any yields:

PCI: Using MMCONFIG for extended config space

while building with MMConfig yields:

PCI: Not using MMCONFIG.

This may be a PCI bug.  Perhaps you're the first person
on a 2.6.25 MMCONFIG machine to not use "Any"?

2. CE: hpet increasing min_delta_ns to 15000 nsec

Please file a separate bug against timers for this message.
(I'm guessing that it is independent of #1 and #3)

3. GPE storm

ACPI: EC: GPE storm detected, transactions will use polling mode
ACPI: EC: missing confirmations, switch off interrupt mode.

This may be a known issue, for I recall that there were
some GPE issues on the Apple macbook a few releases back.

4. system freeze

This is the most serious, but also the most mysterious, issue.
Is it related in any way to #1-#3?
Does it happen in graphics mode only?
Comment 4 Javier Marcet 2009-01-30 01:49:37 UTC
Sorry for taking so long to reply. I was trying to reproduce my results consistently.

1. I know what it looks like, but that's what happens.
All the distributions I checked compile their kernels with Any as access mode. That could be why this has not been noticed any earlier.

2. I've realized that #2 and #3 are separate issues; neither involves the other.
Booting with nohz it's harder to get these messages, but they do happen, with or without X11. They don't seem to affect performance or stability, though.

3. This started to happen alongside the MMCONFIG issue, at 2.6.25. At first (2.6.25) I had a sharp performance decrease when the storm was detected, soon followed by a crash. Right now it is completely non-deterministic: sometimes it takes several hours to happen, other times it happens right after starting the X11 session.

4. I've had problems freezing my system the same way all this week.
I tend to compile my kernels with voluntary preemption.
I've tried full preemption and got several panics while starting X11; with no preemption the system had too much latency to be used as a desktop.

All in all, it seems the freezes are completely random.
The only thing for certain is that before a GPE storm happens, the system works wonderfully, including suspend/resume.
Once the storm kicks in, at the very least it can't survive a suspend anymore.
Sadly, I can't reliably reproduce any of those freezes, whatever I do.

#1 nowadays (2.6.28) seems harmless. Whether I use Any or MMCONFIG with the subsequent warnings, the system behaves the same way.
#2 happens in console mode too, without having had a #3.
#3 only happens in X11. Or I might not have stressed the system enough without X11.
If anything, #4 seems related to #3, not to #1 or #2.
Comment 5 Zhang Rui 2009-06-08 07:52:26 UTC
Do you know if there is an ambient light sensor on your laptop?
Please attach the acpidump output.
Comment 6 Corrado Zoccolo 2009-07-05 13:47:20 UTC
Javier,
can you try the patches attached at: http://bugzilla.kernel.org/show_bug.cgi?id=1294 (comments 20 and 29), and see if they improve the GPE storm issue?
Comment 7 Corrado Zoccolo 2009-07-06 13:00:52 UTC
Sorry, in previous comment, the correct bug url is the following: http://bugzilla.kernel.org/show_bug.cgi?id=12949
Comment 8 Javier Marcet 2009-07-09 03:49:36 UTC
It's been a while since I saw a GPE storm.
But a few minutes ago, I got one.

I'm compiling 2.6.29.6 with the patch right now. Maybe it even fixes another issue
that has come up recently...

Anyway, since I don't use HPET I haven't had any of those problems anymore.

I'll see what happens with the last patch on 12949, both with and without HPET.
Comment 9 Len Brown 2009-08-13 03:06:49 UTC
Closing due to no activity in this bug report in a month.
Please re-open if this is still a problem in the latest stable kernel.
