Bug 105491

Summary: GPE flooding prevention - ACPICA log improvement - ACPI AE_NOT_FOUND errors for GPE _L6F on Skylake system.
Product: ACPI
Reporter: Ben Kero (ben.kero)
Component: BIOS
Assignee: Lv Zheng (lv.zheng)
Status: CLOSED DUPLICATE
Severity: normal
CC: aaron.lu, bugzilla, georgediam, kanetas, lenb, wael.nasreddine
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.3.0-rc4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output
acpidump
DMI information from the failing system
Interrupts from the failing system
lspci results from the failing system
dsdt from iasl -d dsdt.dat
acpidump with BIOS 3.50

Description Ben Kero 2015-10-05 01:10:04 UTC
Booting Linux on my Skylake system results in ACPI errors filling the kernel ring buffer:

[ 1621.356444] ACPI Error: [PGRT] Namespace lookup failure, AE_NOT_FOUND (20150818/psargs-359)
[ 1621.356446] ACPI Error: Method parse/execution failed [\_GPE._L6F] (Node ffff8804658e00c8), AE_NOT_FOUND (20150818/psparse-542)
[ 1621.356448] ACPI Exception: AE_NOT_FOUND, while evaluating GPE method [_L6F] (20150818/evgpe-592)
...
[ 1621.355548] ACPI Error: [PGRT] Namespace lookup failure, AE_NOT_FOUND (20150818/psargs-359)
[ 1650.270201] ACPI Error: Method parse/execution failed [\_GPE._L6F] (Node ffff8804658e00c8), AE_NOT_FOUND (20150818/psparse-542)

These messages are filling my buffer constantly. I run out of disk space within hours (300GB+).

Using the kernel option 'acpi=off' removes these errors. The errors are still present with 'acpi=ht'.

All of my hardware seems to work as expected though.

I've updated to the latest BIOS (v1.20) (http://asrock.com/mb/Intel/Fatal1ty%20Z170%20Gaming-ITXac/index.us.asp?cat=Download&os=BIOS)

According to the 01.org page (https://01.org/linux-acpi/documentation/debug-how-isolate-linux-acpi-issues) the problem could be with the ACPI table parsing code.

Hardware:

Core i7 6700K processor
Fatal1ty Z170 Gaming-ITX/ac motherboard
nVidia GeForce GTX970 GPU
Comment 1 Ben Kero 2015-10-05 17:44:17 UTC
Created attachment 189491 [details]
dmesg output
Comment 2 Ben Kero 2015-10-05 17:44:40 UTC
Created attachment 189501 [details]
acpidump

acpi dump of the failing system
Comment 3 Ben Kero 2015-10-05 17:45:06 UTC
Created attachment 189511 [details]
DMI information from the failing system
Comment 4 Ben Kero 2015-10-05 17:45:26 UTC
Created attachment 189521 [details]
Interrupts from the failing system
Comment 5 Ben Kero 2015-10-05 17:45:42 UTC
Created attachment 189531 [details]
lspci results from the failing system
Comment 6 Lv Zheng 2015-10-09 03:00:21 UTC
For this bug, I really cannot find PGRT in any of the DSDT/SSDTs.
You should check whether a BIOS update is available.

Or, if you are looking for a workaround, you can disable GPE6F.
Let's mark this as a duplicate of another similar bug.

Thanks and best regards
-Lv

*** This bug has been marked as a duplicate of bug 53071 ***
Comment 7 Lv Zheng 2015-10-09 06:36:42 UTC
We could prepare some fixes in ACPICA to reduce the redundant errors.
For example:
ACPI_WARNING_ONCE
ACPI_ERROR_ONCE
ACPI_EXCEPTION_ONCE

Let me also check if it is possible.

Thanks and best regards
-Lv
Comment 8 Ben Kero 2015-10-09 20:15:39 UTC
As mentioned in comment #0, I am on the latest BIOS. ASRock has not been responsive to my attempts to tell them about the problem.

I'm just a user, and don't actually know what PGRT is. I also don't know how to disable GPE6F. I imagine 6F is a known ACPI event, although I haven't been able to find out what it is. Is there documentation for this that I might have missed? If it's a BIOS setting, I can try finding it if I have some clue what the feature is related to.

If you can add these ACPI logging features to the Linux kernel I will definitely use those. I figured that this would already be detected as a GPE storm and stop being printk'd.

I ran cat /sys/firmware/acpi/tables/DSDT > dsdt.dat && iasl -d dsdt.dat and opened the result in a text editor. There were 17 external control methods that couldn't be resolved. I found this, though I'm not sure if it's relevant:

        Name (PRES, One)
        Method (_L6F, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
        {
            \_SB.UGPS ()
            If ((RTD3 == One))
            {
                If (CondRefOf (\_GPE.AL6F))
                {
                    AL6F ()
                }
            }

            If ((PGRT == One))
            {
                If ((SGGP == One))
                {
                    If (CondRefOf (\_GPE.P0L6))
                    {
                        P0L6 (\_SB.CAGS (P0WK))
                    }
                }

                If ((P1GP == One))
                {
                    If (CondRefOf (\_GPE.P1L6))
                    {
                        P1L6 (\_SB.CAGS (P1WK))
                    }
                }

                If ((P2GP == One))
                {
                    If (CondRefOf (\_GPE.P2L6))
                    {
                        P2L6 (\_SB.CAGS (P2WK))
                    }
                }
            }
            \_SB.CGLS ()
        }

I'll attach the rest of the file.
Comment 9 Ben Kero 2015-10-09 20:16:23 UTC
Created attachment 189851 [details]
dsdt from iasl -d dsdt.dat
Comment 10 Lv Zheng 2015-10-10 02:12:33 UTC
It is the marked line:
        Method (_L6F, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
        {
            ...
            If ((PGRT == One))
                 ^^^^
            ...
The problem is triggered by this line.
I decompiled DSDT and all SSDTs, and failed to find the PGRT.
iasl result:
Search "PGRT" (2 hits in 1 file)
  105491\dsdt.dsl (2 hits)
	Line 108:     External (PGRT)
	Line 20095:             If (LEqual (PGRT, One))

Then this looks like a BIOS bug.
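
The search above can be repeated with a short script. This is only a sketch: it assumes the decompiled tables are *.dsl files sitting in one directory, and the function name find_identifier is made up for illustration. It lists every line that mentions the name and marks the hits that are only External declarations.

```python
import re
from pathlib import Path


def find_identifier(directory, name):
    """Return (file name, line number, text, is_external) for each hit.

    `directory` is assumed to contain iasl-decompiled *.dsl files.
    `is_external` is True when the hit is an External declaration,
    i.e. the table only refers to the name and does not define it.
    """
    # Match the name as a whole word so that PGRT does not match PGRTX.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path in sorted(Path(directory).glob("*.dsl")):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    is_external = line.lstrip().startswith("External")
                    hits.append((path.name, number, line.strip(), is_external))
    return hits
```

If the only hits are an External declaration plus uses of the name, as in the result quoted above, then none of the decompiled tables defines it.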

When the ACPI hardware raises GPE6F, the OS executes _L6F.
The OS has no idea what is implemented in _L6F (that's the purpose of ACPI: the OS needn't know the board-specific details).

In Linux, you can try to disable the GPE via /sys/firmware/acpi/interrupts/gpe6F.
Echoing "disable" to this file should temporarily disable the GPE.
However, this mechanism is buggy; in bug 53071, a new mechanism is implemented to ensure the GPE can actually be disabled.
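
A minimal sketch of the sysfs workaround, with some assumptions stated up front: the helper names read_gpe and disable_gpe are illustrative, the counter file is assumed to start with the event count followed by status text, and writing to the real file needs root. As noted above, the mechanism may not actually keep the GPE disabled.

```python
from pathlib import Path

# Directory named in the comment above; override `base` to try this on a copy.
GPE_DIR = Path("/sys/firmware/acpi/interrupts")


def read_gpe(name, base=GPE_DIR):
    """Return (count, status text) parsed from a counter file such as gpe6F.

    Assumes the file holds the event count first, then status words.
    """
    fields = (Path(base) / name).read_text().split()
    return int(fields[0]), " ".join(fields[1:])


def disable_gpe(name, base=GPE_DIR):
    """Write "disable" to the GPE's file, as `echo disable > file` would."""
    (Path(base) / name).write_text("disable\n")
```

Reading the count twice a few seconds apart, for example with read_gpe("gpe6F"), gives a rough idea of how fast the GPE is firing before and after the write. The setting is temporary and does not survive a reboot.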

I don't know why there are 17 unresolved externals.
The causes that I can imagine:
1. This is a BIOS bug and some tables are missing.
2. iasl may have trouble parsing the tables (AFAIK, there is no such issue).
3. The externals are defined in dynamically loaded tables, and the Linux initialization order may be wrong, enabling GPE6F too early. It may be safe to enable GPE6F only after a specific table has been loaded by an initialization step.

All dynamic table loadings that I can find:
  105491\ssdt5.dsl (7 hits)
	Line 267:                     Load (CST0, HC0)
	Line 277:                 Load (IST0, HI0)
	Line 288:                     Load (HWP0, HW0)
	Line 293:                         Load (HWPL, HW2)
	Line 388:                 Load (CST1, HC1)
	Line 400:                 Load (IST1, HI1)
	Line 412:                 Load (HWP1, HW1)
If you want to confirm, you can use the following command:
"acpidump -a <address>" to dump the tables, then extract and decompile them.

The table addresses can be found in this package:
        Name (SSDT, Package (0x15)
        {
            "CPU0IST ", 
            0x87522818, 
            ^^^^^^^^^^ Address of IST0
            0x000007AA, 
            "APIST   ", 
            0x87521618, 
            ^^^^^^^^^^ Address of IST1
            0x000005AA, 
            "CPU0CST ", 
            0x87521C18, 
            ^^^^^^^^^^ Address of CST0
            0x0000037F, 
            "APCST   ", 
            0x87524C18, 
            ^^^^^^^^^^ Address of CST1
            0x00000119, 
            "CPU0HWP ", 
            0x857E7A18, 
            ^^^^^^^^^^ Address of HWP0
            0x0000008E, 
            "APHWP   ", 
            0x861D1618, 
            ^^^^^^^^^^ Address of HWP1
            0x00000119, 
            "HWPLVT  ", 
            0x861D1498, 
            ^^^^^^^^^^ Address of HWPL
            0x00000130
        })
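
To dump every table listed in the package, the addresses can be turned into acpidump command lines. The sketch below assumes the package is laid out as (name, address, third value) triples; the listing only annotates the addresses, so treating the third value as anything specific (such as a length) would be a guess and it is ignored here. The output file names are made up for illustration.

```python
def ssdt_triples(entries):
    """Group a flat package listing into (name, address, third value) triples."""
    if len(entries) % 3 != 0:
        raise ValueError("expected a multiple of three entries")
    return [tuple(entries[i:i + 3]) for i in range(0, len(entries), 3)]


def acpidump_commands(entries):
    """Build one `acpidump -a <address>` command line per listed table."""
    return [
        f"acpidump -a 0x{address:08X} > {name.strip()}.txt"
        for name, address, _ in ssdt_triples(entries)
    ]


# Values copied from the Name (SSDT, Package (0x15) ...) listing above.
SSDT_PACKAGE = [
    "CPU0IST ", 0x87522818, 0x000007AA,
    "APIST   ", 0x87521618, 0x000005AA,
    "CPU0CST ", 0x87521C18, 0x0000037F,
    "APCST   ", 0x87524C18, 0x00000119,
    "CPU0HWP ", 0x857E7A18, 0x0000008E,
    "APHWP   ", 0x861D1618, 0x00000119,
    "HWPLVT  ", 0x861D1498, 0x00000130,
]

if __name__ == "__main__":
    for command in acpidump_commands(SSDT_PACKAGE):
        print(command)
```

The printed commands still have to be run as root on the affected machine, and the resulting dumps extracted and decompiled as described above.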

Such a logging facility may not be easy to implement: unlike the Linux WARN_ONCE stuff, here we need to limit the errors to a specific error (AE_NOT_FOUND) against a specific namespace node, rather than limit the log entries issued from the same code position.
So it is better to just disable the GPE until the problem is root-caused and fixed.
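
To illustrate the difference, here is a sketch of limiting by (error, namespace node) instead of by code position. It is plain Python for illustration only, not ACPICA code; the class name and its interface are hypothetical.

```python
class OncePerNodeLogger:
    """Emit each (status, node) combination once and count the repeats.

    A WARN_ONCE-style limiter is keyed on the call site; this one is keyed
    on what failed (status) and where in the namespace (node).
    """

    def __init__(self, emit=print):
        self._seen = set()
        self._emit = emit
        self.suppressed = 0

    def error(self, status, node, message):
        """Log the message unless this status/node pair was already reported."""
        key = (status, node)
        if key in self._seen:
            self.suppressed += 1
            return False
        self._seen.add(key)
        self._emit(f"ACPI Error: {message}, {status}")
        return True
```

With this keying, repeated AE_NOT_FOUND failures for \_GPE._L6F would be reported once, while the same error against a different node would still be logged.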

Thanks and best regards
-Lv
Comment 11 kbz 2016-02-22 14:58:47 UTC
Hi, I have this problem as well. I also have a Z170 Gaming-ITX/ac.

Everything was fine until I added drm.debug=14 (see https://bugs.freedesktop.org/show_bug.cgi?id=94248) to /etc/default/grub.

I now have the exact same symptoms, with the additional bonus that my fan is running constantly and my CPU temperature is 100 deg. C.
Comment 12 Mentat 2016-03-15 20:17:45 UTC
There is new BIOS 3.10 for ASRock Z170 PRO4S (http://www.asrock.com/mb/Intel/Z170%20Pro4S/index.pl.asp?cat=Download&os=BIOS). Can someone check if it solves the problem?
Comment 13 SK 2016-03-23 02:37:47 UTC
(In reply to Mentat from comment #12)
> There is new BIOS 3.10 for ASRock Z170 PRO4S
> (http://www.asrock.com/mb/Intel/Z170%20Pro4S/index.pl.
> asp?cat=Download&os=BIOS). Can someone check if it solves the problem?

BIOS 3.10 does NOT resolve the problem (ASRock Z170 PRO4 with kernel 4.5).
Comment 14 SK 2016-04-25 12:23:34 UTC
(In reply to SK from comment #13)
> (In reply to Mentat from comment #12)
> > There is new BIOS 3.10 for ASRock Z170 PRO4S
> > (http://www.asrock.com/mb/Intel/Z170%20Pro4S/index.pl.
> > asp?cat=Download&os=BIOS). Can someone check if it solves the problem?
> 
> BIOS 3.10 does NOT resolve the problem (ASRock Z170 PRO4 with kernel 4.5).

BIOS 3.50 solves the problem with my system  (ASRock Z170 PRO4 with kernel 4.4).
Comment 15 Lv Zheng 2016-04-29 02:20:36 UTC
(In reply to SK from comment #14)
> (In reply to SK from comment #13)
> > (In reply to Mentat from comment #12)
> > > There is new BIOS 3.10 for ASRock Z170 PRO4S
> > > (http://www.asrock.com/mb/Intel/Z170%20Pro4S/index.pl.
> > > asp?cat=Download&os=BIOS). Can someone check if it solves the problem?
> > 
> > BIOS 3.10 does NOT resolve the problem (ASRock Z170 PRO4 with kernel 4.5).
> 
> BIOS 3.50 solves the problem with my system  (ASRock Z170 PRO4 with kernel
> 4.4).

OK.
Could you upload the 3.50 acpidump so that we can use it to generate quirks for the buggy BIOS?

Thanks
-Lv
Comment 16 SK 2016-04-29 07:38:53 UTC
Created attachment 214691 [details]
acpidump with BIOS 3.50
Comment 17 Len Brown 2016-05-16 21:32:34 UTC
I don't see a quirk for this BIOS.
Re-opened and assigned to the BIOS category.
Comment 18 Lv Zheng 2016-05-17 04:50:33 UTC
Since improving the log here doesn't seem possible, you can disable the GPE.
Please try the following patches:
https://patchwork.kernel.org/patch/9099221/
https://patchwork.kernel.org/patch/9099251/
https://patchwork.kernel.org/patch/9099271/
Then boot the kernel built with the patches applied, using this boot parameter:
acpi_block_gpe=0x6F

Thanks
-Lv
Comment 19 Lv Zheng 2016-05-17 04:51:52 UTC

*** This bug has been marked as a duplicate of bug 117481 ***
Comment 20 George Diamantopoulos 2016-05-19 09:00:00 UTC
Solved for me too with BIOS version 2.00 on Asrock H170 Pro4S. Tested with kernels 4.5 and 4.6.
Comment 21 Lv Zheng 2016-05-20 03:10:46 UTC
Closing.