Bug 114201

Summary: ACPI Error: No handler or method for GPE xx, disabling event (20160108/evgpe-790) - AMD A8-4500M APU
Product: Drivers Reporter: Antonín Dach (dach)
Component: Watchdog Assignee: drivers_watchdog (drivers_watchdog)
Status: CLOSED CODE_FIX    
Severity: normal CC: abyomi0, alexander.konotop, dev, edallagnol, eipi1is0, eugene.shatokhin, joaocagnoni, kontrollator, kruykaze, lenb, littlejth, luya, lv.zheng, marci_r, martin.bohun, michel, mjb, moby, nicoadamo, ray.huang, rui.zhang, seem, theredbaron1834, towo, tux-86
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=114811
https://bugzilla.kernel.org/show_bug.cgi?id=114681
Kernel Version: 4.5 Subsystem:
Regression: No Bisected commit-id:
Attachments: Acpidump from working kernel 4.4
ACPI Dump 4.4.3 working
dmesg acpi debug working kernel 4.4.3
dmesg with acpi debugging enabled
[PATCH] ACPI: Add configuration item to configure ACPICA error logs out
kernel45_dmesg_without_acpi_debugging
kernel45_dmesg_without_acpi_debugging (correct)
dmesg 4.5.0 without acpi debug
[PATCH] Events: Mute debugging messages when bug is hit
do GPE registers work during boot? here's dmesg

Description Antonín Dach 2016-03-09 18:49:03 UTC
I want to report a bug with the new release candidate 6.

My AMD machine gets flooded with this message; after 1 minute my journal log is 15 MB long and journalctl runs at 100% CPU.


At first I thought it was due to my kernel command-line parameter acpi_osi='!Windows 2012', but removing acpi_osi didn't solve the issue, so it's unrelated.


GPU Card:    AMD A8-4500M APU with Radeon(tm) HD Graphics
GPU driver:    Open AMD
Kernel:    4.5rc6

I can't look deeper into the issue because journalctl can't pick up all the messages from the kernel.

For example
...
Mar 09 19:20:17 NORMANDY kernel: ACPI Error: No handler or method for GPE 12, disabling event (20160108/evgpe-790)
Mar 09 19:20:17 NORMANDY systemd-journald[4935]: Missed 52 kernel messages

... to infinity

Here is the full, useless 15 MB journal: https://gist.githubusercontent.com/anonymous/f3b2fd41516a038fdc4b/raw/db5633b3d9973671d717f7860c05dd0f30df3c1b/gistfile1.txt It's no use; it's full of the ACPI error.
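A flooded journal like this is easier to reason about once it is summarized. The following is a minimal sketch, not part of the original report, that counts the errors per GPE number and totals the messages journald says it missed. The function name `summarize` is hypothetical, and the patterns assume the exact line format quoted in the description.

```python
import re
from collections import Counter

# Patterns mirror the two journal lines quoted in the description.
GPE_ERROR = re.compile(
    r"ACPI Error: No handler or method for GPE ([0-9A-Fa-f]{2}), disabling event"
)
MISSED = re.compile(r"Missed (\d+) kernel messages")


def summarize(lines):
    """Count GPE errors per GPE number and total the messages journald missed."""
    per_gpe = Counter()
    missed = 0
    for line in lines:
        m = GPE_ERROR.search(line)
        if m:
            per_gpe[m.group(1).upper()] += 1
            continue
        m = MISSED.search(line)
        if m:
            missed += int(m.group(1))
    return per_gpe, missed


sample = [
    "Mar 09 19:20:17 NORMANDY kernel: ACPI Error: No handler or method for GPE 12, disabling event (20160108/evgpe-790)",
    "Mar 09 19:20:17 NORMANDY systemd-journald[4935]: Missed 52 kernel messages",
]
per_gpe, missed = summarize(sample)
print(dict(per_gpe), missed)
```

The same function could be fed the lines of a saved journal file to see which GPE numbers dominate the flood.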
Comment 1 Len Brown 2016-03-14 23:20:13 UTC
Did this work properly with previous kernels?
If yes, what is the latest kernel that works,
and the earlier kernel that fails?
Comment 2 Len Brown 2016-03-14 23:22:06 UTC
also, please confirm the .config for the working and failing kernels
are the same.
Comment 3 Lv Zheng 2016-03-14 23:31:15 UTC
It's weird.
ACPICA should be able to automatically disable a GPE if it is not handled by the AML code.
Please upload the acpidump for confirmation.

Thanks
-Lv
Comment 4 Antonín Dach 2016-03-15 00:38:43 UTC
(In reply to Len Brown from comment #1)
> Did this work properly with previous kernels?
> If yes, what is the latest kernel that works,
> and the earlier kernel that fails?

I hope I didn't cause any annoyance. I use the Manjaro kernels, which are identical to Arch's and pretty vanilla.
Everything is working fine with linux 4.4.5.

Should I proceed with building a vanilla kernel45, or just try the same config built on the Arch kernel?
I will try the released kernel45 soon, and if that fails I will try to build it myself.


(In reply to Lv Zheng from comment #3)
> It's weird.
> ACPICA should be able to automatically disable a GPE if it is not handled by
> the AML code.
> Please upload the acpidump for confirmation.
> 
> Thanks
> -Lv

Hi, I am posting the tarred ACPI tables via a Google Drive link (I don't know if raw tables would be enough): https://drive.google.com/file/d/0B0PFSVlqeNCmMG1abjhvbVB1d1U/view?usp=sharing I did it twice, once for kernel44 and once for kernel45rc7, which causes the GPE error.
Comment 5 tux-86 2016-03-15 02:27:09 UTC
I have the same problem with the latest 4.5.0 kernel on Siduction.

Hardware: AMD A10-7300 Radeon R6 M255DX
Comment 6 John Hertzog 2016-03-16 04:35:26 UTC
I have this issue as well. I'm running an AMD A10-4600m with internal graphics on Arch.
Comment 7 John Hertzog 2016-03-16 19:26:03 UTC
It appears that for me, this issue first occurs in 4.5 rc1. In 4.4.5, everything works as expected. These kernels were compiled with the same standard Arch Linux config (4.4.5, 4.5, and 4.5rc1).
Comment 8 Lv Zheng 2016-03-17 02:33:50 UTC
(In reply to Antonín Dach from comment #4)
> (In reply to Len Brown from comment #1)
> > Did this work properly with previous kernels?
> > If yes, what is the latest kernel that works,
> > and the earlier kernel that fails?
> 
> I hope I didn't cased any annoyance I use the MANJARO kernel that are
> identical to Arch and they are pretty vanilla.
> Everything is working fine with the linux4.4.5
> 
> Should I proceed with building vanilla kernel45 or just try the same config
> build on arch kernel? 
> I will be trying the released kernel45 soon enough and when that fails I
> will try to build it myself.
> 
> 
> (In reply to Lv Zheng from comment #3)
> > It's weird.
> > ACPICA should be able to automatically disable a GPE if it is not handled
> by
> > the AML code.
> > Please upload the acpidump for confirmation.
> > 
> > Thanks
> > -Lv
> 
> Hi I am posting tared acpitables via gdrive link for (I don't know if raw
> tables would be enough)
> https://drive.google.com/file/d/0B0PFSVlqeNCmMG1abjhvbVB1d1U/
> view?usp=sharing I did it twice once for kernel44 and for kernel45rc7 that
> causes the GME error.

No, I also need FADT which is not in the post.
Please help to confirm if this is the same bug:
https://bugzilla.kernel.org/show_bug.cgi?id=114811
If so, I can use acpidump posted on that bug entry.

Thanks
-Lv
Comment 9 Antonín Dach 2016-03-17 08:12:30 UTC
Created attachment 209621 [details]
Acpidump from working kernel 4.4
Comment 10 Antonín Dach 2016-03-17 08:17:10 UTC
(In reply to Lv Zheng from comment #8)
> 
> No, I also need FADT which is not in the post.
> Please help to confirm if this is the same bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=114811
> If so, I can use acpidump posted on that bug entry.
> 
> Thanks
> -Lv

Hi

Yes, it is the same issue, so confirmed (I can't boot without the quiet parameter either). I also uploaded my FADT, taken from a booted kernel44.

I hope that information will help. :)
Comment 11 Jan Kreuzer 2016-03-20 07:44:55 UTC
Created attachment 209971 [details]
ACPI Dump 4.4.3 working

Same Problem here, Distribution: OpenSuse Tumbleweed, last working (Distro) Kernel 4.4.3-1-default, not working kernel 4.5.0-1-default.
Laptop HP-DV6-6110 AMD-A6 3410mx. ACPI-Dump from working kernel attached, can only boot with acpi=off
Comment 12 Lv Zheng 2016-03-21 07:46:30 UTC
It looks like the GPE auto-disabling mechanism has a locking issue now.

Originally, the GPE_RAW_HANDLER enabling patch contained very complicated code so that the GPE lock was only released before invoking the callback.

But the patch was simplified again and again, ending up as a commit in which the auto-disabling mechanism is no longer protected by the GPE lock.

Let me generate a fix for you to see if it can be avoided.

Thanks
-Lv
Comment 13 Lv Zheng 2016-03-22 05:37:15 UTC
(In reply to Lv Zheng from comment #12)
> It looks that the gpe auto-disabling mechanism has lock issue now.
> 
> Originally, the GPE_RAW_HANDLER enabling patch contains very complicated
> code so that the GPE lock is only unlocked before invoking the callback.
> 
> But the patch is simplied again and again and forms such a commit that the
> auto-disabling mechanism is now not protected by the GPE lock.
> 
> Let me generate a fix for you to see if it can be avoided.

I didn't see a lock issue in the GPE handling code.

I decompiled the DSDT, trying to figure out what the reported error GPEs are:

A. via _Lxx/_Exx
Line 680:         Method (_L1C, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
Line 690:         Method (_L08, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
Line 712:         Method (_L05, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
Line 728:         Method (_L18, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
Line 742:         Method (_L10, 0, NotSerialized)  // _Lxx: Level-Triggered GPE
B. via _PRW
    Method (GPRW, 2, NotSerialized)
    {
        Store (Arg0, Index (PRWP, Zero))
        Store (Arg1, Index (PRWP, One))
        If (LAnd (LEqual (DAS3, Zero), LEqual (DAS1, Zero)))
        {
            If (LLessEqual (Arg1, 0x03))
            {
                Store (Zero, Index (PRWP, One))
            }
        }
        Else
        {
            If (LAnd (LEqual (DAS3, Zero), LEqual (Arg1, 0x03)))
            {
                Store (Zero, Index (PRWP, One))
            }

            If (LAnd (LEqual (DAS1, Zero), LEqual (Arg1, One)))
            {
                Store (Zero, Index (PRWP, One))
            }
        }

        Return (PRWP)
    }

Line 3588:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x08, 0x05))
                    }
                    Else
                    {
                        Return (GPRW (0x08, Zero))
                    }
                }
Line 4621:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x08, 0x04))
                    }
                    Else
                    {
                        Return (GPRW (0x08, Zero))
                    }
                }
Line 5628:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x08, 0x05))
                    }
                    Else
                    {
                        Return (GPRW (0x08, Zero))
                    }
                }
Line 5956:                     Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                    Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                    {
                        0x08, 
                        0x05
                    })
Line 5971:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x08, 0x05))
                    }
                    Else
                    {
                        Return (GPRW (0x08, Zero))
                    }
                }
Line 6099:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x08, 0x05))
                    }
                    Else
                    {
                        Return (GPRW (0x08, Zero))
                    }
                }
Line 6430:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x08, 0x05))
                    }
                    Else
                    {
                        Return (GPRW (0x08, Zero))
                    }
                }
Line 6763:                 Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                {
                    0x08, 
                    0x04
                })
Line 6870:                 Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                {
                    0x08, 
                    0x04
                })
Line 6977:                 Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                {
                    0x08, 
                    0x04
                })
Line 7084:                 Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                {
                    0x08, 
                    0x04
                })
Line 7243:                 Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
                {
                    0x1A, 
                    0x04
                })
Line 7259:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 7498:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 7737:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 7934:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 7964:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 8198:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 8432:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 8613:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 8710:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    Return (GPRW (0x18, 0x03))
                }
Line 10612:                 Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
                {
                    If (LEqual (WKPM, One))
                    {
                        Return (GPRW (0x04, 0x05))
                    }
                    Else
                    {
                        Return (GPRW (0x04, Zero))
                    }
                }
C. via _GPE
Line 9902:                     Name (_GPE, 0x16)  // _GPE: General Purpose Events

Most of the errors are against unknown GPEs.
So ACPICA in fact won't even control those status bits, they must be controlled by some external IO driver.

Please first do the following test:
1. re-configure the kernel and enable CONFIG_ACPI_DEBUG;
2. re-compile the kernel and boot it with "acpi.debug_layer=0x00000004, acpi.debug_level=0x08000000";
3. save the dmesg output for the booted kernel and upload here.

So that we can see what's happening to these GPEs' enabling bits.

Thanks and best regards
-Lv
Comment 14 Lv Zheng 2016-03-22 06:03:59 UTC
I checked the error log.
It seems there isn't a single error log line for the known GPEs:
_PRW: 0x04, 0x08, 0x18, 0x1A
_GPE: 0x16
_Lxx: 0x05, 0x08, 0x10, 0x18, 0x1C

And for the unknown GPEs, the ACPI subsystem will only set their enable bits to 0, while the errors keep being logged because of "enable bit=1".
This looks like a non-ACPI bug to me, as the ACPI subsystem won't set the enable bit to 1 for unknown GPEs.
This should be caused by some external driver, for example a GPIO driver.

Thanks and best regards
-Lv
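The check described in this comment can be sketched in a few lines. This is only an illustration: the GPE sets are copied from the list above, the helper name `classify` is hypothetical, and it assumes the GPE number printed in the error message is hexadecimal.

```python
# GPE numbers listed as known from the decompiled DSDT (comment 14).
KNOWN_GPES = {
    "_PRW": {0x04, 0x08, 0x18, 0x1A},
    "_GPE": {0x16},
    "_Lxx": {0x05, 0x08, 0x10, 0x18, 0x1C},
}


def classify(gpe_text):
    """Return the DSDT sources that reference a GPE, or [] if it is unknown.

    Assumes the number printed in the error message is hexadecimal.
    """
    gpe = int(gpe_text, 16)
    return sorted(src for src, gpes in KNOWN_GPES.items() if gpe in gpes)


print(classify("12"))  # GPE from the original report: not referenced anywhere
print(classify("18"))  # referenced by both a _Lxx method and _PRW packages
```

An empty result means no AML method or wake package refers to that GPE, which is the "unknown GPE" case discussed here.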
Comment 15 Jan Kreuzer 2016-03-22 16:45:10 UTC
Created attachment 210241 [details]
dmesg acpi debug working kernel 4.4.3

Booting 4.5 with ACPI debug is not possible, as the system never reaches a state where I can log in.

Jan Kreuzer
Comment 16 Antonín Dach 2016-03-22 16:52:13 UTC
(In reply to Jan Kreuzer from comment #15)
> Created attachment 210241 [details]
> dmesg acpi debug working kernel 4.4.3
> 
> Booting 4.5 with acpi debug not possible as the system never reaches a state
> where i can log in.
> 
> Jan Kreuzer

I think you will be able to get in if you pass "quiet" on the kernel command line during boot. In /etc/default/grub you'll find a line similar to

GRUB_CMDLINE_LINUX_DEFAULT="resume=UUID=a3645b4d-45b3-4541-afda-ce148f11ae37"

and by adding a few parameters, you will get in.

Here is mine, 

GRUB_CMDLINE_LINUX_DEFAULT="resume=UUID=a3645b4d-45b3-4541-afda-ce148f11ae37 quiet splash udev.log-priority=3 usbhid.mousepoll=2 acpi_osi='!Windows 2012'"

//"usbhid.mousepoll=2 acpi_osi='!Windows 2012'" - these are for mouse and brightness key

After editing, run sudo update-grub and you will be able to boot, since the message is hidden.

I hope to post my outputs in a few minutes.
Comment 17 Antonín Dach 2016-03-22 22:29:55 UTC
Created attachment 210271 [details]
dmesg with acpi debugging enabled

Hi, I did what you asked, but the dmesg output is endless and filled with the GPE error, and it didn't even start from 0 :/. In 2 minutes it generated 1 GB of output, so I am pasting only the first lines; the rest is the same, repeating. Anyone using an SSD should be careful with this kind of debugging.
Comment 18 Lv Zheng 2016-03-23 08:20:13 UTC
(In reply to Antonín Dach from comment #17)
> Created attachment 210271 [details]
> dmesg with acpi debugging enabled
> 
> Hi, I did what you have been asking for but the dmesg message is infinite
> and filled with the GPE error, and it didn't even start from 0 :/, in
> 2minutes it generated 1 Gb of output so I am pastin the first lines, the
> rest is the same repeating, anyone using SSD should be warned to use this
> kind of debugging.

That's enough.
I got this:
Status=FF, Enable=FF

It seems the GPE status register and GPE enable register are not working at all.

Thanks
-Lv
Comment 19 Lv Zheng 2016-03-23 08:46:30 UTC
The GPE registers on this platform:

[0DCh 0220  12]                   GPE0 Block : [Generic Address Structure]
[0DCh 0220   1]                     Space ID : 01 [SystemIO]
[0DDh 0221   1]                    Bit Width : 40
[0DEh 0222   1]                   Bit Offset : 00
[0DFh 0223   1]         Encoded Access Width : 04 [QWord Access:64]
[0E0h 0224   8]                      Address : 0000000000000420

[0E8h 0232  12]                   GPE1 Block : [Generic Address Structure]
[0E8h 0232   1]                     Space ID : 00 [SystemMemory]
[0E9h 0233   1]                    Bit Width : 00
[0EAh 0234   1]                   Bit Offset : 00
[0EBh 0235   1]         Encoded Access Width : 00 [Undefined/Legacy]
[0ECh 0236   8]                      Address : 0000000000000000

Can someone upload a boot dmesg from a failing system, booting the kernel without the acpi.debug_xxx parameters?

Thanks
-Lv
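The GPE0 block in the FADT excerpt above can be decoded with a little arithmetic. The sketch below relies on the ACPI specification's rule that the first half of a GPE block holds the status registers and the second half the enable registers; the function name `gpe_block_layout` is hypothetical and the port numbers are derived only from the Address and Bit Width fields shown.

```python
def gpe_block_layout(address, bit_width):
    """Split a GPE block into per-byte status and enable registers.

    Per the ACPI specification, the first half of a GPE block holds the
    status registers and the second half holds the enable registers.
    Each register byte covers eight GPEs.
    """
    total_bytes = bit_width // 8
    half = total_bytes // 2
    layout = []
    for i in range(half):
        first_gpe = i * 8
        layout.append({
            "gpes": f"{first_gpe:02X}-{first_gpe + 7:02X}",
            "status_port": address + i,
            "enable_port": address + half + i,
        })
    return layout


# Values from the FADT dump: SystemIO, Bit Width 0x40, Address 0x420.
for reg in gpe_block_layout(0x420, 0x40):
    print(f"GPE {reg['gpes']}: status=0x{reg['status_port']:X} enable=0x{reg['enable_port']:X}")
```

The four register groups this yields (00-07 through 18-1F) match the ranges that appear in the "Read registers for GPE" debug lines.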
Comment 20 Antonín Dach 2016-03-23 10:31:17 UTC
(In reply to Lv Zheng from comment #19)

> Can someone upload a boot dmesg on the failed system by booting the kernel
> without acpi.debug_xxx parameters.
> 
> Thanks
> -Lv

Hi, what do you mean? If I don't pass the acpi.debug_xxx parameters, the dmesg is similarly filled with GPE messages, just without the lines that say:

evgpe-0423 ev_gpe_detect         : Read registers for GPE 18-1F: Status=FF, Enable=FF, RunEnable=10, WakeEnable=00

It doesn't matter if CONFIG_ACPI_DEBUG is enabled or not, dmesg is the same.

And if by 'failing' you meant not booting, here is my camera recording; it's low-res as I don't have a good camera:
youtube.com/watch?v=h5J8qbbzsJE

So how exactly are we supposed to get you the dmesg? I have no clue at this point.
Comment 21 Lv Zheng 2016-03-24 02:10:07 UTC
I just want to check whether a resource conflict caused the GPE port to be reconfigured by the pcibios module.

We have had a similar issue reported on other platforms.
The ACPI reset/sleep registers were reconfigured, which turned writes to the ACPI-reported ones into no-ops, so the system couldn't be shut down.

I'll post a patch that removes the ACPICA error messages, so that we can obtain a minimal boot log without the flood of ACPI errors.

Thanks and best regards
-Lv
Comment 22 Lv Zheng 2016-03-24 03:33:00 UTC
Created attachment 210521 [details]
[PATCH] ACPI: Add configuration item to configure ACPICA error logs out

debugging facility to disable error logs
Comment 23 Lv Zheng 2016-03-24 03:34:01 UTC
After applying the attachment 210521 [details], you should be able to configure CONFIG_ACPI_NO_ERROR_MESSAGES=y to discard the flooding ACPICA errors.

Thanks
-Lv
Comment 24 Antonín Dach 2016-03-24 11:34:22 UTC
Created attachment 210561 [details]
kernel45_dmesg_without_acpi_debugging

(In reply to Lv Zheng from comment #23)
> After applying the attachment 210521 [details], you should be able to
> configure CONFIG_ACPI_NO_ERROR_MESSAGES=y to discard the flooding ACPICA
> errors.
> 
> Thanks
> -Lv

Hi, I got you a dmesg from the patched kernel45 in the following scenario:

CONFIG_ACPI_NO_ERROR_MESSAGES=y
# CONFIG_ACPI_DEBUG is not set

and kernel cmdline is without "acpi.debug_layer=0x00000004, acpi.debug_level=0x08000000"

I hope this helps :)
Comment 25 Antonín Dach 2016-03-24 11:36:07 UTC
Created attachment 210571 [details]
kernel45_dmesg_without_acpi_debugging (correct)

Sorry missed upload
Comment 26 Jan Kreuzer 2016-03-27 06:51:50 UTC
Created attachment 210791 [details]
dmesg 4.5.0 without acpi debug

My dmesg. I don't see anything strange in it; however, about 1 minute after startup a kworker process starts taking all CPU time and the system freezes.

Greetings 
Jan Kreuzer
Comment 27 Lv Zheng 2016-03-28 05:40:49 UTC
(In reply to Jan Kreuzer from comment #26)
> Created attachment 210791 [details]
> dmesg 4.5.0 without acpi debug
> 
> My dmesg. I dont see anything strange in it,
Yes, nothing seems strange in either boot log.

> however after about 1 minute
> after startup a kworker process starts taking all cpu time and the system
> freezes.
This is caused by the bug.

Maybe we can use a debugging patch to determine when this happens (after which initialization step the GPE register responses start to go wrong).

Thanks and best regards
-Lv
Comment 28 Lv Zheng 2016-03-30 07:49:48 UTC
Created attachment 211051 [details]
[PATCH] Events: Mute debugging messages when bug is hit
Comment 29 Lv Zheng 2016-03-30 07:57:10 UTC
Let's first check if the GPE registers have worked during the boot.

Please:

1. apply the following patches:
attachment 210521 [details]
attachment 211051 [details]

2. configure and build the kernel:
Disable error logging by configuring CONFIG_ACPI_NO_ERROR_MESSAGES=y.
Enable debugging messages by configuring CONFIG_ACPI_DEBUG=y.

3. boot the built kernel with the following boot parameters:
initcall_debug acpi.debug_layer=0x00000004, acpi.debug_level=0x08000000

4. save the dmesg output and upload here.

Thanks
-Lv
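Before spending time on a build, the two options from step 2 can be checked against the kernel .config. This is a minimal sketch under stated assumptions: `parse_kconfig` and `missing_options` are hypothetical helper names, and CONFIG_ACPI_NO_ERROR_MESSAGES exists only after attachment 210521 has been applied.

```python
def parse_kconfig(text):
    """Parse kernel .config text into a dict of option name -> value."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" is how Kconfig records a disabled option.
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


REQUIRED = {
    "CONFIG_ACPI_NO_ERROR_MESSAGES": "y",  # added by attachment 210521
    "CONFIG_ACPI_DEBUG": "y",
}


def missing_options(text):
    """List the required options that are absent or not set to the wanted value."""
    opts = parse_kconfig(text)
    return [name for name, want in REQUIRED.items() if opts.get(name) != want]


sample = "CONFIG_ACPI_NO_ERROR_MESSAGES=y\n# CONFIG_ACPI_DEBUG is not set\n"
print(missing_options(sample))
```

An empty list means both options are set as requested in step 2.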
Comment 30 Antonín Dach 2016-03-30 11:29:11 UTC
Created attachment 211071 [details]
do GPE registers work during boot? here's dmesg

(In reply to Lv Zheng from comment #29)
> Let's first check if the GPE registers have worked during the boot.
> 

Hi, I did what you asked, hope it helps

-AD
Comment 31 Lv Zheng 2016-03-31 03:18:29 UTC
From the following lines:
[   18.491505]   evevent-0219 ev_fixed_event_detect : Fixed Event Block: Enable 00000120 Status 00000010
[   18.491513]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 00-07: Status=08, Enable=68, RunEnable=68, WakeEnable=00
[   18.491527]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 08-0F: Status=00, Enable=02, RunEnable=02, WakeEnable=00
[   18.491534]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 10-17: Status=00, Enable=41, RunEnable=41, WakeEnable=00
[   18.491541]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 18-1F: Status=00, Enable=10, RunEnable=10, WakeEnable=00

GPE registers seem to be working until this point.

[   18.515488] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
[   18.515605] sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, Revision ID: 0x16
[   18.515699] sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
[   18.515710] sp5100_tco: Last reboot was not triggered by watchdog.
[   18.515782] sp5100_tco: initialized (0xffffc90000f0cb00). heartbeat=60 sec (nowayout=0)
[   18.531097] Non-volatile memory driver v1.3
[   18.547789]   evevent-0219 ev_fixed_event_detect : Fixed Event Block: Enable 00000120 Status 00000010
[   18.547802]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 00-07: Status=FF, Enable=FF, RunEnable=68, WakeEnable=00
[   18.547809]     evgpe-0427 ev_gpe_detect         : GPE BUG Caught!

And stops working after sp5100_tco initialization.

There is also progress on another report of the same bug (Bug 114811).
The following commit was bisected to be the culprit:
commit bdecfcdb5461834aab24002bb18d3cbdd907b7fb
Author: Huang Rui <ray.huang@amd.com>
Date:   Mon Nov 23 18:07:35 2015 +0800

    sp5100_tco: fix the device check for SB800 and later chipsets
   
    For SB800 and later chipsets, the register definitions are the same
    with SB800. And for SB700 and older chipsets, the definitions should
    be same with SP5100/SB7x0.

This seems to be as far as I can help; please contact the developer to fix this issue.

Thanks and best regards
-Lv
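The manual comparison done in this comment can be approximated by a small script. The sketch below is a heuristic drawn only from the log excerpts above, not ACPICA's own logic: it treats a register read as suspicious when the hardware Enable value differs from RunEnable, and reports the non-ACPICA lines logged since the last good read. The function name `first_mismatch` is hypothetical.

```python
import re

REG_LINE = re.compile(
    r"Read registers for GPE ([0-9A-F]{2}-[0-9A-F]{2}): "
    r"Status=([0-9A-F]{2}), Enable=([0-9A-F]{2}), RunEnable=([0-9A-F]{2})"
)


def first_mismatch(lines):
    """Find the first register read where Enable differs from RunEnable.

    Returns (gpe_range, lines_since_last_good_read), or None if every
    read looks consistent. ACPICA debug lines are not collected.
    """
    since_good = []
    for line in lines:
        m = REG_LINE.search(line)
        if m is None:
            if "evevent-" not in line and "evgpe-" not in line:
                since_good.append(line)
            continue
        gpe_range, _status, enable, run_enable = m.groups()
        if enable != run_enable:
            return gpe_range, since_good
        since_good = []  # a consistent read: restart the window
    return None


log = [
    "[   18.491541]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 18-1F: Status=00, Enable=10, RunEnable=10, WakeEnable=00",
    "[   18.515782] sp5100_tco: initialized (0xffffc90000f0cb00). heartbeat=60 sec (nowayout=0)",
    "[   18.531097] Non-volatile memory driver v1.3",
    "[   18.547789]   evevent-0219 ev_fixed_event_detect : Fixed Event Block: Enable 00000120 Status 00000010",
    "[   18.547802]     evgpe-0423 ev_gpe_detect         : Read registers for GPE 00-07: Status=FF, Enable=FF, RunEnable=68, WakeEnable=00",
]
result = first_mismatch(log)
print(result[0])
for suspect in result[1]:
    print(suspect)
```

On the excerpt above, the lines reported between the last good read and the first bad one include the sp5100_tco initialization, which is the driver the bisection pointed to.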
Comment 32 Lv Zheng 2016-03-31 03:23:55 UTC
Ping Huang Rui.
Comment 33 Lv Zheng 2016-03-31 05:24:14 UTC
It is likely that the culprit commit changed the watchdog register addresses, causing false writes of the ACPI_DISABLE value to the SMI_CMD register, which disabled the ACPI hardware.
Comment 34 towo 2016-03-31 15:53:20 UTC
I have had the same issue since linux 4.5.
After your post about sp5100_tco I noticed that this module
was not loaded before linux 4.5.
Now I have blacklisted sp5100_tco and linux 4.6 works like linux 4.4.x did before,
with no more log floods about GPEs.

My Hardware is

towo@Equinox:~$ inxi -v2
System:    Host: Equinox Kernel: 4.6.0-rc1-siduction-amd64 x86_64 (64 bit) Console: tty 0
           Distro: siduction 15.1.0 White Room - xfce - (201601260255)
Machine:   System: LENOVO product: 80EC v: Lenovo Z50-75
           Mobo: LENOVO model: Lancer 5B3 v: 31900058WIN Bios: LENOVO v: A4CN40WW (V 2.09) date: 08/24/2015
CPU:       Quad core AMD FX-7500 Radeon R7 10 Compute Cores 4C+6G (-MCP-) speed/max: 1300/2100 MHz
Graphics:  Card-1: Advanced Micro Devices [AMD/ATI] Kaveri [Radeon R6/R7 Graphics]
           Card-2: Advanced Micro Devices [AMD/ATI] Jet PRO [Radeon R5 M230]
           Display Server: N/A drivers: ati (unloaded: fbdev,vesa,radeon)
           tty size: 238x34 Advanced Data: N/A out of X
Network:   Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller driver: r8169
           Card-2: Realtek RTL8723BE PCIe Wireless Network Adapter
Drives:    HDD Total Size: 120.0GB (6.6% used) ID-1: model: Samsung_SSD_850
Info:      Processes: 138 Uptime: 2 min Memory: 204.7/6912.8MB Init: systemd runlevel: 5
           Client: Shell (bash) inxi: 2.2.33
Comment 35 Alexander Konotop 2016-04-01 12:19:55 UTC
Blacklisting sp5100_tco also helped for me.
Comment 36 Lv Zheng 2016-04-05 05:13:05 UTC
*** Bug 114681 has been marked as a duplicate of this bug. ***
Comment 37 Lv Zheng 2016-04-05 05:14:49 UTC
*** Bug 114811 has been marked as a duplicate of this bug. ***
Comment 38 Huang Rui 2016-04-06 08:23:19 UTC
Hi Lv Zheng,

I just came back from vacation. Apologies for the late reply.
This driver was written for very old platforms, since SB800 around 2010... I am trying to find a HW person to confirm this interface.
Does it work if you revert this patch?

Thanks,
Rui
Comment 39 Lv Zheng 2016-04-07 02:48:44 UTC
(In reply to Huang Rui from comment #38)
> Hi Lv Zheng,
> 
> I just come back from a vacation. Apology to late.
> This driver is written for very old platforms since SB800 about 2010... I am
> trying to find HW person to confirm this interface.
> Does it work if you tried to revert this patch?

I think you should ask the watchdog driver maintainer.
I took a look at this bug and helped debugging because I was the owner of the ACPICA GPE bugs.
But the watchdog driver actually isn't maintained by us, so possibly you should ask the watchdog driver maintainer and submit fix through his tree.

Thanks and best regards
-Lv
Comment 40 Sergej Starikov 2016-04-16 16:00:38 UTC
(In reply to Huang Rui from comment #38)
> Hi Lv Zheng,
> 
> I just come back from a vacation. Apology to late.
> This driver is written for very old platforms since SB800 about 2010... I am
> trying to find HW person to confirm this interface.
> Does it work if you tried to revert this patch?
> 
> Thanks,
> Rui

Reverting the patch on top of 4.5 fixes the ACPI Error spamming, the sp5100_tco module loads and the output is this:

 sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
 sp5100_tco: PCI Revision ID: 0x3a
 sp5100_tco: failed to find MMIO address, giving up.


Without reverting the patch on 4.5 I get the following sp5100_tco output:

 sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver v0.05
 sp5100_tco: PCI Vendor ID: 0x1022, Device ID: 0x780b, Revision ID: 0x3a
 sp5100_tco: Using 0xfed80b00 for watchdog MMIO address
 sp5100_tco: Last reboot was not triggered by watchdog.
 sp5100_tco: initialized (0xffffc9000001eb00). heartbeat=60 sec (nowayout=0)
 ACPI Error: No handler or method for GPE 00, disabling event (20160108/evgpe-790)
 ... lots of ACPI errors ...

Blacklisting the module also fixes the error spamming for me.
My Device 0x780b (PCI_DEVICE_ID_AMD_HUDSON2_SMBUS) is from about 2013 (Lenovo Edge E145 with AMD A4-5000 APU with Radeon(TM) HD Graphics).
Comment 41 moby@pcsn.net 2016-04-30 12:42:16 UTC
I can confirm the same.  Toshiba Satellite L775D laptop running OpenSuSE Tumbleweed.  Machine gets severely slow with kernel 4.5.X.  Log file gets flooded with messages described above.  Blacklisting sp5100_tco resolves the issue with no side-effects so far.
Comment 42 Martin Bohun 2016-05-01 05:54:00 UTC
Same problem here. acer ASPIRE 5560 laptop (AMD Dual-Core Processor A4-3305M) running openSUSE Tumbleweed kernel 4.5.0.
Adding modprobe.blacklist=sp5100_tco to kernel boot args solved the problem.
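To make this workaround permanent, the parameter can be added to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub mentioned in comment 16. The following is a rough sketch of that edit on the file's text; the helper name `add_kernel_param` is hypothetical, and it assumes the variable is written on one line in double quotes, as in the examples in this thread. The GRUB configuration still has to be regenerated afterwards.

```python
import re

# Matches a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT assignment.
CMDLINE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")[ \t]*$', re.MULTILINE
)


def add_kernel_param(grub_text, param):
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""

    def repl(m):
        current = m.group(2)
        if param in current.split():
            return m.group(0)  # already there: leave the line untouched
        joined = f"{current} {param}".strip()
        return f"{m.group(1)}{joined}{m.group(3)}"

    return CMDLINE.sub(repl, grub_text)


before = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
after = add_kernel_param(before, "modprobe.blacklist=sp5100_tco")
print(after, end="")
```

Running the function a second time on its own output changes nothing, so it is safe to apply repeatedly.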
Comment 43 Lukasz 2016-05-02 14:47:33 UTC
It's my first post, so hi everyone.

Thank you for trying to fix this. I have a similar problem on Fedora 24 - https://bugzilla.redhat.com/show_bug.cgi?id=1329910 - and I'm also on kernel 4.5.1 with an AMD CPU with integrated graphics.
Comment 44 Lucas Stach 2016-05-03 17:36:28 UTC
My system with an AMD A10-5750M APU exhibited the same symptoms. I've posted a patch to fix this and hope it still gets accepted for inclusion in 4.6, so it can be applied to the 4.5 stable tree in a timely manner.

For reference, the patch can be found here:
http://www.spinics.net/lists/linux-watchdog/msg09165.html
Comment 45 Luya Tshimbalanga 2016-05-12 17:36:45 UTC
*** Bug 118151 has been marked as a duplicate of this bug. ***
Comment 46 Stephen Bell 2016-05-19 14:23:02 UTC
This also affects me on an HP 11-e012au with an AMD A6-1450 on Arch. Stuck on 4.4 as everything after just spams the log, eating up CPU till it crashes. If anyone needs more info, etc, glad to offer it if I can. Not sure what other info is needed though.
Comment 47 Eugene A. Shatokhin 2016-05-19 14:45:54 UTC
(In reply to Stephen from comment #46)
> This also effects me on a HP 11-e012au with an AMD A6-1450 on Arch. Stuck on
> 4.4 as everything after just spams the log, eating up CPU till it crashes.
> If anyone needs more info, etc, glad to offer it if I can. Not sure what
> other info is needed though.

The patch from https://bugzilla.kernel.org/show_bug.cgi?id=114201#c44 helped in my case. Could you try it on your system?
Comment 48 Stephen Bell 2016-05-19 20:19:45 UTC
I have never built a kernel before, let's see if I can do it right...

I'm on Arch, so I just added "amd.patch" to the ABS PKGBUILD and ran it. The patch seemed to apply; I got "patching file drivers/watchdog/sp5100_tco.c". Everything finished, I installed it, and rebooted.

No flooding of the journal, no CPU being eaten by journald, everything seems to work. Great, the patch seems to work. No issues with it so far, at least not with my A6-1450.
Comment 49 towo 2016-05-23 11:31:43 UTC
The patch from comment 44 works nicely in kernel 4.5 and 4.6; thanks to all for that patch.
Comment 50 kruykaze 2016-06-30 00:07:15 UTC
Has this patch been merged? I've been holding the kernel back
Comment 51 towo 2016-06-30 04:47:33 UTC
Yes, the patch was merged.
Comment 52 Antonín Dach 2018-06-27 14:16:43 UTC
Already merged and it's working, so closing.