Bug 180111

Summary: [regression] Auto serialize mechanism is disabled
Product: ACPI Reporter: Lv Zheng (lv.zheng)
Component: ACPICA-CoreAssignee: Lv Zheng (lv.zheng)
Status: CLOSED CODE_FIX    
Severity: normal CC: rui.zhang, viktor_jaegerskuepper, yu.c.chen
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.9-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: bootup log
reboot log
suspend log
acpidump output
dmesg output for boot with kernel 4.10
acpidump output

Description Lv Zheng 2016-10-24 07:29:19 UTC
From: Deak, Imre <imre.deak@intel.com>

after upgrading to 4.9-rc1 I get the following errors occasionally
during boot and suspend-to-ram on an APL system:

[   59.513434] ACPI Error: [TMP_] Namespace lookup failure, AE_ALREADY_EXISTS (20160831/dswload2-330)
[   59.513446] ACPI Exception: AE_ALREADY_EXISTS, During name lookup/catalog (20160831/psobject-227)
[   59.513469] ACPI Error: Method parse/execution failed [\_SB.PCI0.SCPG] (Node ffff8802770b7bb8), AE_ALREADY_EXISTS (20160831/psparse-543)
[   59.515415] ACPI Error: Method parse/execution failed [\_SB.PCI0.SDHA._PS3] (Node ffff8802770b7dc0), AE_ALREADY_EXISTS (20160831/psparse-543)
[   59.517118] ACPI: Marking method _PS3 as Serialized because of AE_ALREADY_EXISTS error

Other than these error messages, booting and suspend/resume seems to work ok.
Bisecting points to
commit 74f51b80a0c4ff84fbeb7f12ea43ce66934d29aa
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Wed Sep 7 14:07:10 2016 +0800

    ACPICA: Namespace: Fix dynamic table loading issues

Reverting this on top of 4.9-rc1 gets rid of the error messages.
Comment 1 Lv Zheng 2016-10-24 07:30:41 UTC
Created attachment 242441 [details]
bootup log
Comment 2 Lv Zheng 2016-10-24 07:30:55 UTC
Created attachment 242451 [details]
reboot log
Comment 3 Lv Zheng 2016-10-24 07:31:09 UTC
Created attachment 242461 [details]
suspend log
Comment 4 Lv Zheng 2016-10-24 07:31:31 UTC
Created attachment 242471 [details]
acpidump output
Comment 5 Lv Zheng 2016-10-24 07:32:03 UTC
From: Imre Deak

I attached the dmesg containing the error during boot (happened 1 out
of 200 boots), during reboot (20 out of 200 reboots), during suspend
(happened all the time).

I attached the acpidump output.
Comment 6 Lv Zheng 2016-10-24 07:34:09 UTC
From: Lv Zheng

I just checked the error and the related code.

1. What the error is
The error message is generated by an old method auto-serial mechanism.
However this mechanism should have been disabled by new auto-serial approach.

See: the error is generated by the code in drivers/acpi/acpica/psparse.c:
			if ((status == AE_ALREADY_EXISTS) &&
			    (!(walk_state->method_desc->method.
			       info_flags & ACPI_METHOD_SERIALIZED))) {
				/*
				 * Method is not serialized and tried to create an object
				 * twice. The probable cause is that the method cannot
				 * handle reentrancy. Mark as "pending serialized" now, and
				 * then mark "serialized" when the last thread exits.
				 */
				walk_state->method_desc->method.info_flags |=
				    ACPI_METHOD_SERIALIZED_PENDING;
			}
It can only happen when ACPI_METHOD_SERIALIZED flag is not set.
However, in the upstream kernel, this flag is set by the code in drivers/acpi/acpica/dsinit.c:
		if (obj_desc->method.info_flags & ACPI_METHOD_SERIALIZED) {
			info->serial_method_count++;
			break;
		}

		if (acpi_gbl_auto_serialize_methods) {

			/* Parse/scan method and serialize it if necessary */

			acpi_ds_auto_serialize_method(node, obj_desc);
			if (obj_desc->method.
			    info_flags & ACPI_METHOD_SERIALIZED) {

				/* Method was just converted to Serialized */

				info->serial_method_count++;
				info->serialized_method_count++;
				break;
			}
		}
Which is invoked by acpi_ds_initialize_objects(), always invoked in acpi_ns_load_table().
Unless acpi_gbl_auto_serialize_methods = false or acpi_ns_parse_table() failed, the flag won't be remained unset.

The error message is generated by old auto-serialize implementation.
And acpi_gbl_auto_serialize_methods is meant to mute the warnings.

Both mechanism should work, thus this error message is not fatal.
It just makes ACPICA to fall back to use old auto-serial method.
So the issue is not urgent.

2. Why the error message is caused
I couldn't see whether the bisected commit can be related to this feature.
It seems we need your help to debug what the problem is.

First, please try the mentioned branch here:
https://github.com/zetalog/linux
acpica-lock branch.
Comment 7 Lv Zheng 2016-10-24 07:36:25 UTC
I also commented the bisected commit here:
https://github.com/acpica/acpica/commit/767ee53
Comments are against each change blocks.
Comment 8 Lv Zheng 2016-10-24 08:20:52 UTC
The error in boot is:
[    8.814443] ACPI Error: [PMCS] Namespace lookup failure, AE_ALREADY_EXISTS (20160831/dswload2-330)
[    8.814460] ACPI Exception: AE_ALREADY_EXISTS, During name lookup/catalog (20160831/psobject-227)
[    8.814477] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPD3] (Node ffff8802770e0b40), AE_ALREADY_EXISTS (20160831/psparse-543)
[    8.814499] ACPI Error: Method parse/execution failed [\_SB.PCI0.SPI3._PS3] (Node ffff8802770e2370), AE_ALREADY_EXISTS (20160831/psparse-543)
[    8.814514] ACPI: Marking method _PS3 as Serialized because of AE_ALREADY_EXISTS error

The error in reboot is:
[   48.841801] ACPI Error: [TMP_] Namespace lookup failure, AE_ALREADY_EXISTS (20160831/dswload2-330)
[   48.854102] ACPI Exception: AE_ALREADY_EXISTS, During name lookup/catalog (20160831/psobject-227)
[   48.866259] ACPI Error: Method parse/execution failed [\_SB.PCI0.SCPG] (Node ffff8802770b7bb8), AE_ALREADY_EXISTS (20160831/psparse-543)
[   48.882283] ACPI Error: Method parse/execution failed [\_SB.PCI0.SDIO._PS0] (Node ffff8802770b92a8), AE_ALREADY_EXISTS (20160831/psparse-543)
[   48.898797] ACPI: Marking method _PS0 as Serialized because of AE_ALREADY_EXISTS error

The error in suspend is:
[   50.883605] ACPI Error: [TMP_] Namespace lookup failure, AE_ALREADY_EXISTS (20160831/dswload2-330)
[   50.883618] ACPI Exception: AE_ALREADY_EXISTS, During name lookup/catalog (20160831/psobject-227)
[   50.883640] ACPI Error: Method parse/execution failed [\_SB.PCI0.SCPG] (Node ffff8802770b7bb8), AE_ALREADY_EXISTS (20160831/psparse-543)
[   50.885213] ACPI Error: Method parse/execution failed [\_SB.PCI0.SDHA._PS3] (Node ffff8802770b7dc0), AE_ALREADY_EXISTS (20160831/psparse-543)
[   50.885684] intel-lpss 0000:00:18.1: power state changed by ACPI to D3cold
[   50.885885] ACPI: Marking method _PS3 as Serialized because of AE_ALREADY_EXISTS error
...
[   53.882933] ACPI Error: Method parse/execution failed [   53.888608] [\_SB.PCI0.SCPG] (Node ffff8802770b7bb8), AE_ALREADY_EXISTS[   53.895591] intel-lpss 0000:00:17.1: power state changed by ACPI to D3cold
[   53.895800] intel-lpss 0000:00:17.2: power state changed by ACPI to D3cold
[   53.895984] intel-lpss 0000:00:17.3: power state changed by ACPI to D3cold
[   53.896168] intel-lpss 0000:00:18.0: power state changed by ACPI to D3cold
[   53.896348] intel-lpss 0000:00:18.1: power state changed by ACPI to D3cold
[   53.896526] intel-lpss 0000:00:18.3: power state changed by ACPI to D3cold
[   53.896709] intel-lpss 0000:00:16.0: power state changed by ACPI to D3cold
[   53.896888] intel-lpss 0000:00:16.1: power state changed by ACPI to D3cold
[   53.897063] intel-lpss 0000:00:16.2: power state changed by ACPI to D3cold
[   53.965545]  (20160831/psparse-543)
[   53.969500] ACPI Error: [   53.972164] Method parse/execution failed [\_SB.PCI0.SDC._PS0] (Node ffff8802770b9668), AE_ALREADY_EXISTS (20160831/psparse-543)
[   53.985245] ACPI: 
Marking method _PS0 as Serialized because of AE_ALREADY_EXISTS error

Both PMCS and TMP are actually in the methods that should be set to Serialized by default:
ssdt3.dsl:
        Method (LPD3, 0, Serialized)
        {
            OperationRegion (PMCS, PCI_Config, 0x84, 0x04)
            Field (PMCS, WordAcc, NoLock, Preserve)
            {
                PMSR,   32
            }

            Local0 = PMSR /* \_SB_.PCI0.LPD3.PMSR */
            Local0 &= One
        }
dsdt.dsl:
            Method (SCPG, 2, Serialized)
            {
                Name (TMP, Zero)
                If (Arg0 == One)
                {
                    TMP = GENR /* \_SB_.PCI0.GENR */
                    GENR = (TMP | Arg1)
                }
                ElseIf (Arg0 == Zero)
                {
                    TMP = GENR /* \_SB_.PCI0.GENR */
                    GENR = (TMP & Arg1)
                }
            }
Problem is triggered by parallel running _PS3/_PS0:
ssdt3.dsl, \_SB.PCI0.SPI3._PS3 is invoking LPD3:
        Method (_PS3, 0, NotSerialized)  // _PS3: Power State 3
        {
            LPD3 ()
        }
dsdt.dsl, \_SB.PCI0.SDIO._PS0 is invoking SCPG:
                Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
                {
                    DPGE = Zero
                    I3EN = Zero
                    SCPG (Zero, 0xFFFFFFBE)
                    Sleep (0x02)
                }
dsdt.dsl, \_SB.PCI0.SDHA._PS3 is invoking SCPG:
                 Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
                {
                    DPGE = Zero
                    I3EN = Zero
                    SCPG (Zero, 0xFFFFFFBE)
                    Sleep (0x02)
                }
dsdt.dsl, \_SB.PCI0.SDC._PS0 is invoking SCPG:
                Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
                {
                    Local1 = GPC1 (0x00C005B0)
                    Local1 &= 0xFFFFC3FF
                    Local1 |= 0x3000
                    SPC1 (0x00C005B0, Local1)
                    Local1 = GPC1 (0x00C005B8)
                    Local1 &= 0xFFFFC3FF
                    Local1 |= 0x3000
                    SPC1 (0x00C005B8, Local1)
                    Local1 = GPC1 (0x00C005C0)
                    Local1 &= 0xFFFFC3FF
                    Local1 |= 0x3000
                    SPC1 (0x00C005C0, Local1)
                    Local1 = GPC1 (0x00C005C8)
                    Local1 &= 0xFFFFC3FF
                    Local1 |= 0x3000
                    SPC1 (0x00C005C8, Local1)
                    Local1 = GPC1 (0x00C005D8)
                    Local1 &= 0xFFFFC3FF
                    Local1 |= 0x3000
                    SPC1 (0x00C005D8, Local1)
                    GRR3 = ~RXST /* \_SB_.PCI0.SDC_.RXST */
                    DPGE = Zero
                    I3EN = Zero
                    SCPG (Zero, 0xFFFFFBFE)
                    Sleep (0x02)
                    SCPG (One, 0x0401)
                    GPOV = Zero
                    If ((GDW0 == Zero) && (RXEV == Zero))
                    {
                        GDW0 = GPC0 (0x00C005D0)
                        GDW1 = GPC1 (0x00C005D0)
                    }
                }
Comment 9 Lv Zheng 2016-10-25 00:58:33 UTC
This looks more like the root cause fix:
https://github.com/zetalog/linux/commit/108009d0

Please pull the tree and try the fixes.
Comment 10 Lv Zheng 2016-10-25 05:32:53 UTC
I reproduced the issue with customized table and confirmed that this fix can fix the issue.
Patches are sent to the community.
Let me mark the bug as resolved.
Comment 11 Lv Zheng 2016-11-07 21:07:45 UTC
Patch upstreamed in 4.9-rc3.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=25ccd2429fd0
Closing...
Comment 12 Viktor Jägersküpper 2017-03-12 19:55:41 UTC
Created attachment 255199 [details]
dmesg output for boot with kernel 4.10
Comment 13 Viktor Jägersküpper 2017-03-12 19:55:55 UTC
I think this issue is not completely fixed. During boot I always get an error like this:
[    0.196051] ACPI Error: Method parse/execution failed [\_PR.CPU1._PDC] (Node ffff9033ab0c0000), AE_ALREADY_EXISTS (20160930/psparse-543)

I did not notice any problems, e.g. with suspend or resume, and the error was also listed in the dmesg output in earlier kernel versions (at least since 3.16), but not shown during boot.

I am using Debian, the earliest version I could test is 4.9-rc3 which has your fix. Attached is the acpidump output and the dmesg output for booting with 4.10.
Comment 14 Viktor Jägersküpper 2017-03-12 19:56:42 UTC
Created attachment 255201 [details]
acpidump output
Comment 15 Lv Zheng 2017-03-13 05:52:56 UTC
(In reply to Viktor Jägersküpper from comment #13)
> I think this issue is not completely fixed. During boot I always get an
> error like this:
> [    0.196051] ACPI Error: Method parse/execution failed [\_PR.CPU1._PDC]
> (Node ffff9033ab0c0000), AE_ALREADY_EXISTS (20160930/psparse-543)
> 
> I did not notice any problems, e.g. with suspend or resume, and the error
> was also listed in the dmesg output in earlier kernel versions (at least
> since 3.16), but not shown during boot.
> 
> I am using Debian, the earliest version I could test is 4.9-rc3 which has
> your fix. Attached is the acpidump output and the dmesg output for booting
> with 4.10.

Looks like an issue fixed by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a37052adb5

So please check with the upstream kernel.

Thanks
Lv
Comment 16 Lv Zheng 2017-03-13 05:55:15 UTC
The analysis is as follows:

In the dmesg:
[    0.195462] ACPI: Dynamic OEM Table Load:
[    0.195469] ACPI: SSDT 0xFFFF9033AA582C00 00022A (v01 PmRef  Cpu0Ist  00003000 INTL 20040311)
[    0.195610] ACPI: Dynamic OEM Table Load:
[    0.195613] ACPI: SSDT 0xFFFF9033AA533E00 000152 (v01 PmRef  Cpu1Ist  00003000 INTL 20040311)
[    0.196046] ACPI: Dynamic OEM Table Load:
[    0.196051] ACPI Error: Method parse/execution failed [\_PR.CPU1._PDC] (Node ffff9033ab0c0000), AE_ALREADY_EXISTS (20160930/psparse-543)

So it looks the problem was in the dynamic loaded table, I guess it is Cpu1Cst. As _PR.CPU1._PDC is:
    Scope (\_PR.CPU1)
    {
        Name (HI1, 0x00)
        Name (HC1, 0x00)
        Name (TLD1, 0x00)
        Method (_PDC, 1, NotSerialized)  // _PDC: Processor Driver Capabilities
        {
            CreateDWordField (Arg0, 0x08, CAP1)
            Store (CAP1, PDC1)
            If (LEqual (TLD1, 0x00))
            {
                If (LEqual (And (PDC1, 0x0A), 0x0A))
                {
                    If (And (CFGD, 0x02))
                    {
                        OperationRegion (IST1, SystemMemory, DerefOf (Index (SSDT, 0x04)), DerefOf (Index (SSDT, 0x05
                            )))
                        Load (IST1, HI1)
                        ^^^^^^^^^^^^^^^^
: Loading IST1, which is done previously.
[    0.195610] ACPI: Dynamic OEM Table Load:
[    0.195613] ACPI: SSDT 0xFFFF9033AA533E00 000152 (v01 PmRef  Cpu1Ist  00003000 INTL 20040311)
            "CPU1IST ", 
            0xCFEE84C0, 
            0x00000152, 
Table length exactly matches (000152).
                    }

                    If (And (CFGD, 0x10))
                    {
                        OperationRegion (CST1, SystemMemory, DerefOf (Index (SSDT, 0x0A)), DerefOf (Index (SSDT, 0x0B
                            )))
                        Load (CST1, HC1)
: Loading CST1, the only thing that can cause another dynamic loading failure.
[    0.196046] ACPI: Dynamic OEM Table Load:
[    0.196051] ACPI Error: Method parse/execution failed [\_PR.CPU1._PDC] (Node ffff9033ab0c0000), AE_ALREADY_EXISTS (20160930/psparse-543)
            "CPU1CST ", 
            0xCFEE87B0, 
            0x0000018A, 
                    }

                    Store (0x01, TLD1)
                }
            }
        }
    }

The dynamic loaded table addresses are obtained according to the following AML code:
        Name (SSDT, Package (0x18)
        {
            "CPU0IST ", 
            0xCFEE8000, 
            0x0000022A, 
            "CPU1IST ", 
            0xCFEE84C0, 
            0x00000152, 
            "CPU0CST ", 
            0xCFEE8620, 
            0x0000018A, 
            "CPU1CST ", 
            0xCFEE87B0, 
            0x0000018A, 
            "CPU2IST ", 
            0x00000000, 
            0xF000E816, 
            "CPU3IST ", 
            0x00000000, 
            0xF000E816, 
            "CPU2CST ", 
            0xCFEE8940, 
            0x0000018A, 
            "CPU3CST ", 
            0xCFEE8AD0, 
            0x0000018A
        })

I firstly doubt that both IST1 and CST1 should be loaded, they apparently should contain redundant stuffs, causing failures like the one you reported.
You can provide us the 2 tables as a proof by:
# acpidump -a 0xCFEE84C0 > ssdt-ist1.dat
# acpidump -a 0xCFEE87B0 > ssdt-cst1.dat
And upload the 2 tables here.
So what the CFGD is?
It's defined as:
        Name (CFGD, 0x020383F2)
And there is no AML code changing it.
So both the following condition check will be TRUE:
                    If (And (CFGD, 0x02))
                    If (And (CFGD, 0x10))
Don't you think it's an AML table issue?
And if it doesn't cause functional problems, why do we need to take care of it?

I then noticed that it seems there is no such a line for Cpu1Cst table:
[    0.195613] ACPI: SSDT 0xFFFFXXXXXXXXXXXX 00018A (v01 PmRef  Cpu1Cst  00003000 INTL 20040311)
And this line should be there if a table is successfully installed.
So I guess the AE_ALREADY_EXISTS is there because of the same problem for table mutex: There are code paths in early boot stage invoked without holding the table mutex and the dynamic table loading code has been affected by this.

Thanks and best regards
Lv