Bug 106151

Summary: ACPI Lid broken, need samsung-laptop.lid_handling=1. - Samsung N210 Plus
Product: ACPI Reporter: Jussi Virtanen (virtajus)
Component: Power-Sleep-WakeAssignee: Lv Zheng (lv.zheng)
Status: CLOSED DUPLICATE    
Severity: normal CC: aaron.lu, froggi_mail, julijonas.kikutis, lv.zheng, rui.zhang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.2.3-1-ARCH Subsystem:
Regression: No Bisected commit-id:
Attachments: acpidump as requested
Full dmesg output, ec.c DEBUG uncommented, "HandleLidSwitch=ignore"
[PATCH] ACPI / EC: Apply default event clearing timing to Samsung platforms
[PATCH] ACPI / EC: Remove CLEAR_ON_RESUME quirk
dmesg, patches 191131+191141, ecdebug, lid suspend
dmesg, patch 191131, ecdebug, lid suspend
dmesg, ecdebug, HandleLidSwitch=ignore, 15min wait before lid closing
[PATCH] ACPI / button: Fix issues by using cached LID status
[PATCH] ACPI / button: Fix issues by using cached LID status
dmesg, patches 191131+191911, ecdebug, lid suspend
dmesg, lid_handling=1, ecdebug, lid suspend
override _LID to use EC.LIDS field
Script used to echo /proc/acpi state to /dev/kmsg
dmesg, ecdebug, HandleLidSwitch=ignore, lids.aml
dmesg, ecdebug, HandleLidSwitch=suspend, lids.aml
dmesg, ecdebug, HandleLidSwitch=ignore, lid_handling=1, lids.aml
dmesg, ecdebug, HandleLidSwitch=suspend, lid_handling=1, lids.aml
Samsung N210P dmidecode output
quirk patch for samsung n210 plus

Description Jussi Virtanen 2015-10-17 19:34:03 UTC
Samsung N210 Plus wakes up from suspend when lid is opened, but immediately begins to suspend again. Only way to get out from the suspend loop is by rebooting. /proc/acpi/button/lid/LID0/state is stuck at "closed" after the first time laptop is awoken by opening the lid. If lid action is inhibited by adding "HandleLidSwitch=ignore" to /etc/systemd/logind.conf, then laptop correctly does not suspend anymore after closing the lid - also, /proc/acpi/button/lid/LID0/state works correctly, ie. it is "closed" only when the lid is closed.

Not sure if related, but in both cases, the lid state switches to "closed" only after 12 seconds or so after closing the lid.
Comment 1 Jussi Virtanen 2015-10-17 19:54:44 UTC
If the laptop is suspended manually (systemctl suspend) and then awoken by opening the lid, lid state is "open" and there is no suspend loop.
Comment 2 Aaron Lu 2015-10-19 07:08:20 UTC
acpidump please:
# acpidump > acpidump.txt
Comment 3 Jussi Virtanen 2015-10-19 08:30:14 UTC
Created attachment 190491 [details]
acpidump as requested
Comment 4 Aaron Lu 2015-10-20 03:08:23 UTC
+Lv.

From the dump, the LID state is got from a field in EC operation region.

(In reply to Jussi Virtanen from comment #0)
> Not sure if related, but in both cases, the lid state switches to "closed"
> only after 12 seconds or so after closing the lid.

Is it also the case with "HandleLidSwitch=ignore"? i.e. the LID state changed to closed only after 12 seconds(but not suspend of course since we have disabled that).

It may be related, in that the EC state changed too slow. So on resume, the LID state changed too slow.

And I wonder if it is always the case with old kernels?
Comment 5 Aaron Lu 2015-10-20 03:23:46 UTC
(In reply to Aaron Lu from comment #4)
> (In reply to Jussi Virtanen from comment #0)
> > Not sure if related, but in both cases, the lid state switches to "closed"
> > only after 12 seconds or so after closing the lid.
> 
> Is it also the case with "HandleLidSwitch=ignore"? i.e. the LID state
> changed to closed only after 12 seconds(but not suspend of course since we
> have disabled that).

Please ignore the above question, obviously you mean there is the delay for both cases.
Comment 6 Lv Zheng 2015-10-20 03:35:24 UTC
Did you use latest upstream kernel?
There are bugs fixed in EC event handling.

If you are using latest upstream kernel,
please enable the debugging of the EC driver:
In drivers/acpi/ec.c, uncomment the following line and rebuild the kernel:
/* #define DEBUG */

And please perform the test using the kernel:
1. boot the kernel with HandleLidSwitch=ignore;
2. perform LID close and wait for the state to be changed to "closed".
Upload the dmesg output of the test here.

For the suspend issue, IMO, it is different from this issue.

Thanks and best regards
-Lv
Comment 7 Jussi Virtanen 2015-10-25 12:05:44 UTC
Created attachment 191071 [details]
Full dmesg output, ec.c DEBUG uncommented, "HandleLidSwitch=ignore"

Using Arch Linux, so it's supposed to be vanilla kernel + few patches. Just to be safe, I switched to upstream kernel from www.kernel.org. At the time, the most recent one was 4.3.0-rc6, which was used during the tests. Behaviour was the same as with Arch Linux kernel 4.2.3-1, which was used in my initial report.

Attached is the dmesg output obtained using the instructions from comment #6. Lid closing starts from line 1096. Exact steps taken:

1. Booted the kernel, waited a few minutes
2. Echoed "CLOSING LID" to /dev/kmsg
3. Closed the lid
4. Waited for about 20 seconds
5. Opened the lid
6. Echoed "LID OPEN" to /dev/kmsg


WRT to comment #4, I tested kernel 2.6.32-5, and the change to /proc/acpi/button/lid/LID0/state was delayed for about 12 seconds just the same as with the recent one.
Comment 8 Lv Zheng 2015-10-26 06:48:40 UTC
Interesting.

From the log, we can see the EC GPE simply comes very late after closing LID:
[   59.352583] [drm:drm_calc_timestamping_constants [drm]] *ERROR* crtc 23: Can't calculate constants, dotclock = 0!
[  202.226612] CLOSING LID
[  214.192846] ACPI : EC: ===== IRQ (0) =====
[  214.192877] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
It takes 12s. And before these log entries, SCI_EVT is not flagged:
[   59.352096] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
So the delay seems to be the nature of the platform, we have nothing to do with it.

The problem is how can we stop it from being immediately suspended.
First of all, let's remove some old Samsung quirks.
And if you still can see the issue, please upload the EC debug enabled dmesg with the quirk removal patches applied.

Thanks and best regards
-Lv
Comment 9 Lv Zheng 2015-10-26 07:30:22 UTC
Created attachment 191131 [details]
[PATCH] ACPI / EC: Apply default event clearing timing to Samsung platforms
Comment 10 Lv Zheng 2015-10-26 07:30:49 UTC
Created attachment 191141 [details]
[PATCH] ACPI / EC: Remove CLEAR_ON_RESUME quirk
Comment 11 Lv Zheng 2015-10-26 07:35:43 UTC
Test 1:
Please
1. Apply attachment 191131 [details] and attachment 191141 [details].
2. Enable the debugging of the EC driver and build the kernel.
3. Boot the kernel with HandleLidSwitch=ignore.
4. Perform LID close and wait for the state to be changed to "closed".
5. Upload the dmesg output of the test here.

Test 2:
Please
1. Apply attachment 191131 [details]
2. Enable the debugging of the EC driver and build the kernel.
3. Boot the kernel with HandleLidSwitch=ignore.
4. Perform LID close and wait for the state to be changed to "closed".
6. Upload the dmesg output of the test here.

In order to enable the EC debugging, you can uncomment the following line from drivers/acpi/ec.c
   /* #define DEBUG */

Thanks in advance.

Best regards
-Lv
Comment 12 Lv Zheng 2015-10-27 02:02:23 UTC
Hi,

Sorry, the test request is wrong. Let me correct it.

Test 1:
Please
1. Apply attachment 191131 [details] and attachment 191141 [details].
2. Enable the debugging of the EC driver and build the kernel.
3. Boot the kernel without HandleLidSwitch=ignore.
4. Perform LID open to wake the system up and see if it suspends again.
5. Upload the dmesg output of the test here.

Test 2:
Please
1. Apply attachment 191131 [details]
2. Enable the debugging of the EC driver and build the kernel.
3. Boot the kernel without HandleLidSwitch=ignore.
4. Perform LID open to wake the system up and see if it suspends again.
5. Upload the dmesg output of the test here.

In order to enable the EC debugging, you can uncomment the following line from drivers/acpi/ec.c
   /* #define DEBUG */

Thanks in advance.
Comment 13 Jussi Virtanen 2015-10-30 10:38:05 UTC
Created attachment 191531 [details]
dmesg, patches 191131+191141, ecdebug, lid suspend

Comment 12 test 1. Issue still persists.

Lid closing at line 1128.
Opening at line 1344.
After this, lid status is stuck at closed, and machine begins to suspend again. For possible additional info, I let the machine suspend once, woke up with power button, then saved dmesg.
Comment 14 Jussi Virtanen 2015-10-30 10:41:09 UTC
Created attachment 191541 [details]
dmesg, patch 191131, ecdebug, lid suspend

Comment 12 test 2. Issue still persists.

Lid closing at line 1132.
Opening at line 1359.
After this, lid status is stuck at closed, and machine begins to suspend again. For possible additional info, I let the machine suspend once, woke up with power button, then saved dmesg.
Comment 15 Jussi Virtanen 2015-10-30 11:00:40 UTC
Created attachment 191551 [details]
dmesg, ecdebug, HandleLidSwitch=ignore, 15min wait before lid closing

Some additional info that might be of use:

I was looking at the dmesg output and noticed that there are two ACPI/EC events triggered after the boot even without any user action. First one comes around one minute after the boot (also mentioned in comment #8). Let's call this event 1. Second one comes much later, around four minutes after boot. Let's call this event 2. Incidentally, event 2 might've happened at the same time as the lid closing in attachment 191071 [details], confusing the output somewhat.

The test in this attachment is the same as in comment #7, except in step 1. instead of waiting "a few minutes" I waited for 15 minutes.
Event 1 at line 1058.
Event 2 at line 1107.
Lid closing at line 1168.
Opening at line 1266.

There doesn't seem to be any more of these automatic acpi events after the four minute mark.

Additional info about event 1:
When this happens, the display switches off briefly (for maybe one second), then turns on again. I did not mention this earlier as I did not notice it happens exactly at the same time as event 1. Also, the lid state will not change before this event happens, so for example, if I boot the machine and close the lid right after logging in, the state will not change before ~72 seconds after boot (one minute for event 1 to happen plus the 12 second lid action lag mentioned earlier).

Additional info about event 2:
Did not notice any change in behaviour with this.
Comment 16 Jussi Virtanen 2015-10-30 11:13:33 UTC
Comment on attachment 191071 [details]
Full dmesg output, ec.c DEBUG uncommented, "HandleLidSwitch=ignore"

Possibly confusing output because of incidental collision of lid closing, and automatic acpi events
Comment 17 Lv Zheng 2015-11-02 02:30:01 UTC
Hi, Jussie

First of all, let me say something about the quirk before proceeding.

IMO, we needn't investigate the logs that are generated without removing the Samsung quirks (comment 7 and comment 15) but should take serious to the logs without the event clearing timing Samsung quirks (comment 13 and comment 14).

As with quirks enabled, for 1 SCI_EVT IRQ, Linux EC driver generated 2 QR_EC requests:
[   58.509028] ACPI : EC: ===== IRQ (1) =====
[   58.509063] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   58.509072] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
                                ********************************
[   58.509153] ACPI : EC: ##### Event started #####
[   58.509169] ACPI : EC: 2: Increase command
[   58.509178] ACPI : EC: ***** Command(QR_EC) started *****
[   58.509192] ACPI : EC: ===== TASK (1) =====
[   58.509218] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   58.509228] ACPI : EC: EC_SC(W) = 0x84
[   58.509242] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   58.509251] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[   58.509382] ACPI : EC: ===== IRQ (0) =====
[   58.509407] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   58.509420] ACPI : EC: EC_DATA(R) = 0x5f
                          *****************
[   58.509427] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   58.509478] ACPI : EC: ***** Command(QR_EC) stopped *****
[   58.509489] ACPI : EC: 1: Decrease command
[   58.509501] ACPI : EC: ##### Query(0x5f) scheduled #####
[   58.509514] ACPI : EC: 2: Increase command
[   58.509522] ACPI : EC: ***** Command(QR_EC) started *****
[   58.509532] ACPI : EC: ===== TASK (1) =====
[   58.509558] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[   58.509568] ACPI : EC: EC_SC(W) = 0x84
[   58.509581] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   58.509613] ACPI : EC: ##### Query(0x5f) started #####
[   58.509721] ACPI : EC: ===== IRQ (1) =====
[   58.509759] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   58.509776] ACPI : EC: EC_DATA(R) = 0x00
                          ^^^^^^^^^^^^^^^^^
[   58.509786] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   58.509867] ACPI : EC: ***** Command(QR_EC) stopped *****
[   58.509880] ACPI : EC: 1: Decrease command
[   58.509901] ACPI : EC: ##### Event stopped #####
[   58.509915] ACPI : EC: ##### Event started #####
[   58.509923] ACPI : EC: ##### Event stopped #####
[   58.509946] ACPI : EC: 2: Increase command
[   58.509957] ACPI : EC: ***** Command(RD_EC) started *****
[   58.509968] ACPI : EC: ===== TASK (1) =====
[   58.509999] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[   58.510009] ACPI : EC: EC_SC(W) = 0x80
[   58.510157] ACPI : EC: ===== IRQ (1) =====
[   58.510189] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[   58.510198] ACPI : EC: EC_DATA(W) = 0x83
[   58.510287] ACPI : EC: ===== IRQ (0) =====
[   58.510312] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
[   58.510324] ACPI : EC: EC_DATA(R) = 0x01
[   58.510374] ACPI : EC: ***** Command(RD_EC) stopped *****
[   58.510386] ACPI : EC: 1: Decrease command
[   58.510513] ACPI : EC: ##### Query(0x5f) stopped #####
Firmware returns 0x5f for the 1st QR_EC request (mared by *) and 0x00 for the 2nd QR_EC request (marked by ^). There is no evidence that this addtional QR_EC won't cause any side effect on this platform.

While we can see without the quirk, Linux EC driver only generates 1 QR_EC for 1 SCI_EVT indication, which looks normal:
[   59.239348] ACPI : EC: ===== IRQ (1) =====
[   59.239382] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   59.239391] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
                                ********************************
[   59.239466] ACPI : EC: ##### Event started #####
[   59.239482] ACPI : EC: 2: Increase command
[   59.239492] ACPI : EC: ***** Command(QR_EC) started *****
[   59.239504] ACPI : EC: ===== TASK (1) =====
[   59.239526] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   59.239536] ACPI : EC: EC_SC(W) = 0x84
[   59.239711] ACPI : EC: ===== IRQ (0) =====
[   59.239750] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   59.239767] ACPI : EC: EC_DATA(R) = 0x5f
                          *****************
[   59.239777] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   59.239787] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   59.239833] ACPI : EC: ***** Command(QR_EC) stopped *****
[   59.239850] ACPI : EC: 1: Decrease command
[   59.239862] ACPI : EC: ##### Query(0x5f) scheduled #####
[   59.239873] ACPI : EC: ##### Event stopped #####
[   59.239886] ACPI : EC: ##### Query(0x5f) started #####
[   59.239975] ACPI : EC: 2: Increase command
[   59.239986] ACPI : EC: ***** Command(RD_EC) started *****
[   59.239998] ACPI : EC: ===== TASK (1) =====
[   59.240024] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[   59.240034] ACPI : EC: EC_SC(W) = 0x80
[   59.240178] ACPI : EC: ===== IRQ (0) =====
[   59.240210] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[   59.240220] ACPI : EC: EC_DATA(W) = 0x83
[   59.240319] ACPI : EC: ===== IRQ (1) =====
[   59.240353] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
[   59.240368] ACPI : EC: EC_DATA(R) = 0x01
[   59.240416] ACPI : EC: ***** Command(RD_EC) stopped *****
[   59.240429] ACPI : EC: 1: Decrease command
[   59.240553] ACPI : EC: ##### Query(0x5f) stopped #####
Firmware only returns 0x5f for the 1st QR_EC request (marked by *) and there is no 2nd QR_EC generated by the Linux EC driver.

And on your platform, SCI_EVT is not flagged after resuming, so the CLEAR_ON_RESUME quirk is not relevant to this bug.

IMO, we may learn the platform behavior using the log posted in comment 14.

Thanks and best regards
-Lv
Comment 18 Lv Zheng 2015-11-02 02:59:02 UTC
Hi, Jussie

2nd, let me explain what the behavior the BIOS expects by decoding the BIOS provided AML.

In fact, the LID behavior on this platform is simple:
DefinitionBlock ("dsdt.aml", "DSDT", 1, "INTEL", "BEARG31A", 0x06040000)
{
    OperationRegion (GNVS, SystemMemory, 0x7F5C0D7C, 0x0100)
    Field (GNVS, AnyAcc, Lock, Preserve)
    {
        Offset (0x10), 
        LIDS,   8, 
    }

    Scope (\_SB)
    {
        Device (LID0)
        {
            Name (_HID, EisaId ("PNP0C0D"))
            Method (_LID, 0, NotSerialized)
            {
                Return (LIDS) /* \LIDS */
            }
        }

        Device (PCI0)
        {
            Name (_HID, EisaId ("PNP0A08"))
            Device (LPC0)
            {
                Name (_ADR, 0x001F0000)
                Device (H_EC)
                {
                    Name (_HID, EisaId ("PNP0C09"))
                    OperationRegion (ECR, EmbeddedControl, 0x00, 0xFF)
                    Field (ECR, ByteAcc, Lock, Preserve)
                    {
                        Offset (0x83), 
                        LIDS,   1, 
                    }

                    Method (_Q5E, 0, NotSerialized)
                    {
                        \LIDS = LIDS /* \_SB_.PCI0.LPC0.H_EC.LIDS */
                        Notify (\_SB.LID0, 0x80)
                    }

                    Method (_Q5F, 0, NotSerialized)
                    {
                        \LIDS = LIDS /* \_SB_.PCI0.LPC0.H_EC.LIDS */
                        Notify (\_SB.LID0, 0x80)
                    }
                }
            }
        }
    }
}

There is only 1 operation region field \_SB_.PCI0.LPC0.H_EC.LIDS and a BIOS saved status \LIDS related to the LID status returning to the OSPMs. And both 0x5E and 0x5F can trigger LID status saving and OSPM notification (I don't why 2 redundant events are provided by the BIOS, hope they are not meant for triggering issues).

Thanks
-Lv
Comment 19 Lv Zheng 2015-11-02 03:18:10 UTC
Hi, Jussie

3rd, let's check the details in comment 7 and comment 15.

====================
In comment 15, what we can obtain is (searching using Query(0x??) scheduled):
Line 1058:
[   58.509028] ACPI : EC: ===== IRQ (1) =====
[   58.509063] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[   58.509420] ACPI : EC: EC_DATA(R) = 0x5f
The 1st event after booting is a LID status change event (0x5F). I guess this is a "LID open" event.

Line 1107:
[  253.606972] ACPI : EC: ===== IRQ (0) =====
[  253.607003] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[  253.607386] ACPI : EC: EC_DATA(R) = 0x70
The 2nd event after booting is not a LID event (0x70).

Line 1168:
[  981.651755] CLOSING LID
[  994.447593] ACPI : EC: ===== IRQ (1) =====
[  994.447624] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[  994.448005] ACPI : EC: EC_DATA(R) = 0x5e
The 3rd event came 12 minutes after closing LID, which is a LID status change event (0x5E). I guess this is a "LID close" event.

Line 1217:
[ 1009.225680] ACPI : EC: ===== IRQ (0) =====
[ 1009.225711] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[ 1009.226067] ACPI : EC: EC_DATA(R) = 0x5f
...
[ 1013.196901] LID OPEN
The 4th event is also a LID status change event (0x5F), came before opening LID might be the "LID open" event. You just logged the "LID OPEN" after opending the LID.
====================

====================
In comment 7, what we can obtain is (searching using Query(0x??) scheduled):
Line 1047:
[   59.350745] ACPI : EC: ===== IRQ (1) =====
[   59.350776] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[   59.351160] ACPI : EC: EC_DATA(R) = 0x5f
The 1st event after booting is a "LID open" event (0x5F) after booting.

Line 1096:
[  202.226612] CLOSING LID
[  214.192846] ACPI : EC: ===== IRQ (0) =====
[  214.192877] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[  214.193231] ACPI : EC: EC_DATA(R) = 0x5e
The 2nd event is a "LID close" event (0x5E) which occurred after "CLOSING LID" log.

Line 1145:
[  216.374946] ACPI : EC: ===== IRQ (1) =====
[  216.374974] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[  216.375839] ACPI : EC: EC_DATA(R) = 0x70
The 3rd event after booting is not a LID event (0x70).

Line 1206:
[  225.186180] ACPI : EC: ===== IRQ (1) =====
[  225.186211] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[  225.186563] ACPI : EC: EC_DATA(R) = 0x5f
...
[  227.135066] LID OPEN
The 4th event is a "LID open" event (0x5F) which occurred before "LID OPEN".
====================

So I couldn't see anything abnormal here from the EC log, if you can see LID status stuck at "close", IMO, the bug should be in the ACPI bus core or in the LID driver.
One more thing we can do is to let you execute the userspace tool to obtain the EC status. Let me ask you do this later.

Thanks and best regards
-Lv
Comment 20 Lv Zheng 2015-11-02 05:04:00 UTC
Hi, Jussie

4th, let's check the dtails in comment 14:

Line 1052:
[   59.233010] ACPI : EC: ===== IRQ (1) =====
[   59.233042] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[   59.233477] ACPI : EC: EC_DATA(R) = 0x5f
The 1st event after booting is a "LID open" event (0x5F).

Line 1086
[  210.394269] ACPI : EC: ===== IRQ (1) =====
[  210.394301] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[  210.394684] ACPI : EC: EC_DATA(R) = 0x70
The 2nd event after booting is not a LID event (0x70).

Line 1132:
[  961.510438] CLOSING LID
[  973.699144] ACPI : EC: ===== IRQ (0) =====
[  973.699175] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
...
[  973.699501] ACPI : EC: EC_DATA(R) = 0x5e
The 3rd event is a "LID close" event (0x5E) which occurred after "CLOSING LID" log.

Line 1140:
[  973.699308] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
Note that there is no EC event after the above line. At this point, "LID close" event should still be in progress.

Line 1234:
[  974.973676] ACPI : EC: 2: Increase command
[  974.973680] ACPI : EC: ***** Command(QR_EC) started *****
[  974.973683] ACPI : EC: ===== TASK (1) =====
[  974.973695] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[  974.973697] ACPI : EC: EC_SC(W) = 0x84
[  974.976640] ACPI : EC: ===== TASK (1) =====
[  974.976655] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[  974.976660] ACPI : EC: EC_DATA(R) = 0x00
                          *****************
[  974.976663] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[  974.976665] ACPI : EC: ***** Command(QR_EC) stopped *****
[  974.976667] ACPI : EC: 1: Decrease command
[  974.976670] ACPI : EC: 0 stale EC events cleared
After resuming, there is no EC event (marked by *).

Line 1360:
[  977.737510] LID OPEN
This time, we couldn't see "LID open" event. This could be the problem, but I don't see anything abnormal related to the EC driver that has caused such situation.
So if there no 0x5F can be seen after resuming, \LIDS won't get updated and this probably means: if OPSMs execute \_SB.LID0._LID to obtain the status of the LID at this time, "close" will still be returned. I guess the "open" event will come, but will only come after 12seconds as what has been demonstrated in comment 7 and comment 15. So executing \_SB.LID0._LID so early after resuming should be the problem.

In order to solve this issue, OSPMs should stop evaluating _LID spontaneously, but should only evaluate it in response to the LID status change events.

Thanks and best regards
-Lv
Comment 21 Lv Zheng 2015-11-03 06:06:23 UTC
Created attachment 191901 [details]
[PATCH] ACPI / button: Fix issues by using cached LID status
Comment 22 Lv Zheng 2015-11-03 06:06:58 UTC
Hi, Jussie

Could you help to try this fix patch?

Thanks and best regards
-Lv
Comment 23 Lv Zheng 2015-11-03 06:08:45 UTC
Test request:
Please
1. Apply attachment 191131 [details] and attachment 191901 [details].
2. Boot the kernel without HandleLidSwitch=ignore.
3. Perform LID open to wake the system up and see if it suspends again.
4. Upload the dmesg output of the test here.

Thanks in advance
-Lv
Comment 24 Lv Zheng 2015-11-03 06:21:08 UTC
IMO, this bug also implies another issue that:

The .notify() operation of a driver shouldn't get invoked during the period between the timing the .suspend() operation is invoked and the timing the .resume() operation is not invoked. This issue cannot fixed by this simple patch.

Thanks and best regards
-Lv
Comment 25 Lv Zheng 2015-11-03 06:50:25 UTC
Created attachment 191911 [details]
[PATCH] ACPI / button: Fix issues by using cached LID status

I just update the patch in order to be more robust to survive the order issue.
Let me post the updated test request later.
Comment 26 Lv Zheng 2015-11-03 06:50:46 UTC
Test request:
Please
1. Apply attachment 191131 [details] and attachment 191911 [details].
2. Boot the kernel without HandleLidSwitch=ignore.
3. Perform LID open to wake the system up and see if it suspends again.
4. Upload the dmesg output of the test here.

Thanks in advance
-Lv
Comment 27 Aaron Lu 2015-11-03 07:25:49 UTC
Is it possible to check if Windows behaves well regarding LID?
Comment 28 Jussi Virtanen 2015-11-04 12:37:05 UTC
Created attachment 192071 [details]
dmesg, patches 191131+191911, ecdebug, lid suspend

Test results as per comment 26

Lid closing at line 1139
Lid open at line 1367

Now the lid state from /proc/acpi/button/lid/LID0/state is not stuck at closed anymore, ie. it changes to open when resuming the laptop from suspend by opening the lid. But the laptop still begins to suspend again briefly after resuming from suspend by opening the lid.

As with earlier tests, I let the laptop go back to suspend once, then resumed with the power button, before saving dmesg output.

Comment 27:
Yes, I can do this. Will post findings later.
Comment 29 Fred 2015-11-05 18:40:36 UTC
I have the same bug on Samsung N145 plus.

Setting /sys/devices/platform/samsung/lid_handling to 1 fixes it (at least until reboot)

I might be wrong, but I think the "real" bug has already been fixed by this https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0dcaf4fbb36895175657be029ed64eda2a34707
but the dmi filtering in drivers/platform/x86/samsung-laptop.c prevent the fix to apply on our laptops
Here are the infos for my N145p :
vendor : SAMSUNG ELECTRONICS CO., LTD.
product name : N145P/N250P/260P
Comment 30 Jussi Virtanen 2015-11-07 12:51:16 UTC
About Windows 7 x86 behaviour (comment 27, comment 28):

With a clean install and no drivers, sleep is selected by default for lid close, but when trying to change the setting, sleep is not in the list of options. If I leave the setting as is, machine goes to hibernation on lid close.

With the VGA drivers installed from http://www.samsung.com/uk/support/model/NP-N210-JP02UK, sleep becomes available as an option. The machine goes to sleep correctly when the lid is closed, although there is the same 10-20 second lag as with Linux. Laptop also wakes up correctly when the lid is opened, and there is no faulty behaviour afterwards.
Comment 31 Aaron Lu 2015-11-09 02:40:19 UTC
Fred, thanks for the information.

Jussi, can you please also test by setting the lid_handling sysfs file and see if the problem is fixed?
Comment 32 Jussi Virtanen 2015-11-11 09:22:46 UTC
Created attachment 192741 [details]
dmesg, lid_handling=1, ecdebug, lid suspend

Setting /sys/devices/platform/samsung/lid_handling to 1 removes the lag, but the lid state is still stuck at closed after resuming from suspend by opening the lid, and systems begins to suspend again.

Attached is the dmesg output with lid_handling=1, ec debugging enabled, with one faulty resuspend. Lid closing at line 1066, opening at line 1309.
Comment 33 Aaron Lu 2015-11-12 02:44:26 UTC
(In reply to Jussi Virtanen from comment #32)
> Created attachment 192741 [details]
> dmesg, lid_handling=1, ecdebug, lid suspend
> 
> Setting /sys/devices/platform/samsung/lid_handling to 1 removes the lag, but
> the lid state is still stuck at closed after resuming from suspend by
> opening the lid, and systems begins to suspend again.

I suppose if you set HandleLidSwitch=ignore in logind.conf, and then close and open the LID, the LID status will also be stuck at close?

> 
> Attached is the dmesg output with lid_handling=1, ec debugging enabled, with
> one faulty resuspend. Lid closing at line 1066, opening at line 1309.

Lv,
Can you please check the dmesg? It seems that we lost an EC query to set the LID status during resume.
Comment 34 Jussi Virtanen 2015-11-18 15:54:38 UTC
(In reply to Aaron Lu from comment #33)
> I suppose if you set HandleLidSwitch=ignore in logind.conf, and then close
> and open the LID, the LID status will also be stuck at close?
With HandleLidSwitch=ignore and lid_handling=1 the lid status works as expected, ie. it's not stuck after opening the lid. This is also the case with lid_handling=0 and HandleLidSwitch=ignore.
Comment 35 Aaron Lu 2015-11-19 01:50:52 UTC
I suppose if we change the _LID method:
                    Device (LID0)
                    {
                        Name (_HID, EisaId ("PNP0C0D") /* Lid Device */)  // _HID: Hardware ID
                        Method (_LID, 0, NotSerialized)  // _LID: Lid Status
                        {
                            Return (LIDS) /* \LIDS */
                        }
                    }       

to this:
Method (_LID, 0, NotSerialized)
{
    Return (\_SB_.PCI0.LPC0.H_EC.LIDS)
}

could fix the bug?

Lv,
Is it possible to override the _LID control method at runtime? I remembered we have such a feature in Linux, but I can't find it right now...
Comment 36 Lv Zheng 2015-11-23 02:33:19 UTC
Yes, there is.
If you think this is a BIOS bug, then you can override the control method at runtime.
See Documentation/acpi/method-customizing.txt for the details.

But also be careful with it as the mechanism is not robust enough. It doesn't support "undo" right now which means it may trigger runtime race conditions.

So before accessing /sys/kernel/debug/acpi/customi_method, make sure the method you want to overriding is not used by any kernel drivers.

Thanks and best regards
-Lv
Comment 37 Aaron Lu 2015-11-23 03:00:49 UTC
Created attachment 195171 [details]
override _LID to use EC.LIDS field

Thanks Lv.

Jussi,
Please follow these steps to override the _LID method with the lids.aml file I attached here:
1 echo 1 > /sys/module/acpi/parameters/aml_debug_output
  turn on debug
2 mount debugfs by "mount -t debugfs none /sys/kernel/debug"
3 cat lids.aml > /sys/kernel/debug/acpi/custom_method

Then use the /proc/acpi interface to query LID's status, check if there is "In new _LID" message in dmesg log, as well as if the LID status is correct. If everything is right, then test if LID status is correct after resume, thanks.
Comment 38 Lv Zheng 2015-11-24 01:49:21 UTC
Hope you can also enable EC debugging for the test and upload the dmesg output.
So I can help to confirm if "LID open" event can arrive in 12 seconds after resuming.

Thanks and best regards
-Lv
Comment 39 Jussi Virtanen 2015-11-30 13:26:57 UTC
Created attachment 196021 [details]
Script used to echo /proc/acpi state to /dev/kmsg

Applying the steps at comment 37 almost fully removes the lag of lid state at /proc/acpi. Lid state works correctly if HandleLidSwitch=ignore is set. If HandleLidSwitch is commented (same as HandleLidSwitch=suspend), then lid state also works correctly, laptop suspends and resumes correctly, and the machine is no longer stuck at resume loop. Entering sleep is delayed by ~12 seconds. "In new _LID" messages appear in dmesg output.

For any Arch Linux users viewing this bug: debugfs was mounted by default. custom_method was compiled as kernel module, so it had to be enabled manually to allow writing to /sys/kernel/debug/acpi/custom_method

I used the attached script to echo /proc/acpi and approximation of how long lid has been closed to /dev/kmsg.

dmesg dumps in following posts.
Comment 40 Jussi Virtanen 2015-11-30 13:28:55 UTC
Created attachment 196031 [details]
dmesg, ecdebug, HandleLidSwitch=ignore, lids.aml

Lid closing at 1178
Opening at 1663
Comment 41 Jussi Virtanen 2015-11-30 13:30:06 UTC
Created attachment 196041 [details]
dmesg, ecdebug, HandleLidSwitch=suspend, lids.aml

Lid closing at line 1159
Opening at 1641
Comment 42 Aaron Lu 2015-12-01 05:34:56 UTC
(In reply to Jussi Virtanen from comment #39)
> Created attachment 196021 [details]
> Script used to echo /proc/acpi state to /dev/kmsg
> 
> Applying the steps at comment 37 almost fully removes the lag of lid state
> at /proc/acpi. Lid state works correctly if HandleLidSwitch=ignore is set.
> If HandleLidSwitch is commented (same as HandleLidSwitch=suspend), then lid
> state also works correctly, laptop suspends and resumes correctly, and the
> machine is no longer stuck at resume loop. Entering sleep is delayed by ~12
> seconds. "In new _LID" messages appear in dmesg output.

You will still need the lid_handling=1 quirk, that's a separate bug. Sorry if that's not clear. So can you please test again with the lid_handling=1 quirk set? Thanks.
Comment 43 Jussi Virtanen 2015-12-01 15:41:17 UTC
Created attachment 196271 [details]
dmesg, ecdebug, HandleLidSwitch=ignore, lid_handling=1, lids.aml

(In reply to Aaron Lu from comment #42)
> You will still need the lid_handling=1 quirk, that's a separate bug. Sorry
> if that's not clear. So can you please test again with the lid_handling=1
> quirk set? Thanks.

Ok. I tested with lid_handling=1 and the aml patch, and everything seems to be fine - no delay on the lid state, no delay on entering sleep, and no suspend loop.

Lid closing and opening at lines 1155-1642
Comment 44 Jussi Virtanen 2015-12-01 15:46:27 UTC
Created attachment 196281 [details]
dmesg, ecdebug, HandleLidSwitch=suspend, lid_handling=1, lids.aml

Lid closing and opening at lines 1146-1477. There are a few "Lid closed for .. seconds" after resuming because I can't interrupt the monitoring script fast enough after resume, those lines can be ignored.
Comment 45 Aaron Lu 2015-12-02 02:25:06 UTC
Jussi,

Please attach your DMI info, I'll add yours and Fred's into the DMI table so that you do not need to set lid_handling.
# dmidecode > dmi.info
Comment 46 Jussi Virtanen 2015-12-02 12:22:03 UTC
Created attachment 196321 [details]
Samsung N210P dmidecode output

(In reply to Aaron Lu from comment #45)
> Jussi,
> 
> Please attach your DMI info, I'll add yours and Fred's into the DMI table so
> that you do not need to set lid_handling.
> # dmidecode > dmi.info
Comment 47 Lv Zheng 2016-05-09 05:45:21 UTC
(In reply to Lv Zheng from comment #19)

> So I couldn't see anything abnormal here from the EC log, if you can see LID
> status stuck at "close", IMO, the bug should be in the ACPI bus core or in
> the LID driver.

I guess, the BIOS may have updated \LIDS during s3 to "open" but the Linux kernel restored it back to the value before suspending to "close". I don't know if restoring the nvram is the root cause.

I also can sense that if we stop notifying userspace after booting/resuming, and userspace stops enforcing the "initial lid status notification", we can simply ignore the initial lid status after booting/resuming without changing the nvram behavior.

So it looks to me:
This is an ACPI bus core bug or an ACPI lid driver (along with the user space) bug.
Comment 48 Zhang Rui 2016-05-11 06:41:29 UTC
(In reply to Lv Zheng from comment #47)
> (In reply to Lv Zheng from comment #19)
> 
> > So I couldn't see anything abnormal here from the EC log, if you can see
> LID
> > status stuck at "close", IMO, the bug should be in the ACPI bus core or in
> > the LID driver.
> 
> I guess, the BIOS may have updated \LIDS during s3 to "open" but the Linux
> kernel restored it back to the value before suspending to "close". I don't
> know if restoring the nvram is the root cause.
> 
to verify this, Jussi, please reboot with boot option acpi_sleep=nonvs and check the Lid status after resume.
Comment 49 Aaron Lu 2016-05-11 07:02:46 UTC
+Julijonas in case he has some ideas about this problem.
Comment 50 Zhang Rui 2016-05-15 07:41:54 UTC
Created attachment 216291 [details]
quirk patch for samsung n210 plus

Please apply this patch and the customized ASL from Aaron, to confirm this patch works for you, instead of "samsung-laptop.lid_handling=1".
Comment 51 Zhang Rui 2016-06-20 02:13:52 UTC
ping...
Comment 52 Lv Zheng 2016-06-20 02:37:54 UTC
Hi, Rui

We have asked user space to do the change.
For the time being, we implemented quirks for such bugs.

Hi, Jussie

I just forgot to give the feedback on the related bugzilla entry.
Please download the 3 patches:
https://patchwork.kernel.org/patch/9147055/
https://patchwork.kernel.org/patch/9147063/
https://patchwork.kernel.org/patch/9147067/

Apply them and boot the built kernel with following boot parameter:
lid_init_state.lid_init_state=open.

Thanks in advance.
Comment 53 Lv Zheng 2016-06-20 02:38:37 UTC
Sorry, the boot parameter should be:
button.lid_init_state=open.

Thanks
-Lv
Comment 54 Lv Zheng 2016-06-22 04:31:16 UTC
This bug should be same as Bug 89211.
If not, please re-open it.

Thanks
-Lv

*** This bug has been marked as a duplicate of bug 89211 ***