Bug 94411 - Unwanted delayed execution of _Qxx EC methods
Summary: Unwanted delayed execution of _Qxx EC methods
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-06 12:39 UTC by Gabriele Mazzotta
Modified: 2015-09-14 05:47 UTC (History)
6 users (show)

See Also:
Kernel Version: 3.19
Tree: Mainline
Regression: No


Attachments
dmesg when function keys are pressed (9.31 KB, application/octet-stream)
2015-03-06 12:39 UTC, Gabriele Mazzotta
Details
[PATCH 1/5] ACPI / EC: Cleanup transaction state transition. (3.02 KB, patch)
2015-05-29 07:17 UTC, Lv Zheng
Details | Diff
[PATCH 2/5] ACPI / EC: Convert event handling work queue into loop style. (4.26 KB, patch)
2015-05-29 07:17 UTC, Lv Zheng
Details | Diff
[PATCH 1/5] ACPI / EC: Cleanup transaction state transition. (3.02 KB, patch)
2015-05-29 07:21 UTC, Lv Zheng
Details | Diff
[PATCH 2/5] ACPI / EC: Convert event handling work queue into loop style. (4.26 KB, patch)
2015-05-29 07:22 UTC, Lv Zheng
Details | Diff
[PATCH 3/5] ACPI / EC: Add event clearing variation support. (7.54 KB, patch)
2015-05-29 07:22 UTC, Lv Zheng
Details | Diff
[PATCH 4/5] ACPI / EC: Add experimental support to complete query after reading the query value. (3.44 KB, patch)
2015-05-29 07:23 UTC, Lv Zheng
Details | Diff
[PATCH 5/5] ACPI / EC: Reduce transaction retry times. (2.80 KB, patch)
2015-05-29 07:23 UTC, Lv Zheng
Details | Diff
dmesg - acpi.ec_event_clearing=event (139.51 KB, text/plain)
2015-05-29 21:16 UTC, Gabriele Mazzotta
Details
dmesg - acpi.ec_event_clearing=query (6.93 KB, text/plain)
2015-05-30 08:48 UTC, Gabriele Mazzotta
Details
acpidump (452.45 KB, application/octet-stream)
2015-05-30 08:48 UTC, Gabriele Mazzotta
Details
dmesg - acpi.ec_event_clearing=status (26.67 KB, text/plain)
2015-06-07 17:21 UTC, Gabriele Mazzotta
Details
dmesg - acpi.ec_event_clearing=query (22.71 KB, text/plain)
2015-06-07 17:21 UTC, Gabriele Mazzotta
Details
dmesg - acpi.ec_event_clearing=event (20.07 KB, text/plain)
2015-06-07 17:22 UTC, Gabriele Mazzotta
Details
dmesg - acpi.ec_event_clearing=method (21.41 KB, text/plain)
2015-06-07 17:22 UTC, Gabriele Mazzotta
Details
[PATCH] ACPI / OSL: Make _Lxx/_Exx/_Qxx parallelized. (1.30 KB, patch)
2015-06-09 03:01 UTC, Lv Zheng
Details | Diff
dmesg - _Qxx parallelized (21.63 KB, text/plain)
2015-06-09 10:01 UTC, Gabriele Mazzotta
Details
dmesg - _Qxx parallelized after resume (82.90 KB, text/plain)
2015-07-23 09:19 UTC, Gabriele Mazzotta
Details
[PATCH v2] ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations. (4.71 KB, patch)
2015-08-07 05:23 UTC, Lv Zheng
Details | Diff

Description Gabriele Mazzotta 2015-03-06 12:39:49 UTC
Created attachment 169541 [details]
dmesg when function keys are pressed

My laptop (XPS13 9333) has two function keys to control the screen brightness. When pressed, two _Qxx ACPI methods are called, one to increase and one to decrease the brightness. Both the methods sleep for 200ms.

This pause of 200ms is causing some problems when I keep one of the function keys pressed. In this situation, more queries than needed are scheduled for a delayed execution, making the screen brightness change even after the function key is released. If I keep the function key pressed for X seconds, it will take roughly 2X seconds for all the queries to be executed.

Here attached see some log lines that show the problem.
Comment 1 Lv Zheng 2015-03-09 02:54:10 UTC
Hi, Gabriele:

EC firmware flags SCI_EVT, and EC OS driver sends QR_EC command to obtain the the event number, and evaluate _Qxx (xx is the number of the event) to handle the event. The QR_EC command is comprised of a 0x84 (QR_EC) EC_SC register write and an EC_SC register read to obtain the event number.

The problem is we have several design concerns here:
1. The ACPI specification doesn't mention when the SCI_EVT should be cleared.
   We've observed on Samsung platforms that:
   1. SCI_EVT is cleared right after accepting the 0x84 (QR_EC) EC_SC register write and before the host driver reading the EC_SC register;
   2. It may have queued up additional events without flagging SCI_EVT, expecting EC OS driver to continously send QR_EC command until an event number of 0x00 is returned.

We have the following event handling process for a single event indication:
  Write QR_EC to EC_SC, read from EC_SC, evaluate _Qxx.
Currently, we rely on QR_EC to be queued up when SCI_EVT is flagged because:
2. The ACPI specification doesn't metion if 0x84 (QR_EC) can be written to the EC_SC register when SCI_EVT is 0.
   We've observed on Acer platforms that writing a 0x84 (QR_EC) to EC_SC register when SCI_EVT makes the platform hung.

And according to the fact 1, we might have the event handling process serialized for the next event happened after the _Qxx evaluation and continue the process until 0x00 is returned. We call it event draining support.
IMO, implementing things in this way can solve your problem.

But unfortenately, we've observed the following conflicts:

3. There is no guarantee that ACPICA will not die during _Qxx evaluation.
   So if things are serialized in this way, next QR_EC may be blocked by _Qxx evaluation. Thus current implementation doesn't serialize _Qxx evaluation in it.

4. The ACPI specification doesn't mention 0x00 should always be returned.
   We've observed on other platforms that:
   1. If we keep on sending 0x84 (QR_EC) to EC_SC, particular event number other than 0x00 is always returned and there is no _Qxx in the namespace to handle this event number.

So from the above discussion, you may see that the current implementation may have performance issues but should be immune to the possible regressions.

The improvement of draining events is on our plan, we are just waiting for the current upstreamed EC fixes to be well tested by the end users.
I'll post the patches here for you to test when they are ready.

Thanks and best regards
-Lv
Comment 2 Lv Zheng 2015-05-28 07:17:22 UTC
Hi,

Please try to:

1. apply the following patches:
   attachment 178011 [details]
   attachment 178021 [details]
   attachment 178031 [details]
2. build the kernel and try to see if things are improved.

Thanks in advance.

Best regards
-Lv
Comment 3 Lv Zheng 2015-05-28 07:18:06 UTC
I'm working on linux-pm.git/linux-next branch.

For the linux.git/master branch, I think you can use v4.1-rc5 tag and apply only the following 2 patches:
   attachment 178011 [details]
   attachment 178031 [details]
The other patch is a no-op and can be ignored.

Thanks and best regards
-Lv
Comment 4 Gabriele Mazzotta 2015-05-28 11:21:58 UTC
I tested "attachment 178011 [details]" and "attachment 178031 [details]" on top of v4.1-rc5 and the problem is still there.

Do you need some logs? Should I try with linux-pm.git/linux-next as it might have some other relevant changes I'm not aware of?

Thanks.
Comment 5 Lv Zheng 2015-05-28 11:27:50 UTC
Then we need the EC debugging log to learn.
Can you uncomment the following line in drivers/acpi/ec.c
/* #define DEBUG */
to
#define DEBUG
rebuild the kernel and retry.
And upload the log here.

Thanks
-Lv
Comment 6 Lv Zheng 2015-05-28 11:34:00 UTC
No need to use linux-pm.git.
There is no change around the timing the QR_EC is allowed.
Comment 7 Lv Zheng 2015-05-29 07:17:26 UTC
Created attachment 178231 [details]
[PATCH 1/5] ACPI / EC: Cleanup transaction state transition.
Comment 8 Lv Zheng 2015-05-29 07:17:59 UTC
Created attachment 178241 [details]
[PATCH 2/5] ACPI / EC: Convert event handling work queue into loop style.
Comment 9 Lv Zheng 2015-05-29 07:21:37 UTC
Created attachment 178251 [details]
[PATCH 1/5] ACPI / EC: Cleanup transaction state transition.
Comment 10 Lv Zheng 2015-05-29 07:22:10 UTC
Created attachment 178261 [details]
[PATCH 2/5] ACPI / EC: Convert event handling work queue into loop style.
Comment 11 Lv Zheng 2015-05-29 07:22:35 UTC
Created attachment 178271 [details]
[PATCH 3/5] ACPI / EC: Add event clearing variation support.
Comment 12 Lv Zheng 2015-05-29 07:23:04 UTC
Created attachment 178281 [details]
[PATCH 4/5] ACPI / EC: Add experimental support to complete query after reading the query value.
Comment 13 Lv Zheng 2015-05-29 07:23:28 UTC
Created attachment 178291 [details]
[PATCH 5/5] ACPI / EC: Reduce transaction retry times.
Comment 14 Lv Zheng 2015-05-29 07:28:37 UTC
Hi,

I made the QR_EC submission a bit later than the previous approach, hope this can help you.

Please:
1. Download linux-pm.git/linux-next branch as I need the ec_guard() commits in this repo.
2. apply the following patches:
   attachment 178251 [details]
   attachment 178261 [details]
   attachment 178271 [details]
   attachment 178281 [details]
   attachment 178291 [details]
3. build the kernel and try again.

You can use the following kernel boot parameter to try various tests:
"acpi.ec_event_clearing=status" this is the current behavior, before patching
"acpi.ec_event_clearing=queue" this is the previous behavior you've tested here
"acpi.ec_event_clearing=event" this is the new approach, and is the default behavior after patching

Hope you can find one of them working for you.
If not, please enable the EC driver debugging log as stated in comment 5 and use acpi.ec_event_clearing=event boot parameter to capture the debugging information for me to learn.

Thanks and best regards
-Lv
Comment 15 Gabriele Mazzotta 2015-05-29 21:16:42 UTC
Created attachment 178351 [details]
dmesg - acpi.ec_event_clearing=event

I've tested the changes and unfortunately no combination worked.
"acpi.ec_event_clearing=event" causes some issues. _Qxx methods are executed only the first time a function key is pressed and subsequent presses do nothing, at least not immediately. After some time I get the effect of the keypresses.

Here attached you can see the logs. I added "pressing brightness down twice" right before I pressed that function key. (I'm not sure that the second time I pressed it only twice, maybe more than that)

Just to the record, I used "query" instead of "queue" while testing the patches as the latter is wrong.

Thanks,
Gabriele
Comment 16 Lv Zheng 2015-05-29 23:47:41 UTC
I can see SCI_EVT cleared very early.
Could you also upload the acpi.ec_event_clearing=query log and acpidump output here.
I only need the key press logs.
Before testing please type dmesg -c.
After testing save dmesg -c output and upload here.

Thanks in advance.

Best regards
Comment 17 Lv Zheng 2015-05-29 23:59:45 UTC
It looks there's a bug in the event mode.
And query mode should be working for this firmware.
And nothing is execute in _Q81.
So I need the query mode log and acpidump to learn.
Comment 18 Gabriele Mazzotta 2015-05-30 08:48:17 UTC
Created attachment 178371 [details]
dmesg - acpi.ec_event_clearing=query

~36 seconds: single keypress
~51 seconds: single keypress
~69 seconds: longer keypress
Comment 19 Gabriele Mazzotta 2015-05-30 08:48:46 UTC
Created attachment 178381 [details]
acpidump
Comment 20 Lv Zheng 2015-06-03 01:52:27 UTC
Hi, Gabriele

For the single press, only 1 QR_EC/_Qxx per press can be seen:
[   36.814403] ACPI : EC: ===== IRQ (1) =====
[   36.814423] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   36.814427] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
[   36.814481] ACPI : EC: ##### Event started #####
[   36.814485] ACPI : EC: 2: Increase command
[   36.814488] ACPI : EC: ***** Command(QR_EC) started *****
[   36.814491] ACPI : EC: ===== TASK (1) =====
[   36.814503] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   36.814517] ACPI : EC: EC_SC(W) = 0x84
[   36.815724] ACPI : EC: ===== IRQ (1) =====
[   36.815743] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   36.815750] ACPI : EC: EC_DATA(R) = 0x81
[   36.815753] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   36.815755] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   36.815831] ACPI : EC: ***** Command(QR_EC) stopped *****
[   36.815836] ACPI : EC: 1: Decrease command
[   36.815854] ACPI : EC: ##### Query(0x81) scheduled #####
[   36.815882] ACPI : EC: ##### Event stopped #####
[   36.815908] ACPI : EC: ##### Query(0x81) started #####
[   37.019961] ACPI : EC: ##### Query(0x81) stopped #####

Unlike the previous reported issue which contains 2 QR_EC/_Qxx per press:
[   47.213439] ACPI : EC: ===== IRQ (1) =====
[   47.213455] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   47.213458] ACPI : EC: ***** Event started *****
[   47.213509] ACPI : EC: ***** Command(QR_EC) started *****
[   47.213522] ACPI : EC: ===== TASK (1) =====
[   47.213533] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   47.213536] ACPI : EC: EC_SC(W) = 0x84
[   47.213544] ACPI : EC: ***** Event stopped *****
[   47.213547] ACPI : EC: ***** Event started *****
[   47.218812] ACPI : EC: ===== TASK (1) =====
[   47.218829] ACPI : EC: EC_SC(R) = 0x2a SCI_EVT=1 BURST=0 CMD=1 IBF=1 OBF=0
[   47.220195] ACPI : EC: ===== IRQ (1) =====
[   47.220212] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   47.220219] ACPI : EC: EC_DATA(R) = 0x80
[   47.220222] ACPI : EC: ***** Command(QR_EC) hardware completion *****
[   47.220251] ACPI : EC: ***** Command(QR_EC) stopped *****
[   47.220260] ACPI : EC: ##### Query(0x80) scheduled #####
[   47.220273] ACPI : EC: ***** Command(QR_EC) started *****
[   47.220277] ACPI : EC: ===== TASK (1) =====
[   47.220290] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[   47.220293] ACPI : EC: ##### Query(0x80) started #####
[   47.220301] ACPI : EC: EC_SC(W) = 0x84
[   47.220308] ACPI : EC: ***** Event stopped *****
[   47.221021] ACPI : EC: ===== IRQ (0) =====
[   47.221037] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   47.221045] ACPI : EC: EC_DATA(R) = 0x00
[   47.221048] ACPI : EC: ***** Command(QR_EC) hardware completion *****
[   47.221063] ACPI : EC: ***** Command(QR_EC) stopped *****
[   47.422969] ACPI : EC: ##### Query(0x80) stopped #####

So this bug can be fixed by the new mode "acpi.ec_event_clearing=query".

For the long press, let me check it later in another reply.

Thanks and best regards
-Lv
Comment 21 Lv Zheng 2015-06-03 02:16:21 UTC
Hi, Gabriele

From the failed "acpi.ec_event_clearing=event" mode log, I can learn that SCI_EVT acts like a level triggered IRQ on your platform:
[   31.099902] ACPI : EC: ===== IRQ (1) =====
[   31.099915] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   33.149272] ACPI : EC: ===== IRQ (2) =====
[   33.149285] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   35.404185] ACPI : EC: ===== IRQ (1) =====
[   35.404198] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   37.659051] ACPI : EC: ===== IRQ (1) =====
[   37.659064] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   39.913953] ACPI : EC: ===== IRQ (1) =====
[   39.913973] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   42.168735] ACPI : EC: ===== IRQ (2) =====
[   42.168755] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   44.433657] ACPI : EC: ===== IRQ (1) =====
[   44.433675] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   46.688460] ACPI : EC: ===== IRQ (1) =====
[   46.688480] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   48.953299] ACPI : EC: ===== IRQ (1) =====
[   48.953313] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   51.208082] ACPI : EC: ===== IRQ (1) =====
[   51.208101] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   53.462800] ACPI : EC: ===== IRQ (1) =====
[   53.462812] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   55.717612] ACPI : EC: ===== IRQ (1) =====
[   55.717631] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   57.977782] ACPI : EC: ===== IRQ (1) =====
[   57.977798] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   60.232075] ACPI : EC: ===== IRQ (1) =====
[   60.232090] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   62.491695] ACPI : EC: ===== IRQ (2) =====
[   62.491716] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   64.746366] ACPI : EC: ===== IRQ (1) =====
[   64.746379] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   67.000930] ACPI : EC: ===== IRQ (3) =====
[   67.000942] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   69.255545] ACPI : EC: ===== IRQ (1) =====
[   69.255565] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   71.510128] ACPI : EC: ===== IRQ (1) =====
[   71.510141] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   73.769730] ACPI : EC: ===== IRQ (1) =====
[   73.769743] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   76.024321] ACPI : EC: ===== IRQ (1) =====
[   76.024337] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   78.281688] ACPI : EC: ===== IRQ (0) =====
[   78.281704] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   80.538478] ACPI : EC: ===== IRQ (1) =====
[   80.538491] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
This is interesting.

I guess you want to tune Linux EC driver to better fit this behavior.
Let's check the "acpi.ec_event_clearing=query mode" log.
I can see each QR_EC write can clear the indication:
1st IRQ:
[   68.745905] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
...
[   68.746027] ACPI : EC: EC_SC(W) = 0x84
...
[   68.747646] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
2nd IRQ:
[   69.015153] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
...
[   69.015232] ACPI : EC: EC_SC(W) = 0x84
...
[   69.016507] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
3rd IRQ:
[   69.075032] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
...
[   69.075111] ACPI : EC: EC_SC(W) = 0x84
...
[   69.076370] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
4th IRQ:
[   69.194623] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
...
[   69.194679] ACPI : EC: EC_SC(W) = 0x84
...
[   69.197427] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1

So IMO, this mode can fit your firmware.

But you may think this as a problem. From the 2nd IRQ, _Q81 evaluation is blocked due to unknown reasons:
[   69.016556] ACPI : EC: ##### Query(0x81) started #####
And it completed after 3rd/4th IRQs:
[   69.221093] ACPI : EC: ##### Query(0x81) stopped #####
[   69.221100] ACPI : EC: ##### Query(0x81) started #####
[   69.425237] ACPI : EC: ##### Query(0x81) stopped #####
[   69.425242] ACPI : EC: ##### Query(0x81) started #####
[   69.629397] ACPI : EC: ##### Query(0x81) stopped #####

I decompiled the _Q81 here:
                    Method (_Q81, 0, NotSerialized)  // _Qxx: EC Query
                    {
                        P8XH (Zero, 0x81)
                        If (CondRefOf (HBRT))
                        {
                            HBRT (0x04)
                        }

                        If (IGDS)
                        {
                            If (And (0x04, DSEN))
                            {
                                BRTN (0x87)
                            }
                            Else
                            {
                                Store (BRTL, Local0)
                                Store (^^^GFX0.CBLV (BRTL, And (Add (BRTL, One), 0xFE, BRTL), 
                                    If (LGreaterEqual (BRTL, 0x0A))
                                        {
                                            Subtract (BRTL, 0x0A, BRTL)
                                        }, ^^^GFX0.AINT (One, BRTL), Store (Zero, Local0)), 
                                                Store (^^^GFX0.CBLV (Local0, And (Local0, 0xFF, Local0), If (LAnd (LGreaterEqual (
                                                Local0, Zero), LLess (Local0, 0x07)))
                                            {
                                                Store (Zero, Local0)
                                            }, If (LAnd (LGreaterEqual (Local0, 0x07), LLess (
                                                    Local0, 0x0D)))
                                            {
                                                Store (One, Local0)
                                            }, If (LAnd (LGreaterEqual (Local0, 0x0D), LLess (
                                                    Local0, 0x13)))
                                            {
                                                Store (0x02, Local0)
                                            }), If (LAnd (LGreaterEqual (Local0, 0x13), LLess (
                                                    Local0, 0x19)))
                                        {
                                            Store (0x03, Local0)
                                        }))
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x19), LLess (Local0, 0x1F)))
                            {
                                Store (0x04, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x1F), LLess (Local0, 0x25)))
                            {
                                Store (0x05, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x25), LLess (Local0, 0x2B)))
                            {
                                Store (0x06, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x2B), LLess (Local0, 0x31)))
                            {
                                Store (0x07, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x31), LLess (Local0, 0x37)))
                            {
                                Store (0x08, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x37), LLess (Local0, 0x3D)))
                            {
                                Store (0x09, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x3D), LLess (Local0, 0x43)))
                            {
                                Store (0x0A, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x43), LLess (Local0, 0x49)))
                            {
                                Store (0x0B, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x49), LLess (Local0, 0x4F)))
                            {
                                Store (0x0C, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x4F), LLess (Local0, 0x55)))
                            {
                                Store (0x0D, Local0)
                            }

                            If (LAnd (LGreaterEqual (Local0, 0x55), LLess (Local0, 0x5B)))
                            {
                                Store (0x0E, Local0)
                            }

                            If (LGreaterEqual (Local0, 0x5B))
                            {
                                Store (0x0F, Local0)
                            }

                            If (LEqual (Local0, Zero)) {}
                            Else
                            {
                                Decrement (Local0)
                            }

                            If (PWRS)
                            {
                                Store (Local0, BRGA)
                            }
                            Else
                            {
                                Store (Local0, BRGD)
                            }
                        }

                        Store (0x03, ^^^^AMW0.INF0)
                        Store (Zero, ^^^^AMW0.INF1)
                        Store (0xE005, ^^^^AMW0.INF2)
                        Store (Local0, ^^^^AMW0.INF3)
                        Notify (AMW0, 0xD0)
                        Sleep (0xC8)
                    }
Unfortunately, I cannot decrypt such a big BIOS cipher text here to know why it is delayed. :-)

So let me just guess what you want:
You may want the _Qxx evaluations serialized for a long press.
TBH, according to the log, I didn't see any issues in EC driver without serializing the _Qxx evaluations. Thus I really don't know why we should serialize all _Qxx evaluations.
But if you want try, here I have improved the ec_event_clearing variation support for you and there is a new mode which can fit your requirement.

Let me post the test request in another post later.

Thanks and best regards
-Lv
Comment 22 Lv Zheng 2015-06-03 02:18:22 UTC
Hi, Gabriele

Please help to confirm the patchset behavior on your platform:

1. The patchset is based on the linux-pm.git/linux-next branch as I need some commits in that branch. So please git pull the following git:

   # git remote add linux-pm git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
   # git fetch linux-pm
   # git branch linux-pm-next linux-pm/linux-next
   # git checkout linux-pm-next

2. Apply the following patches:
   attachment 178641 [details]
   attachment 178651 [details]
   attachment 178661 [details]
   attachment 178671 [details]

3. Enable the EC debugging log:
   Uncomment the following line in drivers/acpi/ec.c
   /* #define DEBUG */
   to
   #define DEBUG

4. Rebuild the kernel

5. Try several SCI_EVT clearing approaches:
   1. Boot the kernel with acpi.ec_event_clearing=status, and do the test, save the dmesg to ec-status-dmsg.txt
   2. Boot the kernel with acpi.ec_event_clearing=query, and do the test, save the dmesg to ec-query-dmsg.txt
   3. Boot the kernel with acpi.ec_event_clearing=event, and do the test, save the dmesg to ec-event-dmsg.txt
   4. Boot the kernel with acpi.ec_event_clearing=method, and do the test, save the dmesg to ec-method-dmsg.txt

6. Upload the 4 dmesg logs here and tell us if you have met some problems with the trials.

IMO, "method" should be able to serialize _Qxx evaluations for you.

Thanks and best regards
-Lv
Comment 23 Gabriele Mazzotta 2015-06-03 18:53:30 UTC
>Unfortunately, I cannot decrypt such a big BIOS cipher text here to know why
>it is delayed. :-)

The cause of the delay should be the "Sleep (0xC8)" that is at the end of the method.
I think that the firmware does not take into account that it takes at least 200ms to completely execute the method and keeps generating events at a fast rate when the function key is pressed. As consequence, the kernel queues more method executions than one would expect.

Anyway, I'll provide the information you need ASAP.

Thanks,
Gabriele
Comment 24 Lv Zheng 2015-06-05 02:29:10 UTC
(In reply to Gabriele Mazzotta from comment #23)
> >Unfortunately, I cannot decrypt such a big BIOS cipher text here to know why
> it is delayed. :-)
> 
> The cause of the delay should be the "Sleep (0xC8)" that is at the end of
> the method.

Yes, I saw it.
What I meant is:
During the long press, 1st _Qxx can complete before 2nd SCI_EVT IRQ handling.
While 2nd _Qxx can only complete after 4th SCI_EVT IRQ handling.
I don't know why the 2nd _Qxx taks longer than the 1st.

> I think that the firmware does not take into account that it takes at least
> 200ms to completely execute the method and keeps generating events at a fast
> rate when the function key is pressed. As consequence, the kernel queues
> more method executions than one would expect.

In the "query" mode, it seems the number of the IRQs can be controlled.
But if you want to control the number of _Qxx evaluation, you may need to try another mode.

> Anyway, I'll provide the information you need ASAP.

The data is useful as:
1. I'm collecting mode-specific logs from various machines because for this issue, there is no spec explicit definition, we have to guess Windows EC driver's behavior.
2. I'm interesting in the "event"/"method" mode on your machine, if the IRQ keeps on firing at the high rate, will there be problems (for example, IRQ storming) when these 2 modes are being used.

Thanks and best regards
-Lv
Comment 25 Gabriele Mazzotta 2015-06-07 17:21:31 UTC
Created attachment 178991 [details]
dmesg - acpi.ec_event_clearing=status

Here the logs as requested in comment #22.

I haven't noticed any issue, function keys seem to behave as usual, no matter which clearing mode I choose.

In the logs I added some lines that might help:
* Single: single keypress
* Three: three subsequent keypresses
* Long: long lasting keypress
Comment 26 Gabriele Mazzotta 2015-06-07 17:21:54 UTC
Created attachment 179001 [details]
dmesg - acpi.ec_event_clearing=query
Comment 27 Gabriele Mazzotta 2015-06-07 17:22:13 UTC
Created attachment 179011 [details]
dmesg - acpi.ec_event_clearing=event
Comment 28 Gabriele Mazzotta 2015-06-07 17:22:30 UTC
Created attachment 179021 [details]
dmesg - acpi.ec_event_clearing=method
Comment 29 Lv Zheng 2015-06-08 01:34:23 UTC
(In reply to Gabriele Mazzotta from comment #25)
> Created attachment 178991 [details]
> dmesg - acpi.ec_event_clearing=status
> 
> Here the logs as requested in comment #22.
> 
> I haven't noticed any issue, function keys seem to behave as usual, no
> matter which clearing mode I choose.

Except for the "status" mode, 2 QR_EC/_Q81 processes can be seen for 1 IRQ.

> 
> In the logs I added some lines that might help:
> * Single: single keypress
> * Three: three subsequent keypresses
> * Long: long lasting keypress

This is very useful.

Marking this as resolved.
I'll close it after the patches are merged by the upstream kernel.

Thanks and best regards
-Lv
Comment 30 Gabriele Mazzotta 2015-06-08 09:18:06 UTC
The bug here reported is not solved though, so I'm not sure it should be marked as resolved.

I didn't have the time to carefully read the patches, but once I do, I'll look for a way to control the number of evaluations.

Thanks,
Gabriele
Comment 31 Lv Zheng 2015-06-09 01:17:14 UTC
Hi, Gabriele

Please take a look at the log.

A. For the "query" mode, the log entries of the single press process are:
[   38.983128] Single
[   39.290194] ACPI : EC: ===== IRQ (0) =====
[   39.290207] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   39.290210] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
[   39.290223] ACPI : EC: ##### Event started #####
[   39.290225] ACPI : EC: 2: Increase command
[   39.290227] ACPI : EC: ***** Command(QR_EC) started *****
[   39.290229] ACPI : EC: ===== TASK (0) =====
[   39.290237] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   39.290239] ACPI : EC: EC_SC(W) = 0x84

Write QR_EC.

[   39.290778] ACPI : EC: ===== IRQ (1) =====
[   39.290787] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   39.290791] ACPI : EC: EC_DATA(R) = 0x81

Read event value (0x81)

[   39.290792] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   39.290793] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   39.290800] ACPI : EC: ***** Command(QR_EC) stopped *****
[   39.290802] ACPI : EC: 1: Decrease command
[   39.290805] ACPI : EC: ##### Query(0x81) started #####
[   39.494040] ACPI : EC: ##### Query(0x81) stopped #####

Evaluate _Q81, only once.

[   39.494043] ACPI : EC: ##### Event stopped #####

Conclusion: there is only one evaluation for the single press.
Same behavior (one evaluation) can be seen for the "event"/"method" modes.

B. For the "query" mode, the log entries of the 3 presses process are:
[   54.176920] Three
[   54.503172] ACPI : EC: ===== IRQ (1) =====
[   54.503188] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   54.503190] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
[   54.503231] ACPI : EC: ##### Event started #####
[   54.503234] ACPI : EC: 2: Increase command
[   54.503236] ACPI : EC: ***** Command(QR_EC) started *****
[   54.503239] ACPI : EC: ===== TASK (1) =====
[   54.503248] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   54.503256] ACPI : EC: EC_SC(W) = 0x84

First press, write QR_EC.

[   54.503733] ACPI : EC: ===== IRQ (0) =====
[   54.503750] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1

SCI_EVT IRQ is cleared here.

[   54.503758] ACPI : EC: EC_DATA(R) = 0x81

First press, read event value (0x81).

[   54.503761] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   54.503763] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   54.503775] ACPI : EC: ***** Command(QR_EC) stopped *****
[   54.503780] ACPI : EC: 1: Decrease command
[   54.503787] ACPI : EC: ##### Query(0x81) started #####
[   54.706023] ACPI : EC: ##### Query(0x81) stopped #####

First press, evaluate _Q81.

[   54.706026] ACPI : EC: ##### Event stopped #####
[   54.879210] ACPI : EC: ===== IRQ (1) =====
[   54.879223] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   54.879225] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
[   54.879257] ACPI : EC: ##### Event started #####
[   54.879261] ACPI : EC: 2: Increase command
[   54.879264] ACPI : EC: ***** Command(QR_EC) started *****
[   54.879266] ACPI : EC: ===== TASK (1) =====
[   54.879281] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   54.879283] ACPI : EC: EC_SC(W) = 0x84

Second press, write QR_EC.

[   54.880397] ACPI : EC: ===== IRQ (1) =====
[   54.880409] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1

SCI_EVT IRQ is cleared here.

[   54.880413] ACPI : EC: EC_DATA(R) = 0x81

Second press, read event value (0x81).

[   54.880414] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   54.880415] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   54.880434] ACPI : EC: ***** Command(QR_EC) stopped *****
[   54.880435] ACPI : EC: 1: Decrease command
[   54.880439] ACPI : EC: ##### Query(0x81) started #####
[   55.082313] ACPI : EC: ##### Query(0x81) stopped #####

Second press, evaluate _Q81.

[   55.082317] ACPI : EC: ##### Event stopped #####
[   55.205066] ACPI : EC: ===== IRQ (2) =====
[   55.205081] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   55.205084] ACPI : EC: ##### Command(QR_EC) submitted/blocked #####
[   55.205177] ACPI : EC: ##### Event started #####
[   55.205179] ACPI : EC: 2: Increase command
[   55.205181] ACPI : EC: ***** Command(QR_EC) started *****
[   55.205182] ACPI : EC: ===== TASK (2) =====
[   55.205190] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   55.205191] ACPI : EC: EC_SC(W) = 0x84

Third press, write QR_EC.

[   55.205728] ACPI : EC: ===== IRQ (1) =====
[   55.205741] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   55.205745] ACPI : EC: EC_DATA(R) = 0x81

Third press, read event value (0x81).

[   55.205746] ACPI : EC: ##### Command(QR_EC) unblocked #####
[   55.205748] ACPI : EC: ##### Command(QR_EC) completed by hardware #####
[   55.205776] ACPI : EC: ***** Command(QR_EC) stopped *****
[   55.205783] ACPI : EC: 1: Decrease command
[   55.205792] ACPI : EC: ##### Query(0x81) started #####
[   55.406543] ACPI : EC: ##### Query(0x81) stopped #####

Third press, evaluate _Q81.

[   55.406550] ACPI : EC: ##### Event stopped #####

Conclusion: there are only 3 evaluations for the 3 presses.
Same behavior (one evaluation) can be seen for the "event"/"method" modes.

C. For the long press tests, we can also see first/second SCI_EVT cleared by the firmware.

According to A, B, C, why do you think the bug hasn't been solved?

D. For the long press tests, we can see the third SCI_EVT remains set until the whole process ends. What has made you think this is an EC driver's problem? What if the EC firmware just resets SCI_EVT after it is cleared due to the fast sampling? You can see even in the "method" mode, this situatiion is same and the "method" is the last position the EC driver can use to re-check the SCI_EVT. Do you think there is any chance for the host EC driver to re-check SCI_EVT later than this position? Why don't you think this is just an EC firmware's key press sampling problem?

Thanks and best regards
-Lv
Comment 32 Lv Zheng 2015-06-09 02:42:21 UTC
Hi, Gabriele

---

According to the long press log, the EC firmware acts in the way we do a
long press in the "notepad": the first several characters appear slowly and
the follow-up characters appear quickly like a "turbo press".
So this is the firmware behavior, and long with the Sleep(0xC8), _Qxx
handling mechanism, you can sense the unwanted user experience.

---

For the EC driver's event clearing timing support, I think there is nothing
can be done. The general EC driver shouldn't drop the SCI_EVT IRQs even the
IRQ is fired in the "turbo" way.

---

You may want the _Qxx evaluations to be parallelized.
1.  Before applying attachment 178651 [details], each _Qxx evaluation is done in a 
    different work item and you still blamed the experience against the old
    driver. I think this might be caused by the following line in 
    acpi_os_execute():
     queue_work_on(0, queue, &dpc->work);
    According to the comment, this line is there for the reasons, so
    deleting it may cause issues. I may have already fixed the root cause
    in this commit:
     Commit: f3e14329517ee190d34455d729430ef9d0473675
     Subject: ACPI / EC: Fix a code path that global lock is not held
    But it is still dangerous to change the this line. So let's stop 
    introducing too many regression unsafe changes in one Linux release
    cycle, given this change is just a tuning.
2. For the attachment 178651 [details], it can make "acpi.ec_event_clearing=method"
   support possible and simpler. IMO, the "event clearing timing" variation
   support can help to fix bug 97381 and bug 82611, remove
   EC_FLAGS_QUERY_HANDSHAKE quirk, learn the Windows expected/compliant
   EC firmware behavior. This should get a higher priority than the
   parallelism tuning. And we may need to stay several Linux release cycles
   to learn the Windows compliant mode.

---

For now, why don't you customize the DSDT by deleting the Sleep(0xC8) using
the methods stated in Documentation/acpi/dsdt-override.txt or
Documentation/acpi/initrd_table_override.txt and try again?

If you want to try the tuning, I can post a patch based on the old driver
for you. But IMO, we shouldn't merge such a regression unsafe commit before
fixing other issues. So even it can meet your requirement, I'll push it to
the upstream after several cycles.

---

Thanks and best regards
-Lv
Comment 33 Lv Zheng 2015-06-09 03:01:19 UTC
Created attachment 179201 [details]
[PATCH] ACPI / OSL: Make _Lxx/_Exx/_Qxx parallelized.

Try to queue _Qxx on different CPUs.
Comment 34 Lv Zheng 2015-06-09 03:03:12 UTC
Hi, Gabriele

Please help to confirm the patchset behavior on your platform:

1. The patchset is based on the linux-pm.git/linux-next branch as I need some commits in that branch. So please git pull the following git:

   # git remote add linux-pm git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
   # git fetch linux-pm
   # git branch linux-pm-next linux-pm/linux-next
   # git checkout linux-pm-next

2. Apply the following patches:
   attachment 179201 [details]

3. Enable the EC debugging log:
   Uncomment the following line in drivers/acpi/ec.c
   /* #define DEBUG */
   to
   #define DEBUG

4. Rebuild the kernel

5. Try the long press and upload the dmesg here. Also tell me the user experience.

Thanks and best regards
-Lv
Comment 35 Gabriele Mazzotta 2015-06-09 10:01:37 UTC
Created attachment 179231 [details]
dmesg - _Qxx parallelized

I tested this patch and it seems that it fixes the problem.

There's something I remembered while testing the patch: Windows can deal with the delay in the method, but I noticed that it is somewhat affected by it. If I keep the function key to lower the brightness pressed for a while and then I immediately press once the function key to increase it, the screen brightness goes immediately up, but often it will also go back one level.
With the last patch here posted, I observe the same behavior in Linux.

I'd say the delay was chosen taking into account the number number of cores.


To get the log here attached, I kept the brightness-down key pressed to bring the brightness from 100% to 0% and as soon as it reached 0% I pressed once the brightness-up key.
The brightness increased soon after I pressed the brightness-up key.


Thanks,
Gabriele
Comment 36 Lv Zheng 2015-06-09 13:56:41 UTC
OK, now I can mark this as resolved.
I need to confirm with the others to see if this commented code can be reverted to the normal behavior.

Thanks and best regards
-Lv
Comment 37 Len Brown 2015-07-22 00:16:44 UTC
LV - is there a fix for this upstream?
Comment 38 Lv Zheng 2015-07-22 01:50:13 UTC
One more thing that I'm not sure is the test result here.
IMO, Linux work queue can automatically re-order its items if one of the item is blocked. So there is some kinds of asynchrony ready for the items that bound to one CPU. Maybe it doesn't do so if the item is not blocked.

Anyway I sent the fix to the ACPI mailing list according to the test result:
https://www.spinics.net/lists/linux-acpi/msg58474.html
Neither I saw it merged nor I received comments.
Maybe the patch which makes all "Notify()" opcode parallel still looks scary for the current ACPICA synchronization implemenation.
But IMO, it's worthy of try as I couldn't find any reason from the spec preventing us from doing so and we may achieve performance cookies by doing so.

If we saw negative reports against such a change, we could offer a less aggressive approach.

Thanks and best regards
-Lv
Comment 39 Lv Zheng 2015-07-22 01:51:36 UTC
I'll try again to ping the ACPI mailing list with a less aggressive approach.

Thanks and best regards
-Lv
Comment 40 Gabriele Mazzotta 2015-07-22 10:55:51 UTC
Hi,

there's something I noticed that I forgot to report. I don't think it's entirely related to the fix, but after a suspend/resume cycle the bug reappears, maybe in a less severe form, but it's there.

Since the patch simply replaces "queue_work_on()" with "queue_work()" I guess the problem is somewhere else, but I thought to report here in case you know what the problem might be.

I can reproduce this consistently. Right after a reboot, the EC methods are executed in parallel and there's almost no delay. If I suspend and resume the laptop, it takes some time to execute all the queued methods.
Comment 41 Chen Yu 2015-07-23 02:46:50 UTC
After resuming from suspend, there's only one cpu running for a while, so during this period queue_work may act like serially, then other non-boot cpus  will be brought up later. Is there any dmesg log we can refer to? (to check if the query method is executed when there's only one cpu available) thanks
Yu
Comment 42 Gabriele Mazzotta 2015-07-23 09:19:08 UTC
Created attachment 183311 [details]
dmesg - _Qxx parallelized after resume

For the situation before the suspension, refer to attachment 179231 [details].

Here the dmesg right after resume. At ~2477 seconds I start pressing the function key.

I tried to wait for around 1 hour and nothing changed, I kept getting "IRQ (0)".
Comment 43 Lv Zheng 2015-07-24 02:36:13 UTC
The QR_EC is 0x84:
Query Embedded Controller (QR_EC) 0x84
ACPI specification, table 12-291.

I can see the work items used for QR_EC submission scheduled on different CPUs:
[ 2451.055383] ACPI : EC: ***** Command(QR_EC) started *****
[ 2451.055384] ACPI : EC: ===== TASK (1) =====
[ 2451.055392] ACPI : EC: EC_SC(R) = 0x38 SCI_EVT=1 BURST=1 CMD=1 IBF=0 OBF=0
[ 2451.055393] ACPI : EC: EC_SC(W) = 0x84
[ 2451.055531] ACPI : EC: ===== IRQ (0) =====
[ 2451.055539] ACPI : EC: EC_SC(R) = 0x39 SCI_EVT=1 BURST=1 CMD=1 IBF=0 OBF=1
[ 2451.055542] ACPI : EC: EC_DATA(R) = 0xbb
...
[ 2451.055563] ACPI : EC: ***** Command(QR_EC) started *****
[ 2451.055564] ACPI : EC: ===== TASK (1) =====
[ 2451.055572] ACPI : EC: EC_SC(R) = 0x38 SCI_EVT=1 BURST=1 CMD=1 IBF=0 OBF=0
[ 2451.055573] ACPI : EC: EC_SC(W) = 0x84
[ 2451.055713] ACPI : EC: ===== IRQ (0) =====
[ 2451.055721] ACPI : EC: EC_SC(R) = 0x19 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=1
[ 2451.055725] ACPI : EC: EC_DATA(R) = 0xb2
...
[ 2454.092321] ACPI : EC: ***** Command(QR_EC) started *****
[ 2454.092323] ACPI : EC: ===== TASK (0) =====
[ 2454.092331] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 2454.092333] ACPI : EC: EC_SC(W) = 0x84
[ 2454.092842] ACPI : EC: ===== IRQ (0) =====
[ 2454.092850] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[ 2454.092853] ACPI : EC: EC_DATA(R) = 0x85
...
[ 2454.092877] ACPI : EC: ***** Command(QR_EC) started *****
[ 2454.092879] ACPI : EC: ===== TASK (0) =====
[ 2454.092886] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[ 2454.092887] ACPI : EC: EC_SC(W) = 0x84
[ 2454.093297] ACPI : EC: ===== IRQ (0) =====
[ 2454.093306] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 2454.093309] ACPI : EC: EC_DATA(R) = 0x85
...
[ 2455.197017] ACPI : EC: ***** Command(QR_EC) started *****
[ 2455.197019] ACPI : EC: ===== TASK (0) =====
[ 2455.197027] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[ 2455.197029] ACPI : EC: EC_SC(W) = 0x84
[ 2455.197556] ACPI : EC: ===== IRQ (0) =====
[ 2455.197568] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 2455.197572] ACPI : EC: EC_DATA(R) = 0x85
...
[ 2457.198132] ACPI : EC: ***** Command(QR_EC) started *****
[ 2457.198148] ACPI : EC: ===== TASK (0) =====
[ 2457.198179] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[ 2457.198184] ACPI : EC: EC_SC(W) = 0x84
[ 2457.198539] ACPI : EC: ===== IRQ (0) =====
[ 2457.198556] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[ 2457.198564] ACPI : EC: EC_DATA(R) = 0x85

So it looks the QR_EC is still parallelized.
While we can see after resuming, all IRQs are coming from CPU0 as what you said.
If this is not a firmware mistake, it may imply a Linux APIC trouble, right?

Thanks and best regards
-Lv
Comment 44 Chen Yu 2015-07-24 04:58:43 UTC
So it appears to be all the SCI are distributed to cpu0, then 
queue_work would prefer to put work on cpu0, causing this problem.
we can check the SCI affinity by:
cat /proc/interrupts | grep acpi
cat /proc/irq/"acpi_irq"/smp_affinity
if it is not 'f', maybe it is a software bug,
if it is 'f', maybe we should raise this bug to APIC module.
what do you think?

Yu
Comment 45 Gabriele Mazzotta 2015-07-24 07:46:35 UTC
It seems that the affinity is correct:

$ cat /proc/interrupts | grep acpi
  9:        513        290         33         96   IO-APIC   9-fasteoi   acpi
$ cat /proc/irq/9/smp_affinity
f
Comment 46 Gabriele Mazzotta 2015-07-24 08:19:41 UTC
I tried something as simple as setting the affinity to "f" regardless of what "smp_affinity" reports and the problem got fixed. It's as if the affinity reported by "smp_affinity" is not the actual one.

After resume:
$ cat /proc/irq/9/smp_affinity
f   # IRQ 0 only
$ echo f > /proc/irq/9/smp_affinity
$ cat /proc/irq/9/smp_affinity
f   # IRQ 0-3
Comment 47 Chen Yu 2015-07-28 10:06:57 UTC
can you please try this command before and after resume?(without echo f to
smp_affinity) 
$ service irqbalance status
irqbalance start/running, process 1031
Comment 48 Gabriele Mazzotta 2015-07-28 10:17:50 UTC
I get the same output, except for "Active", which is greater the second time I execute the command as expected.

$ service irqbalance status

● irqbalance.service - LSB: daemon to balance interrupts for SMP systems
   Loaded: loaded (/etc/init.d/irqbalance)
   Active: active (running) since Tue 2015-07-28 12:11:10 CEST; 37s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 773 ExecStart=/etc/init.d/irqbalance start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/irqbalance.service
           └─787 /usr/sbin/irqbalance --pid=/var/run/irqbalance.pid

lug 28 12:11:10 xps13 systemd[1]: Starting LSB: daemon to balance interrupts for SMP systems...
lug 28 12:11:10 xps13 irqbalance[773]: Starting SMP IRQ Balancer: irqbalance.
lug 28 12:11:10 xps13 systemd[1]: Started LSB: daemon to balance interrupts for SMP systems.
Comment 49 Lv Zheng 2015-08-07 05:23:33 UTC
Created attachment 184501 [details]
[PATCH v2] ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations.

Hi,

Could you give this less aggressive approach a try before I send it to the upstream?

Thanks and best regards
-Lv
Comment 50 Gabriele Mazzotta 2015-08-07 10:24:45 UTC
Hi,

I tested this new patch and I might be wrong, but I think it works even even better than the previous one.

Thanks.
Comment 51 Lv Zheng 2015-08-12 03:11:53 UTC
OK, try again to upstream this version.

Thanks
-Lv
Comment 52 Lv Zheng 2015-09-14 05:47:21 UTC
The patch has been accepted by linux-pm.git and will be automatically synchronized to the mainline.
So let me close this issue.

Thanks and best regards
-Lv

Note You need to log in before you can comment on or make changes to this bug.