Created attachment 169541 [details] dmesg when function keys are pressed My laptop (XPS13 9333) has two function keys to control the screen brightness. When pressed, two _Qxx ACPI methods are called, one to increase and one to decrease the brightness. Both the methods sleep for 200ms. This pause of 200ms is causing some problems when I keep one of the function keys pressed. In this situation, more queries than needed are scheduled for a delayed execution, making the screen brightness change even after the function key is released. If I keep the function key pressed for X seconds, it will take roughly 2X seconds for all the queries to be executed. Here attached see some log lines that show the problem.
Hi, Gabriele: EC firmware flags SCI_EVT, and EC OS driver sends QR_EC command to obtain the the event number, and evaluate _Qxx (xx is the number of the event) to handle the event. The QR_EC command is comprised of a 0x84 (QR_EC) EC_SC register write and an EC_SC register read to obtain the event number. The problem is we have several design concerns here: 1. The ACPI specification doesn't mention when the SCI_EVT should be cleared. We've observed on Samsung platforms that: 1. SCI_EVT is cleared right after accepting the 0x84 (QR_EC) EC_SC register write and before the host driver reading the EC_SC register; 2. It may have queued up additional events without flagging SCI_EVT, expecting EC OS driver to continously send QR_EC command until an event number of 0x00 is returned. We have the following event handling process for a single event indication: Write QR_EC to EC_SC, read from EC_SC, evaluate _Qxx. Currently, we rely on QR_EC to be queued up when SCI_EVT is flagged because: 2. The ACPI specification doesn't metion if 0x84 (QR_EC) can be written to the EC_SC register when SCI_EVT is 0. We've observed on Acer platforms that writing a 0x84 (QR_EC) to EC_SC register when SCI_EVT makes the platform hung. And according to the fact 1, we might have the event handling process serialized for the next event happened after the _Qxx evaluation and continue the process until 0x00 is returned. We call it event draining support. IMO, implementing things in this way can solve your problem. But unfortenately, we've observed the following conflicts: 3. There is no guarantee that ACPICA will not die during _Qxx evaluation. So if things are serialized in this way, next QR_EC may be blocked by _Qxx evaluation. Thus current implementation doesn't serialize _Qxx evaluation in it. 4. The ACPI specification doesn't mention 0x00 should always be returned. We've observed on other platforms that: 1. If we keep on sending 0x84 (QR_EC) to EC_SC, particular event number other than 0x00 is always returned and there is no _Qxx in the namespace to handle this event number. So from the above discussion, you may see that the current implementation may have performance issues but should be immune to the possible regressions. The improvement of draining events is on our plan, we are just waiting for the current upstreamed EC fixes to be well tested by the end users. I'll post the patches here for you to test when they are ready. Thanks and best regards -Lv
Hi, Please try to: 1. apply the following patches: attachment 178011 [details] attachment 178021 [details] attachment 178031 [details] 2. build the kernel and try to see if things are improved. Thanks in advance. Best regards -Lv
I'm working on linux-pm.git/linux-next branch. For the linux.git/master branch, I think you can use v4.1-rc5 tag and apply only the following 2 patches: attachment 178011 [details] attachment 178031 [details] The other patch is a no-op and can be ignored. Thanks and best regards -Lv
I tested "attachment 178011 [details]" and "attachment 178031 [details]" on top of v4.1-rc5 and the problem is still there. Do you need some logs? Should I try with linux-pm.git/linux-next as it might have some other relevant changes I'm not aware of? Thanks.
Then we need the EC debugging log to learn. Can you uncomment the following line in drivers/acpi/ec.c /* #define DEBUG */ to #define DEBUG rebuild the kernel and retry. And upload the log here. Thanks -Lv
No need to use linux-pm.git. There is no change around the timing the QR_EC is allowed.
Created attachment 178231 [details] [PATCH 1/5] ACPI / EC: Cleanup transaction state transition.
Created attachment 178241 [details] [PATCH 2/5] ACPI / EC: Convert event handling work queue into loop style.
Created attachment 178251 [details] [PATCH 1/5] ACPI / EC: Cleanup transaction state transition.
Created attachment 178261 [details] [PATCH 2/5] ACPI / EC: Convert event handling work queue into loop style.
Created attachment 178271 [details] [PATCH 3/5] ACPI / EC: Add event clearing variation support.
Created attachment 178281 [details] [PATCH 4/5] ACPI / EC: Add experimental support to complete query after reading the query value.
Created attachment 178291 [details] [PATCH 5/5] ACPI / EC: Reduce transaction retry times.
Hi, I made the QR_EC submission a bit later than the previous approach, hope this can help you. Please: 1. Download linux-pm.git/linux-next branch as I need the ec_guard() commits in this repo. 2. apply the following patches: attachment 178251 [details] attachment 178261 [details] attachment 178271 [details] attachment 178281 [details] attachment 178291 [details] 3. build the kernel and try again. You can use the following kernel boot parameter to try various tests: "acpi.ec_event_clearing=status" this is the current behavior, before patching "acpi.ec_event_clearing=queue" this is the previous behavior you've tested here "acpi.ec_event_clearing=event" this is the new approach, and is the default behavior after patching Hope you can find one of them working for you. If not, please enable the EC driver debugging log as stated in comment 5 and use acpi.ec_event_clearing=event boot parameter to capture the debugging information for me to learn. Thanks and best regards -Lv
Created attachment 178351 [details] dmesg - acpi.ec_event_clearing=event I've tested the changes and unfortunately no combination worked. "acpi.ec_event_clearing=event" causes some issues. _Qxx methods are executed only the first time a function key is pressed and subsequent presses do nothing, at least not immediately. After some time I get the effect of the keypresses. Here attached you can see the logs. I added "pressing brightness down twice" right before I pressed that function key. (I'm not sure that the second time I pressed it only twice, maybe more than that) Just to the record, I used "query" instead of "queue" while testing the patches as the latter is wrong. Thanks, Gabriele
I can see SCI_EVT cleared very early. Could you also upload the acpi.ec_event_clearing=query log and acpidump output here. I only need the key press logs. Before testing please type dmesg -c. After testing save dmesg -c output and upload here. Thanks in advance. Best regards
It looks there's a bug in the event mode. And query mode should be working for this firmware. And nothing is execute in _Q81. So I need the query mode log and acpidump to learn.
Created attachment 178371 [details] dmesg - acpi.ec_event_clearing=query ~36 seconds: single keypress ~51 seconds: single keypress ~69 seconds: longer keypress
Created attachment 178381 [details] acpidump
Hi, Gabriele For the single press, only 1 QR_EC/_Qxx per press can be seen: [ 36.814403] ACPI : EC: ===== IRQ (1) ===== [ 36.814423] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 36.814427] ACPI : EC: ##### Command(QR_EC) submitted/blocked ##### [ 36.814481] ACPI : EC: ##### Event started ##### [ 36.814485] ACPI : EC: 2: Increase command [ 36.814488] ACPI : EC: ***** Command(QR_EC) started ***** [ 36.814491] ACPI : EC: ===== TASK (1) ===== [ 36.814503] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 36.814517] ACPI : EC: EC_SC(W) = 0x84 [ 36.815724] ACPI : EC: ===== IRQ (1) ===== [ 36.815743] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 36.815750] ACPI : EC: EC_DATA(R) = 0x81 [ 36.815753] ACPI : EC: ##### Command(QR_EC) unblocked ##### [ 36.815755] ACPI : EC: ##### Command(QR_EC) completed by hardware ##### [ 36.815831] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 36.815836] ACPI : EC: 1: Decrease command [ 36.815854] ACPI : EC: ##### Query(0x81) scheduled ##### [ 36.815882] ACPI : EC: ##### Event stopped ##### [ 36.815908] ACPI : EC: ##### Query(0x81) started ##### [ 37.019961] ACPI : EC: ##### Query(0x81) stopped ##### Unlike the previous reported issue which contains 2 QR_EC/_Qxx per press: [ 47.213439] ACPI : EC: ===== IRQ (1) ===== [ 47.213455] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 47.213458] ACPI : EC: ***** Event started ***** [ 47.213509] ACPI : EC: ***** Command(QR_EC) started ***** [ 47.213522] ACPI : EC: ===== TASK (1) ===== [ 47.213533] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 47.213536] ACPI : EC: EC_SC(W) = 0x84 [ 47.213544] ACPI : EC: ***** Event stopped ***** [ 47.213547] ACPI : EC: ***** Event started ***** [ 47.218812] ACPI : EC: ===== TASK (1) ===== [ 47.218829] ACPI : EC: EC_SC(R) = 0x2a SCI_EVT=1 BURST=0 CMD=1 IBF=1 OBF=0 [ 47.220195] ACPI : EC: ===== IRQ (1) ===== [ 47.220212] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 47.220219] ACPI : EC: EC_DATA(R) = 0x80 [ 47.220222] ACPI : EC: ***** Command(QR_EC) hardware completion ***** [ 47.220251] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 47.220260] ACPI : EC: ##### Query(0x80) scheduled ##### [ 47.220273] ACPI : EC: ***** Command(QR_EC) started ***** [ 47.220277] ACPI : EC: ===== TASK (1) ===== [ 47.220290] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0 [ 47.220293] ACPI : EC: ##### Query(0x80) started ##### [ 47.220301] ACPI : EC: EC_SC(W) = 0x84 [ 47.220308] ACPI : EC: ***** Event stopped ***** [ 47.221021] ACPI : EC: ===== IRQ (0) ===== [ 47.221037] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 47.221045] ACPI : EC: EC_DATA(R) = 0x00 [ 47.221048] ACPI : EC: ***** Command(QR_EC) hardware completion ***** [ 47.221063] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 47.422969] ACPI : EC: ##### Query(0x80) stopped ##### So this bug can be fixed by the new mode "acpi.ec_event_clearing=query". For the long press, let me check it later in another reply. Thanks and best regards -Lv
Hi, Gabriele From the failed "acpi.ec_event_clearing=event" mode log, I can learn that SCI_EVT acts like a level triggered IRQ on your platform: [ 31.099902] ACPI : EC: ===== IRQ (1) ===== [ 31.099915] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 33.149272] ACPI : EC: ===== IRQ (2) ===== [ 33.149285] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 35.404185] ACPI : EC: ===== IRQ (1) ===== [ 35.404198] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 37.659051] ACPI : EC: ===== IRQ (1) ===== [ 37.659064] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 39.913953] ACPI : EC: ===== IRQ (1) ===== [ 39.913973] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 42.168735] ACPI : EC: ===== IRQ (2) ===== [ 42.168755] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 44.433657] ACPI : EC: ===== IRQ (1) ===== [ 44.433675] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 46.688460] ACPI : EC: ===== IRQ (1) ===== [ 46.688480] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 48.953299] ACPI : EC: ===== IRQ (1) ===== [ 48.953313] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 51.208082] ACPI : EC: ===== IRQ (1) ===== [ 51.208101] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 53.462800] ACPI : EC: ===== IRQ (1) ===== [ 53.462812] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 55.717612] ACPI : EC: ===== IRQ (1) ===== [ 55.717631] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 57.977782] ACPI : EC: ===== IRQ (1) ===== [ 57.977798] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 60.232075] ACPI : EC: ===== IRQ (1) ===== [ 60.232090] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 62.491695] ACPI : EC: ===== IRQ (2) ===== [ 62.491716] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 64.746366] ACPI : EC: ===== IRQ (1) ===== [ 64.746379] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 67.000930] ACPI : EC: ===== IRQ (3) ===== [ 67.000942] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 69.255545] ACPI : EC: ===== IRQ (1) ===== [ 69.255565] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 71.510128] ACPI : EC: ===== IRQ (1) ===== [ 71.510141] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 73.769730] ACPI : EC: ===== IRQ (1) ===== [ 73.769743] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 76.024321] ACPI : EC: ===== IRQ (1) ===== [ 76.024337] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 78.281688] ACPI : EC: ===== IRQ (0) ===== [ 78.281704] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 80.538478] ACPI : EC: ===== IRQ (1) ===== [ 80.538491] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 This is interesting. I guess you want to tune Linux EC driver to better fit this behavior. Let's check the "acpi.ec_event_clearing=query mode" log. I can see each QR_EC write can clear the indication: 1st IRQ: [ 68.745905] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 ... [ 68.746027] ACPI : EC: EC_SC(W) = 0x84 ... [ 68.747646] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 2nd IRQ: [ 69.015153] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 ... [ 69.015232] ACPI : EC: EC_SC(W) = 0x84 ... [ 69.016507] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 3rd IRQ: [ 69.075032] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 ... [ 69.075111] ACPI : EC: EC_SC(W) = 0x84 ... [ 69.076370] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 4th IRQ: [ 69.194623] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 ... [ 69.194679] ACPI : EC: EC_SC(W) = 0x84 ... [ 69.197427] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 So IMO, this mode can fit your firmware. But you may think this as a problem. From the 2nd IRQ, _Q81 evaluation is blocked due to unknown reasons: [ 69.016556] ACPI : EC: ##### Query(0x81) started ##### And it completed after 3rd/4th IRQs: [ 69.221093] ACPI : EC: ##### Query(0x81) stopped ##### [ 69.221100] ACPI : EC: ##### Query(0x81) started ##### [ 69.425237] ACPI : EC: ##### Query(0x81) stopped ##### [ 69.425242] ACPI : EC: ##### Query(0x81) started ##### [ 69.629397] ACPI : EC: ##### Query(0x81) stopped ##### I decompiled the _Q81 here: Method (_Q81, 0, NotSerialized) // _Qxx: EC Query { P8XH (Zero, 0x81) If (CondRefOf (HBRT)) { HBRT (0x04) } If (IGDS) { If (And (0x04, DSEN)) { BRTN (0x87) } Else { Store (BRTL, Local0) Store (^^^GFX0.CBLV (BRTL, And (Add (BRTL, One), 0xFE, BRTL), If (LGreaterEqual (BRTL, 0x0A)) { Subtract (BRTL, 0x0A, BRTL) }, ^^^GFX0.AINT (One, BRTL), Store (Zero, Local0)), Store (^^^GFX0.CBLV (Local0, And (Local0, 0xFF, Local0), If (LAnd (LGreaterEqual ( Local0, Zero), LLess (Local0, 0x07))) { Store (Zero, Local0) }, If (LAnd (LGreaterEqual (Local0, 0x07), LLess ( Local0, 0x0D))) { Store (One, Local0) }, If (LAnd (LGreaterEqual (Local0, 0x0D), LLess ( Local0, 0x13))) { Store (0x02, Local0) }), If (LAnd (LGreaterEqual (Local0, 0x13), LLess ( Local0, 0x19))) { Store (0x03, Local0) })) } If (LAnd (LGreaterEqual (Local0, 0x19), LLess (Local0, 0x1F))) { Store (0x04, Local0) } If (LAnd (LGreaterEqual (Local0, 0x1F), LLess (Local0, 0x25))) { Store (0x05, Local0) } If (LAnd (LGreaterEqual (Local0, 0x25), LLess (Local0, 0x2B))) { Store (0x06, Local0) } If (LAnd (LGreaterEqual (Local0, 0x2B), LLess (Local0, 0x31))) { Store (0x07, Local0) } If (LAnd (LGreaterEqual (Local0, 0x31), LLess (Local0, 0x37))) { Store (0x08, Local0) } If (LAnd (LGreaterEqual (Local0, 0x37), LLess (Local0, 0x3D))) { Store (0x09, Local0) } If (LAnd (LGreaterEqual (Local0, 0x3D), LLess (Local0, 0x43))) { Store (0x0A, Local0) } If (LAnd (LGreaterEqual (Local0, 0x43), LLess (Local0, 0x49))) { Store (0x0B, Local0) } If (LAnd (LGreaterEqual (Local0, 0x49), LLess (Local0, 0x4F))) { Store (0x0C, Local0) } If (LAnd (LGreaterEqual (Local0, 0x4F), LLess (Local0, 0x55))) { Store (0x0D, Local0) } If (LAnd (LGreaterEqual (Local0, 0x55), LLess (Local0, 0x5B))) { Store (0x0E, Local0) } If (LGreaterEqual (Local0, 0x5B)) { Store (0x0F, Local0) } If (LEqual (Local0, Zero)) {} Else { Decrement (Local0) } If (PWRS) { Store (Local0, BRGA) } Else { Store (Local0, BRGD) } } Store (0x03, ^^^^AMW0.INF0) Store (Zero, ^^^^AMW0.INF1) Store (0xE005, ^^^^AMW0.INF2) Store (Local0, ^^^^AMW0.INF3) Notify (AMW0, 0xD0) Sleep (0xC8) } Unfortunately, I cannot decrypt such a big BIOS cipher text here to know why it is delayed. :-) So let me just guess what you want: You may want the _Qxx evaluations serialized for a long press. TBH, according to the log, I didn't see any issues in EC driver without serializing the _Qxx evaluations. Thus I really don't know why we should serialize all _Qxx evaluations. But if you want try, here I have improved the ec_event_clearing variation support for you and there is a new mode which can fit your requirement. Let me post the test request in another post later. Thanks and best regards -Lv
Hi, Gabriele Please help to confirm the patchset behavior on your platform: 1. The patchset is based on the linux-pm.git/linux-next branch as I need some commits in that branch. So please git pull the following git: # git remote add linux-pm git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git # git fetch linux-pm # git branch linux-pm-next linux-pm/linux-next # git checkout linux-pm-next 2. Apply the following patches: attachment 178641 [details] attachment 178651 [details] attachment 178661 [details] attachment 178671 [details] 3. Enable the EC debugging log: Uncomment the following line in drivers/acpi/ec.c /* #define DEBUG */ to #define DEBUG 4. Rebuild the kernel 5. Try several SCI_EVT clearing approaches: 1. Boot the kernel with acpi.ec_event_clearing=status, and do the test, save the dmesg to ec-status-dmsg.txt 2. Boot the kernel with acpi.ec_event_clearing=query, and do the test, save the dmesg to ec-query-dmsg.txt 3. Boot the kernel with acpi.ec_event_clearing=event, and do the test, save the dmesg to ec-event-dmsg.txt 4. Boot the kernel with acpi.ec_event_clearing=method, and do the test, save the dmesg to ec-method-dmsg.txt 6. Upload the 4 dmesg logs here and tell us if you have met some problems with the trials. IMO, "method" should be able to serialize _Qxx evaluations for you. Thanks and best regards -Lv
>Unfortunately, I cannot decrypt such a big BIOS cipher text here to know why >it is delayed. :-) The cause of the delay should be the "Sleep (0xC8)" that is at the end of the method. I think that the firmware does not take into account that it takes at least 200ms to completely execute the method and keeps generating events at a fast rate when the function key is pressed. As consequence, the kernel queues more method executions than one would expect. Anyway, I'll provide the information you need ASAP. Thanks, Gabriele
(In reply to Gabriele Mazzotta from comment #23) > >Unfortunately, I cannot decrypt such a big BIOS cipher text here to know why > it is delayed. :-) > > The cause of the delay should be the "Sleep (0xC8)" that is at the end of > the method. Yes, I saw it. What I meant is: During the long press, 1st _Qxx can complete before 2nd SCI_EVT IRQ handling. While 2nd _Qxx can only complete after 4th SCI_EVT IRQ handling. I don't know why the 2nd _Qxx taks longer than the 1st. > I think that the firmware does not take into account that it takes at least > 200ms to completely execute the method and keeps generating events at a fast > rate when the function key is pressed. As consequence, the kernel queues > more method executions than one would expect. In the "query" mode, it seems the number of the IRQs can be controlled. But if you want to control the number of _Qxx evaluation, you may need to try another mode. > Anyway, I'll provide the information you need ASAP. The data is useful as: 1. I'm collecting mode-specific logs from various machines because for this issue, there is no spec explicit definition, we have to guess Windows EC driver's behavior. 2. I'm interesting in the "event"/"method" mode on your machine, if the IRQ keeps on firing at the high rate, will there be problems (for example, IRQ storming) when these 2 modes are being used. Thanks and best regards -Lv
Created attachment 178991 [details] dmesg - acpi.ec_event_clearing=status Here the logs as requested in comment #22. I haven't noticed any issue, function keys seem to behave as usual, no matter which clearing mode I choose. In the logs I added some lines that might help: * Single: single keypress * Three: three subsequent keypresses * Long: long lasting keypress
Created attachment 179001 [details] dmesg - acpi.ec_event_clearing=query
Created attachment 179011 [details] dmesg - acpi.ec_event_clearing=event
Created attachment 179021 [details] dmesg - acpi.ec_event_clearing=method
(In reply to Gabriele Mazzotta from comment #25) > Created attachment 178991 [details] > dmesg - acpi.ec_event_clearing=status > > Here the logs as requested in comment #22. > > I haven't noticed any issue, function keys seem to behave as usual, no > matter which clearing mode I choose. Except for the "status" mode, 2 QR_EC/_Q81 processes can be seen for 1 IRQ. > > In the logs I added some lines that might help: > * Single: single keypress > * Three: three subsequent keypresses > * Long: long lasting keypress This is very useful. Marking this as resolved. I'll close it after the patches are merged by the upstream kernel. Thanks and best regards -Lv
The bug here reported is not solved though, so I'm not sure it should be marked as resolved. I didn't have the time to carefully read the patches, but once I do, I'll look for a way to control the number of evaluations. Thanks, Gabriele
Hi, Gabriele Please take a look at the log. A. For the "query" mode, the log entries of the single press process are: [ 38.983128] Single [ 39.290194] ACPI : EC: ===== IRQ (0) ===== [ 39.290207] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 39.290210] ACPI : EC: ##### Command(QR_EC) submitted/blocked ##### [ 39.290223] ACPI : EC: ##### Event started ##### [ 39.290225] ACPI : EC: 2: Increase command [ 39.290227] ACPI : EC: ***** Command(QR_EC) started ***** [ 39.290229] ACPI : EC: ===== TASK (0) ===== [ 39.290237] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 39.290239] ACPI : EC: EC_SC(W) = 0x84 Write QR_EC. [ 39.290778] ACPI : EC: ===== IRQ (1) ===== [ 39.290787] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 39.290791] ACPI : EC: EC_DATA(R) = 0x81 Read event value (0x81) [ 39.290792] ACPI : EC: ##### Command(QR_EC) unblocked ##### [ 39.290793] ACPI : EC: ##### Command(QR_EC) completed by hardware ##### [ 39.290800] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 39.290802] ACPI : EC: 1: Decrease command [ 39.290805] ACPI : EC: ##### Query(0x81) started ##### [ 39.494040] ACPI : EC: ##### Query(0x81) stopped ##### Evaluate _Q81, only once. [ 39.494043] ACPI : EC: ##### Event stopped ##### Conclusion: there is only one evaluation for the single press. Same behavior (one evaluation) can be seen for the "event"/"method" modes. B. For the "query" mode, the log entries of the 3 presses process are: [ 54.176920] Three [ 54.503172] ACPI : EC: ===== IRQ (1) ===== [ 54.503188] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 54.503190] ACPI : EC: ##### Command(QR_EC) submitted/blocked ##### [ 54.503231] ACPI : EC: ##### Event started ##### [ 54.503234] ACPI : EC: 2: Increase command [ 54.503236] ACPI : EC: ***** Command(QR_EC) started ***** [ 54.503239] ACPI : EC: ===== TASK (1) ===== [ 54.503248] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 54.503256] ACPI : EC: EC_SC(W) = 0x84 First press, write QR_EC. [ 54.503733] ACPI : EC: ===== IRQ (0) ===== [ 54.503750] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 SCI_EVT IRQ is cleared here. [ 54.503758] ACPI : EC: EC_DATA(R) = 0x81 First press, read event value (0x81). [ 54.503761] ACPI : EC: ##### Command(QR_EC) unblocked ##### [ 54.503763] ACPI : EC: ##### Command(QR_EC) completed by hardware ##### [ 54.503775] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 54.503780] ACPI : EC: 1: Decrease command [ 54.503787] ACPI : EC: ##### Query(0x81) started ##### [ 54.706023] ACPI : EC: ##### Query(0x81) stopped ##### First press, evaluate _Q81. [ 54.706026] ACPI : EC: ##### Event stopped ##### [ 54.879210] ACPI : EC: ===== IRQ (1) ===== [ 54.879223] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 54.879225] ACPI : EC: ##### Command(QR_EC) submitted/blocked ##### [ 54.879257] ACPI : EC: ##### Event started ##### [ 54.879261] ACPI : EC: 2: Increase command [ 54.879264] ACPI : EC: ***** Command(QR_EC) started ***** [ 54.879266] ACPI : EC: ===== TASK (1) ===== [ 54.879281] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 54.879283] ACPI : EC: EC_SC(W) = 0x84 Second press, write QR_EC. [ 54.880397] ACPI : EC: ===== IRQ (1) ===== [ 54.880409] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 SCI_EVT IRQ is cleared here. [ 54.880413] ACPI : EC: EC_DATA(R) = 0x81 Second press, read event value (0x81). [ 54.880414] ACPI : EC: ##### Command(QR_EC) unblocked ##### [ 54.880415] ACPI : EC: ##### Command(QR_EC) completed by hardware ##### [ 54.880434] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 54.880435] ACPI : EC: 1: Decrease command [ 54.880439] ACPI : EC: ##### Query(0x81) started ##### [ 55.082313] ACPI : EC: ##### Query(0x81) stopped ##### Second press, evaluate _Q81. [ 55.082317] ACPI : EC: ##### Event stopped ##### [ 55.205066] ACPI : EC: ===== IRQ (2) ===== [ 55.205081] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 55.205084] ACPI : EC: ##### Command(QR_EC) submitted/blocked ##### [ 55.205177] ACPI : EC: ##### Event started ##### [ 55.205179] ACPI : EC: 2: Increase command [ 55.205181] ACPI : EC: ***** Command(QR_EC) started ***** [ 55.205182] ACPI : EC: ===== TASK (2) ===== [ 55.205190] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 55.205191] ACPI : EC: EC_SC(W) = 0x84 Third press, write QR_EC. [ 55.205728] ACPI : EC: ===== IRQ (1) ===== [ 55.205741] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 55.205745] ACPI : EC: EC_DATA(R) = 0x81 Third press, read event value (0x81). [ 55.205746] ACPI : EC: ##### Command(QR_EC) unblocked ##### [ 55.205748] ACPI : EC: ##### Command(QR_EC) completed by hardware ##### [ 55.205776] ACPI : EC: ***** Command(QR_EC) stopped ***** [ 55.205783] ACPI : EC: 1: Decrease command [ 55.205792] ACPI : EC: ##### Query(0x81) started ##### [ 55.406543] ACPI : EC: ##### Query(0x81) stopped ##### Third press, evaluate _Q81. [ 55.406550] ACPI : EC: ##### Event stopped ##### Conclusion: there are only 3 evaluations for the 3 presses. Same behavior (one evaluation) can be seen for the "event"/"method" modes. C. For the long press tests, we can also see first/second SCI_EVT cleared by the firmware. According to A, B, C, why do you think the bug hasn't been solved? D. For the long press tests, we can see the third SCI_EVT remains set until the whole process ends. What has made you think this is an EC driver's problem? What if the EC firmware just resets SCI_EVT after it is cleared due to the fast sampling? You can see even in the "method" mode, this situatiion is same and the "method" is the last position the EC driver can use to re-check the SCI_EVT. Do you think there is any chance for the host EC driver to re-check SCI_EVT later than this position? Why don't you think this is just an EC firmware's key press sampling problem? Thanks and best regards -Lv
Hi, Gabriele --- According to the long press log, the EC firmware acts in the way we do a long press in the "notepad": the first several characters appear slowly and the follow-up characters appear quickly like a "turbo press". So this is the firmware behavior, and long with the Sleep(0xC8), _Qxx handling mechanism, you can sense the unwanted user experience. --- For the EC driver's event clearing timing support, I think there is nothing can be done. The general EC driver shouldn't drop the SCI_EVT IRQs even the IRQ is fired in the "turbo" way. --- You may want the _Qxx evaluations to be parallelized. 1. Before applying attachment 178651 [details], each _Qxx evaluation is done in a different work item and you still blamed the experience against the old driver. I think this might be caused by the following line in acpi_os_execute(): queue_work_on(0, queue, &dpc->work); According to the comment, this line is there for the reasons, so deleting it may cause issues. I may have already fixed the root cause in this commit: Commit: f3e14329517ee190d34455d729430ef9d0473675 Subject: ACPI / EC: Fix a code path that global lock is not held But it is still dangerous to change the this line. So let's stop introducing too many regression unsafe changes in one Linux release cycle, given this change is just a tuning. 2. For the attachment 178651 [details], it can make "acpi.ec_event_clearing=method" support possible and simpler. IMO, the "event clearing timing" variation support can help to fix bug 97381 and bug 82611, remove EC_FLAGS_QUERY_HANDSHAKE quirk, learn the Windows expected/compliant EC firmware behavior. This should get a higher priority than the parallelism tuning. And we may need to stay several Linux release cycles to learn the Windows compliant mode. --- For now, why don't you customize the DSDT by deleting the Sleep(0xC8) using the methods stated in Documentation/acpi/dsdt-override.txt or Documentation/acpi/initrd_table_override.txt and try again? If you want to try the tuning, I can post a patch based on the old driver for you. But IMO, we shouldn't merge such a regression unsafe commit before fixing other issues. So even it can meet your requirement, I'll push it to the upstream after several cycles. --- Thanks and best regards -Lv
Created attachment 179201 [details] [PATCH] ACPI / OSL: Make _Lxx/_Exx/_Qxx parallelized. Try to queue _Qxx on different CPUs.
Hi, Gabriele Please help to confirm the patchset behavior on your platform: 1. The patchset is based on the linux-pm.git/linux-next branch as I need some commits in that branch. So please git pull the following git: # git remote add linux-pm git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git # git fetch linux-pm # git branch linux-pm-next linux-pm/linux-next # git checkout linux-pm-next 2. Apply the following patches: attachment 179201 [details] 3. Enable the EC debugging log: Uncomment the following line in drivers/acpi/ec.c /* #define DEBUG */ to #define DEBUG 4. Rebuild the kernel 5. Try the long press and upload the dmesg here. Also tell me the user experience. Thanks and best regards -Lv
Created attachment 179231 [details] dmesg - _Qxx parallelized I tested this patch and it seems that it fixes the problem. There's something I remembered while testing the patch: Windows can deal with the delay in the method, but I noticed that it is somewhat affected by it. If I keep the function key to lower the brightness pressed for a while and then I immediately press once the function key to increase it, the screen brightness goes immediately up, but often it will also go back one level. With the last patch here posted, I observe the same behavior in Linux. I'd say the delay was chosen taking into account the number number of cores. To get the log here attached, I kept the brightness-down key pressed to bring the brightness from 100% to 0% and as soon as it reached 0% I pressed once the brightness-up key. The brightness increased soon after I pressed the brightness-up key. Thanks, Gabriele
OK, now I can mark this as resolved. I need to confirm with the others to see if this commented code can be reverted to the normal behavior. Thanks and best regards -Lv
LV - is there a fix for this upstream?
One more thing that I'm not sure is the test result here. IMO, Linux work queue can automatically re-order its items if one of the item is blocked. So there is some kinds of asynchrony ready for the items that bound to one CPU. Maybe it doesn't do so if the item is not blocked. Anyway I sent the fix to the ACPI mailing list according to the test result: https://www.spinics.net/lists/linux-acpi/msg58474.html Neither I saw it merged nor I received comments. Maybe the patch which makes all "Notify()" opcode parallel still looks scary for the current ACPICA synchronization implemenation. But IMO, it's worthy of try as I couldn't find any reason from the spec preventing us from doing so and we may achieve performance cookies by doing so. If we saw negative reports against such a change, we could offer a less aggressive approach. Thanks and best regards -Lv
I'll try again to ping the ACPI mailing list with a less aggressive approach. Thanks and best regards -Lv
Hi, there's something I noticed that I forgot to report. I don't think it's entirely related to the fix, but after a suspend/resume cycle the bug reappears, maybe in a less severe form, but it's there. Since the patch simply replaces "queue_work_on()" with "queue_work()" I guess the problem is somewhere else, but I thought to report here in case you know what the problem might be. I can reproduce this consistently. Right after a reboot, the EC methods are executed in parallel and there's almost no delay. If I suspend and resume the laptop, it takes some time to execute all the queued methods.
After resuming from suspend, there's only one cpu running for a while, so during this period queue_work may act like serially, then other non-boot cpus will be brought up later. Is there any dmesg log we can refer to? (to check if the query method is executed when there's only one cpu available) thanks Yu
Created attachment 183311 [details] dmesg - _Qxx parallelized after resume For the situation before the suspension, refer to attachment 179231 [details]. Here the dmesg right after resume. At ~2477 seconds I start pressing the function key. I tried to wait for around 1 hour and nothing changed, I kept getting "IRQ (0)".
The QR_EC is 0x84: Query Embedded Controller (QR_EC) 0x84 ACPI specification, table 12-291. I can see the work items used for QR_EC submission scheduled on different CPUs: [ 2451.055383] ACPI : EC: ***** Command(QR_EC) started ***** [ 2451.055384] ACPI : EC: ===== TASK (1) ===== [ 2451.055392] ACPI : EC: EC_SC(R) = 0x38 SCI_EVT=1 BURST=1 CMD=1 IBF=0 OBF=0 [ 2451.055393] ACPI : EC: EC_SC(W) = 0x84 [ 2451.055531] ACPI : EC: ===== IRQ (0) ===== [ 2451.055539] ACPI : EC: EC_SC(R) = 0x39 SCI_EVT=1 BURST=1 CMD=1 IBF=0 OBF=1 [ 2451.055542] ACPI : EC: EC_DATA(R) = 0xbb ... [ 2451.055563] ACPI : EC: ***** Command(QR_EC) started ***** [ 2451.055564] ACPI : EC: ===== TASK (1) ===== [ 2451.055572] ACPI : EC: EC_SC(R) = 0x38 SCI_EVT=1 BURST=1 CMD=1 IBF=0 OBF=0 [ 2451.055573] ACPI : EC: EC_SC(W) = 0x84 [ 2451.055713] ACPI : EC: ===== IRQ (0) ===== [ 2451.055721] ACPI : EC: EC_SC(R) = 0x19 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=1 [ 2451.055725] ACPI : EC: EC_DATA(R) = 0xb2 ... [ 2454.092321] ACPI : EC: ***** Command(QR_EC) started ***** [ 2454.092323] ACPI : EC: ===== TASK (0) ===== [ 2454.092331] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 2454.092333] ACPI : EC: EC_SC(W) = 0x84 [ 2454.092842] ACPI : EC: ===== IRQ (0) ===== [ 2454.092850] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1 [ 2454.092853] ACPI : EC: EC_DATA(R) = 0x85 ... [ 2454.092877] ACPI : EC: ***** Command(QR_EC) started ***** [ 2454.092879] ACPI : EC: ===== TASK (0) ===== [ 2454.092886] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 2454.092887] ACPI : EC: EC_SC(W) = 0x84 [ 2454.093297] ACPI : EC: ===== IRQ (0) ===== [ 2454.093306] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 2454.093309] ACPI : EC: EC_DATA(R) = 0x85 ... [ 2455.197017] ACPI : EC: ***** Command(QR_EC) started ***** [ 2455.197019] ACPI : EC: ===== TASK (0) ===== [ 2455.197027] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 2455.197029] ACPI : EC: EC_SC(W) = 0x84 [ 2455.197556] ACPI : EC: ===== IRQ (0) ===== [ 2455.197568] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 2455.197572] ACPI : EC: EC_DATA(R) = 0x85 ... [ 2457.198132] ACPI : EC: ***** Command(QR_EC) started ***** [ 2457.198148] ACPI : EC: ===== TASK (0) ===== [ 2457.198179] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 2457.198184] ACPI : EC: EC_SC(W) = 0x84 [ 2457.198539] ACPI : EC: ===== IRQ (0) ===== [ 2457.198556] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1 [ 2457.198564] ACPI : EC: EC_DATA(R) = 0x85 So it looks the QR_EC is still parallelized. While we can see after resuming, all IRQs are coming from CPU0 as what you said. If this is not a firmware mistake, it may imply a Linux APIC trouble, right? Thanks and best regards -Lv
So it appears to be all the SCI are distributed to cpu0, then queue_work would prefer to put work on cpu0, causing this problem. we can check the SCI affinity by: cat /proc/interrupts | grep acpi cat /proc/irq/"acpi_irq"/smp_affinity if it is not 'f', maybe it is a software bug, if it is 'f', maybe we should raise this bug to APIC module. what do you think? Yu
It seems that the affinity is correct: $ cat /proc/interrupts | grep acpi 9: 513 290 33 96 IO-APIC 9-fasteoi acpi $ cat /proc/irq/9/smp_affinity f
I tried something as simple as setting the affinity to "f" regardless of what "smp_affinity" reports and the problem got fixed. It's as if the affinity reported by "smp_affinity" is not the actual one. After resume: $ cat /proc/irq/9/smp_affinity f # IRQ 0 only $ echo f > /proc/irq/9/smp_affinity $ cat /proc/irq/9/smp_affinity f # IRQ 0-3
can you please try this command before and after resume?(without echo f to smp_affinity) $ service irqbalance status irqbalance start/running, process 1031
I get the same output, except for "Active", which is greater the second time I execute the command as expected. $ service irqbalance status ● irqbalance.service - LSB: daemon to balance interrupts for SMP systems Loaded: loaded (/etc/init.d/irqbalance) Active: active (running) since Tue 2015-07-28 12:11:10 CEST; 37s ago Docs: man:systemd-sysv-generator(8) Process: 773 ExecStart=/etc/init.d/irqbalance start (code=exited, status=0/SUCCESS) CGroup: /system.slice/irqbalance.service └─787 /usr/sbin/irqbalance --pid=/var/run/irqbalance.pid lug 28 12:11:10 xps13 systemd[1]: Starting LSB: daemon to balance interrupts for SMP systems... lug 28 12:11:10 xps13 irqbalance[773]: Starting SMP IRQ Balancer: irqbalance. lug 28 12:11:10 xps13 systemd[1]: Started LSB: daemon to balance interrupts for SMP systems.
Created attachment 184501 [details] [PATCH v2] ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations. Hi, Could you give this less aggressive approach a try before I send it to the upstream? Thanks and best regards -Lv
Hi, I tested this new patch and I might be wrong, but I think it works even even better than the previous one. Thanks.
OK, try again to upstream this version. Thanks -Lv
The patch has been accepted by linux-pm.git and will be automatically synchronized to the mainline. So let me close this issue. Thanks and best regards -Lv