Created attachment 175151 [details] dmesg Laptop: Lenovo Z50 Affected Kernels: 4.0+ Starting from kernel 4.0, pressing Fn+F11/F12 to adjust backlight has a long delay (2+ seconds). Last working kernel is 3.19.3. I have tried 4.1rc1 which is not fixed.
acpidump please: # acpidump > acpidump.txt And please use acpi_listen to see if the delay occurs in the event delivery or the adjust. Thanks.
Created attachment 175371 [details] acpidump
I press brightness down key once and I immediately get video/brightnessdown BRTDN 00000087 00000000 and brightness goes down 1 step (at the same time). I then press it three more times right after the first press and after 3 second delay I get video/brightnessdown BRTDN 00000087 00000000 video/brightnessdown BRTDN 00000087 00000000 video/brightnessdown BRTDN 00000087 00000000 one right after the other and brightness goes down 3 steps (again, brightness goes down as the event is displayed).
Created attachment 175381 [details] screen capture I pressed brightness down a total of three times at the same time at time 0:01. At 0:01 in you see the first event and brightness goes down. The next two events come in together at 0:07, 6 seconds after the first event.
Lv, Do you want to take a look at this? Thanks.
Hi, Please try the following test: 1. enable EC debug log (uncomment the following line) and recompile the kernel /* #define DEBUG */ 2. boot the kernel and upload the dmesg here 3. then do the test in comment 3 and upload the dmesg here Thanks -Lv
Hi, It could be better to use "dmesg -c" after each brightness key press. And upload several files here, each contains a "dmesg -c" output for one press. Thanks and best regards -Lv
Should I do this against the latest RC, git head, or latest stable? Thanks
Latest RC or git head are both OK I think.
Created attachment 176241 [details] dmesg output with ec debug on Test was run with git head 3e0283a53f7d2f2dae7bc4aa7f3104cb5988018f Files ----- dmesg_boot: immediately after boot dmesg_single_press: after pressing brightness down one time no delay dmesg_three_presses: after pressing brightness down three times in a row no delay first press, delay, then second and third press register together dmesg_single_press_followed_by_3_presses: after pressing brightness down once, waiting 1-2 seconds, then pressing brightness down three times no delay on first press, delay, then next three presses register together
Would you like a comparison output with kernel 3.19 where it is working?
You machine seems to need the ec_flags_query_handshake quirk. It refuses to respond 2nd QR_EC in the way reported in Bug 86211. This kind of firmware seems to be rare (we've bought many test platforms, but none of them acts in this way). So it's still unclear to me if this can be found on a real machine because this is not ACPI spec compliant according to my ACPI spec interpretion. Please: 1. Upload the dmidecode so that we can add the quirk for your platform. 2. You can set the EC_FLAGS_QUERY_HANDSHAKE to 1 and try again to see if the bug can be fixed. Thanks -Lv
Created attachment 176471 [details] dmidecode I will have to get back to you if setting EC_FLAGS_QUERY_HANDSHAKE to 1 fixes the issue. Thank you for looking into it.
I can confirm that EC_FLAGS_QUERY_HANDSHAKE=1 fixes the issue.
One possible approach is to make the whole event handling process serialized. So that all QUERY_HANDSHAKE quirk can be removed and in the meanwhile the Samsung paltform can still be supported (on which, there is a need to send query whatever the EVT is flagged). If you are willing to test, I can try this implementation and post it here. But you need to wait a while. I'll do that after finishing my current tasks. Maybe this is a good thing to do to stop the proliferation of the QUERY_HANDSHAKE quirks. Thanks -Lv
That's not a problem. I can test whenever you are ready.
Created attachment 176981 [details] dmidecode for Lenovo ideapad U430 Exact same issue on a Lenovo ideapad U430. I have attached my dmidecode output.
Lv, Is it clear why this problem starts to appear from v4.0?
(In reply to Aaron Lu from comment #18) > Lv, > > Is it clear why this problem starts to appear from v4.0? I don't know. You should ask reporters to try kernel 3.15 where the EC races are not fixed. See bugzilla 82611 for the same issue on Acer. I silently changed one behavior in the race fix series. Originally next QR_EC is queued up after reading the query value (buggy because SCI_EVT isn't checked in the polling mode and the check isn't synchronized). While I changed the behavior to queue up next QR_EC after writing QR_EC before reading the query value where the SCI_EVT is ensured to be set. This change silently fixed Samsung platforms and didn't break the test platforms in my hands, thus I was about to add event draining support for all platforms. If firmware like Lenovo/Acer can always clear SCI_EVT after preparing the returned query value and cannot respond QR_EC when SCI_EVT is not set (sounds like a dumb synchronous firmware), then this change might be a problem. But unfortunately, we have EC_FLAGS_CLEAR_ON_RESUME quirk that need to be removed which seems to be conflict with such EC_FLAGS_QUERY_HANDSHAKE platforms. I don't know how Windows can survive both platforms. So I have to guess one firmware variation is not real and stay cautious unless the de-facto standard driver behavior is clarified... Thanks and best regards -Lv
Created attachment 177901 [details] [PATCH 1] ACPI / EC: Cleanup transaction state transition. Cleanup transaction state transition.
Created attachment 177911 [details] [PATCH 2] ACPI / EC: Fix potential issue for query loss. Fix potential issue for query loss. This restores one of the old behaviors to submit QR_EC work item.
Created attachment 177921 [details] [PATCH 3] ACPI / EC: Add experimental support to complete query after reading the query value. Add experimental support to complete query after reading the query value. This restores another the old behaviors to submit QR_EC work item.
Also you can try applying attachment 177901 [details], attachment 177911 [details], attachment 177921 [details] on recent kernel instead of trying 3.15. Thanks -Lv
(In reply to Aaron Lu from comment #18) > Lv, > > Is it clear why this problem starts to appear from v4.0? tigrangab: Can you also try a git bisect to find the cause? Thanks -Lv
Yea no problem, I can do it over the weekend.
Created attachment 178011 [details] [PATCH 1/3] ACPI / EC: Cleanup transaction state transition. Oops. There is 1 line wrong in the patch. Sorry that I haven't confirmed it before posting.
Created attachment 178021 [details] [PATCH 2/3] ACPI / EC: Fix potential issue for query loss. Rebase of patch 2.
Created attachment 178031 [details] [PATCH 3/3] ACPI / EC: Add experimental support to complete query after reading the query value. Rebase of patch 3.
Sorry for the mistake. Please try to: 1. apply the following patches: attachment 178011 [details] attachment 178021 [details] attachment 178031 [details] 2. build the kernel and try to see if things are improved. Also we still need the bisection to root cause the issue. Thanks in advance. Best regards -Lv
I'm working on linux-pm.git/linux-next branch. For the linux.git/master branch, I think you can use v4.1-rc4 tag and apply only the following 2 patches: attachment 178011 [details] attachment 178031 [details] The other patch is a no-op and can be ignored. Thanks and best regards -Lv
The two attachments fix the issue (applied against master). I am starting the bisection now.
Thanks for the confirmation. Looking forward to the bisection result. Your result means the query timing doesn't break your platform. But according to bug 94411, I still need to tune it a bit later…… I'll try another fix there and let you also test it to see if it will break your platform. Thanks
The issue started with https://github.com/torvalds/linux/commit/74443bb
This commit makes SCI_EVT checking a part of the main state machine - advance_transaction(). The commit was a good thing to make development proceeded. All QR_EC submissions can then be handled using the 2 new APIs. But making SCI_EVT check earlier for all cases may break the platforms requiring EC_FLAGS_QUERY_HANDSHAKE. At that time, I was thinking there is only 1 such kind of platform. I was trying to implement the adaptive way to handle SCI_EVT clearing timing. And making the timing suitable for this bug the default one. Let me prepare the fix for the upstream. And please help to do a test again for the new approaches, and give me more feedback for the different clearning timing approaches. BTW, could you leave your full name here so that I can add it to the SOB block as Reported-and-tested-by. Thanks and best regards -Lv
I'll test when you are ready. My name is Tigran Gabrielyan.
Created attachment 178641 [details] [PATCH 1] ACPI / EC: Cleanup transaction state transition.
Created attachment 178651 [details] [PATCH 2] ACPI / EC: Cleanup _Qxx evaluation work item.
Created attachment 178661 [details] [PATCH 3] ACPI / EC: Convert event handling work queue into loop style.
Created attachment 178671 [details] [PATCH 4] ACPI / EC: Add event clearing variation support.
Hi, Tigran Please help to confirm the EC firmware behavior of your platform: 1. The patchset is based on the linux-pm.git/linux-next branch as I need some commits in that branch. So please git pull the following git: # git remote add linux-pm git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git # git fetch linux-pm # git branch linux-pm-next linux-pm/linux-next # git checkout linux-pm-next 2. Apply the following patches: attachment 178641 [details] attachment 178651 [details] attachment 178661 [details] attachment 178671 [details] 3. Enable the EC debugging log: Uncomment the following line in drivers/acpi/ec.c /* #define DEBUG */ to #define DEBUG 4. Rebuild the kernel 5. Try several SCI_EVT clearing approaches: 1. Boot the kernel with acpi.ec_event_clearing=status, and do the test, save the dmesg to ec-status-dmsg.txt 2. Boot the kernel with acpi.ec_event_clearing=query, and do the test, save the dmesg to ec-query-dmsg.txt 3. Boot the kernel with acpi.ec_event_clearing=event, and do the test, save the dmesg to ec-event-dmsg.txt 4. Boot the kernel with acpi.ec_event_clearing=method, and do the test, save the dmesg to ec-method-dmsg.txt 6. Upload the 4 dmesg logs here and tell us if you have met some problems with the trials. IMO, "status" should still be broken for your platform, and all the other 3 modes should be working for your platform. Thanks and best regards -Lv
Created attachment 178791 [details] ec_event_clearing=status
Created attachment 178801 [details] ec_event_clearing=query
Created attachment 178811 [details] ec_event_clearing=event
Created attachment 178821 [details] ec_event_clearing=method
All four methods work as expected.
Good to know. Let me mark this as resolved. I'll close it after upstreaming. Thanks for the report. Best regards -Lv
Patchset has been merged by Linux upstreamed: commit f8b8eb71533338654f39211a87efeca35055566b commit 9d8993be2d9149bc8b3132dad030ff5960f5abcc commit 1d68d2612c2e7309166fa43d8e27eb163435527f commit 3cb02aeb28dd3f1b8a132fa3ecf6db17afd518d6 commit 66db383439b51b1aa920f3579da644fb5fdb2b7c Closing the bug... Thanks and best regards -Lv