Bug 97381

Summary: Delay in backlight adjusting
Product: ACPI Reporter: Tigran Gabrielyan (tigrangab)
Component: ECAssignee: Lv Zheng (lv.zheng)
Status: CLOSED CODE_FIX    
Severity: normal CC: aaron.lu, frederic, lv.zheng
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.0+ Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg
acpidump
screen capture
dmesg output with ec debug on
dmidecode
dmidecode for Lenovo ideapad U430
[PATCH 1] ACPI / EC: Cleanup transaction state transition.
[PATCH 2] ACPI / EC: Fix potential issue for query loss.
[PATCH 3] ACPI / EC: Add experimental support to complete query after reading the query value.
[PATCH 1/3] ACPI / EC: Cleanup transaction state transition.
[PATCH 2/3] ACPI / EC: Fix potential issue for query loss.
[PATCH 3/3] ACPI / EC: Add experimental support to complete query after reading the query value.
[PATCH 1] ACPI / EC: Cleanup transaction state transition.
[PATCH 2] ACPI / EC: Cleanup _Qxx evaluation work item.
[PATCH 3] ACPI / EC: Convert event handling work queue into loop style.
[PATCH 4] ACPI / EC: Add event clearing variation support.
ec_event_clearing=status
ec_event_clearing=query
ec_event_clearing=event
ec_event_clearing=method

Description Tigran Gabrielyan 2015-04-27 18:10:03 UTC
Created attachment 175151 [details]
dmesg

Laptop: Lenovo Z50
Affected Kernels: 4.0+

Starting from kernel 4.0, pressing Fn+F11/F12 to adjust backlight has a long delay (2+ seconds).

Last working kernel is 3.19.3. I have tried 4.1rc1 which is not fixed.
Comment 1 Aaron Lu 2015-04-30 02:11:22 UTC
acpidump please:
# acpidump > acpidump.txt

And please use acpi_listen to see if the delay occurs in the event delivery or the adjust.
Thanks.
Comment 2 Tigran Gabrielyan 2015-04-30 15:56:24 UTC
Created attachment 175371 [details]
acpidump
Comment 3 Tigran Gabrielyan 2015-04-30 16:01:55 UTC
I press brightness down key once and I immediately get

video/brightnessdown BRTDN 00000087 00000000 and brightness goes down 1 step (at the same time).

I then press it three more times right after the first press and after 3 second delay I get

video/brightnessdown BRTDN 00000087 00000000
video/brightnessdown BRTDN 00000087 00000000
video/brightnessdown BRTDN 00000087 00000000

one right after the other and brightness goes down 3 steps (again, brightness goes down as the event is displayed).
Comment 4 Tigran Gabrielyan 2015-04-30 16:24:24 UTC
Created attachment 175381 [details]
screen capture

I pressed brightness down a total of three times at the same time at time 0:01.

At 0:01 in you see the first event and brightness goes down.

The next two events come in together at 0:07, 6 seconds after the first event.
Comment 5 Aaron Lu 2015-05-06 08:45:00 UTC
Lv,

Do you want to take a look at this? Thanks.
Comment 6 Lv Zheng 2015-05-07 01:57:51 UTC
Hi,

Please try the following test:

1. enable EC debug log (uncomment the following line) and recompile the kernel
/* #define DEBUG */

2. boot the kernel and upload the dmesg here

3. then do the test in comment 3 and upload the dmesg here

Thanks
-Lv
Comment 7 Lv Zheng 2015-05-07 05:02:34 UTC
Hi,

It could be better to use "dmesg -c" after each brightness key press.
And upload several files here, each contains a "dmesg -c" output for one press.

Thanks and best regards
-Lv
Comment 8 Tigran Gabrielyan 2015-05-07 16:21:23 UTC
Should I do this against the latest RC, git head, or latest stable?

Thanks
Comment 9 Aaron Lu 2015-05-08 01:39:47 UTC
Latest RC or git head are both OK I think.
Comment 10 Tigran Gabrielyan 2015-05-09 02:32:05 UTC
Created attachment 176241 [details]
dmesg output with ec debug on

Test was run with git head 3e0283a53f7d2f2dae7bc4aa7f3104cb5988018f

Files
-----
dmesg_boot: immediately after boot

dmesg_single_press: after pressing brightness down one time
no delay

dmesg_three_presses: after pressing brightness down three times in a row
no delay first press, delay, then second and third press register together

dmesg_single_press_followed_by_3_presses: after pressing brightness down once, waiting 1-2 seconds, then pressing brightness down three times
no delay on first press, delay, then next three presses register together
Comment 11 Tigran Gabrielyan 2015-05-09 02:33:07 UTC
Would you like a comparison output with kernel 3.19 where it is working?
Comment 12 Lv Zheng 2015-05-11 01:01:23 UTC
You machine seems to need the ec_flags_query_handshake quirk.
It refuses to respond 2nd QR_EC in the way reported in Bug 86211.
This kind of firmware seems to be rare (we've bought many test platforms, but none of them acts in this way). So it's still unclear to me if this can be found on a real machine because this is not ACPI spec compliant according to my ACPI spec interpretion.

Please:
1. Upload the dmidecode so that we can add the quirk for your platform.
2. You can set the EC_FLAGS_QUERY_HANDSHAKE to 1 and try again to see if the bug can be fixed.

Thanks
-Lv
Comment 13 Tigran Gabrielyan 2015-05-11 22:12:15 UTC
Created attachment 176471 [details]
dmidecode

I will have to get back to you if setting EC_FLAGS_QUERY_HANDSHAKE to 1 fixes the issue.

Thank you for looking into it.
Comment 14 Tigran Gabrielyan 2015-05-12 00:04:41 UTC
I can confirm that EC_FLAGS_QUERY_HANDSHAKE=1 fixes the issue.
Comment 15 Lv Zheng 2015-05-12 02:07:30 UTC
One possible approach is to make the whole event handling process serialized.

So that all QUERY_HANDSHAKE quirk can be removed and in the meanwhile the Samsung paltform can still be supported (on which, there is a need to send query whatever the EVT is flagged).

If you are willing to test, I can try this implementation and post it here.
But you need to wait a while. I'll do that after finishing my current tasks.

Maybe this is a good thing to do to stop the proliferation of the QUERY_HANDSHAKE quirks.

Thanks
-Lv
Comment 16 Tigran Gabrielyan 2015-05-12 02:31:12 UTC
That's not a problem. I can test whenever you are ready.
Comment 17 frederic 2015-05-15 10:10:59 UTC
Created attachment 176981 [details]
dmidecode for Lenovo ideapad U430

Exact same issue on a Lenovo ideapad U430. I have attached my dmidecode output.
Comment 18 Aaron Lu 2015-05-18 02:41:27 UTC
Lv,

Is it clear why this problem starts to appear from v4.0?
Comment 19 Lv Zheng 2015-05-26 08:23:28 UTC
(In reply to Aaron Lu from comment #18)
> Lv,
> 
> Is it clear why this problem starts to appear from v4.0?

I don't know.
You should ask reporters to try kernel 3.15 where the EC races are not fixed.
See bugzilla 82611 for the same issue on Acer.

I silently changed one behavior in the race fix series.

Originally next QR_EC is queued up after reading the query value (buggy because SCI_EVT isn't checked in the polling mode and the check isn't synchronized).
While I changed the behavior to queue up next QR_EC after writing QR_EC before reading the query value where the SCI_EVT is ensured to be set.

This change silently fixed Samsung platforms and didn't break the test platforms in my hands, thus I was about to add event draining support for all platforms.

If firmware like Lenovo/Acer can always clear SCI_EVT after preparing the returned query value and cannot respond QR_EC when SCI_EVT is not set (sounds like a dumb synchronous firmware), then this change might be a problem.

But unfortunately, we have EC_FLAGS_CLEAR_ON_RESUME quirk that need to be removed which seems to be conflict with such EC_FLAGS_QUERY_HANDSHAKE platforms.

I don't know how Windows can survive both platforms.

So I have to guess one firmware variation is not real and stay cautious unless the de-facto standard driver behavior is clarified...

Thanks and best regards
-Lv
Comment 20 Lv Zheng 2015-05-26 08:32:48 UTC
Created attachment 177901 [details]
[PATCH 1] ACPI / EC: Cleanup transaction state transition.

Cleanup transaction state transition.
Comment 21 Lv Zheng 2015-05-26 08:34:16 UTC
Created attachment 177911 [details]
[PATCH 2] ACPI / EC: Fix potential issue for query loss.

Fix potential issue for query loss.

This restores one of the old behaviors to submit QR_EC work item.
Comment 22 Lv Zheng 2015-05-26 08:35:07 UTC
Created attachment 177921 [details]
[PATCH 3] ACPI / EC: Add experimental support to complete query after reading the query value.

Add experimental support to complete query after reading the query value.

This restores another the old behaviors to submit QR_EC work item.
Comment 23 Lv Zheng 2015-05-26 08:36:18 UTC
Also you can try applying attachment 177901 [details], attachment 177911 [details], attachment 177921 [details] on recent kernel instead of trying 3.15.

Thanks
-Lv
Comment 24 Lv Zheng 2015-05-27 01:44:31 UTC
(In reply to Aaron Lu from comment #18)
> Lv,
> 
> Is it clear why this problem starts to appear from v4.0?

tigrangab:

Can you also try a git bisect to find the cause?

Thanks
-Lv
Comment 25 Tigran Gabrielyan 2015-05-27 01:54:51 UTC
Yea no problem, I can do it over the weekend.
Comment 26 Lv Zheng 2015-05-27 06:11:46 UTC
Created attachment 178011 [details]
[PATCH 1/3] ACPI / EC: Cleanup transaction state transition.

Oops.
There is 1 line wrong in the patch.
Sorry that I haven't confirmed it before posting.
Comment 27 Lv Zheng 2015-05-27 06:12:21 UTC
Created attachment 178021 [details]
[PATCH 2/3] ACPI / EC: Fix potential issue for query loss.

Rebase of patch 2.
Comment 28 Lv Zheng 2015-05-27 06:13:09 UTC
Created attachment 178031 [details]
[PATCH 3/3] ACPI / EC: Add experimental support to complete query after reading the query value.

Rebase of patch 3.
Comment 29 Lv Zheng 2015-05-27 06:15:06 UTC
Sorry for the mistake.

Please try to:

1. apply the following patches:
   attachment 178011 [details]
   attachment 178021 [details]
   attachment 178031 [details]
2. build the kernel and try to see if things are improved.

Also we still need the bisection to root cause the issue.

Thanks in advance.

Best regards
-Lv
Comment 30 Lv Zheng 2015-05-27 06:51:26 UTC
I'm working on linux-pm.git/linux-next branch.

For the linux.git/master branch, I think you can use v4.1-rc4 tag and apply only the following 2 patches:
   attachment 178011 [details]
   attachment 178031 [details]
The other patch is a no-op and can be ignored.

Thanks and best regards
-Lv
Comment 31 Tigran Gabrielyan 2015-05-28 20:27:34 UTC
The two attachments fix the issue (applied against master).
I am starting the bisection now.
Comment 32 Lv Zheng 2015-05-28 20:45:08 UTC
Thanks for the confirmation.
Looking forward to the bisection result.
Your result means the query timing doesn't break your platform.
But according to bug 94411, I still need to tune it a bit later……
I'll try another fix there and let you also test it to see if it will break your platform.

Thanks
Comment 33 Tigran Gabrielyan 2015-05-30 05:52:53 UTC
The issue started with https://github.com/torvalds/linux/commit/74443bb
Comment 34 Lv Zheng 2015-06-01 03:19:52 UTC
This commit makes SCI_EVT checking a part of the main state machine - advance_transaction().
The commit was a good thing to make development proceeded.
All QR_EC submissions can then be handled using the 2 new APIs.
But making SCI_EVT check earlier for all cases may break the platforms requiring EC_FLAGS_QUERY_HANDSHAKE.
At that time, I was thinking there is only 1 such kind of platform.

I was trying to implement the adaptive way to handle SCI_EVT clearing timing.
And making the timing suitable for this bug the default one.

Let me prepare the fix for the upstream.
And please help to do a test again for the new approaches, and give me more feedback for the different clearning timing approaches.

BTW, could you leave your full name here so that I can add it to the SOB block as Reported-and-tested-by.

Thanks and best regards
-Lv
Comment 35 Tigran Gabrielyan 2015-06-01 18:08:56 UTC
I'll test when you are ready.

My name is Tigran Gabrielyan.
Comment 36 Lv Zheng 2015-06-03 01:29:29 UTC
Created attachment 178641 [details]
[PATCH 1] ACPI / EC: Cleanup transaction state transition.
Comment 37 Lv Zheng 2015-06-03 01:29:56 UTC
Created attachment 178651 [details]
[PATCH 2] ACPI / EC: Cleanup _Qxx evaluation work item.
Comment 38 Lv Zheng 2015-06-03 01:30:29 UTC
Created attachment 178661 [details]
[PATCH 3] ACPI / EC: Convert event handling work queue into loop style.
Comment 39 Lv Zheng 2015-06-03 01:31:02 UTC
Created attachment 178671 [details]
[PATCH 4] ACPI / EC: Add event clearing variation support.
Comment 40 Lv Zheng 2015-06-03 01:42:27 UTC
Hi, Tigran

Please help to confirm the EC firmware behavior of your platform:

1. The patchset is based on the linux-pm.git/linux-next branch as I need some commits in that branch. So please git pull the following git:

   # git remote add linux-pm git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
   # git fetch linux-pm
   # git branch linux-pm-next linux-pm/linux-next
   # git checkout linux-pm-next

2. Apply the following patches:
   attachment 178641 [details]
   attachment 178651 [details]
   attachment 178661 [details]
   attachment 178671 [details]

3. Enable the EC debugging log:
   Uncomment the following line in drivers/acpi/ec.c
   /* #define DEBUG */
   to
   #define DEBUG

4. Rebuild the kernel

5. Try several SCI_EVT clearing approaches:
   1. Boot the kernel with acpi.ec_event_clearing=status, and do the test, save the dmesg to ec-status-dmsg.txt
   2. Boot the kernel with acpi.ec_event_clearing=query, and do the test, save the dmesg to ec-query-dmsg.txt
   3. Boot the kernel with acpi.ec_event_clearing=event, and do the test, save the dmesg to ec-event-dmsg.txt
   4. Boot the kernel with acpi.ec_event_clearing=method, and do the test, save the dmesg to ec-method-dmsg.txt

6. Upload the 4 dmesg logs here and tell us if you have met some problems with the trials.

IMO, "status" should still be broken for your platform, and all the other 3 modes should be working for your platform.

Thanks and best regards
-Lv
Comment 41 Tigran Gabrielyan 2015-06-03 22:50:10 UTC
Created attachment 178791 [details]
ec_event_clearing=status
Comment 42 Tigran Gabrielyan 2015-06-03 22:51:02 UTC
Created attachment 178801 [details]
ec_event_clearing=query
Comment 43 Tigran Gabrielyan 2015-06-03 22:51:37 UTC
Created attachment 178811 [details]
ec_event_clearing=event
Comment 44 Tigran Gabrielyan 2015-06-03 22:51:55 UTC
Created attachment 178821 [details]
ec_event_clearing=method
Comment 45 Tigran Gabrielyan 2015-06-03 22:53:50 UTC
All four methods work as expected.
Comment 46 Lv Zheng 2015-06-04 01:45:34 UTC
Good to know.
Let me mark this as resolved.
I'll close it after upstreaming.
Thanks for the report.

Best regards
-Lv
Comment 47 Lv Zheng 2015-07-07 02:55:56 UTC
Patchset has been merged by Linux upstreamed:
commit f8b8eb71533338654f39211a87efeca35055566b
commit 9d8993be2d9149bc8b3132dad030ff5960f5abcc
commit 1d68d2612c2e7309166fa43d8e27eb163435527f
commit 3cb02aeb28dd3f1b8a132fa3ecf6db17afd518d6
commit 66db383439b51b1aa920f3579da644fb5fdb2b7c

Closing the bug...

Thanks and best regards
-Lv