Bug 11674

Summary: 2.6.26-rc6 regression - special keys unusable after resume - T20, T21
Product: ACPI Reporter: Karol Lewandowski (lmctlx)
Component: ECAssignee: Alexey Starikovskiy (astarikovskiy)
Status: REJECTED UNREPRODUCIBLE    
Severity: normal CC: acpi-bugzilla, tglx
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31-rc9 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 10492    
Attachments: Broken (upstream) kernel 2.6.27-rc7 with DEBUG defined in drivers/acpi/ec.c
Broken (upstream) kernel 2.6.27-rc7 with DEBUG defined in drivers/acpi/ec.c and 1b7fc5aae8867046f8d3d45808309d5b7f2e036a reverted
GOOD kernel 2.6.27-rc7 with DEBUG defined in drivers/acpi/ec.c and 1b7fc5aae8867046f8d3d45808309d5b7f2e036a reverted
clear QUERY_PENDING flag on resume
Broken kernel 2.6.27-rc7 with CLEAR_PENDING patch
add ec->flags printout
dmesg from broken kernel 2.6.27-rc7 with CLEAR_PENDING and ec->status printout
dmesg after resume and sysrq commands per comment #12

Description Karol Lewandowski 2008-09-30 07:00:50 UTC
Latest working kernel version:
c21d1e7f53ffd9c0f162c42e7fde07d1c45fa127
i.e. one commit before:

Earliest failing kernel version:
1b7fc5aae8867046f8d3d45808309d5b7f2e036a ACPI: EC: Use msleep instead of udelay 
while waiting for event.

Distribution:
Debian GNU/Linux 4.0r4

Hardware Environment:
IBM Thinkpad T21

Problem Description:
Commit 1b7fc5aae8867046f8d3d45808309d5b7f2e036a causes my Thinkpad T21
to no longer generate any hotkey events (e.g. suspend fn-f4, switch-output
fn-f7, etc.) after resume.

It's regression from 2.6.25.  2.6.26 and 2.6.27-rc7 are broken,
reverting that commit on 2.6.27-rc7 makes keys work again.


Steps to reproduce:
1. boot [hotkeys work here]
2. suspend to ram and resume
3. use special keys fn-fX (X: 3,4,7,12) [hotkeys doesn't work anymore]

Thanks.
Comment 1 Karol Lewandowski 2008-09-30 07:05:34 UTC
Created attachment 18110 [details]
Broken (upstream) kernel 2.6.27-rc7 with DEBUG defined in drivers/acpi/ec.c
Comment 2 Karol Lewandowski 2008-09-30 07:06:20 UTC
Created attachment 18111 [details]
Broken (upstream) kernel 2.6.27-rc7 with DEBUG defined in drivers/acpi/ec.c and 1b7fc5aae8867046f8d3d45808309d5b7f2e036a reverted
Comment 3 Karol Lewandowski 2008-09-30 07:08:10 UTC
Created attachment 18112 [details]
GOOD kernel 2.6.27-rc7 with DEBUG defined in drivers/acpi/ec.c and 1b7fc5aae8867046f8d3d45808309d5b7f2e036a reverted

This is dmesg from working kernel.
Comment 4 Alexey Starikovskiy 2008-09-30 08:26:17 UTC
Created attachment 18113 [details]
clear QUERY_PENDING flag on resume

Please check if this patch helps.
Comment 5 Karol Lewandowski 2008-09-30 12:08:13 UTC
That patch did't help.
Comment 6 Alexey Starikovskiy 2008-09-30 12:17:09 UTC
Could you please attach dmesg from this run?
Comment 7 Karol Lewandowski 2008-09-30 12:32:37 UTC
Created attachment 18121 [details]
Broken kernel 2.6.27-rc7 with CLEAR_PENDING patch
Comment 8 Alexey Starikovskiy 2008-09-30 12:48:56 UTC
Created attachment 18122 [details]
add ec->flags printout

please send dmesg from run with this patch.
Comment 9 Karol Lewandowski 2008-09-30 13:09:02 UTC
Created attachment 18123 [details]
dmesg from broken kernel 2.6.27-rc7 with CLEAR_PENDING and ec->status printout
Comment 10 Karol Lewandowski 2008-09-30 13:11:51 UTC
Last attachment description was wrong, that's ec->flags printout of course.
Sorry for confusion.
Comment 11 Alexey Starikovskiy 2008-09-30 13:40:08 UTC
Ok, so it is the QUERY_PENDING bit being stuck... 0x2 in flags.
Comment 12 Alexey Starikovskiy 2008-10-01 05:23:06 UTC
Could you please check the state of kacpid thread?
You should probably have /proc/sysrq-trigger file,
try to "echo 8 > /proc/sysrq-trigger", then "echo d", "echo w" and "echo t"
they should drop some debug info into dmesg...
Comment 13 Karol Lewandowski 2008-10-02 13:55:49 UTC
Before suspend:
kacpid        S cf89e300     0    65      2
       cf89e300 00000046 cf838494 cf89e300 cf89e558 c03cd160 cf89e760 cf89e300 
       0001018d cf80b280 c0124e6d cf80b288 00000000 c0124eec 00000000 cf89e300 
       c0126e90 cf80b288 cf80b288 cf80b280 00000000 c0126d40 c0126d0a 00000000 
Call Trace:
 [<c0124e6d>] worker_thread+0x0/0xb5
 [<c0124eec>] worker_thread+0x7f/0xb5
 [<c0126e90>] autoremove_wake_function+0x0/0x2d
 [<c0126d40>] kthread+0x36/0x5a
 [<c0126d0a>] kthread+0x0/0x5a
 [<c010409b>] kernel_thread_helper+0x7/0x10
 =======================

After resume:
kacpid        S 003cd2e0     0    65      2
       cf89e300 00000046 cf89455b 003cd2e0 cf89e558 30f26be8 0000001e ceea20a8 
       c01dccf2 cf80b280 c0124e6d cf80b288 00000000 c0124eec 00000000 cf89e300 
       c0126e90 cf80b288 cf80b288 cf80b280 00000000 c0126d40 c0126d0a 00000000 
Call Trace:
 [<c01dccf2>] acpi_os_execute_deferred+0x0/0x25
 [<c0124e6d>] worker_thread+0x0/0xb5
 [<c0124eec>] worker_thread+0x7f/0xb5
 [<c0126e90>] autoremove_wake_function+0x0/0x2d
 [<c0126d40>] kthread+0x36/0x5a
 [<c0126d0a>] kthread+0x0/0x5a
 [<c010409b>] kernel_thread_helper+0x7/0x10
 =======================
.

Thanks.
Comment 14 Karol Lewandowski 2008-10-02 13:57:02 UTC
Ah, there isn't any info from sysrq-d nor -w, I'm possibly missing
some required CONFIG_* options.
Comment 15 Alexey Starikovskiy 2008-10-02 14:01:10 UTC
this might be ok.
could you please check if changing msleep(1) to, say, msleep(5) changes anything?
Comment 16 Karol Lewandowski 2008-10-02 14:56:45 UTC
No, msleep(5) doesn't work.  But udelay(x) does work, where x=500-5000,
x=0 works too.

I'm wondering if that might be some jiffy(clocksource)-related problem.
My T21 has 800MHz but I see sth like:

Detected 335.365 MHz processor.
...
Calibrating delay loop (skipped), value calculated using timer frequency.. 670.73 BogoMIPS (lpj=3353650)

Maybe cpufreq is to blame?
I've tried forcing different clocksources but that didn't help.

Thanks.
Comment 17 Len Brown 2008-10-02 23:56:49 UTC
I have reproduced this regression on a T20 using 2.6.27.rc8.

ie. killing acpid and cat /proc/acpi/event shows
for each Fn key:

Fn F3:
ibm/hotkey HKEY 00000080 00001003

Fn-F4:
ibm/hotkey HKEY 00000080 00001004

Fn-F7:
switches the LCD/VGA

Fn F12:
ibm/hotkey HKEY 00000080 0000100c

Fn End: dims display

Fn Home:
ibm/hotkey HKEY 00000080 00001010

Fn Pgup:
turns light on/off

But after suspend, Fn keys do nothing,
with the exception of Fn-End and Fn_Home
which continue to control brightness
(w/o anything in /proc/acpi/event)
and FnPgUP which continues to control the light
(also w/o anything in /proc/acpi/event)

If 1b7fc5aae8867046f8d3d45808309d5b7f2e036a (mdelay->msleep)
is reverted, then the keys all work properly before and after suspend.
Comment 18 Len Brown 2008-10-03 00:03:48 UTC
after resume, power button still works,
but LID button has stopped working along with the Fn keys.
Comment 19 Len Brown 2008-10-03 01:15:14 UTC
Created attachment 18143 [details]
dmesg after resume and sysrq commands per comment #12
Comment 20 Thomas Gleixner 2008-10-04 02:57:39 UTC
Hmm, I looked at the dmesg and both kacpid and kacpid_notify are sleeping and wait for work.

I have no real idea yet, how to debug this, but the patch which replaced udelay with msleep changes one detail: udelay does not schedule (it might be preempted), but msleep definitely schedules the task away. So the question whether scheduling the task away is causing the problem.

What happens if you put a yield() there instead of the msleep() ?
Comment 21 Karol Lewandowski 2008-10-04 10:25:23 UTC
Keys work after resume with yield().
(Keys work even if there is no msleep/udelay/yield/etc.)

BTW According to http://www.thinkwiki.org/wiki/Category:T20
there were no 550MHz T20s -- Len's CPU clock wasn't properly
calculated (just like mine).  IMVHO clocksource might be culprit
as udelay* doesn't use jiffies and schedule_timeout* does[x].

[x] According to my _very_ limited knowledge^grepping.
Comment 22 Karol Lewandowski 2008-10-04 10:32:29 UTC
Ah, miscalculated jiffies should break keys before suspend to ram...
Never mind.
Comment 23 Karol Lewandowski 2008-10-28 18:08:45 UTC
I can't reproduce this problem with 2.6.28-rc2, all special keys seem to work after resume.
Comment 24 Zhang Rui 2008-12-02 21:58:58 UTC
close this bug as it's not reproducible.
Karol, please re-open it if the problem still exists in the latest upstream kernel.
Comment 25 Karol Lewandowski 2009-09-07 14:11:12 UTC
This bug is present with latest kernel version 2.6.31-rc9.  2.6.30 is broken too.
I'll try to further track this down.

Thanks.
Comment 26 Karol Lewandowski 2009-09-07 14:26:15 UTC
Linux 2.6.29(.6) isn't broken.  I thus assume it's regression from 2.6.29.

(I would happily reopen this bug, but it seems I've no permission to do so.)
Comment 27 Karol Lewandowski 2009-09-07 20:45:42 UTC
Commit 34ff4dbccccce54c83b1234d39b7ad9e548a75dd is to blame, reverting it makes bug go away.