Bug 9147 - atkbd.c: Unknown key released when 'battery' and/or 'thermal' modules are loaded
Status: CLOSED UNREPRODUCIBLE
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P1 blocking
Assigned To: Zhang Rui
Depends on: 8740 9448
Blocks:
Reported: 2007-10-12 04:22 UTC by Daniele C.
Modified: 2014-08-04 07:15 UTC (History)

See Also:
Kernel Version: 3.4.0-rc
Tree: Mainline
Regression: No


Attachments
acpidump and interrupts before & after the problem (54.38 KB, application/x-gzip)
2007-11-23 05:15 UTC, Daniele C.
acpidump and interrupts after the problem verification (30.54 KB, application/x-gzip)
2007-11-23 05:31 UTC, Daniele C.
Disable global lock at read field (830 bytes, patch)
2007-11-23 10:12 UTC, Alexey Starikovskiy
.config files for 2.6.22 and 2.6.24 (patched), dmesg of crash (32.13 KB, application/x-gzip)
2007-11-24 11:43 UTC, Daniele C.
dmesg MSI l745 2.6.24-rc4 (73.87 KB, text/x-log)
2007-12-04 16:20 UTC, Sebastien Caille
first lines of 'dmidecode' output, unique ids and serial numbers removed (1.21 KB, text/plain)
2008-08-28 06:29 UTC, Daniele C.
Patch 1/4 : Don't issue the burst disable command if EC exits the burst mode (2.39 KB, patch)
2008-09-08 19:27 UTC, ykzhao
Patch 2/4: Clear the query_pending bit only after processing EC notification event (2.21 KB, patch)
2008-09-08 19:29 UTC, ykzhao
Patch 3/4: Simplify EC working flowchart and always enable EC GPE (6.84 KB, patch)
2008-09-08 19:30 UTC, ykzhao
patch 4/4: Add some udelay in EC GPE handler to avoid EC GPE interrupt storm (4.92 KB, patch)
2008-09-08 19:32 UTC, ykzhao
My dmidecode output (9.84 KB, text/plain)
2008-12-29 13:51 UTC, Daniele C.
DSDT AML for Maxdata 7000X, fixed errors and compiled with iasl 20061109 (16.32 KB, application/octet-stream)
2009-01-18 02:11 UTC, Daniele C.
Tentative workaround patch (1.07 KB, patch)
2009-01-30 18:59 UTC, Grégory SCHMITT
full list of patches applied by Fedora11 to the 2.6.29 kernel (5.57 KB, text/plain)
2009-07-02 17:01 UTC, Daniele C.
test script to trigger kernel bug 9147 (113 bytes, text/plain)
2009-07-02 19:41 UTC, Daniele C.
bug 9147 test results on kernels 2.6.29 (w/o fedora11 patches), 2.6.29-5 (w/o fedora11 patches), 2.6.30 (435 bytes, text/plain)
2009-07-03 13:07 UTC, Daniele C.
.config with minimalistic features for my system (62.12 KB, text/plain)
2009-07-03 13:10 UTC, Daniele C.
9147test.sh (590 bytes, text/plain)
2009-12-24 15:37 UTC, Daniele C.
dmesg of last 4 runs of 9147test.sh with a 2.6.33-rc1 kernel + accel_query_propogation.patch (23.63 KB, text/plain)
2009-12-24 16:35 UTC, Daniele C.
test script for kernel bug 9147 (648 bytes, text/plain)
2010-01-13 03:58 UTC, Daniele C.

Description Daniele C. 2007-10-12 04:22:46 UTC
+++ This bug was initially created as a clone of Bug #4046 +++

Distribution: Gentoo 2.6.22-r8
Hardware Environment: Intel Centrino 1600 MHz with PS/2 keyboard and mouse
Software Environment: Problem verified under Xorg, not sure about console
Problem Description: There are a few symptoms that I managed to distinguish:
 * after pressing some key I am getting following error message:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

   This is easily reproducible by just holding a key for several seconds. It
seems to be related only to some keys - I managed to produce it only with arrow
keys, insert, home, delete, end and pgup/pgdn
 * one of the shift/alt/ctrl keys suddenly gets stuck, not necessarily after
pressing it. It seems to happen after the above symptom. After pressing and
releasing all mentioned keys the stuck key releases.
 * some key presses and releases get lost - if a release is lost I have to press
and release the key again to stop it from repeating - this problem is not related
to any particular keys and happens randomly - no easy method of reproducing

*Important clue*: in my case the problem is coupled with http://bugzilla.kernel.org/show_bug.cgi?id=8740

I work around this bug using i8042.nopnp=1
I have verified that i8042.noacpi=1, pnpacpi=off or acpi=off work around it too, although they are more invasive.

It /seems/ that i8042.nopnp=1 fixes the atkbd.c messages but the input is affected by a little latency, while with the other parameters (not sure about pnpacpi=off) it seemed better. It is a very small latency, and I have not measured it scientifically.

I have tested with vanilla-sources kernel 2.6.23-r9 and the bug is still there, so I am currently using 2.6.22-r8 with params "i8042.nomux=1 i8042.nopnp=1"

I am available to attach any requested log file.
Comment 1 Daniele C. 2007-10-12 04:33:30 UTC
I now got the usual messages in dmesg:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I assume that i8042.nopnp=1 makes the problem less frequent; the workaround is to use i8042.noacpi=1
Comment 2 Daniele C. 2007-10-12 04:40:53 UTC
http://bugzilla.kernel.org/attachment.cgi?id=13128

My /proc/bus/input/devices
Comment 3 Daniele C. 2007-10-12 04:46:04 UTC
The same bug on Gentoo http://bugs.gentoo.org/show_bug.cgi?id=194781 has been moved upstream (here)
Comment 4 Dmitry Torokhov 2007-10-12 06:39:37 UTC
OK, reading through your Gentoo bug, it seems that with ACPI engaged the keyboard controller firmware is not getting enough resources and starts dropping bytes coming from the keyboard/mouse. Let's see what the ACPI guys say...
Comment 5 Daniele C. 2007-10-12 10:39:12 UTC
Yes I confirm that the only 2 totally verified workarounds to this bug are using one of the following kernel parameters:

i8042.noacpi=1

OR

acpi=off

I am of course using i8042.noacpi=1 as it is less invasive.
Comment 6 Daniele C. 2007-10-12 10:41:27 UTC
Possibly the same bug:
http://dev.laptop.org/ticket/2401
Comment 7 Dmitry Torokhov 2007-10-12 10:48:44 UTC
*terribly confused* There is no i8042.noacpi parameter in the stock kernel.
Comment 8 Daniele C. 2007-10-12 11:36:14 UTC
What a shame - many apologies. I meant to say pnpacpi=off instead of i8042.noacpi=1

I am currently using 'i8042.nomux=1 pnpacpi=off'
Comment 9 Daniele C. 2007-10-13 00:06:16 UTC
Today I experienced stuck keys (tab, arrow keys) and the usual messages:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I am switching to 'i8042.nomux=1 acpi=off'
Comment 10 Daniele C. 2007-10-15 09:14:36 UTC
I confirm that with 'i8042.nomux=1 acpi=off' both the mouse glitches (see related bug) and the keyboard stuck keys (this bug) are worked around.

I don't yet know what types of debug lines I do have to enable in order to narrow down this issue to the relevant source lines.
Comment 11 Daniele C. 2007-11-02 02:29:28 UTC
I have tried "nomsi" instead of "acpi=off" but the wasted IRQs still happen.

It is clearly an ACPI issue
Comment 12 Erik Boritsch 2007-11-17 11:58:01 UTC
Try booting with acpi and unloading the following modules: ac, battery, thermal. Did the issue disappear?
Comment 13 Daniele C. 2007-11-18 01:44:11 UTC
I have recompiled the kernel with those modules as separate and they are not loaded automatically after boot.

Without those modules loaded, the keyboard is working perfectly. Do you want me to test for each module separately?

Note: I am still using 'i8042.nomux=1' to prevent the mouse glitch, but that's fairly unrelated
Comment 14 Daniele C. 2007-11-18 01:59:12 UTC
Right after doing 'modprobe thermal' the problem happened; thermal is indeed a module which can trigger the bug.

I have unloaded it and loaded 'battery' now, and so far no issues; I have not tested 'ac'.
Comment 15 Michal Nowak 2007-11-18 05:28:35 UTC
I have been having the same issues as Daniele on a similar notebook (Prestigio Nobile 156) for two years, and always thought it was a hardware fault...

On my system (Gentoo; 2.6.22-gentoo-r9, aka 2.6.22.9) the error message in dmesg is caused by the battery and thermal modules, not by ac.

If there are patches, I'd be glad to test them.
Comment 16 Daniele C. 2007-11-18 21:58:20 UTC
I can confirm that this bug is really old, at least dated 2005 (see my previous findings).

If we come up with a patch or definitively recognize the faulty code, it might be useful to reply into this LKML thread which I opened some time ago:

http://lkml.org/lkml/2007/9/30/152

There is also a reply by P.Machek containing useful suggestions for narrowing down the issue.

This bug tracker item of course contains the most up-to-date information about the issue.

I am also available to test patches/testcases and produce logs; I am currently testing the 'battery' module and it is not triggering the issue here.

@Michal: did you enable some specific debug messages in order to see those error messages?

Thanks all
Comment 17 Michal Nowak 2007-11-18 23:40:45 UTC
@Daniele C.: No, I am having all debug output being disabled.

I have disabled thermal and battery (ac is running) and the problem has not reappeared, but I am still watching for it.
Comment 18 Daniele C. 2007-11-19 00:29:19 UTC
@Michal Nowak: can you submit a sample of the errors you get from the thermal/battery modules? It's not really important, I am just curious and would like to correlate them with my 'atkbd.c' messages.

I am currently auto-loading ac and battery modules at boot (not thermal) and I am not experiencing the issue anymore.

Note: I don't know if it does matter, but I am running acpid at boot.

Thanks
Comment 19 Michal Nowak 2007-11-19 00:40:28 UTC
Daniele C.:

ad 3) I have been running acpid at boot for 2 years and still have these keyboard faults, so I guess it does not matter whether acpid is running or not.

ad 2) autoloading ac on boot, battery and thermal off -> no faults

ad 1) I am not on that machine right now and will post them later, but they are exactly the same as yours.
Comment 20 Michal Nowak 2007-11-19 11:22:42 UTC
Daniele C.:

modprobe thermal

atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).


modprobe -r thermal; modprobe battery

atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).

it's same.

Now having i8042.nomux=1 to avoid problems of lost touchpad (synaptics) sync.
Comment 21 Daniele C. 2007-11-19 13:20:42 UTC
Thanks Michal for running this test. I have recompiled the kernel without the hangcheck timer and now I have difficulty getting the atkbd.c messages. I am not saying that the problem disappeared; it is just harder to spot.

This also happened to me at other times during the various settings tweaks I did in the past; so the bug is still present and it is indeed in one of those modules, thermal and battery being the most probable.
Comment 22 Daniele C. 2007-11-20 22:44:55 UTC
Ok, here are some updates:

- without the hangcheck timer I no longer get the atkbd.c messages in dmesg, but the problem still happens
- I am auto-loading the 'battery' module and it is not causing the issue; the 'battery' module is not guilty for me
- when I also auto-loaded the 'ac' module I got the issue as usual
- when I manually load 'thermal' I instantly see the problem, as the enter key used in the 'modprobe thermal' shell command is repeated indefinitely

Conclusions:

'ac' and 'thermal' do cause the bug, 'battery' does not.

I will change this statement as soon as my findings prove that 'battery' is also causing the problem. For example, I am going to test during a battery-powered session to see if the keyboard generates hung keys.

Thanks all
Comment 23 Daniele C. 2007-11-20 22:45:49 UTC
I forgot: when I load 'thermal' I get these system messages:

ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: Thermal Zone [THRS] (41 C)
ACPI: Thermal Zone [THRC] (37 C)
Marking TSC unstable due to: possible TSC halt in C2.
Time: acpi_pm clocksource has been installed.
Comment 24 Michal Nowak 2007-11-21 03:14:44 UTC
Thanks for ongoing research, Daniele.

(In reply to comment #22)
> - I am auto-loading the 'battery' module and it is not causing the issue; the
> 'battery' module is not guilty for me
Still for me... I am using only arrow keys for testing.

> - when I auto-loaded also the 'ac' module I got the issue as usual
battery causes err msg with or without ac module 

> - when I manually load 'thermal' I instantly verify the problem as the enter
> key used in the 'modprobe thermal' shell command is repeated indefinitively
It happens to me too. I typed 'modprobe thermal', hit Return, got a stuck key, and the terminal was "scrolling" down.

> I will change this statement as soon as my findings will proof that also
> 'battery' is causing the problem. For example I am going to test during a
> battery-powered session to see if the keyboard does generate hung keys.
In recent years this key locking has usually happened under high load, with high temperature in the room and in the system itself.
Comment 25 Michal Nowak 2007-11-21 06:14:23 UTC
Weird... today I got two stuck keys, but no message in dmesg output... (having only battery loaded).
Comment 26 Daniele C. 2007-11-21 10:12:17 UTC
Exactly! I also think that when the battery module is loaded no messages are logged, although the problem may (seldom) happen.

With the thermal module it happens 100%

I am going to post on LKML and tell this.

I am also using only battery now; do you have the hangcheck timer enabled?
Comment 27 Michal Nowak 2007-11-21 12:44:34 UTC
(In reply to comment #26)

> I am going to post on LKML and tell this.
Great.

> 
> I am also using only battery now; do you have the hangcheck timer enabled?
> 

No, not at all:

assam linux # grep -i hangcheck .config
# CONFIG_HANGCHECK_TIMER is not set
Comment 28 Daniele C. 2007-11-21 15:26:09 UTC
I can confirm that the 'battery' module causes hung keys, even if no dmesg messages are generated for them.

So each of the 'ac', 'battery' and 'thermal' modules can cause this issue. When all of them are unloaded, the problem is not verified.

But without the 'battery' module for example you cannot know how much power you have left...
Comment 29 Michal Nowak 2007-11-21 16:28:15 UTC
(In reply to comment #28)
> But without the 'battery' module for example you cannot know how much power you
> have left...
> 

Of course they are all useful. You can always load it on demand manually or in script every 60 sec and then unload and hope, you will not get any locking.

Can you please give me the link to the LKML post you were talking about?
Comment 30 Daniele C. 2007-11-21 23:31:20 UTC
Yes, here it is:

http://lkml.org/lkml/2007/11/22/2

It is also a valid summary to the current bug situation.

I hope I have been clear and I hope that it will cause some more interest in bug addressing/discussion/confirmation.
Comment 31 Daniele C. 2007-11-22 01:07:28 UTC
Today I got system messages with only the 'battery' module loaded.

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I am going to test 2.6.24-rc3 on the next reboot
Comment 32 Alexey Starikovskiy 2007-11-23 03:54:20 UTC
Please attach acpidump output, a listing of /proc/interrupts over a period of several seconds (preferably spanning a skipped/stuck key), and dmesg from a recent kernel.
Comment 33 Daniele C. 2007-11-23 05:14:30 UTC
I verified the problem after loading 'battery', 'ac' and 'thermal' all together; however, I did not get any 'atkbd.c' message in dmesg. I have rarely been getting those messages since I disabled the hangcheck timer.

The *.before logs are generated before loading the modules, while the *.after logs are generated after having loaded the modules.

I guess I should better generate them.

I am going to provide the same set of files with kernel 2.6.24-rc3

Thanks
Comment 34 Daniele C. 2007-11-23 05:15:06 UTC
Created attachment 13710 [details]
acpidump and interrupts before & after the problem
Comment 35 Daniele C. 2007-11-23 05:31:07 UTC
Created attachment 13712 [details]
acpidump and interrupts after the problem verification
Comment 36 Daniele C. 2007-11-23 05:34:07 UTC
I have modified the affected versions considering also the original bug 4046
Comment 37 Alexey Starikovskiy 2007-11-23 06:55:40 UTC
Actually, EC _driver_ is not involved in battery/ac/thermal activities, as all communication is done through MNVS system memory region. Thus we need to check if it's possible to not disable interrupts (if they are disabled) over access to system memory.
Comment 38 Daniele C. 2007-11-23 07:52:24 UTC
@Alexey: I have a barely sufficient knowledge about EC driver, MNVS and linux IRQs.

Do you think that acpi=noirq (or any kernel parameter of the same area) would help at narrowing down the problem?

Any comment about bug 8740 which also comes with this bug?
Comment 39 Alexey Starikovskiy 2007-11-23 08:35:51 UTC
There are several ways an ACPI driver (ac/battery/thermal...) can contact real hardware; all of them go through operation regions. ACPI by itself (the interpreter) supports system memory, system I/O and PCI config regions. The EC driver creates one more type of region, the EC region. In most cases the drivers above go through an op. region defined by the EC, simply because the battery, charger control and thermal sensors are connected to the EC. This is why so many people asked you to disable the EC driver in order to localize the problem.

Your case is different, as I said before: the drivers get their information through an op. region in system memory. It is probably mapped to some slow hardware rather than generic RAM (maybe the same EC), so access to it takes some time. So far so good. A problem could arise if access to this memory region is guarded by a spinlock inside ACPI, so that interrupts are disabled for the whole duration of the access (up to several hundred milliseconds).

I hope this amount of theory is enough for the moment; let's switch to practice:
there is no kernel option which could help you, but I will try to come up with a patch to remove the disabling of interrupts during such accesses, if it indeed happens. You just have to wait...

Comment 40 Alexey Starikovskiy 2007-11-23 10:12:37 UTC
Created attachment 13720 [details]
Disable global lock at read field

It seems that the only lock which is held across field access is the ACPI global lock. Its use is prescribed in your DSDT.
This patch disables its use in field read; please check if it changes the situation.
Comment 41 Len Brown 2007-11-23 21:05:17 UTC
thanks for the info in comment #34 and comment #35
interrupts.before shows 21 acpi
interrupts.after shows 8780 acpi

What happened between these two snapshots?
And what does "grep HZ .config" show? (will tell us how far apart snapshots are)

Can you reproduce the issue this way?

cat /proc/interrupts; modprobe thermal; cat /proc/interrupts

Also,
the acpidump output is interesting.  Your "after" snapshot
on line 1100 showed that the Global Lock was held.

$ diff acpidump.before acpidump.after
1100c1100
<   0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
---
>   0010: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

dis-assembled via iasl -d, the FACS looks like this:

[000h 000  4]                    Signature : "FACS"
[004h 004  4]                       Length : 00000040
[008h 008  4]           Hardware Signature : 0000120F
[00Ch 012  4]   Firmware Waking Vector(32) : 00000000
[010h 016  4]                  Global Lock : 00000002
[014h 020  4]        Flags (decoded below) : 00000000
                    S4BIOS Support Present : 0
[018h 024  8]   Firmware Waking Vector(64) : 0000000000000000
[020h 032  1]                      Version : 00

This is not really a good sign -- as the global lock
should be held for durations so infrequent and so quickly
that it would be very unlikely that you'd catch it with acpidump.
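
The changed bytes in the diff above can be decoded by hand. What follows is only a minimal sketch, assuming the bit layout that the ACPI specification gives for the FACS Global Lock DWORD (bit 0 = Pending, bit 1 = Owned) and assuming full 16-byte acpidump lines; the helper names are hypothetical and are not part of acpidump or iasl.

```python
def parse_dump_line(line):
    """Split one acpidump hex line into (offset, bytes).

    Assumes a full line of 16 byte columns followed by the ASCII column,
    optionally prefixed by a diff marker ('<' or '>').
    """
    text = line.strip().lstrip("<> \t")
    offset_text, rest = text.split(":", 1)
    hex_columns = rest.split()[:16]  # drop the trailing ASCII column
    return int(offset_text, 16), bytes(int(b, 16) for b in hex_columns)


def decode_global_lock(value):
    """Decode the Global Lock DWORD (assumed layout: bit 0 Pending, bit 1 Owned)."""
    return {"pending": bool(value & 0x1), "owned": bool(value & 0x2)}


before = "<   0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................"
after = ">   0010: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................"

for label, line in (("before", before), ("after", after)):
    offset, data = parse_dump_line(line)
    # FACS offset 0x10 is the Global Lock field: a little-endian DWORD.
    lock = int.from_bytes(data[0:4], "little")
    print(label, hex(offset), hex(lock), decode_global_lock(lock))
```

Under that layout, the "after" value 00000002 reads as "owned, nobody else waiting", which matches the observation that the lock was held when the dump was taken.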
Comment 42 Daniele C. 2007-11-24 06:22:06 UTC
Between the two snapshots I got a hung key, for example PagUp or PagDown, and the release key event was lost, as seen in dmesg.
At least 2-3 seconds elapsed, as I had to press some keys, check dmesg|tail and then do dmesg>dmesg.after (you can check the file timestamps).

$ grep HZ .config
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300

When using 2.6.22 I had 250 Hz instead, and the problem (iirc) happened more frequently.

'cat /proc/interrupts; sudo modprobe thermal; cat /proc/interrupts' did not work. I will try again, with some luck I will succeed.

I also think that the global lock should be held only for a really infinitesimal time, but since my keyboard interrupts can get lost in the middle, it must be held somewhat longer; I guess it can sometimes reach a maximum on the order of 0.5s/0.8s

I am going to test the patch in comment #40 ASAP

Thanks
Comment 43 Daniele C. 2007-11-24 11:41:13 UTC
I applied the patch in comment #40 on kernel 2.6.24-rc3 typing 'patch -p1 < acpiham.patch'

Then I rebuilt the kernel (I have attached my .config files).

I can type much faster with this patch! The bug no longer seems to happen! However, the video card didn't switch resolution. There is no off/on switch, only vertical colored lines when X is started.

I have attached the relevant dmesg taken from messages
Comment 44 Daniele C. 2007-11-24 11:43:32 UTC
Created attachment 13737 [details]
.config files for 2.6.22 and 2.6.24 (patched), dmesg of crash
Comment 45 Erik Boritsch 2007-11-26 14:09:54 UTC
Tried the patch; while the patch itself worked, the thermal module has ceased to function, along with ac and battery. It therefore behaves the same way as if I had removed the corresponding modules. I also get the following errors. On startup:
ACPI: unknown link to device
Also, dmesg is full of:
ACPI Error (utglobal-0126): Unknown exception code: 0xE71F5C00 [20070126]
ACPI Exception (dswexec-0462): UNKNOWN_STATUS_CODE, While resolving operands for [OpcodeName unavailable] [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.COMD] (Node c18b1a2c)
ACPI Error (utglobal-0126): Unknown exception code: 0xE71F5C00 [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.BAT0._BST] (Node c18b9e28)
ACPI Error (utglobal-0126): Unknown exception code: 0xE71F5C00 [20070126]
Comment 46 Daniele C. 2007-11-27 03:31:38 UTC
I recompiled the patched 2.6.24-rc3 with ACPI support (modular). A certain module crashed, then udev tried repeatedly to load some other module, finally the kernel crashed while loading ALSA modules.

I guess that it's not an option to disable ACPI IRQ locking. A dumb question: is it possible to disable ACPI_EC? There is a config option for it but it does not seem to be configurable
Comment 47 Erik Boritsch 2007-11-30 15:30:24 UTC
One suggestion is to boot with ec_intr=0

Another workaround proposal is as follows:
xset r rate 1000 50

Also, it is not clear yet whether this bug is actually a regression - I have heard reports of this bug happening after a kernel upgrade. 2.6.17 is a good one to start with - I'll look at several older kernels tomorrow.

I will test all the solutions properly as soon as I can - I need a solution because the affected laptop tends to overheat so it's not an option to start without acpi - I want to lower CPU frequency before the lappy shuts down.

As a separate note I would suggest to raise the priority of this bug - way too many people around the net are experiencing this issue and it is hardly a workaround to turn off acpi which is very important on notebooks.
Comment 48 Daniele C. 2007-11-30 15:48:39 UTC
I can confirm that ec_intr=0 perfectly works around the bug and is (as far as I know) the best workaround for my hardware, since I currently have ac, battery and thermal loaded and the system behaves as it did when I booted with acpi=off. No keyboard glitch whatsoever.

I don't know if my system will boot with 2.6.17 - nor if I will damage my filesystem with it. I will try to get a running vanilla 2.6.17 and see if the bug still happens - even if I fear that I have too many apps/modules that will not start with it.

I have raised the priority because I also think that there are really a lot of people affected by this issue; however I don't know if I have the authority to raise priorities of bugs, so please adjust the setting if it was not wanted.
Comment 49 Daniele C. 2007-11-30 15:58:33 UTC
Sadly, I spoke too soon. When using ec_intr=0 the problem still happens.

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I am now proceeding to compile 2.6.17.14 to see if it is affected.
Comment 50 Daniele C. 2007-11-30 16:42:16 UTC
Kernel 2.6.17.14, same parameters as kernel 2.6.22 (no ec_intr=0), is affected too.

In bug 4046 the original poster talks about possibly unaffected versions. Seems like he is saying that 2.6.8 (or an older one) was not affected.

He later says that 2.6.11-rc4 was no more affected. I am going to get that version and verify.
Comment 51 Daniele C. 2007-11-30 16:44:45 UTC
For completeness of comment #50:

I executed with kernel 2.6.17.14:

sudo modprobe ac
sudo modprobe battery
sudo modprobe thermal

When pressing enter after 'thermal', the keyboard enter key got stuck.
I have not provided dmesg (available on request) because no atkbd.c message was generated in this case.
Comment 52 Daniele C. 2007-11-30 16:48:52 UTC
I am not going to test 2.6.11-rc4 because it is not on portage, unless somebody pushes me to do it.

I wonder how much can be understood from comparing the (presumed) unaffected codebase and current codebase.

I am now using
xset r rate 1000 50

My rough impression is that this does not really work around the bug, it just reduces the probability of seeing it in action.
Furthermore, the keyboard becomes less responsive.

Thanks
Comment 53 Erik Boritsch 2007-12-01 16:44:03 UTC
I can confirm this bug on 2.6.16.19 kernel too. Older versions are not in portage anymore so i'm not gonna try them. This bug seems to be much older than I thought. 

I have also disabled ACPI_EC (by changing its default value in drivers/acpi/Kconfig and then launching make xconfig - save) - without any effect. The bug is still there with "CONFIG_ACPI_EC is not set".
Any other ideas? I'm running out of tricks here :-)
Comment 54 Sebastien Caille 2007-12-04 16:18:51 UTC
Hello,

I have an MSI l745 laptop that shows a 100% reproducible behaviour (phantom key + stuck keys), with a kernel 2.6.22 provided by Ubuntu and a kernel 2.6.24-rc4 (+ a hack in the hwsleep.c:acpi_enter_sleep_state - probably not relevant since it also happens with the Ubuntu kernel).


With the AC power adapter plugged, I see in dmesg:
[   54.492465] power_supply ADP1: uevent
[   54.492470] power_supply ADP1: POWER_SUPPLY_NAME=ADP1
[   54.492477] power_supply ADP1: Static prop TYPE=Mains
[   54.492480] power_supply ADP1: 1 dynamic props
[   54.492484] power_supply ADP1: prop ONLINE=1
[   54.507097] atkbd.c: Unknown key pressed (translated set 2, code 0xf1 on isa0060/serio0).
[   54.507105] atkbd.c: Use 'setkeycodes e071 <keycode>' to make it known.
[   54.508824] atkbd.c: Unknown key released (translated set 2, code 0xac on isa0060/serio0).
[   54.508830] atkbd.c: Use 'setkeycodes e02c <keycode>' to make it known.
[   54.509297] atkbd.c: Unknown key pressed (translated set 2, code 0x71 on isa0060/serio0).
[   54.509302] atkbd.c: Use 'setkeycodes 71 <keycode>' to make it known.
[   56.512797] power_supply ADP1: uevent
[   56.512804] power_supply ADP1: POWER_SUPPLY_NAME=ADP1
[   56.512810] power_supply ADP1: Static prop TYPE=Mains
[   56.512813] power_supply ADP1: 1 dynamic props
[   56.512817] power_supply ADP1: prop ONLINE=1
[   56.526942] atkbd.c: Unknown key pressed (translated set 2, code 0xf1 on isa0060/serio0).
[   56.526950] atkbd.c: Use 'setkeycodes e071 <keycode>' to make it known.
[   56.528129] atkbd.c: Unknown key released (translated set 2, code 0xf1 on isa0060/serio0).
[   56.528136] atkbd.c: Use 'setkeycodes e071 <keycode>' to make it known.

Without AC, I see 
[   60.773379] power_supply ADP1: uevent
[   60.773387] power_supply ADP1: POWER_SUPPLY_NAME=ADP1
[   60.773393] power_supply ADP1: Static prop TYPE=Mains
[   60.773397] power_supply ADP1: 1 dynamic props
[   60.773401] power_supply ADP1: prop ONLINE=0
[   60.789708] atkbd.c: Unknown key pressed (translated set 2, code 0xf2 on isa0060/serio0).
[   60.789716] atkbd.c: Use 'setkeycodes e072 <keycode>' to make it known.
[   60.818902] atkbd.c: Unknown key released (translated set 2, code 0xf2 on isa0060/serio0).
[   60.818908] atkbd.c: Use 'setkeycodes e072 <keycode>' to make it known.

See how the keycode has changed with the online status. If I put back the AC the keycode turns back to e071 again.

Booting with acpi=off or acpi=ht solves the issue.

The workarounds i8042.nomux=1, i8042.nopnp=1, ec_intr=0, ec_intr=2, pnpacpi=off, removing ac/thermal/battery modules do not work in my case.
I also tried acpi=noirq, pci=routeirq without improvement.

I had a look at the FACS: the lock is always 0 (as are all the other fields, with the exception of the version (01))

Also I can make the laptop go in sleep mode ("mem"), but it does not correctly resume (the disk spins up, but the keyboard - i.e. the num lock - does not work)
However it resumes fine if I make a "setkeycodes e071 255" before triggering the sleep mode.

I will attach my dmesg... (I will try to do more tomorrow. It's a bit late here...)

Hope it helps.
Regards
Comment 55 Sebastien Caille 2007-12-04 16:20:41 UTC
Created attachment 13851 [details]
dmesg MSI l745 2.6.24-rc4
Comment 56 Sebastien Caille 2007-12-05 15:16:34 UTC
Hello again,

I tried to dump /proc/interrupts every 5 seconds (cat /proc/interrupts; sleep 5; ...)
With acpi=on, I see that the interrupt count for IRQ1 (keyboard) is incrementing even if I don't use the keyboard.

With acpi=ht the interrupt count was not incrementing.

Something is definitely triggering the irq1 when acpi=on...
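
The comparison of /proc/interrupts snapshots described above can be automated. This is only a rough sketch: the function names are hypothetical, the sample snapshots are made up for illustration (only the acpi counts 21 and 8780 come from comment #41, and the IRQ numbers and controller column are assumptions), and the parser assumes the usual layout of an IRQ label, a colon, then one counter per CPU.

```python
def irq_counts(snapshot):
    """Return {irq_label: total count across CPUs} for one /proc/interrupts dump."""
    counts = {}
    for line in snapshot.splitlines():
        if ":" not in line:
            continue  # header line listing the CPUs
        label, rest = line.split(":", 1)
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller/device name columns
            total += int(token)
        counts[label.strip()] = total
    return counts


def irq_delta(before, after):
    """Per-IRQ increase between two snapshots."""
    old, new = irq_counts(before), irq_counts(after)
    return {irq: new[irq] - old.get(irq, 0) for irq in new}


# Illustrative snapshots; keyboard numbers are invented for the example.
before = """\
           CPU0
  1:        100    XT-PIC  i8042
  9:         21    XT-PIC  acpi
"""
after = """\
           CPU0
  1:        130    XT-PIC  i8042
  9:       8780    XT-PIC  acpi
"""

for irq, delta in irq_delta(before, after).items():
    print("IRQ", irq, "increased by", delta)
```

On a live system the two snapshots could be taken by reading /proc/interrupts twice with a few seconds of sleep in between, without touching the keyboard, to see whether IRQ 1 still advances.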



Comment 57 Daniele C. 2007-12-05 15:44:15 UTC
Shooting in the dark: maybe i8042 multiplexer is triggering a wrong IRQ? But i8042.nomux=1...

I don't really know if IRQ1 is multiplexed, to tell the truth.
Comment 58 Michal Nowak 2008-01-11 12:06:11 UTC
I know it is kinda weird but it's not happening anymore. No msg in dmesg, no stuck keys. Daniele, others, are you still facing this bug?

Linux assam 2.6.23-gentoo-r3 #1 PREEMPT Wed Dec 5 09:06:24 CET 2007 i686 Intel(R) Pentium(R) M processor 1.50GHz GenuineIntel GNU/Linux

It's 2.6.23.8 (probably) from Gentoo.

Do you remember, that this bug emerges mostly when in X? (see below)

I have no clue what "fixed" it, but I remember that some X.Org packages were stabilized in Gentoo some time ago (so they are now on x86, which I am running). My guess is that this only hides the bug rather than fixing it, because I believe it is an in-kernel issue.
Comment 59 Michal Nowak 2008-01-11 12:16:29 UTC
This is more than weird... Now I got it inside VirtualBox while testing new KDE-4 liveCD. 

kernel: 2.22.13

msg was (transcript): 

atkbd.c: Spurious NAK on isa0060/serio0. Some hardware might be trying access hardware directly.
[3x repeated]
Comment 60 Michal Nowak 2008-02-03 12:52:42 UTC
Anyone's having some info on this? I am not facing it on my Gentoo box for maybe a month.
Comment 61 Daniele C. 2008-02-03 13:19:51 UTC
I also think that some X.org change might be "masking" this bug, since it is of course a kernel bug.

I have now linux-2.6.23-gentoo-r6 and will test with ac,battery,processor modules loaded altogether; I will report my findings later
Comment 62 Daniele C. 2008-02-03 13:24:45 UTC
Still same problem, but dmesg doesn't contain anything. The errors are actually masked - if any.

I can always reproduce the bug this way:
modprobe ac
modprobe battery
modprobe processor
scite /usr/src/linux/Documentation/kernel-parameters.txt

Then I scroll up & down, with arrow keys and PgUp/PgDn when finally one key gets stuck and the text keeps scrolling even after release.

It's easy to reproduce it this way. You also get the same problem when scrolling with Firefox or when typing also (I Shift+Deleted a *LOT* of good emails because of this bug)
Comment 63 Michal Nowak 2008-02-10 02:23:44 UTC
OK :(. You are right. It's not producing messages in dmesg, but the stuck keys are still there.
Comment 64 Sebastien Caille 2008-02-11 09:55:47 UTC
Hello, 
Here are more details about what I see on my laptop (Comment #54) by using i8042.debug=1 and keeping a key pressed.

The laptop uses translated mode/scanset 2 (I haven't been able to put the i8042 in direct mode or put the keyboard in scanset 1 or 3).

It looks like the scancodes of the "real" key events manage to place themselves on the input port of the i8042 in the middle of the sequence triggered by the acpi "power_supply" check.

Some examples...
1. After a "power_supply" check:
    i8042 probably reads "0xe0 0x71 0xe0 0xf0 0x71"
    i8042 outputs "0xe0 0x71 0xe0 0xf1"
   => No real problem here

2. Sometimes a real key event (i.e. scancode 0x20) is read between the 0xe0 and 0x71
    i8042 probably reads "0xe0 0x20 0x71 0xe0 0xf0 0x71"
    i8042 outputs  "0xe0 0x20 0x71 0xe0 0xf1"
   => 0x20 is turned into "0xe0 0x20" and we have lost a key press
   If we lose a 0xa0 instead (key release of 0x20) then there is a "stuck key"

3. Sometimes the release bit is on the wrong scancode 
   i8042 probably reads "0xe0 0x71 0xe0 0xf0 0x20 0x71"
   i8042 outputs  "0xe0 0x71 0xe0 0xa0 0x71"
  => 0x20 is read just after the 0xf0 of the "0xe0 0xf0 0x71", and the release bit is "moved" from the second 0x71 to the 0x20 => the key press is turned into a key release...

I don't know if the 0xf0 of a "0xf0 0x20" can be read by the i8042 just before the 0x71 of the "0xe0 0x71" (i.e. "0xe0 0xf0 0x71 0x20" => i8042 outputs a 0x20 instead of 0xa0 => stuck key).

That's weird...

Note: I also see some scancodes when the brightness of the screen is changed by the system...
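
The three cases above can be replayed with a toy model of the translation step. This is only a sketch under a simplifying assumption taken from the examples themselves: scancode values pass through unchanged, 0xe0 is forwarded as-is, and a 0xf0 break prefix is swallowed and turned into the 0x80 release bit on the next byte. The real controller also remaps set-2 codes through a lookup table, which is ignored here.

```python
def translate(stream):
    """Toy model of i8042 translated mode: fold the 0xf0 break prefix
    into the release bit (0x80) of the byte that follows it."""
    out = []
    release_pending = False
    for byte in stream:
        if byte == 0xF0:
            release_pending = True  # break prefix is consumed, not forwarded
            continue
        if byte == 0xE0:
            out.append(byte)  # extended prefix is forwarded unchanged
            continue
        out.append((byte | 0x80) if release_pending else byte)
        release_pending = False
    return out


def show(stream):
    return " ".join(f"0x{b:02x}" for b in translate(stream))


# Case 1: undisturbed sequence from the power_supply check
print(show([0xE0, 0x71, 0xE0, 0xF0, 0x71]))
# Case 2: a real key press (0x20) lands between 0xe0 and 0x71
print(show([0xE0, 0x20, 0x71, 0xE0, 0xF0, 0x71]))
# Case 3: a real key press (0x20) lands right after the 0xf0
print(show([0xE0, 0x71, 0xE0, 0xF0, 0x20, 0x71]))
```

Under this model the release bit always attaches to whatever byte arrives next, which is why an interleaved byte can turn a press into a release (case 3) or pick up a stray 0xe0 prefix (case 2).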
Comment 65 Daniele C. 2008-02-11 10:40:19 UTC
Hi Sebastien,

thank you for this valuable information; can we deduce that the i8042 code needs to be rewritten?

If multiplexing is not working we should first provide a workaround (disabling multiplexing?) and then see what the i8042 code is doing wrong

Just a thought...
Comment 66 Daniele C. 2008-02-11 10:44:37 UTC
I am always using i8042.nomux=1

Did anybody check the differences when specifying or not i8042.nomux=1?
Comment 67 Sebastien Caille 2008-02-12 04:32:24 UTC
Unfortunately the "power_supply" sequence comes from the KBD port (irq 1, AUX flag not set). So i8042.nomux=1 won't have any effect. I also tried i8042.noaux=1, without any success...
Comment 68 Michal Nowak 2008-02-16 19:17:51 UTC
FYI: Just filed https://bugzilla.redhat.com/show_bug.cgi?id=433164
Comment 69 Daniele C. 2008-02-19 09:00:01 UTC
Can somebody please test if i8042.noacpi has any positive effect?

Thanks
Comment 70 Erik Boritsch 2008-03-21 12:49:28 UTC
The bug is still there with 2.6.25-r6. Is "i8042.noacpi" a correct kernel parameter? 2.6.25-r6 ignores it because it is unknown; "i8042.noacpi=1" is unknown as well.
Comment 71 Daniele C. 2008-03-22 06:52:35 UTC
@Erik Boritsch: I am pretty sure it no longer exists, not even in the recent previous kernel versions

I am also noticing that with kernel 2.6.24 the notebook does not shut down when issuing shutdown from the XFCE4 menu if the battery is plugged in (although I haven't loaded any module). I don't know if it is relevant, but maybe somebody else has noticed it
Comment 72 Erik Boritsch 2008-03-26 09:22:05 UTC
Please add 2.6.24.(1-3) and 2.6.25-rc(1-6) to affected kernels.
Comment 73 Jon Zdricolei 2008-03-26 17:11:05 UTC
I've had same issues on a desktop AMD64 PC using 2.6.18 kernel that came with Debian4 r3 Etch AMD64.
This issue existed as far back as 6 months ago - I think it was kernel 2.6.15 then
Comment 74 Zhang Rui 2008-07-17 18:53:33 UTC
what's the status of this bug?
Can anyone verify whether "i8042.nopnp=1" works around the problem, as described in comment #1?
Comment 75 Daniele C. 2008-07-18 00:16:53 UTC
@Zhang Rui: there's some confusion here, I will make some tests again and report my findings
Comment 76 Daniele C. 2008-07-18 00:28:21 UTC
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

The problem happened in the usual way, scrolling down some text and pressing Up/Down, PgUp/PgDn. It's not hard to trigger, and it happens during normal computer usage anyway.

This is the summary of the bug situation, from my point of view:

1) I have always been using and I am still using 'i8042.nomux=1'
2) when the battery,ac and thermal modules were built into kernel, 'acpi=off' turned off ACPI and fixed the dmesg error messages, but that was not a viable solution so
3) I compiled them as modules and could blacklist them so that they are not loaded automatically, causing the same problem (keypress glitches) but more often without the corresponding dmesg message. The problem *IS* still happening anyway after loading them manually (see first 2 lines)

I am a developer, I have developed several C libraries and can help in kernel patches testing, if necessary. I have barely understood that this problem is due to wrong ACPI tables but I really don't have a strategy to narrow the problem down nor fix it.

Please tell me if I can do something more
Comment 77 Daniele C. 2008-07-18 06:03:59 UTC
About 'i8042.nopnp=1': it is absolutely ineffective.

This is my /proc/cmdline:

root=/dev/hda5   console=tty1 vga=791   video=vesafb:vram=2,xres=1024,yres:768,bpp:8,hsync1:30,vsync1:50,hsync2:55,vsync2:85,accel,mtrr   i8042.nomux=1 i8042.nopnp=1

I have loaded manually 'battery', 'ac' and 'thermal' and I got a lot of the usual errors:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key pressed (translated set 2, code 0xb1 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e031 <keycode>' to make it known.
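
To compare runs with different boot options it may help to tally these messages instead of eyeballing them. A minimal sketch, assuming the message wording shown above stays the same (it differs between kernel versions, so the regular expression is an assumption to adjust):

```python
import re
from collections import Counter

# Matches the "Unknown key pressed/released" lines in the format quoted above.
ATKBD_RE = re.compile(
    r"atkbd\.c: Unknown key (pressed|released) "
    r"\(translated set 2, code (0x[0-9a-f]+) on ([^)]+)\)"
)


def tally_unknown_keys(log_text: str) -> Counter:
    """Count (event, code) pairs found in a dmesg excerpt."""
    counts = Counter()
    for match in ATKBD_RE.finditer(log_text):
        event, code, _port = match.groups()
        counts[(event, code)] += 1
    return counts


# The excerpt from this comment, shortened to the "Unknown key" lines.
EXCERPT = """\
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Unknown key pressed (translated set 2, code 0xb1 on isa0060/serio0).
"""

for (event, code), n in sorted(tally_unknown_keys(EXCERPT).items()):
    print(f"{event:9s} {code}: {n}")
```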
Comment 78 ykzhao 2008-07-20 20:55:11 UTC
Hi, Daniele
    Thanks for your test.
    From comment #77 it seems that you can't work around this bug by adding the boot option "i8042.nopnp=1". This is inconsistent with what you said in comment #1.
    At the same time it seems that the keyboard interrupt (IRQ 1) is also triggered while the EC triggers the ACPI interrupt (IRQ 9). In that case an unknown keyboard scancode is received, so the OS prints the following warning messages:
    > atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
    > atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
    > atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
    > atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
    > atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
    > atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
   
    Maybe this is related to the keyboard configuration in the BIOS options. Will you please confirm whether the keyboard mode can be changed in the BIOS setup?
    Will you please run the following command and report the serio type?
   >cat /sys/devices/platform/i8042/serio0/id/type 
    
   Thanks.
Comment 79 Daniele C. 2008-07-21 12:01:09 UTC
Please see comment #10, which clarifies what was said in comments #1-#9. I know the comments I posted are a real mess, but if you read them in order (treating the latest as the most accurate) you can get the correct information from them, e.g. that turning ACPI off fixes the problem (spurious key codes do not happen).

I will check if keyboard mode can be changed in BIOS.

serio type (cat /sys/devices/platform/i8042/serio0/id/type) is:
06

Thank you
Comment 80 Daniele C. 2008-07-22 07:36:27 UTC
Keyboard mode cannot be changed in BIOS (PhoenixBIOS). This notebook is a Fujitsu-Siemens V2000 Pro exact copy, which is sold as 'Maxdata Pro 7000DX' (internal product name says 7000X instead), I hope this information can be of some use.

I noticed these two lines in dmesg:

PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report

So I added 'pci=routeirq', without any success. After loading the ac,battery and thermal modules I got these:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
Comment 81 Daniele C. 2008-07-22 07:54:33 UTC
I should add that, as far as I know, this notebook has a real PS/2 keyboard and PS/2 touchpad, both connected to the i8042.

Up to now the most interesting comment to fix this bug is comment #64.
Comment 82 Sebastien Caille 2008-07-22 13:02:53 UTC
Hello, 

I saw something interesting a few days ago:
I updated the bios of my msi l745 laptop to the latest version. 
After reboot, the "unknown key released" messages were not triggered anymore.
However unplugging the AC adapter made them come back. 

Also one more thing: I tried to dump the content of the EC_SC (Embedded Controller Status) register some time ago to understand the "EC GPE Storm" issue, and I saw that the SMI_EVT flag was set from time to time.
Moreover I read in a document from Phoenix that the keyboard controller may be involved in managing SMIs in Legacy mode.

So... one more wild guess: the EC is (wrongly) triggering SMIs to notify the state of the AC power and some SMI related data are made available on the keyboard controller ports, confusing the kernel.

Unfortunately I have not been able to find the model/datasheets of my EC.
lm_sensors says
  Trying family `National Semiconductor'...                   Yes
  Found unknown chip with ID 0xa300
But I'm not even sure that this information is reliable...
(and at the moment I haven't found any datasheet that is matching the content of the registers specified in the DSDT).

I also tried to dump the registers of the EC using the acer_ec.pl tool and force the state of some registers, but without any success.
Comment 83 Daniele C. 2008-08-03 06:50:30 UTC
@Sebastien: thanks for this useful information, I hope we can get to a solution for this nasty bug.

Here are some downstream / duplicates of this bug:

http://bugs.launchpad.net/ubuntu/+bug/124406
http://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/194214
http://bugs.launchpad.net/ubuntu/+bug/39315
http://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/65249

A possible fix to atkbd.c can be found here:
http://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/39315

I hope there's something useful scattered there around.
Comment 84 Daniele C. 2008-08-04 04:07:44 UTC
If I run:

rmmod ac thermal; modprobe ac; modprobe thermal

I almost instantly get stuck keys. Sometimes the ENTER key used to run the command is even duplicated, as if I had pressed Enter twice!

Other times if I run that line and press something that key starts instantly to repeat indefinitely (until another key is pressed).

I still wonder why Windows never had this issue...maybe we (the Linux kernel) are using some "grey zone" ACPI functions which are not behaving as expected (standards?) and making the i8042 become crazy?
Comment 85 Daniele C. 2008-08-04 05:52:12 UTC
Another guy having the same problem:

http://www.mail-archive.com/linux-input@vger.kernel.org/msg00014.html
Comment 86 Daniele C. 2008-08-13 21:54:04 UTC
Can we say that the problem happens only with Fn+? keys?
Comment 87 Daniele C. 2008-08-14 06:22:39 UTC
It happens with any type of key, not only the special Fn keys.

The patch in https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/39315/comments/2 does not seem effective to me, I won't try it.

I have decompiled the DSDT, seems the standard one for the "Intel Montara" processors (Centrino).

Can somebody (more experienced than me) please decompile it and see if Linux gets assigned a special table instead of the one passed to other OSes?

I am asking because at a certain point the AML contains a possible OS-specific block:

        Device (PCI0)
        {
            Method (_INI, 0, NotSerialized)
            {
                If (CondRefOf (_OSI, Local0))
                {
                    Store (0x07D1, OSYS)
                }
                Else
                {
                    If (LEqual (SizeOf (_OS), 0x14))
                    {
                        Store (0x07D0, OSYS)
                    }
                    Else
                    {
                        If (LEqual (SizeOf (_OS), 0x27))
                        {
                            Store (0x07CF, OSYS)
                        }
                        Else
                        {
                            Store (0x07CE, OSYS)
                        }
                    }
                }
...
}
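
The branch above depends only on whether the _OSI method exists and on the length of the _OS string, never on the OS name itself. A small sketch of the same decision, assuming SizeOf returns the string length without a terminator; the function and parameter names are made up for illustration:

```python
def osys_value(has_osi: bool, os_string: str) -> int:
    """Mirror the _INI logic quoted above: pick OSYS from _OSI presence
    and the length of the _OS string."""
    if has_osi:
        return 0x07D1  # 2001
    if len(os_string) == 0x14:
        return 0x07D0  # 2000
    if len(os_string) == 0x27:
        return 0x07CF  # 1999
    return 0x07CE  # 1998


# Any interpreter that provides _OSI lands on the first branch,
# whatever its _OS string is.
print(hex(osys_value(True, "anything")))
# Without _OSI, a 20-character _OS string selects 0x07D0.
print(hex(osys_value(False, "Microsoft Windows NT")))
```

As far as I know the Linux ACPI interpreter does provide _OSI, which would put it on the first branch together with any other OS that does; that is an assumption worth checking against the running kernel rather than a finding from this DSDT.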
Comment 88 Daniele C. 2008-08-14 06:31:14 UTC
On 2008-08-12 I upgraded to gentoo-sources-2.6.25-r7

The bug seems fixed.

I will post another comment if it is not.
Comment 89 Daniele C. 2008-08-14 06:59:14 UTC
It seems fixed with gentoo-sources-2.6.25-r7 and gentoo-sources-2.6.25-r6.

I haven't tested with others.

The only recent change to my parameters was adding 'clocksource=acpi_pm', but I would not say that it worked around the bug, since in my /var/log/messages there is an atkbd.c message also when booting with clocksource=acpi_pm...

I will post again if the bug reappears...weird, I would like to narrow down the bug to its cause now that it has disappeared

If you need information from my configuration, please ask
Comment 90 Daniele C. 2008-08-14 07:07:50 UTC
From my emerge.log (grep gentoo-sources):

2008-07-19 12:56:49 -   >>> emerge (7 of 10) sys-kernel/gentoo-sources-2.6.25-r6 to /
2008-07-19 12:56:49 -   === (7 of 10) Cleaning (sys-kernel/gentoo-sources-2.6.25-r6::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r6.ebuild)
2008-07-19 12:56:49 -   === (7 of 10) Compiling/Merging (sys-kernel/gentoo-sources-2.6.25-r6::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r6.ebuild)
2008-07-19 12:59:04 -   >>> AUTOCLEAN: sys-kernel/gentoo-sources
2008-07-19 12:59:04 -   === (7 of 10) Post-Build Cleaning (sys-kernel/gentoo-sources-2.6.25-r6::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r6.ebuild)
2008-07-19 12:59:04 -   ::: completed emerge (7 of 10) sys-kernel/gentoo-sources-2.6.25-r6 to /
2008-07-22 12:32:19 -   *** emerge  unmerge =sys-kernel/gentoo-sources-2.6.24-r8
2008-07-22 12:32:30 -  === Unmerging... (sys-kernel/gentoo-sources-2.6.24-r8)
2008-07-22 12:33:06 -   >>> unmerge success: sys-kernel/gentoo-sources-2.6.24-r8
2008-07-22 20:41:27 -   >>> emerge (9 of 12) sys-kernel/gentoo-sources-2.6.25-r7 to /
2008-07-22 20:41:27 -   === (9 of 12) Cleaning (sys-kernel/gentoo-sources-2.6.25-r7::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r7.ebuild)
2008-07-22 20:41:28 -   === (9 of 12) Compiling/Merging (sys-kernel/gentoo-sources-2.6.25-r7::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r7.ebuild)
2008-07-22 20:42:58 -   >>> AUTOCLEAN: sys-kernel/gentoo-sources
2008-07-22 20:42:59 -   === (9 of 12) Post-Build Cleaning (sys-kernel/gentoo-sources-2.6.25-r7::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r7.ebuild)
2008-07-22 20:42:59 -   ::: completed emerge (9 of 12) sys-kernel/gentoo-sources-2.6.25-r7 to /
2008-08-12 15:08:29 -  === Unmerging... (sys-kernel/gentoo-sources-2.6.25-r6)
2008-08-12 15:08:37 -   >>> unmerge success: sys-kernel/gentoo-sources-2.6.25-r6

Does not really seem related to gentoo-sources package (?!), as I experienced the bug (as per the comments here on this tracker) on July 22 (before the gentoo-sources update) and on August 4. I have been using 2.6.25-r6 up to this morning (switched to r7 and AOK).

From my /var/log/messages the 'clocksource=acpi_pm' was introduced on Jul 22 13:19:00
Comment 91 Daniele C. 2008-08-17 11:02:00 UTC
No hope. The bug is not fixed.

I can confirm that it hardly ever happens if the battery applet is not present.

So, the new way to trigger this bug is:

1) load the 'battery' module (or 'ac')
2) activate an applet or application which polls the battery status (like the XFCE4 battery applet)

If no application polls the battery status, the bug does not happen (I assume that the same is true for the 'ac' and 'thermal' modules).
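
For testing without a desktop applet, the polling can be imitated with a few lines of script. A minimal sketch; the status file path is an assumption (it depends on the kernel version and battery name) and must be adapted to the machine:

```python
import time
from pathlib import Path

# Assumed location of the battery state file; adjust for your system.
DEFAULT_STATUS_FILE = "/proc/acpi/battery/BAT0/state"


def poll_status(path: str, count: int, interval: float) -> list:
    """Read the status file `count` times, sleeping `interval` seconds
    between reads, and return what was read each time."""
    readings = []
    for i in range(count):
        readings.append(Path(path).read_text())
        if i + 1 < count:
            time.sleep(interval)
    return readings
```

Calling `poll_status(DEFAULT_STATUS_FILE, 600, 1.0)` while scrolling in an editor should then stand in for step 2 above, if polling really is the trigger.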
Comment 92 ykzhao 2008-08-27 20:33:39 UTC
Hi, Daniele
    Will you please confirm whether the system is affected by the following message? 
    >atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
    >atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

    From the acpidump info it seems that the following definition appears in many ACPI control methods:
    > Method (_Q31, 0, NotSerialized)
    > {
    >     Store (0x32, SMIF)
    >     Store (Zero, TRP0)
    >     Sleep (0x64)
    >     Notify (\_SB.AC, 0x80)
    Maybe an SMI is triggered when the OS checks the status of AC/Battery/Thermal. In that case an incorrect keyboard scancode may be reported.
    
   Will you please check whether the issue is fixed by a BIOS upgrade?
   Thanks.
      
Comment 93 Daniele C. 2008-08-28 06:28:51 UTC
Hi ykzhao,

yes I confirm that I get:
----
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
----
When loading one (or more than one) of the ac,battery,thermal modules and using a program reading data through those modules (like the battery status applet).

I have not yet found a BIOS upgrade for this notebook. I own a Maxdata 7000DX Pro, which seems to me a Fujitsu Amilo (see http://gentoo-wiki.com/Maxdata_Pro_7000DX). Maxdata website says nothing about this line of products, and they do not mention BIOS upgrade downloads. I have found on the FujitsuSiemens website a BIOS upgrade file called 'FSC_BIOSFlashISOCDImageAMILOM7400_R01S0Z_1001939.ISO' for the Fujitsu Amilo M7400, but I will not apply it unless I am sure that it is the correct BIOS upgrade. Can you please help me?

I have attached my dmidecode information. From it I see that I have a Phoenix BIOS, version R01-M0Vf (released 10/23/03). Where can I get the correct BIOS upgrade (starting from this BIOS version and for my mainboard)?

Thanks
Comment 94 Daniele C. 2008-08-28 06:29:27 UTC
Created attachment 17506 [details]
first lines of 'dmidecode' output, unique ids and serial numbers removed
Comment 95 ykzhao 2008-08-28 18:56:59 UTC
Hi, Daniele
    Thanks for the confirmation. It seems that it is a very old BIOS (10/23/03). I am not sure whether ACPI is supported very well on this laptop.
    At the same time I see the following info from http://gentoo-wiki.com/Maxdata_Pro_7000DX
     > ACPI issues There are 2/3 documented (here) issues regarding mouse and keyboard of this notebook (and most probably of many others) happening with 2.6.x kernels with ACPI active
     > AT2 PS2 Keyboard   There is another issue regarding ACPI and always the i8042 chipset: some keys may get stuck and the release event may not be caught. This problem it's due only to another ACPI vs i8042 conflict (like the above one regarding the PS2 mouse). 
    
    Maybe this issue is related to the hardware/BIOS. But as this is a very old machine model, maybe there is no BIOS upgrade available.
    Poor hardware, poor people.
    
    Thanks.
Comment 96 Erik Boritsch 2008-08-29 07:46:25 UTC
Hmm, I have the very same problem on acer TravelMate 243LC and I doubt that BIOS is the same on those two models. I'll look further into it, yet I don't think it is one specific BIOS issue for it happens on different models from different manufacturers. 
Comment 97 Daniele C. 2008-08-31 00:18:24 UTC
ACPI is of course supported, as it works through Windows XP.

The page you are referring to, http://gentoo-wiki.com/Maxdata_Pro_7000DX, was authored by me, so it does not contain more information than this bug tracker.

I do not agree with you, "Poor hardware, poor people", this was a high profile notebook, bought (new) at about 1800 € in 2004, not a cheap notebook. As Erik said, this might not be a BIOS problem, and let's not forget that keyboard works fine on Windows.

And, anyway, I expect hardware to work as it should, on Windows and on Linux.

Seems like you are not going to work on this bug.
Comment 98 ykzhao 2008-09-08 19:27:39 UTC
Created attachment 17689 [details]
Patch 1/4 : Don't issue the burst disable command if EC exits the burst mode
Comment 99 ykzhao 2008-09-08 19:29:12 UTC
Created attachment 17690 [details]
Patch 2/4: Clear the query_pending bit only after processing EC notification event
Comment 100 ykzhao 2008-09-08 19:30:37 UTC
Created attachment 17691 [details]
Patch 3/4: Simplify EC working flowchart and always enable EC GPE
Comment 101 ykzhao 2008-09-08 19:32:43 UTC
Created attachment 17692 [details]
patch 4/4: Add some udelay in EC GPE handler to avoid EC GPE interrupt storm
Comment 102 ykzhao 2008-09-08 19:38:27 UTC
From the acpidump info it seems that the following definition appears
in many ACPI control methods:
    > Method (_Q31, 0, NotSerialized)
    > {
    >     Store (0x32, SMIF)
    >     Store (Zero, TRP0)
    >     Sleep (0x64)
    >     Notify (\_SB.AC, 0x80)
    Maybe an SMI is triggered when the OS checks the status of AC/Battery/Thermal.
In that case an incorrect keyboard scancode may be reported.

   Will you please try the attached four patches on the latest kernel(2.6.27-rc5) and see whether the issue still exists?
   Thanks.
Comment 103 Daniele C. 2008-09-09 03:43:37 UTC
@ykzhao: I will test within next few hours

Thanks
Comment 104 Daniele C. 2008-09-09 04:56:35 UTC
What kernel shall I use specifically? I have tried to apply the first 2 patches to the latest git (retrieved with 'git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6') but some hunks do fail, seems to be some similar code already in place there.
Comment 105 Daniele C. 2008-09-09 05:56:39 UTC
I tested the vanilla unpatched 2.6.27-r5 (from git), still affected
Comment 106 Michal Nowak 2008-09-09 06:27:39 UTC
I guess you are expected to *apply* those patches to 2.6.27-rc5.
Comment 107 Daniele C. 2008-09-09 07:23:54 UTC
@mnowak: read comment #104
Comment 108 Michal Nowak 2008-09-09 07:57:50 UTC
> I have tried to apply the first 2 patches
> to the latest git

"latest git" != 2.6.27-rc5

But I obviously did not try it, anyway.

> I tested the vanilla unpatched 2.6.27-r5 (from git)

Because not patched, right?

Or, what I am missing here...?
Comment 109 Daniele C. 2008-09-09 08:47:36 UTC
...and anyway it fails

ec_gpe_storm.patch and ec_work_mode.patch are missing 1 hunk each (respectively 3rd and 2nd)

I won't compile the patched kernel without all hunks being patched OK!
Comment 110 Shaohua 2008-09-23 22:44:49 UTC
Yakui, please tell us which kernel version your patches apply to
Comment 111 ykzhao 2008-09-24 00:35:37 UTC
Hi, Daniele
    You can try all the patches on the latest kernel(for example: 2.6.27-rc6).
    Thanks.
Comment 112 Daniele C. 2008-09-24 09:27:43 UTC
just downloaded 2.6.27-rc6 from:
http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.27-rc6.tar.bz2

Will try ASAP
Comment 113 Daniele C. 2008-09-24 10:06:27 UTC
DOES NOT WORK.

Maybe I should apply patches in a different order?

I won't go on unless all hunks are successful

Here is my shell output:
---------------------------------
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_asus.patch 
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 135 with fuzz 2 (offset 25 lines).
Hunk #2 FAILED at 957.
1 out of 2 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_clear_query.patch 
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 278 (offset 21 lines).
Hunk #2 succeeded at 512 (offset 8 lines).
Hunk #3 succeeded at 528 (offset 8 lines).
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_gpe_storm.patch 
patching file drivers/acpi/ec.c
Hunk #3 FAILED at 517.
Hunk #4 succeeded at 786 (offset 49 lines).
1 out of 4 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_work_mode.patch 
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 200 (offset 33 lines).
Hunk #2 succeeded at 228 (offset 33 lines).
Hunk #3 succeeded at 251 (offset 33 lines).
Hunk #4 succeeded at 260 (offset 33 lines).
Hunk #5 succeeded at 278 (offset 33 lines).
Hunk #6 succeeded at 532 (offset 20 lines).
Hunk #7 succeeded at 740 (offset 22 lines).
Hunk #8 succeeded at 787 (offset 22 lines).
Hunk #9 succeeded at 880 (offset 22 lines).
Hunk #10 succeeded at 975 (offset 22 lines).
Hunk #11 succeeded at 983 (offset 22 lines).
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ 
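
When several patches touch the same file it is easy to miss a rejected hunk in output like the above. A minimal sketch that extracts the failed hunks from the text printed by one patch run; it assumes the "Hunk #N FAILED at LINE." wording shown in this comment:

```python
import re

FAILED_HUNK_RE = re.compile(r"^Hunk #(\d+) FAILED at (\d+)\.", re.MULTILINE)


def failed_hunks(patch_output: str) -> list:
    """Return (hunk_number, line_number) for every hunk reported as FAILED."""
    return [(int(n), int(line)) for n, line in FAILED_HUNK_RE.findall(patch_output)]


# Output of the ec_asus.patch and ec_gpe_storm.patch runs quoted above.
EC_ASUS_OUTPUT = """\
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 135 with fuzz 2 (offset 25 lines).
Hunk #2 FAILED at 957.
1 out of 2 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
"""
EC_GPE_STORM_OUTPUT = """\
patching file drivers/acpi/ec.c
Hunk #3 FAILED at 517.
Hunk #4 succeeded at 786 (offset 49 lines).
1 out of 4 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
"""

print("ec_asus.patch failed hunks:", failed_hunks(EC_ASUS_OUTPUT))
print("ec_gpe_storm.patch failed hunks:", failed_hunks(EC_GPE_STORM_OUTPUT))
```

GNU patch also has a --dry-run option, which reports the same messages without modifying the tree; that can be used to check every patch before applying any of them.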
Comment 114 Daniele C. 2008-09-25 11:34:31 UTC
The only patch which needs corrections seems to be ec_asus.patch; I will skip it as I do not own an Asus

Let's see what happens..
Comment 115 Daniele C. 2008-09-25 12:34:38 UTC
I am running the 2.6.27-rc6 kernel with all patches except the failing one, ec_asus.patch.

I have found on dmesg the well known lines:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

But I have to say that I am not able to trigger the stuck keys anymore...but I guess it's just a matter of time. I really hope that the atkbd.c messages are a sort of 1-time glitch and not a stuck key that I did not recognize...
Comment 116 Daniele C. 2008-09-25 12:43:42 UTC
I have found other 4 lines in dmesg:
----
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
----

No stuck key experienced yet
Comment 117 Daniele C. 2008-09-25 12:48:04 UTC
Ok, got a stuck key while browsing on menuconfig...

@ykzhao: patches 2+3+4 do not fix the bug
Comment 118 ykzhao 2008-09-25 18:34:11 UTC
Hi, Daniele
   thanks for the test. 
   Sorry that I pasted the incorrect patch in comment #98. 
   Thanks for the confirmation that the issue can't be resolved by the attached patch.
   
Comment 119 Daniele C. 2008-10-17 00:38:27 UTC
The i8042.nomux=1 option is no longer necessary with kernel >= 2.6.25

I have tried the latest linux-2.6.27-rc9 with the patch in http://bugzilla.kernel.org/show_bug.cgi?id=11549 (http://bugzilla.kernel.org/attachment.cgi?id=18047), it does not fix the bug. Just shooting in the dark, really.
Comment 120 Daniele C. 2008-10-23 11:07:13 UTC
If some developer could take benefit from it, I can offer my hardware via SSH for some hours (with a booted LiveCD)
Comment 121 Daniele C. 2008-11-23 04:55:05 UTC
Bug does not happen with Ubuntu Intrepid Ibex kernel 2.6.27-7

I don't know where's the magic, but I am booting with no special kernel parameter, it simply works (TM)
Comment 122 Erik Boritsch 2008-11-23 07:15:54 UTC
What about gentoo's and vanilla kernels? I don't have the hardware right now to test...
Comment 123 Daniele C. 2008-11-23 12:25:38 UTC
@erik: my last test was that of comment 119 (2.6.27-rc9), and was not successful. I am no more involved in Gentoo but I can provide the hardware via SSH as said in comment 120, if necessary
Comment 124 Zhang Rui 2008-11-23 18:47:58 UTC
can anyone please try a vanilla kernel and see if the problem still exists?
Comment 125 Eugen 2008-11-27 14:23:57 UTC
i have today tried the current sidux kernel 2.6.27-7.slh.1-sidux-686 (vanilla kernel + sidux patches) on an fujitsu siemens amilo pro v2000 notebook...

unfortunately, this bug is not fixed with the kernel (at least for me)

(In reply to comment #124)
> can anyone please try a vanilla kernel and see if the problem still exists?
> 

Comment 126 Daniele C. 2008-11-27 14:30:15 UTC
@Eugen: my notebook, Maxdata 7000X, is a clone of FS Amilo Pro v2000, so I assume our hardware base is almost the same.

Can you please try an Ubuntu Intrepid Ibex LiveDVD?
I don't know what's in the recipe, but it's working fine here...and the ac, battery and thermal modules are always loaded
Comment 127 Daniele C. 2008-12-07 11:47:26 UTC
Correction: with Ubuntu Intrepid Ibex I could trigger the nasty bug only once when I was using battery (no AC plugged) and the wireless network adapter. I think the two are somewhat tied, i.e. using the network adapter increases the chances of triggering the bug.

So the bug is not fixed in Ubuntu Intrepid Ibex either; it's still something at the kernel level
Comment 128 Daniele C. 2008-12-28 02:13:44 UTC
A friend suggested that perhaps we could use the approach used in bug 12021, with a suitably modified patch similar to http://bugzilla.kernel.org/attachment.cgi?id=19301&action=view

Can some expert tell whether this would classify as a "bad hack" or a "fix"? It would not be a "bad hack" if it is "normal" for our buggy keyboard to lose release events under heavy i8042 traffic load
Comment 129 Daniele C. 2008-12-28 04:55:05 UTC
Friend in comment 128 is Grégory Schmitt
Comment 130 Zhang Rui 2008-12-28 17:30:17 UTC
cc dmitry and the other input experts. :)
Comment 131 Daniele C. 2008-12-29 05:23:35 UTC
Regarding priority: this bug is pissing off a lot of people who are approaching Linux...I am asking myself whether this bug would have been fixed in a few weeks if the i8042 were used in server equipment instead of laptops...
Comment 132 Daniele C. 2008-12-29 13:51:29 UTC
Created attachment 19530 [details]
My dmidecode output
Comment 133 Daniele C. 2008-12-29 14:01:16 UTC
Seems that my DSDT has some bad errors. I am not able to fix them...

---
Intel ACPI Component Architecture
ASL Optimizing Compiler version 20061109 [May 16 2007]
Copyright (C) 2000 - 2006 Intel Corporation
Supports ACPI Specification Revision 3.0a

dsdt.dsl  2561:                     Field (ERAM, AnyAcc, Lock, Preserve)
Error    4074 -                               ^ Host Operation Region requires ByteAcc access

dsdt.dsl  2660:                             Store (Arg2, DAT3)
Error    4005 -    Method argument is not initialized ^  (Arg2)

dsdt.dsl  2660:                             Store (Arg2, DAT3)
Remark   5065 -   Not a parameter, used as local only ^  (Arg2)

ASL Input:  dsdt.dsl - 4738 lines, 174788 bytes, 1974 keywords
Compilation complete. 2 Errors, 0 Warnings, 1 Remarks, 456 Optimizations
Comment 134 Daniele C. 2008-12-29 14:59:22 UTC
The first error can be easily fixed by using ByteAcc; the second error is something weird.

The Amilo M7400, almost a clone by hardware specs, had the same error in its DSDT. I don't know if the DSDT compilation errors reported by iasl are related to our i8042 problem, but there is some interesting reading here:
http://www.mavetju.org/mail/view_message.php?list=freebsd-acpi&id=2286041
Comment 135 Daniele C. 2009-01-18 02:11:03 UTC
Created attachment 19874 [details]
DSDT AML for Maxdata 7000X, fixed errors and compiled with iasl 20061109

The bug still happens with the fixed DSDT; I will now try a patch by G.Schmitt on the 2.6.28 kernel.
Comment 136 Jon Zdricolei 2009-01-18 16:08:58 UTC
My scenario -comment #73- was related to a KVM USB switch. Whenever I bypass the switch the issue disappears. The ACPI workaround didn't work for me.
Comment 137 David Oftedal 2009-01-19 19:05:59 UTC
The bug is in fact from as far back as 2003:

http://lkml.org/lkml/2003/9/15/210

The user begins by complaining about the familiar errors:

Sep 14 20:42:27 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xb6, on isa0060/serio0) pressed.
Sep 14 20:42:27 cpp kernel: i8042 history: 19 a2 99 0f 8f 0f 8f 1c 9c 04 84 36 09 b6 89 b6 
Sep 14 22:13:00 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xa5, on isa0060/serio0) pressed.
Sep 14 22:13:00 cpp kernel: i8042 history: a7 20 9e 21 9f 24 25 26 27 a0 a4 a5 a6 a7 a1 a5 
Sep 14 22:13:00 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xa6, on isa0060/serio0) pressed.
Sep 14 22:13:00 cpp kernel: i8042 history: 20 9e 21 9f 24 25 26 27 a0 a4 a5 a6 a7 a1 a5 a6 
Sep 14 22:13:00 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xa7, on isa0060/serio0) pressed.
Sep 14 22:13:00 cpp kernel: i8042 history: 9e 21 9f 24 25 26 27 a0 a4 a5 a6 a7 a1 a5 a6 a7 
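
The doubled release at the end of each history line can be spotted mechanically. Below is a minimal Python sketch, assuming the history bytes are set-1 style codes in which a release is the press code with bit 0x80 set, and ignoring 0xe0/0xe1 prefixes. `unmatched_releases` is a hypothetical helper, not part of any kernel tool. Because the history window starts mid-stream, an unmatched release near the start of a line is expected; the one at the end is the interesting one.

```python
def unmatched_releases(history_hex):
    """Return (index, code) for every release byte with no press seen before it.

    Assumption: set-1 style codes, release == press | 0x80, no 0xe0/0xe1 prefixes.
    """
    codes = [int(token, 16) for token in history_hex.split()]
    held = set()          # press codes currently considered "down"
    unmatched = []
    for index, code in enumerate(codes):
        if code & 0x80:                   # high bit set: a release
            press = code & 0x7F
            if press in held:
                held.discard(press)
            else:
                unmatched.append((index, code))
        else:                             # a press
            held.add(code)
    return unmatched


# First history line quoted above.
first = "19 a2 99 0f 8f 0f 8f 1c 9c 04 84 36 09 b6 89 b6"
for index, code in unmatched_releases(first):
    print("byte %d: release 0x%02x without a matching press" % (index, code))
```
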

An Andries Brouwer then replies:

	
Date	Mon, 15 Sep 2003 23:28:00 +0200
From	Andries Brouwer <>
Subject	Re: 2.6.0-test1, -test4 control key "stuck"

On Mon, Sep 15, 2003 at 08:55:46PM +0000, xsdg wrote:
> What would happen if the kernel received two keypress events, and then one key-
> release event for a single key?  I'd imagine that it'd disregard the duplicate
> keypress

The answers differ for 2.4 and 2.6. For 2.4 each keypress is a keypress,
and key releases are rather unimportant as long as the key is not a
modifier key. For 2.6 we have synthetic repeat, so a second keypress from
the keyboard is ignored, the key repeats with kernel-defined frequency,
and the repeat is ended by the key release.

> any idea what might cause the key sticking problem?

If a key release is not seen, 2.4 doesnt mind, but 2.6 keeps repeating.

> Also, I'm not sure how the final issue I described

Do not recall all items of all letters I answer - sorry.

Andries


In other words, the bug may be caused by a combination of faulty hardware and some very naive code in the keyboard driver that was added in kernel 2.6.
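
To make the 2.6 behaviour described by Andries concrete, here is a small Python simulation of synthetic repeat for a single key. It is only a sketch under stated assumptions: the 250 ms delay and 33 ms period are illustrative values, and `simulate_softrepeat` is a hypothetical model, not the kernel's code. It shows that a duplicate press is ignored and that a lost release keeps the key repeating until something else stops it.

```python
def repeats_while_held(held_ms, delay_ms, period_ms):
    """Number of synthetic repeats generated while a key is held for held_ms."""
    if held_ms < delay_ms:
        return 0
    return (held_ms - delay_ms) // period_ms + 1


def simulate_softrepeat(events, end_ms, delay_ms=250, period_ms=33):
    """Count key events delivered for one key.

    events: time-ordered list of (time_ms, "press" | "release").
    end_ms: time at which the simulation stops.
    """
    delivered = 0
    pressed_at = None
    for time_ms, kind in events:
        if kind == "press":
            if pressed_at is None:        # a second press while held is ignored
                pressed_at = time_ms
                delivered += 1
        elif kind == "release" and pressed_at is not None:
            delivered += repeats_while_held(time_ms - pressed_at, delay_ms, period_ms)
            pressed_at = None
    if pressed_at is not None:            # release never arrived: repeat until the end
        delivered += repeats_while_held(end_ms - pressed_at, delay_ms, period_ms)
    return delivered


normal = simulate_softrepeat([(0, "press"), (100, "release")], end_ms=5000)
lost = simulate_softrepeat([(0, "press")], end_ms=5000)
print("normal tap:", normal, "events; lost release:", lost, "events")
```
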
Comment 138 Daniele C. 2009-01-23 07:01:10 UTC
(In reply to comment #136)
> My scenario -comment #73- was related to a KVM USB switch. Whenever I bypass
> the switch the issues disappears. The ACPI workaround didn't work for me.
> 

I would say that yours is not bug 9147
Comment 139 Daniele C. 2009-01-24 08:34:01 UTC
(in reply to comment 137)
thank you very much David for showing us the "root" of the bug, I will CC this comment to Andries Brouwer just in case he can confirm.
Comment 140 Daniele C. 2009-01-24 09:30:37 UTC
(In reply to comment #119)
> The i8042.nomux=1 is no more necessary with kernel >= 2.6.25
> 
> I have tried the latest linux-2.6.27-rc9 with the patch in
> http://bugzilla.kernel.org/show_bug.cgi?id=11549
> (http://bugzilla.kernel.org/attachment.cgi?id=18047), it does not fix the bug.
> Just shooting in the dark, really.
> 

Bug 8740 is not yet fixed, so i8042.nomux=1 is still necessary.

[ 3203.402077] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.403431] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.405245] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.406608] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.407971] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.407979] psmouse.c: issuing reconnect request
[ 4399.418433] atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[ 4399.418464] atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
Comment 141 Daniele C. 2009-01-24 12:54:49 UTC
(read on https://bugs.launchpad.net/linux/+bug/119194/comments/34)

perhaps acpi_osi=Linux can work around the problem (like acpi=off but without the negative effects)

Comment 142 Daniele C. 2009-01-29 23:39:58 UTC
Please see bug 1203.

On this hardware I also have clocksource problems. I am currently booting with 'clocksource=acpi_pm i8042.nomux=1' to have a usable clock.

Maybe we have the same issue as bug 1203? See also comment 39
Comment 143 Grégory SCHMITT 2009-01-30 18:59:39 UTC
Created attachment 20049 [details]
Tentative workaround patch

To be applied against 2.6.28 branch, but is quite simple, so that a simple copy & paste will be enough. This patch tries to force the release to keycode 0xe0. According to Daniele, this patch greatly improves the situation, but does not solve it completely.

The patch has been inspired by patch provided for another notebook (see bug http://bugzilla.kernel.org/show_bug.cgi?id=12021, patch http://bugzilla.kernel.org/attachment.cgi?id=19301)

In my opinion, this is nothing but a workaround which will not cure the evil at its root, but it may help until the real bug is tracked down.

Could anyone apply this patch and comment on it? Thanks.
Comment 144 Daniele C. 2009-01-30 21:40:12 UTC
I can confirm that with the patch in attachment 20049 [details] the problem happened only once in a week, while without any patch I experience it 10 to 20 times per day.

I am currently booting without this patch and with kernel command line 'vga=791 lapic hpet=force clocksource=hpet i8042.nomux=1' to test if the problem happens equally with HPET timer enabled
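
When comparing results between testers it helps to record exactly which boot parameters were active. A minimal Python sketch follows, assuming a simple space-separated command line with no quoted values. `parse_cmdline` is a hypothetical helper; on a running system the string would normally come from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to True.

    Assumption: no quoted values containing spaces.
    """
    params = {}
    for token in cmdline.split():
        name, separator, value = token.partition("=")
        params[name] = value if separator else True
    return params


# Command line quoted in this comment.
params = parse_cmdline("vga=791 lapic hpet=force clocksource=hpet i8042.nomux=1")
for name in ("i8042.nomux", "clocksource", "hpet"):
    print(name, "=", params.get(name))
```
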
Comment 145 Eugen 2009-01-31 14:52:36 UTC
@Daniele,
sorry that i have not responded for a long time... unfortunately, i really lack free time :(

) 2.6.28 sidux kernels do not fix the issue

) "acpi_osi=Linux" ( http://bugzilla.kernel.org/show_bug.cgi?id=9147#c141 ) grub cheat code does not fix the issue here too
Comment 146 Daniele C. 2009-02-08 09:09:03 UTC
@Eugen: ok, really thanks for having tested this out.

I can also confirm that 'hpet=force clocksource=hpet' does not work around the bug. It is not relevant.

The only known partial workaround is G.Schmitt's patch in comment 143
Comment 147 David Oftedal 2009-02-16 10:19:58 UTC
Unfortunately, on this machine acpi=off doesn't work either, as that stops the WLAN card from working... Is there any other option that will stop the bug from occurring, or at least keep it from occurring every day?
Comment 148 Daniele C. 2009-02-16 14:29:11 UTC
@David: the best workaround is NOT to turn off acpi but to unload the ac, battery and thermal modules
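
A quick way to see which of these modules are currently loaded is to look at the first column of /proc/modules. The Python sketch below works on the text of that file; `loaded_suspects` and the sample text are hypothetical and only illustrate the check. If the modules are built into the kernel they will not show up here and cannot be unloaded.

```python
SUSPECT_MODULES = ("ac", "battery", "thermal")


def loaded_suspects(proc_modules_text, suspects=SUSPECT_MODULES):
    """Return the suspect modules found in the text of /proc/modules."""
    loaded = {
        line.split()[0]                   # first field is the module name
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in suspects if name in loaded]


# Made-up sample in /proc/modules layout, for illustration only.
sample = (
    "battery 20480 0 - Live 0x00000000\n"
    "thermal 20480 0 - Live 0x00000000\n"
    "psmouse 45056 0 - Live 0x00000000\n"
)
print("still loaded:", loaded_suspects(sample))
```

On a real system the text would be read with `open("/proc/modules").read()`, and anything reported could then be unloaded with `modprobe -r`.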
Comment 149 David Oftedal 2009-02-17 12:56:04 UTC
Ah, thanks. It seems to be thermal that's the main culprit for me, as the bug starts appearing when the computer heats up and the fan starts spinning faster.

I noticed something interesting today, though: Since (if I remember correctly) killing X resets the keyboard somehow, I tried switching to the console using Ctrl-Alt-F1 and Ctrl-Alt-F2 etc., but Ctrl-Alt-F1 seems to be where X has placed itself (usually Ctrl-Alt-F7 I believe?), because it just brought me back to X. Ctrl-Alt-F2 brought up a launcher.

Be that as it may, one of those combinations, and presumably the first one, released the control key again. If hopping out of X and back in again or some such thing will consistently clear up the bug, then that will really go a long way towards making the system usable.
Comment 150 David Oftedal 2009-02-21 18:58:20 UTC
I can confirm what I said earlier. While the X server is apparently still on console 7, pressing Ctrl-Alt-F1 when the bug appears refreshes the screen somehow and clears the bug up. Quite possibly, pressing Ctrl-Alt-<F-Anything> will do the same thing.
Comment 151 Daniele C. 2009-02-23 02:56:14 UTC
@David: as you said, that operation resets the X keyboard driver. But the problem is still at the hardware/kernel level since it is not normal to have to reset the driver when the problem occurs.

We could configure a key which resets Xorg when the keyboard goes crazy... but a keyboard should not go crazy in the first place
Comment 152 Daniele C. 2009-02-24 05:12:13 UTC
I can confirm from experience that with the ACPI modules loaded (ac + battery + thermal) and the battery in its slot, the bug appears often.

@ykzhao: can you please summarize where we stand? Do you have any strategy to fix this issue?

Thanks
Comment 153 David Oftedal 2009-05-07 01:30:59 UTC
I'd like to report a potentially positive finding. After upgrading to Ubuntu version 9.04 - the Jaunty Jackalope, which I did in April when it was released, I haven't experienced this bug.

The kernel version is "2.6.27-11-generic". I can't remember what it used to be before.

Is anyone else here using Jaunty Jackalope with this or a newer kernel, and able to confirm that the bug isn't manifesting itself on their system anymore?
Comment 154 Daniele C. 2009-05-07 11:20:35 UTC
I am using Debian Squeeze with kernel 2.6.28-1-686 and the bug is still happening.

I have often experienced false negatives; with some kernels you have to try hard before triggering the bug. On my laptop, battery and wireless usage seem to make it happen sooner
Comment 155 David Oftedal 2009-06-04 10:29:15 UTC
That's very interesting! Your kernel version is certainly newer than mine, and yet the bug, and the messages about unknown keys in dmesg, are both gone here, despite the system being used in the same way as before. At least we can be fairly sure that the messages and the bug are associated, if anyone doubted it before.
Comment 156 Michal Nowak 2009-06-04 11:06:40 UTC
Since I moved from Gentoo 2 yrs ago to Fedora I've experienced the problem only twice in times of Fedora 8. Now with Fedora 10, I've never seen them: is it possible for you Daniele to try recent Fedora on the system and see what happens?
Comment 157 Daniele C. 2009-07-02 15:35:30 UTC
@Michal: good news! I have been using the F10 live cd and everything works fine! That kernel has the ac,battery,thermal modules builtin.

I would say that Fedora10/Fedora11 is not affected, maybe thanks to some magic patch. How can we isolate them? I would focus on Fedora11 patches, can somebody help in this search?

By the way, I am using Arch Linux 2.6.30 now. Next thing I will do is to try to boot using the Fedora kernel and my Arch Linux system
Comment 158 Daniele C. 2009-07-02 17:00:12 UTC
I have downloaded http://mirror.cc.vt.edu/pub/fedora/linux/updates/11/SRPMS/kernel-2.6.29.5-191.fc11.src.rpm which contains the 2.6.29 kernel and all the patches. First I will try this 2.6.29 kernel with my current .config, then if bug is still present (as it should be) I will start testing patches starting from the most relevant.
Comment 159 Daniele C. 2009-07-02 17:01:12 UTC
Created attachment 22178 [details]
full list of patches applied by Fedora11 to the 2.6.29 kernel
Comment 160 Daniele C. 2009-07-02 19:41:36 UTC
Created attachment 22179 [details]
test script to trigger kernel bug 9147

These are the steps in order to trigger the 9147 bug.

1) open a terminal
2) run ./9147test.sh
3) quickly press up arrow and then enter to run again the script
4) repeat (3) till the enter key gets stuck

If the enter key never gets stuck (it usually does in less than 10 executions), then system is not affected by the bug
Comment 161 Daniele C. 2009-07-03 13:07:05 UTC
Created attachment 22191 [details]
bug 9147 test results on kernels 2.6.29 (w/o fedora11 patches), 2.6.29-5 (w/o fedora11 patches), 2.6.30

From my tests (see attachment) it can clearly be deduced that the bug is not triggered when the ACPI modules are built in; I have not yet triggered it with ac, battery, thermal, container and processor built into the kernel, and I will keep using this kernel. I will add a comment if I trigger the bug.

I invite other testers to use a kernel with these ACPI modules built in to see if the bug appears; I assume it is very hard to trigger when the modules are built in, whereas it can be easily triggered with my test script when they are built as separate modules.

Can somebody please explain why there is such a difference? It has apparently become more pronounced compared to previous kernels, where the bug was easily triggered even with built-in ACPI modules.

So right now the best workaround is to compile the above-mentioned modules as built-in.

To sum up: Fedora10/Fedora11 do not have any patch which addresses bug 9147 as a side effect; they just compile the guilty modules as built-in, which makes the bug disappear or become very hard to trigger.

Also, I haven't yet been able to enable (via kernel .config) the correct options to show again the dmesg messages when a key gets stuck.
Comment 162 Daniele C. 2009-07-03 13:10:27 UTC
Created attachment 22192 [details]
.config with minimalistic features for my system
Comment 163 Daniele C. 2009-07-16 10:35:24 UTC
After about 2 weeks I triggered (once) the stuck-keys bug even with the built-in ACPI modules.

In order to trigger kernel bug 9147 also with recent kernels which have built-in ACPI functionality, it is necessary to run a program which constantly monitors the thermal sensors (for example a tray icon plugin).
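
For testers without such a tray plugin, the kind of constant monitoring meant here can be approximated with a small polling loop. This Python sketch is not the program I used. It assumes a kernel that exposes thermal zones under /sys/class/thermal with a `temp` file in millidegrees Celsius, and `read_thermal_zones` and `poll` are hypothetical names.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as handle:
                millidegrees = int(handle.read().strip())
        except (OSError, ValueError):
            continue                      # zone vanished or returned garbage
        readings[os.path.basename(zone)] = millidegrees / 1000.0
    return readings


def poll(interval_s=1.0, count=10, base="/sys/class/thermal"):
    """Read all zones `count` times, sleeping interval_s between reads."""
    for _ in range(count):
        print(read_thermal_zones(base))
        time.sleep(interval_s)
```

Calling `poll(interval_s=0.5, count=120)` while typing would give roughly one minute of steady sensor reads.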

So bug is not fixed, it's just that it's easier to detect by using ACPI modules and by overloading the i8042

I also think that the bug should be UNASSIGNED if nobody is really working towards a solution.
Comment 164 Michal Nowak 2009-09-11 16:38:00 UTC
Just to let others know that I just came across this bug again with 2.6.31-0.204.rc9.fc12.i686.

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

It's not happening very often but it's still here. Since this laptop is ~5 years old, I guess this bug will be in the kernel even after the HW itself is long gone :).
Comment 165 Daniele C. 2009-09-11 16:55:23 UTC
@Michal: I think that when Windows XP will be declared unsupported many people will try installing linux on hardware from this generation and, guess what, they'll say that with linux their laptop keyboard doesn't even work...
Comment 166 Michal Nowak 2009-09-12 09:27:51 UTC
Likely some of them will, but the problem is that the bug is not *that* painful, so it does not attract enough interest from kernel upstream. That's how it goes..
Comment 167 Daniele C. 2009-09-12 10:51:58 UTC
Yes, but some weeks ago Alexey Starikovskiy got this bug assigned; is a solution being worked on?

@Alexey Starikovskiy: any news?
Comment 168 Daniele C. 2009-12-08 12:19:48 UTC
Interesting related pages:

http://lkml.indiana.edu/hypermail/linux/kernel/0602.2/1795.html
https://bugzilla.redhat.com/show_bug.cgi?id=181457

My /sys/bus/serio/devices/serio0/softrepeat is 0
Comment 169 Michal Nowak 2009-12-21 11:22:10 UTC
Daniele, thanks for pointing that out. Recently on Fedora 12 similar problem emerged w/ same symptoms (some key is "locked", Tab usually) but w/o the message in dmesg.

Linux dhcp-lab-216.englab.brq.redhat.com 2.6.31.6-166.fc12.i686 #1 SMP Wed Dec 9 11:14:59 EST 2009 i686 i686 i386 GNU/Linux

I just added "nosoftrepeat", will see if that helps for me.
Comment 170 Alexey Starikovskiy 2009-12-24 11:41:22 UTC
Could someone please report status with latest 2.6.32 vanilla kernel?
Comment 171 Daniele C. 2009-12-24 15:34:19 UTC
@alexey: I have just tried with my 2.6.33-rc1 (from wireless-testing) and result is the same. key stuck after a couple of tests through 9147test.sh script (attached), see comment 160
Comment 172 Daniele C. 2009-12-24 15:37:38 UTC
Created attachment 24292 [details]
9147test.sh
Comment 173 Daniele C. 2009-12-24 15:45:02 UTC
Adding 'nosoftrepeat' does nothing on my hardware (bug is still there), and anyway my softrepeat is already 0 so you shouldn't try with 'nosoftrepeat' if /sys/bus/serio/devices/serio0/softrepeat is already 0
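
For those who want to check this value before experimenting, here is a minimal Python sketch that reads the attribute. The default path is the one quoted above; `read_softrepeat` is a hypothetical helper that returns None when the file is missing or unreadable (for example when the keyboard is not on serio0).

```python
def read_softrepeat(path="/sys/bus/serio/devices/serio0/softrepeat"):
    """Return the softrepeat attribute as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


value = read_softrepeat()
if value is None:
    print("softrepeat attribute not available on this system")
else:
    print("softrepeat is", value)
```
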
Comment 174 Alexey Starikovskiy 2009-12-24 15:52:51 UTC
Could you please check if the last patch from this bug report helps:
http://bugzilla.kernel.org/show_bug.cgi?id=14858
Comment 175 Daniele C. 2009-12-24 16:14:58 UTC
ok, I am gonna check the patch in a while

do you know how to detect BIOS and EC versions? My notebook is basically a Fujitsu Siemens V2000 with different branding (Maxdata)
Comment 176 Alexey Starikovskiy 2009-12-24 16:22:00 UTC
dmidecode may have this information
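
If running dmidecode as root is inconvenient, the same DMI strings are usually also readable under /sys/class/dmi/id. The Python sketch below reads a few of them; the field list is an assumption about what a given kernel exposes, `read_dmi_fields` is a hypothetical helper, and the EC version is not covered by it.

```python
import os


def read_dmi_fields(
    fields=("bios_vendor", "bios_version", "bios_date", "product_name"),
    base="/sys/class/dmi/id",
):
    """Return {field: value} for each DMI field that exists and is readable."""
    values = {}
    for field in fields:
        try:
            with open(os.path.join(base, field)) as handle:
                values[field] = handle.read().strip()
        except OSError:
            continue                      # field absent or needs more privileges
    return values


for name, value in read_dmi_fields().items():
    print(name + ":", value)
```
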
Comment 177 Daniele C. 2009-12-24 16:35:01 UTC
@alexey: I have applied accel_query_propogation.patch and after running 9147test.sh 4 times I get the enter key stuck bug. I have attached the dmesg of the last 4 9147test.sh executions.

My impression is that the patch has not changed the bug's behaviour, but I might not be sensitive to changes on the order of milliseconds...

Please tell me if I can test other patches and/or get other debug information
Comment 178 Daniele C. 2009-12-24 16:35:45 UTC
Created attachment 24298 [details]
dmesg of last 4 runs of 9147test.sh with a 2.6.33-rc1 kernel + accel_query_propogation.patch
Comment 179 Daniele C. 2009-12-24 16:40:50 UTC
Loading/unloading the battery module checks the battery status, but in the attached dmesg it's not clear if battery is there or not.
Please ignore such messages since battery has been present (fully charged) during the whole test (it increases bug triggering), but I think that it may be broken (sometimes it does not report its capacity). Bug also happened before this battery glitch (due to shock damage to the battery which probably injured its capacity sensor) so it's not relevant.
Comment 180 Daniele C. 2010-01-08 02:43:04 UTC
'processor', 'ac', 'container' modules are not responsible for the bug triggering. I can trigger the bug by using 'battery' or 'thermal' modules. (Thanks to james_mcl for pointing out this)
Comment 181 Eugen 2010-01-11 22:28:57 UTC
Daniele, reporting back after a longer time... was pretty much busy.

we have been suffering because of this bug for a long time, but we completely reinstalled sidux one month ago, using the 2009-03 release. the bug is gone. more precisely: the bug did not appear since the installation... 

currently, a newer sidux release (2009-04) is available. worth a try?

best regards
Eugen
Comment 182 Michal Nowak 2010-01-12 09:49:29 UTC
The bug is likely still present in latest sidux. When I moved from home-brewed Gentoo to Fedora 8 it "disappeared" too, but as of now it's sometimes back but hopefully reduced to livable minimum.
Comment 183 Eugen 2010-01-13 00:24:50 UTC
Michal, i do agree here...

there is an interesting info/link in a similar bug report here: http://bugzilla.kernel.org/show_bug.cgi?id=9448#c37
Comment 184 Daniele C. 2010-01-13 03:58:36 UTC
Created attachment 24530 [details]
test script for kernel bug 9147
Comment 185 Daniele C. 2010-01-13 04:04:33 UTC
@mnowak: that happens because battery and thermal modules are compiled in kernel and not stand alone (please confirm)

@eugen: interesting...I have asked them to test the script
Comment 186 Michal Nowak 2010-01-13 10:31:14 UTC
(In reply to comment #185)
> @mnowak: that happens because battery and thermal modules are compiled in
> kernel and not stand alone (please confirm)

Yes. That's what I think. With Gentoo I used to have my own .config, where those modules were stand-alone. Now in Fedora they are compiled-in and the problem is much less frequent (since Fedora 8) - twice a day? -, I can see the problem from time to time on my Fedora 12 (still compiled-in) but without

  atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
  atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

messages in dmesg.
Comment 187 james_mcl 2010-01-13 19:06:22 UTC
Eugen's interesting link is

http://ajaxxx.livejournal.com/62378.html

It does indeed look very relevant, which is why I'm posting the URL here even though there's a link to the 9448 comment with it in.
Comment 188 Daniele C. 2010-01-14 03:34:00 UTC
@mnowak: yes, when compiled in kernel that's what has happened so far. The fact that it seldom happens when not compiled as module is already a clue towards solution, in my opinion.

Regarding comment 187: I am not fully getting the point, so we have a tight race condition caused by hardware as first source?
Comment 189 Michal Nowak 2010-01-14 10:13:21 UTC
(In reply to comment #188)
> Regarding comment 187: I am not fully getting the point, so we have a tight race
> condition caused by hardware as first source?

My, poor, understanding is that the X race has nothing to do with our HW problem. I am thinking of having now two "stucked-keys-problems" at one time (with the HW one reduced largely). But I could be wrong.
Comment 190 Daniele C. 2010-01-14 16:13:28 UTC
@mnowak: that was also my first understanding when reading the SIGIO article.

The only other possible scenario could be that there *is* a SIGIO glitch in current Xorg/kernel, and that boxes with i8042 controllers (or similar) have 2 or more peripherals which end up locking the same single piece of hardware (perhaps the i8042 controller itself, which somehow "emulates" locking of 2 or more peripherals in an inconsistent way), thus leading to locking glitches. But this would be very strange, since no other hardware combinations are showing the "bug" and since it has been happening for so long (kernel 2.6.9 being the first verified).

Also, other investigations of this bug seemed to have identified the source elsewhere.
Comment 191 Michal Nowak 2010-02-01 08:45:58 UTC
FYI:

Yesterday I was hit by

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

while in vim - I got a "never ending" series of 'X\n'. It has been quite a few months since I last saw this issue. The box is a fully updated F-12 with the following kernel:

Linux <HOST.NAME> 2.6.31.12-174.2.3.fc12.i686
#1 SMP Mon Jan 18 20:22:46 UTC 2010 i686 i686 i386 GNU/Linux
Comment 192 Zhang Rui 2010-10-22 03:04:23 UTC
does the problem still exist in the latest upstream kernel? say 2.6.35 or 2.6.36-rc
Comment 193 Michal Nowak 2010-10-22 07:41:07 UTC
It certainly fails with recent minor releases of 2.6.34. Will test 2.6.35 when Fedora 14 is out.
Comment 194 Daniele C. 2010-10-22 19:02:47 UTC
I am using 2.6.35-rc4 and it seems to be fixed; I will report back if that's not the case
Comment 195 Michal Nowak 2010-10-31 19:33:29 UTC
2.6.35.6-48.fc14.i686 just failed for me with:

[63135.617931] atkbd serio0: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[63135.617945] atkbd serio0: Use 'setkeycodes e060 <keycode>' to make it known.
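
Since the message prefix changed from "atkbd.c:" to "atkbd serio0:" over the kernel versions reported here, a small script helps when counting occurrences across logs. This Python sketch assumes only the two message layouts quoted in this bug; the older 2003 layout ("Unknown key (set 2, scancode ...) pressed") is deliberately not matched. `count_unknown_keys` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches both "atkbd.c: Unknown key ..." and "atkbd serio0: Unknown key ...".
UNKNOWN_KEY = re.compile(
    r"atkbd(?:\.c| serio\d+): Unknown key (pressed|released) "
    r"\((?:translated )?set (\d), code (0x[0-9a-fA-F]+) on ([^)]+)\)"
)


def count_unknown_keys(log_text):
    """Return a Counter keyed by (action, code, port) for atkbd unknown-key lines."""
    counts = Counter()
    for match in UNKNOWN_KEY.finditer(log_text):
        action, _scancode_set, code, port = match.groups()
        counts[(action, int(code, 16), port)] += 1
    return counts


sample = (
    "atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).\n"
    "[63135.617931] atkbd serio0: Unknown key released "
    "(translated set 2, code 0xe0 on isa0060/serio0).\n"
)
for (action, code, port), count in count_unknown_keys(sample).items():
    print("%s 0x%02x on %s: %d" % (action, code, port, count))
```
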
Comment 196 Zhang Rui 2011-03-21 07:30:58 UTC
does the problem still exist in the latest upstream kernel?
Comment 197 Michal Nowak 2011-03-21 11:24:52 UTC
I can see stuck keys from time to time (let's say once per month), but not sure it's "atkbd ..." problem. Will have a look.
Comment 198 Michal Nowak 2011-04-05 07:11:46 UTC
I tested 2.6.38 and saw no hang, but I may need months to reproduce it...
Comment 199 Henrik P. 2011-04-18 20:34:43 UTC
This is on a Thinkpad R40, with distribution kernel

Linux debian 2.6.38-2-686 #1 SMP Tue Mar 29 17:27:45 UTC 2011 i686 GNU/Linux

I installed stress (copyright file says it's from http://weather.ou.edu/~apw/projects/stress/) and slightly adapted its manpage example: 

debian:~$ stress --verbose --cpu 8 --io 4 --vm 2 --vm-bytes 128M --vm-hang 5 --timeout 90s

After switching the terminal window and typing

debian:~$ echo thequck bown fox jumps over th lazy dog ... te qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

I get the expected

[210307.542572] atkbd serio0: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[210307.542579] atkbd serio0: Use 'setkeycodes e060 <keycode>' to make it known.

in dmesg output. Immediately after stress exits the stuck keys effect still happens frequently, but after a while it is back to about once in 5 minutes.
Comment 200 Zhang Rui 2012-01-18 01:36:32 UTC
It's great that kernel bugzilla is back.
I do not have any idea about this bug.
Anyway, can you please verify if the problem still exists in the latest upstream kernel?
Comment 201 papukaija 2012-05-04 10:55:13 UTC
Yes, still an issue with kernel 3.4.0-030400rc5-generic-pae #201205011817 SMP Tue May 1 22:31:34 UTC 2012 i686 athlon i386 GNU/Linux:

[ 3403.034926] atkbd serio0: Unknown key pressed (translated set 2, code 0x0 on isa0060/serio0).
[ 3403.034937] atkbd serio0: Use 'setkeycodes 00 <keycode>' to make it known.
[ 3403.035444] atkbd_interrupt: 36 callbacks suppressed

My laptop is a Fujitsu Siemens Amilo A1645.
Comment 202 Daniele C. 2013-08-19 14:45:52 UTC
Sorry, but I am going to dispose of the hardware in question soon
Comment 203 xerofoify 2014-06-22 02:55:32 UTC
Hey there,
If you can test it on your computer and see if it's fixed in 2014 kernel
releases, that would be great.
Comment 204 Zhang Rui 2014-08-04 07:15:39 UTC
Bug closed as the hardware is not available any more.
Please feel free to re-open it if anyone can reproduce the problem in the latest upstream kernel.
