Bug 9147
Description
Daniele C.
2007-10-12 04:22:46 UTC
I now get the usual messages in dmesg:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I assume that i8042.nopnp=1 makes the problem less frequent; the workaround is to use i8042.noacpi=1.

My /proc/bus/input/devices: http://bugzilla.kernel.org/attachment.cgi?id=13128

The same bug on Gentoo (http://bugs.gentoo.org/show_bug.cgi?id=194781) has been moved upstream (here).

OK, reading through your Gentoo bug, it seems that with ACPI engaged the keyboard controller firmware is not getting enough resources and starts dropping bytes coming from the keyboard/mouse. Let's see what the ACPI guys say...

Yes, I confirm that the only two fully verified workarounds for this bug are the following kernel parameters: i8042.noacpi=1 OR acpi=off. I am of course using i8042.noacpi=1, as it is less invasive.

Possibly the same bug: http://dev.laptop.org/ticket/2401

*terribly confused* There is no i8042.noacpi parameter in the stock kernel.

What a shame, many apologies. I meant to say pnpacpi=off instead of i8042.noacpi=1. I am currently using 'i8042.nomux=1 pnpacpi=off'. Today I experienced stuck keys (Tab, arrow keys) and the usual messages:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I am switching to 'i8042.nomux=1 acpi=off'.

I confirm that with 'i8042.nomux=1 acpi=off' both the mouse glitches (see related bug) and the keyboard stuck keys (this bug) are worked around. I don't yet know which debug messages I have to enable in order to narrow this issue down to the relevant source lines. I have tried "nomsi" instead of "acpi=off" but the wasted IRQs still happen. It is clearly an ACPI issue.

Try booting with ACPI and unload the following modules: ac, battery, thermal.
Did the issue disappear?

I have recompiled the kernel with those modules built as separate modules, and they are not loaded automatically after boot. Without those modules loaded, the keyboard works perfectly. Do you want me to test each module separately? Note: I am still using 'i8042.nomux=1' to prevent the mouse glitch, but that's fairly unrelated.

Right after running 'modprobe thermal' the problem happened; thermal is indeed a module which can trigger the bug. I have unloaded it and loaded 'battery' now, and so far no issues; I have not tested 'ac'.

I have been having the same issues as Daniele on a similar notebook (Prestigio Nobile 156) for two years and always thought it was a hardware fault... On my system (Gentoo; 2.6.22-gentoo-r9, aka 2.6.22.9) the error message in dmesg is caused by the battery and thermal modules, not by ac. If there are patches I'd be glad to test them.

I can confirm that this bug is really old, dating back at least to 2005 (see my previous findings). If we come up with a patch or definitively identify the faulty code, it might be useful to reply to this LKML thread, which I opened some time ago: http://lkml.org/lkml/2007/9/30/152 There is also a reply by P. Machek containing useful suggestions for narrowing down the issue. This bug tracker item of course contains the most up-to-date information about the issue. I am also available to test patches/test cases and produce logs; I am currently testing the 'battery' module and it is not triggering the issue here. @Michal: did you enable any specific debug messages in order to see those error messages? Thanks all

@Daniele C.: No, I have all debug output disabled. I have disabled thermal and battery (ac is running) and the problem is not around anymore, but I am still monitoring it.

@Michal Nowak: can you post a sample of the errors you get with the thermal/battery modules? It's not really important, I am just curious and would like to correlate them with my 'atkbd.c' messages.
I am currently auto-loading the ac and battery modules at boot (not thermal) and I am not experiencing the issue anymore. Note: I don't know if it matters, but I am running acpid at boot. Thanks

Daniele C.:
ad 3) I have been running acpid at boot for 2 years and still have these keyboard faults, so I guess it does not matter whether acpid is running or not.
ad 2) autoloading ac at boot, battery and thermal off -> no faults.
ad 1) I am not on that machine right now, will post it later, but they are exactly the same as yours.

Daniele C.:
modprobe thermal
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
modprobe -r thermal; modprobe battery
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
It's the same. I now use i8042.nomux=1 to avoid problems with lost touchpad (Synaptics) sync.

Thanks Michal for doing this test. I have recompiled the kernel without the hangcheck timer and now I have difficulty getting the atkbd.c messages; I am not saying that the problem disappeared, it's just harder to spot. It also happened to me at other times during the various settings tweaks I made in the past, so the bug is still present and it is indeed in one of those modules, thermal and battery being the most probable.

OK, here are some updates:
- without the hangcheck timer I no longer get the atkbd.c messages in dmesg, but the problem still happens
- I am auto-loading the 'battery' module and it is not causing the issue; the 'battery' module is not guilty for me
- when I also auto-loaded the 'ac' module I got the issue as usual
- when I manually load 'thermal' I instantly see the problem, as the Enter key used for the 'modprobe thermal' shell command is repeated indefinitely

Conclusions: 'ac' and 'thermal' do cause the bug, 'battery' does not.
I will change this statement as soon as my findings prove that 'battery' also causes the problem. For example, I am going to test during a battery-powered session to see whether the keyboard generates hung keys. Thanks all

I forgot: when I load 'thermal' I get these system messages:

ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: Thermal Zone [THRS] (41 C)
ACPI: Thermal Zone [THRC] (37 C)
Marking TSC unstable due to: possible TSC halt in C2.
Time: acpi_pm clocksource has been installed.

Thanks for the ongoing research, Daniele.

(In reply to comment #22)
> - I am auto-loading the 'battery' module and it is not causing the issue; the
> 'battery' module is not guilty for me

Still guilty for me... I am using only the arrow keys for testing.

> - when I auto-loaded also the 'ac' module I got the issue as usual

battery causes the error message with or without the ac module.

> - when I manually load 'thermal' I instantly verify the problem as the enter
> key used in the 'modprobe thermal' shell command is repeated indefinitely

It happens to me too. I typed 'modprobe thermal', hit Return and got a stuck key, and the terminal kept "scrolling" down.

> I will change this statement as soon as my findings will prove that also
> 'battery' is causing the problem. For example I am going to test during a
> battery-powered session to see if the keyboard does generate hung keys.

In past years this key locking usually happened under high load, with high temperature both in the room and in the system itself.

Weird... today I got two stuck keys, but no message in the dmesg output... (with only battery loaded).

Exactly! I also think that when the battery module is loaded no messages are logged, although the problem may (seldom) happen. With the thermal module it happens 100% of the time.

I am going to post on LKML and tell them this.

I am also using only battery now; do you have the hangcheck timer enabled?

(In reply to comment #26)
> I am going to post on LKML and tell this.

Great.
>
> I am also using only battery now; do you have the hangcheck timer enabled?
>

No, not at all:

assam linux # grep -i hangcheck .config
# CONFIG_HANGCHECK_TIMER is not set

I can confirm that the 'battery' module causes hung keys, even if no dmesg messages are generated for them. So each of the 'ac', 'battery' and 'thermal' modules can cause this issue. When all of them are unloaded, the problem does not occur. But without the 'battery' module, for example, you cannot know how much power you have left...

(In reply to comment #28)
> But without the 'battery' module for example you cannot know how much power
> you have left...

Of course they are all useful. You can always load it on demand, manually or from a script every 60 seconds, then unload it and hope you will not get any locking.

Can you please give me the link to the LKML post you were talking about?

Yes, here it is: http://lkml.org/lkml/2007/11/22/2 It is also a valid summary of the current bug situation. I hope I have been clear and I hope that it will generate some more interest in addressing/discussing/confirming the bug.

Today I got system messages with only the 'battery' module loaded:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I am going to test 2.6.24-rc3 on the next reboot.

Please attach the acpidump output, a list of /proc/interrupts over a period of several seconds or, preferably, over the skipped/stuck key, and dmesg from a recent kernel.

I reproduced the problem after loading 'battery', 'ac' and 'thermal' together; however, I did not get any 'atkbd.c' message in dmesg. I have rarely gotten those messages since I disabled the hangcheck timer. The *.before logs were generated before loading the modules, while the *.after logs were generated after loading them.
I guess I should generate them more carefully. I am going to provide the same set of files with kernel 2.6.24-rc3. Thanks

Created attachment 13710
acpidump and interrupts before & after the problem
Created attachment 13712
acpidump and interrupts after the problem verification
I have updated the affected versions, also taking into account the original bug 4046.

Actually, the EC _driver_ is not involved in battery/ac/thermal activities, as all communication is done through the MNVS system memory region. Thus we need to check whether it's possible to avoid disabling interrupts (if they are disabled) during access to system memory.

@Alexey: I have barely sufficient knowledge of the EC driver, MNVS and Linux IRQs. Do you think that acpi=noirq (or any kernel parameter in the same area) would help narrow down the problem? Any comment about bug 8740, which also comes with this bug?

There are several ways an ACPI driver (ac/battery/thermal...) could contact real hardware, and all of them are guarded by operation regions. ACPI by itself (the interpreter) supports system memory, system I/O and PCI config. The EC driver creates one more type of region, the EC region. In most cases the drivers above go through an operation region defined by the EC, simply because the battery, charger control and thermal sensors are connected to the EC. This is why so many people asked you to disable the EC driver in order to localize the problem. Your case is different, as I said before: the drivers get their information through an operation region in system memory. It is probably mapped to some slow hardware rather than generic RAM (maybe the same EC), so access to it takes some time. So far so good. A problem could arise if access to this memory region is guarded by a spinlock inside ACPI, so that interrupts are disabled for the whole duration of the access (up to several hundred milliseconds). I hope this amount of theory is enough for the moment; let's switch to practice: there is no kernel option which could help you, but I will try to come up with a patch that removes the disabling of interrupts during such accesses, if it indeed happens. You just have to wait...

Created attachment 13720
Disable global lock at read field
It seems that the only lock held across field access is the ACPI Global Lock. Its use is prescribed by your DSDT.
This patch disables its use on field reads; please check whether it changes the situation.
Thanks for the info in comment #34 and comment #35.

interrupts.before shows 21 acpi
interrupts.after shows 8780 acpi

What happened between these two snapshots? And what does "grep HZ .config" show? (It will tell us how far apart the snapshots are.)

Can you reproduce the issue this way?

cat /proc/interrupts; modprobe thermal; cat /proc/interrupts

Also, the acpidump output is interesting. Your "after" snapshot, on line 1100, shows that the Global Lock was held.

$ diff acpidump.before acpidump.after
1100c1100
< 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
---
> 0010: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

Disassembled via iasl -d, the FACS looks like this:

[000h 000 4] Signature : "FACS"
[004h 004 4] Length : 00000040
[008h 008 4] Hardware Signature : 0000120F
[00Ch 012 4] Firmware Waking Vector(32) : 00000000
[010h 016 4] Global Lock : 00000002
[014h 020 4] Flags (decoded below) : 00000000
S4BIOS Support Present : 0
[018h 024 8] Firmware Waking Vector(64) : 0000000000000000
[020h 032 1] Version : 00

This is not really a good sign, as the Global Lock should be held so infrequently and so briefly that it would be very unlikely to catch it with acpidump.

Between the two snapshots I got a hung key, for example PgUp or PgDn, and the key release event was lost, as seen in dmesg. At least 2-3 seconds elapsed, as I had to press some keys, check dmesg|tail and then run dmesg>dmesg.after (you can check the file timestamps).

$ grep HZ .config
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300

When using 2.6.22 I had 250 Hz instead, and the problem (IIRC) happened more frequently. 'cat /proc/interrupts; sudo modprobe thermal; cat /proc/interrupts' did not work. I will try again; with some luck I will succeed.
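A note on reading that diff: the changed bytes sit at FACS offset 0x10, the 32-bit Global Lock field (the "[010h 016 4] Global Lock" row of the disassembly). In the ACPI specification, bit 0 of that field is Pending and bit 1 is Owned, so 0x00000002 means "owned, nobody waiting". The Python sketch below decodes the field from a raw FACS image. `decode_global_lock` is a hypothetical helper, and it assumes you already have the table as bytes (for example, converted from the acpidump hex listing).

```python
import struct

# Global Lock flag bits, as defined for the FACS Global Lock field in the ACPI spec.
GL_PENDING = 1 << 0  # a waiter asked to be signalled when the lock is released
GL_OWNED = 1 << 1    # the lock is currently owned (by the OS or by the firmware)

# Matches "[010h 016 4] Global Lock" in the iasl -d output above.
FACS_GLOBAL_LOCK_OFFSET = 0x10


def decode_global_lock(facs: bytes) -> dict:
    """Decode the Global Lock dword of a raw FACS table image (hypothetical helper)."""
    if facs[:4] != b"FACS":
        raise ValueError("not a FACS table")
    # ACPI tables are little-endian: the bytes "02 00 00 00" read as 0x00000002.
    (value,) = struct.unpack_from("<I", facs, FACS_GLOBAL_LOCK_OFFSET)
    return {
        "raw": value,
        "pending": bool(value & GL_PENDING),
        "owned": bool(value & GL_OWNED),
    }


if __name__ == "__main__":
    # First 16 bytes: signature, length 0x40, hardware signature 0x120F, waking vector 0.
    header = b"FACS" + struct.pack("<III", 0x40, 0x120F, 0)
    before = header + bytes(48)                  # row 0010 all zeros
    after = header + bytes([0x02]) + bytes(47)   # row 0010 starts with 02
    print(decode_global_lock(before))  # {'raw': 0, 'pending': False, 'owned': False}
    print(decode_global_lock(after))   # {'raw': 2, 'pending': False, 'owned': True}
```

The "after" image reports owned=True with pending=False, which is the state the diff captured: some agent was holding the lock at the moment of the dump, with no one queued behind it.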
I also think that the Global Lock hold time should be really tiny, but since my keyboard interrupts can get caught in the middle, it must be somewhat longer; I guess it can sometimes reach a maximum on the order of 0.5-0.8 s.

I am going to test the patch in comment #40 ASAP. Thanks

I applied the patch in comment #40 to kernel 2.6.24-rc3 by running 'patch -p1 < acpiham.patch', then rebuilt the kernel (I have attached my .config files). I can type much faster with this patch! The bug no longer seems to happen! However, the video card didn't switch resolution. There is no off/on switch, only vertical colored lines when X is started. I have attached the relevant dmesg, taken from the messages log.

Created attachment 13737
.config files for 2.6.22 and 2.6.24 (patched), dmesg of crash
I tried the patch: while the patch itself worked, the thermal module stopped functioning, along with ac and battery. So it behaves the same as if I had removed the corresponding modules. I also get the following error on startup:

ACPI: unknown link to device

and dmesg is full of:

ACPI Error (utglobal-0126): Unknown exception code: 0xE71F5C00 [20070126]
ACPI Exception (dswexec-0462): UNKNOWN_STATUS_CODE, While resolving operands for [OpcodeName unavailable] [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.EC__.COMD] (Node c18b1a2c)
ACPI Error (utglobal-0126): Unknown exception code: 0xE71F5C00 [20070126]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.BAT0._BST] (Node c18b9e28)
ACPI Error (utglobal-0126): Unknown exception code: 0xE71F5C00 [20070126]

I recompiled the patched 2.6.24-rc3 with ACPI support as modules. A certain module crashed, then udev repeatedly tried to load some other module, and finally the kernel crashed while loading the ALSA modules. I guess disabling ACPI IRQ locking is not an option.

A dumb question: is it possible to disable ACPI_EC? There is a config option for it, but it does not seem to be configurable.

One suggestion is to boot with ec_intr=0.

Another workaround proposal is as follows: xset r rate 1000 50

Also, it is not clear yet whether this bug is actually a regression; I have heard reports of this bug appearing after a kernel upgrade. 2.6.17 is a good one to start with; I'll look at several older kernels tomorrow.

I will test all the solutions properly as soon as I can. I need a solution because the affected laptop tends to overheat, so starting without ACPI is not an option: I want to lower the CPU frequency before the laptop shuts down. As a separate note, I would suggest raising the priority of this bug: way too many people around the net are experiencing this issue, and turning off ACPI, which is very important on notebooks, is hardly a workaround.
I can confirm that ec_intr=0 works around the bug perfectly and is (as far as I know) the best workaround for my hardware, since I currently have ac, battery and thermal loaded and the system behaves as it did when I used to boot with acpi=off. No keyboard glitch whatsoever. I don't know if my system will boot with 2.6.17, nor whether I will damage my filesystem with it. I will try to get a running vanilla 2.6.17 and see if the bug still happens, even if I fear that I have too many apps/modules that will not start with it. I have raised the priority because I also think that a lot of people are affected by this issue; however, I don't know if I am allowed to raise bug priorities, so please adjust the setting if it was not wanted.

Sadly, I spoke too soon. When using ec_intr=0 the problem still happens:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

I am now compiling 2.6.17.14 to see if it is affected.

Kernel 2.6.17.14, with the same parameters as kernel 2.6.22 (no ec_intr=0), is affected too. In bug 4046 the original poster talks about possibly unaffected versions. He seems to be saying that 2.6.8 (or an older one) was not affected. He later says that 2.6.11-rc4 was no longer affected. I am going to get that version and verify.

For completeness of comment #50, with kernel 2.6.17.14 I executed:

sudo modprobe ac
sudo modprobe battery
sudo modprobe thermal

When pressing Enter after 'thermal', the Enter key got stuck. I have not provided dmesg (available on request) because no atkbd.c message was generated in this case.

I am not going to test 2.6.11-rc4 because it is not in Portage, unless somebody pushes me to do it. I wonder how much can be understood from comparing the (presumed) unaffected codebase with the current codebase.
I am now using 'xset r rate 1000 50'. My rough impression is that this does not really work around the bug; it just reduces the probability of seeing it in action. Furthermore, the keyboard becomes less responsive. Thanks

I can confirm this bug on the 2.6.16.19 kernel too. Older versions are no longer in Portage, so I'm not going to try them. This bug seems to be much older than I thought.

I have also disabled ACPI_EC (by changing its default value in drivers/acpi/Kconfig, then running make xconfig and saving), without any effect. The bug is still there with "CONFIG_ACPI_EC is not set". Any other ideas? I'm running out of tricks here :-)

Hello,

I have an MSI l745 laptop that shows 100% reproducible behaviour (phantom key + stuck keys), both with a 2.6.22 kernel provided by Ubuntu and with a 2.6.24-rc4 kernel (+ a hack in hwsleep.c:acpi_enter_sleep_state, probably not relevant since it also happens with Ubuntu's kernel).

With the AC power adapter plugged in, I see in dmesg:

[ 54.492465] power_supply ADP1: uevent
[ 54.492470] power_supply ADP1: POWER_SUPPLY_NAME=ADP1
[ 54.492477] power_supply ADP1: Static prop TYPE=Mains
[ 54.492480] power_supply ADP1: 1 dynamic props
[ 54.492484] power_supply ADP1: prop ONLINE=1
[ 54.507097] atkbd.c: Unknown key pressed (translated set 2, code 0xf1 on isa0060/serio0).
[ 54.507105] atkbd.c: Use 'setkeycodes e071 <keycode>' to make it known.
[ 54.508824] atkbd.c: Unknown key released (translated set 2, code 0xac on isa0060/serio0).
[ 54.508830] atkbd.c: Use 'setkeycodes e02c <keycode>' to make it known.
[ 54.509297] atkbd.c: Unknown key pressed (translated set 2, code 0x71 on isa0060/serio0).
[ 54.509302] atkbd.c: Use 'setkeycodes 71 <keycode>' to make it known.
[ 56.512797] power_supply ADP1: uevent
[ 56.512804] power_supply ADP1: POWER_SUPPLY_NAME=ADP1
[ 56.512810] power_supply ADP1: Static prop TYPE=Mains
[ 56.512813] power_supply ADP1: 1 dynamic props
[ 56.512817] power_supply ADP1: prop ONLINE=1
[ 56.526942] atkbd.c: Unknown key pressed (translated set 2, code 0xf1 on isa0060/serio0).
[ 56.526950] atkbd.c: Use 'setkeycodes e071 <keycode>' to make it known.
[ 56.528129] atkbd.c: Unknown key released (translated set 2, code 0xf1 on isa0060/serio0).
[ 56.528136] atkbd.c: Use 'setkeycodes e071 <keycode>' to make it known.

Without AC, I see:

[ 60.773379] power_supply ADP1: uevent
[ 60.773387] power_supply ADP1: POWER_SUPPLY_NAME=ADP1
[ 60.773393] power_supply ADP1: Static prop TYPE=Mains
[ 60.773397] power_supply ADP1: 1 dynamic props
[ 60.773401] power_supply ADP1: prop ONLINE=0
[ 60.789708] atkbd.c: Unknown key pressed (translated set 2, code 0xf2 on isa0060/serio0).
[ 60.789716] atkbd.c: Use 'setkeycodes e072 <keycode>' to make it known.
[ 60.818902] atkbd.c: Unknown key released (translated set 2, code 0xf2 on isa0060/serio0).
[ 60.818908] atkbd.c: Use 'setkeycodes e072 <keycode>' to make it known.

Note how the keycode changes with the online status: if I plug the AC back in, the keycode turns back to e071.

Booting with acpi=off or acpi=ht solves the issue. The workarounds i8042.nomux=1, i8042.nopnp=1, ec_intr=0, ec_intr=2, pnpacpi=off and removing the ac/thermal/battery modules do not work in my case. I also tried acpi=noirq and pci=routeirq, without improvement. I had a look at the FACS: the lock is always 0 (as are all the other fields, with the exception of the version, 01).

Also, I can put the laptop into sleep mode ("mem"), but it does not resume correctly (the disk spins up, but the keyboard, i.e. Num Lock, does not work). However, it resumes fine if I run "setkeycodes e071 255" before triggering sleep mode.

I will attach my dmesg... (I will try to do more tomorrow. It's a bit late here...) Hope it helps.
Regards

Created attachment 13851
dmesg MSI l745 2.6.24-rc4
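In these messages the code that atkbd prints folds the 0xe0 prefix into bit 7: a set bit means the key was E0-prefixed, and the low seven bits are the scancode. That is why code 0xf1 is paired with 'setkeycodes e071' and code 0xac with 'setkeycodes e02c', while the unprefixed 0x71 becomes plain '71'. The Python sketch below reproduces that mapping from the log lines quoted in this report. It only parses the message text (it does not call any kernel interface), and the helper names are hypothetical.

```python
import re

# Matches the first line of each atkbd warning pair seen in this report, e.g.
# "atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0)."
UNKNOWN_KEY = re.compile(
    r"Unknown key (?P<action>pressed|released) \((?:translated|raw) set \d+, "
    r"code (?P<code>0x[0-9a-f]+) on (?P<port>\S+)\)"
)


def setkeycodes_arg(code: int) -> str:
    """Scancode argument atkbd suggests for setkeycodes (hypothetical helper).

    Bit 7 of the printed code marks an E0-prefixed key; bits 0-6 are the scancode.
    """
    prefix = "e0" if code & 0x80 else ""
    return f"{prefix}{code & 0x7f:02x}"


def parse_unknown_key(line: str):
    """Return (action, code, setkeycodes argument) for an 'Unknown key' line, else None."""
    m = UNKNOWN_KEY.search(line)
    if m is None:
        return None
    code = int(m.group("code"), 16)
    return m.group("action"), code, setkeycodes_arg(code)


if __name__ == "__main__":
    log = [
        "[ 54.507097] atkbd.c: Unknown key pressed (translated set 2, code 0xf1 on isa0060/serio0).",
        "[ 60.789708] atkbd.c: Unknown key pressed (translated set 2, code 0xf2 on isa0060/serio0).",
        "atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).",
    ]
    for entry in log:
        print(parse_unknown_key(entry))
```

For the MSI lines, 0xf1 and 0xf2 decode to e071 and e072: the same E0-prefixed family with scancode 0x71 versus 0x72, switching with the AC online state.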
Hello again,

I tried to dump /proc/interrupts every 5 seconds (cat /proc/interrupts; sleep 5; ...). With acpi=on, I see that the interrupt count for IRQ1 (keyboard) keeps incrementing even when I don't touch the keyboard. With acpi=ht the interrupt count was not incrementing. Something is definitely triggering IRQ1 when acpi=on...

Shooting in the dark: maybe the i8042 multiplexer is triggering a wrong IRQ? But i8042.nomux=1... To tell the truth, I don't really know whether IRQ1 is multiplexed.

I know it is kind of weird, but it's not happening anymore. No messages in dmesg, no stuck keys. Daniele, others, are you still facing this bug?

Linux assam 2.6.23-gentoo-r3 #1 PREEMPT Wed Dec 5 09:06:24 CET 2007 i686 Intel(R) Pentium(R) M processor 1.50GHz GenuineIntel GNU/Linux

It's 2.6.23.8 (probably) from Gentoo. Do you remember that this bug shows up mostly when in X? (see below) I have no clue what "fixed" it, but I remember that some X.Org packages were stabilized in Gentoo some time ago (so they are now on x86, which I am running). But I rather guess that this hides the bug instead of fixing it, because I believe it's an in-kernel issue.

This is more than weird... Now I got it inside VirtualBox while testing the new KDE-4 liveCD. kernel: 2.22.13, message was (transcript):

atkbd.c: Spurious NAK on isa0060/serio0. Some hardware might be trying access hardware directly. [3x repeated]

Does anyone have any info on this?

I have not faced it on my Gentoo box for maybe a month. I also think that some X.org change might be "masking" this bug, since it is of course a kernel bug. I now have linux-2.6.23-gentoo-r6 and will test with the ac, battery and processor modules loaded together; I will report my findings later.

Still the same problem, but dmesg doesn't contain anything. The errors are actually masked, if any.
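Both Sebastien's 5-second check and the earlier "cat /proc/interrupts; modprobe thermal; cat /proc/interrupts" request boil down to diffing two /proc/interrupts snapshots and seeing which IRQ lines moved. A minimal Python sketch, assuming the usual layout of a header row of CPU names followed by rows with an `IRQ:` label and one count per CPU; the function names are hypothetical and the sample text in the usage note is illustrative, not taken from the attachments.

```python
import time


def parse_interrupts(text: str) -> dict:
    """Map IRQ label -> total count summed over all CPU columns.

    Assumes the /proc/interrupts layout: a header row (CPU0 CPU1 ...), then rows
    like "  1:   1234   IO-APIC-edge  i8042" (per-CPU counts, then chip/device).
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # number of CPU columns in the header row
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        total = 0
        for field in rest.split()[:ncpus]:
            if not field.isdigit():  # some rows (e.g. ERR) have fewer count columns
                break
            total += int(field)
        counts[label.strip()] = total
    return counts


def irq_deltas(before: str, after: str) -> dict:
    """Per-IRQ increase between two snapshots, keeping only lines that changed."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return {irq: a[irq] - b.get(irq, 0) for irq in a if a[irq] != b.get(irq, 0)}


def sample(path: str = "/proc/interrupts", interval: float = 5.0) -> dict:
    """Take two snapshots `interval` seconds apart (leave the keyboard untouched)."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return irq_deltas(first, second)
```

For example, with an illustrative "before" row "  9:  21  IO-APIC-fasteoi  acpi" and "after" row "  9:  8780  IO-APIC-fasteoi  acpi", `irq_deltas` reports {"9": 8759}. Running `sample()` on an affected machine with hands off the keyboard would show whether the "1" (i8042) entry grows alongside "9" (acpi); other kernel versions may need the layout assumption adjusted.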
I can always reproduce the bug this way:

modprobe ac
modprobe battery
modprobe processor
scite /usr/src/linux/Documentation/kernel-parameters.txt

Then I scroll up and down with the arrow keys and PgUp/PgDn until one key finally gets stuck and the text keeps scrolling even after release. It's easy to reproduce this way. You also get the same problem when scrolling in Firefox, or even when typing (I Shift+Deleted a *LOT* of good emails because of this bug).

OK :(. You are right. It's not producing messages in dmesg, but the sticking is still there.

Hello,

Here are more details about what I see on my laptop (comment #54), using i8042.debug=1 and keeping a key pressed. The laptop uses translated mode/scancode set 2 (I haven't been able to put the i8042 in direct mode or put the keyboard in scancode set 1 or 3). It looks like the scancodes of the "real" key events manage to place themselves on the input port of the i8042 in the middle of the sequence triggered by the ACPI "power_supply" check. Some examples:

1. After a "power_supply" check:
   i8042 probably reads "0xe0 0x71 0xe0 0xf0 0x71"
   i8042 outputs "0xe0 0x71 0xe0 0xf1"
   => No real problem here.

2. Sometimes a real key event (e.g. scancode 0x20) is read between the 0xe0 and the 0x71:
   i8042 probably reads "0xe0 0x20 0x71 0xe0 0xf0 0x71"
   i8042 outputs "0xe0 0x20 0x71 0xe0 0xf1"
   => 0x20 is turned into "0xe0 0x20" and we have lost a key press. If we lose a 0xa0 instead (the key release of 0x20), then there is a "stuck key".

3. Sometimes the release bit lands on the wrong scancode:
   i8042 probably reads "0xe0 0x71 0xe0 0xf0 0x20 0x71"
   i8042 outputs "0xe0 0x71 0xe0 0xa0 0x71"
   => 0x20 is read just after the 0xf0 of "0xe0 0xf0 0x71", so the release bit is "moved" from the second 0x71 to the 0x20 => the key press is turned into a key release...

I don't know if the 0xf0 of a "0xf0 0x20" can be read by the i8042 just before the 0x71 of the "0xe0 0x71" (i.e. "0xe0 0xf0 0x71 0x20" => i8042 outputs a 0x20 instead of 0xa0 => stuck key). That's weird...
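Sebastien's three examples follow one rule: in his notation the controller swallows a set 2 break prefix (0xf0) and sets bit 7 on whatever byte arrives next. The Python sketch below encodes only that rule. It uses an identity mapping in place of the real set 2 to set 1 translation table (a deliberate simplification that matches the notation of the examples), so it lets you replay how a stray byte landing inside the "power_supply" sequence moves the release bit onto the wrong scancode. The function names are hypothetical.

```python
def translate(stream):
    """Simplified model of i8042 translation, as described in the examples above.

    Assumption: the real controller also maps every set 2 code to a set 1 code via
    a lookup table; here that mapping is the identity, so only the 0xf0 handling shows.
    """
    out = []
    release_pending = False
    for byte in stream:
        if byte == 0xF0:
            # Set 2 break prefix: not forwarded, marks the *next* byte as a release.
            release_pending = True
            continue
        out.append(byte | 0x80 if release_pending else byte)
        release_pending = False
    return out


def hexs(seq):
    """Format a byte sequence the way the examples above write it."""
    return " ".join(f"0x{b:02x}" for b in seq)


if __name__ == "__main__":
    cases = {
        "1. clean power_supply sequence": [0xE0, 0x71, 0xE0, 0xF0, 0x71],
        "2. key press 0x20 after the first 0xe0": [0xE0, 0x20, 0x71, 0xE0, 0xF0, 0x71],
        "3. key press 0x20 right after the 0xf0": [0xE0, 0x71, 0xE0, 0xF0, 0x20, 0x71],
    }
    for name, reads in cases.items():
        print(f"{name}: {hexs(reads)} -> {hexs(translate(reads))}")
```

Running it prints 0xe0 0x71 0xe0 0xf1, 0xe0 0x20 0x71 0xe0 0xf1 and 0xe0 0x71 0xe0 0xa0 0x71 as the three outputs, the same results as examples 1-3. The model makes the failure mode explicit: the prefix state belongs to the controller, not to a key, so any foreign byte that slips between a prefix and its scancode inherits that state.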
Note: I also see some scancodes when the screen brightness is changed by the system...

Hi Sebastien, thank you for this valuable information; can we deduce that the i8042 code needs to be rewritten? If multiplexing is not working, we should first provide a workaround (disabling multiplexing?) and then see what the i8042 code is doing wrong.

Just a thought... I am always using i8042.nomux=1. Did anybody check the difference between specifying i8042.nomux=1 and not?

Unfortunately the "power_status" sequence comes from the KBD port (IRQ 1, AUX flag not set), so i8042.nomux=1 won't have any effect. I also tried i8042.noaux=1, without any success...

FYI: just filed https://bugzilla.redhat.com/show_bug.cgi?id=433164

Can somebody please test whether i8042.noacpi has any positive effect? Thanks

The bug is still there with 2.6.25-r6. Is "i8042.noacpi" a correct kernel parameter? 2.6.25-r6 ignores it because it is unknown; "i8042.noacpi=1" is unknown as well.

@Erik Boritsch: I am pretty sure it no longer exists, even in fairly recent kernel versions.

I am also noticing that with kernel 2.6.24 the notebook does not shut down when issuing shutdown from the XFCE4 menu if the battery is plugged in (although I haven't loaded any module). I don't know if it's relevant, but maybe somebody else has noticed it.

Please add 2.6.24.(1-3) and 2.6.25-rc(1-6) to the affected kernels.

I've had the same issues on a desktop AMD64 PC using the 2.6.18 kernel that came with Debian 4 r3 Etch AMD64. This issue existed as far back as 6 months ago; I think it was kernel 2.6.15 then.

What's the status of this bug? Can anyone verify whether "i8042.nopnp=1" works around the problem, as described in comment #1?

@Zhang Rui: there's some confusion here; I will run some tests again and report my findings.

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
The problem happened in the usual way, scrolling down some text and pressing Up/Down, PgUp/PgDn. It's not hard to trigger, and it happens during normal computer usage anyway.

This is the summary of the bug situation, from my point of view:
1) I have always used, and am still using, 'i8042.nomux=1'.
2) When the battery, ac and thermal modules were built into the kernel, 'acpi=off' turned off ACPI and fixed the dmesg error messages, but that was not a viable solution, so
3) I compiled them as modules and blacklisted them so that they are not loaded automatically. Loading them manually causes the same problem (keypress glitches), but more often without the corresponding dmesg message. The problem *IS* still happening after loading them manually (see the first two lines).

I am a developer, I have developed several C libraries and can help test kernel patches if necessary. I have only roughly understood that this problem is due to wrong ACPI tables, but I really don't have a strategy to narrow the problem down or fix it. Please tell me if I can do something more.

About 'i8042.nopnp=1': it is absolutely ineffective. This is my /proc/cmdline:

root=/dev/hda5 console=tty1 vga=791 video=vesafb:vram=2,xres=1024,yres:768,bpp:8,hsync1:30,vsync1:50,hsync2:55,vsync2:85,accel,mtrr i8042.nomux=1 i8042.nopnp=1

I manually loaded 'battery', 'ac' and 'thermal' and got a lot of the usual errors:

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key pressed (translated set 2, code 0xb1 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e031 <keycode>' to make it known.

Hi, Daniele

Thanks for your test. From comment #77 it seems that you can't work around this bug by adding the boot option "i8042.nopnp=1". This is inconsistent with what you said in comment #1.

At the same time it seems that the keyboard interrupt (IRQ 1) is also triggered while the EC triggers the ACPI interrupt (IRQ 9). In that case an unknown keyboard scan code is received, so the OS complains with the following warning messages:
>atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
>atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
>atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
>atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
>atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
>atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

Maybe this is related to the keyboard configuration in the BIOS options. Will you please confirm whether the keyboard mode can be changed in the BIOS? Will you please run the following command and report the serio type?
>cat /sys/devices/platform/i8042/serio0/id/type
Thanks.

Please see comment #10, which clarifies what was said in comments #1-#9. I know there is a real mess in the comments I posted, but if you read them incrementally (taking the latest as the most accurate) you can get the correct information from them, e.g. that turning ACPI off fixes the problem (spurious key codes do not happen). I will check whether the keyboard mode can be changed in the BIOS. The serio type (cat /sys/devices/platform/i8042/serio0/id/type) is: 06

Thank you

The keyboard mode cannot be changed in the BIOS (PhoenixBIOS). This notebook is an exact copy of the Fujitsu-Siemens V2000 Pro, sold as 'Maxdata Pro 7000DX' (the internal product name says 7000X instead); I hope this information can be of some use.
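The serio type reported above (06) is a port type constant from include/linux/serio.h. Assuming the values found in 2.6-era trees, 0x06 is SERIO_8042_XL, the type i8042 assigns to a keyboard port running in translated (XL) mode; atkbd treats exactly that type as "translated", which matches the "translated set 2" wording in every message in this report. A tiny lookup sketch follows; the helper names are hypothetical and the table should be checked against your own kernel tree.

```python
# Port type constants from include/linux/serio.h (values assumed from 2.6-era trees).
SERIO_TYPES = {
    0x00: "SERIO_XT",
    0x01: "SERIO_8042",
    0x02: "SERIO_RS232",
    0x03: "SERIO_HIL_MLC",
    0x05: "SERIO_PS_PSTHRU",
    0x06: "SERIO_8042_XL",
}


def serio_type_name(sysfs_value: str) -> str:
    """Name for the hex string read from /sys/devices/.../serioN/id/type."""
    value = int(sysfs_value.strip(), 16)
    return SERIO_TYPES.get(value, f"unknown (0x{value:02x})")


def is_translated(sysfs_value: str) -> bool:
    """atkbd treats a port as translated when its type is SERIO_8042_XL."""
    return serio_type_name(sysfs_value) == "SERIO_8042_XL"
```

With the value from this notebook, `serio_type_name("06")` returns "SERIO_8042_XL" and `is_translated("06")` is True.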
I noticed these two lines in dmesg:
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
So I added 'pci=routeirq', without any success. After loading the ac, battery and thermal modules I got these:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
It should be said that, as far as I know, this notebook has a real PS/2 keyboard and a PS/2 touchpad, both connected to the i8042. Up to now the most interesting comment towards fixing this bug is comment #64.

Hello, I saw something interesting a few days ago: I updated the BIOS of my MSI L745 laptop to the latest version. After the reboot, the "unknown key released" messages were no longer triggered. However, unplugging the AC adapter made them come back.
One more thing: some time ago I tried to dump the content of the EC_SC (Embedded Controller Status) register to understand the "EC GPE Storm" issue, and I saw that the SMI_EVT flag was set from time to time. Moreover, I read in a Phoenix document that the keyboard controller may be involved in managing SMIs in Legacy mode. So... one more wild guess: the EC is (wrongly) triggering SMIs to notify the AC power state, and some SMI-related data is made available on the keyboard controller ports, confusing the kernel.
Unfortunately I have not been able to find the model/datasheets of my EC. lm_sensors says
Trying family `National Semiconductor'... Yes
Found unknown chip with ID 0xa300
But I'm not even sure that this information is reliable... (and so far I haven't found any datasheet matching the content of the registers specified in the DSDT). I also tried to dump the EC registers using the acer_ec.pl tool and to force the state of some registers, but without any success.

@Sebastien: thanks for this useful information, I hope we can get to a solution for this nasty bug.
Here are some downstream bugs / duplicates of this bug:
http://bugs.launchpad.net/ubuntu/+bug/124406
http://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/194214
http://bugs.launchpad.net/ubuntu/+bug/39315
http://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/65249
A possible fix to atkbd.c lies here: http://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/39315
I hope there's something useful scattered around there.

If I run:
rmmod ac thermal; modprobe ac; modprobe thermal
I almost instantly get stuck keys. Sometimes the ENTER key used to run the command is even duplicated, as if I had pressed Enter twice. Other times, if I run that line and then press a key, that key instantly starts to repeat indefinitely (until another key is pressed). I still wonder why Windows never had this issue... maybe we (the Linux kernel) are using some "grey zone" ACPI functions which do not behave as expected (standards?) and drive the i8042 crazy?

Another person with the same problem: http://www.mail-archive.com/linux-input@vger.kernel.org/msg00014.html

Can we say that the problem happens only with Fn+? keys?

It happens with any key, not only the special Fn keys. The patch in https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/39315/comments/2 does not seem effective to me; I won't try it.

I have decompiled the DSDT; it seems to be the standard one for the Intel "Montara" (Centrino) chipset. Can somebody (more experienced than me) please decompile it and check whether Linux gets a special table instead of the one passed to other OSes? I am asking because at a certain point the AML contains a possibly OS-specific block:
Device (PCI0)
{
    Method (_INI, 0, NotSerialized)
    {
        If (CondRefOf (_OSI, Local0))
        {
            Store (0x07D1, OSYS)
        }
        Else
        {
            If (LEqual (SizeOf (_OS), 0x14))
            {
                Store (0x07D0, OSYS)
            }
            Else
            {
                If (LEqual (SizeOf (_OS), 0x27))
                {
                    Store (0x07CF, OSYS)
                }
                Else
                {
                    Store (0x07CE, OSYS)
                }
            }
        }
        ...
    }

On 2008-08-12 I upgraded to gentoo-sources-2.6.25-r7. The bug seems fixed; I will post another comment if it is not.

It seems fixed with both gentoo-sources-2.6.25-r7 and gentoo-sources-2.6.25-r6. I haven't tested other versions. The only recent change to my parameters was adding 'clocksource=acpi_pm', but I would not say that this worked around the bug, since my /var/log/messages contains an atkbd.c message also when booting with clocksource=acpi_pm... I will post again if the bug reappears... weird, I would like to narrow the bug down to its cause now that it has disappeared. If you need information about my configuration, please ask.
From my emerge.log (grep gentoo-sources):
2008-07-19 12:56:49 - >>> emerge (7 of 10) sys-kernel/gentoo-sources-2.6.25-r6 to /
2008-07-19 12:56:49 - === (7 of 10) Cleaning (sys-kernel/gentoo-sources-2.6.25-r6::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r6.ebuild)
2008-07-19 12:56:49 - === (7 of 10) Compiling/Merging (sys-kernel/gentoo-sources-2.6.25-r6::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r6.ebuild)
2008-07-19 12:59:04 - >>> AUTOCLEAN: sys-kernel/gentoo-sources
2008-07-19 12:59:04 - === (7 of 10) Post-Build Cleaning (sys-kernel/gentoo-sources-2.6.25-r6::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r6.ebuild)
2008-07-19 12:59:04 - ::: completed emerge (7 of 10) sys-kernel/gentoo-sources-2.6.25-r6 to /
2008-07-22 12:32:19 - *** emerge unmerge =sys-kernel/gentoo-sources-2.6.24-r8
2008-07-22 12:32:30 - === Unmerging...
(sys-kernel/gentoo-sources-2.6.24-r8)
2008-07-22 12:33:06 - >>> unmerge success: sys-kernel/gentoo-sources-2.6.24-r8
2008-07-22 20:41:27 - >>> emerge (9 of 12) sys-kernel/gentoo-sources-2.6.25-r7 to /
2008-07-22 20:41:27 - === (9 of 12) Cleaning (sys-kernel/gentoo-sources-2.6.25-r7::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r7.ebuild)
2008-07-22 20:41:28 - === (9 of 12) Compiling/Merging (sys-kernel/gentoo-sources-2.6.25-r7::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r7.ebuild)
2008-07-22 20:42:58 - >>> AUTOCLEAN: sys-kernel/gentoo-sources
2008-07-22 20:42:59 - === (9 of 12) Post-Build Cleaning (sys-kernel/gentoo-sources-2.6.25-r7::/usr/portage/sys-kernel/gentoo-sources/gentoo-sources-2.6.25-r7.ebuild)
2008-07-22 20:42:59 - ::: completed emerge (9 of 12) sys-kernel/gentoo-sources-2.6.25-r7 to /
2008-08-12 15:08:29 - === Unmerging... (sys-kernel/gentoo-sources-2.6.25-r6)
2008-08-12 15:08:37 - >>> unmerge success: sys-kernel/gentoo-sources-2.6.25-r6
It does not really seem related to the gentoo-sources package (?!), as I experienced the bug (as per my comments here on this tracker) on July 22 (before the gentoo-sources update) and on August 4. I had been using 2.6.25-r6 up to this morning (switched to r7, all OK). According to my /var/log/messages, 'clocksource=acpi_pm' was introduced on Jul 22 13:19:00.

No hope. The bug is not fixed. I can confirm that it rarely happens if the battery applet is not running. So, the new way to trigger this bug is:
1) load the 'battery' module (or 'ac')
2) activate an applet or application which polls the battery status (like the XFCE4 battery applet)
If no application polls the battery status, the bug does not happen (I assume that the same is true for the 'ac' and 'thermal' modules).

Hi, Daniele
Will you please confirm whether the system is affected by the following message?
>atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
>atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
From the acpidump info it seems that the following definition exists in many ACPI control methods:
>Method (_Q31, 0, NotSerialized)
>{
>    Store (0x32, SMIF)
>    Store (Zero, TRP0)
>    Sleep (0x64)
>    Notify (\_SB.AC, 0x80)
Maybe the SMI is triggered when the OS checks the status of AC/Battery/Thermal. In that case maybe an incorrect keyboard scancode is reported. Will you please check whether the issue is fixed by a BIOS upgrade?
Thanks.

Hi ykzhao,
yes, I confirm that I get:
----
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
----
when loading one (or more) of the ac, battery, thermal modules and using a program that reads data through those modules (like the battery status applet).
I have not yet found a BIOS upgrade for this notebook. I own a Maxdata 7000DX Pro, which seems to me a Fujitsu Amilo (see http://gentoo-wiki.com/Maxdata_Pro_7000DX). The Maxdata website says nothing about this line of products, and does not mention BIOS upgrade downloads. On the Fujitsu Siemens website I found a BIOS upgrade file called 'FSC_BIOSFlashISOCDImageAMILOM7400_R01S0Z_1001939.ISO' for the Fujitsu Amilo M7400, but I will not apply it unless I am sure that it is the correct BIOS upgrade. Can you please help me? I have attached my dmidecode information. From it I see that I have a Phoenix BIOS, version R01-M0Vf (released 10/23/03). Where can I get the correct BIOS upgrade (starting from this BIOS version and for my mainboard)?
Thanks

Created attachment 17506
first lines of 'dmidecode' output, unique ids and serial numbers removed
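The `_INI` fragment quoted earlier from the decompiled DSDT is a plain decision tree, transcribed below into C (the function name is hypothetical). The strings used to exercise it are assumptions: 0x14 (20) is the length of "Microsoft Windows NT", which to my knowledge is also the `_OS` string ACPICA reports on Linux, and 0x27 (39) is the length of "Microsoft WindowsME: Millennium Edition". Since Linux defines `_OSI`, the `CondRefOf (_OSI, Local0)` test should succeed and OSYS should end up as 0x07D1, the same branch an `_OSI`-capable Windows takes, so this particular method does not by itself give Linux a different path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Transcription of the _INI method's If/Else chain. The OSYS values are
 * years: 0x07D1 = 2001, 0x07D0 = 2000, 0x07CF = 1999, 0x07CE = 1998. */
static uint16_t dsdt_osys(bool osi_defined, const char *os_string)
{
    size_t len;

    assert(os_string != NULL);
    if (osi_defined)              /* If (CondRefOf (_OSI, Local0)) */
        return 0x07D1;

    len = strlen(os_string);      /* SizeOf (_OS): string length without the NUL */
    if (len == 0x14)              /* 20 characters */
        return 0x07D0;
    if (len == 0x27)              /* 39 characters */
        return 0x07CF;
    return 0x07CE;                /* any other _OS string */
}
```

For example, dsdt_osys(false, "Microsoft Windows") falls through to 0x07CE, because that string is only 17 characters long.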
Hi, Daniele
Thanks for the confirmation. It seems to be a very old BIOS (10/23/03). I am not sure whether ACPI is well supported on this laptop. I also see the following info at http://gentoo-wiki.com/Maxdata_Pro_7000DX
> ACPI issues There are 2/3 documented (here) issues regarding mouse and keyboard of this notebook (and most probably of many others) happening with 2.6.x kernels with ACPI active
> AT2 PS2 Keyboard There is another issue regarding ACPI and always the i8042 chipset: some keys may get stuck and the release event may not be caught. This problem it's due only to another ACPI vs i8042 conflict (like the above one regarding the PS2 mouse).
Maybe this issue is related to the hardware/BIOS. But as this is a very old machine model, maybe there is no BIOS upgrade available. Poor hardware, poor people.
Thanks.

Hmm, I have the very same problem on an Acer TravelMate 243LC and I doubt that the BIOS is the same on those two models. I'll look further into it, but I don't think it is a BIOS-specific issue, as it happens on different models from different manufacturers.

ACPI is of course supported, as it works under Windows XP. The page you are referring to, http://gentoo-wiki.com/Maxdata_Pro_7000DX, was written by me, so it does not contain more information than this bug tracker. I do not agree with you on "Poor hardware, poor people": this was a high-profile notebook, bought (new) for about 1800 € in 2004, not a cheap one. As Erik said, this might not be a BIOS problem, and let's not forget that the keyboard works fine on Windows. And, anyway, I expect hardware to work as it should, on Windows and on Linux. It seems you are not going to work on this bug.

Created attachment 17689
Patch 1/4 : Don't issue the burst disable command if EC exits the burst mode
Created attachment 17690
Patch 2/4: Clear the query_pending bit only after processing EC notification event
Created attachment 17691
Patch 3/4: Simplify EC working flowchart and always enable EC GPE
Created attachment 17692
Patch 4/4: Add some udelay in the EC GPE handler to avoid an EC GPE interrupt storm
From the acpidump info it seems that the following definition exists in many ACPI control methods:
>Method (_Q31, 0, NotSerialized)
>{
>    Store (0x32, SMIF)
>    Store (Zero, TRP0)
>    Sleep (0x64)
>    Notify (\_SB.AC, 0x80)
Maybe the SMI is triggered when the OS checks the status of AC/Battery/Thermal. In that case maybe an incorrect keyboard scancode is reported.
Will you please try the four attached patches on the latest kernel (2.6.27-rc5) and see whether the issue still exists?
Thanks.

@ykzhao: I will test within the next few hours
Thanks

What kernel shall I use specifically? I have tried to apply the first 2 patches to the latest git (retrieved with 'git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6') but some hunks fail; it seems some similar code is already in place there. I tested the vanilla unpatched 2.6.27-rc5 (from git), still affected.

I guess you are expected to *apply* those patches to 2.6.27-rc5.

@mnowak: read comment #104

> I have tried to apply the first 2 patches
> to the latest git
"latest git" != 2.6.27-rc5
But I obviously did not try it, anyway.
> I tested the vanilla unpatched 2.6.27-rc5 (from git)
Because not patched, right? Or, what am I missing here...?

...and anyway it fails: ec_gpe_storm.patch and ec_work_mode.patch are missing 1 hunk each (respectively the 3rd and the 2nd). I won't compile the patched kernel unless all hunks apply.

OK! yakui, please tell us which kernel version your patches apply to.

Hi, Daniele
You can try all the patches on the latest kernel (for example: 2.6.27-rc6).
Thanks.

I just downloaded 2.6.27-rc6 from: http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.27-rc6.tar.bz2
Will try ASAP

DOES NOT WORK. Maybe I should apply the patches in a different order?
I won't go on unless all hunks are successful. Here is my shell output:
---------------------------------
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_asus.patch
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 135 with fuzz 2 (offset 25 lines).
Hunk #2 FAILED at 957.
1 out of 2 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_clear_query.patch
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 278 (offset 21 lines).
Hunk #2 succeeded at 512 (offset 8 lines).
Hunk #3 succeeded at 528 (offset 8 lines).
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_gpe_storm.patch
patching file drivers/acpi/ec.c
Hunk #3 FAILED at 517.
Hunk #4 succeeded at 786 (offset 49 lines).
1 out of 4 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $ patch -p1 < ../ykzhao-patches/ec_work_mode.patch
patching file drivers/acpi/ec.c
Hunk #1 succeeded at 200 (offset 33 lines).
Hunk #2 succeeded at 228 (offset 33 lines).
Hunk #3 succeeded at 251 (offset 33 lines).
Hunk #4 succeeded at 260 (offset 33 lines).
Hunk #5 succeeded at 278 (offset 33 lines).
Hunk #6 succeeded at 532 (offset 20 lines).
Hunk #7 succeeded at 740 (offset 22 lines).
Hunk #8 succeeded at 787 (offset 22 lines).
Hunk #9 succeeded at 880 (offset 22 lines).
Hunk #10 succeeded at 975 (offset 22 lines).
Hunk #11 succeeded at 983 (offset 22 lines).
legolas558@localhost ~/3rd_pty-sources/linux-2.6.27-rc6 $
The only patch that needs corrections seems to be ec_asus.patch; I will skip it, as I do not own an Asus. Let's see what happens...

I am running the 2.6.27-rc6 kernel with all patches except the failing one, ec_asus.patch. I have found the well-known lines in dmesg:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
But I have to say that I am no longer able to trigger the stuck keys... but I guess it's just a matter of time. I really hope that the atkbd.c messages are a sort of one-time glitch and not a stuck key that I did not notice...

I have found 4 more lines in dmesg:
----
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
----
No stuck key experienced yet

OK, got a stuck key while browsing menuconfig... @ykzhao: patches 2+3+4 do not fix the bug

Hi, Daniele
Thanks for the test. Sorry, I pasted the incorrect patch in comment #98. Thanks for confirming that the issue can't be resolved by the attached patches.

i8042.nomux=1 is no longer necessary with kernel >= 2.6.25

I have tried the latest linux-2.6.27-rc9 with the patch in http://bugzilla.kernel.org/show_bug.cgi?id=11549 (http://bugzilla.kernel.org/attachment.cgi?id=18047); it does not fix the bug. Just shooting in the dark, really.

If some developer could benefit from it, I can offer my hardware via SSH for a few hours (with a booted LiveCD).

The bug does not happen with the Ubuntu Intrepid Ibex kernel 2.6.27-7. I don't know where the magic is, but I am booting with no special kernel parameter, and it simply works (TM).

What about Gentoo's and vanilla kernels? I don't have the hardware right now to test...

@erik: my last test was the one in comment 119 (2.6.27-rc9), and it was not successful. I am no longer involved in Gentoo, but I can provide the hardware via SSH as said in comment 120, if necessary.

Can anyone please try a vanilla kernel and see if the problem still exists?

Today I tried the current sidux kernel 2.6.27-7.slh.1-sidux-686 (vanilla kernel + sidux patches) on a Fujitsu Siemens Amilo Pro V2000 notebook...
Unfortunately, this bug is not fixed with that kernel (at least for me).

(In reply to comment #124)
> Can anyone please try a vanilla kernel and see if the problem still exists?
>
@Eugen: my notebook, a Maxdata 7000X, is a clone of the FS Amilo Pro V2000, so I assume our hardware base is almost the same. Can you please try an Ubuntu Intrepid Ibex LiveDVD? I don't know what's in the recipe, but it's working fine here... and the ac, battery, thermal modules are always loaded.

Correction: with Ubuntu Intrepid Ibex I could trigger the nasty bug only once, when I was running on battery (no AC plugged in) and using the wireless network adapter. I think the two are somewhat tied, i.e. using the network adapter increases the chances of triggering the bug. So the bug is not fixed in Ubuntu Intrepid Ibex either; it's still something at the kernel level.

A friend suggested that perhaps we could use the approach used in bug 12021, with a suitably modified patch similar to http://bugzilla.kernel.org/attachment.cgi?id=19301&action=view
Can some expert tell whether this would count as a "bad hack" or a "fix"? It would not be a "bad hack" if it is "normal" for our buggy keyboard to lose release events under heavy i8042 traffic.

The friend in comment 128 is Grégory Schmitt.

cc dmitry and the other input experts. :)

Regarding priority: this bug is pissing off a lot of people who are approaching Linux... I wonder whether this bug would have been fixed within a few weeks if the i8042 were used in server equipment instead of laptops...

Created attachment 19530
My dmidecode output
It seems that my DSDT has some bad errors. I am not able to fix them...
---
Intel ACPI Component Architecture
ASL Optimizing Compiler version 20061109 [May 16 2007]
Copyright (C) 2000 - 2006 Intel Corporation
Supports ACPI Specification Revision 3.0a

dsdt.dsl 2561: Field (ERAM, AnyAcc, Lock, Preserve)
Error 4074 - ^ Host Operation Region requires ByteAcc access

dsdt.dsl 2660: Store (Arg2, DAT3)
Error 4005 - Method argument is not initialized ^ (Arg2)

dsdt.dsl 2660: Store (Arg2, DAT3)
Remark 5065 - Not a parameter, used as local only ^ (Arg2)

ASL Input: dsdt.dsl - 4738 lines, 174788 bytes, 1974 keywords
Compilation complete. 2 Errors, 0 Warnings, 1 Remarks, 456 Optimizations
---
The first error can easily be fixed by using ByteAcc; the second error is something weird. The Amilo M7400, almost a clone in terms of hardware specs, had the same error in its DSDT.

I don't know whether the DSDT compilation errors reported by iasl are related to our i8042 problem, but there is an interesting read here: http://www.mavetju.org/mail/view_message.php?list=freebsd-acpi&id=2286041

Created attachment 19874
DSDT AML for Maxdata 7000X, fixed errors and compiled with iasl 20061109
The bug happens just the same with the fixed DSDT; I will now try a patch by G. Schmitt on the 2.6.28 kernel.
My scenario (comment #73) was related to a KVM USB switch. Whenever I bypass the switch the issue disappears. The ACPI workaround didn't work for me.

The bug in fact dates back as far as 2003: http://lkml.org/lkml/2003/9/15/210
The user begins by complaining about the familiar errors:
Sep 14 20:42:27 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xb6, on isa0060/serio0) pressed.
Sep 14 20:42:27 cpp kernel: i8042 history: 19 a2 99 0f 8f 0f 8f 1c 9c 04 84 36 09 b6 89 b6
Sep 14 22:13:00 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xa5, on isa0060/serio0) pressed.
Sep 14 22:13:00 cpp kernel: i8042 history: a7 20 9e 21 9f 24 25 26 27 a0 a4 a5 a6 a7 a1 a5
Sep 14 22:13:00 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xa6, on isa0060/serio0) pressed.
Sep 14 22:13:00 cpp kernel: i8042 history: 20 9e 21 9f 24 25 26 27 a0 a4 a5 a6 a7 a1 a5 a6
Sep 14 22:13:00 cpp kernel: atkbd.c: Unknown key (set 2, scancode 0xa7, on isa0060/serio0) pressed.
Sep 14 22:13:00 cpp kernel: i8042 history: 9e 21 9f 24 25 26 27 a0 a4 a5 a6 a7 a1 a5 a6 a7
Andries Brouwer then replies:
Date Mon, 15 Sep 2003 23:28:00 +0200
From Andries Brouwer <>
Subject Re: 2.6.0-test1, -test4 control key "stuck"
On Mon, Sep 15, 2003 at 08:55:46PM +0000, xsdg wrote:
> What would happen if the kernel received two keypress events, and then one
> key-release event for a single key? I'd imagine that it'd disregard the
> duplicate keypress
The answers differ for 2.4 and 2.6. For 2.4 each keypress is a keypress, and key releases are rather unimportant as long as the key is not a modifier key. For 2.6 we have synthetic repeat, so a second keypress from the keyboard is ignored, the key repeats with kernel-defined frequency, and the repeat is ended by the key release.
> any idea what might cause the key sticking problem?
If a key release is not seen, 2.4 doesnt mind, but 2.6 keeps repeating.
> Also, I'm not sure how the final issue I described
Do not recall all items of all letters I answer - sorry.
Andries

In other words, the bug may be caused by a combination of faulty hardware and some very naive code in the keyboard driver that was added in kernel 2.6.

(In reply to comment #136)
> My scenario (comment #73) was related to a KVM USB switch. Whenever I bypass
> the switch the issue disappears. The ACPI workaround didn't work for me.
>
I would say that yours is not bug 9147.

(In reply to comment 137) Thank you very much, David, for showing us the "root" of the bug; I will CC this comment to Andries Brouwer just in case he can confirm.

(In reply to comment #119)
> i8042.nomux=1 is no longer necessary with kernel >= 2.6.25
>
> I have tried the latest linux-2.6.27-rc9 with the patch in
> http://bugzilla.kernel.org/show_bug.cgi?id=11549
> (http://bugzilla.kernel.org/attachment.cgi?id=18047); it does not fix the
> bug.
> Just shooting in the dark, really.
>
Bug 8740 is not yet fixed, so i8042.nomux=1 is still necessary:
[ 3203.402077] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.403431] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.405245] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.406608] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.407971] psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1
[ 3203.407979] psmouse.c: issuing reconnect request
[ 4399.418433] atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[ 4399.418464] atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

(read at https://bugs.launchpad.net/linux/+bug/119194/comments/34) perhaps acpi_osi=Linux can work around the problem (like acpi=off but without the negative effects)

Please see bug 1203. On this hardware I also have clocksource problems. I am currently booting with 'clocksource=acpi_pm i8042.nomux=1' to have a usable clock.
Maybe we have the same issue as bug 1203? See also comment 39.

Created attachment 20049
Tentative workaround patch
To be applied against the 2.6.28 branch, but it is quite simple, so a plain copy & paste will be enough. This patch tries to force a release for keycode 0xe0. According to Daniele, this patch greatly improves the situation, but does not solve it completely. The patch has been inspired by the patch provided for another notebook (see bug http://bugzilla.kernel.org/show_bug.cgi?id=12021, patch http://bugzilla.kernel.org/attachment.cgi?id=19301). In my opinion, this is nothing but a workaround which will not cure the evil at its root, but it may help until the real bug is tracked down. Could anyone apply this patch and comment on it? Thanks.

I can confirm that with the patch in attachment 20049 the problem happened only once in a week, while without any patch I experience it 10-20 times per day.
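Andries Brouwer's explanation quoted above and the forced-release workaround can both be illustrated with one toy model. It is neither input-core nor atkbd.c code, and every name in it is hypothetical: kernel-side autorepeat keeps re-emitting the last pressed key until that key's release is seen, so a lost release byte means endless repeat. A forced-release list synthesizes the release right after the press for selected codes. This shows the general idea behind the bug 12021 approach, not the exact content of the patch in attachment 20049.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of software autorepeat: at most one key repeats at a time. */
struct soft_repeat {
    int key;                 /* key being repeated, or -1 when idle */
};

static void sr_event(struct soft_repeat *sr, int key, bool pressed)
{
    assert(sr != NULL);
    if (pressed)
        sr->key = key;       /* a press starts the repeat; a duplicate press of the same key changes nothing */
    else if (sr->key == key)
        sr->key = -1;        /* only the matching release stops the repeat */
}

/* What the repeat timer would emit on its next tick (-1 = nothing). */
static int sr_tick(const struct soft_repeat *sr)
{
    assert(sr != NULL);
    return sr->key;
}

/* Forced release: for listed codes, report the release right after the
 * press, so a release lost in transit cannot leave the key repeating. */
static void sr_event_forced(struct soft_repeat *sr, const int *forced,
                            size_t nforced, int key, bool pressed)
{
    sr_event(sr, key, pressed);
    if (!pressed)
        return;
    for (size_t i = 0; i < nforced; i++) {
        if (forced[i] == key) {
            sr_event(sr, key, false);   /* synthesized release */
            break;
        }
    }
}
```

The trade-off is visible in the model: a key on the forced-release list can no longer autorepeat, because its release always follows its press immediately, which is one reason such a list fits a workaround rather than a root-cause fix.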
I am currently booting without this patch and with the kernel command line 'vga=791 lapic hpet=force clocksource=hpet i8042.nomux=1' to test whether the problem also happens with the HPET timer enabled.
@Daniele, sorry that I have not responded for a long time... unfortunately, I really lack free time :(
1) 2.6.28 sidux kernels do not fix the issue
2) the "acpi_osi=Linux" (http://bugzilla.kernel.org/show_bug.cgi?id=9147#c141) grub cheat code does not fix the issue here either

@Eugen: OK, many thanks for testing this. I can also confirm that 'hpet=force clocksource=hpet' does not work around the bug; it is not relevant. The only known partial workaround is G. Schmitt's patch in comment 143.

Unfortunately, on this machine acpi=off doesn't work either, as it stops the WLAN card from working... Is there any other option that will stop the bug from occurring, or at least keep it from occurring every day?

@David: the best workaround is NOT to turn off ACPI but to unload the ac, battery and thermal modules.

Ah, thanks. It seems that thermal is the main culprit for me, as the bug starts appearing when the computer heats up and the fan starts spinning faster. I noticed something interesting today, though: since (if I remember correctly) killing X resets the keyboard somehow, I tried switching to the console using Ctrl-Alt-F1, Ctrl-Alt-F2 etc., but Ctrl-Alt-F1 seems to be where X has placed itself (usually Ctrl-Alt-F7, I believe?), because it just brought me back to X. Ctrl-Alt-F2 brought up a launcher. Be that as it may, one of those combinations, presumably the first one, released the Control key again. If hopping out of X and back in again, or some such thing, consistently clears up the bug, that will go a long way towards making the system usable.

I can confirm what I said earlier. While the X server is apparently still on console 7, pressing Ctrl-Alt-F1 when the bug appears refreshes the screen somehow and clears the bug. Quite possibly, pressing Ctrl-Alt-<F-anything> will do the same.

@David: as you said, that operation resets the X keyboard driver.
But the problem is still at the hardware/kernel level, since it is not normal to have to reset the driver when the problem occurs. We could configure a key that resets Xorg when the keyboard goes crazy... but a keyboard should not go crazy in the first place.

I can confirm (by empirical experience) that with the ACPI modules loaded (ac + battery + thermal) and the battery in its slot, the bug appears often.

@ykzhao: can you please summarize the status? Do you have any strategy to fix this issue? Thanks

I'd like to report a potentially positive finding. After upgrading to Ubuntu 9.04 (Jaunty Jackalope), which I did in April when it was released, I haven't experienced this bug. The kernel version is "2.6.27-11-generic". I can't remember what it was before. Is anyone else here using Jaunty Jackalope with this or a newer kernel, and able to confirm that the bug no longer manifests itself on their system?

I am using Debian Squeeze with kernel 2.6.28-1-686 and the bug is still happening. I have often experienced false negatives: with some kernels you have to try hard before triggering the bug. On my laptop, battery and wireless usage seem to make it happen sooner.

That's very interesting! Your kernel version is certainly newer than mine, and yet the bug and the messages about unknown keys in dmesg are both gone here, even though the system is used the same way as before. At least we can be fairly sure that the messages and the bug are associated, if anyone doubted it before.

Since I moved from Gentoo to Fedora 2 years ago I've experienced the problem only twice, back in the Fedora 8 days. Now with Fedora 10, I've never seen it: is it possible for you, Daniele, to try a recent Fedora on the system and see what happens?

@Michal: good news! I have been using the F10 live CD and everything works fine! That kernel has the ac, battery, thermal modules built in. I would say that Fedora10/Fedora11 are not affected, maybe thanks to some magic patch. How can we isolate them?
I would focus on the Fedora11 patches; can somebody help in this search? By the way, I am using Arch Linux 2.6.30 now. The next thing I will do is try booting my Arch Linux system with the Fedora kernel.

I have downloaded http://mirror.cc.vt.edu/pub/fedora/linux/updates/11/SRPMS/kernel-2.6.29.5-191.fc11.src.rpm, which contains the 2.6.29 kernel and all the patches. First I will try this 2.6.29 kernel with my current .config; then, if the bug is still present (as it should be), I will start testing patches, starting from the most relevant.

Created attachment 22178
full list of patches applied by Fedora11 to the 2.6.29 kernel
Created attachment 22179
test script to trigger kernel bug 9147
These are the steps to trigger bug 9147:
1) open a terminal
2) run ./9147test.sh
3) quickly press the Up arrow and then Enter to run the script again
4) repeat (3) until the Enter key gets stuck
If the Enter key never gets stuck (it usually does in fewer than 10 executions), then the system is not affected by the bug.

Created attachment 22191
bug 9147 test results on kernels 2.6.29 (w/o fedora11 patches), 2.6.29-5 (w/o fedora11 patches), 2.6.30
From my tests (see attachment) it can clearly be deduced that the bug is not triggered when using built-in ACPI modules; I have not yet triggered it with ac, battery, thermal, container and processor built in, and I will keep using this kernel. I will add a comment if I trigger the bug. I invite other testers to use a kernel with these ACPI modules built in to see if the bug appears; I assume it is very hard to trigger when the modules are built in, whereas it can easily be triggered with my test script when they are separate modules. Can somebody please explain why there is such a difference? The bug has apparently become harder to hit than with previous kernels, since with previous kernels it was easily triggered with built-in ACPI modules too. So right now the best workaround is to compile the above-mentioned modules as built-in.

To sum up: Fedora10/Fedora11 do not have any patch which addresses bug 9147 as a side effect; it's just that they compile the guilty modules as built-in, making the bug go away or very hard to trigger. Also, I haven't yet been able to enable (via the kernel .config) the right options to make the dmesg messages appear again when a key gets stuck.

Created attachment 22192
.config with minimalistic features for my system
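The trigger reported earlier (battery/thermal drivers loaded plus a program polling them) can be reproduced without a desktop applet, for example in the background while following the 9147test.sh steps. The sketch below simply re-reads a status file at a fixed interval. The function name is hypothetical, and the example path is an assumption: older kernels expose files such as /proc/acpi/battery/BAT0/state, and the battery name varies between machines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Re-read `path` `iterations` times, sleeping `interval_ms` between reads,
 * as a battery or thermal applet would. Returns the number of reads that
 * succeeded (i.e. the file could be opened). */
static int poll_status_file(const char *path, int iterations, long interval_ms)
{
    char buf[512];
    int ok = 0;
    struct timespec delay;

    assert(path != NULL && iterations >= 0 && interval_ms >= 0);
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0; i < iterations; i++) {
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            /* Read the whole file; for ACPI status files this typically makes
             * the kernel evaluate the corresponding firmware methods. */
            while (fread(buf, 1, sizeof buf, f) > 0)
                ;
            fclose(f);
            ok++;
        }
        if (interval_ms > 0)
            nanosleep(&delay, NULL);
    }
    return ok;
}
```

For example, poll_status_file("/proc/acpi/battery/BAT0/state", 1200, 500) keeps polling for roughly ten minutes.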
After about 2 weeks I triggered the stuck-keys bug (once) even with the built-in ACPI modules. To trigger kernel bug 9147 also with recent kernels that have built-in ACPI functionality, it is necessary to run a program which constantly monitors the thermal sensors (for example a tray icon plugin). So the bug is not fixed; it's just easier to detect when the ACPI drivers are modules and the i8042 is overloaded.

I also think that the bug should be UNASSIGNED if nobody is really working towards a solution.

Just to let others know that I just came across this bug again with 2.6.31-0.204.rc9.fc12.i686:
atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.
It's not happening very often, but it's still here. Since this laptop is ~5 years old, I guess this bug will still be in the kernel long after the hardware itself is gone :).

@Michal: I think that when Windows XP is declared unsupported, many people will try installing Linux on hardware from this generation and, guess what, they'll say that with Linux their laptop keyboard doesn't even work...

Likely some of them will, but the problem is that the bug does not suck *that* much, so it doesn't interest enough upstream kernel people. That's how it goes...

Yes, but some weeks ago Alexey Starikovskiy got this bug assigned; is something being investigated for a solution?

@Alexey Starikovskiy: any news?

Interesting related pages:
http://lkml.indiana.edu/hypermail/linux/kernel/0602.2/1795.html
https://bugzilla.redhat.com/show_bug.cgi?id=181457
My /sys/bus/serio/devices/serio0/softrepeat is 0

Daniele, thanks for pointing that out. Recently on Fedora 12 a similar problem emerged, with the same symptoms (some key is "locked", usually Tab) but without the message in dmesg.
Linux dhcp-lab-216.englab.brq.redhat.com 2.6.31.6-166.fc12.i686 #1 SMP Wed Dec 9 11:14:59 EST 2009 i686 i686 i386 GNU/Linux
I just added "nosoftrepeat"; we will see if that helps for me.
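Checking the softrepeat attribute mentioned above is easy to script. A minimal sketch follows (the function name is hypothetical), assuming the attribute holds a single integer, as the reported value 0 suggests:

```c
#include <assert.h>
#include <stdio.h>

/* Read a sysfs attribute holding one integer, such as
 * /sys/bus/serio/devices/serio0/softrepeat. Returns 1 if the value is
 * non-zero, 0 if it is zero, and -1 if the file cannot be opened or parsed. */
static int read_sysfs_flag(const char *path)
{
    FILE *f;
    int value;
    int rc;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    rc = fscanf(f, "%d", &value);
    fclose(f);
    if (rc != 1)
        return -1;
    return value != 0;
}
```

If this returns 0 for the serio0 path, the option is already disabled for that port, so adding a boot option that disables it again cannot be expected to change anything.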
Could someone please report status with the latest 2.6.32 vanilla kernel?

@alexey: I have just tried with my 2.6.33-rc1 (from wireless-testing) and the result is the same: a key got stuck after a couple of runs of the 9147test.sh script (attached), see comment 160.

Created attachment 24292
9147test.sh
Adding 'nosoftrepeat' does nothing on my hardware (the bug is still there), and anyway my softrepeat is already 0, so there is no point in trying 'nosoftrepeat' if /sys/bus/serio/devices/serio0/softrepeat is already 0.

Could you please check whether the latest patch from this bug report helps: http://bugzilla.kernel.org/show_bug.cgi?id=14858

OK, I am going to check the patch in a while. Do you know how to detect the BIOS and EC versions? My notebook is basically a Fujitsu Siemens V2000 with different branding (Maxdata).

dmidecode may have this information.

@alexey: I have applied accel_query_propogation.patch, and after running 9147test.sh 4 times I get the Enter key stuck bug. I have attached the dmesg of the last 4 9147test.sh executions. My impression is that the patch has not changed the bug's behaviour, but I might not be sensitive to changes on the order of milliseconds... Please tell me if I can test other patches and/or collect other debug information.

Created attachment 24298
dmesg of last 4 runs of 9147test.sh with a 2.6.33-rc1 kernel + accel_query_propogation.patch
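The contents of 9147test.sh are not reproduced in this thread. Purely as a hypothetical illustration of the kind of module load/unload cycle discussed in this bug (the 'battery' and 'thermal' names come from the reports; the round count, delay and dry-run default are assumptions, and this is not the attached script), a Python sketch could look like this. With dry_run=False it needs root and ACPI drivers built as modules.

```python
import subprocess
import time

# Module names reported in this bug as able to trigger the stuck keys when
# loaded/unloaded ('processor', 'ac' and 'container' reportedly do not).
TRIGGER_MODULES = ("battery", "thermal")


def build_cycle_commands(modules=TRIGGER_MODULES, rounds=3):
    """Return modprobe argv lists for `rounds` unload/reload cycles (hypothetical)."""
    commands = []
    for _ in range(rounds):
        for name in modules:
            commands.append(["modprobe", "-r", name])  # unload
            commands.append(["modprobe", name])        # reload
    return commands


def run_cycle(commands, delay_s=1.0, dry_run=True):
    """Print the commands (dry run) or execute them with a pause in between."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
            continue
        # Requires root and drivers built as modules; failures are ignored
        # so that one busy module does not abort the whole run.
        subprocess.run(argv, check=False)
        time.sleep(delay_s)


if __name__ == "__main__":
    run_cycle(build_cycle_commands(), dry_run=True)
```

The dry-run default is deliberate: it lets you review exactly which modprobe calls would run before touching a live system, and type into a terminal while the real cycle runs to watch for stuck keys.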
Loading/unloading the battery module checks the battery status, but in the attached dmesg it's not clear whether the battery is there or not. Please ignore such messages: the battery was present (fully charged) during the whole test (which makes the bug easier to trigger), but I think it may be broken (sometimes it does not report its capacity). The bug also happened before this battery glitch (caused by shock damage to the battery, which probably damaged its capacity sensor), so it's not relevant.

The 'processor', 'ac' and 'container' modules are not responsible for triggering the bug. I can trigger the bug by using the 'battery' or 'thermal' modules. (Thanks to james_mcl for pointing this out.)

Daniele, reporting back after a long time... I was pretty busy. We had been suffering from this bug for a long time, but we completely reinstalled sidux one month ago, using the 2009-03 release. The bug is gone; more precisely, the bug has not appeared since the installation... A newer sidux release (2009-04) is currently available. Worth a try? Best regards, Eugen

The bug is likely still present in the latest sidux. When I moved from home-brewed Gentoo to Fedora 8 it "disappeared" too, but now it's sometimes back, hopefully reduced to a livable minimum.

Michal, I do agree here... There is interesting info and a link in a similar bug report here: http://bugzilla.kernel.org/show_bug.cgi?id=9448#c37

Created attachment 24530
test script for kernel bug 9147

@mnowak: that happens because the battery and thermal modules are compiled into the kernel and not standalone (please confirm).

@eugen: interesting... I have asked them to test the script.

(In reply to comment #185)
> @mnowak: that happens because battery and thermal modules are compiled in
> kernel and not stand alone (please confirm)

Yes. That's what I think. With Gentoo I used to have my own .config, where those modules were standalone. Now in Fedora they are compiled in and the problem is much less frequent (since Fedora 8): twice a day?
I can see the problem from time to time on my Fedora 12 (still compiled in), but without the

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

messages in dmesg.

Eugen's interesting link is http://ajaxxx.livejournal.com/62378.html. It does indeed look very relevant, which is why I'm posting the URL here even though the 9448 comment already links to it.

@mnowak: yes, when compiled into the kernel that's what has happened so far. The fact that it seldom happens when not compiled as a module is already a clue towards a solution, in my opinion.

Regarding comment 187: I am not fully getting the point. So we have a tight race condition with the hardware as the primary source?

(In reply to comment #188)
> regardin comment 187: I am not fully getting the point, so we have a tight
> race condition caused by hardware as first source?

My (poor) understanding is that the X race has nothing to do with our HW problem. I think we now have two "stuck keys" problems at the same time (with the HW one largely reduced). But I could be wrong.

@mnowak: that was also my first understanding when reading the SIGIO article. The only other possible scenario would be that there *is* a SIGIO glitch in current Xorg/kernel, and that boxes with i8042 controllers (or similar) have 2 or more peripherals which end up locking the same single piece of hardware (perhaps the i8042 controller itself, which somehow "emulates" locking of 2 or more peripherals in an inconsistent way), thus leading to locking glitches. But this would be very strange, since no other hardware combinations show the "bug" and since it has been happening for so long (kernel 2.6.9 being the first verified). Also, there has been other research on this bug, which seemed to identify the source elsewhere.

FYI: Yesterday I suffered from

atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
atkbd.c: Use 'setkeycodes e060 <keycode>' to make it known.

while in vim: I got a "never ending" series of 'X\n'. It's been quite a few months since I last saw this issue. The box is a fully updated F-12 with the following kernel:

Linux <HOST.NAME> 2.6.31.12-174.2.3.fc12.i686 #1 SMP Mon Jan 18 20:22:46 UTC 2010 i686 i686 i386 GNU/Linux

Does the problem still exist in the latest upstream kernel, say 2.6.35 or 2.6.36-rc?

It certainly fails with recent minor releases of 2.6.34. Will test 2.6.35 when Fedora 14 is out.

I am using 2.6.35-rc4 and it seems to be fixed; I will report back if that's not the case.

2.6.35.6-48.fc14.i686 just failed for me with:

[63135.617931] atkbd serio0: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[63135.617945] atkbd serio0: Use 'setkeycodes e060 <keycode>' to make it known.

Does the problem still exist in the latest upstream kernel?

I can see stuck keys from time to time (let's say once per month), but I'm not sure it's the "atkbd ..." problem. Will have a look.

I tested 2.6.38 and saw no hang, but I may need months to reproduce it...

This is on a Thinkpad R40, with distribution kernel Linux debian 2.6.38-2-686 #1 SMP Tue Mar 29 17:27:45 UTC 2011 i686 GNU/Linux. I installed stress (the copyright file says it's from http://weather.ou.edu/~apw/projects/stress/) and slightly adapted its manpage example:

debian:~$ stress --verbose --cpu 8 --io 4 --vm 2 --vm-bytes 128M --vm-hang 5 --timeout 90s

After switching the terminal window and typing

debian:~$ echo thequck bown fox jumps over th lazy dog ... te qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

I get the expected

[210307.542572] atkbd serio0: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
[210307.542579] atkbd serio0: Use 'setkeycodes e060 <keycode>' to make it known.

in the dmesg output. Immediately after stress exits, the stuck-keys effect still happens frequently, but after a while it is back to about once every 5 minutes.

It's great that kernel bugzilla is back.
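When comparing kernels, it can help to count these warnings instead of eyeballing dmesg. A minimal Python sketch, assuming only the two message formats quoted in this bug (the older `atkbd.c:` prefix and the newer `atkbd serio0:` prefix); the regex, the accepted "raw" alternative, and the function name are illustrative assumptions, not part of any kernel tool.

```python
import re
from collections import Counter

# Matches the formats quoted in this bug, e.g.
#   atkbd.c: Unknown key released (translated set 2, code 0xe0 on isa0060/serio0).
#   [ 3403.034926] atkbd serio0: Unknown key pressed (translated set 2, code 0x0 on isa0060/serio0).
# Accepting "raw" as well as "translated" is an assumption.
UNKNOWN_KEY_RE = re.compile(
    r"atkbd(?:\.c:| serio\d+:) Unknown key (pressed|released) "
    r"\((translated|raw) set (\d+), code (0x[0-9a-fA-F]+) on ([^)]+)\)"
)


def count_unknown_keys(log_text: str) -> Counter:
    """Count atkbd 'Unknown key' warnings per (event, scancode) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNKNOWN_KEY_RE.search(line)
        if match:
            event, _mode, _set, code, _port = match.groups()
            counts[(event, int(code, 16))] += 1
    return counts
```

For example, save the output of `dmesg` to a file before and after a stress run and pass each file's contents to `count_unknown_keys` to compare the number of `("released", 0xe0)` events.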
I do not have any idea about this bug. Anyway, can you please verify whether the problem still exists in the latest upstream kernel?

Yes, still an issue with kernel 3.4.0-030400rc5-generic-pae #201205011817 SMP Tue May 1 22:31:34 UTC 2012 i686 athlon i386 GNU/Linux:

[ 3403.034926] atkbd serio0: Unknown key pressed (translated set 2, code 0x0 on isa0060/serio0).
[ 3403.034937] atkbd serio0: Use 'setkeycodes 00 <keycode>' to make it known.
[ 3403.035444] atkbd_interrupt: 36 callbacks suppressed

My laptop is a Fujitsu Siemens Amilo A1645.

Sorry, but I am going to dispose of the hardware in question soon.

Hey there, if you can test it on your computer and see whether it's fixed in 2014 kernel releases, that would be great.

Bug closed as the hardware is not available any more. Please feel free to re-open it if anyone can reproduce the problem in the latest upstream kernel.