Most recent kernel where this bug did not occur: 2.6.23.1 Distribution: Gentoo Hardware Environment: x86 (Intel Centrino 1.5 Ghz) Software Environment: Problem Description: Steps to reproduce: Boot the computer and try to type anything Almost everything is explained in the summary. I can't use the keyboard with 2.6.24 release candidates (the problem also occured with 2.6.24-rc1). I use exactly the same configuration used under 2.6.23.1 for input devices but it doesn't work. Does anybody has any clue to solve this annoying problem ?
Created attachment 13523 [details] Kernel configuration file
Is the keyboard detected? Could you please boot with :i8042.debug" kernel parameter and post your full dmesg? Thanks!
It seems the keyboard is detected, but the keyboard doesn't work.
Created attachment 13545 [details] Output of dmesg
This is a partial dmesg... Could you try booting with :i8042.debug log_buf_len=262144"? Anyway, I see there is some data coming from the keyboard port. Could you also compile and load evbug module? Thanks!
Created attachment 13547 [details] Output of dmesg with evbug So, I have booted with the additional parameters and evbug loaded.If I type anything, nothing new appears in the kernel logs. So, the keyboard is detected, but if I type, nothing happens.
since the bug seems to trigger rather deterministically (you have no keyboard at all), it might make sense to try a bisection run to figure out what broke it between v2.6.23 and v2.6.24-rc2. a quick git introduction can be found at: http://www.kernel.org/doc/local/git-quick.html a QuickStart: git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6.git cd linux-2.6.git # check that this kernel still has a broken keyboard! git-bisect start git-bisect bad # mark current kernel bad git-bisect good v2.6.23 # mark .23 good and from this point on, build and boot every kernel that git-bisect presents you, and mark them either as "git-bisect good" or "git-bisect bad". After around 10-15 build+reboots (not all will be full kernel rebuilds - especially at the end it will speed up) git-bisect should return a single "guilty" commit.
I have moved and unfortunately, my broadband internet connection is not yet activated. So, it's difficult to get clone of the current git kernel. However, I have noticed something strange.I use gentoo and if I type "I" after the promt "press I to enter interactive boot mode", it has an effect and the keyboard works. As soon as I get the first question asking to boot services, the keyboard doesn't work. So, there happens something in between which breaks the keyboard input.
Hm, doesnt udevd start up in that timeframe?
Udevd indeed starts in that timeframe. I have tried git-bisect and unfortunately, it doesn't seem to be conclusive. There was 37 revisions yet to test but from that point, it's impossible to compile the kernel. The commit at that point was "[a2883dfa2e4a94b24109b2bfe735561e50cc44b4] Pull thermal into release branch". I get plenty of errors during "make prepare" (see the last attachment)
Created attachment 13846 [details] compile errors
> Udevd indeed starts in that timeframe. I have tried git-bisect and > unfortunately, it doesn't seem to be conclusive. There was 37 > revisions yet to test but from that point, it's impossible to compile > the kernel. The commit at that point was > "[a2883dfa2e4a94b24109b2bfe735561e50cc44b4] Pull thermal into release > branch". > I get plenty of errors during "make prepare" (see the last > attachment) does it build if you fudge the bisection point by say 3 commits via: git-reset --hard HEAD~3 and try to build from there? git-bisect can work with arbitrary bisection points, so if one point does not build, try to find a nearby, sane-looking commit via "git-bisect visualize" and jump there via git-reset.
I finally manage to find the problematic commit with git-bisect and the result is quite surprising. The bad commit is the following: db1c291af7ad748777371f25b9ff92e3e5aba38e is first bad commit commit db1c291af7ad748777371f25b9ff92e3e5aba38e Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Wed Sep 26 19:43:41 2007 +0400 ACPI: SBS: Make SBS reads table-driven. Re-factor SBS functions to use tables and cycles for repeated operations. Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> So, the problem isn't directly to input drivers. The problem of the keyboard seems rather to be a symptom of a problem caused by the ACPI SBS driver. Furthermore, the computer is also very slow when this problem occurs. Also, it's impossible to shut down the computer by pressing the power button more than 20 seconds with this problematic kernel. What I find very strange is that we end up with a commit made before 2.6.23.However, with the stable 2.6.23.9 kernel, the problem doesn't occurs.
Created attachment 13850 [details] Log of git-bisect
Could you please attach an acpidump output?
Created attachment 13854 [details] Output of acpidump This dump is made with kernel 2.6.22.14 which is the last kernel which really works well for me.
Created attachment 13865 [details] disable callback in sbs.c Please check if keyboard works with this patch. Also, could you please check if kacpi* threads consume the CPU time?
That works much better with the patch you suggested. The keyboard works and the kacpi_notify and kacpid processes consume each one 0% of CPU time. Is there any drawback with this patch or do you intend to submit it for the 2.6.24 kernel ? Thanks for your help.
This patch is debug only -- battery and charger states will not be updated. I'll try to come up with proper patch soon, as direction seem to be found.
Created attachment 13872 [details] Ignore alarms from unknown devices Please check if this patch works. It should also allow you to notice AC adapter plug-in/out.
Created attachment 13889 [details] Ignore alarms from unknown devices #2 This patch work better...
Created attachment 13892 [details] Reset alarm bit Please try if this one line patch helps. Unknown devices might be irrelevant...
Created attachment 13893 [details] Ignore alarms from unknown devices #3 This one should be applied on top of 'Reset alarm bit' patch
So, I have applied the two last patches you proposed. There remains two problems. I get plenty of errors like this in dmesg: drivers/input/serio/i8042.c: 1c <- i8042 (interrupt, 0, 1) [28196] There is also a problem with the computation of the remaining capacity of the battery. The KDE applet indicates a remaining time of 14 hours ! If it could be true; that would be nice. But I don't thinks it's the case. I have also noticed that the present rate indicated by cat /proc/acpi/battery/BAT0/state increase from around 370 to 1400 mW when I unplug the AC adapter. With the kernel 2.6.22, this rate stays equal to 9979 mW.
Created attachment 13894 [details] errors with i8042 (in dmesg) This is the result of dmesg | grep i8042
did you try to remove 'i8042.debug'? please post contents of /proc/acpi/battery/*/* and /sys/class/power_supply/*/*
Indeed, I had forgotten that I had put i8042.debug as a boot parameter. As you asked, here is the content of /sys/class/power_supply/: BAT0 -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0b/PNP0C09:00/ACPI0001:00/ACPI0002:00/power_supply/BAT0/ BAT1 -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0b/PNP0C09:00/ACPI0001:00/ACPI0002:00/power_supply/BAT1/ sbs-charger -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0b/PNP0C09:00/ACPI0001:00/ACPI0002:00/power_supply/sbs-charger The content of /proc/acpi is put in attachment.
Created attachment 13895 [details] Content of /proc/acpi
I meant the contents of files
Sorry for the misunderstanding. So you can find the contents of all these files in attachment.
Created attachment 13906 [details] Contents of /proc/acpi and /sys/class/power_supply
Francois, there is no error. Before 24-rc, SBS gave you discharge in Watts (electric power), but now it gives you discharge in Ampers (electric current), to convert it back to Watts you will need to multiply it by current voltage (also known). 1.695A * 15.896V = 26.943W -- about right. I'm moving bug into resolved, so patches could be pushed into kernel. Thanks for report and testing.
So, if I have well understood, the problem is caused by the computation made by Klaptopdaemon. It probably needs to be updated to take into account the change of units of measurement.
> Francois, there is no error. Before 24-rc, SBS gave you discharge in > Watts (electric power), but now it gives you discharge in Ampers > (electric current), to convert it back to Watts you will need to > multiply it by current voltage (also known). 1.695A * 15.896V = > 26.943W -- about right. ugh. Isnt that a kernel ABI change?
:) No. ABI stands for Application *Binary* Interface. Output from 'state' is ascii text, not a binary, so ABI is not changed by this. present rate has (and always had) a units appended to the number (1695 *mA*), so if some userland program ignored it, this is a problem of the program.
> :) No. > ABI stands for Application *Binary* Interface. Output from 'state' is > ascii text, not a binary, so ABI is not changed by this. present rate > has (and always had) a units appended to the number (1695 *mA*), so if > some userland program ignored it, this is a problem of the program. this is one of the most stupid arguments i've ever seen. If there's ever a gratituous ABI change this is one. Of course text contents of /proc and /sys are just as part of the ABI as anything else. There are exceptions, but this is not one of them. user-space started reasonably assuming that the units is watts, because you nowhere said that it's going to be changed to amperes. So now there's user-space code that relies on it being watts and you MUST NOT BREAK THAT. that change needs to be reverted. The proper solution is to add _another_ entry that gives amperes and then slowly and gently phase out the old units. (Of course that's more work, but that's how compatibility and responsible engineering looks like.) Ingo
So you are saying every user has to implement a parser and convertor for all possible veriations of metric and imperial UOM?
Created attachment 13907 [details] return rate in same units as capacity Please check if this patch makes klaptopd happy...
Ingo, /proc/acpi/battery/state was able to return rate in mA or mW from the beginning. this is not changed. This is why there is a units added to each string, and this units was there since the very beginning as well. So this is not stupid to assume that userspace program should look into this field. Dmitry, this is the design of /proc/acpi. I'm not changing a single bit of it. In new /sysfs you will not have to parse anything -- file 'current' is measured in mA always, and 'energy' is always measured in mWh.
The last patch makes indeed klaptopdaemon happy.
Len, last 3 patches need to be pushed up.
> Ingo, > /proc/acpi/battery/state was able to return rate in mA or mW from the > beginning. this is not changed. This is why there is a units added to > each string, and this units was there since the very beginning as > well. So this is not stupid to assume that userspace program should > look into this field. well, it's still an ABI change, because - absent very clear specifications otherwise, if an app becomes reliant on some userspace-visible detail of the kernel, the rule is for the kernel to keep that aspect, not for the app to change. i got upset about the ABI comment - "text" is a subset of binary data and is thus part of the kernel ABI, so it has just as strict compatibility requirements. We can sometimes get away with changing the ABI - when we are lucky enough that apps have not become reliant on a particular detail yet (for example the description fields on /proc/interrupts), but quite often this is not the case. I'd suggest you notify the klaptopdaemon developers about the unintended dependency they have built into their app, and once it's fixed it might be possible to phase that out.
IMHO, you stretch ABI term too far. If data needs text parser to be understood, it is not binary and not part of binary. klaptopd made wrong assumptions about what fields it could "ignore" during parsing, so only "bug-compatibility" was broken.
> IMHO, you stretch ABI term too far. If data needs text parser to be > understood, it is not binary and not part of binary. klaptopd made > wrong assumptions about what fields it could "ignore" during parsing, > so only "bug-compatibility" was broken. Well, i just gave you the fact that "text" data is part of the ABI just as much as "non text" data - basically every file in /proc/* is an example for that. (And i guess lkml would be a more appropriate forum for compatibility policy questions - please bring it up there.)
<wakes up> Please restore the contents of /proc files to exactly what they were previously. That is all. Yes, you could argue that userspace which parsed eariler versions of the /proc files weren't doing it correctly. Well tough luck, that's our fault for designing poor interfaces. We should remain compatuible with that userspace rather than blaming it for being confused. It's very easy: don't break existing userspace. > Len, last 3 patches need to be pushed up. I doubt if Len even read this. Can you please send them out via email and cc me? That's assuming they restore these files to their previous contents. If they don't, please fix that.
bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9362 > > > > > > ------- Comment #45 from akpm@osdl.org 2007-12-07 14:46 ------- > <wakes up> > > Please restore the contents of /proc files to exactly what they were > previously. That is all. Done > Yes, you could argue that userspace which parsed eariler versions of the > /proc files weren't doing it correctly. Well tough luck, that's our fault > for designing poor interfaces. We should remain compatuible with that > userspace rather than blaming it for being confused. > > It's very easy: don't break existing userspace. > >> Len, last 3 patches need to be pushed up. > > I doubt if Len even read this. Can you please send them out via email and cc > me? > That's assuming they restore these files to their previous contents. If they > don't, please fix that. Done. > >
Thanks for the quick resolution, guys. The 3 patches (taken from the list) are now in the acpi release tree for 2.6.24-rc5. Andrew's doubt was correct, however -- I didn't notice this report until now, for I was not on the CC. In general, adding acpi-bugzilla@lists.sourceforge.net, as I'm doing now, to bugs that may involve ACPI is good practice. I do periodically sift through bugzilla to look for trouble in the ACPI category, but periodically != constantly. François, To answer your question about the regression being checked into the tree 3 months ago during 2.6.23... If you run gitk on that commit, you'll notice that it figures out where in the release cycle that commit lands. If it said "Preceeds 2.6.23", then it means it shipped in 2.6.23 - no matter what tag it follows. But this one "Preceeds 2.6.24-rc1", which means it first shipped upstream in 2.6.24-rc1, even though it was checked into a tree well before that. Indeed, this patch series was checked into 2.6.23 much earlier than the final commit, and it was "re-based" late in 2.6.23 in preparation for 2.6.24... But more importantly, you noticed this first at 2.6.24-rc2. That means that nobody with a system like yours (Smart battery and Klaptopdaemon bug) ran the -mm tree for the period this was in 2.6.23-rc*-mm*, nor ran the upstream kernel tree until 2.6.24-rc2. While it is true that systems with sbs is quite rare (for many years, Linux did not support them at all) this testing gap is not comforting.
Fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a21e4fe587ebb793bf3a1c02755f8a845170328 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c2d00f2d1bf8dd721f5557b0df23729addc1898d http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=09f1fb41ad45bc18abe07c62f7b56560571584d1 can be closed.