Bug 9362

Summary: The keyboard doesn't work, the computer is very slow
Product: ACPI Reporter: François Valenduc (francoisvalenduc)
Component: Power-BatteryAssignee: Alexey Starikovskiy (astarikovskiy)
Status: CLOSED CODE_FIX    
Severity: high CC: acpi-bugzilla, akpm, bunk, dmitry.torokhov, francoisvalenduc, greg, mingo, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9243    
Attachments: Kernel configuration file
Output of dmesg
Output of dmesg with evbug
compile errors
Log of git-bisect
Output of acpidump
disable callback in sbs.c
Ignore alarms from unknown devices
Ignore alarms from unknown devices #2
Reset alarm bit
Ignore alarms from unknown devices #3
errors with i8042 (in dmesg)
Content of /proc/acpi
Contents of /proc/acpi and /sys/class/power_supply
return rate in same units as capacity

Description François Valenduc 2007-11-13 00:52:46 UTC
Most recent kernel where this bug did not occur: 2.6.23.1
Distribution: Gentoo
Hardware Environment: x86 (Intel Centrino 1.5 Ghz)
Software Environment:
Problem Description:

Steps to reproduce: Boot the computer and try to type anything

Almost everything is explained in the summary. I can't use the keyboard with 2.6.24 release candidates (the problem also occured with 2.6.24-rc1). I use exactly the same configuration used under 2.6.23.1 for input devices but it doesn't work.

Does anybody has any clue to solve this annoying problem ?
Comment 1 François Valenduc 2007-11-13 00:53:24 UTC
Created attachment 13523 [details]
Kernel configuration file
Comment 2 Dmitry Torokhov 2007-11-14 08:05:20 UTC
Is the keyboard detected? Could you please boot with :i8042.debug" kernel parameter and post your full dmesg? Thanks!
Comment 3 François Valenduc 2007-11-14 09:04:02 UTC
It seems the keyboard is detected, but the keyboard doesn't work.
Comment 4 François Valenduc 2007-11-14 09:04:35 UTC
Created attachment 13545 [details]
Output of dmesg
Comment 5 Dmitry Torokhov 2007-11-14 12:17:59 UTC
This is a partial dmesg... Could you try booting with :i8042.debug log_buf_len=262144"? Anyway, I see there is some data coming from the keyboard port. Could you also compile and load evbug module? Thanks!
Comment 6 François Valenduc 2007-11-14 12:46:11 UTC
Created attachment 13547 [details]
Output of dmesg with evbug

So, I have booted with the additional parameters and evbug loaded.If I type anything, nothing new appears in the kernel logs. So, the keyboard is detected, but if I type, nothing happens.
Comment 7 Ingo Molnar 2007-11-30 07:13:41 UTC
since the bug seems to trigger rather deterministically (you have no keyboard at all), it might make sense to try a bisection run to figure out what broke it between v2.6.23 and v2.6.24-rc2.

a quick git introduction can be found at:

  http://www.kernel.org/doc/local/git-quick.html

a QuickStart:

 git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6.git
 cd linux-2.6.git

 # check that this kernel still has a broken keyboard!

 git-bisect start
 git-bisect bad           # mark current kernel bad
 git-bisect good v2.6.23  # mark .23 good

and from this point on, build and boot every kernel that git-bisect presents you, and mark them either as "git-bisect good" or "git-bisect bad". After around 10-15 build+reboots (not all will be full kernel rebuilds - especially at the end it will speed up) git-bisect should return a single "guilty" commit.
Comment 8 François Valenduc 2007-12-02 03:02:52 UTC
I have moved and unfortunately, my broadband internet connection is not yet activated. So, it's difficult to get clone of the current git kernel. However, I have noticed something strange.I use gentoo and if I type "I" after the promt "press I to enter interactive boot mode", it has an effect and the keyboard works. As soon as I get the first question asking to boot services, the keyboard doesn't work. So, there happens something in between which breaks the keyboard input.
Comment 9 Ingo Molnar 2007-12-04 03:28:44 UTC
Hm, doesnt udevd start up in that timeframe?
Comment 10 François Valenduc 2007-12-04 05:57:05 UTC
Udevd indeed starts in that timeframe. I have tried git-bisect and unfortunately, it doesn't seem to be conclusive. There was 37 revisions yet to test but from that point, it's impossible to compile the kernel. The commit at that point was "[a2883dfa2e4a94b24109b2bfe735561e50cc44b4] Pull thermal into release branch".
 I get plenty of errors during "make prepare" (see the last attachment)
Comment 11 François Valenduc 2007-12-04 05:57:38 UTC
Created attachment 13846 [details]
compile errors
Comment 12 Ingo Molnar 2007-12-04 06:13:37 UTC
> Udevd indeed starts in that timeframe. I have tried git-bisect and 
> unfortunately, it doesn't seem to be conclusive. There was 37 
> revisions yet to test but from that point, it's impossible to compile 
> the kernel. The commit at that point was 
> "[a2883dfa2e4a94b24109b2bfe735561e50cc44b4] Pull thermal into release 
> branch".
>  I get plenty of errors during "make prepare" (see the last 
>  attachment)

does it build if you fudge the bisection point by say 3 commits via:

   git-reset --hard HEAD~3

and try to build from there? git-bisect can work with arbitrary 
bisection points, so if one point does not build, try to find a nearby, 
sane-looking commit via "git-bisect visualize" and jump there via 
git-reset.
Comment 13 François Valenduc 2007-12-04 14:41:40 UTC
I finally manage to find the problematic commit with git-bisect and the result is quite surprising. The bad commit is the following:

db1c291af7ad748777371f25b9ff92e3e5aba38e is first bad commit
commit db1c291af7ad748777371f25b9ff92e3e5aba38e
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Wed Sep 26 19:43:41 2007 +0400

    ACPI: SBS: Make SBS reads table-driven.

    Re-factor SBS functions to use tables and cycles for repeated operations.
    Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
    Signed-off-by: Len Brown <len.brown@intel.com>

So, the problem isn't directly to input drivers. The problem of the keyboard seems rather to be a symptom of a problem caused by the ACPI SBS driver. Furthermore, the computer is also very slow when this problem occurs. Also, it's impossible to shut down the computer by pressing the power button more than 20 seconds with this problematic kernel.

What I find very strange is that we end up with a commit made before 2.6.23.However, with the stable 2.6.23.9 kernel, the problem doesn't occurs.
Comment 14 François Valenduc 2007-12-04 14:43:21 UTC
Created attachment 13850 [details]
Log of git-bisect
Comment 15 Alexey Starikovskiy 2007-12-04 15:06:44 UTC
Could you please attach an acpidump output? 
Comment 16 François Valenduc 2007-12-04 21:20:31 UTC
Created attachment 13854 [details]
Output of acpidump

This dump is made with kernel 2.6.22.14 which is the last kernel which really works well for me.
Comment 17 Alexey Starikovskiy 2007-12-05 07:48:17 UTC
Created attachment 13865 [details]
disable callback in sbs.c

Please check if keyboard works with this patch.
Also, could you please check if kacpi* threads consume the CPU time?
Comment 18 François Valenduc 2007-12-05 11:51:49 UTC
That works much better with the patch you suggested. The keyboard works and the kacpi_notify and kacpid processes consume each one 0% of CPU time.
Is there any drawback with this patch or do you intend to submit it for the 2.6.24 kernel ?

Thanks for your help.
Comment 19 Alexey Starikovskiy 2007-12-05 12:25:34 UTC
This patch is debug only -- battery and charger states will not be updated.
I'll try to come up with proper patch soon, as direction seem to be found.
Comment 20 Alexey Starikovskiy 2007-12-05 13:09:03 UTC
Created attachment 13872 [details]
Ignore alarms from unknown devices

Please check if this patch works. It should also allow you to notice AC adapter plug-in/out.
Comment 21 Alexey Starikovskiy 2007-12-06 07:05:56 UTC
Created attachment 13889 [details]
Ignore alarms from unknown devices #2

This patch work better...
Comment 22 Alexey Starikovskiy 2007-12-06 07:42:24 UTC
Created attachment 13892 [details]
Reset alarm bit 

Please try if this one line patch helps. Unknown devices might be irrelevant...
Comment 23 Alexey Starikovskiy 2007-12-06 07:43:42 UTC
Created attachment 13893 [details]
Ignore alarms from unknown devices #3

This one should be applied on top of 'Reset alarm bit' patch
Comment 24 François Valenduc 2007-12-06 10:08:22 UTC
So, I have applied the two last patches you proposed. There remains two problems. I get plenty of errors like this in dmesg:
drivers/input/serio/i8042.c: 1c <- i8042 (interrupt, 0, 1) [28196]

There is also a problem with the computation of the remaining capacity of the battery. The KDE applet indicates a remaining time of 14 hours ! If it could be true; that would be nice. But I don't thinks it's the case. I have also noticed that the present rate indicated by cat /proc/acpi/battery/BAT0/state increase from around 370 to 1400 mW when I unplug the AC adapter. With the kernel 2.6.22, this rate stays equal to 9979 mW.
Comment 25 François Valenduc 2007-12-06 10:09:36 UTC
Created attachment 13894 [details]
errors with i8042 (in dmesg)

This is the result of dmesg | grep i8042
Comment 26 Alexey Starikovskiy 2007-12-06 10:38:33 UTC
did you try to remove 'i8042.debug'?
please post contents of /proc/acpi/battery/*/* and /sys/class/power_supply/*/*
Comment 27 François Valenduc 2007-12-06 13:41:32 UTC
Indeed, I had forgotten that I had put i8042.debug as a boot parameter.

As you asked, here is the content of /sys/class/power_supply/:

BAT0 -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0b/PNP0C09:00/ACPI0001:00/ACPI0002:00/power_supply/BAT0/
BAT1 -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0b/PNP0C09:00/ACPI0001:00/ACPI0002:00/power_supply/BAT1/
sbs-charger -> ../../devices/LNXSYSTM:00/device:00/PNP0A03:00/device:0b/PNP0C09:00/ACPI0001:00/ACPI0002:00/power_supply/sbs-charger

The content of /proc/acpi is put in attachment.
Comment 28 François Valenduc 2007-12-06 13:42:01 UTC
Created attachment 13895 [details]
Content of /proc/acpi
Comment 29 Alexey Starikovskiy 2007-12-06 13:50:45 UTC
I meant the contents of files
Comment 30 François Valenduc 2007-12-07 10:29:24 UTC
Sorry for the misunderstanding. So you can find the contents of all these files in attachment.
Comment 31 François Valenduc 2007-12-07 10:30:15 UTC
Created attachment 13906 [details]
Contents of /proc/acpi and /sys/class/power_supply
Comment 32 Alexey Starikovskiy 2007-12-07 10:55:56 UTC
Francois, there is no error. Before 24-rc, SBS gave you discharge in Watts (electric power), but now it gives you discharge in Ampers (electric current), to convert it back to Watts you will need to multiply it by current voltage (also known). 1.695A * 15.896V = 26.943W -- about right.

I'm moving bug into resolved, so patches could be pushed into kernel.
Thanks for report and testing.
Comment 33 François Valenduc 2007-12-07 11:00:51 UTC
So, if I have well understood, the problem is caused by the computation made by Klaptopdaemon. It probably needs to be updated to take into account the change of units of measurement.
Comment 34 Ingo Molnar 2007-12-07 11:03:33 UTC
> Francois, there is no error. Before 24-rc, SBS gave you discharge in 
> Watts (electric power), but now it gives you discharge in Ampers 
> (electric current), to convert it back to Watts you will need to 
> multiply it by current voltage (also known). 1.695A * 15.896V = 
> 26.943W -- about right.

ugh. Isnt that a kernel ABI change?
Comment 35 Alexey Starikovskiy 2007-12-07 11:14:38 UTC
:) No.
ABI stands for Application *Binary* Interface. Output from 'state' is ascii text, not a binary, so ABI is not changed by this.
present rate has (and always had) a units appended to the number (1695 *mA*), so if some userland program ignored it, this is a problem of the program.
Comment 36 Ingo Molnar 2007-12-07 11:22:45 UTC
> :) No.
> ABI stands for Application *Binary* Interface. Output from 'state' is 
> ascii text, not a binary, so ABI is not changed by this. present rate 
> has (and always had) a units appended to the number (1695 *mA*), so if 
> some userland program ignored it, this is a problem of the program.

this is one of the most stupid arguments i've ever seen. If there's ever 
a gratituous ABI change this is one. Of course text contents of /proc 
and /sys are just as part of the ABI as anything else. There are 
exceptions, but this is not one of them.

user-space started reasonably assuming that the units is watts, because 
you nowhere said that it's going to be changed to amperes. So now 
there's user-space code that relies on it being watts and you MUST NOT 
BREAK THAT.

that change needs to be reverted. The proper solution is to add 
_another_ entry that gives amperes and then slowly and gently phase out 
the old units. (Of course that's more work, but that's how compatibility 
and responsible engineering looks like.)

	Ingo
Comment 37 Dmitry Torokhov 2007-12-07 11:25:49 UTC
So you are saying every user has to implement a parser and convertor for all possible veriations of metric and imperial UOM?
Comment 38 Alexey Starikovskiy 2007-12-07 11:34:07 UTC
Created attachment 13907 [details]
return rate in same units as capacity

Please check if this patch makes klaptopd happy...
Comment 39 Alexey Starikovskiy 2007-12-07 11:41:32 UTC
Ingo,
/proc/acpi/battery/state was able to return rate in mA or mW from the beginning. this is not changed. This is why there is a units added to each string, and this units was there since the very beginning as well. So this is not stupid to assume that userspace program should look into this field.
Dmitry,
this is the design of /proc/acpi. I'm not changing a single bit of it. In new /sysfs you will not have to parse anything -- file 'current' is measured in mA always, and 'energy' is always measured in mWh. 
Comment 40 François Valenduc 2007-12-07 12:24:17 UTC
The last patch makes indeed klaptopdaemon happy.
Comment 41 Alexey Starikovskiy 2007-12-07 12:38:05 UTC
Len, last 3 patches need to be pushed up.
Comment 42 Ingo Molnar 2007-12-07 12:39:48 UTC
> Ingo,
> /proc/acpi/battery/state was able to return rate in mA or mW from the 
> beginning. this is not changed. This is why there is a units added to 
> each string, and this units was there since the very beginning as 
> well. So this is not stupid to assume that userspace program should 
> look into this field.

well, it's still an ABI change, because - absent very clear 
specifications otherwise, if an app becomes reliant on some 
userspace-visible detail of the kernel, the rule is for the kernel to 
keep that aspect, not for the app to change.

i got upset about the ABI comment - "text" is a subset of binary data 
and is thus part of the kernel ABI, so it has just as strict 
compatibility requirements.

We can sometimes get away with changing the ABI - when we are lucky 
enough that apps have not become reliant on a particular detail yet (for 
example the description fields on /proc/interrupts), but quite often 
this is not the case. I'd suggest you notify the klaptopdaemon 
developers about the unintended dependency they have built into their 
app, and once it's fixed it might be possible to phase that out.
Comment 43 Alexey Starikovskiy 2007-12-07 12:55:29 UTC
IMHO, you stretch ABI term too far. If data needs text parser to be understood, it  is not binary and not part of binary. klaptopd made wrong assumptions about what fields it could "ignore" during parsing, so only "bug-compatibility" was broken. 
Comment 44 Ingo Molnar 2007-12-07 14:08:47 UTC
> IMHO, you stretch ABI term too far. If data needs text parser to be 
> understood, it is not binary and not part of binary. klaptopd made 
> wrong assumptions about what fields it could "ignore" during parsing, 
> so only "bug-compatibility" was broken.

Well, i just gave you the fact that "text" data is part of the ABI just 
as much as "non text" data - basically every file in /proc/* is an 
example for that. (And i guess lkml would be a more appropriate forum 
for compatibility policy questions - please bring it up there.)
Comment 45 Andrew Morton 2007-12-07 14:46:55 UTC
<wakes up>

Please restore the contents of /proc files to exactly what they were
previously.  That is all.

Yes, you could argue that userspace which parsed eariler versions of the
/proc files weren't doing it correctly.  Well tough luck, that's our fault
for designing poor interfaces.  We should remain compatuible with that
userspace rather than blaming it for being confused.

It's very easy: don't break existing userspace.

> Len, last 3 patches need to be pushed up.

I doubt if Len even read this.  Can you please send them out via email and cc me?
That's assuming they restore these files to their previous contents.  If they
don't, please fix that.
Comment 46 Alexey Starikovskiy 2007-12-08 02:04:35 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9362
> 
> 
> 
> 
> 
> ------- Comment #45 from akpm@osdl.org  2007-12-07 14:46 -------
> <wakes up>
> 
> Please restore the contents of /proc files to exactly what they were
> previously.  That is all.
Done
> Yes, you could argue that userspace which parsed eariler versions of the
> /proc files weren't doing it correctly.  Well tough luck, that's our fault
> for designing poor interfaces.  We should remain compatuible with that
> userspace rather than blaming it for being confused.
> 
> It's very easy: don't break existing userspace.
> 
>> Len, last 3 patches need to be pushed up.
> 
> I doubt if Len even read this.  Can you please send them out via email and cc
> me?
> That's assuming they restore these files to their previous contents.  If they
> don't, please fix that.
Done.
> 
> 
Comment 47 Len Brown 2007-12-14 13:16:41 UTC
Thanks for the quick resolution, guys.

The 3 patches (taken from the list)
are now in the acpi release tree for 2.6.24-rc5.

Andrew's doubt was correct, however -- I didn't notice this report until now,
for I was not on the CC.

In general, adding acpi-bugzilla@lists.sourceforge.net, as I'm doing now,
to bugs that may involve ACPI is good practice.  I do periodically sift
through bugzilla to look for trouble in the ACPI category, but
periodically != constantly.

François,
To answer your question about the regression being
checked into the tree 3 months ago during 2.6.23...

If you run gitk on that commit, you'll notice that it
figures out where in the release cycle that commit lands.
If it said "Preceeds 2.6.23", then it means it shipped in
2.6.23 - no matter what tag it follows.
But this one "Preceeds 2.6.24-rc1", which means it
first shipped upstream in 2.6.24-rc1, even though
it was checked into a tree well before that.
Indeed, this patch series was checked into 2.6.23
much earlier than the final commit, and it was "re-based"
late in 2.6.23 in preparation for 2.6.24...

But more importantly, you noticed this first at 2.6.24-rc2.
That means that nobody with a system like yours
(Smart battery and Klaptopdaemon bug) ran the -mm tree
for the period this was in 2.6.23-rc*-mm*, nor ran the upstream
kernel tree until 2.6.24-rc2.

While it is true that systems with sbs is quite rare
(for many years, Linux did not support them at all)
this testing gap is not comforting.