Bug 9998 - SCI interrupt storm - lost keystrokes - Acer Travelmate 4002
Summary: SCI interrupt storm - lost keystrokes - Acer Travelmate 4002
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Alexey Starikovskiy
URL:
Keywords:
: 10483 (view as bug list)
Depends on:
Blocks: 10279
  Show dependency tree
 
Reported: 2008-02-16 02:41 UTC by Black
Modified: 2009-04-07 02:55 UTC (History)
8 users (show)

See Also:
Kernel Version: 2.6.24.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
output of acpidump (109.35 KB, text/plain)
2008-02-18 00:54 UTC, Black
Details
output of dmesg (15.09 KB, text/plain)
2008-02-18 00:54 UTC, Black
Details
test the debug patch (687 bytes, patch)
2008-02-19 23:09 UTC, ykzhao
Details | Diff
output of dmesg with DEBUG (14.94 KB, text/plain)
2008-02-21 09:24 UTC, Black
Details
/proc/interrupts (733 bytes, text/plain)
2008-02-21 09:48 UTC, Black
Details
output of dmesg on 2.6.23.16 (15.28 KB, text/plain)
2008-02-21 11:00 UTC, Black
Details
output of dmesg on 2.6.24.2 with acpi_sci=edge (14.98 KB, text/plain)
2008-02-21 11:01 UTC, Black
Details
/proc/interrupts on 2.6.24.2 with acpi_sci=edge (727 bytes, text/plain)
2008-02-21 11:01 UTC, Black
Details
patch: disable ec gpe in gpe handler (781 bytes, patch)
2008-02-28 19:36 UTC, Zhang Rui
Details | Diff
Detect IRQ storm (1.21 KB, patch)
2008-02-29 09:32 UTC, Alexey Starikovskiy
Details | Diff
ready patch (1.83 KB, patch)
2008-02-29 16:01 UTC, Alexey Starikovskiy
Details | Diff
Call _PSW methods then available (4.61 KB, patch)
2008-03-18 05:53 UTC, Alexey Starikovskiy
Details | Diff
Revert disable storm patch (1.63 KB, patch)
2008-03-18 06:14 UTC, Alexey Starikovskiy
Details | Diff
Change msleep(5) to original udelay(100) (983 bytes, patch)
2008-03-18 18:10 UTC, Alexey Starikovskiy
Details | Diff
Add poll timer (4.13 KB, patch)
2008-03-18 18:12 UTC, Alexey Starikovskiy
Details | Diff
dmesg of vanilla linux-git with 2c81ce4c9c37b910210f2640c28e98a0c398dc26 reverted (115.48 KB, application/octet-stream)
2008-03-19 15:46 UTC, Guillaume Chazarain
Details
dmesg of linux-git with the 3 patches (115.47 KB, text/plain)
2008-03-19 15:49 UTC, Guillaume Chazarain
Details
same patch, attempt #3 (6.83 KB, patch)
2008-07-17 12:22 UTC, Alexey Starikovskiy
Details | Diff
kernel panic with pqtch 16868 (106.26 KB, image/jpeg)
2008-07-19 05:14 UTC, Guillaume Chazarain
Details
try the debug patch in which EC GPE works in level mode (609 bytes, patch)
2008-09-02 01:28 UTC, ykzhao
Details | Diff
try the debug patch in which the query_pending bit is cleared after processing EC notification event (2.17 KB, patch)
2008-09-02 22:50 UTC, ykzhao
Details | Diff
try the debug patch in which the query_pending bit is clear after processing EC notification event (2.72 KB, patch)
2008-09-02 23:55 UTC, ykzhao
Details | Diff
Patch 1/2 : try the debug patch in which the query_pending bit is cleared after processing EC notification event (2.17 KB, patch)
2008-09-03 23:24 UTC, ykzhao
Details | Diff
Patch 2/2: try the debug patch in which EC work mode is simplified (5.43 KB, patch)
2008-09-03 23:35 UTC, ykzhao
Details | Diff
Add some delay in EC GPE interrupt handler on some bios. (Patch 3) (3.82 KB, patch)
2008-09-04 03:06 UTC, ykzhao
Details | Diff
cleaner storm detection, lower lattencies (13.29 KB, patch)
2008-09-04 05:29 UTC, Alexey Starikovskiy
Details | Diff
updated fast transaction patch (15.02 KB, patch)
2008-09-08 22:14 UTC, Alexey Starikovskiy
Details | Diff
separate MSI delays (1.54 KB, patch)
2009-03-27 14:15 UTC, Alexey Starikovskiy
Details | Diff

Description Black 2008-02-16 02:41:02 UTC
Latest working kernel version: 2.6.23 (with genpatches 6 and tuxonice)
Earliest failing kernel version: 2.6.24.2
Distribution: Gentoo
Hardware Environment: Acer Travelmate 4002 (with Smart Battery)
Problem Description:

When I first booted 2.6.24 (config based on the working one from 2.6.23) it paniced. I figured it was a problem with the Smart Battery-Code, so I compiled it as a module (CONFIG_ACPI_SBS=m), hoping to be able to get the trace when I load the module.
Indeed the kernel booted without problem this time. The module loaded just fine as well. But I lose keypresses if (and only if) the module is loaded.

I remember this behaviour (lost keypresses, not sure about the panic) from earlier kernel-versions (forgot which one(s)), when Smart Battery-support was just introduced.
Maybe I should note, that the DSDT is buggy.

Here's what I wrote down on a paper when the kernel paniced on boot:
BUG: unable to handle paging request at virtual address ...
printing eip: ...
Oops: 0002 [#1] PREEMPT
Modules linked in:

Pid: 1, comm: swapper Not tainted (2.6.24.2)
...
Call trace:
mutex_lock
acpi_smbus_transaction
ida_get_new_above
sysfs_ilookup_test
acpi_smbus_read
acpi_ac_get_present
acpi_sbs_add
...
EIP: __mutex_lock_slowpath
end trace
note: swapper[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init!
Switched to NOHz mode on CPU #0

I hope this very incomplete output is a little useful.

Steps to reproduce:
1. compile and install kernel with CONFIG_ACPI_SBS=y
2. reboot
3. see the panic
4. compile and install kernel with CONFIG_ACPI_SBS=m
5. reboot
6. modprobe sbs
7. type something, some letters/keys are missing
Comment 1 ykzhao 2008-02-17 19:13:27 UTC
Will you please attach the output of dmesg and acpidump?
Thanks.
Comment 2 Black 2008-02-18 00:54:04 UTC
Created attachment 14880 [details]
output of acpidump
Comment 3 Black 2008-02-18 00:54:46 UTC
Created attachment 14881 [details]
output of dmesg
Comment 4 Black 2008-02-18 00:57:04 UTC
Forgot to mention:
Keys are only lost, when the battery is polled (i.e. klaptop running, some WindowMaker applet or just "cat /proc/acpi/battery/BAT0/state" in a loop)
Comment 5 ykzhao 2008-02-18 23:05:49 UTC
HI, Black
   It seems that the bug has the same error message with the bug 9910.
   Will you please try the patch in the comment #14 of bug 9110 and confirm whether the bug is fixed?
   Thanks.
Comment 6 Black 2008-02-19 00:28:35 UTC
Thank you for your help so far.
The patch does fix the panic, but not the problem with lost keystrokes (that I can now experience with compiled-in SBS support).
I'll be happy to provide more information/do some testing if it helps. Just tell me what I can do.
Comment 7 ykzhao 2008-02-19 23:09:17 UTC
Created attachment 14909 [details]
test the debug patch

Will you please try the debug patch and see whether the problem still exists?
Comment 8 ykzhao 2008-02-19 23:13:43 UTC
Hi, Black
    Thanks for the confirm that the patch in bug 9110 can fix the panic problem.
    Will you please try the debug patch in comment #7 and see whether problem still exists? 
    It will be great if you can confirm whether the problem exists on the 2.6.23 kernel. (Of course the patch in bug9110 is required).
    Thanks.
Comment 9 Black 2008-02-20 06:44:34 UTC
(In reply to comment #8)
>     Will you please try the debug patch in comment #7 and see whether problem
> still exists?
Well, the patch does not apply. I tried to do it manually, but the lines to be commented out are not present. Nowhere in the whole kernel, as it seems...
(I took a quick glance at 2.6.23, same result)

>     It will be great if you can confirm whether the problem exists on the
> 2.6.23 kernel. (Of course the patch in bug9110 is required).
As mentioned in the initial report 2.6.23 (with Gentoo's patchset) works fine. Just to make sure I downloaded and compiled vanilla 2.6.23.16 (without any patches at all and CONFIG_ACPI_SBS=y). Works perfectly.
Comment 10 ykzhao 2008-02-20 21:46:09 UTC
Hi, Black
   Please try the debug patch on the 2.6.24.2 kernel and see whether the keystrobe problem still exists. ( Of course the patch in bug9110 is needed).
   Thanks. 
Comment 11 Alexey Starikovskiy 2008-02-21 03:25:30 UTC
Yakui,
Debug patch commenting out burst functions which were inserted in 25-rc1, so .24 is broken by some other means. Don't insist.

Black, could you please uncomment DEBUG at drivers/acpi/ec.c:30 and compile the kernel with time in printk. Please provide a dmesg.
Comment 12 Black 2008-02-21 09:24:16 UTC
Created attachment 14935 [details]
output of dmesg with DEBUG
Comment 13 Alexey Starikovskiy 2008-02-21 09:37:29 UTC
> Created an attachment (id=14935)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=14935&action=view)
> output of dmesg with DEBUG
Thanks, it looks like an interrupt storm to me.  
Comment 14 Alexey Starikovskiy 2008-02-21 09:38:52 UTC
please attach /proc/interrupts 
Comment 15 Black 2008-02-21 09:48:32 UTC
Created attachment 14937 [details]
/proc/interrupts

When running 2.6.23 the value for acpi is much lower/not increasing so fast.
Comment 16 Alexey Starikovskiy 2008-02-21 10:02:33 UTC
Interrupt storm it is :)
It might help if you attach dmesg from 2.6.23.
It might also help if you try different acpi_sci= kernel options e.g. level or edge.
Comment 17 Black 2008-02-21 10:59:17 UTC
I added '#define DEBUG' to ec.c of 2.6.23.16 (don't know if this has any effect) and compiled it with CONFIG_PRINTK_TIME=y. I'll attach the resulting output of dmesg.
Next I booted 2.6.24.2 with acpi_sci=level. The result didn't seem much different from what I posted earlier, so I won't attach it.
But when I tried acpi_sci=edge. I have no idea what it does, but it seems to work. I didn't notice any lost keystrokes. I'll attach dmesg and
/proc/interrupts.
Comment 18 Black 2008-02-21 11:00:15 UTC
Created attachment 14938 [details]
output of dmesg on 2.6.23.16
Comment 19 Black 2008-02-21 11:01:07 UTC
Created attachment 14939 [details]
output of dmesg on 2.6.24.2 with acpi_sci=edge
Comment 20 Black 2008-02-21 11:01:56 UTC
Created attachment 14940 [details]
/proc/interrupts on 2.6.24.2 with acpi_sci=edge
Comment 21 Alexey Starikovskiy 2008-02-21 12:05:12 UTC
do you have correct battery readings with acpi_sci=edge?
Comment 22 Black 2008-02-21 12:53:39 UTC
(In reply to comment #21)
> do you have correct battery readings with acpi_sci=edge?

Yes, the values seem to be correct.
Comment 23 Len Brown 2008-02-22 11:01:03 UTC
> ACPI: setting ELCR to 0200 (from 0440)

Both the working 2.6.23 and the failing 2.6.24 are setting IRQ9
to level triggered.

When you boot 2.6.24 with acpi_sci=edge, it is likely that
you are disabling all ACPI interrupts (except the first one).

On a working system, you should be able to kill acpid
(simply to disable the action on the interrupt)
and press the power or lid buttons and see /proc/interrupts increment.

I expect with acpi_sci=edge, "grep acpi /proc/interrupts" will stay at 1.

If you boot 2.6.24 with CONFIG_ACPI_SBS=n, does "grep acpi /proc/interrupts"
still show the acpi interrupt increasing at a high rate?

Please boot the latest 2.6.25, 
# cd /sys/firmware/acpi/interrupts
# grep . *
# grep acpi /proc/interrupts
and paste the output here
Comment 24 Black 2008-02-26 01:59:29 UTC
(In reply to comment #23)
> On a working system, you should be able to kill acpid
> (simply to disable the action on the interrupt)
> and press the power or lid buttons and see /proc/interrupts increment.
Yup.
 
> I expect with acpi_sci=edge, "grep acpi /proc/interrupts" will stay at 1.
Yup.

> If you boot 2.6.24 with CONFIG_ACPI_SBS=n, does "grep acpi /proc/interrupts"
> still show the acpi interrupt increasing at a high rate?
No, it stay's quite low (<10.000 after a few minutes).
However the storm seems to be caused by accessing the battery, which is not possible without the sbs module. What I mean is: Possibly there is something wrong outside the sbs-code, but it doesn't get triggered. Anyway, you're the expert here :)
 
> # cd /sys/firmware/acpi/interrupts
> # grep . *
error:0
ff_gbl_lock:0
ff_pmtimer:0
ff_pwr_btn:0
ff_rt_clk:1
ff_slp_btn:0
gpe00:0
gpe01:0
gpe02:0
gpe03:0
gpe04:0
gpe05:0
gpe06:0
gpe07:0
gpe08:0
gpe09:0
gpe0A:0
gpe0B:0
gpe0C:0
gpe0D:0
gpe0E:0
gpe0F:0
gpe10:0
gpe11:0
gpe12:0
gpe13:0
gpe14:0
gpe15:0
gpe16:0
gpe17:0
gpe18:0
gpe19:0
gpe1A:0
gpe1B:0
gpe1C:0
gpe1D:443152
gpe1E:0
gpe1F:0
gpe_all:443152
sci:443153

> # grep acpi /proc/interrupts
  9:     559889    XT-PIC-XT        acpi

Typing was quite difficult, so several seconds passed between the two commands.
Comment 25 Zhang Rui 2008-02-28 19:36:28 UTC
Created attachment 15077 [details]
patch: disable ec gpe in gpe handler

does this patch help?
Comment 26 Alexey Starikovskiy 2008-02-29 00:27:21 UTC
Rui, if you completely disable GPE, it certainly helps.
Just checked on my Acer, with you patch storm happens only then EC does something.
Comment 27 Black 2008-02-29 03:43:07 UTC
For me the patch in comment #25 does not seem to help.
Comment 28 Alexey Starikovskiy 2008-02-29 09:32:17 UTC
Created attachment 15095 [details]
Detect IRQ storm

Bad news, this is not a regression.
I've checked back to 2.6.22, and interrupts were going much faster than needed too.
Problem seems to be that hw fails to clear GPE after we service it and write 1 into corresponding bit. Thus, as soon as we get interrupts enabled again, we receive a new one. Google gives too many results for "acer interrupt storm" for this being one-broken-machine case.

Please check if this patch helps. It works for me on TM2300.
Comment 29 Black 2008-02-29 11:35:47 UTC
(In reply to comment #28)
> Please check if this patch helps. It works for me on TM2300.
I'm afraid, it doesn't. Still lost keystrokes...
Comment 30 Alexey Starikovskiy 2008-02-29 13:38:57 UTC
Do you have acpi interrupts count increasing?
Please try if insertion of 'msleep(10)' in ec.c:227 (while loop) helps.
Comment 31 Black 2008-02-29 15:11:48 UTC
(In reply to comment #30)
> Do you have acpi interrupts count increasing?
Not at all. After five minutes of uptime and some aggressive polling of the battery it stayed fixed at 7 (increased to 8 when I pressed the power-button).

> Please try if insertion of 'msleep(10)' in ec.c:227 (while loop) helps.
Yes, it helps :)
No lost keys at all, interrupt count is 7, battery readings seem correct.
All looks fine to me.
On a sidenote: The command went to line 231, which I guessed to be the correct line (right after the 'while' line).
Comment 32 Alexey Starikovskiy 2008-02-29 15:27:35 UTC
great news. I'll update patch and get it upstream. Thanks for help.
Comment 33 Alexey Starikovskiy 2008-02-29 16:01:46 UTC
Created attachment 15102 [details]
ready patch

Updated patch, ready for submission
Comment 34 Len Brown 2008-03-11 14:04:15 UTC
patch in comment #33 applied to acpi tree
Comment 35 Len Brown 2008-03-14 19:02:36 UTC
2c81ce4c9c37b910210f2640c28e98a0c398dc26
ACPI: EC: Handle IRQ storm on Acer laptops
shipped in 2.6.25-rc5-git4

closed.
Comment 36 Alexey Starikovskiy 2008-03-18 05:51:02 UTC
Cause of the storm is found, so storm workaround might not be needed.
Comment 37 Alexey Starikovskiy 2008-03-18 05:53:21 UTC
Created attachment 15327 [details]
Call _PSW methods then available

Please check if this patch makes interrupt storm go away. It certainly helps on my Acer.
Comment 38 Alexey Starikovskiy 2008-03-18 06:14:54 UTC
Created attachment 15328 [details]
Revert disable storm patch

Storm detection patch is not needed any longer.
Comment 39 Black 2008-03-18 07:10:22 UTC
(In reply to comment #37)
> Please check if this patch makes interrupt storm go away. It certainly helps
> on my Acer.

On mine, it doesn't. Still losing keystrokes (might be less than without the patch).
After 5 min of uptime /proc/interrrupts showed around 750k ACPI-interrupts.
Comment 40 Alexey Starikovskiy 2008-03-18 07:22:21 UTC
does it change if you unload sbs*?
Comment 41 Black 2008-03-18 10:29:09 UTC
When I boot the machine, typing is fine and the ACPI-interrupts increase at a rate <100/s.
After modprobe sbs, nothing changes.
But when I start to poll the battery (cat /proc/acpi/battery/BAT0/state; sleep 1 in a loop, in this case) the interrrupts increase at ~4000/s and keystrokes are lost.
When I stop polling, the rate goes back to normal.
Comment 42 Alexey Starikovskiy 2008-03-18 18:10:31 UTC
Created attachment 15336 [details]
Change msleep(5) to original udelay(100)

msleep causes too long delays at suspend, change it to original udelay(100).
Comment 43 Alexey Starikovskiy 2008-03-18 18:12:10 UTC
Created attachment 15337 [details]
Add poll timer

Adding poll timer helps to get events from EC if GPE is disabled.
Please check if it works.
Comment 44 Len Brown 2008-03-18 19:18:00 UTC
*** Bug 10279 has been marked as a duplicate of this bug. ***
Comment 45 Black 2008-03-19 06:04:33 UTC
Now I'm a little confused.
I previously (i.e. comment #39, comment #41) reverted acpi_ec_detect_irq_storm.patch, but since restore_poll_udelay.patch doesn't make any sense without it, I applied it again. add_poll_timer.patch also doesn't fully apply without it.
Beside the three mentioned patches, I applied call_psw_on_capable_devices.patch.
I'm still using 2.6.24.2 as a base.
Please tell me if I did as intended, or what I should have done.

With all these patches, everything works fine, the number of ACPI-interrupts stays constant (~800) even when polling the battery. But I suppose, this is due to acpi_ec_detect_irq_storm.patch.
Comment 46 Alexey Starikovskiy 2008-03-19 06:49:15 UTC
Right. Sorry for confusion.
acpi_ec_detect_irq_storm caused several side effects, so it requires 2 more patches with it, one of them big (add_poll_timer). It is too late to add such a big patch at rc6, so we need to to revert acpi_ec_detect_irq_storm now, and send all 3 patches up for 2.6.26-rc1. 
Comment 47 Guillaume Chazarain 2008-03-19 15:46:33 UTC
Created attachment 15349 [details]
dmesg of vanilla linux-git with 2c81ce4c9c37b910210f2640c28e98a0c398dc26 reverted
Comment 48 Guillaume Chazarain 2008-03-19 15:49:51 UTC
Created attachment 15350 [details]
dmesg of linux-git with the 3 patches

My dmesg buffer is too small for #define DEBUG in acpi/ec.c, hopefully the part I have is enough. With the 3 patches applied, acpi keys work perfectly. Good job :-)
Comment 49 Alexey Starikovskiy 2008-03-20 00:50:48 UTC
Guillaume,
Thanks for the log, it shows interrupts coming at much higher rate than needed, so 
you too suffer from "interrupt storm" this patch is trying to fix (apparently to some lesser degree). 
Comment 50 Len Brown 2008-04-30 21:02:53 UTC
fa95ba04e6ba11d71e1b87becd054b38faf546c8
ACPI: EC: Detect irq storm

shipped in linux-2.6.25-git16
please re-open if this release still has a problem.
Comment 51 Black 2008-05-01 11:12:39 UTC
I tried 2.6.25-git17. The number of ACPI-interrupts remained at 7 the whole time. However, I'm still losing keystrokes. But I think it were fewer than with 2.6.24.
Comment 52 Alexey Starikovskiy 2008-05-01 11:31:56 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9998
> ------- Comment #51 from brot.kann.man.nicht.einfrieren@gmail.com  2008-05-01
> 11:12 -------
> I tried 2.6.25-git17. The number of ACPI-interrupts remained at 7 the whole
> time. However, I'm still losing keystrokes. But I think it were fewer than
> with
> 2.6.24.
Please try to change udelay(...) to msleep(1) in drivers/acpi/ec.c:197.
Comment 53 Black 2008-05-01 12:14:11 UTC
(In reply to comment #52)
> Please try to change udelay(...) to msleep(1) in drivers/acpi/ec.c:197.

That helps. Typing is fine again. :)
Comment 54 Guillaume Chazarain 2008-05-02 12:55:13 UTC
The first time it completely broke ACPI keys for me, this time they still work but are very laggy (1 second or so).

I made some tests with the first incantation here: http://lkml.org/lkml/2008/3/19/605

Reverting fa95ba04e6ba11d71e1b87becd054b38faf546c8 fixes the problem, but replacing udelay(...) to msleep(1) changes nothing.

EC related messages in dmesg:

ACPI: EC: Look up EC in DSDT
ACPI: EC: non-query interrupt received, switching to interrupt mode
ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
ACPI: EC: driver started in interrupt mode
ACPI: EC: GPE storm detected, disabling EC GPE
Comment 55 Guillaume Chazarain 2008-05-12 11:55:29 UTC
Any news/patch to test on this one?
This is clearly a regression from 2.6.25, i.e. ACPI keys lagging.
Comment 56 Alexey Starikovskiy 2008-05-12 12:00:40 UTC
please try to change 5 to say 20 for irq_count compare at drivers/acpi/ec.c:495.
Comment 57 Guillaume Chazarain 2008-05-12 15:28:13 UTC
Changed to 20, works perfectly now. Thanks a lot.
Comment 58 Guillaume Chazarain 2008-05-24 15:03:36 UTC
Will this change be pushed to mainline?
Comment 59 Len Brown 2008-06-13 19:32:54 UTC
please try 2.6.26-rc6, which contains this patch:

commit 1b7fc5aae8867046f8d3d45808309d5b7f2e036a
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Fri Jun 6 11:49:33 2008 -0400

    ACPI: EC: Use msleep instead of udelay while waiting for event.

    http://bugzilla.kernel.org/show_bug.cgi?id=10724
Comment 60 Guillaume Chazarain 2008-06-16 15:44:43 UTC
Just tried 2.6.26-rc6-00149-gc8988f9, same as before: ACPI keys work but are something like 1 second behind.

OTOH, changing 5 to 20 for irq_count compare at drivers/acpi/ec.c:495 works perfectly for me.
Comment 61 Black 2008-06-17 07:52:03 UTC
I tried 2.6.26-rc6 as well. For me it seems to work perfectly. No keystrokes were lost while reading the battery.
The interrupt count was 6 all the time.
Comment 62 Alexey Starikovskiy 2008-07-17 12:22:36 UTC
Created attachment 16868 [details]
same patch, attempt #3
Comment 63 Guillaume Chazarain 2008-07-19 05:14:25 UTC
Created attachment 16887 [details]
kernel panic with pqtch 16868

5b664cb235e97afbf34db9c4d77f08ebd725335e + patch 16868 (disable_gpe_during_transaction.patch) kernel panics at bootup
Comment 64 Alan Jenkins 2008-07-20 07:56:31 UTC
(In reply to comment #61)
> I tried 2.6.26-rc6 as well. For me it seems to work perfectly. No keystrokes
> were lost while reading the battery.
> The interrupt count was 6 all the time.
> 

Black, are you able to test a new patch?  It applies on top of 2.6.26 (or 2.6.26-rc6 if that's easier).  It's attached to comment #9 on my bug <http://bugzilla.kernel.org/show_bug.cgi?id=11089>.

The patch removes the current workaround ("ACPI: EC: Detect irq storm") and replaces it with something simpler.  The current workaround causes a regression where ACPI hotkeys are laggy or stop working altogether.  I believe this new fix is safe and it works for me.  I just need to check that it doesn't break your computer :-).
Comment 65 Black 2008-07-20 10:39:41 UTC
(In reply to comment #64)
> Black, are you able to test a new patch?  It applies on top of 2.6.26 (or
> 2.6.26-rc6 if that's easier).  It's attached to comment #9 on my bug
> <http://bugzilla.kernel.org/show_bug.cgi?id=11089>.
> 
> The patch removes the current workaround ("ACPI: EC: Detect irq storm") and
> replaces it with something simpler.  The current workaround causes a
> regression
> where ACPI hotkeys are laggy or stop working altogether.  I believe this new
> fix is safe and it works for me.  I just need to check that it doesn't break
> your computer :-).
> 
Well, it does.
Unfortunately, your patch makes things much worse than what I started out with.
Not only keys are lost (more often that not, I think), sometime they also 'get stuck', i.e. get repeated about 10-20 times (I did not count).

The interrupt count was increasing quite fast. It had reached more than 4,000,000 when I finally managed to type the whole command.

The original problem might have gone unnoticed due to bad typing. This patch makes my computer unusable.
Comment 66 ykzhao 2008-09-02 01:28:27 UTC
Created attachment 17571 [details]
try the debug patch in which EC GPE works in level mode

Can someone try the attached debug patch on the upstream kernel(2.6.27-rc4/5) and see whether the EC GPE interrupt storm still exists?
Thanks.
Comment 67 Alexey Starikovskiy 2008-09-02 01:48:14 UTC
Tried that several months ago. Didn't work at all.

Regards,
Alex.
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9998
> ------- Comment #66 from yakui.zhao@intel.com  2008-09-02 01:28 -------
> Created an attachment (id=17571)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=17571&action=view)
> try the debug patch in which EC GPE works in level mode
> 
> Can someone try the attached debug patch on the upstream kernel(2.6.27-rc4/5)
> and see whether the EC GPE interrupt storm still exists?
> Thanks.
> 
> 
Comment 68 ykzhao 2008-09-02 02:38:09 UTC
Hi, Alexey
    Do you mean the boot option of "acpi_sci=edge"? 
    If yes, it is different with what I said in the debug patch of comment #66. 
    If no, please tell me which bug already tries the similar patch.(EC GPE is in level mode).

   Thanks.
Comment 69 Alexey Starikovskiy 2008-09-02 02:43:34 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9998
> 
> 
> 
> 
> 
> ------- Comment #68 from yakui.zhao@intel.com  2008-09-02 02:38 -------
> Hi, Alexey
>     Do you mean the boot option of "acpi_sci=edge"? 
>     If yes, it is different with what I said in the debug patch of comment
>     #66. 
>     If no, please tell me which bug already tries the similar patch.(EC GPE
>     is
> in level mode).
> 
>    Thanks.
> 
> 
It's this same 9998 bug. I never put up the patch just because it does not work on Acer here.
Comment 70 ykzhao 2008-09-02 22:50:06 UTC
Created attachment 17586 [details]
try the debug patch in which the query_pending bit is cleared after processing EC notification event

Does someone have an opportunity to try the debug patch and see whether the number of ACPI interrupt is increased as fast as before?

thanks.
Comment 71 ykzhao 2008-09-02 23:27:36 UTC
It is better that the debug patch in comment #71 is tried on the kernel of 2.6.25-rc4.
Of course it is also OK that the following commit is reverted on the upstream kernel when the debug patch in comment #71 is tried.
    >commit fa95ba04e6ba11d71e1b87becd054b38faf546c8
    >Author: Alexey Starikovskiy <astarikovskiy@suse.de>
    > Date:   Fri Mar 21 19:36:02 2008 +0300

       >ACPI: EC: Detect irq storm

Thanks.
Comment 72 ykzhao 2008-09-02 23:55:41 UTC
Created attachment 17588 [details]
try the debug patch in which the query_pending bit is clear after processing EC notification event

Maybe it is not easy to revert the commit fa95ba04e6ba11d71e1b87becd054b38faf546c8.
Will you please try the updated debug patch on the latest kernel and see whether the number of ACPI interrupt is increased as fast as before?( cd
/sys/firmware/acpi/interrupts/ ; grep . *)
In this debug patch the query_pending bit is cleared after EC notification
event is processed. At the same time the source about detecting EC storm is
disabled.

Thanks.
Comment 73 ykzhao 2008-09-03 23:24:12 UTC
Created attachment 17610 [details]
Patch 1/2 : try the debug patch in which the query_pending bit is cleared after processing EC notification event
Comment 74 ykzhao 2008-09-03 23:35:36 UTC
Created attachment 17612 [details]
Patch 2/2: try the debug patch in which EC work mode is simplified

Hi, Black
    Do you have opportunity to try the two debug patches on the latest kernel(2.6.27-rc4/5) and see whether the system can work well?
    In the patch 1: The query_pending bit will be cleared only after processing EC notification event.
    In the patch 2: The EC will work in polling mode when EC internal register is accessed. The EC gpe handler is only to process the EC notification event.

    Thanks.
Comment 75 ykzhao 2008-09-04 03:06:39 UTC
Created attachment 17616 [details]
Add some delay in EC GPE interrupt handler on some bios. (Patch 3)

Hi, Black
   Maybe the above two patches still can't resovle the issue related with EC GPE interrupt storm.
   Will you please try this debug patch and see whether the system can work well?
   Of course the above two debug patches are still required.

   thanks.
Comment 76 Alexey Starikovskiy 2008-09-04 05:29:45 UTC
Created attachment 17620 [details]
cleaner storm detection, lower lattencies

Here is rewrite of EC transaction mechanism, should allow
faster transaction and lower latencies for queries. Also GPE is not disabled
completely, so no queries are lost (even fast ones).
Number of interrupts remains low (< 1000 after one hour of uptime)
Comment 77 Black 2008-09-04 07:19:11 UTC
I applied the patches from comment #73, #74 and #75 to 2.6.27-rc5.
The system works well, i.e. typing is fine.

The number of interrupts however increases very fast, about 13000 or 14000 per second.
dmesg spits out
"EC GPE interrupt storm. And it is hardware issue"
about 1800 to 2000 times per second.

I will test the patch from comment #76 later. Should it be applied on top of the other three patches or on vanilla 2.6.27-rc5?
Comment 78 ykzhao 2008-09-04 19:40:57 UTC
Hi, Black 
   Thanks for the test and so quick response. It seems that the number of ACPI interrupt is still increased very fast after the patches are applied. IMO this is a hardware issue. Maybe too many EC GPE interrupts are triggered although only one EC interrupt pulse is generated. Of course it is very lucky that the keyboad typing is OK after the workaround patch is applied.
   
   Please apply the patch in comment #76 on vanilla kernel of 2.6.27-rc5 and see whether the system can work well and the interrupt number is still increased very fast?
   Thanks.
Comment 79 Alexey Starikovskiy 2008-09-05 05:12:51 UTC
vanilla please.
Comment 80 Black 2008-09-05 09:32:38 UTC
With the patch from comment #76 typing works as desired.
The number of interrupts increases by about 100 - 200 per second.
Comment 81 Alexey Starikovskiy 2008-09-08 22:14:38 UTC
Created attachment 17695 [details]
updated fast transaction patch
Comment 82 ykzhao 2008-09-08 22:46:55 UTC
Hi, Alexey
    From the attached patch it seems that spin lock is used in the EC driver. Is it reasonable?
Comment 83 Alexey Starikovskiy 2008-09-08 22:52:07 UTC
No, I added it just to have every cool thing in one place.
please read the code and stop asking stupid questions.
More so in bug reports.
Comment 84 Alexey Starikovskiy 2008-09-15 11:42:28 UTC
*** Bug 10483 has been marked as a duplicate of this bug. ***
Comment 85 Len Brown 2008-09-25 11:36:13 UTC
patch in comment #81 applied to acpi-test.
test it now, or forever hold your peace.

commit bfd14ac4bc840261f09d9dfd3b497c46131b1918
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Thu Sep 25 21:00:31 2008 +0400

    ACPI: EC: do transaction from interrupt context
    
    It is easier and faster to do transaction directly from interrupt context
    rather than waking control thread.
    Also, cleaner GPE storm avoidance is implemented.
    References:         http://bugzilla.kernel.org/show_bug.cgi?id=9998
                http://bugzilla.kernel.org/show_bug.cgi?id=10724
                http://bugzilla.kernel.org/show_bug.cgi?id=10919
                http://bugzilla.kernel.org/show_bug.cgi?id=11309
                http://bugzilla.kernel.org/show_bug.cgi?id=11549
    Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
    Signed-off-by: Len Brown <len.brown@intel.com>
Comment 86 Black 2008-09-29 09:17:49 UTC
(In reply to comment #85)
> patch in comment #81 applied to acpi-test.
> test it now, or forever hold your peace.

Tested it. Works fine. Thank you very much :)
Comment 87 Len Brown 2008-10-24 23:21:29 UTC
shipped in linux-2.6.28-rc1
closed

commit 7c6db4e050601f359081fde418ca6dc4fc2d0011
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Thu Sep 25 21:00:31 2008 +0400

    ACPI: EC: do transaction from interrupt context
Comment 88 Black 2009-01-08 04:30:25 UTC
I'm sorry to reopen this bug. But I just came to test 2.6.28. Unfortunately the patch shipped in it differs slightly but crucially from the one in comment #81. On line #73 it says
#define ACPI_EC_STORM_THRESHOLD 8
while the patch in this bug makes it 20.

When set to 8 typing is erroneous and ACPI interrupts increase by about 2000 per second.
Not knowing what exactly I'm doing I changed the threshold to 20. Now typing works as expected and interrupts increase by ~80 every two seconds.
Comment 89 Black 2009-01-08 04:44:12 UTC
Sorry, I booted the wrong kernel. Forget all I said about the threshold. Setting it to 20 makes no noticable difference.
Comment 90 Zhang Rui 2009-01-08 18:02:03 UTC
So I think we can close this bug again, right? :)
Comment 91 Zhang Rui 2009-01-08 18:03:10 UTC
oops, sorry, threshold 8 and 20 both failed.
re-open the bug
Comment 92 Black 2009-03-26 16:52:50 UTC
Any updates on this?
2.6.29 is out but the problem still exists. What is the significant difference between the patch from comment #81 and the final commit?
Is there any more information I can provide you with?
Comment 93 Alexey Starikovskiy 2009-03-26 17:51:56 UTC
there was a change from msleep(1) to udelay(...) in drivers/acpi/ec.c among other things. If you could check that changing it back helps you, it will be great.
Comment 94 Black 2009-03-27 10:10:55 UTC
(In reply to comment #93)
> there was a change from msleep(1) to udelay(...) in drivers/acpi/ec.c among
> other things. If you could check that changing it back helps you, it will be
> great.

Yes, it helps. Typing is fine, interrupts increase slowly.
I found and changed udelay in 3 lines (2 in ec_poll, 1 in acpi_ec_transaction_unlocked).
Comment 95 Alexey Starikovskiy 2009-03-27 14:15:50 UTC
Created attachment 20705 [details]
separate MSI delays

Please check if this patch works for you.
Comment 96 Black 2009-03-27 15:05:09 UTC
(In reply to comment #95)
> Created an attachment (id=20705) [details]
> separate MSI delays
> 
> Please check if this patch works for you.

Yes, everything seems fine again. Thank you very much. :)
Comment 97 Alexey Starikovskiy 2009-03-27 15:13:54 UTC
Thanks for report and testing!
Comment 98 Len Brown 2009-04-01 04:25:19 UTC
MSI patch in comment #95 applied to acpi-test tree
Comment 99 Len Brown 2009-04-07 02:55:21 UTC
shipped in 2.6.30 merge window (Linux-2.6.29-git14)

closed.

Note You need to log in before you can comment on or make changes to this bug.