Bug 9998
Description
Black
2008-02-16 02:41:02 UTC
Will you please attach the output of dmesg and acpidump? Thanks. Created attachment 14880 [details]
output of acpidump
Created attachment 14881 [details]
output of dmesg
Forgot to mention: Keys are only lost, when the battery is polled (i.e. klaptop running, some WindowMaker applet or just "cat /proc/acpi/battery/BAT0/state" in a loop) HI, Black It seems that the bug has the same error message with the bug 9910. Will you please try the patch in the comment #14 of bug 9110 and confirm whether the bug is fixed? Thanks. Thank you for your help so far. The patch does fix the panic, but not the problem with lost keystrokes (that I can now experience with compiled-in SBS support). I'll be happy to provide more information/do some testing if it helps. Just tell me what I can do. Created attachment 14909 [details]
test the debug patch
Will you please try the debug patch and see whether the problem still exists?
Hi, Black Thanks for the confirm that the patch in bug 9110 can fix the panic problem. Will you please try the debug patch in comment #7 and see whether problem still exists? It will be great if you can confirm whether the problem exists on the 2.6.23 kernel. (Of course the patch in bug9110 is required). Thanks. (In reply to comment #8) > Will you please try the debug patch in comment #7 and see whether problem > still exists? Well, the patch does not apply. I tried to do it manually, but the lines to be commented out are not present. Nowhere in the whole kernel, as it seems... (I took a quick glance at 2.6.23, same result) > It will be great if you can confirm whether the problem exists on the > 2.6.23 kernel. (Of course the patch in bug9110 is required). As mentioned in the initial report 2.6.23 (with Gentoo's patchset) works fine. Just to make sure I downloaded and compiled vanilla 2.6.23.16 (without any patches at all and CONFIG_ACPI_SBS=y). Works perfectly. Hi, Black Please try the debug patch on the 2.6.24.2 kernel and see whether the keystrobe problem still exists. ( Of course the patch in bug9110 is needed). Thanks. Yakui, Debug patch commenting out burst functions which were inserted in 25-rc1, so .24 is broken by some other means. Don't insist. Black, could you please uncomment DEBUG at drivers/acpi/ec.c:30 and compile the kernel with time in printk. Please provide a dmesg. Created attachment 14935 [details]
output of dmesg with DEBUG
> Created an attachment (id=14935)
> --> (http://bugzilla.kernel.org/attachment.cgi?id=14935&action=view)
> output of dmesg with DEBUG
Thanks, it looks like an interrupt storm to me.
please attach /proc/interrupts Created attachment 14937 [details]
/proc/interrupts
When running 2.6.23 the value for acpi is much lower/not increasing so fast.
Interrupt storm it is :) It might help if you attach dmesg from 2.6.23. It might also help if you try different acpi_sci= kernel options e.g. level or edge. I added '#define DEBUG' to ec.c of 2.6.23.16 (don't know if this has any effect) and compiled it with CONFIG_PRINTK_TIME=y. I'll attach the resulting output of dmesg. Next I booted 2.6.24.2 with acpi_sci=level. The result didn't seem much different from what I posted earlier, so I won't attach it. But when I tried acpi_sci=edge. I have no idea what it does, but it seems to work. I didn't notice any lost keystrokes. I'll attach dmesg and /proc/interrupts. Created attachment 14938 [details]
output of dmesg on 2.6.23.16
Created attachment 14939 [details]
output of dmesg on 2.6.24.2 with acpi_sci=edge
Created attachment 14940 [details]
/proc/interrupts on 2.6.24.2 with acpi_sci=edge
do you have correct battery readings with acpi_sci=edge? (In reply to comment #21) > do you have correct battery readings with acpi_sci=edge? Yes, the values seem to be correct. > ACPI: setting ELCR to 0200 (from 0440)
Both the working 2.6.23 and the failing 2.6.24 are setting IRQ9
to level triggered.
When you boot 2.6.24 with acpi_sci=edge, it is likely that
you are disabling all ACPI interrupts (except the first one).
On a working system, you should be able to kill acpid
(simply to disable the action on the interrupt)
and press the power or lid buttons and see /proc/interrupts increment.
I expect with acpi_sci=edge, "grep acpi /proc/interrupts" will stay at 1.
If you boot 2.6.24 with CONFIG_ACPI_SBS=n, does "grep acpi /proc/interrupts"
still show the acpi interrupt increasing at a high rate?
Please boot the latest 2.6.25,
# cd /sys/firmware/acpi/interrupts
# grep . *
# grep acpi /proc/interrupts
and paste the output here
(In reply to comment #23) > On a working system, you should be able to kill acpid > (simply to disable the action on the interrupt) > and press the power or lid buttons and see /proc/interrupts increment. Yup. > I expect with acpi_sci=edge, "grep acpi /proc/interrupts" will stay at 1. Yup. > If you boot 2.6.24 with CONFIG_ACPI_SBS=n, does "grep acpi /proc/interrupts" > still show the acpi interrupt increasing at a high rate? No, it stay's quite low (<10.000 after a few minutes). However the storm seems to be caused by accessing the battery, which is not possible without the sbs module. What I mean is: Possibly there is something wrong outside the sbs-code, but it doesn't get triggered. Anyway, you're the expert here :) > # cd /sys/firmware/acpi/interrupts > # grep . * error:0 ff_gbl_lock:0 ff_pmtimer:0 ff_pwr_btn:0 ff_rt_clk:1 ff_slp_btn:0 gpe00:0 gpe01:0 gpe02:0 gpe03:0 gpe04:0 gpe05:0 gpe06:0 gpe07:0 gpe08:0 gpe09:0 gpe0A:0 gpe0B:0 gpe0C:0 gpe0D:0 gpe0E:0 gpe0F:0 gpe10:0 gpe11:0 gpe12:0 gpe13:0 gpe14:0 gpe15:0 gpe16:0 gpe17:0 gpe18:0 gpe19:0 gpe1A:0 gpe1B:0 gpe1C:0 gpe1D:443152 gpe1E:0 gpe1F:0 gpe_all:443152 sci:443153 > # grep acpi /proc/interrupts 9: 559889 XT-PIC-XT acpi Typing was quite difficult, so several seconds passed between the two commands. Created attachment 15077 [details]
patch: disable ec gpe in gpe handler
does this patch help?
Rui, if you completely disable GPE, it certainly helps. Just checked on my Acer, with you patch storm happens only then EC does something. For me the patch in comment #25 does not seem to help. Created attachment 15095 [details]
Detect IRQ storm
Bad news, this is not a regression.
I've checked back to 2.6.22, and interrupts were going much faster than needed too.
Problem seems to be that hw fails to clear GPE after we service it and write 1 into corresponding bit. Thus, as soon as we get interrupts enabled again, we receive a new one. Google gives too many results for "acer interrupt storm" for this being one-broken-machine case.
Please check if this patch helps. It works for me on TM2300.
(In reply to comment #28) > Please check if this patch helps. It works for me on TM2300. I'm afraid, it doesn't. Still lost keystrokes... Do you have acpi interrupts count increasing? Please try if insertion of 'msleep(10)' in ec.c:227 (while loop) helps. (In reply to comment #30) > Do you have acpi interrupts count increasing? Not at all. After five minutes of uptime and some aggressive polling of the battery it stayed fixed at 7 (increased to 8 when I pressed the power-button). > Please try if insertion of 'msleep(10)' in ec.c:227 (while loop) helps. Yes, it helps :) No lost keys at all, interrupt count is 7, battery readings seem correct. All looks fine to me. On a sidenote: The command went to line 231, which I guessed to be the correct line (right after the 'while' line). great news. I'll update patch and get it upstream. Thanks for help. Created attachment 15102 [details]
ready patch
Updated patch, ready for submission
patch in comment #33 applied to acpi tree 2c81ce4c9c37b910210f2640c28e98a0c398dc26 ACPI: EC: Handle IRQ storm on Acer laptops shipped in 2.6.25-rc5-git4 closed. Cause of the storm is found, so storm workaround might not be needed. Created attachment 15327 [details]
Call _PSW methods then available
Please check if this patch makes interrupt storm go away. It certainly helps on my Acer.
Created attachment 15328 [details]
Revert disable storm patch
Storm detection patch is not needed any longer.
(In reply to comment #37) > Please check if this patch makes interrupt storm go away. It certainly helps > on my Acer. On mine, it doesn't. Still losing keystrokes (might be less than without the patch). After 5 min of uptime /proc/interrrupts showed around 750k ACPI-interrupts. does it change if you unload sbs*? When I boot the machine, typing is fine and the ACPI-interrupts increase at a rate <100/s. After modprobe sbs, nothing changes. But when I start to poll the battery (cat /proc/acpi/battery/BAT0/state; sleep 1 in a loop, in this case) the interrrupts increase at ~4000/s and keystrokes are lost. When I stop polling, the rate goes back to normal. Created attachment 15336 [details]
Change msleep(5) to original udelay(100)
msleep causes too long delays at suspend, change it to original udelay(100).
Created attachment 15337 [details]
Add poll timer
Adding poll timer helps to get events from EC if GPE is disabled.
Please check if it works.
*** Bug 10279 has been marked as a duplicate of this bug. *** Now I'm a little confused. I previously (i.e. comment #39, comment #41) reverted acpi_ec_detect_irq_storm.patch, but since restore_poll_udelay.patch doesn't make any sense without it, I applied it again. add_poll_timer.patch also doesn't fully apply without it. Beside the three mentioned patches, I applied call_psw_on_capable_devices.patch. I'm still using 2.6.24.2 as a base. Please tell me if I did as intended, or what I should have done. With all these patches, everything works fine, the number of ACPI-interrupts stays constant (~800) even when polling the battery. But I suppose, this is due to acpi_ec_detect_irq_storm.patch. Right. Sorry for confusion. acpi_ec_detect_irq_storm caused several side effects, so it requires 2 more patches with it, one of them big (add_poll_timer). It is too late to add such a big patch at rc6, so we need to to revert acpi_ec_detect_irq_storm now, and send all 3 patches up for 2.6.26-rc1. Created attachment 15349 [details]
dmesg of vanilla linux-git with 2c81ce4c9c37b910210f2640c28e98a0c398dc26 reverted
Created attachment 15350 [details]
dmesg of linux-git with the 3 patches
My dmesg buffer is too small for #define DEBUG in acpi/ec.c, hopefully the part I have is enough. With the 3 patches applied, acpi keys work perfectly. Good job :-)
Guillaume, Thanks for the log, it shows interrupts coming at much higher rate than needed, so you too suffer from "interrupt storm" this patch is trying to fix (apparently to some lesser degree). fa95ba04e6ba11d71e1b87becd054b38faf546c8 ACPI: EC: Detect irq storm shipped in linux-2.6.25-git16 please re-open if this release still has a problem. I tried 2.6.25-git17. The number of ACPI-interrupts remained at 7 the whole time. However, I'm still losing keystrokes. But I think it were fewer than with 2.6.24. bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9998 > ------- Comment #51 from brot.kann.man.nicht.einfrieren@gmail.com 2008-05-01 > 11:12 ------- > I tried 2.6.25-git17. The number of ACPI-interrupts remained at 7 the whole > time. However, I'm still losing keystrokes. But I think it were fewer than > with > 2.6.24. Please try to change udelay(...) to msleep(1) in drivers/acpi/ec.c:197. (In reply to comment #52) > Please try to change udelay(...) to msleep(1) in drivers/acpi/ec.c:197. That helps. Typing is fine again. :) The first time it completely broke ACPI keys for me, this time they still work but are very laggy (1 second or so). I made some tests with the first incantation here: http://lkml.org/lkml/2008/3/19/605 Reverting fa95ba04e6ba11d71e1b87becd054b38faf546c8 fixes the problem, but replacing udelay(...) to msleep(1) changes nothing. EC related messages in dmesg: ACPI: EC: Look up EC in DSDT ACPI: EC: non-query interrupt received, switching to interrupt mode ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62 ACPI: EC: driver started in interrupt mode ACPI: EC: GPE storm detected, disabling EC GPE Any news/patch to test on this one? This is clearly a regression from 2.6.25, i.e. ACPI keys lagging. please try to change 5 to say 20 for irq_count compare at drivers/acpi/ec.c:495. Changed to 20, works perfectly now. Thanks a lot. Will this change be pushed to mainline? please try 2.6.26-rc6, which contains this patch: commit 1b7fc5aae8867046f8d3d45808309d5b7f2e036a Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Fri Jun 6 11:49:33 2008 -0400 ACPI: EC: Use msleep instead of udelay while waiting for event. http://bugzilla.kernel.org/show_bug.cgi?id=10724 Just tried 2.6.26-rc6-00149-gc8988f9, same as before: ACPI keys work but are something like 1 second behind. OTOH, changing 5 to 20 for irq_count compare at drivers/acpi/ec.c:495 works perfectly for me. I tried 2.6.26-rc6 as well. For me it seems to work perfectly. No keystrokes were lost while reading the battery. The interrupt count was 6 all the time. Created attachment 16868 [details]
same patch, attempt #3
Created attachment 16887 [details]
kernel panic with pqtch 16868
5b664cb235e97afbf34db9c4d77f08ebd725335e + patch 16868 (disable_gpe_during_transaction.patch) kernel panics at bootup
(In reply to comment #61) > I tried 2.6.26-rc6 as well. For me it seems to work perfectly. No keystrokes > were lost while reading the battery. > The interrupt count was 6 all the time. > Black, are you able to test a new patch? It applies on top of 2.6.26 (or 2.6.26-rc6 if that's easier). It's attached to comment #9 on my bug <http://bugzilla.kernel.org/show_bug.cgi?id=11089>. The patch removes the current workaround ("ACPI: EC: Detect irq storm") and replaces it with something simpler. The current workaround causes a regression where ACPI hotkeys are laggy or stop working altogether. I believe this new fix is safe and it works for me. I just need to check that it doesn't break your computer :-). (In reply to comment #64) > Black, are you able to test a new patch? It applies on top of 2.6.26 (or > 2.6.26-rc6 if that's easier). It's attached to comment #9 on my bug > <http://bugzilla.kernel.org/show_bug.cgi?id=11089>. > > The patch removes the current workaround ("ACPI: EC: Detect irq storm") and > replaces it with something simpler. The current workaround causes a > regression > where ACPI hotkeys are laggy or stop working altogether. I believe this new > fix is safe and it works for me. I just need to check that it doesn't break > your computer :-). > Well, it does. Unfortunately, your patch makes things much worse than what I started out with. Not only keys are lost (more often that not, I think), sometime they also 'get stuck', i.e. get repeated about 10-20 times (I did not count). The interrupt count was increasing quite fast. It had reached more than 4,000,000 when I finally managed to type the whole command. The original problem might have gone unnoticed due to bad typing. This patch makes my computer unusable. Created attachment 17571 [details]
try the debug patch in which EC GPE works in level mode
Can someone try the attached debug patch on the upstream kernel(2.6.27-rc4/5) and see whether the EC GPE interrupt storm still exists?
Thanks.
Tried that several months ago. Didn't work at all. Regards, Alex. bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9998 > ------- Comment #66 from yakui.zhao@intel.com 2008-09-02 01:28 ------- > Created an attachment (id=17571) > --> (http://bugzilla.kernel.org/attachment.cgi?id=17571&action=view) > try the debug patch in which EC GPE works in level mode > > Can someone try the attached debug patch on the upstream kernel(2.6.27-rc4/5) > and see whether the EC GPE interrupt storm still exists? > Thanks. > > Hi, Alexey Do you mean the boot option of "acpi_sci=edge"? If yes, it is different with what I said in the debug patch of comment #66. If no, please tell me which bug already tries the similar patch.(EC GPE is in level mode). Thanks. bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=9998 > > > > > > ------- Comment #68 from yakui.zhao@intel.com 2008-09-02 02:38 ------- > Hi, Alexey > Do you mean the boot option of "acpi_sci=edge"? > If yes, it is different with what I said in the debug patch of comment > #66. > If no, please tell me which bug already tries the similar patch.(EC GPE > is > in level mode). > > Thanks. > > It's this same 9998 bug. I never put up the patch just because it does not work on Acer here. Created attachment 17586 [details]
try the debug patch in which the query_pending bit is cleared after processing EC notification event
Does someone have an opportunity to try the debug patch and see whether the number of ACPI interrupt is increased as fast as before?
thanks.
It is better that the debug patch in comment #71 is tried on the kernel of 2.6.25-rc4. Of course it is also OK that the following commit is reverted on the upstream kernel when the debug patch in comment #71 is tried. >commit fa95ba04e6ba11d71e1b87becd054b38faf546c8 >Author: Alexey Starikovskiy <astarikovskiy@suse.de> > Date: Fri Mar 21 19:36:02 2008 +0300 >ACPI: EC: Detect irq storm Thanks. Created attachment 17588 [details]
try the debug patch in which the query_pending bit is clear after processing EC notification event
Maybe it is not easy to revert the commit fa95ba04e6ba11d71e1b87becd054b38faf546c8.
Will you please try the updated debug patch on the latest kernel and see whether the number of ACPI interrupt is increased as fast as before?( cd
/sys/firmware/acpi/interrupts/ ; grep . *)
In this debug patch the query_pending bit is cleared after EC notification
event is processed. At the same time the source about detecting EC storm is
disabled.
Thanks.
Created attachment 17610 [details]
Patch 1/2 : try the debug patch in which the query_pending bit is cleared after processing EC notification event
Created attachment 17612 [details]
Patch 2/2: try the debug patch in which EC work mode is simplified
Hi, Black
Do you have opportunity to try the two debug patches on the latest kernel(2.6.27-rc4/5) and see whether the system can work well?
In the patch 1: The query_pending bit will be cleared only after processing EC notification event.
In the patch 2: The EC will work in polling mode when EC internal register is accessed. The EC gpe handler is only to process the EC notification event.
Thanks.
Created attachment 17616 [details]
Add some delay in EC GPE interrupt handler on some bios. (Patch 3)
Hi, Black
Maybe the above two patches still can't resovle the issue related with EC GPE interrupt storm.
Will you please try this debug patch and see whether the system can work well?
Of course the above two debug patches are still required.
thanks.
Created attachment 17620 [details]
cleaner storm detection, lower lattencies
Here is rewrite of EC transaction mechanism, should allow
faster transaction and lower latencies for queries. Also GPE is not disabled
completely, so no queries are lost (even fast ones).
Number of interrupts remains low (< 1000 after one hour of uptime)
I applied the patches from comment #73, #74 and #75 to 2.6.27-rc5. The system works well, i.e. typing is fine. The number of interrupts however increases very fast, about 13000 or 14000 per second. dmesg spits out "EC GPE interrupt storm. And it is hardware issue" about 1800 to 2000 times per second. I will test the patch from comment #76 later. Should it be applied on top of the other three patches or on vanilla 2.6.27-rc5? Hi, Black Thanks for the test and so quick response. It seems that the number of ACPI interrupt is still increased very fast after the patches are applied. IMO this is a hardware issue. Maybe too many EC GPE interrupts are triggered although only one EC interrupt pulse is generated. Of course it is very lucky that the keyboad typing is OK after the workaround patch is applied. Please apply the patch in comment #76 on vanilla kernel of 2.6.27-rc5 and see whether the system can work well and the interrupt number is still increased very fast? Thanks. vanilla please. With the patch from comment #76 typing works as desired. The number of interrupts increases by about 100 - 200 per second. Created attachment 17695 [details]
updated fast transaction patch
Hi, Alexey From the attached patch it seems that spin lock is used in the EC driver. Is it reasonable? No, I added it just to have every cool thing in one place. please read the code and stop asking stupid questions. More so in bug reports. *** Bug 10483 has been marked as a duplicate of this bug. *** patch in comment #81 applied to acpi-test. test it now, or forever hold your peace. commit bfd14ac4bc840261f09d9dfd3b497c46131b1918 Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Thu Sep 25 21:00:31 2008 +0400 ACPI: EC: do transaction from interrupt context It is easier and faster to do transaction directly from interrupt context rather than waking control thread. Also, cleaner GPE storm avoidance is implemented. References: http://bugzilla.kernel.org/show_bug.cgi?id=9998 http://bugzilla.kernel.org/show_bug.cgi?id=10724 http://bugzilla.kernel.org/show_bug.cgi?id=10919 http://bugzilla.kernel.org/show_bug.cgi?id=11309 http://bugzilla.kernel.org/show_bug.cgi?id=11549 Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de> Signed-off-by: Len Brown <len.brown@intel.com> (In reply to comment #85) > patch in comment #81 applied to acpi-test. > test it now, or forever hold your peace. Tested it. Works fine. Thank you very much :) shipped in linux-2.6.28-rc1 closed commit 7c6db4e050601f359081fde418ca6dc4fc2d0011 Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Thu Sep 25 21:00:31 2008 +0400 ACPI: EC: do transaction from interrupt context I'm sorry to reopen this bug. But I just came to test 2.6.28. Unfortunately the patch shipped in it differs slightly but crucially from the one in comment #81. On line #73 it says #define ACPI_EC_STORM_THRESHOLD 8 while the patch in this bug makes it 20. When set to 8 typing is erroneous and ACPI interrupts increase by about 2000 per second. Not knowing what exactly I'm doing I changed the threshold to 20. Now typing works as expected and interrupts increase by ~80 every two seconds. Sorry, I booted the wrong kernel. Forget all I said about the threshold. Setting it to 20 makes no noticable difference. So I think we can close this bug again, right? :) oops, sorry, threshold 8 and 20 both failed. re-open the bug Any updates on this? 2.6.29 is out but the problem still exists. What is the significant difference between the patch from comment #81 and the final commit? Is there any more information I can provide you with? there was a change from msleep(1) to udelay(...) in drivers/acpi/ec.c among other things. If you could check that changing it back helps you, it will be great. (In reply to comment #93) > there was a change from msleep(1) to udelay(...) in drivers/acpi/ec.c among > other things. If you could check that changing it back helps you, it will be > great. Yes, it helps. Typing is fine, interrupts increase slowly. I found and changed udelay in 3 lines (2 in ec_poll, 1 in acpi_ec_transaction_unlocked). Created attachment 20705 [details]
separate MSI delays
Please check if this patch works for you.
(In reply to comment #95) > Created an attachment (id=20705) [details] > separate MSI delays > > Please check if this patch works for you. Yes, everything seems fine again. Thank you very much. :) Thanks for report and testing! MSI patch in comment #95 applied to acpi-test tree shipped in 2.6.30 merge window (Linux-2.6.29-git14) closed. |