Bug 12949
Created attachment 20695 [details]
dmesg with full debugging
Thanks for proper dmesg. Could you please try to disable calls to acpi_ec_burst_enable() in the same ec.c and see if it helps? Created attachment 20700 [details]
dmesg after disabling acpi_ec_burst_enable
I changed acpi_ec_burst_enable to printk and inconditionally return 0, but this didn't solve the issue (see below my changes). See attached dmesg.
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 2fe1506..8e66e88 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -27,7 +27,7 @@
*/
/* Uncomment next line to get verbose printout */
-/* #define DEBUG */
+#define DEBUG
#include <linux/kernel.h>
#include <linux/module.h>
@@ -350,6 +350,8 @@ unlock:
*/
static int acpi_ec_burst_enable(struct acpi_ec *ec)
{
+ printk(KERN_INFO "acpi_ec_burst_enable ignored");
+ return 0;
u8 d;
struct transaction t = {.command = ACPI_EC_BURST_ENABLE,
.wdata = NULL, .rdata = &d,
How is you config differ from default? Interrupt is there, but somehow it appears before we write command to EC in printk There is a synchronization spin lock in this place, but it seems to be ignored. Try to insert some msleep(1) at the beginning of acpi_ec_transaction_unlocked()? [ 0.370660] ACPI: EC: transaction end >>>[ 0.370660] ACPI: EC: ~~~> interrupt [ 0.370660] ACPI: EC: ---> status = 0x00 >>>[ 0.370660] ACPI: EC: ~~~> interrupt [ 0.370660] ACPI: EC: ---> status = 0x00 [ 0.370660] ACPI: EC: ---> status = 0x00 [ 0.370660] acpi_ec_burst_enable ignored<7>ACPI: EC: ---> status = 0x00 [ 0.370660] ACPI: EC: transaction start [ 0.370660] ACPI: EC: <--- command = 0x81 [ 0.370660] ACPI: EC: ~~~> interrupt [ 0.370660] ACPI: EC: ---> status = 0x08 [ 0.370660] ACPI: EC: <--- data = 0x60 [ 0.870015] ACPI: EC: missing confirmations, switch off interrupt mode. Created attachment 20706 [details]
dmesg after disabling acpi_ec_burst_enable, with some sleep, + printing flags
sleeps were added as follows:
@@ -255,7 +256,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
{
unsigned long tmp;
int ret = 0;
- pr_debug(PREFIX "transaction start\n");
+ pr_my_debug(PREFIX "transaction start\n");
+ msleep(1);
/* disable GPE during transaction if storm is detected */
if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
clear_bit(EC_FLAGS_GPE_MODE, &ec->flags);
@@ -263,6 +265,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
}
if (EC_FLAGS_MSI)
udelay(ACPI_EC_DELAY);
+ msleep(1);
/* start transaction */
spin_lock_irqsave(&ec->curr_lock, tmp);
/* following two actions should be kept atomic */
@@ -271,15 +274,21 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
if (ec->curr->command == ACPI_EC_COMMAND_QUERY)
clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
spin_unlock_irqrestore(&ec->curr_lock, tmp);
+
+ msleep(1);
/* if we selected poll mode or failed in GPE-mode do a poll loop */
if (force_poll ||
!test_bit(EC_FLAGS_GPE_MODE, &ec->flags) ||
acpi_ec_wait(ec))
ret = ec_poll(ec);
- pr_debug(PREFIX "transaction end\n");
+ pr_my_debug(PREFIX "transaction end\n");
+
+ msleep(1);
spin_lock_irqsave(&ec->curr_lock, tmp);
ec->curr = NULL;
spin_unlock_irqrestore(&ec->curr_lock, tmp);
+
+ msleep(1);
if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
/* check if we received SCI during transaction */
ec_check_sci(ec, acpi_ec_read_status(ec));
Created attachment 20707 [details]
change mode after transaction
Please check if this patch helps.
Created attachment 20715 [details]
dmesg after patch applied
Still same problem at startup, but after a while, it settles on interrupts enabled, and seems working. Only command 0x81 causes problems.
Note: this is taken when on ac, without battery. I have an other dmesg output when charging battery, that may be helpful.
Created attachment 20716 [details]
dmesg with patch applied, while charging battery
All previous dmesg outputs were taken when on AC without battery.
This one is dwhile charging battery
Created attachment 20725 [details]
patch against v2.6.29 that solves the problem
I've elaborated a way to overcome the issue. The attached patch unifies poll + interrupt handling, so that controllers that mostly work do not incur the full penalty of polling mode.
Current patch reduces the interrupt handler to just wake up the main thread, that sequences all the controller operations, thus it slightly increase the latency for perfectly working controllers, but reduces it for partially working ones like mine.
If this is a problem, you can probably try an hybrid approach, in which the interrupt handler calls advance_transaction as well (I've not tried it, since it appears I only have non-conforming controllers).
I've tested the code also on a controller that generate GPE storms, but not tried to optimize this case (it could probably be done easily, though).
If the slightly increased latency is acceptable, some more cleanup can be done (advance_transaction shouldn't need the locking stuff if called from main thread).
Created attachment 20726 [details]
dmesg after my proposed patch
Created attachment 20736 [details]
merge irq and poll modes
Nice idea,
but implementation is not quite right. Please check if my version works too.
Sure, I'll test it. This implements what I was suggesting, but unable to test. Few points: * Isn't 1 ms too much as timeout? For Burst mode, I see from APCI 3.0 specs that you have to respect the following time constraints: * First Access – 400 microseconds * Subsequent Accesses – 50 microseconds each * Total Burst Time – 1 millisecond So I think it is likely that the controller should do the transition in few tens of microseconds. Can't we use a smaller timeout, and maybe backoff if we find that the timeout expired but we cannot advance? * The part of the code starting at line 240 (/* missing GPEs, switch back to poll mode */) still bugs me. I think that if this happens, then we have a severely broken hardware, and the answer is not to just switch off interrupt mode, but probably to report to the system that we don't have a (working) EC at all (is this possible after initialization? I'm new to kernel hacking). 1. Burst mode will be deleted "real soon now". It was added as a workaround for MSI notebooks but cause more troubles. 2. missing GPE part should go way of dodo too, just because there is no separate poll mode any longer. Your patch works, but still needs some cleanup: ** same unification can be done when waiting for ibf to become 0 ** the initial state should have interrupts enabled, because otherwise it discards the first interrupt and needs to wait for timeout (I have HZ=100, so this is noticeable from my logs). I look forward to see this patch in mainline! Created attachment 20805 [details]
merge irq and poll modes #2
Here is the cleaned up version.
mark as RESOLVED since there is a patch proposed and available for test. Alexey, the patch works, but it logs too many "EC: non-query interrupt received, switching to interrupt mode" stuff, even for non-storming controllers. I think the log can be removed, since now we don't have a distinct interrupt mode any more, and nothing is switched. I've an idea about how to handle also storming controllers, without disabling interrupts for them during transaction setup. Basically, if the controller is known to be storming, as soon as the transaction is done (i.e. when going to wake up the invoking thread) we disable GPE. Is it safe to call acpi_disable_gpe inside the gpe handling? Created attachment 20821 [details]
merge irq and poll modes #3
Right, this printk is just leftover... Thanks!
patch in comment #18 applies to 2.6.29, but not later. i took a minute to merge it, but must have screwed it up because I got: drivers/acpi/ec.c:214: warning: ‘ec_delay’ defined but not used Alexey, Please refresh against 2.6.30-rc3, or even better, the acpi-test tree. Created attachment 21105 [details]
merge irq and poll modes #4
This one is clear against 2.6.29-rc3, don't have acpi-test handy.
Isn't that obvious? EC can send interrupts when it wants to, and there is an interrupt handlers inside EC code that listen for that. Currently any interrupt that has come from ec, and wasn't due to reading/writing of data is flagged as a false interrupt. This patch fixed that issue for me: diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 04e9044..6b82596 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -190,7 +190,7 @@ static void gpe_transaction(struct acpi_ec *ec, u8 status) } else goto err; } else if (ec->curr->wlen == ec->curr->wi && - (status & ACPI_EC_FLAG_IBF) == 0) + (status & (ACPI_EC_FLAG_IBF|ACPI_EC_FLAG_SCI)) == 0) ec->curr->done = true; goto unlock; err: Actually the above patch seems to be wrong, I shouldn't ack data transfer when SCI EC interrupt arrives, I will post correct one soon Yep, sorry for the type, but following patch works as well: diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 04e9044..fb07fd4 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -195,7 +195,7 @@ static void gpe_transaction(struct acpi_ec *ec, u8 status) goto unlock; err: /* false interrupt, state didn't change */ - if (in_interrupt()) + if (in_interrupt() && !(status & ACPI_EC_FLAG_SCI)) ++ec->curr->irq_count; unlock: spin_unlock_irqrestore(&ec->curr_lock, flags); -- 1.6.0.4 I mean sorry for the typo :-) In fact after a hour, it detected a storm anyway, still better that instantly. My notebook's EC does seem to send false interrupts. Still I think that SCI interrupts shouldn't be marked as false Maxim, ACPI_EC_FLAG_SCI is active from the moment EC decided to raise it till the moment driver issues query command, thus, it will be active across many invocations of interrupt handler, even more at irq storm. Only one of these invocations is right place to not count bogus interrups, all others are still bogus if they don't carry completion information. So, what your patch does is disabling storm detection for some time (while SCI bit is not acknowledged by driver). This is not a solution, and changing it to handle only first SCI bit invocation (GPE_PENDING flag is clear) will not help, while increase code size, but you may try it too. Thanks, didn't know that. Anyway my EC sends plenty (~6 irqs/s) of false interrupts. Yet, I would like to use irq mode, because it works. I just raise the storm threshold for that to 20 Thus I would like to see an option to disable storm detection. But anyway, polling or interrupt mode doesn't seem to affect my system in any way Is anything preventing this to be considered resolved? I tested it thoroughly on my hardware, and it solves the problem. Created attachment 22220 [details]
Fix GPE storm on my machine
Hi Alexey,
I found that on my machine, all the spurious interrupts are coming from the burst transaction in acpi_ec_space_handler .
The attached patch, that applies on top of your latest, skips counting spurious interrupts if they are coming from a burst transaction.
I also added a printk to see if on other machines, there are other recognizable patterns for GPE storms. Maxim, if you can give it a try, please report the dmesg output.
Created attachment 22251 [details]
Unification of GPE storm handling with normal (merged irq&poll) path.
Unfortunately, the previous patch was just hiding the problem, as uncovered disabling burst mode (I still got storm interrupts, so they were not caused by it).
Here is a new attempt, that further unifies handling of broken EC with
normal ones.
In case a spurious interrupt is received, GPEs are disabled only for
the remaining of current transaction (to avoid storm), and re-enabled at the
end.
This allows us to reduce the transaction latency on partially broken ECs, which send correctly the first interrupt (the one acknowledging the command).
See next attachment for dmesg after patch.
Created attachment 22252 [details]
dmesg after 22251 patch applied
Corrado, there is a patch to limit burst mode to MSI notebooks only, because only they benefit from it. I'm closing this bug as resolved. Thanks. Did the patch in comment #20 also get merged? That fixed the original problem, that is not related to burst. It is passed to Len Brown for inclusion. Hopefully, it will be merged to 2.6.32. Alexey, the patch in comment #20 applies cleanly on top of 2.6.29, but was dropped b/c of conflicts with later versions of ec.c. Please refresh it so that it applies cleanly on top of 2.6.31-rc8 re-opening to reflect that the candidate fix is now stale. Created attachment 22904 [details]
2.6.31 update
updated patch for 2.6.31
Created attachment 22908 [details]
patch as applied to acpi-test tree
commit 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Sun Aug 30 03:06:14 2009 +0400 ACPI: EC: Merge IRQ and POLL modes shipped in linux-2.6.31-git14 closed |
My netbook (acer aspire one 110) consistently shows during boot the message ACPI: EC: missing confirmations, switch off interrupt mode > [ 0.351161] calling acpi_init+0x0/0x236 @ 1 > [ 0.361052] ACPI: EC: Look up EC in DSDT > [ 0.367754] ACPI: BIOS _OSI(Linux) query ignored > [ 0.368984] ACPI: EC: non-query interrupt received, switching to > interrupt mode > [ 0.860013] ACPI: EC: missing confirmations, switch off interrupt mode. > [ 0.866062] ACPI: Interpreter enabled I can't say if this is a regression, since this is a new hardware, and I didn't install any previous kernel. I attached full dmesg obtained after #defining DEBUG in ec.c, and booting kernel with initcall_debug.