Bug 12949

Summary: "ACPI: EC: missing confirmations, switch off interrupt mode" on startup on 2.6.29 kernel
Product: ACPI Reporter: Corrado Zoccolo (czoccolo)
Component: ECAssignee: Alexey Starikovskiy (astarikovskiy)
Status: CLOSED CODE_FIX    
Severity: normal CC: acpi-bugzilla, astarikovskiy, lenb, maximlevitsky, vjensen
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg with full debugging
dmesg after disabling acpi_ec_burst_enable
dmesg after disabling acpi_ec_burst_enable, with some sleep, + printing flags
change mode after transaction
dmesg after patch applied
dmesg with patch applied, while charging battery
patch against v2.6.29 that solves the problem
dmesg after my proposed patch
merge irq and poll modes
merge irq and poll modes #2
merge irq and poll modes #3
merge irq and poll modes #4
Fix GPE storm on my machine
Unification of GPE storm handling with normal (merged irq&poll) path.
dmesg after 22251 patch applied
2.6.31 update
patch as applied to acpi-test tree

Description Corrado Zoccolo 2009-03-26 21:46:00 UTC
My netbook (acer aspire one 110) consistently shows during boot the message 
ACPI: EC: missing confirmations, switch off interrupt mode

> [    0.351161] calling  acpi_init+0x0/0x236 @ 1
> [    0.361052] ACPI: EC: Look up EC in DSDT
> [    0.367754] ACPI: BIOS _OSI(Linux) query ignored
> [    0.368984] ACPI: EC: non-query interrupt received, switching to
> interrupt mode
> [    0.860013] ACPI: EC: missing confirmations, switch off interrupt mode.
> [    0.866062] ACPI: Interpreter enabled

I can't say if this is a regression, since this is a new hardware, and I didn't install any previous kernel.

I attached full dmesg obtained after #defining DEBUG in ec.c, and booting kernel with initcall_debug.
Comment 1 Corrado Zoccolo 2009-03-26 21:47:08 UTC
Created attachment 20695 [details]
dmesg with full debugging
Comment 2 Alexey Starikovskiy 2009-03-26 22:31:29 UTC
Thanks for proper dmesg.
Could you please try to disable calls to acpi_ec_burst_enable() in the same ec.c and see if it helps?
Comment 3 Corrado Zoccolo 2009-03-27 07:24:36 UTC
Created attachment 20700 [details]
dmesg after disabling acpi_ec_burst_enable

I changed acpi_ec_burst_enable to printk and inconditionally return 0, but this didn't solve the issue (see below my changes). See attached dmesg.

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 2fe1506..8e66e88 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -27,7 +27,7 @@
  */

 /* Uncomment next line to get verbose printout */
-/* #define DEBUG */
+#define DEBUG

 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -350,6 +350,8 @@ unlock:
  */
 static int acpi_ec_burst_enable(struct acpi_ec *ec)
 {
+  printk(KERN_INFO "acpi_ec_burst_enable ignored");
+  return 0;
        u8 d;
        struct transaction t = {.command = ACPI_EC_BURST_ENABLE,
                                .wdata = NULL, .rdata = &d,
Comment 4 Alexey Starikovskiy 2009-03-27 11:01:38 UTC
How is you config differ from default?
Interrupt is there, but somehow it appears before we write command to EC in printk
There is a synchronization spin lock in this place, but it seems to be ignored.
Try to insert some msleep(1) at the beginning of acpi_ec_transaction_unlocked()?
[    0.370660] ACPI: EC: transaction end
>>>[    0.370660] ACPI: EC: ~~~> interrupt
[    0.370660] ACPI: EC: ---> status = 0x00
>>>[    0.370660] ACPI: EC: ~~~> interrupt
[    0.370660] ACPI: EC: ---> status = 0x00
[    0.370660] ACPI: EC: ---> status = 0x00
[    0.370660] acpi_ec_burst_enable ignored<7>ACPI: EC: ---> status = 0x00
[    0.370660] ACPI: EC: transaction start
[    0.370660] ACPI: EC: <--- command = 0x81
[    0.370660] ACPI: EC: ~~~> interrupt
[    0.370660] ACPI: EC: ---> status = 0x08
[    0.370660] ACPI: EC: <--- data = 0x60
[    0.870015] ACPI: EC: missing confirmations, switch off interrupt mode.
Comment 5 Corrado Zoccolo 2009-03-27 17:02:42 UTC
Created attachment 20706 [details]
dmesg after disabling acpi_ec_burst_enable, with some sleep, + printing flags

sleeps were added as follows:
@@ -255,7 +256,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
 {
 	unsigned long tmp;
 	int ret = 0;
-	pr_debug(PREFIX "transaction start\n");
+	pr_my_debug(PREFIX "transaction start\n");
+	msleep(1);
 	/* disable GPE during transaction if storm is detected */
 	if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
 		clear_bit(EC_FLAGS_GPE_MODE, &ec->flags);
@@ -263,6 +265,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
 	}
 	if (EC_FLAGS_MSI)
 		udelay(ACPI_EC_DELAY);
+	msleep(1);
 	/* start transaction */
 	spin_lock_irqsave(&ec->curr_lock, tmp);
 	/* following two actions should be kept atomic */
@@ -271,15 +274,21 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
 	if (ec->curr->command == ACPI_EC_COMMAND_QUERY)
 		clear_bit(EC_FLAGS_QUERY_PENDING, &ec->flags);
 	spin_unlock_irqrestore(&ec->curr_lock, tmp);
+
+	msleep(1);
 	/* if we selected poll mode or failed in GPE-mode do a poll loop */
 	if (force_poll ||
 	    !test_bit(EC_FLAGS_GPE_MODE, &ec->flags) ||
 	    acpi_ec_wait(ec))
 		ret = ec_poll(ec);
-	pr_debug(PREFIX "transaction end\n");
+	pr_my_debug(PREFIX "transaction end\n");
+
+	msleep(1);
 	spin_lock_irqsave(&ec->curr_lock, tmp);
 	ec->curr = NULL;
 	spin_unlock_irqrestore(&ec->curr_lock, tmp);
+
+	msleep(1);
 	if (test_bit(EC_FLAGS_GPE_STORM, &ec->flags)) {
 		/* check if we received SCI during transaction */
 		ec_check_sci(ec, acpi_ec_read_status(ec));
Comment 6 Alexey Starikovskiy 2009-03-27 18:01:35 UTC
Created attachment 20707 [details]
change mode after transaction

Please check if this patch helps.
Comment 7 Corrado Zoccolo 2009-03-28 09:27:30 UTC
Created attachment 20715 [details]
dmesg after patch applied

Still same problem at startup, but after a while, it settles on interrupts enabled, and seems working. Only command 0x81 causes problems.
Note: this is taken when on ac, without battery. I have an other dmesg output when charging battery, that may be helpful.
Comment 8 Corrado Zoccolo 2009-03-28 09:30:04 UTC
Created attachment 20716 [details]
dmesg with patch applied, while charging battery

All previous dmesg outputs were taken when on AC without battery.
This one is dwhile charging battery
Comment 9 Corrado Zoccolo 2009-03-29 15:09:50 UTC
Created attachment 20725 [details]
patch against v2.6.29 that solves the problem

I've elaborated a way to overcome the issue. The attached patch unifies poll + interrupt handling, so that controllers that mostly work do not incur the full penalty of polling mode.

Current patch reduces the interrupt handler to just wake up the main thread, that sequences all the controller operations, thus it slightly increase the latency for perfectly working controllers, but reduces it for partially working ones like mine.

If this is a problem, you can probably try an hybrid approach, in which the interrupt handler calls advance_transaction as well (I've not tried it, since it appears I only have non-conforming controllers).

I've tested the code also on a controller that generate GPE storms, but not tried to optimize this case (it could probably be done easily, though).

If the slightly increased latency is acceptable, some more cleanup can be done (advance_transaction shouldn't need the locking stuff if called from main thread).
Comment 10 Corrado Zoccolo 2009-03-29 15:10:57 UTC
Created attachment 20726 [details]
dmesg after my proposed patch
Comment 11 Alexey Starikovskiy 2009-03-30 13:42:53 UTC
Created attachment 20736 [details]
merge irq and poll modes

Nice idea,
but implementation is not quite right. Please check if my version works too.
Comment 12 Corrado Zoccolo 2009-03-30 14:20:32 UTC
Sure, I'll test it. This implements what I was suggesting, but unable to test.

Few points:
* Isn't 1 ms too much as timeout? For Burst mode, I see from APCI 3.0 specs that you have to respect the following time constraints:
 * First Access – 400 microseconds
 * Subsequent Accesses – 50 microseconds each
 * Total Burst Time – 1 millisecond

So I think it is likely that the controller should do the transition in few tens of microseconds. Can't we use a smaller timeout, and maybe backoff if we find that the timeout expired but we cannot advance?

* The part of the code starting at line 240 (/* missing GPEs, switch back to poll mode */) still bugs me. I think that if this happens, then we have a severely broken hardware, and the answer is not to just switch off interrupt mode, but probably to report to the system that we don't have a (working) EC at all (is this  possible after initialization? I'm new to kernel hacking).
Comment 13 Alexey Starikovskiy 2009-03-30 14:27:51 UTC
1. Burst mode will be deleted "real soon now". It was added as a workaround for MSI notebooks but cause more troubles.

2. missing GPE part should go way of dodo too, just because there is no separate poll mode any longer.
Comment 14 Corrado Zoccolo 2009-03-30 21:15:51 UTC
Your patch works, but still needs some cleanup:
** same unification can be done when waiting for ibf to become 0
** the initial state should have interrupts enabled, because otherwise it discards the first interrupt and needs to wait for timeout (I have HZ=100, so this is noticeable from my logs).

I look forward to see this patch in mainline!
Comment 15 Alexey Starikovskiy 2009-04-04 23:32:31 UTC
Created attachment 20805 [details]
merge irq and poll modes #2

Here is the cleaned up version.
Comment 16 Len Brown 2009-04-05 04:23:09 UTC
mark as RESOLVED since there is a patch proposed
and available for test.
Comment 17 Corrado Zoccolo 2009-04-05 12:54:32 UTC
Alexey, the patch works, but it logs too many "EC: non-query interrupt received, switching to interrupt mode" stuff, even for non-storming controllers.
I think the log can be removed, since now we don't have a distinct interrupt mode any more, and nothing is switched.

I've an idea about how to handle also storming controllers, without disabling interrupts for them during transaction setup.
Basically, if the controller is known to be storming, as soon as the transaction is done (i.e. when going to wake up the invoking thread) we disable GPE.
Is it safe to call acpi_disable_gpe inside the gpe handling?
Comment 18 Alexey Starikovskiy 2009-04-05 15:44:43 UTC
Created attachment 20821 [details]
merge irq and poll modes #3

Right, this printk is just leftover... Thanks!
Comment 19 Len Brown 2009-04-24 04:13:44 UTC
patch in comment #18 applies to 2.6.29, but not later.
i took a minute to merge it, but must have screwed it up
because I got:

drivers/acpi/ec.c:214: warning: ‘ec_delay’ defined but not used

Alexey, Please refresh against 2.6.30-rc3, or even better,
the acpi-test tree.
Comment 20 Alexey Starikovskiy 2009-04-24 20:53:00 UTC
Created attachment 21105 [details]
merge irq and poll modes #4

This one is clear against 2.6.29-rc3, don't have acpi-test handy.
Comment 21 Maxim Levitsky 2009-05-01 09:37:28 UTC
Isn't that obvious?

EC can send interrupts when it wants to, and there is an interrupt handlers inside EC code that listen for that.

Currently any interrupt that has come from ec, and wasn't due to reading/writing of data is flagged as a false interrupt.

This patch fixed that issue for me:

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 04e9044..6b82596 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -190,7 +190,7 @@ static void gpe_transaction(struct acpi_ec *ec, u8 status)
                } else
                        goto err;
        } else if (ec->curr->wlen == ec->curr->wi &&
-                  (status & ACPI_EC_FLAG_IBF) == 0)
+                  (status & (ACPI_EC_FLAG_IBF|ACPI_EC_FLAG_SCI)) == 0)
                ec->curr->done = true;
        goto unlock;
 err:
Comment 22 Maxim Levitsky 2009-05-01 09:45:34 UTC
Actually the above patch seems to be wrong, I shouldn't ack data transfer when SCI EC interrupt arrives, I will post correct one soon
Comment 23 Maxim Levitsky 2009-05-01 09:56:19 UTC
Yep, sorry for the type, but following patch works as well:

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 04e9044..fb07fd4 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -195,7 +195,7 @@ static void gpe_transaction(struct acpi_ec *ec, u8 status)
 	goto unlock;
 err:
 	/* false interrupt, state didn't change */
-	if (in_interrupt())
+	if (in_interrupt() && !(status & ACPI_EC_FLAG_SCI))
 		++ec->curr->irq_count;
 unlock:
 	spin_unlock_irqrestore(&ec->curr_lock, flags);
-- 
1.6.0.4
Comment 24 Maxim Levitsky 2009-05-01 09:57:54 UTC
I mean sorry for the typo :-)
Comment 25 Maxim Levitsky 2009-05-01 11:43:35 UTC
In fact after a hour, it detected a storm anyway, still better that instantly.
My notebook's EC does seem to send false interrupts.

Still I think that SCI interrupts shouldn't be marked as false
Comment 26 Alexey Starikovskiy 2009-05-03 18:28:03 UTC
Maxim,
ACPI_EC_FLAG_SCI is active from the moment EC decided to raise it till the moment driver issues query command, thus, it will be active across many invocations of interrupt handler, even more at irq storm. Only one of these invocations is right place to not count bogus interrups, all others are still bogus if they don't carry completion information. So, what your patch does is disabling storm detection for some time (while SCI bit is not acknowledged by driver). This is not a solution, and changing it to handle only first SCI bit invocation (GPE_PENDING flag is clear) will not help, while increase code size, but you may try it too.
Comment 27 Maxim Levitsky 2009-05-03 20:46:50 UTC
Thanks, didn't know that. Anyway my EC sends plenty (~6 irqs/s) of false interrupts.

Yet, I would like to use irq mode, because it works. I just raise the storm threshold for that to 20

Thus I would like to see an option to disable storm detection.

But anyway, polling or interrupt mode doesn't seem to affect my system in any way
Comment 28 Corrado Zoccolo 2009-06-21 07:00:10 UTC
Is anything preventing this to be considered resolved? I tested it thoroughly on my hardware, and it solves the problem.
Comment 29 Corrado Zoccolo 2009-07-05 13:44:47 UTC
Created attachment 22220 [details]
Fix GPE storm on my machine

Hi Alexey,
I found that on my machine, all the spurious interrupts are coming from the burst transaction in acpi_ec_space_handler .
The attached patch, that applies on top of your latest, skips counting spurious interrupts if they are coming from a burst transaction.
I also added a printk to see if on other machines, there are other recognizable patterns for GPE storms. Maxim, if you can give it a try, please report the dmesg output.
Comment 30 Corrado Zoccolo 2009-07-07 20:53:33 UTC
Created attachment 22251 [details]
Unification of GPE storm handling with normal (merged irq&poll) path.

Unfortunately, the previous patch was just hiding the problem, as uncovered disabling burst mode (I still got storm interrupts, so they were not caused by it).

Here is a new attempt, that further unifies handling of broken EC with
normal ones.
In case a spurious interrupt is received, GPEs are disabled only for
the remaining of current transaction (to avoid storm), and re-enabled at the
end.
This allows us to reduce the transaction latency on partially broken ECs, which send correctly the first interrupt (the one acknowledging the command).
See next attachment for dmesg after patch.
Comment 31 Corrado Zoccolo 2009-07-07 20:54:53 UTC
Created attachment 22252 [details]
dmesg after 22251 patch applied
Comment 32 Alexey Starikovskiy 2009-08-28 17:57:20 UTC
Corrado, there is a patch to limit burst mode to MSI notebooks only, because only they benefit from it. I'm closing this bug as resolved. Thanks.
Comment 33 Corrado Zoccolo 2009-08-28 20:50:00 UTC
Did the patch in comment #20 also get merged? That fixed the original problem, that is not related to burst.
Comment 34 Alexey Starikovskiy 2009-08-28 21:09:55 UTC
It is passed to Len Brown for inclusion. Hopefully, it will be merged to 2.6.32.
Comment 35 Len Brown 2009-08-29 18:50:32 UTC
Alexey, the patch in comment #20 applies cleanly on top of 2.6.29,
but was dropped b/c of conflicts with later versions of ec.c.
Please refresh it so that it applies cleanly on top of 2.6.31-rc8

re-opening to reflect that the candidate fix is now stale.
Comment 36 Alexey Starikovskiy 2009-08-29 21:52:09 UTC
Created attachment 22904 [details]
2.6.31 update

updated patch for 2.6.31
Comment 37 Len Brown 2009-08-30 02:05:26 UTC
Created attachment 22908 [details]
patch as applied to acpi-test tree
Comment 38 Len Brown 2009-09-24 21:57:59 UTC
commit 2a84cb9852f52c0cd1c48bca41a8792d44ad06cc
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Sun Aug 30 03:06:14 2009 +0400

    ACPI: EC: Merge IRQ and POLL modes


shipped in linux-2.6.31-git14
closed