Bug 11549

Summary: 2.6.27-rc5 acpi: EC Storm error message on bootup
Product: ACPI Reporter: Rafael J. Wysocki (rjw)
Component: ECAssignee: Alexey Starikovskiy (astarikovskiy)
Status: CLOSED CODE_FIX    
Severity: normal CC: acpi-bugzilla, sitsofe
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27-rc5 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 11167    
Attachments: fast transaction
fast transaction
patch vs 2.6.27-rc7

Description Rafael J. Wysocki 2008-09-12 11:12:21 UTC
Subject    : 2.6.27-rc5 acpi: EC Storm error message on bootup
Submitter  : jmerkey@wolfmountaingroup.com
Date       : 2008-09-02 21:27
References : http://marc.info/?l=linux-kernel&m=122039255517586&w=4
Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de>
Patch      : http://marc.info/?l=linux-kernel&m=122098180019264&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Alexey Starikovskiy 2008-09-16 16:03:33 UTC
Created attachment 17818 [details]
fast transaction
Comment 2 ykzhao 2008-09-16 18:33:53 UTC
Hi, Alexey
   I doubt whether the laptop of bug 11309 can be resovled by this patch.
   As said in bug 11309#C7, sometimes the _TMP object will return the incorrect temperature when the EC GPE storm is detected.
   >ACPI: EC: GPE storm detected, disabling EC GPE
   
   In the current upstream kernel EC will work in polling mode after the EC GPE storm is detected. It seems that on the laptop of bug11309 sometimes EC can't be accessed correctly while EC is in polling mode.
   
   In your patch when EC GPE storm is detected, it will also work in polling mode. So I suspect that the laptop of bug 11309 still is broken after applying your patch.

   Thanks.
Comment 3 Alexey Starikovskiy 2008-09-17 15:49:37 UTC
Created attachment 17845 [details]
fast transaction

make strict sync around transaction
Comment 4 ykzhao 2008-09-17 18:35:56 UTC
Hi, Alexey
    Thanks for the fixing.
    Why not send your patch to ACPI mailing list before it is attached in the bugzilla?
    When an issue is raised about your patch, you only fix it. But IMO you never consider the side-effect of your change.
    For example: I raise the synchronization issue in the patch of commet #1. Then you add the spin_lock protection in the comment #3. In the updated patch once the EC I/O data port is accessed, the spin_lock will be used. But if OS does 1000 EC transactions per second, it will disable interrupt at least for about 1ms.(EC device is connected with LPC bus. The access speed is very slow).
    Is it appropriate that the interrupt is disabled for so long time?
    
    The following ugly code still exists. The address of local variable in function is assigned to the global pointer variable.
     >struct transaction_data t = {.wdata = wdata, .rdata = rdata,
                                     .wlen = wdata_len, .rlen = rdata_len};
      >ec->t = &t;

    At the same time I raise three issues about your patch. But there is no explanation about it.(Two are raised in my email. One is raised in comment #2)
    a. bogus timeout in EC transaction
    b. the EC notification event will be lost in some specific cases
    c. On the laptop of bug 11309 the EC sometimes can't be accessed correctly while EC is in polling mode(After EC GPE storm is detected). But in your patch when the EC GPE storm is detected, it will also work in polling mode while doing EC transaction. 

    Thanks.
Comment 5 Zhang Rui 2008-09-17 19:34:59 UTC
Hi, Alexey,

I've reviewed your patch and I have two questions.
Q1. do we really need a spin_lock in gpe_transaction? In fact, we don't need to disable the irq here, we only want to make sure ec->t can not be accessed by  acpi_ec_gpe_handler and ec_poll at the same time, right?
of course this won't introduce new bugs, but it's overkill. :p

Q2. the same question (question c) as Yakui listed above.
In this patch, one ec transaction can be done by ec_poll and ec_gpe_handler together. but this doesn't work for the machines that has "wrong status before GPE arrives".
In fact, I don't think the ec driver should fit for all ECes, including the broken ones. there are always some cases that we can not cover, like the one listed above.
We only need to implement different modes in ec driver just like we did before, and for every ec, we can make sure there is at least one mode can work for it.

thanks,
rui
Comment 6 Alexey Starikovskiy 2008-09-17 23:32:33 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> I've reviewed your patch and I have two questions.
> Q1. do we really need a spin_lock in gpe_transaction? In fact, we don't need
> to
> disable the irq here, we only want to make sure ec->t can not be accessed by 
> acpi_ec_gpe_handler and ec_poll at the same time, right?
> of course this won't introduce new bugs, but it's overkill. :p
fill free to suggest lighter synchronization.
> 
> Q2. the same question (question c) as Yakui listed above.
> In this patch, one ec transaction can be done by ec_poll and ec_gpe_handler
> together. but this doesn't work for the machines that has "wrong status
> before
> GPE arrives".
ec_poll() starts to work only if interrupt-based transaction fails. And interrupt-based
transaction code will be started by GPE handler, so will automatically wait for
a GPE to check correct status. 
> In fact, I don't think the ec driver should fit for all ECes, including the
> broken ones. there are always some cases that we can not cover, like the one
> listed above.
> We only need to implement different modes in ec driver just like we did
> before,
> and for every ec, we can make sure there is at least one mode can work for
> it.
Doesn't work this way.
Comment 7 Alexey Starikovskiy 2008-09-17 23:36:47 UTC
Hi Yakui,
I answered all your "issues" several times already, and I am getting tired
to repeat same things over and over.
Please consider filtering your questions with Len or Rui.
Thanks,
Alex.

bugme-daemon@bugzilla.kernel.org wrote:
> Hi, Alexey
>     Thanks for the fixing.
>     Why not send your patch to ACPI mailing list before it is attached in the
> bugzilla?

>     When an issue is raised about your patch, you only fix it. But IMO you
> never consider the side-effect of your change.
>     For example: I raise the synchronization issue in the patch of commet #1.
> Then you add the spin_lock protection in the comment #3. In the updated patch
> once the EC I/O data port is accessed, the spin_lock will be used. But if OS
> does 1000 EC transactions per second, it will disable interrupt at least for
> about 1ms.(EC device is connected with LPC bus. The access speed is very
> slow).
>     Is it appropriate that the interrupt is disabled for so long time?
> 
>     The following ugly code still exists. The address of local variable in
> function is assigned to the global pointer variable.
>      >struct transaction_data t = {.wdata = wdata, .rdata = rdata,
>                                      .wlen = wdata_len, .rlen = rdata_len};
>       >ec->t = &t;
> 
>     At the same time I raise three issues about your patch. But there is no
> explanation about it.(Two are raised in my email. One is raised in comment
> #2)
>     a. bogus timeout in EC transaction
>     b. the EC notification event will be lost in some specific cases
>     c. On the laptop of bug 11309 the EC sometimes can't be accessed
>     correctly
> while EC is in polling mode(After EC GPE storm is detected). But in your
> patch
> when the EC GPE storm is detected, it will also work in polling mode while
> doing EC transaction. 
> 
>     Thanks.
> 
> 
Comment 8 ykzhao 2008-09-18 00:34:55 UTC
> ec_poll() starts to work only if interrupt-based transaction fails. And
> interrupt-based
> transaction code will be started by GPE handler, so will automatically wait
> for
> a GPE to check correct status. 
If the input parameter of force_poll is non-zero or the GPE mode flag is cleared, the gpe_transaction will be called in the two different routines.
Right? 
Comment 9 ykzhao 2008-09-18 01:40:44 UTC
Hi, Alexey
    Maybe you answer some issues I raised. But I haven't seen any change or explanation in the attached patch. 
    a. bogus timeout in EC transaction. The bogus timeout seldom happens. But I don't care whether it is easy to happen on some laptops. What I only care is how to explain and process it when the bogus timeout happens.
    b. the EC notification event will be lost in some specific cases. According to what you said, the EC notification event will be lost and it can't be handled by your patch. Right? 
    c. This issue is raised several times. And Rui also suspects that the patch can't work on the machines with "Incorrect GPE status before GPE arrives". Unfortunately we can't find any explanation about this. Maybe the laptop of bug 8110 is also broken again by your patch. 
    
    IMO what Rui said in comment #5 is right. Too many EC controllers are available. And some of them don't follow the spec. It is not easy that one EC driver can cover all the ECs. Maybe it is appropriate that different modes are impemented in ec driver just as we did before and for every ec, we can make sure there is at least one mode can work for it. If it is confirmed that neither of the available modes can work for the broken EC, it is unnecessary to fix them. 
   At the same time IMO it is worse to mix the interrupt-driven mode with the polling mode. It is so obcure that it is not easy to understand especiallly when there is no detailed change log.
   Thanks.
Comment 10 Sitsofe Wheeler 2008-09-24 09:37:04 UTC
The patch in comment #3 resolves the slow key issue on my EeePC 900...
Comment 11 Len Brown 2008-09-25 12:19:21 UTC
Created attachment 18047 [details]
patch vs 2.6.27-rc7

please verify that this version of the patch
(checked into the acpi test tree) works also.

thanks,
-Len
Comment 13 Sitsofe Wheeler 2008-10-05 04:24:11 UTC
Len:
The new version of the patch also solves the problem for me.

Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Comment 14 Len Brown 2008-10-24 23:25:16 UTC
shipped in linux-2.6.28-rc1
closed

commit 7c6db4e050601f359081fde418ca6dc4fc2d0011
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Thu Sep 25 21:00:31 2008 +0400

    ACPI: EC: do transaction from interrupt context