Subject : 2.6.27-rc5 acpi: EC Storm error message on bootup Submitter : jmerkey@wolfmountaingroup.com Date : 2008-09-02 21:27 References : http://marc.info/?l=linux-kernel&m=122039255517586&w=4 Handled-By : Alexey Starikovskiy <astarikovskiy@suse.de> Patch : http://marc.info/?l=linux-kernel&m=122098180019264&w=4 This entry is being used for tracking a regression from 2.6.26. Please don't close it until the problem is fixed in the mainline.
Created attachment 17818 [details] fast transaction
Hi, Alexey I doubt whether the laptop of bug 11309 can be resovled by this patch. As said in bug 11309#C7, sometimes the _TMP object will return the incorrect temperature when the EC GPE storm is detected. >ACPI: EC: GPE storm detected, disabling EC GPE In the current upstream kernel EC will work in polling mode after the EC GPE storm is detected. It seems that on the laptop of bug11309 sometimes EC can't be accessed correctly while EC is in polling mode. In your patch when EC GPE storm is detected, it will also work in polling mode. So I suspect that the laptop of bug 11309 still is broken after applying your patch. Thanks.
Created attachment 17845 [details] fast transaction make strict sync around transaction
Hi, Alexey Thanks for the fixing. Why not send your patch to ACPI mailing list before it is attached in the bugzilla? When an issue is raised about your patch, you only fix it. But IMO you never consider the side-effect of your change. For example: I raise the synchronization issue in the patch of commet #1. Then you add the spin_lock protection in the comment #3. In the updated patch once the EC I/O data port is accessed, the spin_lock will be used. But if OS does 1000 EC transactions per second, it will disable interrupt at least for about 1ms.(EC device is connected with LPC bus. The access speed is very slow). Is it appropriate that the interrupt is disabled for so long time? The following ugly code still exists. The address of local variable in function is assigned to the global pointer variable. >struct transaction_data t = {.wdata = wdata, .rdata = rdata, .wlen = wdata_len, .rlen = rdata_len}; >ec->t = &t; At the same time I raise three issues about your patch. But there is no explanation about it.(Two are raised in my email. One is raised in comment #2) a. bogus timeout in EC transaction b. the EC notification event will be lost in some specific cases c. On the laptop of bug 11309 the EC sometimes can't be accessed correctly while EC is in polling mode(After EC GPE storm is detected). But in your patch when the EC GPE storm is detected, it will also work in polling mode while doing EC transaction. Thanks.
Hi, Alexey, I've reviewed your patch and I have two questions. Q1. do we really need a spin_lock in gpe_transaction? In fact, we don't need to disable the irq here, we only want to make sure ec->t can not be accessed by acpi_ec_gpe_handler and ec_poll at the same time, right? of course this won't introduce new bugs, but it's overkill. :p Q2. the same question (question c) as Yakui listed above. In this patch, one ec transaction can be done by ec_poll and ec_gpe_handler together. but this doesn't work for the machines that has "wrong status before GPE arrives". In fact, I don't think the ec driver should fit for all ECes, including the broken ones. there are always some cases that we can not cover, like the one listed above. We only need to implement different modes in ec driver just like we did before, and for every ec, we can make sure there is at least one mode can work for it. thanks, rui
bugme-daemon@bugzilla.kernel.org wrote: > I've reviewed your patch and I have two questions. > Q1. do we really need a spin_lock in gpe_transaction? In fact, we don't need > to > disable the irq here, we only want to make sure ec->t can not be accessed by > acpi_ec_gpe_handler and ec_poll at the same time, right? > of course this won't introduce new bugs, but it's overkill. :p fill free to suggest lighter synchronization. > > Q2. the same question (question c) as Yakui listed above. > In this patch, one ec transaction can be done by ec_poll and ec_gpe_handler > together. but this doesn't work for the machines that has "wrong status > before > GPE arrives". ec_poll() starts to work only if interrupt-based transaction fails. And interrupt-based transaction code will be started by GPE handler, so will automatically wait for a GPE to check correct status. > In fact, I don't think the ec driver should fit for all ECes, including the > broken ones. there are always some cases that we can not cover, like the one > listed above. > We only need to implement different modes in ec driver just like we did > before, > and for every ec, we can make sure there is at least one mode can work for > it. Doesn't work this way.
Hi Yakui, I answered all your "issues" several times already, and I am getting tired to repeat same things over and over. Please consider filtering your questions with Len or Rui. Thanks, Alex. bugme-daemon@bugzilla.kernel.org wrote: > Hi, Alexey > Thanks for the fixing. > Why not send your patch to ACPI mailing list before it is attached in the > bugzilla? > When an issue is raised about your patch, you only fix it. But IMO you > never consider the side-effect of your change. > For example: I raise the synchronization issue in the patch of commet #1. > Then you add the spin_lock protection in the comment #3. In the updated patch > once the EC I/O data port is accessed, the spin_lock will be used. But if OS > does 1000 EC transactions per second, it will disable interrupt at least for > about 1ms.(EC device is connected with LPC bus. The access speed is very > slow). > Is it appropriate that the interrupt is disabled for so long time? > > The following ugly code still exists. The address of local variable in > function is assigned to the global pointer variable. > >struct transaction_data t = {.wdata = wdata, .rdata = rdata, > .wlen = wdata_len, .rlen = rdata_len}; > >ec->t = &t; > > At the same time I raise three issues about your patch. But there is no > explanation about it.(Two are raised in my email. One is raised in comment > #2) > a. bogus timeout in EC transaction > b. the EC notification event will be lost in some specific cases > c. On the laptop of bug 11309 the EC sometimes can't be accessed > correctly > while EC is in polling mode(After EC GPE storm is detected). But in your > patch > when the EC GPE storm is detected, it will also work in polling mode while > doing EC transaction. > > Thanks. > >
> ec_poll() starts to work only if interrupt-based transaction fails. And > interrupt-based > transaction code will be started by GPE handler, so will automatically wait > for > a GPE to check correct status. If the input parameter of force_poll is non-zero or the GPE mode flag is cleared, the gpe_transaction will be called in the two different routines. Right?
Hi, Alexey Maybe you answer some issues I raised. But I haven't seen any change or explanation in the attached patch. a. bogus timeout in EC transaction. The bogus timeout seldom happens. But I don't care whether it is easy to happen on some laptops. What I only care is how to explain and process it when the bogus timeout happens. b. the EC notification event will be lost in some specific cases. According to what you said, the EC notification event will be lost and it can't be handled by your patch. Right? c. This issue is raised several times. And Rui also suspects that the patch can't work on the machines with "Incorrect GPE status before GPE arrives". Unfortunately we can't find any explanation about this. Maybe the laptop of bug 8110 is also broken again by your patch. IMO what Rui said in comment #5 is right. Too many EC controllers are available. And some of them don't follow the spec. It is not easy that one EC driver can cover all the ECs. Maybe it is appropriate that different modes are impemented in ec driver just as we did before and for every ec, we can make sure there is at least one mode can work for it. If it is confirmed that neither of the available modes can work for the broken EC, it is unnecessary to fix them. At the same time IMO it is worse to mix the interrupt-driven mode with the polling mode. It is so obcure that it is not easy to understand especiallly when there is no detailed change log. Thanks.
The patch in comment #3 resolves the slow key issue on my EeePC 900...
Created attachment 18047 [details] patch vs 2.6.27-rc7 please verify that this version of the patch (checked into the acpi test tree) works also. thanks, -Len
Ignore-Patch : http://marc.info/?l=linux-kernel&m=122098180019264&w=4 Patch : http://bugzilla.kernel.org/attachment.cgi?id=18047&action=view
Len: The new version of the patch also solves the problem for me. Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
shipped in linux-2.6.28-rc1 closed commit 7c6db4e050601f359081fde418ca6dc4fc2d0011 Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Thu Sep 25 21:00:31 2008 +0400 ACPI: EC: do transaction from interrupt context