Bug 45151
Summary: | [REGRESSION] [BISECTED] ACPI event stop to be detected | ||
---|---|---|---|
Product: | ACPI | Reporter: | Francesco (trentini) |
Component: | ACPICA-Core | Assignee: | acpi_acpica-core (acpi_acpica-core) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | chris, feng.tang, florian, lenb, lv.zheng, Robert.Moore |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://bugs.launchpad.net/ubuntu/+source/linux/+bug/908825 | ||
Kernel Version: | 2.6.39 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Acpidump of my Notebook
gpe_debug.patch dmesg output when running kernel with "gpe_debug.patch" new gpe patch dmesg output of test proposed in comment #7 dmesg output of test proposed in comment #9 dmesg outputs of comment #13 (good?) dmesg outputs of comment #13 (bad?) dmesg outputs of comment #14 (good?) dmesg outputs of comment #14 (bad?) test of comment #21 adding ec_irq_force param make_threshold_parameter.patch dmidecode of santech t45 add quirk for Clevo M720 |
Description
Francesco
2012-07-26 14:20:48 UTC
could you post your acpidump binary? The commit's logic looks ok, if you revert this patch like this, will there be some difference? thanks diff --git a/drivers/acpi/acpica/evgpe.c b/drivers/acpi/acpica/evgpe.c index afbd5cb..36da618 100644 --- a/drivers/acpi/acpica/evgpe.c +++ b/drivers/acpi/acpica/evgpe.c @@ -377,10 +377,12 @@ u32 acpi_ev_gpe_detect(struct acpi_gpe_xrupt_info * gpe_xrupt_list) * Optimization: If there are no GPEs enabled within this * register, we can safely ignore the entire register. */ + /* if (!(gpe_register_info->enable_for_run | gpe_register_info->enable_for_wake)) { continue; } + */ /* Read the Status Register */ Created attachment 76621 [details]
Acpidump of my Notebook
I've compiled the 3.2.21 kernel (the ubuntu precise kernel source) with and without the proposed patch. The results is that applying the patch, thus comment out the if statement, solve the issue and the acpi restart working correctly on my notebook. I don't know why, but those three lines cause the stop of acpi event on my notebook. I've also attached the acpidump output of my notebook. Thanks Created attachment 76651 [details]
gpe_debug.patch
Hi,
Could you try to build a kernel with this patch, and enable CONFIG_ACPI_DEBUG in kernel config, then do:
1. bootup the kernel, switch to root user
2. run
#echo 0x4 > /sys/module/acpi/parameters/debug_layer
#echo 0x0800000c > /sys/module/acpi/parameters/debug_level
3. startup the "acpi_listen" and press _once the hotkey which doesn't work
4. send out the "dmesg"
Thanks,
Created attachment 76711 [details]
dmesg output when running kernel with "gpe_debug.patch"
I did as you written. I press only the "backlight down" hotkey that use cause an acpi event. The acpi listen output is: video LCD 00000087 00000000 Thanks for your time Very good, the output is useful. Now, please comment out those three lines of code (as in comment #1), and get the same debug output so we can compare the two debug sequences. Thanks. bob What does you mean exactly? Option A: from a virgen kernel source I apply _only the patch in comment #1 _and enable CONFIG_ACPI_DEBUG in kernel config and do the same step of comment #4 Option B: apply _both patch, comment #1 and "gpe_debug.patch", and run the same step as comment #4 with CONFIG_ACPI_DEBUG enabled. Which of two option do you mean? I prefer to be precise because I'm very far to be an expert in kernel hacking, so I want to be sure that I'm doing the right thing. Thanks Created attachment 76731 [details]
new gpe patch
Hi Francesco,
You already did a good job in testing kernel and providing the useful info.
I guess Bob's intention is to compare both the working and non-working case.
So could you try the new attached patch on a clean kernel without any of my previous patch, do the same test and provide the dmesg info?
Also you can add "log_buf_len=4M acpi.debug_layer=0x1f acpi.debug_level=0x8000000c" to your kernel cmdline by change grub.conf/grub.cfg, so that you don't need to do the
#echo 0x4 > /sys/module/acpi/parameters/debug_layer
#echo 0x0800000c > /sys/module/acpi/parameters/debug_level.
thanks,
Ok, I've done both test. I've done the test as described in comment #4 and with the patch requested in comment #7 and comment #9. The results are: Comment #7 test: in this case I've comment out the three lines of the if statement. The acpi subsystem works and acpi_listen gives the normal output as: "video LCD 00000087 00000000". The dmesg output are attached in "dmesg_comm7.txt" file. Comment #9 test: using "new gpe patch" proposed in this comment doesn't resume the acpi subsystem and acpi_listen doesn't put any output. In any case, the relative dmesg is attached in "dmesg_comm9.txt" file. I hope that it is the thinks what you want. If more test is needed, there is no problem. Thanks, Created attachment 77431 [details] dmesg output of test proposed in comment #7 Created attachment 77441 [details] dmesg output of test proposed in comment #9 From the 2 dmesgs for bad/good situations, I can't find obvious clue. Could you add these parameters to your kernel command line (by editing your /boot/grub/grub.cfg) "debug log_buf_len=8M acpi.debug_layer=0x1009001F acpi.debug_level=0x082002FF", and try the same test process for both the good/bad kernels? thanks Hi Francesco, Since this bug is heavily bound with EC driver, especially when there is a GPE storm warning in your dmesg. please apply this patch to your EC driver for all your good/bad test kernels: diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 7edaccc..1b70769 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -27,7 +27,7 @@ */ /* Uncomment next line to get verbose printout */ -/* #define DEBUG */ +#define DEBUG #include <linux/kernel.h> #include <linux/module.h> thanks Francesco, Could you provide the debug info with the new debug method? I mean make the changes suggested in Comment 13/14 and do the test and post the dmesg for both the good and bad kernel case. Here it is the results but something strange happens. I've check three times but I think I'm doing something wrong. The strange behavior is that adding the string of comment #13 ("debug log_buf_len=8M cpi.debug_layer=0x1009001F acpi.debug_level=0x082002FF") to kernel boot parameter, the acpi subsystem work vice versa as booting without the string. This mean that the acpi works with the if statement whereas stop working when is comment out (thus applying the patch of comment #1). On the other hand, when all the kernels are booted without the previous string the acpi subsystem work as described in the bug. Therefore, the string seems to invert the behavior of the acpi subsystem. Hence, I've reset all the kernel source folder and restart to compile all the kernel variant before doing the test. All test are done _WITH the boot string of comment #13 and acpi debug compiling option activated. The dmesg files included are: dmesg_c1.txt: in this case _only patch of comment #1 is applied. In that case acpi_listen doesn't output anything, the acpi subsystem doesn't work. dmesg_c9-new-gpe.txt: in this case _only patch of comment #9 is applied (called "new gpe patch"). In that case acpi work correctly. dmesg_c14-c1.txt: in this case _both patch of comment #1 and comment #14 are applied. In that case acpi_listen doesn't output anything, the acpi subsystem doesn't work. dmesg_c14-c9.txt: in this case _both patch of comment #9 and comment #14 are applied. In that case acpi work correctly. I hope that this help. Thanks, Francesco Created attachment 77981 [details] dmesg outputs of comment #13 (good?) Created attachment 77991 [details] dmesg outputs of comment #13 (bad?) Created attachment 78001 [details] dmesg outputs of comment #14 (good?) Created attachment 78011 [details] dmesg outputs of comment #14 (bad?) (In reply to comment #16) > Here it is the results but something strange happens. I've check three times > but I think I'm doing something wrong. > > The strange behavior is that adding the string of comment #13 ("debug > log_buf_len=8M cpi.debug_layer=0x1009001F acpi.debug_level=0x082002FF") to > kernel boot parameter, You can enter /sys/module/acpi/parameters, check debug_level and debug_layer to see if they are really what we wanted to set. > the acpi subsystem work vice versa as booting without the string. This mean > that the acpi works with the if statement whereas stop working when is > comment > out (thus applying the patch of comment #1). On the other hand, when all the > kernels are booted without the previous string the acpi subsystem work as > described in the bug. Therefore, the string seems to invert the behavior of > the > acpi subsystem. Yes, that's weird, Comment #1 should work while Comment #9 should not which simply add some debug info. So the too many dmesg output must affect the time sequence of EC(Embedded Controller), the EC on your platform is fragile which force the driver to use polling mode for normal transaction, while normal EC will use driver's interrupt driven mode. Since the dmesg is too many and its ring buffer has been overwritten (the same cmdline set doesn't have so many msg on my platform I can't find any useful info from the dmesg. So need your help to do more tests as follows: 1. add "log_buf_len=8M acpi.debug_layer=0x0000001F acpi.debug_level=0x080002FF" to cmdline 2.a test comment #1 + comment #14 2.b test comment #9 + comment #14 2.c test comment #1 2.d test comment #9 hope this time, the dmesg won't be wrapped. Created attachment 78071 [details] test of comment #21 (In reply to comment #21) > You can enter /sys/module/acpi/parameters, check debug_level and debug_layer > to > see if they are really what we wanted to set. I always check this files and all option are set up correctly. > Yes, that's weird, Comment #1 should work while Comment #9 should not which > simply add some debug info. There is something that is worse, now in all four test the acpi work correctly! I do not understand how the "debug string" can change this behavior. > 1. add "log_buf_len=8M acpi.debug_layer=0x0000001F > acpi.debug_level=0x080002FF" > to cmdline > 2.a test comment #1 + comment #14 > 2.b test comment #9 + comment #14 > 2.c test comment #1 > 2.d test comment #9 The results are put together in the "test_c21.tar.bz2" file in comment #22 >
> > Yes, that's weird, Comment #1 should work while Comment #9 should not which
> > simply add some debug info.
> There is something that is worse, now in all four test the acpi work
> correctly!
> I do not understand how the "debug string" can change this behavior.
I don't think these are worse :) As I said before, this issue is highly bound with EC driver, with these lots of ACPI debug info, the timing is changed, which makes your EC driver work "normally" in interrupt mode, not polling mode (you can't find "GPE storm" info any more in these dmesgs).
Could you try the following patch _only_ without any previou patches or those "debug string" in cmdline?
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 7edaccc..9196ab6 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -71,7 +71,7 @@ enum ec_command {
#define ACPI_EC_UDELAY_GLK 1000 /* Wait 1ms max. to get global lock */
#define ACPI_EC_MSI_UDELAY 550 /* Wait 550us for MSI EC */
-#define ACPI_EC_STORM_THRESHOLD 8 /* number of false interrupts
+#define ACPI_EC_STORM_THRESHOLD 20 /* number of false interrupts
per one transaction */
enum {
With the proposed patch in comment #24 the acpi event are come back to dead. I've compiled twice kernel to be sure. On dmesg, there is no output. Now I'm using the kernel with the patch as default on my notebook. Thanks a lot for the help! Francesco Sorry for my confused English... In the previous comment, I would mean that the patch fixes the bug. Francesco Created attachment 79351 [details]
adding ec_irq_force param
Glag to hear it fix the problem. But changing the threshold may not be accepted as some other platform with broken EC HW need it to be 8.
So could you try the attached patch cleanly, and add a "acpi.ec_irq_force=1" in the kernel cmdline? thanks,
If it works fine, hopefully we can push it to mainline kernel.
- Feng
No, I've check twice but the patch in comment #27 doesn't solve the issue, the bug is still present. Thanks, (In reply to comment #28) > No, I've check twice but the patch in comment #27 doesn't solve the issue, > the > bug is still present. > Thanks for the test, when you tried the patch, did you also add "acpi.ec_irq_force=1" in the kernel cmdline as in comment 27. btw, we have another patch "make_threshold_module_param.patch", could you try it without any other patch, and add "acpi.ec_storm_threshold=20" in cmdline at the same time? Created attachment 79881 [details]
make_threshold_parameter.patch
(In reply to comment #29) > Thanks for the test, when you tried the patch, did you also add > "acpi.ec_irq_force=1" in the kernel cmdline as in comment 27. No, I in test of my comment #28 to add the string. > btw, we have another patch "make_threshold_module_param.patch", could you try > it without any other patch, and add "acpi.ec_storm_threshold=20" in cmdline > at > the same time? Yes, I've tried patch of comment #27 and comment #30 WITH the respectively strings and both cases resumes the acpi, both resolve the bug. Thanks a lot for all your work! Francesco, Could you run "sudo dmidecode" on your machine, and post the output here. We'd like to make a patch to change the threshold for your laptop based on dmidecode info. With that patch checked into future and stable kernel, your machine can just work without any action. Created attachment 81041 [details]
dmidecode of santech t45
here it is, the dmidecode of my laptop.
Created attachment 81081 [details] add quirk for Clevo M720 Francesco, Pls test this patch plug the patch in comment 30, and make sure no "ec_storm_threshold" in the kernel cmdline. thanks, ok, I've tested the new kernel with patch 30 and 34, without any extra kernel command line at boot up. The acpi seems working correctly, all acpi event are detected by acpi_listen. Thanks! Francesco Many thanks for test. Francesco. I'll submit the 2 patches soon. patches staged for 3.7 A patch referencing this bug report has been merged in Linux v3.7-rc1: commit a520d52e99b14ba7db135e916348f12f2a6e09be Author: Feng Tang <feng.tang@intel.com> Date: Fri Sep 28 15:22:00 2012 +0800 ACPI: EC: Make the GPE storm threshold a module parameter A patch referencing this bug report has been merged in Linux v3.7-rc1: commit 67bfa9b60bd689601554526d144b21d529f78a09 Author: Feng Tang <feng.tang@intel.com> Date: Fri Sep 28 15:22:01 2012 +0800 ACPI: EC: Add a quirk for CLEVO M720T/M730T laptop Hi, Can anyone here try the upstream kernel without this quirk. We are about to remove the quirk as it is covered by the following commit: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ca37bfdf Thanks in advance. Best regards -Lv |