Bug 11141
Description
Gu Rui
2008-07-21 19:43:00 UTC
> Latest working kernel version:2.6.26-git8
> Earliest failing kernel version:2.6.26-git1
Do you mean the problem is fixed in 2.6.26-git8?
I'm afraid you misunderstood these two questions.
"Latest working kernel version" is the latest kernel that works well, and "Earliest failing kernel version" is the first kernel on which you saw this problem.
Please attach the acpidump output.
Created attachment 16932 [details]
the acpidump output on my machine
(In reply to comment #1)
> do you mean the problem is fixed in 2.6.26-git8?
> I'm afraid you misunderstood these two questions.
> "Latest working kernel version" is the latest kernel that works well and
> "Earliest failing kernel version" is the kernel that you got this problem for
> the first time.

The problem still exists in 2.6.26-git8. I'm sorry, I was in too much of a hurry and missed the word "working"...

> please attach the acpidump output.

I attached it to the bug tracker. Thanks for your quick reply~

Hi, Gu Rui
Will you please confirm whether the 2.6.26-rc1 kernel works?
It would also be great if you could confirm whether Windows works (please confirm whether the battery is present).
Thanks.

(In reply to comment #4)
> Hi, Gu Rui
> Will you please confirm whether the 2.6.26-rc1 kernel can work?

I'm sorry, I don't know how to check out 2.6.26-rc1 with git, which I use to keep my kernel source up to date. I only know git-pull and don't know how to check out a specific version... But at least my battery is always present in 2.6.26-rc9.

> It will be great if you can confirm whether the windows can work(Please
> confirm whether the battery is present).

Yes, the battery is always present in Windows.

Thanks for the response.
Do you mean that the battery works normally on 2.6.26-rc9 but sometimes fails on 2.6.26-git8?
From the acpidump it seems that the battery status depends on the EC. But there is no difference between 2.6.26-git8 and 2.6.26-rc9.
Will you please double-check whether the battery/AC adapter status is correct on the two kernels? It would be best to attach the corresponding dmesg output.
Thanks.

Created attachment 16985 [details]
dmesg of my 2.6.26-git8 kernel
Created attachment 16986 [details]
dmesg of my 2.6.26-rc9 kernel
Created attachment 16987 [details]
the diff of config between my 2.6.26-rc9 and 2.6.26-git8 kernel
The diff is trivial and, as far as I can see, has little to do with ACPI. It is provided for convenience.
(In reply to comment #6)
> Thanks for the response.
> Do you mean that the battery can be normal on the 2.6.26-rc9? But it
> sometimes fails in 2.6.26-git8, OK?

Yes, the battery is normal on 2.6.26-rc9, but it _always_ fails on 2.6.26-git8.

> From the acpidump it seems that the status of battery is related with EC. But
> there is no difference between 2.6.26-git8 and 2.6.26-rc9.
> Will you please double check whether the status of battery/Ac adaptor is
> correct on the two kernel? Had better attach the corresponding dmesg.

I attached the dmesg (with uptime info) of both 2.6.26-git8 and 2.6.26-rc9, plus the diff of the two configs for convenience. I hope they help~

> Thanks.

I should thank you. Your response lets me know that I'm not alone when I struggle with a hard-to-tame fresh new kernel ;)

Created attachment 17113 [details]
try the debug patch
Will you please try the debug patch on the failing kernel and see whether the problem still exists?
Please attach the output of dmesg after the test.
thanks.
Handled-By : Zhao Yakui <yakui.zhao@intel.com>

Created attachment 17137 [details]
dmesg of 2.6.27-rc2 kernel patched with ykzhao's debug patch
I no longer lose the battery after applying your patch, but a problem still exists. Even after "ACPI: EC: missing confirmations, switch off interrupt mode.", dmesg, sensors and KSysGuard become very slow, and then the whole system stops responding for a while. Anyway, thanks for your efforts; I think we have moved a step further.
Created attachment 17173 [details]
try the debug patch
Will you please try the debug patch and see whether the problem in comment #13 still exists? (Of course the patch in comment #11 is still required.)
After the test, please attach the output of dmesg.
Thanks.

Created attachment 17182 [details]
dmesg of 2.6.27-rc2-git5 kernel patched with ykzhao's two debug patches
This time it switched back to the 'ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"' problem... The system also stopped responding for a while shortly after the problem happened.
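For readers decoding the status value in that message: the sketch below maps an EC status byte to flag names, assuming the bit layout of the embedded controller status register (EC_SC) in the ACPI specification. The helper name `decode_ec_status` is made up for illustration and is not kernel code, and reading "b0=1" as "the driver was waiting for bit 0 (OBF) to become 1" is an interpretation of the message, not something stated in this report.

```python
# Bit positions follow the EC_SC status register layout in the ACPI spec.
# Bits 2 and 7 are not decoded here.
EC_STATUS_BITS = {
    0: "OBF",      # output buffer full: the EC has data for the host
    1: "IBF",      # input buffer full: the EC has not consumed the host's byte yet
    3: "CMD",      # the last byte written was a command, not data
    4: "BURST",    # the EC is in burst mode
    5: "SCI_EVT",  # an SCI event is pending
    6: "SMI_EVT",  # an SMI event is pending
}


def decode_ec_status(status):
    """Return the names of the flags set in an EC status byte (hypothetical helper)."""
    return [name for bit, name in EC_STATUS_BITS.items() if status & (1 << bit)]


# Status values that appear in the dmesg excerpts of this bug report.
for value in (0x09, 0x0A, 0x1A):
    print(f"0x{value:02x}: {' '.join(decode_ec_status(value))}")
```

Under that reading, status 0x0a (IBF and CMD set, OBF clear) would mean the EC had not yet consumed the command byte and never signaled output data, which is consistent with a wait for "b0=1" timing out.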
Created attachment 17183 [details]
test the debug patch (disable burst access mode)
Will you please try the debug patch and see whether the problem still exists? With this debug patch the OS disables the burst access mode.
Of course the patches in comment #11 and #14 are still required.
After the test, please attach the output of dmesg.
Thanks.

Created attachment 17184 [details]
dmesg of 2.6.27-rc2-git5 kernel patched with ykzhao's three debug patches
It seems that the problem is solved. ;) And I found that it may be Firefox and X that freeze my system, not the kernel... Good job~ Thank you~ ;D
Hi, Gu Rui
Will you please try it again without the patch in comment #16 applied?
Thanks.

Created attachment 17188 [details]
dmesg of 2.6.27-rc2-git5 kernel patched with ykzhao's two debug patches but without patch in comment #16
Yes, the problem appeared on the kernel without the patch in comment #16. This time I noticed that the system freeze is I/O related: running programs still respond, but I cannot launch a new program, and when I try to browse a new folder in Konqueror, it freezes and a message pops up:
http://picasaweb.google.com/chaos.proton/BugHunting/photo#5233637903075835330

Created attachment 17226 [details]
try the debug patch
Will you please try the attached debug patch on your laptop? With this debug patch, burst mode is abandoned if a burst mode timeout happens.
After the test, please attach the output of dmesg.
Of course the patches in comment #11 and #14 are still required.
Thanks.

Created attachment 17250 [details]
dmesg of the kernel patched with c#11, c#14 and c#20
Can the system work well? Does it happen that dmesg, sensors and KSysGuard become very slow and then the whole system stops responding for a while?
Thanks.

This bug is referred to in the following commit (now upstream):
commit 9d699ed92a459cb408e2577e8bbeabc8ec3989e1
Author: Zhao Yakui <yakui.zhao@intel.com>
Date: Mon Aug 11 10:33:31 2008 +0800
ACPI: Avoid bogus EC timeout when EC is in Polling mode
Can someone please confirm that it is fixed by this patch?

Created attachment 17285 [details]
dmesg of upstream kernel
I booted the upstream kernel about 3 times without problems.

That sounds strange, because only the patch in comment #11 has gone upstream; the others have not yet.
In fact the issue (no battery or DC status) can be fixed by applying the patch in comment #11. The purpose of the other two patches is to eliminate the following warning messages and make the EC work more reasonably.
>[ 1028.112952] ACPI: EC: missing confirmations, switch off interrupt mode.
>[ 1031.163449] ACPI: EC: acpi_ec_wait timeout, status = 0x09, event = "b0=1"
>[ 1031.163619] ACPI: EC: read timeout, command = 130
>[ 510.076787] ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"
>[ 510.076787] ACPI: EC: read timeout, command = 130
At the same time, I am not sure whether the other two patches are reasonable for other laptops. Before they can go upstream, more discussion is needed.
Thanks.

Hi, Gu Rui
Can you confirm whether the system works well after applying the patches in comment #11, #14 and #20? Does it still happen that dmesg, sensors and KSysGuard become very slow and then the whole system stops responding for a while?
Thanks.

Created attachment 17332 [details]
dmesg of clean 2.6.27-rc3-git6
>Does it still happen that dmesg, sensors and KSysGuard become very slow and
>then the whole system stops responding for a while?
Yes, these problems happen on a clean 2.6.27-rc3-git6 (without the patches in comment #11, #14 and #20) after running for about half an hour, shortly after the message "[ 1492.442721] ACPI: EC: missing confirmations, switch off interrupt mode." pops up.
But on 2.6.27-rc3-git6 with the patches in comment #11, #14 and #20 applied, these problems do not happen again. I also noticed that the patched kernel responds faster than the clean upstream kernel.

Created attachment 17402 [details]
dmesg of 2.6.27-rc4-git1 kernel patched with patches in c#11,14,20
Hi, ykzhao
I hit a system freeze even with your patches in comment #11, #14 and #20 applied. This time I booted a kernel with some kernel hacking options enabled, so maybe you can find some details.
I also hit a freeze on a clean 2.6.27-rc4-git1 kernel and I will upload the dmesg later.
Thanks.

Created attachment 17403 [details]
dmesg of clean 2.6.27-rc4-git1 kernel
Hi, Gu Rui
Thanks for the info.
From the log in comment #29 it seems that the following warning message appears when the system freezes.
> BUG: soft lockup - CPU#1 stuck for 73s! [swapper:0]
> [ 4668.559770] Modules linked in:
> [ 4668.559770]
> [ 4668.559770] Pid: 0, comm: swapper Not tainted (2.6.27-rc4-g9-00123-gd3ee1b4 #13)
> [ 4668.559770] EIP: 0060:[<c04aca78>] EFLAGS: 00000203 CPU: 1
> [ 4668.559770] EIP is at _spin_unlock_irq+0x8/0x30
> [ 4668.559770] EAX: 00000001 EBX: f70df700 ECX: f669bc18 EDX: f784bef4
> [ 4668.559770] ESI: f7844000 EDI: f784bef4 EBP: 00000102 ESP: f784bed4
> [ 4668.559770] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 4668.559770] CR0: 8005003b CR2: b7ef3000 CR3: 0065e000 CR4: 000006d0
> [ 4668.559770] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 4668.559770] DR6: ffff0ff0 DR7: 00000400
> [ 4668.559770] [<c012eed3>] ? run_timer_softirq+0x123/0x1a0
> [ 4668.559770] [<c012f0c0>] ? process_timeout+0x0/0x10
> [ 4668.559770] [<c012a837>] ? __do_softirq+0x77/0xf0
> [ 4668.559770] [<c012a905>] ? do_softirq+0x55/0x60
> [ 4668.559770] [<c012ab27>] ? irq_exit+0x77/0x80
> [ 4668.559770] [<c0114507>] ? smp_apic_timer_interrupt+0x57/0x90
> [ 4668.559770] [<c0103dec>] ? apic_timer_interrupt+0x28/0x30
> [ 4668.559770] [<c012abd3>] ? raise_softirq+0x3/0x80
> [ 4668.559770] [<c0109d7b>] ? c1e_idle+0x9b/0xd0
> [ 4668.559770] [<c015da0e>] ? rcu_check_callbacks+0x3e/0xb0
> [ 4668.559770] [<c0101d20>] ? cpu_idle+0x90/0x120
> [ 4668.559770] =======================
The remaining issue is not related to ACPI. I don't know how to find the root cause and fix it. You can send the corresponding log to the LKML mailing list. Maybe someone there can fix it.

In fact the issue (no battery or DC status) can be fixed by applying the patch in comment #11. The purpose of the other two patches is to eliminate the following warning messages and make the EC work more reasonably.
>[ 1028.112952] ACPI: EC: missing confirmations, switch off interrupt mode.
>[ 1031.163449] ACPI: EC: acpi_ec_wait timeout, status = 0x09, event = "b0=1"
>[ 1031.163619] ACPI: EC: read timeout, command = 130
>[ 510.076787] ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"
>[ 510.076787] ACPI: EC: read timeout, command = 130
But I am not sure whether the other two patches are reasonable for other laptops. Before they can go upstream, more discussion and more tests are needed.
As the patch in comment #11 is already included in 2.6.27-rc4, IMO this bug can be marked as resolved.
> commit 9d699ed92a459cb408e2577e8bbeabc8ec3989e1
> Author: Zhao Yakui <yakui.zhao@intel.com>
> Date: Mon Aug 11 10:33:31 2008 +0800
> ACPI: Avoid bogus EC timeout when EC is in Polling mode
Thanks.

Hi, Rafael
The patch in comment #11 fixes the problem (no battery/DC status), and the remaining issue is not related to ACPI. I don't understand why this bug was reopened.
Thanks.

OK, so there should be a different bug entry for the other problem.
Is the patch from comment #11 upstream already?

Hi, ykzhao
Thanks for your great effort. I agree with you. This bug can be marked as RESOLVED.
I opened another bug entry for the system freeze problem: http://bugzilla.kernel.org/show_bug.cgi?id=11418 but have received no response yet... Maybe I should provide more information for it.
Thank you.

Hi, Rafael
The patch in comment #11 is already included in 2.6.27-rc4. Thanks for looking after this issue.
Thanks.
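The timeout messages quoted above print the EC command in decimal ("command = 130"). As a reading aid, here is a small sketch that maps those numbers to the command names listed in the embedded controller chapter of the ACPI specification. The lookup table and the helper name `describe_ec_command` are illustrative assumptions, not kernel code.

```python
# EC command bytes as listed in the ACPI specification's embedded controller
# command set. The descriptions in parentheses are informal.
EC_COMMANDS = {
    0x80: "RD_EC (read a byte)",
    0x81: "WR_EC (write a byte)",
    0x82: "BE_EC (burst enable)",
    0x83: "BD_EC (burst disable)",
    0x84: "QR_EC (query event)",
}


def describe_ec_command(code):
    """Translate the decimal command number printed in dmesg (hypothetical helper)."""
    return EC_COMMANDS.get(code, f"unknown command 0x{code:02x}")


# 130 appears in the "read timeout" messages quoted in this report.
print(f"command = 130 -> {describe_ec_command(130)}")
```

If this mapping applies, "read timeout, command = 130" is a timeout while waiting for the EC to answer a burst enable request, which would fit the observation that the timeouts went away with the debug patch that disables burst access mode.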
I'm glad to reopen the issue. I'm using kernel release 2.6.27-rc7 and I get errors in dmesg and sometimes also a freeze, which in my opinion is related. Here are the dmesg errors; please tell me what information you need so that I can run some tests. Thanks.
ACPI: EC: GPE storm detected, disabling EC GPE
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"
ACPI: EC: read timeout, command = 130
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI Exception (evregion-0419): AE_TIME, Returned by Handler for [EmbeddedControl] [20080609]
ACPI Error (psparse-0530): Method parse/execution failed [\_SB_.ADP1._PSR] (Node f781aea0), AE_TIME
ACPI Exception (ac-0136): AE_TIME, Error reading AC Adapter state [20080609]
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI Exception (evregion-0419): AE_TIME, Returned by Handler for [EmbeddedControl] [20080609]
ACPI Error (psparse-0530): Method parse/execution failed [\_TZ_.TZ01._TMP] (Node f781c108), AE_TIME
ACPI: EC: acpi_ec_wait timeout, status = 0x1a, event = "b1=0"
ACPI: EC: write_cmd timeout, command = 128
ACPI: EC: acpi_ec_wait timeout, status = 0x1a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI Exception (evregion-0419): AE_TIME, Returned by Handler for [EmbeddedControl] [20080609]
ACPI Error (psparse-0530): Method parse/execution failed [\_SB_.ADP1._PSR] (Node f781aea0), AE_TIME
ACPI Exception (ac-0136): AE_TIME, Error reading AC Adapter state [20080609]
ACPI: EC: acpi_ec_wait timeout, status = 0x1a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI: EC: acpi_ec_wait timeout, status = 0x1a, event = "b1=0"
ACPI: EC: write_cmd timeout, command = 128
ACPI Exception (evregion-0419): AE_TIME, Returned by Handler for [EmbeddedControl] [20080609]
ACPI Error (psparse-0530): Method parse/execution failed [\_TZ_.TZ01._TMP] (Node f781c108), AE_TIME
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"
ACPI: EC: read timeout, command = 130
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI Exception (evregion-0419): AE_TIME, Returned by Handler for [EmbeddedControl] [20080609]
ACPI Error (psparse-0530): Method parse/execution failed [\_TZ_.TZ00._TMP] (Node f781c078), AE_TIME
evdev.c(EVIOCGBIT): Suspicious buffer size 511, limiting output to 64 bytes. See http://userweb.kernel.org/~dtor/eviocgbit-bug.html

If your box is not Dell i1501, please create a separate bug entry for this problem and put the information in there. Please add my address to the CC list of the new entry, thanks.

OK, I'm doing it now.
Mine is a Sony VGN-SR [new series of July].
Thanks a lot.

On Sun, Sep 28, 2008 at 12:36 PM, <bugme-daemon@bugzilla.kernel.org> wrote:
> ------- Comment #37 from rjw@sisk.pl 2008-09-28 03:36 -------
> If your box is not Dell i1501, please create a separate bug entry for this
> problem and put the information in there. Please add my address to the CC
> list of the new entry, thanks.

Sorry for the reply by mail =)
Do I have to add only you to the CC list?

(In reply to comment #39)
> Sorry for the reply by mail =)
>
> Do I have to add only you to the CC list?

Of course you can add more addresses to it as needed. :-)
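When comparing dmesg attachments like the ones in this report, it can help to tally the EC timeouts by status and awaited event instead of reading them line by line. The following is a minimal sketch, assuming the message format shown in the logs above; `summarize_ec_timeouts` is a hypothetical helper, and the sample text is a shortened excerpt of messages quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"
EC_WAIT_RE = re.compile(
    r'ACPI: EC: acpi_ec_wait timeout, status = (0x[0-9a-fA-F]+), event = "(b[01]=[01])"'
)


def summarize_ec_timeouts(dmesg_text):
    """Count acpi_ec_wait timeouts per (status byte, awaited event) pair."""
    return Counter(
        (int(status, 16), event)
        for status, event in EC_WAIT_RE.findall(dmesg_text)
    )


# Shortened excerpt of the messages quoted in this bug report.
SAMPLE = """\
[ 510.076787] ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b0=1"
[ 510.076787] ACPI: EC: read timeout, command = 130
ACPI: EC: acpi_ec_wait timeout, status = 0x0a, event = "b1=0"
ACPI: EC: input buffer is not empty, aborting transaction
ACPI: EC: acpi_ec_wait timeout, status = 0x1a, event = "b1=0"
ACPI: EC: write_cmd timeout, command = 128
"""

for (status, event), count in sorted(summarize_ec_timeouts(SAMPLE).items()):
    print(f"status=0x{status:02x} event={event} count={count}")
```

To use it on a real log, the text of a saved dmesg file can be passed in place of `SAMPLE`; lines that do not match the pattern are simply ignored.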