Bug 10237 (bios-update)
Description Joshua Covington 2008-03-13 08:38:12 UTC
Latest working kernel version: N/A
Earliest failing kernel version: 220.127.116.11
Distribution: Vanilla, Fedora, opensuse
Hardware Environment:
Software Environment:
Problem Description: After upgrading my bios from 3303 to 3315 the kernel cannot recognize my cpu. I upgraded my cpu from amd mk36 to amd tl60 and now I get a kernel panic. The only way to start the machine is to pass the noapic argument. All the attached files are from fedora8 started with noapic. I tried the vanilla kernel 18.104.22.168 and opensuse 10.3 and got the same error. I tried with bios 3309 (available from the acer website) with the same results. According to the bios changelog, the kernel cpu code has been updated after 3303. Maybe this is the cause of the problem.
Steps to reproduce: update bios
Comment 6 Joshua Covington 2008-03-13 08:41:09 UTC
Created attachment 15245 [details] /proc/scsi/scsi
Comment 7 Joshua Covington 2008-03-13 08:43:42 UTC
Created attachment 15246 [details] dmesg dmesg after starting with noapic
Comment 8 Joshua Covington 2008-03-13 08:50:21 UTC
Created attachment 15247 [details] 3303 bios 3303 bios. I have no problems with it, but the tl-60 processor cannot work with this bios because of temperature problems. Therefore I had to upgrade to bios 3315.
Comment 9 Joshua Covington 2008-03-13 08:51:49 UTC
Created attachment 15248 [details] bios 3315 This is the bios 3315. I cannot start the kernel without the noapic argument. Windows has no problems with it.
Comment 10 Joshua Covington 2008-03-13 08:53:29 UTC
Created attachment 15249 [details] Bios change log This is the bios change log. After v. 3303 the bios cpu code has been updated from 2.6.9 to 2.08.03.
Comment 11 Joshua Covington 2008-03-14 10:57:26 UTC
There is some really strange behaviour. After I've used Windows for about an hour (the machine is already hot enough), even the noapic argument cannot start the kernel. When I use it, the boot stops after line 316 of the dmesg log (line 316: pci_hotplug: PCI Hot Plug PCI Core version: 0.5). The only way to start the kernel is to wait until the notebook has cooled down and then use the noapic argument. Any ideas?
Comment 12 Joshua Covington 2008-03-15 09:53:01 UTC
I tried to dump the acpi tables with acpidump. However, I got this error: acpi_os_map_memory: cannot open /dev/mem. The computer was started with noapic, as this is the only way to boot the kernel.
Comment 13 Joshua Covington 2008-03-15 10:07:55 UTC
Created attachment 15279 [details] photo of the error Actually I don't get a kernel panic all the time; it happens randomly, and therefore I attached this photo. The kernel was started with no argument.
Comment 14 Joshua Covington 2008-03-15 23:32:25 UTC
Created attachment 15282 [details] kernel panic error OK, now I have a photo of the kernel panic error. It occurs when the computer is cold (started after it had cooled off).
Comment 15 Joshua Covington 2008-03-16 09:28:28 UTC
Created attachment 15294 [details] mk36 error This is a snapshot of the error that I get when I start the machine with the turion mk36 processor.
Comment 16 Joshua Covington 2008-03-16 09:30:43 UTC
Created attachment 15295 [details] mk36 error 2 Another photo of the error with the mk36 processor. There is additional info on this one.
Comment 17 ykzhao 2008-03-20 22:35:39 UTC
HI, Joshua Will you please try the latest kernel (2.6.25-rc6) and see whether the problem still exists? Please boot the system with "acpi=off" and attach the output of acpidump. Thanks.
Comment 18 Joshua Covington 2008-03-21 10:52:23 UTC
Hi, I tried 2.6.25-rc6-git5, but I took it from fedora. You can find more info about it here: http://koji.fedoraproject.org/koji/buildinfo?buildID=43458. The version that I tried is kernel-2.6.25-0.136.rc6.git5.fc9.i686. I know that it is not the vanilla kernel, but I decided to save the compile time. Actually this kernel loaded with acpi=off. 2 days ago I tried the git3 version and I got a kernel panic. Anyway, here is the attached info from the acpidump.
Comment 19 Joshua Covington 2008-03-21 10:53:37 UTC
Created attachment 15383 [details] acpi info from acpidump on kernel-2.6.25-0.136.rc6.git5.fc9.i686 acpi info from acpidump on kernel-2.6.25-0.136.rc6.git5.fc9.i686
Comment 20 Joshua Covington 2008-04-08 10:01:47 UTC
Any ideas? I tried kernel-2.6.25-0.212.rc8.git6.fc9 and it stops with the messages
ACPI0000.7 registered as cooling_device0
ACPI0000.7 registered as cooling_device1
It is still a kernel panic, but now there is definite proof that this is an acpi problem.
Comment 21 ykzhao 2008-04-14 02:43:26 UTC
Hi, Joshua Sorry for the late response. It seems that the following problem occurs when the system is booted with the "noapic" option.
> ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl]
> ACPI Exception (dswexec-0462): AE_TIME, While resolving operands for [OpcodeName unavailable]
> ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.THRM._TMP] (Node f781a8b8), AE_TIME
Maybe the problem is related to the BIOS. Will you please confirm whether the problem exists on BIOS 3303 and attach the output of dmesg and acpidump? Thanks.
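For readers who want to pull messages like the three quoted above out of a long dmesg, the following is a minimal Python sketch. The function name `parse_acpi_messages` and the regular expressions are illustrative assumptions based only on the line shapes quoted in this comment; they are not part of any kernel or ACPICA tool, and other kernels may format these lines differently.

```python
import re

# Matches the two line shapes quoted in this report, for example:
#   ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl]
#   ACPI Error (psparse-0537): Method parse/execution failed [...] (Node ...), AE_TIME
ACPI_LINE = re.compile(
    r"ACPI (?P<kind>Exception|Error) \((?P<module>[a-z]+)-(?P<line>\d+)\): (?P<text>.*)"
)
# ACPI status codes are written as AE_ followed by capitals and underscores.
STATUS = re.compile(r"\bAE_[A-Z_]+\b")


def parse_acpi_messages(dmesg_text):
    """Return one dict per ACPI Exception/Error line found in dmesg_text."""
    results = []
    for raw in dmesg_text.splitlines():
        match = ACPI_LINE.search(raw)
        if not match:
            continue
        status = STATUS.search(match.group("text"))
        results.append({
            "kind": match.group("kind"),
            "module": match.group("module"),
            "status": status.group(0) if status else None,
            "text": match.group("text").strip(),
        })
    return results


# The three lines quoted in this comment, used as sample input.
SAMPLE = r"""
ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl]
ACPI Exception (dswexec-0462): AE_TIME, While resolving operands for [OpcodeName unavailable]
ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.THRM._TMP] (Node f781a8b8), AE_TIME
"""

if __name__ == "__main__":
    for message in parse_acpi_messages(SAMPLE):
        print(message["kind"], message["module"], message["status"])
```

Grouping by status code makes it easier to see that all three lines report the same timeout while the thermal zone's `_TMP` method was running.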
Comment 22 Joshua Covington 2008-04-14 10:41:57 UTC
I cannot confirm this because of the following:
1. The notebook (acer aspire 5051awxmi s/n: LX.AG305.026) was originally equipped with a Turion 64 mk36 processor (single core). According to the specs it can work with any processor for socket S1 and TDP < 31W.
2. I changed the cpu and now I have a Turion x2 64 tl-60 (double core). The only way to make the system boot with this processor is to use bios version 3315, so bios 3303 cannot boot my machine.
3. According to the change log of the bios, in version 3309 a fix was applied for a thermal problem. And with kernel-2.6.25-0.228.rc9.fc9.i686 the machine stops with the following messages:
ACPI0000.07 registered as cooling_device0
ACPI0000.07 registered as cooling_device1
It is definitely a bios problem and I think it is connected with the thermal table (see bios change log for version 3307). I'll change the cpu, reflash the bios and report back.
Comment 23 Joshua Covington 2008-04-14 14:21:35 UTC
Actually I cannot try it. I have one more machine of the same model, but there is an issue with the fedora 9 beta (the display is black - driver problem). But I can say that bios 3303 was the only bios version that worked with linux. Until changing the cpu I was using it and had no problems with it. So bios 3303 is working properly.
Comment 24 Joshua Covington 2008-04-17 12:54:04 UTC
Created attachment 15791 [details] Kernel Panic with kernel-2.6.25 This is the kernel panic trace that I got with kernel-2.6.25-1.fc9.i686. It shows that the problem is somewhere in acpi_thermal_***. I hope this can help.
Comment 25 Joshua Covington 2008-04-17 12:55:54 UTC
Created attachment 15792 [details] Kernel trace with kernel-2.6.25-1.fc9.i686 more detailed image of the stack trace from the error.
Comment 26 Joshua Covington 2008-04-20 13:24:21 UTC
Created attachment 15825 [details] acpidump from bios 3303 This is the acpidump output from bios 3303. everything works fine with this bios. I tried with the fedora RC 9 with kernel 2.6.25-0.234.rc9.git1.fc9.i686
Comment 27 Joshua Covington 2008-04-20 13:58:02 UTC
Created attachment 15826 [details] dmesg from kernel 2.6.25-0.234.rc9.git1.fc9.i686 This dmesg is from kernel 2.6.25-0.234.rc9.git1.fc9.i686 started with noapci nolapic apci=off acpi=off debug. The bios version is 3315.
Comment 28 Joshua Covington 2008-04-29 14:55:28 UTC
The only way to start my machine is to use acpi=off. Then it boots normally but almost nothing works (no wireless, the cooler is always on, and so on), so it is definitely an acpi error. I think it is something connected with the TZ_.THRM._TMP part of the bios but I cannot prove it. The kernel just stops when trying to interpret this part of the bios. Any ideas how this should be solved?
Comment 29 Joshua Covington 2008-04-29 16:21:14 UTC
Created attachment 15981 [details] dmesg-30.11.2007-mk36-bios3303-kernel22.214.171.124-42.fc8 dmesg from 30.11.2007 with mk36 processor, bios 3303 and kernel126.96.36.199-42.fc8. This can be linked with the acpidump from bios 3303 (attachment # 15825 [details]). You can see that the acpi system is initialized properly and everything is fine. However, the new tl-60 processor cannot work with bios 3303 and needs 3315. The linux kernel cannot initialize the acpi system on the latest bios and stops (presumably) on the ACPI Thermal Zone. However, I cannot prove this.
Comment 30 ykzhao 2008-05-06 02:35:29 UTC
Created attachment 16043 [details] try the custom DSDT Will you please try the custom DSDT and see whether the problem still exists?
Comment 31 ykzhao 2008-05-06 02:49:27 UTC
Hi, Joshua From the info in comment #26 and comment #19, the following error exists on bios 3315.
> Store (\_SB.PHSR (0x0D, 0x00), TJ85)
But PHSR doesn't exist, which means that the mutex \_SB.PCI0.LPC0.EC0.MUT1 can't be released. Will you please try the custom DSDT in comment #30 and see whether the problem still exists? If it does, please attach a picture of the failure. It would be great if you could boot the system with the "noapic" option and attach the output of dmesg. Thanks.
Comment 32 Joshua Covington 2008-05-06 13:13:06 UTC
Created attachment 16048 [details] dmesg with the custom dsdt This is the dmesg with the custom dsdt. I used the kernel-tuxonice-188.8.131.52-85_1.cubbi_tuxonice.fc8.i686 (from atrpms.net), which is based on the fc8 kernel. It has the custom dsdt option enabled. First I compiled the dsdt.dsl file on Windows and merged the dsdt.aml into the initrd. So everything is working now. When you look in the dmesg there are several messages that the lapic is not functional. Maybe this is connected with something else from the bios, not just the dsdt.
Comment 33 Joshua Covington 2008-05-06 13:16:22 UTC
Created attachment 16049 [details] /var/log/messages with the custom dsdt These are the kernel messages with the custom dsdt (from kernel-tuxonice-184.108.40.206-85_1.cubbi_tuxonice.fc8.i686). I got a kernel panic with the ath5k driver. Look at the end of the file and there is the trace from it. Can this panic be connected to the custom dsdt? Is this a pure ath5k error or something connected with the bios?
Comment 34 ykzhao 2008-05-09 03:09:26 UTC
Created attachment 16080 [details] try the custom DSDT From the log it seems that the bug is caused by the broken BIOS. The \_SB.PHSR object doesn't exist, which means that the mutex \_SB.PCI0.LPC0.EC0.MUT1 can't be released. Sorry, my fault: the following line was modified by mistake.
> OperationRegion (BAR5, SystemMemory, GBAA (), 0x0200)
Will you please try the new custom DSDT and see whether the system still panics? Please attach the output of dmesg. Thanks.
Comment 35 Joshua Covington 2008-05-12 07:06:43 UTC
Created attachment 16113 [details] my custom DSDT This is my version of the DSDT table. I compiled both of your suggestions several times and they always resulted in identical .aml files. This means that the iasl compiler doesn't care whether there is GBAA or GBAA(). I tried to recreate the missing object, and it is working now. According to the bios changelog, TJ85 should be connected with the thermal table for the tk38 processor, and \_SB.PCI0.LPC0.PHSR looks like some kind of message bus for the system. Therefore I added it. Up to now the system is working ok. I tried both versions:
1. with the PHSR object
2. without the PHSR object
You'll find the dmesg files attached below. As I said, in both cases the machine is working fine, so I cannot say if the PHSR object is really needed. But I added it anyway and I'm using the system with it.
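The check that two compiled tables are identical can be repeated without relying on file size or timestamps. The following is a small Python sketch; the helper names `sha256_of` and `same_aml` and the file names in the usage note are hypothetical and not taken from this report.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_aml(path_a, path_b):
    """True when both compiled tables are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_aml("dsdt_first.aml", "dsdt_second.aml")` (hypothetical file names) returning `True` would confirm that two DSL variants compile to the same table.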
Comment 36 Joshua Covington 2008-05-12 07:09:08 UTC
Created attachment 16114 [details] dmesg from DSDT without PHSR object This is the dmesg when the system was booted with the DSDT table without the PHSR object. The inserted DSDT.aml file is the same as the one compiled from your dsl file.
Comment 37 Joshua Covington 2008-05-12 07:12:15 UTC
Created attachment 16115 [details] dmesg from DSDT with PHSR object This is the dmesg when the system was booted with the DSDT with the PHSR object added. The DSDT table was compiled from my custom version of the DSDT file. Everything looks like the version without the PHSR object, so I don't know if it is really needed for the tl60 processor. I'm using it anyway.
Comment 38 Joshua Covington 2008-05-12 07:15:41 UTC
Now about the kernel panic from post #33: I worked for the last 4 days with both dsdt tables (with and without the PHSR object) and the madwifi package. It never resulted in any kernel panic. Therefore I think the panic from post #33 is from the ath5k driver, not from the bios. But I cannot prove this.
Comment 39 ykzhao 2008-05-14 02:32:47 UTC
Hi, Joshua Thanks for the test. It seems that the bug is caused by the broken bios. The \_SB.PHSR object doesn't exist, which means that the mutex \_SB.PCI0.LPC0.EC0.MUT1 can't be released. So the system will hang when the _TMP object is called to get the temperature in the boot phase. After removing the \_SB.PHSR object, the system can boot well. It is appropriate to fix this problem by upgrading the bios. IMO the bug can be rejected.
Comment 40 Joshua Covington 2008-05-14 11:25:06 UTC
Yes, it is working now. I replaced \_SB.PHSR with \_SB.PCI0.LPC0.PHSR and it has been working without problems for the past 3 days. I'm not sure which of the following is better:
1. removing the \_SB.PHSR object
2. replacing it with \_SB.PCI0.LPC0.PHSR
TJ85 is connected with the athlon tk38 processor but it is for the aspire 4720 series. The aspire 5050 series is sold with the turion mk36. I think that acer uses the same bios in both series. Anyway: removing or replacing the \_SB.PHSR object? Which will cause less harm to the system? What do you think?
Comment 41 ykzhao 2008-05-18 19:55:23 UTC
Hi, Joshua It seems that your system can work well when \_SB.PHSR is replaced by \_SB.PCI0.LPC0.PHSR. Please continue to use it. Maybe it is better to replace it with \_SB.PCI0.LPC0.PHSR. Of course the bug had better be fixed by upgrading the bios.
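The replacement discussed here can be applied mechanically to a disassembled table before recompiling it. The Python sketch below is only an illustration: the function name `retarget_phsr` is hypothetical, and it assumes the DSL text spells the reference exactly as `\_SB.PHSR`, as in the line quoted earlier in this report. Whether `\_SB.PCI0.LPC0.PHSR` is the right target for another machine is not something this sketch can decide.

```python
import re

# Matches the reference \_SB.PHSR but not \_SB.PCI0.LPC0.PHSR.
BROKEN_REFERENCE = re.compile(r"\\_SB\.PHSR\b")


def retarget_phsr(dsl_text, new_path=r"\_SB.PCI0.LPC0.PHSR"):
    """Return (new_text, count) with every \\_SB.PHSR reference retargeted."""
    # A function is used as the replacement so that the backslash in
    # new_path is inserted literally instead of being read as an escape.
    return BROKEN_REFERENCE.subn(lambda _match: new_path, dsl_text)


if __name__ == "__main__":
    line = r"Store (\_SB.PHSR (0x0D, 0x00), TJ85)"
    fixed, count = retarget_phsr(line)
    print(count, fixed)
```

Running the function a second time on its own output changes nothing, because the new path no longer matches the pattern.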
Comment 42 Joshua Covington 2008-05-19 16:44:29 UTC
This is the latest available bios. I'll try to contact acer about this but I don't think they'll do anything. Anyway, thank you for the help!
Comment 43 Joshua Covington 2008-06-26 03:56:03 UTC
I decided to reopen the bug based on comments from the fedora-devel-list available here: https://www.redhat.com/archives/fedora-devel-list/2008-June/msg01349.html I know that my fedora works again with the corrected dsdt table, but maybe there should be code in the acpi subsystem that circumvents, rejects or corrects the problems of the original table. If Windows (Vista and XP) works, why shouldn't linux work too? Maybe there should be something in acpi_thermal_init (according to the photo of the kernel panic trace) that doesn't lock up the kernel but just ignores the missing/wrong implementation in the dsdt. By the way, the broken line of the dsdt table (#3907 if I remember correctly) deals with the object TJ85, which is connected with thermal control, and acpi_thermal_init is the last function from the kernel that gets executed. Maybe the bug is in this function. What do you think?
Comment 44 ykzhao 2008-06-30 02:40:09 UTC
Hi, Joshua Understand what you said. And our target is that Linux is also expected to work if windows can work. Will you please confirm whether windows can work well on your laptop? Thanks.
Comment 45 Joshua Covington 2008-07-01 03:22:17 UTC
(In reply to comment #44)
> Hi, Joshua
> Understand what you said. And our target is that Linux is also expected to
> work if windows can work.
>
> Will you please confirm whether windows can work well on your laptop?
>
> Thanks.
>
Yes, Windows XP SP3 and Vista, Vista SP1 work without any issues on this machine. I've never experienced any problems regarding windows. Even with bios versions 3303 and with 3309 everything was fine. The same for 3315 - absolutely no problems.
--joshua
Comment 46 Joshua Covington 2008-07-23 07:59:36 UTC
Can someone, please, take care of this? It's been more than 3 weeks and there's not a single answer from anybody, only reassignment to someone else. Please.
Comment 47 Lin Ming 2008-07-23 17:24:37 UTC
I'm working on a patch that will release mutexes on method error exit. Please help to test it when the patch is available.
Comment 48 Lin Ming 2008-07-24 01:16:31 UTC
Hi, Joshua Woulde
Comment 49 Lin Ming 2008-07-24 01:20:14 UTC
Created attachment 16960 [details] Release all mutexes acquired by method on error exit Hi, Joshua Would you please help to test this patch? On method error exit, it automatically releases all mutexes acquired by this method.
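The idea described for this patch can be illustrated with a toy model. The Python sketch below is not the attached patch and does not use any kernel or ACPICA interface; the class `Interpreter`, the exception `AmlError` and the two method bodies are hypothetical. It assumes, based on the analysis earlier in this thread, that one method takes MUT1 and then fails on the missing PHSR object before reaching its release, and that a later `_TMP` evaluation needs the same mutex. A failed acquire is modelled as an immediate error rather than a timeout.

```python
class AmlError(Exception):
    """Stands in for an ACPI failure status such as AE_NOT_FOUND or AE_TIME."""


class Interpreter:
    """Toy model of control-method execution with named mutexes."""

    def __init__(self, release_on_error):
        self.release_on_error = release_on_error
        self.held = {}  # mutex name -> name of the method that owns it

    def acquire(self, method, mutex):
        if mutex in self.held:
            # A real interpreter would wait and eventually time out.
            raise AmlError(f"{mutex} is still held by {self.held[mutex]}")
        self.held[mutex] = method

    def release(self, mutex):
        self.held.pop(mutex, None)

    def run(self, method, body):
        """Run one method body; return True on success, False on error exit."""
        try:
            body(self)
        except AmlError:
            if self.release_on_error:
                # Error exit: drop every mutex this method still owns.
                owned = [m for m, owner in self.held.items() if owner == method]
                for mutex in owned:
                    self.release(mutex)
            return False
        return True


def reg_method(interp):
    # Assumed shape of the failing method: take MUT1, then hit the missing object.
    interp.acquire("_REG", "MUT1")
    raise AmlError("PHSR not found")  # the Release below this point never runs


def tmp_method(interp):
    # Assumed shape of the temperature method: take MUT1, read, release.
    interp.acquire("_TMP", "MUT1")
    interp.release("MUT1")


if __name__ == "__main__":
    for flag in (False, True):
        interp = Interpreter(release_on_error=flag)
        interp.run("_REG", reg_method)
        print(flag, interp.run("_TMP", tmp_method))
```

In this model the later method only succeeds when mutexes are dropped on error exit. The sketch says nothing about whether the real patch is sufficient on this hardware.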
Comment 50 ykzhao 2008-07-24 03:17:05 UTC
Hi, Joshua Will you please confirm whether the windows can work well? Please check whether the status of the battery can be changed correctly after AC adapter is plugged/unplugged and whether the LID can work. Thanks.
Comment 51 Joshua Covington 2008-07-24 12:53:49 UTC
(In reply to comment #50)
> Hi, Joshua
> Will you please confirm whether the windows can work well?
> Please check whether the status of the battery can be changed correctly
> after AC adapter is plugged/unplugged and whether the LID can work.
> Thanks.
>
>
All were tested and work without problems. Everything is ok. Now I have to test the patch from Lin Ming and see what happens. (But first I have to recompile the kernel :()
Comment 52 Joshua Covington 2008-07-24 14:36:08 UTC
Created attachment 16972 [details] dmesg with debug acpi=off / the patch was applied Here is the dmesg from the kernel-2.6.27-0.173.rc0.git11.fc10 taken from here: http://koji.fedoraproject.org/koji/buildinfo?buildID=57343. Take a look at line #288:
ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7850000 could not acquire Mutex
I recompiled it with your patch. It was started with "debug acpi=off" because I got a kernel panic. Hope this can help.
Comment 53 Joshua Covington 2008-07-24 14:40:32 UTC
Created attachment 16973 [details] kernel panic with applied patch This is a snapshot from the kernel panic that I got after applying the patch.
Comment 54 ykzhao 2008-07-24 19:07:05 UTC
Created attachment 16974 [details] try the debug patch Will you please try the debug patch to see whether the system can be booted normally? In the debug patch the OS will ignore the error from evaluating the _REG object and continue to initialize the EC device. Of course please confirm whether the battery/thermal can work. Thanks.
Comment 55 Joshua Covington 2008-07-26 10:07:46 UTC
Created attachment 16992 [details] dmesg from 2.6.27-0.173.rc0.git11 with applied patch2 This is the dmesg from 2.6.27-0.173.rc0.git11 after applying the second patch. Everything seems to work ok now. I recompiled the kernel with CONFIG_ACPI_DEBUG=y and CONFIG_ACPI_DEBUG_FUNC_TRACE=y. The fan, the battery and the LID seem to work normally.
Comment 56 Joshua Covington 2008-07-26 10:11:06 UTC
Created attachment 16993 [details] dmesg from 2.6.27-0.173.rc0.git11 with patch2 / SUSPEND Here is the dmesg. I tried suspend to RAM and suspend to Disk; both work ok. The kernel panic at the end is from kpowersave. Ignore it.
Comment 57 Joshua Covington 2008-07-26 10:16:23 UTC
Created attachment 16994 [details] dmesg from 220.127.116.11-60.fc8 with patch2 I applied your patch to the current FC8 kernel 18.104.22.168-60. It works without problems. There are no kernel oops or anything else. Battery and LID also work. Suspend to RAM/Disk works, too. But there are some issues with the display after coming back from Suspend to Disk and shutting down. I think it is connected with the radeonfb. The resolution is too big and nothing can be seen. But this happens only after the Xserver is closed and it is not such a big issue.
Comment 58 Joshua Covington 2008-07-26 10:18:31 UTC
Created attachment 16995 [details] dmesg 22.214.171.124-47.fc8 with corrected DSDT dmesg 126.96.36.199-47.fc8 with corrected DSDT. It is just for comparison.
Comment 59 Joshua Covington 2008-07-26 10:19:33 UTC
Created attachment 16996 [details] dmesg from 188.8.131.52-10.fc8 with corrected DSDT dmesg from 184.108.40.206-10.fc8 with corrected DSDT. Just for comparison
Comment 60 Joshua Covington 2008-07-26 10:23:08 UTC
What consequences can there be after not registering the right object (line #163 from dmesg-patch2-220.127.116.11-60.fc8.i686)?
ACPI Error (psargs-0355): [\_SB_.PHSR] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPC0.EC0_._REG] (Node f78175a0), AE_NOT_FOUND
Fail in evaluating_REG object. It is broken BIOS. Try to upgrade it
Can this "harm" the system/hardware?
Comment 61 ykzhao 2008-07-29 18:53:52 UTC
Hi, Joshua Thanks for the test. It seems that the patch in comment #54 can make your system work well. In the debug patch the OS will ignore the error from evaluating the _REG object and continue to initialize the EC device. The following warning message is harmless. It only reports the error in the BIOS.
>ACPI Error (psargs-0355): [\_SB_.PHSR] Namespace lookup failure, AE_NOT_FOUND
>ACPI Error (psparse-0537): Method parse/execution failed
>[\_SB_.PCI0.LPC0.EC0_._REG] (Node f78175a0), AE_NOT_FOUND
>Fail in evaluating_REG object. It is broken BIOS. Try to upgrade it
Thanks.
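The behaviour described for the debug patch, logging a failed `_REG` evaluation and carrying on with EC setup, can be pictured with a short Python sketch. This is a toy model under stated assumptions and not the kernel change itself: `init_ec`, `broken_reg` and `NamespaceLookupError` are hypothetical names, and the real initialization involves steps that are not modelled here.

```python
class NamespaceLookupError(Exception):
    """Stands in for AE_NOT_FOUND returned by a namespace lookup."""


def broken_reg():
    # Models a firmware _REG method that references a missing object.
    raise NamespaceLookupError(r"[\_SB_.PHSR] Namespace lookup failure")


def init_ec(evaluate_reg, tolerate_reg_failure):
    """Return a log of the steps taken while bringing up the EC."""
    log = []
    try:
        evaluate_reg()
        log.append("_REG evaluated")
    except NamespaceLookupError as err:
        if not tolerate_reg_failure:
            raise  # strict behaviour: the failure aborts initialization
        # Tolerant behaviour: report the firmware error and keep going.
        log.append(f"ignored _REG failure: {err}")
    log.append("EC initialized")
    return log


if __name__ == "__main__":
    for entry in init_ec(broken_reg, tolerate_reg_failure=True):
        print(entry)
```

The design choice in this model is to treat a firmware error in an optional notification method as a warning rather than a fatal condition, which matches the "harmless" message described above.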
Comment 62 ykzhao 2008-07-29 18:55:59 UTC
As the patch in comment #54 can make the system work well, IMO this bug can be marked as resolved. Thanks.
Comment 63 Joshua Covington 2008-07-30 00:59:19 UTC
Can you, please, inform me when this is merged upstream? Thank you for the help.