Bug 10237 (bios-update)
Description
Joshua Covington
2008-03-13 08:38:12 UTC
Created attachment 15240 [details]
/proc/cpuinfo
Created attachment 15241 [details]
/proc/iomem
Created attachment 15242 [details]
/proc/ioports
Created attachment 15243 [details]
lspci -vvv
Created attachment 15244 [details]
/proc/modules
Created attachment 15245 [details]
/proc/scsi/scsi
Created attachment 15246 [details]
dmesg
dmesg after starting with noapic
Created attachment 15247 [details]
3303 bios
BIOS 3303. I have no problems with it, but the tl-60 processor cannot work with this BIOS because of temperature problems. Therefore I had to upgrade to BIOS 3315.
Created attachment 15248 [details]
bios 3315
This is BIOS 3315. I cannot start the kernel without the noapic argument. Windows has no problems with it.
Created attachment 15249 [details]
Bios change log
This is the BIOS change log. After v. 3303 the BIOS CPU code was updated from 2.6.9 to 2.08.03.
There is some really strange behaviour. After I've used Windows for about an hour (so the machine is already hot), even the noapic argument cannot start the kernel. When I use it, the boot stops after line 316 of the dmesg log (line 316: pci_hotplug: PCI Hot Plug PCI Core version: 0.5). The only way to start the kernel is to wait until the notebook has cooled down and then use the noapic argument. Any ideas?
I tried to dump the ACPI tables with acpidump. However, I got this error: acpi_os_map_memory: cannot open /dev/mem. The computer was started with noapic, as this is the only way to boot the kernel.
Created attachment 15279 [details]
photo of the error
Actually, I don't get a kernel panic every time. It happens randomly, which is why I attached this photo. The kernel was started with no arguments.
Created attachment 15282 [details]
kernel panic error
OK, now I have a photo of the kernel panic error. It occurs when the computer is cold (started after it had cooled off).
Created attachment 15294 [details]
mk36 error
This is a snapshot of the error that I got when I started the machine with the Turion mk36 processor.
Created attachment 15295 [details]
mk36 error 2
Another photo of the error with the mk36 processor. There is additional info in this one.
Hi, Joshua
Will you please try the latest kernel (2.6.25-rc6) and see whether the problem still exists? Please boot the system with "acpi=off" and attach the output of acpidump.
Thanks.
Hi, I tried 2.6.25-rc6-git5, but I took it from Fedora. Here you can find more info about it: http://koji.fedoraproject.org/koji/buildinfo?buildID=43458. The version that I tried is kernel-2.6.25-0.136.rc6.git5.fc9.i686. I know that it is not the vanilla kernel, but I decided to save the compile time. Actually, this kernel loaded with acpi=off. Two days ago I tried the git3 version and got a kernel panic. Anyway, the acpidump output is attached.
Created attachment 15383 [details]
acpi info from acpidump on kernel-2.6.25-0.136.rc6.git5.fc9.i686
Any ideas? I tried kernel-2.6.25-0.212.rc8.git6.fc9 and it stops with the messages
ACPI0000.7 registered as cooling_device0
ACPI0000.7 registered as cooling_device1
It is still a kernel panic, but now there is definite proof that this is an ACPI problem.
Hi, Joshua
Sorry for the late response. It seems that the following problem exists when the system is booted with the "noapic" option.
>ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl] [20070126]
> ACPI Exception (dswexec-0462): AE_TIME, While resolving operands for [OpcodeName unavailable] [20070126]
> ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.THRM._TMP] (Node f781a8b8), AE_TIME
Maybe the problem is related to the BIOS. Will you please confirm whether the problem exists on BIOS 3303 and attach the output of dmesg and acpidump?
Thanks.
I cannot confirm this because of the following:
1. The notebook (acer aspire 5051awxmi s/n: LX.AG305.026) was originally equipped with a Turion 64 mk36 processor (single core). According to the specs it can work with any processor for socket S1 with TDP < 31W.
2. I changed the CPU and now I have a Turion x2 64 tl-60 (dual core). The only way to make the system boot with this processor is to use BIOS version 3315, so BIOS 3303 cannot boot my machine.
3. According to the BIOS change log, a fix for a thermal problem was applied in version 3309.
With kernel-2.6.25-0.228.rc9.fc9.i686 the machine stops with the following messages:
ACPI0000.07 registered as cooling_device0
ACPI0000.07 registered as cooling_device1
It is definitely a BIOS problem and I think it is connected with the thermal table (see the BIOS change log for version 3307).
I'll change the CPU, reflash the BIOS and report back.
Actually, I cannot try it. I have one more machine of the same model, but there is an issue with the Fedora 9 beta on it (the display is black - a driver problem).
But I can say that BIOS 3303 was the only BIOS version that worked with Linux. Until I changed the CPU I was using it and had no problems with it. So BIOS 3303 is working properly.
Created attachment 15791 [details]
Kernel Panic with kernel-2.6.25
This is the kernel panic trace that I got with kernel-2.6.25-1.fc9.i686. It shows that the problem is somewhere in acpi_thermal_***. I hope this helps.
Created attachment 15792 [details]
Kernel trace with kernel-2.6.25-1.fc9.i686
more detailed image of the stack trace from the error.
Created attachment 15825 [details]
acpidump from bios 3303
This is the acpidump output from BIOS 3303. Everything works fine with this BIOS. I tried it with the Fedora RC 9 with kernel 2.6.25-0.234.rc9.git1.fc9.i686.
Created attachment 15826 [details]
dmesg from kernel 2.6.25-0.234.rc9.git1.fc9.i686
This dmesg is from kernel 2.6.25-0.234.rc9.git1.fc9.i686 started with noapci nolapic apci=off acpi=off debug. The BIOS version is 3315.
The only way to start my machine is to use acpi=off. Then it boots normally but almost nothing works (no wireless, the cooler is always on, and so on), so it is definitely an ACPI error. I think it is connected with the TZ_.THRM._TMP part of the BIOS but I cannot prove it. The kernel just stops when trying to interpret this part of the BIOS. Any ideas how this should be solved?
Created attachment 15981 [details]
dmesg-30.11.2007-mk36-bios3303-kernel2.6.23.1-42.fc8
dmesg from 30.11.2007 with the mk36 processor, BIOS 3303 and kernel2.6.23.1-42.fc8. This can be linked with the acpidump from BIOS 3303 (attachment # 15825 [details]). You can see that the ACPI system is initialized properly and everything is fine. However, the new tl-60 processor cannot work with BIOS 3303 and needs 3315. The Linux kernel cannot initialize the ACPI system on the latest BIOS and stops (presumably) on the ACPI Thermal Zone. However, I cannot prove this.
Created attachment 16043 [details]
try the custom DSDT
Will you please try the custom DSDT and see whether the problem still exists?
Hi, Joshua
From the info in comment #26 and comment #19, the following error exists on BIOS 3315.
> Store (\_SB.PHSR (0x0D, 0x00), TJ85)
But PHSR doesn't exist, which means that the mutex \_SB.PCI0.LPC0.EC0.MUT1 can't be released.
Will you please try the custom DSDT in comment #30 and see whether the problem still exists? If it does, please attach a picture of the failure. It would be great if you could boot the system with the "noapic" option and attach the output of dmesg.
Thanks.
Created attachment 16048 [details]
dmesg with the custom dsdt
This is the dmesg with the custom DSDT. I used kernel-tuxonice-2.6.24.5-85_1.cubbi_tuxonice.fc8.i686 (from atrpms.net), which is based on the fc8 kernel and has the custom DSDT option enabled. First I compiled the dsdt.dsl file on Windows and merged the dsdt.aml into the initrd. So everything is working now.
When you look at the dmesg, there are several messages saying that the LAPIC is not functional. Maybe this is connected with something else in the BIOS, not just the DSDT.
Created attachment 16049 [details]
/var/log/messages with the custom dsdt
These are the kernel messages with the custom DSDT (from kernel-tuxonice-2.6.24.5-85_1.cubbi_tuxonice.fc8.i686). I got a kernel panic with the ath5k driver. The trace from it is at the end of the file. Can this panic be connected to the custom DSDT?
Is this a pure ath5k error or something connected with the BIOS?
Created attachment 16080 [details]
try the custom DSDT
From the log it seems that the bug is caused by the broken BIOS.
The \_SB.PHSR object doesn't exist, which means that the mutex \_SB.PCI0.LPC0.EC0.MUT1 can't be released.
Sorry, my fault: the following line was modified by mistake.
>OperationRegion (BAR5, SystemMemory, GBAA (), 0x0200)
Will you please try the new custom DSDT and see whether the system still panics? Please attach the output of dmesg.
Thanks.
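The failure mode described in this comment (a method takes the mutex, references the missing \_SB.PHSR, and exits without releasing) can be illustrated with a toy model. This is a minimal sketch, not ACPICA or kernel code: the names NAMESPACE, call, reg_method and tmp_method are hypothetical, a Python lock stands in for \_SB.PCI0.LPC0.EC0.MUT1, and which method actually holds the mutex on this BIOS is an assumption taken from the log lines quoted in this thread.

```python
import threading


class NamespaceLookupError(Exception):
    """Stands in for AE_NOT_FOUND: the referenced object does not exist."""


# Toy namespace. \_SB.PHSR is deliberately absent, as reported for BIOS 3315.
NAMESPACE = {r"\_SB.PCI0.LPC0.PHSR": lambda a, b: 0}

# Stands in for the mutex \_SB.PCI0.LPC0.EC0.MUT1.
MUT1 = threading.Lock()


def call(path, *args):
    """Look up a method by path and invoke it, or fail like a namespace lookup."""
    try:
        method = NAMESPACE[path]
    except KeyError:
        raise NamespaceLookupError(path)
    return method(*args)


def reg_method():
    """Acquire the mutex, then reference a missing object with no cleanup."""
    MUT1.acquire()
    tj85 = call(r"\_SB.PHSR", 0x0D, 0x00)  # raises: the object is missing
    MUT1.release()                          # never reached
    return tj85


def tmp_method(timeout=0.1):
    """A later caller that needs the same mutex and gives up after a timeout."""
    if not MUT1.acquire(timeout=timeout):
        return "AE_TIME"
    try:
        return "ok"
    finally:
        MUT1.release()


def demo():
    """Run the broken method first, then the caller that needs the mutex."""
    try:
        reg_method()
    except NamespaceLookupError:
        pass  # the error is reported, but the mutex stays held
    return tmp_method()
```

In this model the second caller can only time out, which matches the AE_TIME pattern in the dmesg excerpts, although the real kernel path is more involved.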
Created attachment 16113 [details]
my custom DSDT
This is my version of the DSDT table.
I compiled both of your suggestions several times and they always resulted in identical .aml files. This means that the iasl compiler doesn't care whether it is GBAA or GBAA().
I tried to recreate the missing object, and it is working now. According to the BIOS changelog, TJ85 should be connected with the thermal table for the tk38 processor, and \_SB.PCI0.LPC0.PHSR looks like some kind of message bus for the system. Therefore I added it.
Up to now the system is working OK. I tried both versions:
1. with PHSR
2. without PHSR object
You'll find the dmesg files attached below. As I said, in both cases the machine is working fine, so I cannot say if the PHSR object is really needed. But I added it anyway and I'm using the system with it.
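The observation that both DSDT sources compile to identical .aml files can be checked byte for byte. The following is a minimal sketch, assuming the two compiled tables are available as ordinary files. The helper names file_digest and same_table are hypothetical and not part of iasl or any ACPI tool.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_table(path_a, path_b):
    """True if the two compiled tables have identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

A call such as same_table("dsdt_a.aml", "dsdt_b.aml") (file names are placeholders) returning True would confirm that the two source variants produce the same binary.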
Created attachment 16114 [details]
dmesg from DSDT without PHSR object
This is the dmesg when the system was booted with the DSDT table without the PHSR object. The inserted DSDT.aml file is the same as the one compiled from your dsl file.
Created attachment 16115 [details]
dmesg from DSDT with PHSR object
This is the dmesg when the system was booted with the DSDT with the PHSR object added. The DSDT table was compiled from my custom version of the DSDT file. Everything looks like the version without the PHSR object, so I don't know if it is really needed for the tl60 processor. I'm using it anyway.
Now about the kernel panic from post #33: over the last 4 days I worked with both DSDT tables (with and without the PHSR object) and the madwifi package. It never resulted in a kernel panic. Therefore I think the panic from post #33 comes from the ath5k driver, not from the BIOS. But I cannot prove this.
Hi, Joshua
Thanks for the test.
It seems that the bug is caused by the broken BIOS. The \_SB.PHSR object doesn't exist, which means that the mutex \_SB.PCI0.LPC0.EC0.MUT1 can't be released. So the system will hang when the _TMP object is called to get the temperature in the boot phase.
After removing the \_SB.PHSR object, the system can boot well.
It is appropriate to fix this problem by upgrading the BIOS. IMO the bug can be rejected.
Yes, it is working now. I replaced \_SB.PHSR with \_SB.PCI0.LPC0.PHSR and it has been working without problems for the past 3 days. I'm not sure which of the following is better:
1. removing the \_SB.PHSR object
2. replacing it with \_SB.PCI0.LPC0.PHSR.
TJ85 is connected with the Athlon tk38 processor, but that is for the Aspire 4720 series. The Aspire 5050 series is sold with the Turion mk36. I think that Acer uses the same BIOS in both series. Anyway: removing or replacing the \_SB.PHSR object? Which will cause less harm to the system? What do you think?
Hi, Joshua
It seems that your system can work well when \_SB.PHSR is replaced by \_SB.PCI0.LPC0.PHSR. Please continue to use it. Maybe it is better to replace it with \_SB.PCI0.LPC0.PHSR.
Of course, the bug would best be fixed by upgrading the BIOS.
This is the latest available BIOS. I'll try to contact Acer about this but I don't think they'll do anything. Anyway, thank you for the help!
I decided to reopen the bug based on comments from the fedora-devel-list available here: https://www.redhat.com/archives/fedora-devel-list/2008-June/msg01349.html
I know that my Fedora works again with the corrected DSDT table, but maybe there should be code in ACPI that circumvents, rejects or corrects the problems of the original table. If Windows (Vista and XP) works, why shouldn't Linux? Maybe there should be something in acpi_thermal_init (according to the photo of the kernel panic trace) that doesn't lock up the kernel but just ignores the missing or wrong implementation in the DSDT. By the way, the broken line of the DSDT table (#3907 if I remember correctly) deals with the object TJ85, which is connected with thermal control, and acpi_thermal_init is the last kernel function that gets executed. Maybe the bug is in this function. What do you think?
Hi, Joshua
I understand what you said. Our target is that Linux is expected to work if Windows can work.
Will you please confirm whether Windows works well on your laptop?
Thanks.
(In reply to comment #44)
> Hi, Joshua
> Understand what you said. And our target is that Linux is also expected to
> work if windows can work.
>
> Will you please confirm whether windows can work well on your laptop?
>
> Thanks.
>
Yes, Windows XP SP3, Vista and Vista SP1 work without any issues on this machine. I've never experienced any problems with Windows. Even with BIOS versions 3303 and 3309 everything was fine. The same for 3315 - absolutely no problems.
--joshua
Can someone please take care of this? It's been more than 3 weeks and there hasn't been a single answer from anybody, only reassignment to someone else. Please.
I'm working on a patch that will release mutexes on method error exit. Please help to test it when the patch is available.
Created attachment 16960 [details]
Release all mutexes acquired by method on error exit
Hi, Joshua
Would you please help to test this patch?
On method error exit, it releases automatically all mutexes acquired by this method.
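The idea behind this patch (release every mutex a method acquired when that method exits with an error) can be sketched outside the kernel. This is a toy illustration under stated assumptions, not the attached patch or ACPICA code: MethodError, MethodContext and run_method are hypothetical names, and Python locks stand in for ACPI mutexes.

```python
import threading


class MethodError(Exception):
    """Stands in for any AE_* failure that aborts a control method."""


class MethodContext:
    """Tracks the mutexes acquired while one (hypothetical) method runs."""

    def __init__(self):
        self.held = []

    def acquire(self, mutex):
        mutex.acquire()
        self.held.append(mutex)

    def release(self, mutex):
        self.held.remove(mutex)
        mutex.release()

    def release_all(self):
        # Release in reverse order of acquisition.
        while self.held:
            self.held.pop().release()


def run_method(body):
    """Run a method body; on error exit, release everything it still holds."""
    ctx = MethodContext()
    try:
        return body(ctx)
    except MethodError:
        ctx.release_all()
        raise
```

With this bookkeeping, a method that fails after acquiring a mutex no longer leaves it held, so a later caller is not blocked by the earlier failure.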
Hi, Joshua
Will you please confirm whether Windows works well? Please check whether the battery status changes correctly after the AC adapter is plugged in/unplugged and whether the LID works.
Thanks.
(In reply to comment #50)
> Hi, Joshua
> Will you please confirm whether the windows can work well?
> Please check whether the status of the battery can be changed correctly
> after AC adapter is plugged/unplugged and whether the LID can work.
> Thanks.
>
>
All were tested and work without problems. Everything is OK. Now I have to test the patch from Lin Ming and see what happens. (But first I have to recompile the kernel :()
Created attachment 16972 [details]
dmesg with debug acpi=off / the patch was applied
Here is the dmesg from kernel-2.6.27-0.173.rc0.git11.fc10 taken from here: http://koji.fedoraproject.org/koji/buildinfo?buildID=57343. Take a look at line #288:
ACPI Exception (utmutex-0263): AE_BAD_PARAMETER, Thread F7850000 could not acquire Mutex [1] [20080609]
I recompiled it with your patch. It was started with "debug acpi=off" because I got a kernel panic. Hope this helps.
Created attachment 16973 [details]
kernel panic with applied patch
This is a snapshot from the kernel panic that I got after applying the patch.
Created attachment 16974 [details]
try the debug patch
Will you please try the debug patch to see whether the system can be booted normally?
In the debug patch the OS will ignore the error from evaluating the _REG object and continue to initialize the EC device.
Of course, please confirm whether the battery/thermal works.
Thanks.
Created attachment 16992 [details]
dmesg from 2.6.27-0.173.rc0.git11 with applied patch2
This is the dmesg from 2.6.27-0.173.rc0.git11 after applying the second patch. Everything seems to work OK now. I recompiled the kernel with CONFIG_ACPI_DEBUG=y and CONFIG_ACPI_DEBUG_FUNC_TRACE=y. The fan, the battery and the LID seem to work normally.
Created attachment 16993 [details]
dmesg from 2.6.27-0.173.rc0.git11 with patch2 / SUSPEND
Here is the dmesg. I tried suspend to RAM and suspend to disk, and both work OK. The kernel panic at the end is from kpowersave. Ignore it.
Created attachment 16994 [details]
dmesg from 2.6.25.11-60.fc8 with patch2
I applied your patch to the current FC8 kernel 2.6.25.11-60. It works without problems. There are no kernel oopses or anything else. Battery and LID also work. Suspend to RAM/Disk works, too. But there are some issues with the display after coming back from Suspend to Disk and shutting down. I think it is connected with radeonfb: the resolution is too big and nothing can be seen. But this happens only after the X server is closed and it is not such a big issue.
Created attachment 16995 [details]
dmesg 2.6.25.10-47.fc8 with corrected DSDT
dmesg 2.6.25.10-47.fc8 with corrected DSDT. It is just for comparison.
Created attachment 16996 [details]
dmesg from 2.6.25.4-10.fc8 with corrected DSDT
dmesg from 2.6.25.4-10.fc8 with corrected DSDT. Just for comparison
What consequences can there be from not registering the right object (line #163 from dmesg-patch2-2.6.25.11-60.fc8.i686)?
ACPI Error (psargs-0355): [\_SB_.PHSR] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPC0.EC0_._REG] (Node f78175a0), AE_NOT_FOUND
Fail in evaluating_REG object. It is broken BIOS. Try to upgrade it
Can this "harm" the system/hardware?
Hi, Joshua
Thanks for the test. It seems that the patch in comment #54 can make your system work well.
In the debug patch the OS will ignore the error from evaluating the _REG object and continue to initialize the EC device.
The following warning message is harmless. It only reports the error in the BIOS.
>ACPI Error (psargs-0355): [\_SB_.PHSR] Namespace lookup failure, AE_NOT_FOUND
>ACPI Error (psparse-0537): Method parse/execution failed
>[\_SB_.PCI0.LPC0.EC0_._REG] (Node f78175a0), AE_NOT_FOUND
>Fail in evaluating_REG object. It is broken BIOS. Try to upgrade it
Thanks.
As the patch in comment #51 can make the system work well, IMO this bug can be marked as resolved.
Thanks.
Can you please inform me when this is merged upstream? Thank you for the help.
patch in comment #54 applied to acpi-test
*** This bug has been marked as a duplicate of bug 8953 ***
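The ACPI Error and ACPI Exception lines quoted throughout this thread share a common shape, so they can be pulled out of a dmesg log mechanically. The following is a minimal sketch that assumes only the line format visible in the excerpts above. The function name parse_acpi_line and the field names are hypothetical, and reading the number after the dash as a source line number is an assumption.

```python
import re

# Matches e.g. "ACPI Error (psargs-0355): <text>" anywhere in a log line.
ACPI_LINE = re.compile(r"ACPI (Error|Exception) \((\w+)-(\d+)\): (.*)")
# Matches status codes such as AE_TIME or AE_NOT_FOUND.
AE_STATUS = re.compile(r"\bAE_[A-Z_]+\b")


def parse_acpi_line(line):
    """Split one ACPI error line into its parts, or return None if no match."""
    m = ACPI_LINE.search(line)
    if m is None:
        return None
    kind, module, number, text = m.groups()
    status = AE_STATUS.search(text)
    return {
        "kind": kind,                      # "Error" or "Exception"
        "module": module,                  # e.g. "psargs"
        "number": int(number),             # e.g. 355 (assumed source line)
        "status": status.group(0) if status else None,
        "text": text,
    }


def acpi_statuses(lines):
    """Collect the AE_* status of every ACPI error line in an iterable."""
    parsed = (parse_acpi_line(line) for line in lines)
    return [p["status"] for p in parsed if p is not None]
```

Running acpi_statuses over the lines of an attached dmesg file gives a quick view of whether a boot hit AE_TIME, AE_NOT_FOUND or another status.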