Created attachment 72858 [details] acpidump of dell vostro 3350

I have a Dell Vostro 3350 that I have been using for almost a year, but the laptop has a serious problem: it shuts itself down from time to time. ACPI reports a critical temperature even though the output of the sensors command says otherwise. I have tried Fedora 15, Fedora 16, CentOS 6.2, Ubuntu 11.10 and Ubuntu 12.04-beta2 with many kernel versions, and they all exhibit the same problem. If I use dual display, the auto-shutdown can be triggered within 5 minutes. Without dual display, it may take longer before the shutdown is triggered. I enabled ACPI debug in the kernel, but the output is too verbose and I don't understand ACPI well enough to identify the cause. Therefore, I would like to ask for your help on this matter.

The following is the syslog output from the kernel:

Apr 7 02:12:26 wcang-laptop kernel: [ 364.265571] osl-1128 [430818512] [4294967291] os_wait_semaphore : Waiting for semaphore[ffff88011b004300|1|65535]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265601] osl-1147 [430818512] [4294967291] os_wait_semaphore : Acquired semaphore[ffff88011b004300|1|65535] utmutex-0269 [430818512] [4294967291] ut_acquire_mutex : Thread 430818512 acquired Mutex [ACPI_MTX_Interpreter]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265655] exutils-0099 [430818512] [4294967291] ex_enter_interpreter : ----Exit-
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265680] utdelete-0678 [430818512] [4294967291] ut_remove_reference : ----Entry ffff880030afa480
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265706] utdelete-0698 [430818512] [4294967291] ut_remove_reference : Obj ffff880030afa480 Current Refs=1 [To Be Decremented]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265737] utdelete-0486 [430818512] [4294967292] ut_update_object_refer: ----Entry ffff880030afa480
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265764] utdelete-0412 [430818512] [4294967292] ut_update_ref_count : Obj ffff880030afa480 Refs=0, [Decremented]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265795] utdelete-0080 [430818512] [4294967293] ut_delete_internal_obj: ----Entry ffff880030afa480
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265821] utdelete-0320 [430818512] [4294967293] ut_delete_internal_obj: Deleting Object ffff880030afa480 [Integer]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265851] utobject-0420 [430818512] [4294967294] ut_delete_object_desc : ----Entry ffff880030afa480
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265877] utobject-0432 [430818512] [4294967294] ut_delete_object_desc : ----Exit-
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265901] utdelete-0323 [430818512] [4294967293] ut_delete_internal_obj: ----Exit-
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265926] utdelete-0612 [430818512] [4294967292] ut_update_object_refer: ----Exit- AE_OK
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265947] utdelete-0706 [430818512] [4294967291] ut_remove_reference : ----Exit-
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265963] exutils-0151 [430818512] [4294967291] ex_exit_interpreter : ----Entry
Apr 7 02:12:26 wcang-laptop kernel: [ 364.265980] utmutex-0300 [430818512] [4294967291] ut_release_mutex : Thread 430818512 releasing Mutex [ACPI_MTX_Interpreter]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266001] osl-1167 [430818512] [4294967291] os_signal_semaphore : Signaling semaphore[ffff88011b004300|1]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266021] exutils-0159 [430818512] [4294967291] ex_exit_interpreter : ----Exit-
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266038] nsxfeval-0358 [430818512] [4294967290] evaluate_object : ----Exit- AE_OK
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266055] utils-0286 [430818512] [4294967289] evaluate_integer : Return value [3732]
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266073] thermal-0212 [430818512] [4294967289] thermal_get_temperatur: Temperature is 3732 dK
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266115] Critical temperature reached (100 C), shutting down.
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266128] Pid: 4, comm: kworker/0:0 Tainted: G C 3.3.1+ #4
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266135] Call Trace:
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266158] [<ffffffff814ce287>] thermal_zone_device_update+0x1e7/0x2f0
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266179] [<ffffffff810687df>] ? queue_delayed_work_on+0x11f/0x130
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266195] [<ffffffff8136d56c>] ? acpi_os_wait_events_complete+0x23/0x23
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266210] [<ffffffff813ae5ad>] acpi_thermal_notify+0x4c/0x124
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266224] [<ffffffff81370b63>] acpi_device_notify+0x19/0x1b
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266240] [<ffffffff813825c8>] acpi_ev_notify_dispatch+0x6c/0x83
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266252] [<ffffffff8136d593>] acpi_os_execute_deferred+0x27/0x34
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266272] [<ffffffff8106abaa>] process_one_work+0x11a/0x480
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266286] [<ffffffff8106b864>] worker_thread+0x164/0x370
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266299] [<ffffffff8106b700>] ? manage_workers.isra.29+0x230/0x230
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266312] [<ffffffff81070043>] kthread+0x93/0xa0
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266329] [<ffffffff816391a4>] kernel_thread_helper+0x4/0x10
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266343] [<ffffffff8106ffb0>] ? kthread_freezable_should_stop+0x70/0x70
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266358] [<ffffffff816391a0>] ? gs_change+0x13/0x13
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266380] utstate-0339 [430818512] [4294967290] ut_delete_generic_stat: ----Entry
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266396] utstate-0346 [430818512] [4294967290] ut_delete_generic_stat: ----Exit-
Apr 7 02:12:26 wcang-laptop kernel: [ 364.266439] hwvalid-0132 [430818512] [4294967290] hw_validate_io_request: ----Entry

Note that I don't use any proprietary kernel modules, and that I added a dump_stack call to trace the event. In addition, if I comment out orderly_poweroff (drivers/thermal/thermal_sys.c:1066), the system does not shut down when this bug occurs. However, it later powers off abruptly. I would appreciate it if someone could give me a pointer or two.
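For reference, the two numbers in the log above agree with each other: the thermal driver logs the temperature in tenths of a kelvin (dK), so 3732 dK is roughly 100 C. A minimal sketch of the arithmetic in Python follows. The function name is hypothetical, and the sketch uses the exact 273.15 K offset with floating point; the kernel's own conversion is not reproduced here and may round differently.

```python
def decikelvin_to_celsius(dk: int) -> float:
    """Convert a temperature in tenths of a kelvin to degrees Celsius."""
    return dk / 10.0 - 273.15


# 3732 is the value returned by evaluate_integer in the log above.
reading = 3732
celsius = decikelvin_to_celsius(reading)
print(f"{reading} dK is about {celsius:.2f} C")
```

So the shutdown message is a faithful translation of what the firmware returned; the open question is why the firmware returns that value while coretemp reports much lower temperatures.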
When the critical temperature is triggered, the output of sensors looks like this:

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +100°C (crit = +99.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +62.0°C (high = +86.0°C, crit = +100.0°C)
Core 0: +60.0°C (high = +86.0°C, crit = +100.0°C)
Core 1: +55.0°C (high = +86.0°C, crit = +100.0°C)

acpitz-virtual-0 reads 100 degrees, while the rest are obviously lower than that.
Please attach the output of "grep . /sys/class/thermal/*/*".
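For anyone who prefers structured output, here is a rough Python equivalent of that grep. It is only a sketch: it assumes the usual sysfs layout in which each entry under /sys/class/thermal is a directory of small text attributes (for example type and temp), and the function name and the root parameter are made up for illustration.

```python
from pathlib import Path


def read_thermal_entries(root="/sys/class/thermal"):
    """Return {entry_name: {attribute_name: contents}} for readable attributes."""
    result = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # No thermal class directory (for example, not running on Linux).
        return result
    for entry in sorted(root_path.iterdir()):
        if not entry.is_dir():
            continue
        attributes = {}
        for attr in sorted(entry.iterdir()):
            if not attr.is_file():
                continue  # skip subdirectories such as power/
            try:
                attributes[attr.name] = attr.read_text().strip()
            except OSError:
                # Some attributes are write-only or fail on read; skip them.
                continue
        result[entry.name] = attributes
    return result


if __name__ == "__main__":
    for name, attributes in read_thermal_entries().items():
        for key, value in attributes.items():
            print(f"{name}/{key}: {value}")
```

Unlike grep, this prints empty attributes as well, so the output will not match line for line.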
Created attachment 87691 [details] Output of grep . /sys/class/thermal/*/* Hi Zhang Rui, Thanks for taking the time to check this bug. Attached is the output of grep . /sys/class/thermal/*/* that you requested.
Hello Way-Chuang, Did you solve the problem? I have the same laptop (Vostro 3350) and the same problem.
Hi Adrian, Unfortunately, I don't know of any solution to this problem. My current workaround is to pass thermal.off=1 to the kernel at boot. It alleviates most of the problems, but using dual display still makes the fan ramp up and the machine subsequently power off. I solicited advice from Dave Airlie and Matthew Garrett; there is not much that can be done with this Dell Vostro 3350 (https://plus.google.com/109386511629819124958). I even contacted Dell technical support about this issue, to no avail. If you have any luck in solving this issue, do update me. Thanks.
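To confirm that an option such as thermal.off=1 actually reached the running kernel, one can look at /proc/cmdline. A small Python sketch follows; the helper name is hypothetical, and it splits on whitespace only, so it would not cope with quoted values that contain spaces.

```python
def has_boot_option(option, cmdline=None):
    """Return True if `option` appears as a whole word on the kernel command line."""
    if cmdline is None:
        # /proc/cmdline holds the command line the running kernel was booted with.
        with open("/proc/cmdline") as fh:
            cmdline = fh.read()
    return option in cmdline.split()


# Example with a made-up command line, so the check can be tried on any machine.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet thermal.off=1"
print(has_boot_option("thermal.off=1", example))
```

Calling has_boot_option("thermal.off=1") with no second argument checks the live system instead of the example string.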
This morning I got new updates for my Ubuntu (12.04.2) that include a new kernel version for my distribution (3.2.0-39-generic). Now the CPU temperature is below 70° (cooler than before). I will test dual display and let you know if there is any good news.
Is there any update? Does the problem still exist in the latest upstream kernel, say, 3.10-rc1?
The problem is still reproducible on 3.10-rc1.
Hi Zhang Rui, Can you give me a pointer on how to debug this issue? I have some familiarity with C, but not with AML/ACPI or hardware-specific details. Thanks.
Hi Zhang Rui, This doesn't seem like an ACPI power management issue. There is another issue on the system: plugging in an external HDMI display disables the touchpad, and the EDID returned over HDMI is corrupted. Reading the HDMI i2c returns junk. Perhaps this issue is related to memory corruption caused by the DSDT. I don't think there is much that can be done to resolve this issue unless someone takes a look at the hardware and the DSDT itself. Can we close this as unresolvable? Thanks. Sorry for the trouble.
(In reply to Way-Chuang Ang from comment #10)
> Hi Zhang Rui,
>
> This doesn't seems like ACPI power management issue. There is another issue
> on the system whereby plugging in HDMI external display causes touchpad to
> be disabled and the EDID returned from HDMI is corrupted. Reading the HDMI
> i2c returns junk. Perhaps, this issue is related to memory corruption caused
> by DSDT.

If this is true, the boot option "acpi=copy_dsdt" should help.

> I don't think there is much that can be done to resolve this issue
> unless someone takes a look at the hardware and the DSDT itself. Can we
> close this as unresolvable? Thanks. Sorry for the trouble.

Okay, bug closed.