Bug 43075 - (Dell Vostro 3350) ACPI reports critical temperature even though the sensors data says otherwise
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 high
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-09 08:45 UTC by Way-Chuang Ang
Modified: 2013-10-14 10:42 UTC

See Also:
Kernel Version: 3.0 through 3.10-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump of dell vostro 3350 (210.58 KB, text/plain)
2012-04-09 08:45 UTC, Way-Chuang Ang
Output of grep . /sys/class/thermal/*/* (1010 bytes, text/plain)
2012-11-29 06:46 UTC, Way-Chuang Ang

Description Way-Chuang Ang 2012-04-09 08:45:45 UTC
Created attachment 72858
acpidump of dell vostro 3350 

I have been using a Dell Vostro 3350 for almost a year, and the laptop shuts itself down from time to time. The cause is that ACPI reports a critical temperature even though the output of the sensors command says otherwise. I have tried Fedora 15, Fedora 16, CentOS 6.2, Ubuntu 11.10 and Ubuntu 12.04-beta2, with many kernel versions, and all of them exhibit the same problem. With dual display, the shutdown can be triggered within 5 minutes; without dual display, it may take some time before it is triggered. I enabled ACPI debug in the kernel, but the output is too verbose and I don't understand ACPI well enough to identify the cause. I would therefore like to ask for your help. The following is the syslog output from the kernel:

Apr  7 02:12:26 wcang-laptop kernel: [  364.265571]      osl-1128 [430818512] [4294967291] os_wait_semaphore     : Waiting for semaphore[ffff88011b004300|1|65535]
Apr  7 02:12:26 wcang-laptop kernel: [  364.265601]      osl-1147 [430818512] [4294967291] os_wait_semaphore     : Acquired semaphore[ffff88011b004300|1|65535] utmutex-0269 [430818512] [4294967291] ut_acquire_mutex      : Thread 430818512 acquired Mutex [ACPI_MTX_Interpreter]
Apr  7 02:12:26 wcang-laptop kernel: [  364.265655]  exutils-0099 [430818512] [4294967291] ex_enter_interpreter  : ----Exit-
Apr  7 02:12:26 wcang-laptop kernel: [  364.265680] utdelete-0678 [430818512] [4294967291] ut_remove_reference   : ----Entry ffff880030afa480
Apr  7 02:12:26 wcang-laptop kernel: [  364.265706] utdelete-0698 [430818512] [4294967291] ut_remove_reference   : Obj ffff880030afa480 Current Refs=1 [To Be Decremented]
Apr  7 02:12:26 wcang-laptop kernel: [  364.265737] utdelete-0486 [430818512] [4294967292] ut_update_object_refer: ----Entry ffff880030afa480
Apr  7 02:12:26 wcang-laptop kernel: [  364.265764] utdelete-0412 [430818512] [4294967292] ut_update_ref_count   : Obj ffff880030afa480 Refs=0, [Decremented]
Apr  7 02:12:26 wcang-laptop kernel: [  364.265795] utdelete-0080 [430818512] [4294967293] ut_delete_internal_obj: ----Entry ffff880030afa480
Apr  7 02:12:26 wcang-laptop kernel: [  364.265821] utdelete-0320 [430818512] [4294967293] ut_delete_internal_obj: Deleting Object ffff880030afa480 [Integer]
Apr  7 02:12:26 wcang-laptop kernel: [  364.265851] utobject-0420 [430818512] [4294967294] ut_delete_object_desc : ----Entry ffff880030afa480
Apr  7 02:12:26 wcang-laptop kernel: [  364.265877] utobject-0432 [430818512] [4294967294] ut_delete_object_desc : ----Exit-
Apr  7 02:12:26 wcang-laptop kernel: [  364.265901] utdelete-0323 [430818512] [4294967293] ut_delete_internal_obj: ----Exit-
Apr  7 02:12:26 wcang-laptop kernel: [  364.265926] utdelete-0612 [430818512] [4294967292] ut_update_object_refer: ----Exit- AE_OK
Apr  7 02:12:26 wcang-laptop kernel: [  364.265947] utdelete-0706 [430818512] [4294967291] ut_remove_reference   : ----Exit-
Apr  7 02:12:26 wcang-laptop kernel: [  364.265963]  exutils-0151 [430818512] [4294967291] ex_exit_interpreter   : ----Entry
Apr  7 02:12:26 wcang-laptop kernel: [  364.265980]  utmutex-0300 [430818512] [4294967291] ut_release_mutex      : Thread 430818512 releasing Mutex [ACPI_MTX_Interpreter]
Apr  7 02:12:26 wcang-laptop kernel: [  364.266001]      osl-1167 [430818512] [4294967291] os_signal_semaphore   : Signaling semaphore[ffff88011b004300|1]
Apr  7 02:12:26 wcang-laptop kernel: [  364.266021]  exutils-0159 [430818512] [4294967291] ex_exit_interpreter   : ----Exit-
Apr  7 02:12:26 wcang-laptop kernel: [  364.266038] nsxfeval-0358 [430818512] [4294967290] evaluate_object       : ----Exit- AE_OK
Apr  7 02:12:26 wcang-laptop kernel: [  364.266055]    utils-0286 [430818512] [4294967289] evaluate_integer      : Return value [3732]
Apr  7 02:12:26 wcang-laptop kernel: [  364.266073]  thermal-0212 [430818512] [4294967289] thermal_get_temperatur: Temperature is 3732 dK
Apr  7 02:12:26 wcang-laptop kernel: [  364.266115] Critical temperature reached (100 C), shutting down.
Apr  7 02:12:26 wcang-laptop kernel: [  364.266128] Pid: 4, comm: kworker/0:0 Tainted: G         C   3.3.1+ #4
Apr  7 02:12:26 wcang-laptop kernel: [  364.266135] Call Trace:
Apr  7 02:12:26 wcang-laptop kernel: [  364.266158]  [<ffffffff814ce287>] thermal_zone_device_update+0x1e7/0x2f0
Apr  7 02:12:26 wcang-laptop kernel: [  364.266179]  [<ffffffff810687df>] ? queue_delayed_work_on+0x11f/0x130
Apr  7 02:12:26 wcang-laptop kernel: [  364.266195]  [<ffffffff8136d56c>] ? acpi_os_wait_events_complete+0x23/0x23
Apr  7 02:12:26 wcang-laptop kernel: [  364.266210]  [<ffffffff813ae5ad>] acpi_thermal_notify+0x4c/0x124
Apr  7 02:12:26 wcang-laptop kernel: [  364.266224]  [<ffffffff81370b63>] acpi_device_notify+0x19/0x1b
Apr  7 02:12:26 wcang-laptop kernel: [  364.266240]  [<ffffffff813825c8>] acpi_ev_notify_dispatch+0x6c/0x83
Apr  7 02:12:26 wcang-laptop kernel: [  364.266252]  [<ffffffff8136d593>] acpi_os_execute_deferred+0x27/0x34
Apr  7 02:12:26 wcang-laptop kernel: [  364.266272]  [<ffffffff8106abaa>] process_one_work+0x11a/0x480
Apr  7 02:12:26 wcang-laptop kernel: [  364.266286]  [<ffffffff8106b864>] worker_thread+0x164/0x370
Apr  7 02:12:26 wcang-laptop kernel: [  364.266299]  [<ffffffff8106b700>] ? manage_workers.isra.29+0x230/0x230
Apr  7 02:12:26 wcang-laptop kernel: [  364.266312]  [<ffffffff81070043>] kthread+0x93/0xa0
Apr  7 02:12:26 wcang-laptop kernel: [  364.266329]  [<ffffffff816391a4>] kernel_thread_helper+0x4/0x10
Apr  7 02:12:26 wcang-laptop kernel: [  364.266343]  [<ffffffff8106ffb0>] ? kthread_freezable_should_stop+0x70/0x70
Apr  7 02:12:26 wcang-laptop kernel: [  364.266358]  [<ffffffff816391a0>] ? gs_change+0x13/0x13
Apr  7 02:12:26 wcang-laptop kernel: [  364.266380]  utstate-0339 [430818512] [4294967290] ut_delete_generic_stat: ----Entry
Apr  7 02:12:26 wcang-laptop kernel: [  364.266396]  utstate-0346 [430818512] [4294967290] ut_delete_generic_stat: ----Exit-
Apr  7 02:12:26 wcang-laptop kernel: [  364.266439]  hwvalid-0132 [430818512] [4294967290] hw_validate_io_request: ----Entry

Note that I don't use any proprietary kernel modules, and I added a dump_stack() call to trace the event.

In addition, if I comment out the orderly_poweroff call (drivers/thermal/thermal_sys.c:1066), the system does not shut down when this bug occurs. However, it later powers off abruptly.

I'd appreciate it if someone could give me a pointer or two.
Comment 1 Way-Chuang Ang 2012-04-20 11:13:44 UTC
When the critical temperature is triggered, the output of sensors looks like this:

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +100°C  (crit = +99.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +62.0°C  (high = +86.0°C, crit = +100.0°C)
Core 0:         +60.0°C  (high = +86.0°C, crit = +100.0°C)
Core 1:         +55.0°C  (high = +86.0°C, crit = +100.0°C)


acpitz-virtual-0 reads 100 °C, while the other sensors are clearly lower than that.
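
For reference, the raw value in the kernel log from the description (3732 dK) and the acpitz reading in the sensors output are the same temperature in different units. The sketch below converts the logged value and compares it with the thresholds quoted in this report. It assumes 0 °C is taken as 2732 dK, a common rounding; the function and variable names are illustrative only, and the log and the sensors output were not necessarily captured at the same moment.

```python
# ACPI thermal zones report temperature in tenths of a kelvin (dK).
# Assumption: 0 degrees Celsius is taken as 2732 dK (273.2 K), a common rounding.
ZERO_CELSIUS_DK = 2732


def decikelvin_to_celsius(dk: int) -> float:
    """Convert an ACPI reading in deci-kelvin to degrees Celsius."""
    return (dk - ZERO_CELSIUS_DK) / 10.0


# Values copied from the kernel log and the sensors output in this report.
acpi_raw_dk = 3732          # "Temperature is 3732 dK"
acpi_crit_c = 99.0          # acpitz-virtual-0: crit = +99.0
coretemp_package_c = 62.0   # coretemp "Physical id 0"

acpi_c = decikelvin_to_celsius(acpi_raw_dk)
over_critical = acpi_c >= acpi_crit_c
gap_c = acpi_c - coretemp_package_c

print(f"ACPI zone: {acpi_c:.1f} C, at or over critical: {over_critical}")
print(f"Difference from coretemp package reading: {gap_c:.1f} C")
```

The conversion itself is unremarkable; the point is that the ACPI zone sits at its trip point while the CPU's own sensor reports a much lower value.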
Comment 2 Zhang Rui 2012-11-28 13:21:14 UTC
Please attach the output of "grep . /sys/class/thermal/*/*".
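
For readers without a shell at hand, a rough Python equivalent of that grep command might look like the sketch below. It is only an approximation: `dump_thermal` and `millicelsius_to_celsius` are hypothetical helper names, and the second helper assumes that sysfs `temp` attributes are expressed in millidegrees Celsius.

```python
import glob
import os


def dump_thermal(root: str = "/sys/class/thermal") -> list:
    """Return "path:line" entries for every non-empty line, like `grep . root/*/*`."""
    entries = []
    for path in sorted(glob.glob(os.path.join(root, "*", "*"))):
        if not os.path.isfile(path):
            continue  # skip directories and symlinks to directories
        try:
            with open(path, "r", errors="replace") as handle:
                lines = handle.read().splitlines()
        except OSError:
            continue  # some sysfs attributes cannot be read
        entries.extend(f"{path}:{line}" for line in lines if line)
    return entries


def millicelsius_to_celsius(raw: str) -> float:
    """Convert a sysfs temperature string (assumed millidegrees Celsius)."""
    return int(raw.strip()) / 1000.0


if __name__ == "__main__":
    for entry in dump_thermal():
        print(entry)
```

Unlike grep, this sketch silently skips unreadable attributes instead of printing an error for them.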
Comment 3 Way-Chuang Ang 2012-11-29 06:46:29 UTC
Created attachment 87691
Output of grep . /sys/class/thermal/*/*

Hi Zhang Rui,
    Thanks for taking the time to check this bug. Attached is the output of grep . /sys/class/thermal/*/* that you requested.
Comment 4 Adrian 2013-03-03 21:56:00 UTC
Hello Way-Chuang,

Did you solve the problem? I have the same laptop (Vostro 3350) and the same problem.
Comment 5 Way-Chuang Ang 2013-03-04 12:18:44 UTC
Hi Adrian,
   Unfortunately, I don't know of any solution to this problem. My current workaround is to pass thermal.off=1 to the kernel at boot. It alleviates most of the problems, but using dual display still makes the fan ramp up and the laptop subsequently power off. I asked Dave Airlie and Matthew Garrett for advice, but there is not much that can be done with this Dell Vostro 3350 (https://plus.google.com/109386511629819124958). I even contacted Dell technical support about this issue, to no avail. If you have any luck solving it, please update me.

   Thanks.
Comment 6 Adrian 2013-03-19 21:06:21 UTC
This morning I received new updates for my Ubuntu (12.04.2) that include a new kernel version for my distribution (3.2.0-39-generic).

Now the CPU temperature is below 70 °C (cooler than before).

I will test dual display and tell you if there is any good news.
Comment 7 Zhang Rui 2013-05-15 07:00:43 UTC
Is there any update?
Does the problem still exist in the latest upstream kernel, say, 3.10-rc1?
Comment 8 Way-Chuang Ang 2013-05-15 14:04:14 UTC
The problem is still reproducible on 3.10-rc1.
Comment 9 Way-Chuang Ang 2013-05-15 14:11:30 UTC
Hi Zhang Rui,

   Can you give me a pointer on how to debug this issue? I have some familiarity with C, but not with AML/ACPI or hardware-specific details. Thanks.
Comment 10 Way-Chuang Ang 2013-05-27 04:39:07 UTC
Hi Zhang Rui,

This doesn't seem like an ACPI power management issue. There is another issue on the system whereby plugging in an HDMI external display causes the touchpad to be disabled, and the EDID returned over HDMI is corrupted. Reading the HDMI i2c returns junk. Perhaps this issue is related to memory corruption caused by the DSDT. I don't think there is much that can be done to resolve this issue unless someone takes a look at the hardware and the DSDT itself. Can we close this as unresolvable? Thanks. Sorry for the trouble.
Comment 11 Zhang Rui 2013-10-14 10:42:53 UTC
(In reply to Way-Chuang Ang from comment #10)
> Hi Zhang Rui,
> 
> This doesn't seems like ACPI power management issue. There is another issue
> on the system whereby plugging in HDMI external display causes touchpad to
> be disabled and the EDID returned from HDMI is corrupted. Reading the HDMI
> i2c returns junk. Perhaps, this issue is related to memory corruption caused
> by DSDT.

If this is true, the boot option "acpi=copy_dsdt" should help.

> I don't think there is much that can be done to resolve this issue
> unless someone takes a look at the hardware and the DSDT itself. Can we
> close this as unresolvable? Thanks. Sorry for the trouble.

Okay, bug closed.
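
For anyone retrying the two boot options mentioned in this thread (thermal.off=1 and acpi=copy_dsdt), a small check like the sketch below can confirm that they actually reached the running kernel. The helper names and the example command line are hypothetical, and the parsing simply splits on whitespace, which ignores quoted values.

```python
def boot_options(cmdline: str) -> list:
    """Split a kernel command line into whitespace-separated options."""
    return cmdline.split()


def has_option(cmdline: str, option: str) -> bool:
    """Report whether an exact option such as "thermal.off=1" is present."""
    return option in boot_options(cmdline)


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet thermal.off=1 acpi=copy_dsdt"
for option in ("thermal.off=1", "acpi=copy_dsdt"):
    print(option, "present" if has_option(example, option) else "absent")
```

On a real system, pass `read_cmdline()` instead of `example`.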
