Bug 31812 - Temp. sensors and fan are not working when resuming without battery - Dell-notebook
Status: CLOSED DOCUMENTED
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All
OS: Linux
Importance: P1 normal
Assigned To: Lv Zheng
Depends on:
Blocks:
Reported: 2011-03-24 15:00 UTC by Simon Gebler
Modified: 2015-06-26 02:17 UTC
CC List: 6 users

See Also:
Kernel Version: 3.2.1
Tree: Mainline
Regression: No


Attachments
dmesg after boot start (54.07 KB, application/octet-stream)
2011-03-26 13:32 UTC, Alessandro Zigliani
acpidump 2.6.38.1 (152.01 KB, application/octet-stream)
2011-03-26 13:33 UTC, Alessandro Zigliani
dmesg after suspend (60.41 KB, application/octet-stream)
2011-03-26 13:35 UTC, Alessandro Zigliani
dmesg after inserting the battery (60.41 KB, application/octet-stream)
2011-03-26 13:40 UTC, Alessandro Zigliani
DSDT.hex (220.99 KB, text/x-hex)
2011-07-26 08:51 UTC, Lan Tianyu
Another ACPI-Dump of an affected machine (151.05 KB, application/octet-stream)
2012-01-18 20:15 UTC, Simon Gebler
debug.patch (1.16 KB, patch)
2013-08-12 05:59 UTC, Lan Tianyu
Commands from comment 30 (2.71 KB, text/plain)
2014-07-21 11:11 UTC, Simon Gebler
dsdt.hex (268.45 KB, text/x-hex)
2014-08-25 08:01 UTC, Lan Tianyu
Output for some cases (1.59 KB, text/plain)
2014-12-15 23:14 UTC, Simon Gebler
Last case output (1.58 KB, text/plain)
2014-12-15 23:15 UTC, Simon Gebler
customized DSDT to see why _TMP returns 0 (269.18 KB, application/octet-stream)
2015-02-14 04:04 UTC, Zhang Rui
customized DSDT to see why _TMP returns 0 (28.66 KB, application/octet-stream)
2015-02-14 04:05 UTC, Zhang Rui
customized DSDT to see why _TMP returns 0 (270.13 KB, application/octet-stream)
2015-02-14 04:05 UTC, Zhang Rui
Debug output as per comment #43 (5.34 KB, text/plain)
2015-03-03 12:06 UTC, Simon Gebler
Dmesg output during the resume (781 bytes, text/plain)
2015-03-03 12:13 UTC, Simon Gebler
Output, grep for cooling_device*/* (1.05 KB, text/plain)
2015-03-28 14:24 UTC, Simon Gebler

Description Simon Gebler 2011-03-24 15:00:01 UTC
On many Dell notebook systems, when resuming from suspend-to-RAM without the battery, the temperature sensors and the fan stop working, causing the system to overheat.
It is a variant of bug 14667, limited to the case where the battery is not inserted. Inserting the battery afterwards makes everything work properly again, but that is a workaround rather than a solution.
Comment 1 Alessandro Zigliani 2011-03-26 13:32:56 UTC
Created attachment 52042
dmesg after boot start

dmesg after booting 2.6.38.1
Comment 2 Alessandro Zigliani 2011-03-26 13:33:25 UTC
Created attachment 52052
acpidump 2.6.38.1
Comment 3 Alessandro Zigliani 2011-03-26 13:35:30 UTC
Created attachment 52062
dmesg after suspend
Comment 4 Alessandro Zigliani 2011-03-26 13:40:26 UTC
Created attachment 52072
dmesg after inserting the battery

After battery insertion everything works fine.
Comment 5 Alessandro Zigliani 2011-03-26 13:51:29 UTC
I was wondering: shouldn't something show up in dmesg when I put the battery back in? And what about the "unknown key pressed" message? I didn't press any key, by the way...
Comment 6 Simon Gebler 2011-03-26 13:54:45 UTC
The same event also occurs on my notebook.
It seems to be triggered by inserting or removing the battery.
Comment 7 Lan Tianyu 2011-07-26 08:51:09 UTC
Created attachment 66682
DSDT.hex

Please override the DSDT with the attached one:
http://www.lesswatts.org/projects/acpi/overridingDSDT.php
Run "echo 1 > /sys/module/acpi/parameters/aml_debug_output".
Run sensors with the battery inserted and with it unplugged.
Attach the output of dmesg.
Comment 8 Zhang Rui 2012-01-18 03:24:27 UTC
It's great that the kernel bugzilla is back.

Can you please verify if the problem still exists in the latest upstream kernel?
Comment 9 Simon Gebler 2012-01-18 17:26:26 UTC
It certainly still exists in Debian's current testing release (3.1.x).
I'm just compiling a vanilla 3.2.1 with the provided custom DSDT.
Hang on.
Comment 10 Simon Gebler 2012-01-18 20:13:23 UTC
I compiled the kernel with the attached DSDT.

With the attached DSDT I no longer get all temperatures, only those from the coretemp module. The section
------
acpitz-virtual-0
Adapter: Virtual device
temp1:         +0.0°C  (crit = +100.0°C)
temp2:         +0.0°C  (crit = +100.0°C)
temp3:         +0.0°C  (crit = +100.0°C)
------
no longer appears. The GNOME sensors applet still seems to detect it, but gives an error when reading it (via libsensors).
Note: the above output was taken without the DSDT override, while the bug described here was occurring.
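Because the failure shows up as every acpitz reading stuck at +0.0°C, saved "sensors" output can be checked for it mechanically. The following is a minimal sketch. It assumes the lm-sensors text layout shown above: "tempN:  +X°C  (crit = ...)" lines, with the adapter block ended by a blank line. The function names are hypothetical, and other lm-sensors versions may format lines differently.

```python
import re

# Matches lines such as "temp1:         +0.0°C  (crit = +100.0°C)".
# Assumes the lm-sensors text layout shown in this report.
TEMP_LINE = re.compile(r"^(temp\d+):\s+([+-]?\d+(?:\.\d+)?)°C")


def parse_acpitz(output: str) -> dict:
    """Return {label: degrees C} for the acpitz-virtual-* block only."""
    readings = {}
    in_block = False
    for line in output.splitlines():
        if line.startswith("acpitz-virtual-"):
            in_block = True
            continue
        if in_block:
            if not line.strip():
                break  # a blank line ends the adapter block
            m = TEMP_LINE.match(line.strip())
            if m:
                readings[m.group(1)] = float(m.group(2))
    return readings


def acpitz_all_zero(output: str) -> bool:
    """True if an acpitz block exists and every reading is exactly 0.0 C."""
    readings = parse_acpitz(output)
    return bool(readings) and all(v == 0.0 for v in readings.values())


sample = """acpitz-virtual-0
Adapter: Virtual device
temp1:         +0.0°C  (crit = +100.0°C)
temp2:         +0.0°C  (crit = +100.0°C)
temp3:         +0.0°C  (crit = +100.0°C)
"""
print(acpitz_all_zero(sample))  # True
```

If the output contains no acpitz block at all, the function returns False, not True. That case is the one reported later in this thread, where acpitz-virtual-0 disappears from the output entirely.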

Besides, GNOME no longer offers me the option to suspend.
When I try to force a suspend using s2ram --force, it returns
"s2ram_do: No such device"

Was the DSDT meant for me at all, or was it meant for Alessandro's ACPI dump?
I will attach mine, just in case.
Comment 11 Simon Gebler 2012-01-18 20:15:26 UTC
Created attachment 72114
Another ACPI-Dump of an affected machine

In case the attached DSDT wasn't meant for my machine (and is therefore causing the bugs/errors) and mine is needed.
Comment 12 Simon Gebler 2012-01-20 10:51:15 UTC
One more thing to add:
Sensors and suspend are working without that custom DSDT.
But the bug still exists in the current 3.2.1
Comment 13 shinydoofy 2012-02-15 02:44:25 UTC
I am also still affected by this on my Dell Studio 1537.
acpitz-virtual-0 reports 0°C three times upon resuming. Plugging in the battery fixes it and both acpitz-virtual-0 and coretemp-isa-0000 report ~30 degrees Celsius.

If you need any more system data or things to test, please ping.
Comment 14 Lan Tianyu 2013-04-09 06:25:47 UTC
Sorry for the late response. Does this bug still exist in the latest upstream kernel?
Comment 15 Simon Gebler 2013-04-10 02:32:18 UTC
I can still reproduce this bug with the latest Arch Linux kernel (v3.8.6).
Given how persistent it is, I doubt it is fixed in the latest stable vanilla kernel. I didn't try the current release candidate, though.
Comment 16 Lan Tianyu 2013-07-01 03:14:17 UTC
Hi:
    Sorry for the late response. I checked the ACPI tables. All temperatures come from the EC (Embedded Controller), and temperature sensors are normally connected to the EC. Linux doesn't do anything with the EC during system suspend and resume, so this may be a BIOS issue. Does this happen on Windows?
Comment 17 Lan Tianyu 2013-07-22 01:53:34 UTC
ping...
Comment 18 Lan Tianyu 2013-07-29 02:52:09 UTC
ping...
Comment 19 shinydoofy 2013-07-29 06:29:13 UTC
Sorry for the delay. Unfortunately, I can't test this right now as I don't have a Windows installation any more. I'll try and find the Vista DVD it came with to give it a quick reinstall. As I don't have a battery anymore for my Studio 1535 and solely rely on the wall plug, it may prove difficult for me to successfully test patches and the like on a before-after basis - but oh well.
Comment 20 Simon Gebler 2013-07-29 10:00:39 UTC
Oops, sorry, didn't notice the first ping.
I'll test it later once I get to reboot into Windows, but I doubt it happens there.
It is still strangely similar to bug 14667, but I'm not sure what was done over there to fix it. Maybe a BIOS bug was worked around? I don't know.
Anyway, I will check Windows' behaviour later, once I am able to reboot.
Comment 21 Simon Gebler 2013-08-01 14:32:58 UTC
Okay, I finally got around to starting Windows, pulled the battery, selected sleep (suspend to RAM), let it wake up and tested whether the fan still works.
It did, when I put some load onto the CPU and GPU after waking up.
I didn't read the temperatures, but I highly doubt they wouldn't work when the fan works afterwards (unlike on Linux).

Also, I can still confirm this issue on Kernel 3.10
Comment 22 Lan Tianyu 2013-08-05 03:18:13 UTC
(In reply to Simon Gebler from comment #21)
> Okay; finally got around to start Windows, pulled the battery, selected
> sleep (suspend 2 ram), let it wake up and tested if the fan still works.
> In fact it did when putting some load onto the cpu and gpu after waking up.
> Didn't read temperatures, but I highly doubt those wouldn't work, when the
> fan works afterwards (unlike on linux)
> 

Could you check whether the temperature is correct on Windows after resume?
After all, Linux relies on the temperature to decide the fan state, so if the temperature doesn't work, the fan is affected as well.

> Also, I can still confirm this issue on Kernel 3.10
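Lan Tianyu's point is that a broken temperature reading also affects the fan. When a fan is driven through ACPI active trip points (_AC0, _AC1, ...), it is switched on only once the zone temperature reaches a trip point, so a reading stuck at 0 never triggers it. The model below is a simplified sketch, not the kernel's thermal governor: it ignores hysteresis and polling, and the trip values are made up for illustration.

```python
from typing import Sequence


def fan_level(temp_mc: int, active_trips_mc: Sequence[int]) -> int:
    """Number of active trip points reached; 0 means the fan stays off.

    Temperatures are in millidegrees Celsius, the unit used by
    /sys/class/thermal. Hysteresis and polling are ignored.
    """
    return sum(1 for trip in active_trips_mc if temp_mc >= trip)


# Hypothetical trip points: _AC0 = 70 C (full speed), _AC1 = 55 C (low speed).
trips = [70000, 55000]
print(fan_level(80000, trips))  # 2: both trips reached
print(fan_level(0, trips))      # 0: a reading stuck at 0 never starts the fan
```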
Comment 23 Simon Gebler 2013-08-05 09:34:23 UTC
> Could you check whether the temperature is correct on Winodws after resume?
> At last, Linux depends on the temperature to decide the fan status. So if
> temperature doesn't work and it also will affect fan.

Yes, the temperatures still work afterwards.
And is it really Linux that actively controls the fan, or is it some ACPI mechanism that stops working, like the temperatures do?
In any case, it does not affect the notebook(s) on Windows.
Comment 24 Lan Tianyu 2013-08-12 05:59:08 UTC
Created attachment 107182
debug.patch

Hi, could you try this patch?
Comment 25 shinydoofy 2013-08-17 17:22:55 UTC
(In reply to Lan Tianyu from comment #24)
> Created attachment 107182
> debug.patch
> 
> Hi, could you try this patch?
Sure!

I diffed /var/log/messages on tag v3.10 with and without the patch. With it, I get these messages on resume that don't appear with the unmodified kernel:
ACPI Error: No handler for Region [ECRM] (ffff88013b027048) [EmbeddedControl] (20130328/evregion-161)
ACPI Error: Region EmbeddedControl (ID=3) has no handler (20130328/exfldio-305)
ACPI Error: Method parse/execution failed [\_SB_.LID0._PSW] (Node ffff88013b035028), AE_NOT_EXIST (20130328/psparse-537)
ACPI: _PSW execution failed

Hope that helps.
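When comparing logs like these across kernels, a small helper can pull out the ACPI Error lines and the AML methods that failed. This is a minimal sketch that assumes the ACPICA message format shown above.

- Kernel addresses such as ffff88013b027048 change between boots, so they are masked before lines are compared.
- Anything before "ACPI Error:", such as syslog timestamps, is ignored.
- The function names are hypothetical.

```python
import re
from typing import List

# Kernel virtual addresses differ between boots; mask them before comparing.
ADDR = re.compile(r"\b[0-9a-f]{16}\b")
FAILED_METHOD = re.compile(r"Method parse/execution failed \[([^\]]+)\]")


def acpi_error_lines(log_text: str) -> List[str]:
    """Every line carrying an ACPICA 'ACPI Error:' message."""
    return [line for line in log_text.splitlines() if "ACPI Error:" in line]


def failed_methods(log_text: str) -> List[str]:
    """AML paths of methods whose evaluation failed (the text inside [...])."""
    return FAILED_METHOD.findall(log_text)


def _key(line: str) -> str:
    """Message text from 'ACPI Error:' onward, with addresses masked."""
    return ADDR.sub("<addr>", line[line.index("ACPI Error:"):])


def new_errors(before: str, after: str) -> List[str]:
    """ACPI error lines in 'after' that have no counterpart in 'before'."""
    seen = {_key(line) for line in acpi_error_lines(before)}
    return [line for line in acpi_error_lines(after) if _key(line) not in seen]
```

For example, failed_methods() on the log lines above returns the single path \_SB_.LID0._PSW.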
Comment 26 Zhang Rui 2013-10-14 10:22:30 UTC
ping Tianyu.
Comment 27 Zhang Rui 2014-06-03 02:05:06 UTC
ping Tianyu.
Comment 28 Lan Tianyu 2014-07-04 06:00:30 UTC
Sorry for the late response. Could you check whether the issue still occurs with the latest upstream kernel, v3.16?
Comment 29 Simon Gebler 2014-07-10 16:49:57 UTC
Current standard Arch Linux kernel:
$ uname -a
Linux Mr-Radar 3.15.4-1-ARCH #1 SMP PREEMPT Mon Jul 7 07:42:54 CEST 2014 x86_64 GNU/Linux

After suspend and resume, the temperatures finally seem to be working.
However, when I stress-tested the CPU a little, the fan still didn't seem to turn on, even at 80°C, where it ran properly before suspend and resume. Inserting the battery made it kick back to life immediately.

Freshly compiled Linux mainline (3.16 rc4):
$ uname -a
Linux Mr-Radar 3.16.0-1-mainline #1 SMP PREEMPT Thu Jul 10 17:57:35 CEST 2014 x86_64 GNU/Linux
Temperatures work. The fan does not; it jumps into overdrive when I reinsert the battery at 80°C core temperature.

So it has been partially fixed.
Temperature reporting works, but the fan still does not work after suspending to RAM and waking back up without the battery.
Comment 30 Lan Tianyu 2014-07-21 03:19:29 UTC
This seems to be a thermal bug. Could you provide the output of the following commands when the bug takes place?

grep . /sys/class/thermal/cooling_device*/*
grep . /sys/class/thermal/thermal_zone*/*
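Python can be more convenient than a shell here, for example to save two dumps and diff them later. A rough equivalent of these two grep commands is sketched below. It assumes the standard sysfs layout under /sys/class/thermal. Attributes that cannot be read are skipped, much as grep reports an error for them and continues. The root directory is a parameter only so the function can be pointed at a copy of the tree.

```python
import glob
import os
from typing import List, Sequence


def grep_dot(root: str = "/sys/class/thermal",
             patterns: Sequence[str] = ("cooling_device*/*", "thermal_zone*/*")) -> List[str]:
    """Rough equivalent of: grep . <root>/cooling_device*/* <root>/thermal_zone*/*"""
    lines: List[str] = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            if not os.path.isfile(path):
                continue  # directories and symlinked subdirs such as "device"
            try:
                with open(path) as fh:
                    text = fh.read()
            except (OSError, UnicodeDecodeError):
                continue  # write-only or unreadable attributes
            # Like grep, emit one "path:line" entry per non-empty line.
            lines.extend(f"{path}:{line}" for line in text.splitlines() if line)
    return lines
```

Running print("\n".join(grep_dot())) as root should produce output in the same "path:value" shape as the grep commands above.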
Comment 31 Zhang Rui 2014-07-21 03:46:12 UTC
Please try the patches at
https://bugzilla.kernel.org/show_bug.cgi?id=78201#c10
https://bugzilla.kernel.org/show_bug.cgi?id=78201#c20
and see if the problem still exists.
Comment 32 Simon Gebler 2014-07-21 11:11:39 UTC
Created attachment 143631
Commands from comment 30
Comment 33 Simon Gebler 2014-07-21 11:19:20 UTC
Looking at the output, I checked my "sensors" command again.
I think I missed that it doesn't output any ACPI temperatures at all.
And according to my previous attachment, they're still zero.
So I completely overlooked the missing temperatures and looked at the CPU core and radeon temperatures instead.
I can't see "acpitz-virtual-0" even after running sensors-detect, which is rather odd, but I suppose every part of the bug is still valid and not fixed.
I will try those patches, though, and report back.
Comment 34 Simon Gebler 2014-07-22 13:03:01 UTC
I tried the patch; no change.
Comment 35 Lan Tianyu 2014-08-25 08:01:56 UTC
Created attachment 147961
dsdt.hex

Could you try this dsdt.hex?
Follow the instructions at this link:
https://01.org/linux-acpi/documentation/overriding-dsdt?langredirect=1

Put it where the kernel build can include it:
$ cp DSDT.hex $SRC/include/
Add this to the kernel .config:
CONFIG_STANDALONE=n
CONFIG_ACPI_CUSTOM_DSDT=y
CONFIG_ACPI_CUSTOM_DSDT_FILE="DSDT.hex"
Make the kernel and off you go!
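Because "make oldconfig" and similar targets rewrite .config, it is worth confirming afterwards that these three options survived. The checker below is a sketch that assumes exactly the option names and values listed above. Kconfig normally records a disabled option as "# CONFIG_STANDALONE is not set" rather than "CONFIG_STANDALONE=n", so both forms are parsed as n.

```python
from typing import Dict

# Options from the instructions above, with values as written in a .config file.
REQUIRED = {
    "CONFIG_STANDALONE": "n",
    "CONFIG_ACPI_CUSTOM_DSDT": "y",
    "CONFIG_ACPI_CUSTOM_DSDT_FILE": '"DSDT.hex"',
}


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse .config text into {option: value}; '# X is not set' becomes 'n'."""
    opts: Dict[str, str] = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def mismatches(text: str) -> Dict[str, str]:
    """Required options whose actual value differs; '<unset>' if absent."""
    opts = parse_kconfig(text)
    return {key: opts.get(key, "<unset>")
            for key, want in REQUIRED.items() if opts.get(key) != want}
```

An empty result from mismatches(open(".config").read()) means all three options are present with the expected values.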
Comment 36 Simon Gebler 2014-08-25 23:20:58 UTC
I see no change in this issue with the other DSDT.
I compiled a kernel with the given options and the linked DSDT, but nothing changed.
Temperatures are 0 after resuming, and there is no fan.
It still works as before with the battery.
Aside from a confirmation of the override at boot...

Aug 25 16:24:35 Mr-Radar kernel: ACPI: Override [DSDT-CANTIGA ], this is unsafe: tainting kernel
Aug 25 16:24:35 Mr-Radar kernel: Disabling lock debugging due to kernel taint
Aug 25 16:24:35 Mr-Radar kernel: ACPI: DSDT 0x00000000BDFE8000 Logical table override, new table: 0xFFFFFFFF8188F380
Aug 25 16:24:35 Mr-Radar kernel: ACPI: DSDT 0xFFFFFFFF8188F380 007268 (v02 Intel  CANTIGA  06040000 INTL 20140724)

...I can't see any additional log output about the problem when suspending and resuming.
Or is there anything else I should look for?
Comment 37 Zhang Rui 2014-12-02 05:33:08 UTC
For the fan issue: there is no ACPI fan available on this laptop, which means the fan is not controlled via ACPI.
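This can be checked directly on an affected machine. In mainline kernels the ACPI fan driver registers its cooling devices with the type string "Fan". If no /sys/class/thermal/cooling_device*/type reads "Fan", no fan is being driven through ACPI. The following is a minimal sketch; the sysfs root is a parameter only so it can be tested against a copy of the tree.

```python
import glob
import os
from typing import Dict


def cooling_device_types(root: str = "/sys/class/thermal") -> Dict[str, str]:
    """Map cooling_deviceN -> contents of its 'type' attribute."""
    types: Dict[str, str] = {}
    for path in sorted(glob.glob(os.path.join(root, "cooling_device*", "type"))):
        with open(path) as fh:
            types[os.path.basename(os.path.dirname(path))] = fh.read().strip()
    return types


def has_acpi_fan(root: str = "/sys/class/thermal") -> bool:
    """True if any cooling device reports the type used by the ACPI fan driver."""
    return "Fan" in cooling_device_types(root).values()
```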

For the temperature sensor issue, please attach the output of
"grep . /sys/class/thermal/thermal_zone*/*"
1. after resuming, with battery.
2. after resuming, without battery.
3. before suspending, with battery.
4. before suspending, without battery.
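Once the four outputs are saved, the zones that drop to 0 can be picked out mechanically. This sketch makes three assumptions:

- Each case was saved as the plain text printed by the grep command above ("path:value" lines).
- Temperature values are in millidegrees Celsius, as thermal_zone*/temp reports them.
- The case labels are arbitrary.

```python
from typing import Dict, List


def zone_temps(grep_output: str) -> Dict[str, float]:
    """Map thermal_zoneN -> degrees C, using only the '.../temp' lines."""
    temps: Dict[str, float] = {}
    for line in grep_output.splitlines():
        path, sep, value = line.partition(":")
        if not sep or not path.endswith("/temp"):
            continue  # skip type, policy, trip_point_* and similar attributes
        zone = path.rsplit("/", 2)[-2]
        temps[zone] = int(value) / 1000.0  # sysfs reports millidegrees
    return temps


def zero_zones(cases: Dict[str, str]) -> Dict[str, List[str]]:
    """For each saved case, list the zones that read exactly 0 C."""
    return {label: sorted(z for z, t in zone_temps(text).items() if t == 0.0)
            for label, text in cases.items()}
```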
Comment 38 Zhang Rui 2014-12-02 05:34:25 UTC
BTW, please check whether there are any BIOS options related to thermal control.
Please also try upgrading to the latest BIOS, if available.
Comment 39 Tobias Jakobi 2014-12-15 23:02:02 UTC
I'm seeing a similar issue here on a Toshiba Satellite P50-B-108. If I resume the system without the AC adapter plugged in, then the fan stops working and the machine quickly overheats, leading to thermal throttling and a very hot case (to the point where you can't touch it anymore).

Sensors still work here though, it's just the fan control that 'disappears'.

@Zhang: Should I open a separate bug for this, or should I post further information in this bug?
Comment 40 Simon Gebler 2014-12-15 23:14:49 UTC
Created attachment 160731
Output for some cases

This is the output (there may be normal temperature differences) for:
before suspend, with and without battery;
after suspend, with battery.
Comment 41 Simon Gebler 2014-12-15 23:15:34 UTC
Created attachment 160741
Last case output

And this is the output for the remaining case:
after suspend, without battery.
Comment 42 Simon Gebler 2014-12-15 23:21:34 UTC
There are no thermal-control options in the BIOS, and it is already the latest version available.
I'm not sure how the fan is controlled; it might be some Dell-specific mechanism. The fan is shared between the GPU and CPU and might be driven by code in the device's own firmware, for all I know.
Either way, I figured it is probably related, especially given the other, fixed bug mentioned in my initial comment.
Maybe the fan will work again once the temperatures work properly after resuming from a batteryless suspend.

Anyway, the only case that really differs in that command's output is after resuming without the battery: all the temperatures are 0.
And as mentioned before, this is fixed shortly after reinserting the battery.
Comment 43 Zhang Rui 2015-02-14 04:04:29 UTC
Created attachment 166731
customized DSDT to see why _TMP returns 0

First, please check whether the problem still exists in the latest upstream kernel.
If it does, please apply the attached customized DSDT and, after booting, run "echo 1 > /sys/module/acpi/parameters/aml_debug_output". Then attach the dmesg output after running "grep . /sys/class/thermal/thermal*/temp", both before and after resume, and with and without the battery inserted.
Comment 44 Zhang Rui 2015-02-14 04:05:02 UTC
Created attachment 166741
customized DSDT to see why _TMP returns 0
Comment 45 Zhang Rui 2015-02-14 04:05:43 UTC
Created attachment 166751
customized DSDT to see why _TMP returns 0
Comment 46 Zhang Rui 2015-02-14 04:06:53 UTC
note: please test using the dsdt attached in comment #45
Comment 47 Zhang Rui 2015-03-02 02:00:38 UTC
ping...
Comment 48 Simon Gebler 2015-03-02 11:57:12 UTC
Sorry, life has been very busy lately. I'm currently building a kernel with that DSDT and will test once I'm not in the middle of anything productive and am able to reboot.
Comment 49 Simon Gebler 2015-03-03 12:06:29 UTC
Created attachment 168711
Debug output as per comment #43

There we go.
I did one test with the battery inserted and one with it removed; additionally, I added the values I get after 'fixing' the problem by reinserting the battery.
Comment 50 Simon Gebler 2015-03-03 12:13:52 UTC
Created attachment 168721
Dmesg output during the resume

Just in case it is of interest: some output generated during a resume, before I typed anything new in the terminal.
Comment 51 Simon Gebler 2015-03-28 14:23:38 UTC
The bug still occurs with the current 3.19.2 kernel.
But now I can see fans as well!
I'm attaching the output of "grep . /sys/class/thermal/cooling_device*/*" separately.
However, it always seems to be the same, no matter what.
The output of /sys/class/thermal/thermal*/temp is unchanged.

In other, more or less related news: the i8k module now gives me more sensors output, showing 4 temperature zones and 2 fans (although the notebook has only one).
Its output seems to be equally affected by this problem, showing 0 RPM and 0°C while the bug is active. (Also, querying sensors at any time with that module loaded freezes the notebook for a moment and may leave Ctrl 'held' afterwards, but that is mostly unrelated here, I suppose.)

I'm just wondering why sensors-detect doesn't detect the ACPI temperatures, so that they would show up in sensors. The ACPI temperatures are visible in the sensors plugin for the XFCE panel.
Comment 52 Simon Gebler 2015-03-28 14:24:29 UTC
Created attachment 172541
Output, grep for cooling_device*/*
Comment 53 Zhang Rui 2015-03-29 13:07:00 UTC
[root@Mr-Radar sige]# grep . /sys/class/thermal/thermal*/temp
/sys/class/thermal/thermal_zone0/temp:0
/sys/class/thermal/thermal_zone1/temp:0
/sys/class/thermal/thermal_zone2/temp:0

[  463.628391] [ACPI Debug 34131653]  String [0x07] "In _TMP"
[  463.628429] [ACPI Debug 34131699]  Integer 0x0000000000000001
[  463.628440] [ACPI Debug 34131710]  Integer 0x0000000000000000
[  463.629067] [ACPI Debug 34132335]  Integer 0x0000000000000000
[  463.630099] [ACPI Debug 34133367]  String [0x07] "In _TMP"
[  463.630118] [ACPI Debug 34133388]  Integer 0x0000000000000001
[  463.630127] [ACPI Debug 34133397]  Integer 0x0000000000000000
[  463.635051] [ACPI Debug 34138316]  Integer 0x0000000000000000
[  463.636052] [ACPI Debug 34139318]  String [0x07] "In _TMP"
[  463.636071] [ACPI Debug 34139341]  Integer 0x0000000000000001
[  463.636078] [ACPI Debug 34139348]  Integer 0x0000000000000000
[  463.638229] [ACPI Debug 34141492]  Integer 0x0000000000000000

We can see that, when the bug occurs, some EC operation region fields return 0 instead of a meaningful value, including
\_SB.PCI0.LPCB.EC0.CPUT
\_SB.PCI0.LPCB.EC0.SYST
\_SB.PCI0.LPCB.EC0.VGAT
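The debug lines above can be grouped per _TMP evaluation to see which reads came back as 0. This is a sketch that assumes the "[ACPI Debug N]  String/Integer" format shown. Which Integer corresponds to which EC field (CPUT, SYST, VGAT) depends on the Debug stores added to the customized DSDT, so the values are left undecoded here.

```python
import re
from typing import List

# Matches e.g. '[ACPI Debug 34131653]  String [0x07] "In _TMP"'
# and '[ACPI Debug 34131699]  Integer 0x0000000000000001'.
DEBUG = re.compile(
    r'\[ACPI Debug\s+\d+\]\s+'
    r'(?:String\s+\[0x[0-9A-Fa-f]+\]\s+"([^"]*)"|Integer\s+0x([0-9A-Fa-f]+))'
)


def tmp_calls(dmesg: str, marker: str = "In _TMP") -> List[List[int]]:
    """Group Integer debug values by the _TMP call that printed them."""
    calls: List[List[int]] = []
    for line in dmesg.splitlines():
        m = DEBUG.search(line)
        if not m:
            continue
        text, hexval = m.groups()
        if text is not None:
            if text == marker:
                calls.append([])  # a new _TMP evaluation starts here
        elif calls:
            calls[-1].append(int(hexval, 16))
    return calls
```

Given the twelve debug lines above, it returns [[1, 0, 0], [1, 0, 0], [1, 0, 0]].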

Thus, IMO, this is a BIOS/Firmware problem.
Anyway, to double-check: Lv, can you please ask Simon to run some tests to make sure the EC is working properly when the problem occurs?
Comment 54 Zhang Rui 2015-06-26 00:57:45 UTC
I checked with Lv: the EC driver has no impact on the EC operation region field values.
So I'm closing this, as it is a BIOS bug.
Please check whether a BIOS upgrade is available, and feel free to reopen this if it can be proven not to be a BIOS bug, e.g. if the problem cannot be reproduced on Windows.
Comment 55 Lv Zheng 2015-06-26 02:17:35 UTC
Yes, EC transactions are serialized and should have nothing to do with the return value of the RD_EC command.
All known EC state machine issues after 3.19 are related to the QR_EC command, not the RD_EC command.
So this is not an EC related issue.
We can confirm this once 4.2-rcX is released.
I'll ping here when the fixes are ready in the upstream.
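This explains why the EC driver cannot be the source of a 0 here: in an RD_EC transaction the host only relays the byte that the EC firmware returns. The toy model below follows the handshake defined in the ACPI specification for the EC interface:

1. Write command 0x80 to the EC_SC port.
2. Write the register address to EC_DATA.
3. Wait for OBF.
4. Read the byte.

QR_EC (0x84) returns a pending query number instead. MockEC, its instant responses and the register offsets are hypothetical. This is not how the Linux EC driver is structured.

```python
from collections import deque
from typing import Dict, Iterable

# Command bytes and status bits defined by the ACPI spec for the EC interface.
RD_EC, QR_EC = 0x80, 0x84
OBF, IBF = 0x01, 0x02  # output buffer full, input buffer full


class MockEC:
    """Hypothetical EC that answers instantly; real hardware is asynchronous."""

    def __init__(self, ram: Dict[int, int], queries: Iterable[int] = ()):
        self.ram = dict(ram)  # EC register space, owned by the firmware
        self.queries = deque(queries)  # pending _Qxx event numbers
        self.status = 0
        self._out = 0
        self._cmd = None

    def write_cmd(self, cmd: int) -> None:
        """Host write to EC_SC (command/status port)."""
        self._cmd = cmd
        if cmd == QR_EC:
            # A query value of 0 means "no event pending".
            self._load(self.queries.popleft() if self.queries else 0)

    def write_data(self, byte: int) -> None:
        """Host write to EC_DATA; for RD_EC this is the register address."""
        if self._cmd == RD_EC:
            self._load(self.ram.get(byte, 0))
        self._cmd = None

    def read_data(self) -> int:
        """Host read from EC_DATA; reading clears OBF."""
        self.status &= ~OBF
        return self._out

    def _load(self, value: int) -> None:
        self._out = value
        self.status |= OBF


def wait(ec: MockEC, mask: int, want: int, tries: int = 1000) -> None:
    """Poll the status register until (status & mask) == want."""
    for _ in range(tries):
        if (ec.status & mask) == want:
            return
    raise TimeoutError("EC did not respond")


def rd_ec(ec: MockEC, addr: int) -> int:
    wait(ec, IBF, 0)
    ec.write_cmd(RD_EC)
    wait(ec, IBF, 0)
    ec.write_data(addr)
    wait(ec, OBF, OBF)
    return ec.read_data()  # whatever the firmware stored, including a stuck 0


def qr_ec(ec: MockEC) -> int:
    wait(ec, IBF, 0)
    ec.write_cmd(QR_EC)
    wait(ec, OBF, OBF)
    return ec.read_data()
```

In this model, rd_ec() on a register the firmware left at 0 returns 0 no matter how correct the handshake is. That is the sense in which the EC driver cannot "fix" the value.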

The problem might be a thermal driver issue or an ACPICA core issue.
Let's also confirm this once some tracing facilities have been upstreamed.

Thanks
-Lv
