Bug 42796 - thermal temperature stops updating after hibernate - HP Presario A900 Notebook PC/30ED
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-18 13:48 UTC by Philip Ashmore
Modified: 2013-08-26 20:29 UTC
CC List: 5 users

See Also:
Kernel Version: 3.2.0-1-amd64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
output of "acpidump" before hibernate (159.91 KB, text/plain)
2012-02-18 13:48 UTC, Philip Ashmore
output of "grep . /sys/class/thermal/*/*" before hibernate (725 bytes, text/plain)
2012-02-18 13:49 UTC, Philip Ashmore
output of "acpidump" after hibernate (159.91 KB, text/plain)
2012-02-18 13:49 UTC, Philip Ashmore
output of "grep . /sys/class/thermal/*/*" after hibernate (725 bytes, text/plain)
2012-02-18 13:50 UTC, Philip Ashmore
output of "dmesg" after hibernate (56.98 KB, text/plain)
2012-02-18 13:50 UTC, Philip Ashmore
A picture of Trinity/konsole from my desktop after hibernate (129.80 KB, image/png)
2012-03-06 07:22 UTC, Philip Ashmore
/var/log/apt/term.log (343.03 KB, application/octet-stream)
2012-03-31 17:21 UTC, Philip Ashmore
/var/log/dpkg.log (671.05 KB, application/octet-stream)
2012-03-31 17:22 UTC, Philip Ashmore
customized DSDT (301.19 KB, application/octet-stream)
2012-11-28 10:27 UTC, Zhang Rui

Description Philip Ashmore 2012-02-18 13:48:02 UTC
Created attachment 72437 [details]
output of "acpidump" before hibernate

This started with
Presario A975 EM: fan runs at a constant (low) speed after hibernate, until it starts to overheat
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=640293

The thermal information from before the hibernate is used but it's not the actual temperature.
Rebooting shows the actual elevated temperature resulting from running both cores at 100%.
Comment 1 Philip Ashmore 2012-02-18 13:49:08 UTC
Created attachment 72438 [details]
output of "grep .  /sys/class/thermal/*/*" before hibernate
Comment 2 Philip Ashmore 2012-02-18 13:49:41 UTC
Created attachment 72439 [details]
output of "acpidump" after hibernate
Comment 3 Philip Ashmore 2012-02-18 13:50:03 UTC
Created attachment 72440 [details]
output of "grep .  /sys/class/thermal/*/*" after hibernate
Comment 4 Philip Ashmore 2012-02-18 13:50:29 UTC
Created attachment 72441 [details]
output of "dmesg" after hibernate
Comment 5 Jonathan Nieder 2012-02-18 16:24:37 UTC
(In reply to comment #0)

> This started with
> Presario A975 EM: fan runs at a constant (low) speed after hibernate, until
> it
> starts to overheat
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=640293
>
> The thermal information from before the hibernate is used but it's not the
> actual temperature.
> Rebooting shows the actual elevated temperature resulting from running both
> cores at 100%.

Can you spell this out for me?  What steps should I run to reproduce this, what should I expect to happen, and what will actually happen instead?

The above description suggests that you are saying the temperature information after hibernation is wrong because the temperature can change while the computer is off.  But that doesn't make sense, since the sensed temperature is not a static value and is supposed to be always changing (in particular changing after the resume).
Comment 6 Jonathan Nieder 2012-02-18 16:49:23 UTC
Grasping at straws: could you try appending acpi_sleep=nonvs to the kernel command line? It seems to have helped recently on some Asus and Sony systems.
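
When testing a boot option like this, it can help to confirm that the running kernel actually received it. A minimal sketch, assuming the command line is readable from /proc/cmdline; the function name has_cmdline_param and its optional file argument are made up for illustration:

```shell
# has_cmdline_param PARAM [FILE]
# Succeeds if PARAM appears as a complete, exact parameter on the kernel
# command line. FILE defaults to /proc/cmdline; the optional argument is
# only there so the function can be tried against a sample file.
has_cmdline_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each space-separated parameter on its own line, then require an
    # exact, fixed-string match of the whole line.
    tr -s ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example (after rebooting with the option added):
#   has_cmdline_param acpi_sleep=nonvs && echo "option is active"
```

The exact-match test means "acpi_sleep" alone does not match "acpi_sleep=nonvs", which avoids false positives from partial names.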
Comment 7 Philip Ashmore 2012-02-18 21:50:46 UTC
Steps to reproduce:
1. power on pc.
2. hibernate.
3. power on pc (this resumes from hibernate)
4. run some 100% CPU intensive tasks

What should have happened:
the thermal information updated to indicate that the CPUs were getting hot, resulting in the fan speeding up.

What actually happened:
the thermal information reported was the temperature before hibernation.
Running some 100% CPU tasks goes unnoticed until a kind of emergency alarm kicks in and the fan goes straight to its top speed - I haven't waited for this recently as it seems a bad idea to rely on.

A reboot restores the CPU thermal information and shows the CPUs are still a bit hot but the fan behaves normally to cool them down and the temperature falls.
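
The "stuck reading" symptom described above could be checked with something like the following sketch. It assumes the zone of interest is /sys/class/thermal/thermal_zone0 and that its temp file reports millidegrees Celsius; the function names sample_temp and readings_are_stale are hypothetical helpers, not part of any existing tool:

```shell
# sample_temp [FILE]: print the current reading of one thermal zone.
# FILE defaults to thermal_zone0 (an assumption; the zone number may differ).
sample_temp() {
    cat "${1:-/sys/class/thermal/thermal_zone0/temp}"
}

# readings_are_stale: read one reading per line on stdin and succeed only
# if there are at least two readings and every one of them is identical.
readings_are_stale() {
    awk 'NR == 1 { first = $0 }
         $0 != first { changed = 1 }
         END { exit !(NR >= 2 && !changed) }'
}

# Example: after resuming and starting a CPU-heavy task, take six samples
# ten seconds apart and report if the value never moved.
#   for i in 1 2 3 4 5 6; do sample_temp; sleep 10; done |
#       readings_are_stale && echo "temperature did not change under load"
```

An idle machine can legitimately report a constant temperature, so the check only means something while the CPUs are loaded.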
Comment 8 Philip Ashmore 2012-02-18 22:04:44 UTC
acpi_sleep=nonvs makes no difference.
Comment 9 Zhang Rui 2012-02-20 06:04:40 UTC
Does this happen after suspend?
Comment 10 Philip Ashmore 2012-02-20 13:43:04 UTC
No.

Also I get this pop-up message box.

Sound server fatal error:
cpu overload, aborting
Comment 11 Len Brown 2012-03-06 02:42:23 UTC
please try changing /sys/power/disk
to "shutdown" instead of "platform" and report
if any difference after the hibernate/resume.
Comment 12 Philip Ashmore 2012-03-06 07:21:50 UTC
The resulting screen and font corruption prevented me from doing any useful
tests after restoring from hibernate.

I managed to get a snapshot of the screen - see attached.

Also, after reboot, the [platform] option was once again selected in /sys/power/disk,
if that helps.

Philip
Comment 13 Philip Ashmore 2012-03-06 07:22:49 UTC
Created attachment 72545 [details]
A picture of Trinity/konsole from my desktop after hibernate
Comment 14 Jonathan Nieder 2012-03-30 18:22:50 UTC
Grasping at straws again: does 3fa016a0b5c5 (drm/i915: suspend fbdev device around suspend/hibernate, 2012-03-28) help?
Comment 15 Philip Ashmore 2012-03-30 21:33:39 UTC
The good news is that that screen shot found its way to someone who knew what it meant - the screen+font corruption problem is fixed.

You'll have to dumb down what you mean by "3fa016a0b5c5 drm/i915: suspend fbdev device" - what package(s) names does that map to in Debian Wheezy/Sid?

If it helps, dpkg -l '*intel*' gets me
ii  libdrm-intel1:amd64  2.4.32-1  Userspace interface to intel-specific kernel DRM services -- runtime
ii  xserver-xorg-video-intel  2:2.18.0-1  X.Org X server -- Intel i8xx, i9xx display driver
Comment 16 Jonathan Nieder 2012-03-30 21:38:17 UTC
Sorry about that.  I meant to ask if applying the patch with that
name to the i915 kernel driver helps.  Instructions are at [1],
and if you have any questions, please don't hesitate to ask.

Thanks,
Jonathan

[1] http://bugs.debian.org/645547#26
Comment 17 Jonathan Nieder 2012-03-30 21:39:49 UTC
(In reply to comment #15)
> The good news is that that screen shot found its way to someone who knew what
> it meant - the screen+font corruption problem is fixed.

Was that in the kernel or in userspace?  Is there a relevant patch that distro people should consider backporting?
Comment 18 Philip Ashmore 2012-03-31 06:49:20 UTC
I don't know for sure that my screen shot was the trigger for the fix.
All I know is - it's fixed.
Hope it doesn't break again.

As for "Was that in the kernel or in userspace?" you'll have to dumb it down for me.

I started working on the patch issue mentioned in #16, but don't hold your breath - I'm fumbling in the dark.

I would have thought that Intel would be the people with the relevant knowledge and skill set to track this down.

It's sad that the PC vendors don't consider this an issue worthy of their time and resources even though Intel provide open source drivers - strange.
Comment 19 Jonathan Nieder 2012-03-31 07:04:34 UTC
(In reply to comment #18)
> I don't know for sure that my screen shot was the trigger for the fix.
> All I know is - it's fixed.
> Hope it doesn't break again.
>
> As for "Was that in the kernel or in userspace?" you'll have to dumb it down
> for me.

Sorry for the lack of clarity.  I meant to ask what components you upgraded in order to get the fix.  (/var/log/dpkg.log might help in figuring that out if you remember when the fix happened and when you rebooted.)

> I would have thought that Intel would be the people with the relevant
> knowledge
> and skill set to track this down.

Part of the beauty of free software is that work can be distributed more widely. ;-)

Keith Packard who reviewed the patch works for Intel.
Comment 20 Jonathan Nieder 2012-03-31 07:08:23 UTC
(In reply to comment #14)
> Grasping at straws again: does 3fa016a0b5c5 (drm/i915: suspend fbdev device
> around suspend/hibernate, 2012-03-28) help?

A simpler way to test this guess: if you boot with i915.modeset=0 on the kernel command line and boot in "recovery mode" (i.e., don't start X), do you still get fan control trouble after hibernating with "echo disk >/sys/power/state"?
Comment 21 Jonathan Nieder 2012-03-31 07:09:44 UTC
(In reply to comment #20)
> A simpler way to test this guess: if you boot with i915.modeset=0 on the
> kernel
> command line and boot in "recovery mode" (i.e., don't start X), do you still
> get fan control trouble after hibernating with "echo disk >/sys/power/state"?

For "fan control trouble" please read "incorrect temperature readings".  Sorry for the noise.
Comment 22 Jonathan Nieder 2012-03-31 07:11:49 UTC
I'd also suggest trying the test suggested by Len Brown in comment #11:

 echo shutdown >/sys/power/disk
 echo disk >/sys/power/state
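
A small sketch for confirming which mode is selected before hibernating. It assumes /sys/power/disk lists the available modes with the active one in square brackets (as the "[platform]" mentioned in comment #12 suggests); the function name current_disk_mode and its optional file argument are made up for illustration:

```shell
# current_disk_mode [FILE]: print the hibernation mode that is currently
# selected, i.e. the word shown in square brackets.
# FILE defaults to /sys/power/disk.
current_disk_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/disk}"
}

# Example (as root):
#   echo shutdown > /sys/power/disk
#   [ "$(current_disk_mode)" = "shutdown" ] && echo disk > /sys/power/state
```

A value written to /sys/power/disk is runtime state and does not survive a reboot, so it has to be set again before each test.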
Comment 23 Jonathan Nieder 2012-03-31 12:51:23 UTC
(In reply to comment #22)
>  echo shutdown >/sys/power/disk
>  echo disk >/sys/power/state

Ah, now that I look more carefully I see that you tried this but I didn't understand the result.

You wrote:

> The resulting screen and font corruption prevented me from doing any useful
> tests after restoring from hibernate.
>
> I managed to get a snapshot of the screen - see attached.
>
> Also, after reboot, the [platform] option was once again selected in
> /sys/power/disk,
> if that helps.

Do I understand correctly that the screen and font corruption only occurs in "shutdown" mode and not in "platform" mode?  Is the thermal information right or wrong in that state?  (It should be possible to check by writing sensor info to a file and then rebooting to read it.)
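
Writing the sensor info to a file could look roughly like this. It is only a sketch: the function name snapshot_thermal, the output path in the example, and the optional directory argument are assumptions made for illustration:

```shell
# snapshot_thermal OUTFILE [DIR]
# Append a UTC timestamp and the contents of every thermal attribute under
# DIR (default /sys/class/thermal) to OUTFILE, then flush to disk so the
# data survives a forced reboot.
snapshot_thermal() {
    out=$1
    dir=${2:-/sys/class/thermal}
    {
        date -u '+# %Y-%m-%d %H:%M:%S UTC'
        # -s hides "Is a directory" and permission errors; the extra
        # /dev/null argument makes grep always print the file name.
        grep -s . "$dir"/*/* /dev/null
    } >> "$out"
    sync
}

# Example: run blind after resuming, then reboot and read the file.
#   snapshot_thermal /root/thermal-after-resume.txt
```

Because the output is appended, the same file can hold a snapshot taken before hibernating and one taken after resuming.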
Comment 24 Philip Ashmore 2012-03-31 17:21:10 UTC
I've attached

   /var/log/dpkg.log

although it only goes as far back as 2012-03-22

   /var/log/dpkg.log.1

ends at 2012-02-08.

I've also attached

   /var/log/term.log

as it gives the previous versions.

I had a system freeze after (I think) some libdrm updates so I restored from backup and reverted to Squeeze for a few weeks and tried again - that could explain the gap.

I'd be a liar if I said the freeze definitely wasn't after trying hibernate again.

The font/screen corruption problems definitely happen(ed) with Squeeze.
Comment 25 Philip Ashmore 2012-03-31 17:21:59 UTC
Created attachment 72773 [details]
/var/log/apt/term.log
Comment 26 Philip Ashmore 2012-03-31 17:22:44 UTC
Created attachment 72774 [details]
/var/log/dpkg.log
Comment 27 Philip Ashmore 2012-03-31 17:50:19 UTC
Yeah I meant

   /var/log/apt/term.log

I ran the following script after booting in single user mode.
I did this by adding

   i915.modeset=0 single

to the kernel command line<<EOF
echo shutdown >/sys/power/disk
echo disk >/sys/power/state
pm-hibernate
EOF

The system hibernated ok but when I rebooted it finished reading the hibernate image and did a hard power off.

I booted up again with

   noresume

on the kernel command line to boot normally and add this comment.
Comment 28 Jonathan Nieder 2012-04-02 14:41:00 UTC
(In reply to comment #27)
> I did this by adding
>
>    i915.modeset=0 single
>
> to the kernel command line<<EOF
> echo shutdown >/sys/power/disk
> echo disk >/sys/power/state
> pm-hibernate
> EOF
>
> The system hibernated ok but when I rebooted it finished reading the
> hibernate
> image and did a hard power off.

Thanks for testing.  Am I correct in assuming the same thing happens when you try to hibernate in "platform" mode without modesetting enabled, too?

Regarding the package manager logs: I don't think anyone here is going to read them. The only one who has the context that would allow those logs to jog memories about such events as when each symptom appeared and when the machine was rebooted to actually use each kernel is you.
Comment 29 Philip Ashmore 2012-04-02 22:48:32 UTC
(In reply to comment #28)
> (In reply to comment #27)
> > I did this by adding
> >
> >    i915.modeset=0 single
> >
> > to the kernel command line<<EOF
> > echo shutdown >/sys/power/disk
> > echo disk >/sys/power/state
> > pm-hibernate
> > EOF
> >
> > The system hibernated ok but when I rebooted it finished reading the
> hibernate
> > image and did a hard power off.
> 
> Thanks for testing.  Am I correct in assuming the same thing happens when you
> try to hibernate in "platform" mode without modesetting enabled, too?
Yep.
> 
> Regarding the package manager logs: I don't think anyone here is going to
> read
> them. The only one who has the context that would allow those logs to jog
> memories about such events as when each symptom appeared and when the machine
> was rebooted to actually use each kernel is you.

Yeah, I wish I'd tracked it more closely. It appear(s/ed) to be more related to using swap space, something hibernate also does.
I was focusing on the thermal issue.
Once that's fixed I'll make sure to note when problems occur.
Comment 30 Zhang Rui 2012-11-28 10:27:17 UTC
Created attachment 87501 [details]
customized DSDT

please apply this customized DSDT.
and then
1. echo 1 > /sys/module/acpi/parameters/aml_debug_output
2. grep . /sys/class/thermal/*/*
3. hibernate
4. resume
5. grep . /sys/class/thermal/*/*
6. dmesg > dmesg.out
and attach the dmesg.out here.
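
If the two grep outputs are saved to files, the before/after comparison can be done mechanically. A rough sketch, assuming the files hold lines of the form "path/temp:value" as produced by the grep commands above; the function name unchanged_temps and the file names before.txt and after.txt are made up for illustration. Note that the module parameter lives under /sys/module (singular):

```shell
# unchanged_temps BEFORE AFTER
# Print every temperature line ("…/temp:value") that is byte-for-byte
# identical in both snapshot files, i.e. zones whose reading did not move.
unchanged_temps() {
    awk 'NR == FNR { if ($0 ~ /\/temp:/) seen[$0] = 1; next }
         $0 in seen' "$1" "$2"
}

# Example sequence (as root, with the customized DSDT already applied):
#   echo 1 > /sys/module/acpi/parameters/aml_debug_output
#   grep . /sys/class/thermal/*/* > before.txt
#   (hibernate, resume, load the CPUs for a few minutes)
#   grep . /sys/class/thermal/*/* > after.txt
#   unchanged_temps before.txt after.txt
#   dmesg > dmesg.out
```

Any zone printed by unchanged_temps after a period of heavy load is a candidate for the stale reading this bug describes.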
Comment 31 Zhang Rui 2012-12-11 01:44:37 UTC
ping...
Comment 32 Philip Ashmore 2012-12-11 16:39:26 UTC
Sorry for the delay - I gave the laptop to someone else when I got a new one.
I updated to 3.2.0-4 and lm-sensors (after a fresh Gnome install) and I'm getting readings from acpitz and both cores - they update just fine after hibernate - the fans are working.
Comment 33 Zhang Rui 2013-05-15 07:01:56 UTC
Bug closed.
please feel free to re-open it once you can reproduce the problem again.
Comment 34 Philip Ashmore 2013-05-15 15:05:50 UTC
The original problem was with 3.2.0-1-amd64, and since 3.2.0-4 doesn't show the problem I propose that the problem was with the original kernel.
Sadly no-one with the same make/model could confirm the bug with the old kernel, or that the new kernel fixed it, so it might be more accurate to change UNREPRODUCIBLE to CODE_FIX or whatever indicates that it was fixed in the later kernel version.
