Bug 22022

Summary: 2.6.32 regression: sometimes Suspend-To-RAM causes system hang - ATI RS480
Product: Drivers
Reporter: rolf (hubba)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: aaron.lu, alan, alexdeucher, hubba, jrnieder, lenb, max, power-management_other, rjw, rui.zhang
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.8.5
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 56331
Attachments: Output of dmesg
Output of lspci -nnvv
Output of dmesg after booting with acpi_sleep=nonvs
Photo of strange screen appearing sometimes and indicating old code
Output of dmesg after suspend to disk failure.
Output of lspci -nnvv after suspend to disk failure.

Description rolf 2010-11-04 07:21:24 UTC
On my notebook (running the current Debian Testing), doing a Suspend-To-RAM, resuming the system, and then doing a second Suspend-To-RAM causes the system to hang.
That means: the screen is turned off and the keyboard and mouse seem to be disabled, but the PC does not enter suspend mode. The power LED does not start blinking, the PC no longer reacts to any input, and it does not come up again when the power button is pressed.
By installing some older kernels I could determine that the problem does not exist in the 2.6.30-*-amd64 and 2.6.31-*-amd64 versions; it first appears in 2.6.32-1-amd64 and still exists in 2.6.36-trunk-amd64.
A bug report including detailed information about the hardware and software has already been filed in the Debian bug tracking system as bug 600846 and can be viewed at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=600846 .

Thanks and regards
Rolf
Comment 1 rolf 2010-11-04 14:39:26 UTC
Correction: The problem often appears at the second Suspend-To-RAM, but not always; in a new test a few minutes ago it appeared at the 4th Suspend-To-RAM in a row.

Attached are the outputs of dmesg and lspci -nnvv.
Comment 2 rolf 2010-11-04 14:42:20 UTC
Created attachment 36172
Output of dmesg
Comment 3 rolf 2010-11-04 14:43:12 UTC
Created attachment 36182
Output of lspci -nnvv
Comment 4 Rafael J. Wysocki 2010-11-16 22:18:44 UTC
Quite frankly, I don't see any way to debug it further other than bisecting
the changes between 2.6.31 and 2.6.32.
Comment 5 Rafael J. Wysocki 2011-01-16 22:42:50 UTC
Is the problem still present in 2.6.37?
Comment 6 rolf 2011-01-17 08:50:43 UTC
Yes, the problem is still present with kernel 2.6.37-1~experimental.1.
Comment 7 Rafael J. Wysocki 2011-01-17 22:20:26 UTC
It looks like a hardware-related issue to me, maybe ACPI.

Does booting with acpi_sleep=nonvs help?
Comment 8 rolf 2011-01-18 07:48:59 UTC
Created attachment 43962
Output of dmesg after booting with acpi_sleep=nonvs
Comment 9 rolf 2011-01-18 07:53:15 UTC
No, booting with acpi_sleep=nonvs did not help.

But as I am not a Linux professional, I am not sure whether I added the boot option correctly, so please check the attached dmesg output.
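One way to confirm that a boot option actually reached the running kernel is to look at /proc/cmdline (the same string is also logged near the top of dmesg on the "Kernel command line:" line). The helper below is a minimal sketch; the function name has_boot_param and the CMDLINE_FILE override (used only for dry runs) are hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the running kernel was booted with the
# given parameter. CMDLINE_FILE is an illustrative override for testing.
has_boot_param() {
    param=$1
    cmdline_file=${CMDLINE_FILE:-/proc/cmdline}
    # Put each space-separated parameter on its own line, then require an
    # exact fixed-string match of a whole line (-x -F). Note: combined forms
    # such as "acpi_sleep=s3_bios,nonvs" are NOT matched by this simple check.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

Usage: `has_boot_param acpi_sleep=nonvs && echo "acpi_sleep=nonvs is active"`. An exact match was chosen over a substring search so that a similarly named parameter is not mistaken for the one being checked.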
Comment 11 Rafael J. Wysocki 2011-01-18 19:15:47 UTC
Please see if the following:

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state

also hangs if done twice in a row (it should return to the command prompt
every time after about 5-10 seconds).
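To repeat the test above more than twice without retyping it, a small loop like the following could be used. It is a sketch under stated assumptions: it must run as root on a kernel built with CONFIG_PM_DEBUG (which provides /sys/power/pm_test), and the function name pm_test_cycles and the POWER_DIR override are hypothetical, added only so the loop can be dry-run against ordinary files.

```shell
#!/bin/sh
# Hypothetical helper: run N back-to-back "core" test suspends, as in the
# two commands above. POWER_DIR is an illustrative override for dry runs.
pm_test_cycles() {
    cycles=${1:-2}                      # number of consecutive cycles
    power_dir=${POWER_DIR:-/sys/power}  # real sysfs path by default
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo "cycle $i: pm_test=core, state=mem"
        # "core" runs the suspend sequence but stops short of actually
        # putting the platform to sleep.
        echo core > "$power_dir/pm_test" || return 1
        # On a healthy system this write returns after a few seconds.
        echo mem > "$power_dir/state" || return 1
        i=$((i + 1))
    done
    # Switch test mode off again so a later real suspend is not a test.
    echo none > "$power_dir/pm_test"
}
```

For example, `pm_test_cycles 5` runs five cycles in a row; if one of them hangs, the last "cycle N" line printed shows how far it got.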
Comment 12 rolf 2011-01-18 19:38:03 UTC
Doing

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state
# echo core > /sys/power/pm_test
# echo mem > /sys/power/state

works fine.
The "echo core..." returns immediately.
The "echo mem..." turns the display black, the monitor goes into power-saving mode for a second or two, and then everything returns to normal display and operation.
Comment 13 rolf 2011-04-23 10:04:10 UTC
Created attachment 55112
Photo of strange screen appearing sometimes and indicating old code

I don't know if this is in any way related to the bug.
But just in case, this is a photograph of a screen that sometimes appears on systems affected by the hibernation bug and does not appear on systems without the bug.
In my opinion this screen points to a piece of buggy old code that existed in earlier kernels, was then removed in kernel 2.6.30, and finally reappeared in kernel 2.6.32 and later.
Comment 14 rolf 2011-04-23 10:07:49 UTC
Info: The bug still exists in kernel version 2.6.38.

Is there a chance that the bug will be fixed at some point? I have been using kernel 2.6.31 for a long time now, and it would be nicer to use a newer one.
Comment 15 rolf 2011-04-29 09:29:57 UTC
Info: The bug still exists in kernel version 2.6.39 (experimental).
Comment 16 rolf 2011-06-14 16:01:39 UTC
Created attachment 62062
Output of dmesg after suspend to disk failure.
Comment 17 rolf 2011-06-14 16:02:51 UTC
Created attachment 62072
Output of lspci -nnvv after suspend to disk failure.
Comment 18 rolf 2011-06-14 16:24:52 UTC
I tried the current Ubuntu 11.04 with kernel 2.6.38-8.
On this system suspend to disk fails as well (not only suspend to RAM).
Same symptom as before: the system no longer reacts, but after a minute or so the fan begins to run at a higher speed.
Looks like an endless loop to me.
Comment 19 rolf 2011-06-14 16:26:47 UTC
I forgot: attached are the outputs of dmesg and lspci taken after the next boot.
Comment 20 rolf 2011-08-16 12:58:11 UTC
The current state is that the first suspend to RAM consistently succeeds and the second consistently hangs (it seems to start an infinite loop, because the fan speeds up after some time).
This consistent behaviour seems to have started with kernel version 2.6.39.
Kernel version 3.0 is a big disappointment to me because nothing at all has changed.
The person in charge of this seems to have an unprofessional attitude. She is playing around with new releases instead of fulfilling her duties.
In my opinion, fundamental functions such as the suspend modes must work absolutely cleanly before the person in charge moves on to the free-skating part of fun new features.

Regards.
Comment 21 rolf 2011-08-28 08:38:54 UTC
Is there anybody out there?

(Pink Floyd)
Comment 22 Zhang Rui 2012-01-18 02:23:07 UTC
It's great that kernel bugzilla is back.

Can you please verify whether the problem still exists in the latest upstream
kernel?
Comment 23 Jonathan Nieder 2012-03-21 16:21:42 UTC
Zhang Rui: it seems that this bug is being tracked at https://bugs.freedesktop.org/show_bug.cgi?id=43278 now.  It is still present in current 3.2.y kernels.
Comment 24 Aaron Lu 2013-04-08 02:01:24 UTC
According to the bug page Jonathan pointed out, it seems to be a radeon driver issue.
Comment 25 Zhang Rui 2013-04-08 06:58:15 UTC
from https://bugs.freedesktop.org/show_bug.cgi?id=43278

Some tests last week by Debian (please see the Debian bug report link for further details) showed that the problem is caused by the radeon module.
The following error message occurs when loading the radeon module:

[  270.715016] radeon_cp: Failed to load firmware "radeon/R300_cp.bin"
[  270.715045] [drm: r100_cp.init] *ERROR* Failed to load firmware!
[  270.715061] radeon 0000:01:05:0: failed initializing CP (-2) .
[  270.715072] radeon 0000:01:05:0:Disabling GPU acceleration

Reassigning to the radeon developers.
Comment 26 Jonathan Nieder 2013-04-08 07:06:49 UTC
(In reply to comment #25)
> [  270.715016] radeon_cp: Failed to load firmware "radeon/R300_cp.bin"
> [  270.715045] [drm: r100_cp.init] *ERROR* Failed to load firmware!
> [  270.715061] radeon 0000:01:05:0: failed initializing CP (-2) .
> [  270.715072] radeon 0000:01:05:0:Disabling GPU acceleration

That's a red herring from testing in an initramfs.  In an actual production environment with the firmware installed, the reporter is still able to reproduce hangs when trying to hibernate.
Comment 27 Jonathan Nieder 2013-04-08 07:08:01 UTC
s/is/was, a year ago/
Comment 28 Aaron Lu 2013-04-08 07:20:40 UTC
Hi Jonathan,

Rolf's comment seems to suggest that the radeon driver is the problem:
https://bugs.freedesktop.org/show_bug.cgi?id=43278#c18

If this is the case, then there is not much we can do on the PM side.
Comment 29 Jonathan Nieder 2013-04-08 07:32:22 UTC
Yes, I agree with that.  Just wanted to make sure anyone stumbling on this later doesn't get confused by the request_firmware() stuff.

Rolf, can you still reproduce this with a 3.8.y or newer kernel?
Comment 30 rolf 2013-04-08 08:27:04 UTC
(In reply to comment #29)
> Rolf, can you still reproduce this with a 3.8.y or newer kernel?

Yes, it can be reproduced with Debian kernel 3.8.5-1~experimental.1.
Comment 31 Alex Deucher 2013-12-10 22:50:03 UTC
I think this should be fixed by this patch:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=acf88deb8ddbb73acd1c3fa32fde51af9153227f