Bug 51381 - [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting, when disabled via vgaswitcheroo
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-07 11:00 UTC by a
Modified: 2018-04-07 15:10 UTC (History)
CC List: 15 users

See Also:
Kernel Version: 3.6.9
Tree: Mainline
Regression: Yes


Attachments
journald log (110.83 KB, text/plain)
2012-12-07 11:00 UTC, a
kernel log (91.30 KB, text/plain)
2014-05-04 17:32 UTC, Garri
Dmesg from 3.15rc7 from Debian/Experimental (71.29 KB, text/plain)
2014-06-04 21:53 UTC, Teofilis Martisius
Dmesg output from 3.15rc8 without radeon.dpm=1 switch (65.49 KB, text/plain)
2014-06-06 07:08 UTC, Teofilis Martisius
Dmesg output from 3.12, fails to turn on dGPU (66.83 KB, text/plain)
2014-06-06 21:05 UTC, Teofilis Martisius
possible fix (630 bytes, patch)
2014-06-06 21:14 UTC, Alex Deucher
dmesg output for 3.12.21 patched with delay of 200 (66.62 KB, text/plain)
2014-06-09 18:19 UTC, Teofilis Martisius
dmesg output for 3.15-rc8 patched with delay of 200 (67.51 KB, text/plain)
2014-06-09 18:20 UTC, Teofilis Martisius
testing patch (811 bytes, patch)
2014-06-09 18:27 UTC, Alex Deucher
dmesg output for 3.15-rc8, delay of 200, patched with testing patch 2 (67.36 KB, text/plain)
2014-06-10 06:28 UTC, Teofilis Martisius
disable runpm by default on problematic systems (4.79 KB, patch)
2014-07-18 16:00 UTC, Alex Deucher
dmesg output from 4.8-rc4 when turning OFF dGPU via vgaswitcheroo (87.38 KB, text/plain)
2016-09-08 23:19 UTC, Teofilis Martisius
dmesg output from 4.8-rc4 with RADEON_PX_QUIRK_DISABLE_PX quirks removed (88.92 KB, text/plain)
2016-09-08 23:21 UTC, Teofilis Martisius

Description a 2012-12-07 11:00:34 UTC
Created attachment 88591 [details]
journald log

After updating from 3.6.6 to 3.6.9 my laptop with Intel graphics and ATI HD 5650 will not resume from suspend. I use vgaswitcheroo to disable the ATI card at boot. On resume the computer almost hangs (I can press power button and wait 5 minutes for a proper shutdown, but no other interaction is possible). It logs a lot of messages saying:

[drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[drm:atom_execute_table_locked] *ERROR* atombios stuck executing D098 (len 72, WS 0, PS 0) @ 0xD0C7

Steps to reproduce:
echo "OFF" > /sys/kernel/debug/vgaswitcheroo/switch
[suspend and resume]

Actual results:
Almost freeze.

Expected results:
Resume and work as normal.

Log is attached, but if you need anything else just ask.
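
For anyone triaging a log like the attached one, the two error messages quoted above have a regular shape. Below is a minimal Python sketch that pulls the numbers out of the `atom_execute_table_locked` line. The regular expression and the name `parse_stuck_table` are assumptions based only on the lines quoted in this report, not on the driver's format strings. Reading the first hex value as the start of the table and the value after `@` as the point where execution stopped is likewise an interpretation.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption,
# not taken from the kernel source). The optional " [radeon]" part covers a
# message variant that also prints the module name.
STUCK_RE = re.compile(
    r"\[drm:atom_execute_table_locked(?: \[radeon\])?\] \*ERROR\* "
    r"atombios stuck executing (?P<table>[0-9A-F]+) "
    r"\(len (?P<len>\d+), WS (?P<ws>\d+), PS (?P<ps>\d+)\) "
    r"@ 0x(?P<addr>[0-9A-F]+)"
)

def parse_stuck_table(line):
    """Return a dict describing a 'stuck executing' line, or None."""
    m = STUCK_RE.search(line)
    if m is None:
        return None
    table = int(m.group("table"), 16)
    addr = int(m.group("addr"), 16)
    return {
        "table": table,
        "length": int(m.group("len")),
        "ws": int(m.group("ws")),
        "ps": int(m.group("ps")),
        "stuck_at": addr,
        # Distance from the table start to the reported address.
        "offset_in_table": addr - table,
    }

line = ("[drm:atom_execute_table_locked] *ERROR* atombios stuck executing "
        "D098 (len 72, WS 0, PS 0) @ 0xD0C7")
info = parse_stuck_table(line)
print(info)
```

For the line from the description, the reported address lies 47 bytes past the table start, inside the stated length of 72.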
Comment 1 Alex Deucher 2012-12-07 14:27:18 UTC
The error messages are just a symptom.  They are generated because the driver is trying to access hardware that has been powered down.  vgaswitcheroo probably needs some logic to track what state the hardware is in so that the drivers know whether it's there or not.
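
To illustrate the kind of state tracking meant here, a rough Python sketch follows: a tracker records whether the card is powered and refuses register access while it is off. The class and method names are hypothetical, and this is not how vgaswitcheroo or radeon are implemented. It only shows the idea of checking a recorded state before touching hardware.

```python
from enum import Enum

class PowerState(Enum):
    ON = "on"
    OFF = "off"

class DeviceUnavailable(Exception):
    """Raised when code tries to touch a device recorded as powered down."""

class SwitchTracker:
    """Hypothetical bookkeeping for one switchable GPU."""

    def __init__(self):
        self.state = PowerState.ON

    def set_state(self, state):
        self.state = state

    def read_register(self, read_fn, offset):
        # Fail fast instead of polling hardware that cannot answer.
        if self.state is not PowerState.ON:
            raise DeviceUnavailable(f"device is {self.state.value}")
        return read_fn(offset)

tracker = SwitchTracker()
tracker.set_state(PowerState.OFF)
try:
    # The lambda stands in for a real register read (hypothetical).
    tracker.read_register(lambda off: 0xFFFFFFFF, 0x8504)
except DeviceUnavailable as exc:
    print("skipped access:", exc)
```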
Comment 2 a 2012-12-07 16:18:48 UTC
Since this is a regression (everything worked like a charm in 3.6.6) I would guess the logic is in place and that this is just a side effect of some other fix. But that's only a guess as I'm not a kernel developer.
Comment 3 Alex Deucher 2012-12-07 16:22:31 UTC
Can you bisect?
Comment 4 a 2012-12-07 17:34:31 UTC
There's a lot of commits between 3.6.6 and 3.6.9, but I will see what I can do.
Comment 5 a 2012-12-09 13:55:11 UTC
I now ran the same 3.6.6 kernel as I did before, but now it didn't work there either, so something else must be wrong. I will do some more testing, but it seems the kernel is not to blame after all.
Comment 6 Christoph Haag 2013-06-13 17:10:13 UTC
intel + radeon + vgaswitcheroo seems to have been problematic for quite some time:
https://bugzilla.kernel.org/show_bug.cgi?id=23592

I'm using 3.10rc5.

While I have no problems with suspending or quitting X (didn't have problems with some earlier kernels either), I have had yet another problem for some time now. If I

start X with the radeon card enabled
then disable the radeon card with vgaswitcheroo
then switch to a tty
then switch back to X

then X hangs and I get many messages like
[  236.688466] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  236.688470] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing D46E (len 62, WS 0, PS 0) @ 0xD48A

I assume the problem is still the same.

The most pressing issue is that X should not get stuck, no matter what. It's much better to reset the driver somehow, even if you lose 3d acceleration or so. As it is now, X eats all keyboard input and you have to use sysrq keys or reboot over ssh which is not really ideal.
Comment 7 Alex Deucher 2013-06-13 17:11:40 UTC
See comment #1.
Comment 8 Garri 2014-05-04 17:28:48 UTC
I have the same dual-graphics configuration (Intel graphics and ATI HD 5650) and also have problems with resume. Resume takes 40 seconds, and I get the following error messages:

[drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD0C (len 62, WS 0, PS 0) @ 0xCD28
[drm:atom_execute_table_locked] *ERROR* atombios stuck executing BA84 (len 937, WS 4, PS 0) @ 0xBB94
[drm:atom_execute_table_locked] *ERROR* atombios stuck executing BA1A (len 76, WS 0, PS 8) @ 0xBA22
[drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed

I use the 3.14.2 kernel with systemd as the init system. The problems began when I switched from 3.12 to 3.13. I use vgaswitcheroo to disable the discrete GPU at boot time. Beginning with the 3.13 kernel, radeon uses runtime power management (runpm), and I have to pass "runpm=0" to the radeon module, because runpm automatically re-enables the discrete GPU and brings other problems (https://bugs.gentoo.org/show_bug.cgi?id=506188).

Without parameter "runpm=0" my system resumes immediately. Please let me know if you need additional information. Thank you!
Comment 9 Garri 2014-05-04 17:32:20 UTC
Created attachment 135131 [details]
kernel log

Kernel log includes suspend-resume info and error messages.
Comment 10 Alex Deucher 2014-05-05 13:33:22 UTC
(In reply to newgarry from comment #8)
> 
> Without parameter "runpm=0" my system resumes immediately. Please let me
> know if you need additional information. Thank you!

Are you saying that everything is working properly without runpm=0?
Comment 11 Garri 2014-05-05 17:44:04 UTC
(In reply to Alex Deucher from comment #10)
> Are you saying that everything is working properly without runpm=0?

Yes, it is. System resumes immediately and without error messages.
Comment 12 Teofilis Martisius 2014-06-04 20:43:02 UTC
Hello,

I'm not sure if the problem I have is the same as reported in the original bug report on 2012-12-07, but I have a problem very similar to one described by newgarry on 2014-05-04.

I have an Asus K73TA laptop with AMD A6-3400M APU and Radeon 6550 GPU. I get the following errors on boot with kernel versions 3.13 and above:

[   53.720848] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[   53.720975] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CE56 (len 62, WS 0, PS 0) @ 0xCE72
[   53.721107] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing BB62 (len 1036, WS 4, PS 0) @ 0xBC5F
[   53.721240] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing BAF8 (len 76, WS 0, PS 8) @ 0xBB00
[   55.775951] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xFFFFFFFF)
[   55.776083] [drm:evergreen_resume] *ERROR* evergreen startup failed on resume
[   55.776364] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed

Initially I thought this was Debian-specific, so I reported it on the Debian BTS; that report has more details:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=737684

I bisected kernel versions between 3.12 and 3.13 and I determined that this issue was introduced in the following git commit:

commit 10ebc0bc09344ab6310309169efc73dfe6c23d72
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Sep 17 14:40:31 2012 +1000

    drm/radeon: add runtime PM support (v2)
    
    This hooks radeon up to the runtime PM system to enable
    dynamic power management for secondary GPUs in switchable
    and powerxpress laptops.
    
    v2: agd5f: clean up, add module parameter
    
    Signed-off-by: Dave Airlie <airlied@redhat.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Newgarry, can you please try kernel v3.12 and see if it works correctly for you?

Because of this issue I cannot upgrade my kernel to anything above 3.12. I spent a week bisecting the kernel, so I would really appreciate someone looking into this.

I tried reviewing the changes introduced in this commit, but I know too little about radeon drivers to be able to understand the impact they had.

If you need me to do any additional testing or provide you with extra information, please don't hesitate to contact me, I'll do what I can.

Sincerely,
Teofilis Martisius
Comment 13 Alex Deucher 2014-06-04 20:59:50 UTC
There have been a lot of PX fixes in 3.15.  Can you try 3.15?  Additionally, you can disable the PX runtime pm support by appending radeon.runpm=0 on the kernel command line in grub.
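
To confirm which options the driver will be given, the kernel command line can be checked for `radeon.`-prefixed entries. Below is a small Python sketch, assuming the command line is available as a string (on a running system it can be read from `/proc/cmdline`). The helper name `radeon_params` and the sample command line are made up for illustration, and the simple whitespace split does not handle quoted values.

```python
def radeon_params(cmdline):
    """Collect 'radeon.<name>=<value>' options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("radeon.") and "=" in token:
            name, value = token[len("radeon."):].split("=", 1)
            params[name] = value
    return params

# Illustrative command line (hypothetical values).
sample = "root=/dev/sda1 ro quiet radeon.audio=0 radeon.runpm=0"
params = radeon_params(sample)
print(params)
print("runpm disabled:", params.get("runpm") == "0")
```

Options written without the `radeon.` prefix are skipped by this helper, which makes a missing prefix easy to spot.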
Comment 14 Teofilis Martisius 2014-06-04 21:51:12 UTC
Hi,

Thank you for very quick response.

I have just tried v3.15rc7 from Debian Experimental, it still has this same problem. I have attached an excerpt from dmesg at the end of this message. I'll attach a full dmesg log as well.

I have tried same kernel with radeon.runpm=0, and it works correctly. I can run glxgears on both my primary and my secondary GPU with "xrandr --setprovideroffloadsink xx yy" and "DRI_PRIME=1 glxgears". Both work correctly.

So disabling power management works as a workaround. However, it's just a workaround, and it would be interesting to get the underlying issue fixed.

I'll try 3.15rc8 next, see if that has any improvements.

P.S. My current kernel boot-time options are:

quiet radeon.audio=0 modeset=1 radeon.dpm=1 radeon.no_wb=1 radeon.runpm=0

I'm running Debian/Sid. Could it be something broken in userspace interfering?

Sincerely,
Teofilis Martisius

======

[   55.886107] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[   55.886234] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CE56 (len 62, WS 0, PS 0) @ 0xCE72
[   55.889662] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing BB62 (len 1036, WS 4, PS 0) @ 0xBC5F
[   55.892979] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing BAF8 (len 76, WS 0, PS 8) @ 0xBB00
[   55.896345] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
[   57.418996] radeon 0000:01:00.0: Wait for MC idle timedout !
[   57.609356] radeon 0000:01:00.0: Wait for MC idle timedout !
[   57.628060] [drm] PCIE GART of 1024M enabled (table at 0x0000000000273000).
[   57.628181] radeon 0000:01:00.0: WB enabled
[   57.628189] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff88014876ac00
[   57.628194] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff88014876ac0c
[   57.637229] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90011cb2118
[   57.844594] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xFFFFFFFF)
[   57.844724] [drm:evergreen_resume] *ERROR* evergreen startup failed on resume
[   57.847954] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
Comment 15 Teofilis Martisius 2014-06-04 21:53:07 UTC
Created attachment 138161 [details]
Dmesg from 3.15rc7 from Debian/Experimental

Asus K73TA laptop with AMD A6-3400M APU and Radeon 6550 GPU
Comment 16 Teofilis Martisius 2014-06-05 06:18:36 UTC
Hello,

I can reproduce this problem on v3.15rc8 with power management enabled (radeon.runpm=1). So this hasn't been fixed in v3.15 yet.

v3.15rc8 works OK with radeon.runpm=0 flag

Sincerely,
Teofilis Martisius
Comment 17 Alex Deucher 2014-06-05 13:21:32 UTC
Does removing radeon.dpm=1 from the kernel command line fix the issue?  It's enabled by default on asics where it is stable.
Comment 18 Teofilis Martisius 2014-06-05 22:38:29 UTC
Hello,

I tried removing radeon.dpm=1, it did NOT fix the issue. I think my GPUs are considered "stable" by now- it's not a new laptop.

Let me know if you need anything else.

Sincerely,
Teofilis Martisius

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.15.0-rc8 root=UUID=6b66e960-0da6-4a99-abfe-f23614b50db9 ro quiet radeon.audio=0 modeset=1 radeon.no_wb=1

.....

[   59.122053] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[   59.122183] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CE56 (len 62, WS 0, PS 0) @ 0xCE72
[   59.122315] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing BB62 (len 1036, WS 4, PS 0) @ 0xBC5F
[   59.122448] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing BAF8 (len 76, WS 0, PS 8) @ 0xBB00
[   60.722717] radeon 0000:01:00.0: Wait for MC idle timedout !
[   60.923224] radeon 0000:01:00.0: Wait for MC idle timedout !
[   60.942019] [drm] PCIE GART of 1024M enabled (table at 0x0000000000273000).
[   60.942140] radeon 0000:01:00.0: WB enabled
[   60.942148] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880089552c00
[   60.942153] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880089552c0c
[   60.951189] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90011cb2118
[   61.178505] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xFFFFFFFF)
[   61.178637] [drm:evergreen_resume] *ERROR* evergreen startup failed on resume
[   66.180845] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[   66.180974] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing C50C (len 1136, WS 0, PS 0) @ 0xC536
[   66.219989] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[   66.219995] vgaarb: device changed decodes: PCI:0000:00:01.0,olddecodes=io+mem,decodes=none:owns=none
Comment 19 Alex Deucher 2014-06-05 22:40:10 UTC
(In reply to Teofilis Martisius from comment #18)
> Hello,
> 
> I tried removing radeon.dpm=1, it did NOT fix the issue. I think my GPUs are
> considered "stable" by now- it's not a new laptop.

Please attach your full dmesg output with radeon.dpm=1 removed.
Comment 20 Alex Deucher 2014-06-05 22:43:51 UTC
Does disabling the dGPU manually via vgaswitcheroo with runpm=0 or on older kernels prior to 10ebc0bc09344ab6310309169efc73dfe6c23d72 actually work or does it have similar problems?
Comment 21 Teofilis Martisius 2014-06-06 07:08:32 UTC
Created attachment 138351 [details]
Dmesg output from 3.15rc8 without radeon.dpm=1 switch

Dmesg output from 3.15rc8 without radeon.dpm=1 switch and without radeon.runpm=0 switch.

Command line: BOOT_IMAGE=/boot/vmlinuz-3.15.0-rc8 root=UUID=6b66e960-0da6-4a99-abfe-f23614b50db9 ro quiet radeon.audio=0 modeset=1 radeon.no_wb=1
Comment 22 Teofilis Martisius 2014-06-06 07:29:40 UTC
Ok,

I tried playing around with Linux 3.12, that's the last stable version before that commit.

Booted with parameters:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.12-1-amd64 root=UUID=... ro quiet radeon.audio=0 modeset=1 radeon.dpm=1 radeon.no_wb=1 radeon.runpm=0

Executed:

echo OFF >/sys/kernel/debug/vgaswitcheroo/switch

dGPU got switched off, everything works fine. Got a message in dmesg:

[ 1454.396648] radeon: switched off

I don't want to spam attachments on this bug report, full dmesg output if needed at:
http://menulis.org/kernel/dmesg_3.12_off1.log

I tried it with radeon.runpm=1 as well, same results, everything works fine. Dmesg:
http://menulis.org/kernel/dmesg_3.12_off2.log

Let me know if you want these two dmesg logs attached here in bugzilla.

Let me know if you need me to try something else next.

Sincerely,
Teofilis Martisius
Comment 23 Alex Deucher 2014-06-06 14:17:54 UTC
Sorry, I should have been more clear. On kernel 3.12, does the dGPU switch on again properly via debugfs after you've disabled it via debugfs?  The problem you are seeing is that the GPU turns off ok, but seems to have problems turning back on:

[   52.340438] radeon 0000:01:00.0: Refused to change power state, currently in D3
[   52.416464] radeon 0000:01:00.0: Refused to change power state, currently in D3
[   52.432469] radeon 0000:01:00.0: Refused to change power state, currently in D3
and the registers are all reading back 0xffffffff which usually means the device is still powered off.

Also why are you using radeon.no_wb=1?  That may cause problems.
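
The two symptoms described above can be searched for in a saved dmesg. Below is a rough Python sketch, with the patterns taken from lines quoted in this report. The function name is an assumption, and a match is only a hint that the device did not power up, not proof.

```python
import re

# Patterns based on messages quoted in this bug report.
D3_REFUSED = re.compile(r"Refused to change power state, currently in D3")
ALL_ONES = re.compile(r"=0x[Ff]{8}\b")

def powered_off_hints(dmesg_text):
    """Return the log lines that suggest the device did not power up."""
    hints = []
    for line in dmesg_text.splitlines():
        if D3_REFUSED.search(line) or ALL_ONES.search(line):
            hints.append(line.strip())
    return hints

log = """\
[   52.340438] radeon 0000:01:00.0: Refused to change power state, currently in D3
[   55.775951] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xFFFFFFFF)
[   57.628181] radeon 0000:01:00.0: WB enabled
"""
for hint in powered_off_hints(log):
    print(hint)
```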
Comment 24 Teofilis Martisius 2014-06-06 21:04:31 UTC
Hi,

Thank you once again for quick response.

Ok, a while ago I added radeon.no_wb=1 as without it I was getting display corruption. That problem seems to be gone now, so there's no reason to keep that option any more- I've taken it off.

I think you nailed the problem. On 3.12, it fails to turn ON the dGPU after it has been turned OFF. I have tried this by booting 3.12 with following boot parameters:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.12-1-amd64 root=UUID=6b66e960-0da6-4a99-abfe-f23614b50db9 ro quiet radeon.audio=0 modeset=

Then I did:

echo OFF >/sys/kernel/debug/vgaswitcheroo/switch
echo ON >/sys/kernel/debug/vgaswitcheroo/switch

After "echo ON" I got similar "atombios stuck in loop" errors in dmesg, and errors that dGPU failed to come back on. 

"DRI_PRIME=1 glxgears" of course fails to work after that as well.

I've attached the dmesg from 3.12. Ok, what next?

P.S. Sorry for slow responses. I only have time to do this in the evenings, and I'm in London- I guess you are in a different timezone.

Sincerely,
Teofilis Martisius
Comment 25 Teofilis Martisius 2014-06-06 21:05:25 UTC
Created attachment 138411 [details]
Dmesg output from 3.12, fails to turn on dGPU
Comment 26 Alex Deucher 2014-06-06 21:13:43 UTC
Ok, so it appears your dGPU never powered up properly.  You just see the problem now because prior to the runpm patch (which dynamically turns the dGPU on/off) it was always left on.
Comment 27 Alex Deucher 2014-06-06 21:14:55 UTC
Created attachment 138421 [details]
possible fix

Does this patch help?  If not, can you try increasing the size of the delay and see if that helps?
Comment 28 Teofilis Martisius 2014-06-09 18:18:45 UTC
Hello,

Sorry for the delay, I had other plans for the weekend.

The patch did not help. I tried it with default delay of 20, and then I tried it with delay set to 200 (200 what? milliseconds?). I tried both default delay and 200 delay on both 3.12.21 and on 3.15rc8, no luck. I changed the patch to increase the delay and to print out the delay- you can see it in dmesg. I have attached the dmesg output for the 200 delay runs for 3.12.21 and 3.15rc8.

I ran the kernels with following boot parameters:

3.12.21: BOOT_IMAGE=/boot/vmlinuz-3.12.21d200 root=UUID=xxx ro quiet radeon.audio=0 modeset=1

3.15.0-rc8 BOOT_IMAGE=/boot/vmlinuz-3.15.0-rc8teo root=xxx ro quiet radeon.audio=0 modeset=1 radeon.runpm=0

Sincerely,
Teofilis Martisius

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index b512c00..0574d56 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1093,8 +1093,10 @@ static void radeon_switcheroo_set_state(struct pci_dev *pdev, enum vga_switchero
                /* don't suspend or resume card normally */
                dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 
-               if (d3_delay < 20 && radeon_switcheroo_quirk_long_wakeup(pdev))
-                       dev->pdev->d3_delay = 20;
+               if (d3_delay < 200 /*&& radeon_switcheroo_quirk_long_wakeup(pdev)*/) {
+                       dev->pdev->d3_delay = 200;
+                       printk(KERN_INFO "radeon: d3 delay set to 200\n");
+               }
 
                radeon_resume_kms(dev, true, true);
Comment 29 Teofilis Martisius 2014-06-09 18:19:34 UTC
Created attachment 138621 [details]
dmesg output for 3.12.21 patched with delay of 200
Comment 30 Teofilis Martisius 2014-06-09 18:20:07 UTC
Created attachment 138631 [details]
dmesg output for 3.15-rc8 patched with delay of 200
Comment 31 Alex Deucher 2014-06-09 18:22:12 UTC
Ok, it appears powering up the dGPU has never worked properly on your system.  As a workaround for now you can disable runpm by adding radeon.runpm=0.  I can add a quirk to the driver to disable runpm on your system by default until someone figures out how to fix it.
Comment 32 Alex Deucher 2014-06-09 18:27:02 UTC
Created attachment 138641 [details]
testing patch

Does this patch help?  Note, this will probably break regular suspend/resume, so just test it for switcheroo.
Comment 33 Teofilis Martisius 2014-06-10 06:28:45 UTC
Created attachment 138791 [details]
dmesg output for 3.15-rc8, delay of 200, patched with testing patch 2
Comment 34 Teofilis Martisius 2014-06-10 06:29:04 UTC
Hello,

I have applied patch2 to 3.15-rc8 along with patch1 and delay of 200, ran with "radeon.runpm=0", tried to turn off and on the dGPU, and the problem is still there. Dmesg attached.

Sincerely,
Teofilis Martisius
Comment 35 Garri 2014-06-28 13:12:14 UTC
Hello,

Kernel 3.15.2 solved my problem, described in comment 8. Many thanks!
Comment 36 Alex Deucher 2014-07-18 16:00:21 UTC
Created attachment 143411 [details]
disable runpm by default on problematic systems

This patch disables runpm by default on the Asus K73TA laptop so that the system will be usable out of the box until we fix the deeper issue.
Comment 37 Teofilis Martisius 2014-07-19 09:52:19 UTC
Hi,

Please let me know if I can do anything to help "fix the deeper issue". I'd like to have runpm working properly on my machine, and I do have some free time. Unfortunately I'm not familiar with Radeon GPU internals or kernel driver internals so I cannot do it myself- I tried going through the .c files affected by the patches you sent and understand what's going on there and failed. But I can build & test kernels, do some experiments, and feed you the information.

On the other hand, your time is probably better spent on R9 support and new OpenGL features...

Sincerely,
Teofilis
Comment 38 JM 2014-09-22 06:00:08 UTC
I'm also getting this with kernel 3.16.3 on Arch. I've also fixed the problem by adding radeon.runpm=0 to my boot parameters.

CPU=A6-3420m with Ati 6520G and 7670m -> Asus K53TK

[    1.056775] checking generic (b0000000 300000) vs hw (b0000000 10000000)
[    1.056777] fb: switching to radeondrmfb from EFI VGA
[    1.056802] Console: switching to colour dummy device 80x25
[    1.057757] [drm] initializing kernel modesetting (SUMO 0x1002:0x9647 0x1043:0x2122).
[    1.057777] [drm] register mmio base: 0xFEB00000
[    1.057778] [drm] register mmio size: 262144

.......................................................................

[    1.487137] [drm] Initialized radeon 2.39.0 20080528 for 0000:00:01.0 on minor 0
[    1.487250] radeon 0000:01:00.0: enabling device (0000 -> 0003)
[    1.487940] [drm] initializing kernel modesetting (TURKS 0x1002:0x6840 0x1043:0x2122).
[    1.487962] [drm] register mmio base: 0xFEA20000
[    1.487964] [drm] register mmio size: 131072

---
	/* Asus K53TK laptop with AMD A6-3420M APU and Radeon 7670m GPU
	 * https://bugzilla.kernel.org/show_bug.cgi?id=51381
	 */
	{ PCI_VENDOR_ID_ATI, 0x6840, 0x1043, 0x2122, RADEON_PX_QUIRK_DISABLE_PX },
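
The four IDs in the proposed entry correspond to the `TURKS` line in the dmesg excerpt above, where `0x1002` is the vendor ID printed for this card. Below is a small Python sketch of that lookup, assuming a quirk is keyed on (vendor, device, subsystem vendor, subsystem device). The table, the string flag and the function names are illustrative stand-ins, not the driver's data structures.

```python
import re

# Illustrative quirk table keyed on the four PCI IDs (a stand-in, not the
# driver's structure). The entry mirrors the one proposed in this comment.
PX_QUIRKS = {
    (0x1002, 0x6840, 0x1043, 0x2122): "DISABLE_PX",
}

IDS_RE = re.compile(
    r"\((?P<family>\w+) 0x(?P<ven>[0-9A-Fa-f]{4}):0x(?P<dev>[0-9A-Fa-f]{4}) "
    r"0x(?P<sven>[0-9A-Fa-f]{4}):0x(?P<sdev>[0-9A-Fa-f]{4})\)"
)

def ids_from_dmesg(line):
    """Extract (vendor, device, subsys vendor, subsys device) or None."""
    m = IDS_RE.search(line)
    if m is None:
        return None
    return tuple(int(m.group(k), 16) for k in ("ven", "dev", "sven", "sdev"))

def lookup_quirk(ids):
    """Return the quirk label for the IDs, or None if no entry matches."""
    return PX_QUIRKS.get(ids)

line = ("[    1.487940] [drm] initializing kernel modesetting "
        "(TURKS 0x1002:0x6840 0x1043:0x2122).")
ids = ids_from_dmesg(line)
print([hex(i) for i in ids], lookup_quirk(ids))
```

The `SUMO` line from the same log has a different device ID, so it does not match this entry.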
Comment 39 Alex Deucher 2014-09-22 21:39:01 UTC
Thanks.  I've added a quirk for your system.
Comment 40 Richard Szibele 2016-03-12 18:58:56 UTC
I'm also experiencing this issue on kernel 4.4 on Debian where this happens at boot time and the system is basically unable to boot due to it locking up completely. runpm=0 also solves the issue.

My hardware:
CPU:         AMD Phenom II X4 965
Motherboard: Gigabyte 990FXA-UD3
GPU: AMD Radeon HD 5670

If there is any way I can help then let me know.
Comment 41 Peter Uchno 2016-04-26 14:35:59 UTC
I've started seeing this on a Dell Latitude E6540 with a Radeon 8790M and Intel hybrid graphics. I'm using vgaswitcheroo to disable the Radeon in Linux. I'm on Arch Linux, I wasn't seeing the problem in kernel 4.5.0 but I'm seeing it now in 4.5.1. The kernel spits out errors about the atombios being stuck in a loop and then fills dmesg with messages about "ring 3 stalled".
After about a full minute of this, X starts on the Intel GPU and the system seems to run normally. The kernel continues to spit out error messages during operation. Adding radeon.runpm=0 to the kernel parameters results in  a normal boot, where the driver starts up normally and X starts up much more quickly. I do see that a patch related to runtime PM (e64c952efb8e0c15ae82cec8e455ab4910690ef1) went into the kernel recently.
Comment 42 RussianNeuroMancer 2016-09-02 03:25:24 UTC
I guess you probably have a different issue.

Alex, what info do I need to provide to add a laptop to the quirk list? Acer 7560g, 6620G+6650M.
Comment 43 Alex Deucher 2016-09-02 13:35:51 UTC
Is this still an issue with kernel 4.8?  A bunch of PX fixes went into that kernel.
Comment 44 Teofilis Martisius 2016-09-02 19:59:35 UTC
Hi,

Last I tried it was 4.7. I removed my laptop from the quirks list and ATOMBIOS stuck in loop message still happened.

I'll test this with 4.8-rc4 over the weekend.

Sincerely,
Teofilis
Comment 45 Boris Carvajal 2016-09-05 21:03:22 UTC
Hello,

I was hoping the new changes would fix this, but it's not the case.
The system freezes for about 15 seconds (and logs the "atombios stuck" message) when starting X and when resuming from suspend.
Also running DRI_PRIME=1 glxgears just gets a blank window with a print loop of this message: "radeon: The kernel rejected CS, see dmesg for more information."
and of course there is the "*ERROR* atombios stuck" stuff.
I'm using an Asus K53TA laptop (6520G + 6650M) and a git kernel just recompiled yesterday.
Comment 46 Teofilis Martisius 2016-09-08 23:19:54 UTC
Created attachment 232761 [details]
dmesg output from 4.8-rc4 when turning OFF dGPU via vgaswitcheroo
Comment 47 Teofilis Martisius 2016-09-08 23:21:08 UTC
Created attachment 232771 [details]
dmesg output from 4.8-rc4 with RADEON_PX_QUIRK_DISABLE_PX quirks removed
Comment 48 Teofilis Martisius 2016-09-08 23:30:06 UTC
Hi, 

I ran two tests over the weekend.

First, I tried booting up stock 4.8-rc4. glxgears runs fine both on APU and dGPU. But it fails when I try to turn OFF my dGPU by doing:

echo OFF >/sys/kernel/debug/vgaswitcheroo/switch

dmesg output attached.

Second, I modified 4.8-rc4 & removed my computer from radeon_device.c quirks list (it has RADEON_PX_QUIRK_DISABLE_PX quirk assigned) and rebooted. dmesg attached. It does bootup but with DRI_PRIME=1 glxgears displays just a blank window. dmesg still shows the "atombios stuck in loop" error.

I won't be able to run more tests soon as I bricked the laptop trying to upgrade BIOS. I plan to repair/recover it but it will take a while.

OS being used is Debian/Sid. I used stock kernel from kernel.org, with 1 line change in 2nd test. Same laptop as before, described in previous comments. 

I hope any of this helps.

Sincerely,
Teofilis Martisius
Comment 49 Teofilis Martisius 2016-10-11 00:16:00 UTC
Hi,

Ok, I got my laptop fixed and now my BIOS is v2.14 (was 2.06). Nothing changed. 

I have tested with kernel v4.8. I get same errors as in 4.8-rc4.

Please let me know what else I can do to help get this solved.

I'll try to reproduce this with v4.9 when it's closer to release.

Sincerely,
Teofilis Martisius
Comment 50 Alex Deucher 2016-10-28 21:24:21 UTC
Your system does not appear to support powerdown of the dGPU.  From your log:
[    9.611183] ATPX version 1, functions 0x00000181
Bit 1 of the functions should be set if it does.
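
The check can be worked through by hand or with a few lines of Python. This sketch assumes zero-based bit numbering, so "bit 1" is the mask `0x2`. The helper name `set_bits` is made up for illustration.

```python
def set_bits(mask):
    """Return the zero-based positions of the bits set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]

functions = 0x00000181          # value from the log line above
POWER_CONTROL_BIT = 1           # "bit 1", assuming zero-based numbering

print(set_bits(functions))                           # [0, 7, 8]
print(bool(functions & (1 << POWER_CONTROL_BIT)))    # False
```

0x181 is 256 + 128 + 1, so only bits 0, 7 and 8 are set and bit 1 is clear.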
Comment 51 luminoso 2016-12-11 16:31:06 UTC
System is kernel 4.9.0-RC8, Fedora 25, "[AMD/ATI] Venus PRO [Radeon HD 8850M / R9 M265X]"

Another bug pushed me to boot the kernel with radeon.runpm=0. I noticed that I get this atombios stuck error when trying to turn ON the graphics card, which I looked into because the laptop's power usage was too high.

turn off is "echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch"
turn on is "echo ON | sudo tee /sys/kernel/debug/vgaswitcheroo/switch"

my switch looks like this:
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :Off:0000:01:00.0
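
A listing like this can be parsed in a script. Below is a minimal Python sketch, assuming the fields are index, GPU type (`IGD` or `DIS`), a `+` marking the active GPU, the power state, and the PCI address. That reading of the fields is based on the two lines shown here, and the function name is made up.

```python
def parse_switch(text):
    """Parse lines such as '1:DIS: :Off:0000:01:00.0' into dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address contains colons, so split only the first four.
        index, kind, active, power, pci = line.split(":", 4)
        entries.append({
            "index": int(index),
            "kind": kind,
            "active": active == "+",
            "power": power,
            "pci": pci.strip(),
        })
    return entries

listing = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :Off:0000:01:00.0\n"
for entry in parse_switch(listing):
    print(entry)
```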

Power usage is reported by powertop. Low means ~12w and high means ~17w

So, the test cases are:
1)
Boot, turn off AMD. Power usage is LOW 
Suspend and resume.
Power usage is now high. Switch reports OFF.

2)
Boot, turn off AMD. Power usage is LOW 
Suspend and resume.
Power usage is now high. Switch reports OFF.
Turn on the GPU takes a long time and throws atom bios stuck error.
Turning it off again reduces usage almost to low but my laptop fans are a bit crazy.

3) 
Boot, turn off AMD. Power usage is LOW 
Before suspending turn ON via systemd hook.
Suspend and resume.
After suspend turn OFF via systemd hook.
No errors in dmesg.
Power usage is low.
No delays.

So 3) is the perfect solution where everything works as expected with just a quick workaround.

Probably my GPU doesn't support turning ON after suspending and my laptop should suspend with the GPU turned on.
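
Test case 3 can be scripted. Below is a minimal Python sketch of a systemd sleep hook, assuming it is installed as an executable under `/usr/lib/systemd/system-sleep/`, where systemd calls it with `pre` or `post` as the first argument. The function names are made up, the script needs root to write to debugfs, and it is an untested sketch rather than the exact hook used here.

```python
import sys

SWITCH = "/sys/kernel/debug/vgaswitcheroo/switch"

def command_for(phase):
    """Map the hook phase to a vgaswitcheroo command, or None."""
    if phase == "pre":      # about to suspend: power the dGPU up first
        return "ON"
    if phase == "post":     # just resumed: power it back down
        return "OFF"
    return None

def run_hook(phase, switch_path=SWITCH):
    """Write the command for this phase; return False if nothing to do."""
    command = command_for(phase)
    if command is None:
        return False
    with open(switch_path, "w") as switch:
        switch.write(command + "\n")
    return True

# Only act when systemd passes a phase argument.
if __name__ == "__main__" and len(sys.argv) >= 2:
    run_hook(sys.argv[1])
```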
Comment 52 Nico Sneck 2018-04-07 15:10:52 UTC
I have this same issue, with an Asus K73TK laptop. It has an A6-3420M-APU and a 7670M dGPU.

On boot these appear in dmesg:

[   27.436081] [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[   27.436195] [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing CE9C (len 62, WS 0, PS 0) @ 0xCEB8
[   27.436307] [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing BB9C (len 1036, WS 4, PS 0) @ 0xBC99
[   27.436420] [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing BB32 (len 76, WS 0, PS 8) @ 0xBB3A
[   27.436740] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
[   29.148572] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
[   29.370027] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xFFFFFFFF)
[   29.370149] [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume
[   29.370321] [drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed
[   34.372053] [drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[   34.372115] [drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing C546 (len 1136, WS 0, PS 0) @ 0xC570

Adding a PX quirk for this device worked, and booting happens without errors now.
I'll send the patch to amd-gfx.
