Bug 47481 - Random blank screen: radeon_cs_ib_chunk Failed to schedule IB on AMD PALM
Summary: Random blank screen: radeon_cs_ib_chunk Failed to schedule IB on AMD PALM
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-13 17:15 UTC by Anisse Astier
Modified: 2012-12-18 22:23 UTC
CC List: 3 users

See Also:
Kernel Version: 3.5.3, 3.6.0-rc5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Dmesg of the problem (54.79 KB, text/plain)
2012-09-13 17:15 UTC, Anisse Astier

Description Anisse Astier 2012-09-13 17:15:27 UTC
Created attachment 80131
Dmesg of the problem

Hi,

At random, I hit a bug on an AMD E1-1200: the screen won't turn on, and when X is supposed to start I get a lot of these:
[  307.718332] radeon 0000:00:01.0: couldn't schedule ib
[  307.718348] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[  307.719063] radeon 0000:00:01.0: couldn't schedule ib
[  307.719075] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[  307.734461] radeon 0000:00:01.0: couldn't schedule ib
[  307.734476] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[  307.735489] radeon 0000:00:01.0: couldn't schedule ib
[  307.735502] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !

They stop once X is stopped.

Full dmesg in attachment.
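
For anyone who wants to quantify the flood in their own log, here is a minimal sketch that counts the `radeon_cs_ib_chunk` errors and reports the first and last timestamps. The function name `summarize_ib_errors` is hypothetical, and the regular expression assumes the default dmesg format with a `[seconds.microseconds]` prefix, as in the excerpt above.

```python
import re

# Matches lines such as:
# [  307.718348] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
# Assumption: dmesg lines carry the default "[seconds.micros]" timestamp prefix.
IB_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:radeon_cs_ib_chunk\] "
    r"\*ERROR\* Failed to schedule IB"
)


def summarize_ib_errors(lines):
    """Return (count, first_timestamp, last_timestamp) for IB scheduling errors.

    Timestamps are seconds since boot; both are None when no error is found.
    """
    stamps = []
    for line in lines:
        match = IB_ERROR.match(line.strip())
        if match:
            stamps.append(float(match.group("ts")))
    if not stamps:
        return 0, None, None
    return len(stamps), stamps[0], stamps[-1]


# Sample taken from the excerpt in this report.
sample = """\
[  307.718332] radeon 0000:00:01.0: couldn't schedule ib
[  307.718348] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[  307.719063] radeon 0000:00:01.0: couldn't schedule ib
[  307.719075] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
""".splitlines()

count, first, last = summarize_ib_errors(sample)
print(count, first, last)
```

To run it on a saved log, pass the open file object instead of `sample`, for example `summarize_ib_errors(open("dmesg.txt"))`, where `dmesg.txt` is a placeholder file name.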

Extract of my xorg.conf:
Section "Device"
        Identifier  "Card0"
        Driver      "radeon"
        Option  "DRI"
EndSection
Comment 1 Anisse Astier 2012-09-13 17:18:45 UTC
Correction: screen is blank, but backlight is on.
Comment 2 Alex Deucher 2012-09-13 17:19:43 UTC
Is this a regression?  If so, what components did you change (kernel, Mesa 3D driver, radeon Xorg driver, etc.)?
Comment 3 Anisse Astier 2012-09-13 17:27:23 UTC
I'm not sure. Since it's random I need to do a lot of tests to reproduce it. I'll try to test with 3.2.

Sometimes the screen does turn on, but I get vertical black and white stripes (I have a photo if you need one).

My current stack is:
 - kernel 3.5.3, but it also crashes/bugs with kernel 3.6.0-rc5
 - mesa 8.0.4
 - DDX 6.14.4
 - xserver 1.7.7

I'm not sure the userspace stack is relevant, since the problem starts before X, when modesetting should kick in (screen is blank).
Comment 4 Anisse Astier 2012-09-14 10:16:22 UTC
I think it is indeed a regression. After many power cycles, I wasn't able to reproduce it with 3.2.16.

This is going to take forever to bisect…
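
Why an intermittent bug is slow to bisect can be put in numbers. The sketch below assumes each boot fails independently with a fixed probability; the report does not state the actual failure rate, so the 5% used here is purely hypothetical, and both function names are illustrative.

```python
import math


def clean_boots_needed(failure_rate, max_miss_probability):
    """Smallest n such that (1 - failure_rate)**n <= max_miss_probability.

    That is, how many consecutive clean boots are needed before a kernel can
    be marked "good" with at most the given chance of having missed the bug.
    Assumes independent boots with a constant per-boot failure rate.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_miss_probability < 1.0:
        raise ValueError("max_miss_probability must be strictly between 0 and 1")
    return math.ceil(
        math.log(max_miss_probability) / math.log(1.0 - failure_rate)
    )


def miss_probability(failure_rate, clean_boots):
    """Chance that a buggy kernel survives `clean_boots` boots without failing."""
    return (1.0 - failure_rate) ** clean_boots


# Hypothetical numbers: a 5% per-boot failure rate and a 1% tolerated miss rate.
boots = clean_boots_needed(0.05, 0.01)
print(boots, miss_probability(0.05, boots))
```

Under these assumed numbers every "good" verdict in a bisect step costs dozens of power cycles, which is why a wrong verdict caused by too few boots can send the bisect to the wrong commit.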
Comment 5 Anisse Astier 2012-09-17 17:41:33 UTC
I did a first round of bisect:
375849ce878f1a2dfac12ea2a62b361ab6b7f9a5 is the first bad commit
commit 375849ce878f1a2dfac12ea2a62b361ab6b7f9a5
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Fri Jul 27 16:32:24 2012 -0400

    drm/radeon: do not reenable crtc after moving vram start address

    commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 upstream.

    It seems we can not update the crtc scanout address. After disabling
    crtc, update to base address do not take effect after crtc being
    reenable leading to at least frame being scanout from the old crtc
    base address. Disabling crtc display request lead to same behavior.

    So after changing the vram address if we don't keep crtc disabled
    we will have the GPU trying to read some random system memory address
    with some iommu this will broke the crtc engine and will lead to
    broken display and iommu error message.

    So to avoid this, disable crtc. For flicker less boot we will need
    to avoid moving the vram start address.

    This patch should also fix :

    https://bugs.freedesktop.org/show_bug.cgi?id=42373




Problem: I'm not sure this commit is in fact the culprit; it may just have uncovered another bug. During the bisect between v3.5 and v3.5.3 I hit another problem: the machine would sometimes hang at boot with a black screen (no backlight). I ignored it because it didn't have the same symptoms (boot OK, blank screen, backlight on, errors on X start).

That's why I'm not sure whether the bug is older (introduced between v3.2 and v3.5).
Comment 6 Alex Deucher 2012-09-18 14:41:46 UTC
This bug sounds kind of like the bugs that 375849ce878f1a2dfac12ea2a62b361ab6b7f9a5 fixes.  Can you see if this is fixed in my drm-next-3.7-wip branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.7-wip
I have a more complete set of patches to properly disable MC clients when changing the MC.
Comment 7 Anisse Astier 2012-09-18 16:46:05 UTC
After about 180 reboots, I can confirm that this bug isn't present anymore on drm-next-3.7-wip.
But I wasn't able to find commit 375849ce878f1a2dfac12ea2a62b361ab6b7f9a5. Which commit do you think contains the fix?

I still have a seemingly unrelated "screen off/lockup on reboot" problem (that shouldn't be a regression). But this is a story for another bug report.
Comment 8 Alex Deucher 2012-09-18 17:00:44 UTC
The commit is:
drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)

As of right now, it's here:
http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-3.7-wip&id=297a34942adb9c547e922258263cd2642ebca61b
but I rebase that branch regularly so it may change.
Comment 9 Anisse Astier 2012-10-09 07:53:11 UTC
I was able to reproduce this bug on an AMD E2-1800 with kernel 3.6. I can confirm that backporting these two patches fixes the problem:

4a15903 drm/radeon/dce4+: don't use radeon_crtc for vblank callback
62444b7 drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)


It would be great if they could be imported to stable, if they are not deemed too invasive.
Comment 10 Andre 2012-11-25 19:03:33 UTC
I can confirm this bug on my AMD A8-3550-m
Kernel 3.5.7 and 3.6.6  -> radeon.modeset=1

The trouble began about three days ago, when I updated my Gentoo system (stable branch) and the following updates were installed:

xorg-server    1.12.4     =>   1.13.0-r1
mesa           8.0.4-r1   =>   9.0
xf86-video-ati 6.14.6-r1  =>   7.0.0

After that I had strange artifacts in X and random crashes of the X server, where the mouse is movable but the windows are frozen and the screen turns black after a short time.

I test-installed three other Linux distributions; all showed the same bug after upgrading the installations to the latest stable releases.


Sadly, only after three days of experimenting did I come to the conclusion that this error only occurs when the VGA output is active.
With the LVDS or HDMI output I have had no problems so far.

It's really sad to see that this bug is two months old and now shows up in major Linux distributions, in the >>stable branches<<.
Comment 11 Anisse Astier 2012-11-25 22:23:45 UTC
Indeed. Bug is fixed in 3.7, but I still have to backport the two commits I cited on kernel <= 3.6. These should be proposed to stable (I thought they were).
Comment 12 Alex Deucher 2012-11-26 03:01:56 UTC
They have been back-ported to certain stable kernels.  E.g.,
https://patchwork.kernel.org/patch/1590951/
I'll see if I can get them applied to 3.6 as well if they haven't already been.
Comment 13 Andre 2012-11-27 20:17:53 UTC
I have to correct myself. The bug hit me again, this time with only the HDMI output active.
Today, after 4 hours, the screen blinked black and froze, although the mouse was still movable.


I'm currently running:
Linux 3.6.0-sabayon 
x11-drivers/xf86-video-ati-6.14.6-r1
media-libs/mesa-9_pre20120831-r1
x11-base/xorg-server-1.12.4

Software running at the time the bug occurs:
KDE desktop with effects on
Chrome, Firefox, Thunderbird and
VMware Workstation.

Now testing with desktop effects off.

/var/log/messages -> http://pastebin.com/qQkX8qAQ
/var/log/Xorg.0.log -> http://pastebin.com/v28mpB3i
dmesg -> http://pastebin.com/U0Dzr7Qf
Comment 14 Andre 2012-12-16 23:22:54 UTC
I have tried the new kernel 3.7 and see the same behavior as described in the initial bug report.



/var/log/messages
http://pastebin.com/a4EN1kmL

I will recompile the kernel with debugging enabled to maybe get a more detailed kernel.log.
Comment 16 Andre 2012-12-17 16:11:28 UTC
Before, I could crash X almost instantly by switching the KDE (OpenGL) screensaver back and forth.

Now the system has been running for almost two hours without any problems.
Comment 17 Andre 2012-12-18 22:23:33 UTC
Okay, after more than 30 hours of uptime, I think this patch fixed it. :-)
