Created attachment 80131
Dmesg of the problem

Hi,

Every now and then I hit a bug on an AMD E1-1200: the screen won't turn on, and when X is supposed to start I get a lot of these:

[ 307.718332] radeon 0000:00:01.0: couldn't schedule ib
[ 307.718348] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 307.719063] radeon 0000:00:01.0: couldn't schedule ib
[ 307.719075] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 307.734461] radeon 0000:00:01.0: couldn't schedule ib
[ 307.734476] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[ 307.735489] radeon 0000:00:01.0: couldn't schedule ib
[ 307.735502] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !

They stop once X is stopped. The full dmesg is attached.

Extract of my xorg.conf:

Section "Device"
    Identifier "Card0"
    Driver     "radeon"
    Option     "DRI"
EndSection
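To check from a saved dmesg when the IB scheduling errors start and stop (for example, relative to X starting and exiting), the timestamps of those lines could be extracted with a small script. This is a minimal sketch, assuming the default timestamped dmesg format `[ <seconds>] <message>`; the function name `ib_failures` is hypothetical and not part of any tool.

```python
import re
from typing import List, Tuple

# Assumed dmesg format: "[ <seconds>] <message>" (default timestamped output).
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")

# Substrings of the two radeon error messages quoted in this report.
PATTERNS = ("couldn't schedule ib", "Failed to schedule IB")


def ib_failures(dmesg_text: str) -> List[Tuple[float, str]]:
    """Return (timestamp, message) for every IB scheduling error line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a timestamp prefix
        ts, msg = float(m.group(1)), m.group(2)
        if any(p in msg for p in PATTERNS):
            hits.append((ts, msg))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): dmesg > dmesg.txt; python ib_failures.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        hits = ib_failures(f.read())
    if hits:
        print(f"{len(hits)} errors between {hits[0][0]:.6f}s and {hits[-1][0]:.6f}s")
    else:
        print("no IB scheduling errors found")
```

Comparing the first and last timestamps against the Xorg.0.log start time would show whether the errors really track the X session.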
Correction: the screen is blank, but the backlight is on.
Is this a regression? If so, which components did you change (kernel, Mesa 3D driver, radeon Xorg driver, etc.)?
I'm not sure. Since it's random, I need to run a lot of tests to reproduce it. I'll try testing with 3.2.

Sometimes the screen does turn on, but I get vertical black and white stripes (I have a photo if you need one).

My current stack is:
- kernel 3.5.3 (it also fails with kernel 3.6.0-rc5)
- mesa 8.0.4
- DDX 6.14.4
- xserver 1.7.7

I'm not sure the userspace stack is relevant, since the problem starts before X, when modesetting should kick in (the screen is blank).
I think it is indeed a regression. After many power cycles, I wasn't able to reproduce it with 3.2.16. This is going to take forever to bisect…
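Because the bug only shows up on some boots, a revision can only be marked "good" after enough clean boots, and one false "good" sends the bisect to the wrong commit. A minimal sketch, assuming each boot independently reproduces the bug with a fixed probability p (an assumption; the real rate here is unknown): the chance of missing it in n boots is (1 - p)^n, so n must satisfy n >= ln(alpha) / ln(1 - p). The helper names below are hypothetical.

```python
import math


def boots_needed(p: float, alpha: float = 0.05) -> int:
    """Smallest n such that (1 - p) ** n <= alpha.

    p: assumed per-boot probability that the bug reproduces.
    alpha: acceptable chance of wrongly calling a buggy revision "good".
    """
    if not (0.0 < p < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p and alpha must both lie in (0, 1)")
    return math.ceil(math.log(alpha) / math.log(1.0 - p))


def worst_case_bisect_boots(commits: int, p: float, alpha: float = 0.05) -> int:
    """Rough upper bound for a whole bisect: about ceil(log2(commits)) steps,
    assuming every step ends in a "good" verdict (bad revisions usually
    fail sooner, so real runs take fewer boots)."""
    steps = math.ceil(math.log2(commits)) if commits > 1 else 0
    return steps * boots_needed(p, alpha)


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.1):
        print(f"p={p}: {boots_needed(p)} clean boots per 'good' verdict")
```

For example, if the bug showed up on one boot in ten (p = 0.1, hypothetical), `boots_needed(0.1)` gives 29 clean boots per "good" verdict at 95% confidence, which is why bisecting an intermittent bug takes so long.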
I did a first round of bisecting:

375849ce878f1a2dfac12ea2a62b361ab6b7f9a5 is the first bad commit
commit 375849ce878f1a2dfac12ea2a62b361ab6b7f9a5
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Fri Jul 27 16:32:24 2012 -0400

    drm/radeon: do not reenable crtc after moving vram start address

    commit 81ee8fb6b52ec69eeed37fe7943446af1dccecc5 upstream.

    It seems we can not update the crtc scanout address. After disabling
    crtc, update to base address do not take effect after crtc being
    reenable leading to at least frame being scanout from the old crtc
    base address. Disabling crtc display request lead to same behavior.

    So after changing the vram address if we don't keep crtc disabled
    we will have the GPU trying to read some random system memory address
    with some iommu this will broke the crtc engine and will lead to
    broken display and iommu error message.

    So to avoid this, disable crtc. For flicker less boot we will need
    to avoid moving the vram start address.

    This patch should also fix : https://bugs.freedesktop.org/show_bug.cgi?id=42373

Problem: I'm not sure this commit is in fact the culprit, rather than one that merely uncovered another bug. During the bisect between v3.5 and v3.5.3 I hit another problem: the machine would sometimes hang at boot with a black screen (no backlight). I ignored it because the symptoms were different (boot OK, blank screen, backlight on, errors when X starts). That's why I'm not sure whether the bug is older (between v3.2 and v3.5).
This bug sounds similar to the bugs that 375849ce878f1a2dfac12ea2a62b361ab6b7f9a5 fixes. Can you check whether it is fixed in my drm-next-3.7-wip branch?
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.7-wip
That branch has a more complete set of patches to properly disable MC clients when changing the MC.
After about 180 reboots, I can confirm that this bug is no longer present on drm-next-3.7-wip. However, I wasn't able to find commit 375849ce878f1a2dfac12ea2a62b361ab6b7f9a5 in that branch. Which commit do you think contains the fix?

I still have a seemingly unrelated "screen off/lockup on reboot" problem (which shouldn't be a regression), but that is a story for another bug report.
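As a rough sense of what "about 180 reboots" without a failure rules out, one can compute the largest per-boot failure rate still consistent with zero observed failures. This is a sketch under the same assumption of independent boots with a fixed failure probability; it uses the standard zero-failure binomial bound, which the "rule of three" approximates as 3/n at 95% confidence. The function name is hypothetical.

```python
def failure_rate_upper_bound(clean_boots: int, alpha: float = 0.05) -> float:
    """Largest per-boot failure probability p for which seeing zero failures
    in `clean_boots` independent boots still has probability >= alpha,
    i.e. the solution of (1 - p) ** n == alpha."""
    if clean_boots < 1 or not (0.0 < alpha < 1.0):
        raise ValueError("need clean_boots >= 1 and alpha in (0, 1)")
    return 1.0 - alpha ** (1.0 / clean_boots)


if __name__ == "__main__":
    n = 180
    print(f"exact 95% bound: {failure_rate_upper_bound(n):.4f}")
    print(f"rule of three:   {3 / n:.4f}")
```

With n = 180 the exact bound is about 0.0165, so the clean run is consistent with a residual failure rate of at most roughly 1.7% per boot, under the independence assumption.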
The commit is:

drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)

As of right now, it's here:
http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-3.7-wip&id=297a34942adb9c547e922258263cd2642ebca61b
but I rebase that branch regularly, so the link may change.
I was able to reproduce this bug on an AMD E2-1800 with kernel 3.6. I can confirm that backporting these two patches fixes the problem:

4a15903 drm/radeon/dce4+: don't use radeon_crtc for vblank callback
62444b7 drm/radeon: properly handle mc_stop/mc_resume on evergreen+ (v2)

It would be great if they could be imported into stable, if they are not deemed too invasive.
I can confirm this bug on my AMD A8-3550-m.
Kernel 3.5.7 and 3.6.6 -> radeon.modeset=1

The trouble began about three days ago, when I updated my Gentoo system (stable branch) and the following updates were installed:
- xorg-server 1.12.4 => 1.13.0-r1
- mesa 8.0.4-r1 => 9.0
- xf86-video-ati 6.14.6-r1 => 7.0.0

After that I had strange artifacts in X and random crashes of the X server: the mouse can still be moved, but the windows are frozen, and the screen turns black after a short time.

I test-installed 3 other Linux distributions, and all showed the same bug after upgrading the installations to the latest stable releases. Sadly, only after three days of experimenting did I conclude that this error only occurs when the VGA output is active. With the LVDS or HDMI output I have had no problems so far.

It's really sad to see that this bug is two months old and now shows up in the stable branches of major Linux distributions.
Indeed. The bug is fixed in 3.7, but I still have to backport the two commits I cited to kernels <= 3.6. They should be proposed for stable (I thought they had been).
They have been backported to some stable kernels, e.g.:
https://patchwork.kernel.org/patch/1590951/
I'll see if I can get them applied to 3.6 as well, if they haven't been already.
I have to correct myself. The bug hit me again, this time with only the HDMI output active. Today, after 4 hours, the screen blinked black and froze; only the mouse could still be moved.

I'm currently running:
- Linux 3.6.0-sabayon
- x11-drivers/xf86-video-ati-6.14.6-r1
- media-libs/mesa-9_pre20120831-r1
- x11-base/xorg-server-1.12.4

Software running when the bug occurred: KDE desktop with effects on, Chrome, Firefox, Thunderbird and VMware Workstation. Now testing with desktop effects off.

- /var/log/messages: http://pastebin.com/qQkX8qAQ
- /var/log/Xorg.0.log: http://pastebin.com/v28mpB3i
- dmesg: http://pastebin.com/U0Dzr7Qf
I have tried the new 3.7 kernel and see the same behavior as described in the initial bug report.

/var/log/messages: http://pastebin.com/a4EN1kmL

I will recompile the kernel with debugging enabled to try to get a more detailed kernel.log.
Does this patch fix the issue? http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-next&id=bd25f0783dc3fb72e1e2779c2b99b2d34b67fa8a
Before, I could crash X almost instantly by switching the KDE (OpenGL) screensaver back and forth. With the patch, the system has now run for almost two hours without any problems.
Okay, after more than 30 hours of uptime, I think this patch fixed it. :-)