Bug 6626

Summary: Infrequent system freezes when using OpenGL on ATI radeon driver r100
Product: Drivers
Reporter: Michael (auslands-kv)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: REJECTED INSUFFICIENT_DATA
Severity: normal
CC: akpm, bugz.kernel.tormod, bunk, protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: kern.log from the time of the last crash
syslog from the time of the last crash
/var/log/messages from the time of the last crash
dmesg -s 1000000 (however, only starting after the last crash)

Description Michael 2006-05-30 06:15:00 UTC
Most recent kernel where this bug did not occur:
Distribution: Kanotix
Hardware Environment: IBM Thinkpad X31, ATI Mobility M6
Software Environment: Debian unstable
Problem Description:

First of all: I
Comment 1 Adrian Bunk 2006-05-30 12:15:51 UTC
Please attach the output of
  dmesg -s 1000000
Comment 2 Michael 2006-05-31 00:36:42 UTC
Created attachment 8227 [details]
kern.log from the time of the last crash
Comment 3 Michael 2006-05-31 00:37:22 UTC
Created attachment 8228 [details]
syslog from the time of the last crash
Comment 4 Michael 2006-05-31 00:37:58 UTC
Created attachment 8229 [details]
/var/log/messages from the time of the last crash
Comment 5 Michael 2006-05-31 00:39:25 UTC
Created attachment 8230 [details]
dmesg -s 1000000 (however, only starting after the last crash)
Comment 6 Michael 2006-05-31 00:39:49 UTC
Dmesg is a bit late for the last freeze as the ring buffer has run out already.
I still attach the output, but the first message is already after the restart of
the system.

I also attach /var/log/syslog and /var/log/messages and kern.log from the time
of the crash. However, as I have written above, there is (at least to my eyes) no
relevant message in it. I always check syslog, messages and kern.log directly
after such a crash and it always seems that the crash is instantaneous. No time
for the kernel to write any relevant message.

That
Comment 7 Michel D 2006-05-31 23:40:43 UTC
This is most likely an issue with the OpenGL driver, or possibly the X driver.
Please check the Mesa and xorg products at http://bugs.freedesktop.org and file
a new bug there in the unlikely event that there is none about this yet. Attach
full X config and log files there, and try running in depth 16 and not enabling
Option "DynamicClocks" if you haven't yet.
Comment 8 Michael 2006-06-01 05:32:50 UTC
Michel:

Thanks for your help. :)

I started with the bugtracker mentioned (strangely, searching for "bugtracker
xorg" with google did not give any links to this). There are quite a lot of bugs
in there that report freezes or lockups on Radeon HW. However, I am still not
sure if one of them resembles my case or not.

Before filing a new bug, I thought it's best to try a newer version of the DRI
driver. I was still using the one from debian unstable (050528 !), but didn't
know how to get a newer one. I now found the download address in one of the bug
reports.

I tried to install the binary builds from 060403. However, the install script
does not find the correct directories on a debian system. Fortunately, a newer
version of libgl1-mesa-dri featuring the 060327 driver appeared just yesterday
in debian alioth (experimental).

So now I have installed this DRI driver. Nearly all works so far (only the game
chromium refuses to run). I will now see, whether the freeze still occurs (may
take some time...).

BTW. I am running on 16 bit always. But I do use DynamicClocks in order to save
energy on my laptop. If the newer driver does not prevent another freeze, I will
disable the dynamicclocks next. If this doesn't help either, I will file a new
bug report.

I have to admit I was quite disappointed to see that there is nowhere a method
described to get more debug info in order to find the culprit of such freezes.
It's awful to see how many people have some kind of freeze and no way to find
out what
Comment 9 Michael 2006-06-01 10:39:50 UTC
Well, o.k. The new driver does not help. Just tried Enemy Territory and after 5
minutes the system froze again.

Now I will try disabling dynamicclocks (after already having disabled FastWrites
and going back to AGP Mode 1x).

Michael

Btw. there is no message related to the crash in either dmesg (only starts with
the reboot), syslog, messages, kernlog or xorg.0.log
Comment 10 Michael 2006-06-05 03:20:21 UTC
o.k. next freeze (again under enemy territory), this time without DynamicClocks.

I will file a bug at freedesktop.org, but I
Comment 11 Andrew Morton 2007-01-31 02:00:01 UTC
Is this still happening in current kernels?

Some people have found that X is stable if you do NOT have the radeon fbdev
driver loaded.  Did you have it loaded?  If so, can you try that?

Also, increasing the AGP window size in BIOS might help, if that's possible.
Comment 12 Michael 2007-02-13 10:03:11 UTC
Sorry, I needed some time to check it. In the last months I tried to not use any
OpenGL app, so I changed the screensaver to something simple and so on.

Now, I first upgraded to kernel 2.6.19.
The X system is version 7.1.1
The ati driver is module version 6.6.3.
The radeon driver is submodule version 4.2.0.
The openGL driver is Mesa DRI Radeon 20060327 AGP 1x NO-TCL

Testing is never that easy as these freezes are really, really infrequent.
So I tried a few games, used google earth. All worked well.

I then started beryl and wow, it seems, some errors have been remedied in the
driver. Earlier, I only saw a quarter of the screen and everything was very
slow. Now everything works fine and fast!!! What a nice eye-candy!!!!

After 4 hours, unfortunately, the system froze again completely :-( :-( :-(

So, it's a pity, but the problem does still exist.

Concerning the frame buffer: I think the kanotix kernel does NOT use the
radeonfb but vesa frame buffer (at least the kernel config says that vesafb is
"y" and radeonfb is "m", and the latter one is not loaded).

I haven't found anything in the bios where I could change the AGP window size.
What number should I look for and what should it be?

Thanks and kind regards

Michael 
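
The check described in comment 12 (how a framebuffer driver is built, and whether it is currently loaded) can be sketched in shell. This is only an illustration: fb_status is a hypothetical helper, the default kernel config path is an assumption that varies by distribution, and CONFIG_FB_RADEON / CONFIG_FB_VESA are the kernel config symbols for radeonfb and vesafb.

```shell
# Sketch: report how a framebuffer driver is built and whether it is loaded.
# fb_status is a hypothetical helper; the default paths are assumptions.
fb_status() {
    symbol=$1                                   # e.g. CONFIG_FB_RADEON
    module=$2                                   # e.g. radeonfb
    config_file=${3:-/boot/config-$(uname -r)}  # kernel config (path varies by distro)
    modules_file=${4:-/proc/modules}            # one loaded module per line

    # "y" = built in, "m" = module, "unset" = symbol not enabled.
    built=$(sed -n "s/^${symbol}=//p" "$config_file")
    [ -n "$built" ] || built=unset

    # The first field of each /proc/modules line is the module name.
    # Built-in ("y") drivers never show up there, so "loaded=no" only
    # says something useful for config=m.
    if awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
        loaded=yes
    else
        loaded=no
    fi

    echo "$module: config=$built loaded=$loaded"
}

# Usage on a live system:
#   fb_status CONFIG_FB_RADEON radeonfb
#   fb_status CONFIG_FB_VESA vesafb
```

For the setup described in comment 12, the first call would be expected to report config=m and loaded=no for radeonfb.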
Comment 13 Ben Blum 2007-03-20 18:42:34 UTC
I've lately been having exactly the same problem, and I'm fairly sure I can
pinpoint it to the in-kernel DRM.

I'm an Intel (i810 driver) user, so this rules out the ATI drivers causing it.
Also, I've been using Beryl for a while (via AIGLX) and I have never had it lock
up on me during regular usage. Lockups have occurred for me while playing Quake
3, Project 64 (via Wine, with the glN64 graphics plugin which uses openGL), and
Jedi Academy (via Wine, which I believe uses DirectX). Here's a bug I filed on
freedesktop.org about it, in which I explain in detail what happens to me:
https://bugs.freedesktop.org/show_bug.cgi?id=10330 

The problem started around the time I upgraded from the 2.6.18 kernel to 2.6.19.
I downgraded to .18 and the problem persisted. However, I recently realized that
when I switched to 2.6.19, x11-drm (a non-kernel DRM module) broke, and I
switched then to the in-kernel DRM. Therefore I'm fairly sure the problem is
caused by the in-kernel DRM driver.

Regards,
Ben Blum
Comment 14 Ben Blum 2007-03-20 20:34:46 UTC
Whoops, scratch that. After a bit of testing, it appears I still get this when
using x11-drm and 2.6.18 kernel. I have no idea what's causing this, but I'd
love to see it solved shortly.
Comment 15 Ben Blum 2007-03-24 11:01:25 UTC
I'd had my motherboard overclocked to 108%, and moving it down to 100% fixed the
problem. To whoever posted this: Is your motherboard/graphics card overclocked?
If so, try it at regular speed.
Comment 16 Natalie Protasevich 2007-06-21 17:00:10 UTC
Please test with recent kernel and confirm if the problem is still there.
Thanks.
Comment 17 Michael 2007-06-21 22:58:00 UTC
Hi

Peter: No, this is not an overclocked system. This is an IBM laptop.

Natalie: What do you consider a recent kernel? A released one (2.6.21), a release candidate or a bleeding edge one (e.g. GIT)?

At the moment I am on 2.6.21. But I have turned off every OpenGL app, as I use this system as a production machine and don't like to have severe crashes now and then.

A test is not so simple as these crashes really are infrequent. When I posted the bug, it took something between half an hour and 2 weeks between the crashes. The only similarity was that always an OpenGL app was running (e.g. OpenGL screensaver). Before I started using OpenGL (and after I disabled OpenGL apps) the system never crashed.

Secondly, such a crash often led to data loss on my system. It seemed to help to switch on the sync option on the disk driver, but that slowed down the system noticeably.

So, do you have any specific reason to think that the problem could be cured in a recent kernel?

Thanks and best regards

Michael
Comment 18 Dave Airlie 2007-06-21 23:14:58 UTC
This is most likely the video driver, not the kernel, so it should probably be in freedesktop.org's bug tracker.

I can't think of anything else to help fix it, running in PCI mode might be the only thing, but there were certain bugs on those chips we haven't tracked down due to the 3d driver.
Comment 19 Natalie Protasevich 2007-06-22 00:23:23 UTC
Michael,
I was thinking about 2.6.22-rc5, because it has a huge update for usb, video etc., and it would be good to retest this issue with this release in particular.
However, this problem is striking to me, because I ran into a similar one myself. I've looked on the web and found that Xorg plus fairly recent kernels produce hangs just like yours, and they range from the keyboard and/or mouse up to hanging the whole system. I am working on building the latest kernel and instrumenting it. What makes this harder is just what you said: the hangs are very random and rare, and there is no way to reproduce them at will.
Look into bug 6645 - it is a similar problem. Check if it is the same as yours and whether the workaround applies to your case.
Comment 20 Natalie Protasevich 2008-03-24 12:59:48 UTC
Any new updates on this problem? Michael, have you tried a newer kernel/X?

If so the problem should probably be reported on xorg or dri lists: xorg@lists.freedesktop.org, dri-devel@lists.sourceforge.net
Comment 21 Michael 2008-10-17 02:37:22 UTC
Just to let you know. I am now on 2.6.27-rc6, xserver-core 1.4.2, xorg 7.3, and the problem is still there.

I needed to install some openGL apps (I did not use any before because of the crashes), and it took three days until I got my freeze :-( Again, nothing in the syslog or anywhere else.

Being more than two years old now, I guess this bug will never be found. The only solution seems to be to upgrade to other hardware or just not to use any opengl apps. Unfortunately, I am still very happy with my Thinkpad X31 otherwise. So I guess I will go without opengl for another two to three years, until there is a stronger reason to upgrade the hardware. Pity, I like compiz...

Thanks anyway and kind regards

Michael
Comment 22 Tormod Volden 2008-10-17 04:20:42 UTC
Michael, did you ever try turning off AGP? A wrong AGPMode setting, or using AGP at all, is known to cause system freezes. Add this to your Device section:
 Option "BusType" "PCI"
If that works out well, you can remove that line and instead try
 Option "AGPMode" "2"
or try 4 instead of 2.
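
For context, a minimal sketch of where that line sits in xorg.conf. The Identifier value and the Driver name are assumptions for illustration. Only the BusType and AGPMode options come from this thread.

```
Section "Device"
    Identifier "Radeon M6"        # hypothetical name; keep whatever your config already uses
    Driver     "radeon"           # assumption; "ati" loads the same radeon submodule
    Option     "BusType" "PCI"    # use the PCI GART path instead of AGP
    # Option   "AGPMode" "2"      # later alternative: remove BusType and try 2 or 4
EndSection
```

Only one of the two options should be active at a time, and X has to be restarted for a change to take effect.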
Comment 23 Michael 2008-10-17 04:50:58 UTC
Tormod, well, I am pretty sure that I have tried many xorg options when I first reported the bug (two years ago) including different AGPmodes, but I will try BusType PCI again.

Due to the infrequent nature of these freezes it may, however, take days or weeks until I can give feedback.

Thanks for your help,

Michael
Comment 24 Michael 2008-10-17 06:11:20 UTC
Hmm, one question: Is AGP only used in opengl apps? I guess not. It rather seems to be the basic hardware communication protocol, isn't it?

Because, without opengl I don't have any freezes of the machine, whatsoever. Only with opengl apps (e.g. compiz, googleearth) I occasionally experience these crashes. Wouldn't this rule out an effect of the option "BusType" "PCI"?
Comment 25 Tormod Volden 2008-10-17 06:27:09 UTC
AFAIK, the DRI is the only component that uses AGP transfers with the ATI cards. If you don't enable DRI, AGP is not used, only basic PCI over the AGP slot. 
Comment 26 Jérôme Glisse 2008-10-17 06:36:07 UTC
Using option PCI often helps in lockup cases; AGP is wacky.
And having freezes only with gl apps doesn't rule out this
option.
Comment 27 Alex Deucher 2008-10-17 07:20:55 UTC
PCI vs. AGP only affects the GART setup for GPU access to buffers in system memory (command buffers, vertex buffers, etc.).  AGP tends to be problematic.  The radeon PCI GART interface is usually pretty stable.
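
One possible way to see which mode the 3D driver came up in after changing BusType or AGPMode is to look at the OpenGL renderer string, which in comment 12 read "Mesa DRI Radeon 20060327 AGP 1x NO-TCL". The sketch below only extracts the "AGP Nx" token. agp_token is a hypothetical helper, and the idea that the token is simply absent in PCI mode is an assumption about this driver generation, not something confirmed in this thread.

```shell
# Sketch: pull the AGP mode token out of an OpenGL renderer string.
# agp_token is a hypothetical helper name.
agp_token() {
    # Prints e.g. "AGP 1x", or "no AGP token" when the string has none.
    token=$(printf '%s\n' "$1" | sed -n 's/.*\(AGP [0-9][0-9]*x\).*/\1/p')
    echo "${token:-no AGP token}"
}

# Usage on a live system (glxinfo prints an "OpenGL renderer string:" line):
#   agp_token "$(glxinfo | grep 'OpenGL renderer string')"
```

Run against the string from comment 12, this prints "AGP 1x".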