Bug 34772

Summary: [radeon] [R300] GPU lockups when KMS is enabled
Product: Drivers
Reporter: Rogério Brito (rbrito)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: REOPENED
Severity: normal
CC: alan, alexdeucher, rbrito, schwab, szg00000, xerofoify
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.38
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output right after the lock up, obtained via the network
A dmesg log from 2.6.39-rc7 showing problems.
The log of X with the 2.6.39-rc7 kernel
A dmesg log with 2.6.38 kernel
Log from X with the kernel 2.6.38
dmesg log with 2.6.39-rc7 with KMS + agpmode=-1 + no_wb=1
X log with 2.6.39-rc7 + KMS + agpmode=-1 + no_wb=1
Allow forcing on all GPU clocks

Description Rogério Brito 2011-05-09 23:26:44 UTC
Created attachment 57062 [details]
dmesg output right after the lock up, obtained via the network

Hi there.

I have been getting some Oopses/stack traces when I try to use my iBook G4 (with an "ATI Technologies Inc M11 NV/FireGL Mobility T2e" card) and I enable KMS.

The userland here is Debian unstable with the DRM from experimental, but I am willing to test anything that you would like me to.

For example, attached is the last of a series of such Oopses that I got when I tried to test if a video was playing or not with mplayer.

I tried to use 2.6.39-rc{5,6}, but upon boot I get messages telling me that there were failures and that hardware acceleration will be disabled, and all that I get is a desktop with distorted colors, as if there were some endianness issues.

This is, BTW, part of my attempts to get Linux running well on PowerPC, with some of my logs (with photos) present at my homepage:


    http://www.ime.usp.br/~rbrito/linux/debug-r300/

Please, if there is anything that I can provide to fix this, let me know and I will do my best.


Thanks, Rogério Brito.
Comment 1 Rogério Brito 2011-05-19 20:34:40 UTC
Just for the record, I can provide more of these messages: I can reproduce this whenever I like.

In fact, I am now able to reproduce it with kernel 2.6.38 if I boot the iBook G4 with the options:

"video=radeonfb:off radeon.agpmode=-1 radeon.modeset=1"

and play a video with mplayer.

If, OTOH, I leave off the KMS, then I don't get the GPU lockups that I reported.
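
When comparing runs like these, it can help to confirm which radeon options the running kernel actually received. The following is only a minimal sketch in Python, assuming the options appear on the kernel command line in the form quoted above; the helper name `radeon_params` is hypothetical and not part of any tool mentioned in this report.

```python
def radeon_params(cmdline):
    """Return the radeon.* options found in a kernel command line as a dict."""
    params = {}
    # Remove surrounding whitespace and the quotes used when the options
    # are written out in prose, then look at each space-separated token.
    for token in cmdline.strip().strip('"').split():
        if token.startswith("radeon.") and "=" in token:
            name, value = token[len("radeon."):].split("=", 1)
            params[name] = value
    return params


# Example with the options quoted in this comment.
print(radeon_params("video=radeonfb:off radeon.agpmode=-1 radeon.modeset=1"))
```

On a running system the same function could be fed the contents of /proc/cmdline, which holds the command line the kernel was booted with.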

Anyway, things are *way* better with 2.6.38 than with 2.6.39, as with 2.6.39 the kernel doesn't even get the colors correctly---everything that should be red becomes blue and so forth (any kind of endianness problem?).

I am attaching here another stacktrace, in case it helps.


Regards,

Rogério Brito.
Comment 2 Rogério Brito 2011-05-19 20:36:24 UTC
Created attachment 58602 [details]
A dmesg log from 2.6.39-rc7 showing problems.
Comment 3 Rogério Brito 2011-05-19 20:37:08 UTC
Created attachment 58612 [details]
The log of X with the 2.6.39-rc7 kernel
Comment 4 Rogério Brito 2011-05-19 20:38:14 UTC
Created attachment 58622 [details]
A dmesg log with 2.6.38 kernel

Please, notice the GPU hang with kernel 2.6.38.
Comment 5 Rogério Brito 2011-05-19 20:38:56 UTC
Created attachment 58632 [details]
Log from X with the kernel 2.6.38
Comment 6 Michel Dänzer 2011-05-20 12:11:38 UTC
(In reply to comment #1)
> Anyway, things are *way* better with 2.6.38 than with 2.6.39, as with 2.6.39
> the kernel doesn't even get the colors correctly---everything that should be
> red becomes blue and so forth (any kind of endianness problem?).

That's probably nothing to do with the kernel directly but endianness bugs in the X driver when acceleration is not available.

It would be interesting if you could bisect what broke acceleration with radeon.agpmode=-1. Note that you should boot with radeon.no_wb=1 as well for this, as CP writeback was only fixed during the 2.6.39 cycle (in commit dc66b325f161bb651493c7d96ad44876b629cf6a).
Comment 7 Michel Dänzer 2011-05-20 14:31:00 UTC
I was able to reproduce the acceleration initialization failure with the Debian 2.6.39-rc7-powerpc kernel, but not with a self-built 2.6.39 kernel. So this was probably just an intermittent problem during the 2.6.39 cycle, e.g. due to the intermittent broken usage of the DMA API by TTM.

As for the GPU lockups, does radeon.dynclks=1 help for those?
Comment 8 Andreas Schwab 2011-05-20 20:58:03 UTC
radeon.dynclks=1 causes the wrong resolution to be selected.  It thinks something is connected to the S-video port with a max resolution of 800x600, so it selects this instead of the native resolution (1024x768).

-<6>Console: switching to colour frame buffer device 128x48
+<6>[drm] crtc 1 is connected to a TV
+<6>Console: switching to colour frame buffer device 100x37

+(II) RADEON(0): Printing probed modes for output S-video
+(II) RADEON(0): Modeline "800x600"x59.9   38.25  800 832 912 1024  600 603 607 624 -hsync +vsync (37.4 kHz)
+(II) RADEON(0): Modeline "640x480"x59.9   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz)
+(II) RADEON(0): Modeline "320x240"x60.1   12.59  320 328 376 400  240 245 246 262 doublescan -hsync -vsync (31.5 kHz)
 (II) RADEON(0): Output LVDS connected
 (II) RADEON(0): Output VGA-0 disconnected
-(II) RADEON(0): Output S-video disconnected
+(II) RADEON(0): Output S-video connected
 (II) RADEON(0): Using exact sizes for initial modes
-(II) RADEON(0): Output LVDS using initial mode 1024x768
+(II) RADEON(0): Output LVDS using initial mode 800x600
+(II) RADEON(0): Output S-video using initial mode 800x600
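
The connection state of each output can also be pulled out of a full X log mechanically, which makes it easier to compare two boots. This is a rough sketch in Python, assuming the log contains lines in the "(II) RADEON(0): Output NAME connected" form shown in the excerpt above; the function name `output_states` is hypothetical, and it expects a plain log rather than a diff.

```python
import re

# Matches lines such as "(II) RADEON(0): Output LVDS connected".
OUTPUT_RE = re.compile(r"RADEON\(\d+\): Output (\S+) (connected|disconnected)$")


def output_states(log_lines):
    """Map each output name to True (connected) or False (disconnected)."""
    states = {}
    for line in log_lines:
        # Lines about initial modes do not end in a connection state,
        # so they are skipped by the pattern.
        match = OUTPUT_RE.search(line.rstrip())
        if match:
            states[match.group(1)] = match.group(2) == "connected"
    return states
```

If an output is reported more than once, the last report in the log wins.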
Comment 9 Rogério Brito 2011-05-21 09:16:28 UTC
Hi, Michel.

On Fri, May 20, 2011 at 12:11,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> --- Comment #6 from Michel Dänzer <michel@daenzer.net>  2011-05-20 12:11:38 ---
> (In reply to comment #1)
>> Anyway, things are *way* better with 2.6.38 than with 2.6.39, as with 2.6.39
>> the kernel doesn't even get the colors correctly---everything that should be
>> red becomes blue and so forth (any kind of endianness problem?).
>
> That's probably nothing to do with the kernel directly but endianness bugs in
> the X driver when acceleration is not available.

OK, then that's a separate issue. Good to know.

> It would be interesting if you could bisect what broke acceleration with
> radeon.agpmode=-1.

Oooh, I guess that I caused some confusion here, given our other
messages. To clear things up: when I use 2.6.38, it works mostly OK
if I use radeon.agpmode=-1. It is stable enough that I told you this
setting was OK. But, in fact, if I play a video with mplayer, then it
always (so far, 100% reproducible) causes those GPU lockups, although
the computer is still accessible via the network, so that I can take
the logs etc. If I use 1 instead of -1, then, even with kernel 2.6.38,
I get those lysergide-like :-) pictures that I put on my homepage (for
documentation purposes, I am thinking of uploading them here as
attachments, as I am quite short of space there).

With kernel 2.6.39, I have not been able to get anything working,
whether or not I pass any option to the kernel.

Summary:

* 2.6.38 with KMS and agpmode=-1: OK until I try to play a video,
then GPU lockups.
* 2.6.38 with KMS and agpmode=1: GPU lockups a few seconds after X
loads (it *does* show up, but locks up a few seconds later).
* 2.6.39 with KMS and agpmode=-1: Not OK, even if I don't use anything
accelerated (problems with colors and software rendering).

So, I am not quite sure if it would be the case of bisecting or, at
least, what would be a good starting point. I can, though, try to boot
with many other kernels to see if I can (provided that udev doesn't
stop me).

> Note that you should boot with radeon.no_wb=1 as well for

OK. I can try no_wb=1 with agpmode=-1 and report back in a few
moments, to see if the lockups are still there or not.

> this, as CP writeback was only fixed during the 2.6.39 cycle (in commit
> dc66b325f161bb651493c7d96ad44876b629cf6a).

Right. Thanks for that fix of yours (just read the commit).


Regards,
Comment 10 Rogério Brito 2011-05-21 09:23:27 UTC
Hi there.

On Sat, May 21, 2011 at 09:16,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> OK. I can try no_wb=1 with agpmode=-1 and report back in a few
> moments, to see if the lockups are still there or not.

Just for the record, 2.6.38 with KMS + agpmode=-1 + no_wb=1 still
locks up the GPU when I play a video with mplayer.

I will try with 2.6.39 with the same settings.


Thanks,
Comment 11 Rogério Brito 2011-05-21 09:34:20 UTC
Another test.

On Sat, May 21, 2011 at 09:23,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> On Sat, May 21, 2011 at 09:16,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
>> OK. I can try no_wb=1 with agpmode=-1 and report back in a few
>> moments, to see if the lockups are still there or not.
>
> Just for the record, 2.6.38 with KMS + agpmode=-1 + no_wb=1 still
> locks up the GPU when I play a video with mplayer.

Just for the record #2, 2.6.38 with KMS + agpmode=-1 + no_wb=1 +
dynclks=1 still locks up the GPU when I play a video with mplayer.

Besides that, like Andreas, with dynclks=1 the resolution is reduced
to be 800x600. I didn't have the opportunity to read the X logs
regarding the S-Video port, but, at least for the user, iBooks
(differently from PowerBooks) don't have user-accessible S-Video ports
(but this doesn't prevent Apple from having inutilized them somehow).


Thanks,
Comment 12 Rogério Brito 2011-05-21 09:42:12 UTC
On Sat, May 21, 2011 at 09:34,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> On Sat, May 21, 2011 at 09:23,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
>> On Sat, May 21, 2011 at 09:16,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
>>> OK. I can try no_wb=1 with agpmode=-1 and report back in a few
>>> moments, to see if the lockups are still there or not.
>>
>> Just for the record, 2.6.38 with KMS + agpmode=-1 + no_wb=1 still
>> locks up the GPU when I play a video with mplayer.

Wooow! Oopsen galore with 2.6.39 with KMS + agpmode=-1 + no_wb=1...
Five in a row.

OK, probably only the first one matters. Then, it stays there and
doesn't load the system... Actually, as I am writing this thing, after
about 180 seconds, the boot process is continuing and X is being
loaded, but with the wrong colors (the "endianness issue"). I will try
to see if the network is available and attach here what I get from
dmesg.

BTW, I hope that you don't mind me providing copious amounts of
testing here (and their results) in the hope to get this fixed... :-)
Comment 13 Rogério Brito 2011-05-21 09:47:45 UTC
Created attachment 58892 [details]
dmesg log with 2.6.39-rc7 with KMS + agpmode=-1 + no_wb=1
Comment 14 Rogério Brito 2011-05-21 09:50:35 UTC
Created attachment 58902 [details]
X log with 2.6.39-rc7 + KMS + agpmode=-1 + no_wb=1
Comment 15 Rogério Brito 2011-05-21 09:56:33 UTC
With 2.6.39-rc7 + KMS + agpmode=-1 + no_wb=1 + dynclks=1:

* I don't get the Oopsen.
* the resolution is restricted to 800x600.
* XV is not available to mplayer or other applications.

I think the XV extension not working is something that has always happened with 2.6.39 kernels.


Thanks,

Rogério Brito.
Comment 16 Michel Dänzer 2011-05-21 14:54:25 UTC
(In reply to comment #15)
> With 2.6.39-rc7 + KMS + agpmode=-1 + no_wb=1 + dynclks=1:
> 
> * XV is not available to mplayer or other applications.

When the kernel radeon driver fails to initialize acceleration, there's no point in trying any functionality that needs acceleration, such as XVideo.

I don't think there's any point doing any more tests with 2.6.39-rc7, as it's obviously suffering from additional issues which only occurred intermittently during the 2.6.39 cycle.


(In reply to comment #12)
> BTW, I hope that you don't mind me providing copious amounts of
> testing here (and their results) in the hope to get this fixed... :-)

Well, I'm afraid less quantity but more quality would be better... It's becoming rather difficult and time-consuming to find the relevant pieces of information in this mass.


(In reply to comment #11)
> Just for the record #2, 2.6.38 with KMS + agpmode=-1 + no_wb=1 +
> dynclks=1 still locks up the GPU when I play a video with mplayer.

Has either of you tried agpmode=1 dynclks=1? Does that increase stability at all?

> Besides that, like Andreas, with dynclks=1 the resolution is reduced
> to be 800x600. I didn't have the opportunity to read the X logs
> regarding the S-Video port, but, at least for the user, iBooks
> (differently from PowerBooks) don't have user-accessible S-Video ports
> (but this doesn't prevent Apple from having inutilized them somehow).

I thought there was some kind of multimedia adapter for the external output.

Anyway, it should be possible to override the incorrect output detection, either on the kernel command line with something like video=S-video-1:d or later in xorg.conf or during X runtime with something like xrandr.
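
For the X runtime route, a minimal sketch in Python that wraps the xrandr call might look as follows. It assumes the output is named "S-video", as it appears in the X log excerpt in comment 8; the real name should be checked with `xrandr --query` first, and the helper names here are hypothetical.

```python
import subprocess


def xrandr_off_command(output_name):
    """Build the xrandr invocation that turns the named output off."""
    return ["xrandr", "--output", output_name, "--off"]


def disable_output(output_name):
    """Run xrandr against the current X display; raises if xrandr fails."""
    subprocess.run(xrandr_off_command(output_name), check=True)
```

Calling `disable_output("S-video")` from within the X session would then be the runtime counterpart of the kernel command line override, under the naming assumption above.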

But really, we need to focus on one problem per bug report as much as possible, or things are getting out of hand.


(In reply to comment #9)
> So, I am not quite sure if it would be the case of bisecting or, at
> least, what would be a good starting point.

No, there's no point in bisecting, as that problem should be gone with 2.6.39 final.

> > Note that you should boot with radeon.no_wb=1 as well for
> 
> OK. I can try no_wb=1 with agpmode=-1 and report back in a few
> moments, to see if the lockups are still there or not.

no_wb=1 would only have been important for bisecting, to avoid the writeback endianness bug interfering.


P.S. beware of Debian package udev version 169-1: IME an initrd generated with that installed prevents the radeon module from being loaded automatically, and when trying to load it manually, it fails to load the CP microcode and consequently fails to initialize acceleration.
Comment 17 Rogério Brito 2011-05-21 15:34:37 UTC
Hi, Michel.

Thank you very much for the attention.

(In reply to comment #16)
> When the kernel radeon driver fails to initialize acceleration, there's no
> point in trying any functionality that needs acceleration, such as XVideo.

OK.

> I don't think there's any point doing any more tests with 2.6.39-rc7, as it's
> obviously suffering from additional issues which only occurred intermittently
> during the 2.6.39 cycle.

Right.

> Well, I'm afraid less quantity but more quality would be better... It's
> becoming rather difficult and time-consuming to find the relevant pieces of
> information in this mass.

Indeed, it is getting out of hand pretty quickly. Do you want me to give you some SSH access to this notebook? Or, if that's not feasible/useful, what would you like me to test as the next step, so that I avoid flooding you with so much data?

> Has either of you tried agpmode=1 dynclks=1? Does that increase stability at
> all?

I will try those. But with which kernel? I have been avoiding compiling a kernel nowadays, since they take ages on this notebook, but I can set up a cross-compilation environment, if necessary.

BTW, would you mind sharing your .config?

> I thought there was some kind of multimedia adapter for the external output.

The only external adapter is one to a VGA port. No traces of S-video here.

> But really, we need to focus on one problem per bug report as much as
> possible, or things are getting out of hand.

OK, I can file a separate bug for this S-Video issue, then.



Thank you so much for your patience,

Rogério Brito.
Comment 18 Alex Deucher 2011-05-21 15:38:57 UTC
Apple sells VGA to S-video adapters, so we list both connectors in the driver.
Comment 19 Rogério Brito 2011-05-21 15:42:21 UTC
(In reply to comment #18)
> apples sells VGA to s-video adapters, so we list both connectors in the
> driver.

Oh, sorry for the ignorance.
Comment 20 Michel Dänzer 2011-05-21 16:47:10 UTC
Created attachment 58922 [details]
Allow forcing on all GPU clocks

(In reply to comment #17)
> > Has either of you tried agpmode=1 dynclks=1? Does that increase stability
> > at all?
> 
> I will try those. But with which kernel?

2.6.38 should be fine for this test. But at some point it'll probably be useful for you to be able to try kernel patches. Once you've built a kernel, building the radeon module with a patch shouldn't take long.

E.g., you guys could try this patch, and booting with radeon.dynclks=0, which should force on all GPU clocks. Does that increase stability with agpmode=1 or agpmode=-1?
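
To double-check that a module option such as dynclks really took effect after booting, the value can be read back from sysfs. This is a hedged sketch in Python: it assumes the radeon module exposes its parameters read-only under /sys/module/radeon/parameters/, and the function name `read_module_param` is hypothetical.

```python
import os


def read_module_param(name, module="radeon", sysfs_root="/sys/module"):
    """Return a module parameter's current value as a string, or None if absent."""
    path = os.path.join(sysfs_root, module, "parameters", name)
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # The module is not loaded, or it does not export this parameter.
        return None
```

For example, `read_module_param("dynclks")` should return "0" after booting with radeon.dynclks=0, provided the assumption about the sysfs layout holds on the kernel being tested.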


> BTW, would you mind sharing your .config?

My .config still takes 1-2 hours to build on this 1.6 GHz PowerBook. If that could help you, please ask for it on the debian-powerpc list.
Comment 21 Michel Dänzer 2011-05-21 16:58:59 UTC
It would also be interesting if one of you could attach the dmesg output with agpmode=1.
Comment 22 Rogério Brito 2012-10-28 18:51:45 UTC
Just for the record,

I can still provide the information, as I am going to reinstall Linux on the iBook.


Thanks in advance,

Rogério Brito.
Comment 23 xerofoify 2014-06-25 02:02:45 UTC
This bug needs to be tested against a newer kernel to see if it's fixed.
Cheers Nick
Comment 24 Rogério Brito 2014-06-28 10:45:22 UTC
Hi, Nick.

On Jun 25 2014, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=34772
> --- Comment #23 from xerofoify@gmail.com ---
> This bug needs to be tested against a newer kernel to see if it's fixed.
> Cheers Nick

OK, I think that this may be easier to test than the previous issue, but, if
I recall correctly, this issue was so fragile that almost anything crashed
it.

Again, as in my other e-mail, please ping me if I don't respond, as I am
swamped with work.


Thanks,