Bug 19002 - Radeon rv730 AGP/KMS/DRM kernel lockup
Summary: Radeon rv730 AGP/KMS/DRM kernel lockup
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks: 16444
 
Reported: 2010-09-23 16:48 UTC by Duncan
Modified: 2010-12-15 22:19 UTC

See Also:
Kernel Version: 2.6.36-rc5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
workaround for 7xx AGP (2.04 KB, patch)
2010-12-06 20:00 UTC, Alex Deucher
lspci -vv on the VIA/K8 system (17.15 KB, text/plain)
2010-12-14 21:41 UTC, Holger Lenz

Description Duncan 2010-09-23 16:48:07 UTC
+++ This bug was initially created as a clone of Bug #17702 +++

This is a follow-on to bug #17702, which I filed, and bug #17201, which it was a dup of.  I mentioned in #17201 that the fix only fixed part of my problem, getting me farther into starting X/KDE, but I still end up with a crash, now worse, as while it was an X crash before but left the kernel running, now it's a hard kernel lock.  I asked if I should file another bug, or... and was told to file it, so here it is, tho it took some time to get back to it.

Hardware again:  Older dual-dual-core Opteron 290, AMD 8xxx chipset, Radeon hd4650/RV730 (AGP).

Software: Gentoo/~amd64 Linux, xorg-server 1.9.0, xf86-video-ati 6.13.1, gcc 4.5.1, kde (also) 4.5.1.

The kernel config is attached to the previous bug.

The current situation (as of 2.6.36-rc5 plus 49 commits):

When I start KDE, it now gets to the desktop, but, with my ordinary activity config, freezes almost immediately.  I traced that freeze down to a single plasmoid, the comic-strip plasmoid.  With it deleted or deconfigured so all it shows when I start kde is a configure button instead of trying to render a comic, I get a working but highly unstable X/KDE which tends to crash within a few minutes as I work with windows, etc.  If I hit that configure button and load a comic, it will appear to fetch it from the net, then immediately crash as it tries to render it, same as it does when it's configured at startup.  So trying to render a comic (any comic) in that plasmoid causes an immediate hard kernel lockup, but with the plasmoid disabled so it won't render a comic, the system is still very unstable and locks up within a few minutes.

That's with DRI enabled in xorg.conf.d.  If I uncomment the Disable "dri" line in the modules section, thus disabling DRI, I have a stable (but incredibly slow and boring) system.  So it's definitely DRI related.

Back on rc3 in connection with the previous bug, I reverted the commit in question (the bisected to commit), and again had a stable system.  I ran it with that commit reverted, for several days without rebooting, full DRI, etc, twice.  But without that revert but with the patch said to fix that bug, the system is as above, reliably crashing within a few minutes or almost immediately upon reaching the desktop if I have that plasmoid configured, if DRI is enabled.  It was that way with the patch applied directly to rc3, and it's still that way with a "pure" rc5+49, today.

After rc3 I ran with 44437579efca258e3c4a09f59838c8f933611990 reverted for some time, with the system stable for days.  Yesterday I updated and tested pure mainline again.  It still locked up, so I switched to my revert branch again.  There was a single conflict in drivers/gpu/drm/radeon/r600.c.  After resolving it, I built and rebooted, and that's what I'm running now.  It works fine as long as that revert and conflict resolution is applied...

Question:  In the commit I'm reverting ( 44437579efca258e3c4a09f59838c8f933611990 ), in a couple places, there's this:

if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740)) 

I believe I found where the families are defined in radeon_family.h, and the order is strange, 770 < 730 < 710 < 740, which explains the seemingly reversed logic in that if, but my chip is an RV730 (both as reported by the kernel, and based on the radeon manpage table entry for an hd4650).  Might it be on the wrong side of the if?  It looks to me like the ELSE is identical to the previous (working) behavior, so maybe my RV730 should be falling thru to the ELSE?

Otherwise... maybe they corrected the bug in the later production runs, or perhaps in the AGP bridge (if such would be possible) since I think it's native PCIE and requires one?  Is there a simple test I could run to see if that bug really does apply, and/or some serial/batch/revision number that could be used to distinguish between runs with and without the bug?

Because hardware bug or not, it sure seems like on my hardware it was working fine as it was, and now we're just screwing things up.
Comment 1 Rafael J. Wysocki 2010-09-23 18:27:12 UTC
What was the last known good kernel?

Have you updated the user space graphics components (the Xorg server, Mesa,
the Xorg radeon driver)?
Comment 2 Alex Deucher 2010-09-24 15:35:17 UTC
(In reply to comment #0)
> Question:  In the commit I'm reverting (
> 44437579efca258e3c4a09f59838c8f933611990 ), in a couple places, there's this:
> 
> if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740)) 
> 
> I believe I found where the families are defined in radeon_family.h, and the
> order is strange, 770 < 730 < 710 < 740, which explains the seemingly
> reversed
> logic in that if, but my chip is an RV730 (both as reported by the kernel,
> and
> based on the radeon manpage table entry for an hd4650).  Might it be on the
> wrong side of the if?  It looks to me like the ELSE is identical to the
> previous (working) behavior, so maybe my RV730 should be falling thru to the
> ELSE?

All r7xx asics are affected by that issue, including the rv730.
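
The range check quoted above can be modelled in a few lines of C. This is only a sketch, not the driver source: the enum is a hypothetical stand-in that reproduces nothing more than the ordering reported from radeon_family.h (770 < 730 < 710 < 740), the two placeholder entries are invented for illustration, and is_r7xx() is an illustrative helper name. Under that ordering RV730 falls inside the range, which matches the answer that all r7xx ASICs take the workaround path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the family enum; only the relative order
 * reported in this thread (RV770 < RV730 < RV710 < RV740) matters here. */
enum chip_family {
    CHIP_EARLIER_EXAMPLE,  /* placeholder for any pre-r7xx family (assumption) */
    CHIP_RV770,
    CHIP_RV730,
    CHIP_RV710,
    CHIP_RV740,
    CHIP_LATER_EXAMPLE     /* placeholder for any post-r7xx family (assumption) */
};

/* Same shape as the condition quoted in the report. */
static bool is_r7xx(enum chip_family family)
{
    return (family >= CHIP_RV770) && (family <= CHIP_RV740);
}
```

Calling is_r7xx(CHIP_RV730) yields true with this ordering, so an RV730 would not fall through to the else branch.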
Comment 3 Rafael J. Wysocki 2010-10-10 17:24:13 UTC
So, what's the status here?
Comment 4 Duncan 2010-10-12 14:45:49 UTC
I seem to have missed a couple email update notifications...

The status as of 2.6.36-rc7-149 (pulled a couple hours ago) is basically unchanged, if not slightly worse.  Without the revert, I still can't get into X/KDE.

The "slightly worse" is because I've updated to kde 4.5.2 now, and it won't take xorg.conf, section module, disable "dri", at all -- it crashes X instead, leaving me back at a KMS text mode command prompt.  I tried with earlier kernels that worked before (with kde 4.5.1 and no dri), too, and it still failed, so that's NOT a kernel issue, but it does mean I have no working method of testing no-dri at all.  I can get working dri on the old kernels (or the new one with that commit reverted), but can't test no-dri on any kernel, now.

The last tested good kernel was 2.6.35, as I thought was plain based on the bugs I noted it was a followup to.  I still get a working kernel with commit 44437579efca258e3c4a09f59838c8f933611990 reversed, basically, in drivers/gpu/drm/radeon/r600.c and etc,

WREG32(R_005480_HDP_MEM_COHERENCY_FLUSH_CNTL, 0x1);

instead of the if statements my hardware is obviously coming up on the wrong side of.

FWIW, 2.6.35.4 and 2.6.35.7, which I tested from the stable series, have the same problem.

I already mentioned that I'm running the latest kde 4.5.2, and you know my kernel is testing series.  xorg-server, mesa, libdrm, and xf86-video-ati are equally updated; I'm currently running xorg-server 1.9.0.901 (1.9.1-rc1), tho I tested 1.9.0 as well, mesa 7.9, libdrm 2.4.22, and xf86-video-ati 6.13.2.  gcc 4.5.1 is the current compiler, with a fresh full system rebuild a couple days ago, and I'm running glibc 2.12.1 (gentoo revision -r1). Some of those are updates from the original report back on bug #17702 which this is a clone of.

Possibly complicating factors include AGP, and AMD 8000 chipset so including agpgart IOMMU taking up part of the AGP window.

If all r7xx are affected, why on earth is it working for me with that commit reverted, and not working for me with it applied?  Based on the fact that my Radeon 4650 card is quite new, I'm still wondering if, somehow, whatever problem that commit I'm reverting is supposed to fix doesn't apply to this card revision.

Or perhaps it's a mixup with AGPGART.

I did note one of the recent drm/radeon commits updated an ID to categorize it correctly, but the retail package labeling, the kernel, and the radeon manpage, all seem to agree that the hd4650 I bought should be an rv730, so it'd seem the ID is right.  But the hardware sure doesn't like the stuff in that if statement I'm reverting, that's for sure, whatever the reason may be!

Meanwhile, if you can come up with a patch to give you information a bit more useful, or if you just need more dmesg output (beyond what's posted to the original bug), just ask and I'll do what I can.  I have been working some pretty crazy hours lately, however (in addition to those missed email notifications), the reason my followups haven't been as timely as I'd have liked.
Comment 5 Alex Deucher 2010-10-12 15:30:45 UTC
DRI is enabled by default in the xserver, and there's no way to disable it with kms.  I'm not sure what you were getting before without seeing your X log.  You can disable acceleration in X with:
Option "NoAccel" "TRUE"
in the device section of your xorg.conf

It could be general AGP problems.  Does disabling AGP help?  Boot with radeon.agpmode=-1
Comment 6 Duncan 2010-10-13 14:12:11 UTC
[Grumble!  My prefs are set to mail for all comment additions, but I'm not getting them ATM!  Is this a known bugzilla.kernel.org issue ATM?  Maybe it's my ISP eating them as spam?  Anyway, I know to check manually now.]

With KMS and previous xorg-server versions (1.9.0 at least, as mentioned on the previous bug), Option "NoAccel" in the device section didn't seem to do anything, but Disable "dri" in the Modules section did just that.  The xorg log confirmed it.

With the current setup, I was blaming it on kde 4.5.2 but maybe it's xorg 1.9.0.901, the Modules section Disable "dri" works (disables dri) according to the log, but I get either no X graphics or only a fraction of a second of graphics, before it switches back to the KMS mode text VT I was in previously.

And yes, I know dri was disabled previously because not only kde had no OpenGL effects, but glxinfo reported the mesa software rendering OpenGL fallback, not the radeon hardware OpenGL rendering I normally see.

Hmm.... but one thing different that I forgot about that I bet has something to do with it...  And I better check the variation against this bug now that I remember it, somewhere during my testing I commented the LIBGL_ALWAYS_INDIRECT=1 that I was previously exporting.  But I'd have not messed with it if I'd not had the reported bug to begin with, so I doubt it fixes the problem entirely, but it could well explain why I could boot to KDE with DRI disabled before, and I can't, now.

So I have some more config variant testing to do.

Meanwhile, is the radeon.agpmode=-1 boot parameter like the xorg.conf device section Option "AGPMode" "int" or Option "BusType" "string" that I used to use and that's still documented in the radeon manpage?  Do those do any good now with KMS or are they ignored in that case, with only the boot parameters having any effect?

Anyway, I'll try to test after some sleep.  The results I'd get ATM might be less than useful. =:^(
Comment 7 Alex Deucher 2010-10-13 15:12:16 UTC
(In reply to comment #6)
> 
> Meanwhile, is the radeon.agpmode=-1 boot parameter like the xorg.conf device
> section Option "AGPMode" "int" or Option "BusType" "string" that I used to
> use
> and that's still documented in the radeon manpage?  Do those do any good now
> with KMS or are they ignored in that case, with only the boot parameters
> having
> any effect?

Correct.  the agpmode module parameter is a combination of those two xorg driver options. -1 uses the on board GART mechanism rather than the northbridge provided AGP one. 1,2,4,8 specify the AGP mode when using the northbridge provided AGP GART.  When KMS is active, the AGPMode and BusType xorg.conf options are not used as KMS configures the hw independent of X.
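
As a rough illustration of the semantics described above, the following C sketch decodes an agpmode-style value. The struct and function names are hypothetical and do not come from the radeon driver, and treating any value other than -1, 1, 2, 4 or 8 as "use AGP and let the driver pick the rate" is an assumption made only for this example.

```c
#include <stdbool.h>

/* Illustrative result type; not a structure from the radeon driver. */
struct gart_choice {
    bool use_agp;   /* true: northbridge-provided AGP GART */
    int  agp_rate;  /* 1, 2, 4 or 8 when forced; 0 when not forced */
};

/* Interpret an agpmode-style value as described in the comment above.
 * Any other value leaves AGP enabled with no forced rate (assumption). */
static struct gart_choice interpret_agpmode(int agpmode)
{
    struct gart_choice c = { true, 0 };

    if (agpmode == -1) {
        c.use_agp = false;      /* on-board GART instead of the AGP one */
    } else if (agpmode == 1 || agpmode == 2 ||
               agpmode == 4 || agpmode == 8) {
        c.agp_rate = agpmode;   /* force this AGP rate */
    }
    return c;
}
```

With KMS active this choice is made from the kernel parameter alone, which is why the AGPMode and BusType options in xorg.conf have no effect.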
Comment 8 Duncan 2010-10-14 14:36:53 UTC
Thanks for the explanation.

LIBGL_ALWAYS_INDIRECT=1 didn't get back my working xorg.conf module-section disable dri, unfortunately.  I was hoping it would.  But I swear it disabled dri at the time of my earlier test, even with kms, with xorg-server 1.9, etc, now updated.

I'm double-checking that boot-param now, in order to test it.
Comment 9 Duncan 2010-10-14 18:58:11 UTC
Unfortunately, while I could verify from dmesg that the kernel was indeed using radeon.agpmode=-1, it didn't help. =:^(

I verified that radeon.agpmode=-1 DID work with the culprit commit reverted, just as normal mode did, so it wasn't some other problem with it.  And, for good measure I checked radeon.agpmode=4 as well.  As expected, it worked on the kernel with the revert, but didn't without the revert.  (This is still the rc7-149-g29979aa I'm testing, FWIW.)

Meanwhile, with -1, dmesg says PCIE mode.  That's PCI (not E) mode too, right?  Because this system's too old to have PCIE at all, tho it does have PCI-X.  Since it works on the revert-kernel I expect it's correct, but just verifying.

Also, GTT memory is 128 MB in normal mode, 512 MB with -1.  Again, that's normal, right?  (FWIW, 6 gigs system RAM, 1 gig vram, 256 MB AGP  aperture, split down the middle between AGPGART IOMMU and actual AGP usage.)

I tried a bunch of other stuff too, including radeon.modeset=0 (didn't work in X for either the good or bad kernel) on the bootline, xorg.conf SWCursor, color-tiling both on and off (no change to results, good kernel worked, bad didn't), etc.

What's next?
Comment 10 Duncan 2010-10-19 11:59:30 UTC
As of 2.6.36-rc8-20-g2d01971, the problem remains.
Comment 11 Duncan 2010-11-10 12:03:49 UTC
As of 2.6.37-rc1-170-gf6614b7, the problem remains.  Reverting 44437579efca258e3c4a09f59838c8f933611990, X/KDE (now kde 4.5.3 and xorg-server 1.9.2) runs.  With the straight Linus kernel, it freezes just as the desktop appears.

One thing I'm not sure I've mentioned that might or might not help.  kwin checks for supported OpenGL and (with the revert above, of course, as otherwise it crashes before I get to that stage) turns effects off initially.  I have to use the assigned effects toggle hotkey to turn them on.  This is apparently a known issue with the xorg/kernel-native radeon driver at this point, with kwin playing it safe by turning them off to begin with if it's not sure.  But effects work in general (with the occasional non-fatal bug, the invert effect simply turns everything white, instead of color-inverting like I believe it's supposed to do, for instance, but it's a nice effect if I want a bit more light without having to reach for the light switch =:^), once I turn them on.
Comment 12 Alex Deucher 2010-12-06 20:00:49 UTC
Created attachment 39182
workaround for 7xx AGP

Does this patch help?
Comment 13 Duncan 2010-12-08 11:04:26 UTC
(In reply to comment #12)
> Does this patch help?

Yes!  That patch eliminates the problem (and current git as of v2.6.37-rc5-26-g2cedcc4 still has the issue, without the patch).

As mentioned, this has affected stable 2.6.35 and all of 2.6.36 as well, so applying this to the stable tree for both should be considered as well as 2.6.37, once you're sure it's unlikely to cause other problems.

FWIW, with the patch applied, I'm getting the best and most glitch-free OpenGL accel I've ever seen on this hardware.  It's quite nice not to see or know of a single glitch.  Now /this/ is what I purchased this r7xx Radeon upgrade for!  =:^) 

And, even tho there were a couple glitches still and I had that one commit revert, I had an uptime of nearly a month on 2.6.37-rc1-170-gf6614b7, before I rebooted to test current, so everything's pretty stable too, it seems.  2.6.37 might well be the best kernel I've seen in awhile. Certainly, I don't believe I've seen that sort of stability just after rc1 for some time.  =:^)
Comment 14 Holger Lenz 2010-12-11 16:58:14 UTC
Tried the patch on 2.6.36.1-2 (RV730 AGP)
Not much luck:
With KMS the system will boot into a black screen and lock up.
With UMS (nomodeset in kernel command line) it will show X, but freeze while the window manager is loaded.
Guess I have to wait for 2.6.37 to become integrated into my distro.
Comment 15 Holger Lenz 2010-12-13 19:56:36 UTC
Did some more homework:
Compiled kernel 2.6.37 rc5, taking care to enable KMS (new driver) by setting DRM_MODESET_KMS=ON.
Unpatched and patched drivers show the same behavior:
1. Boot without nomodeset in the kernel cmd line: Blank screen before X is fully loaded.
2. Boot with nomodeset in the kernel cmd line: X loads with the radeon/R600 classic driver and is reasonably quick in 2D, but a lockup occurs sooner or later. Switching from X to fb terminal usually does the trick quickly. The patched driver seems to lock up more quickly.
Any ideas what I could do to help you fix the bug?
Hardware is similar to Duncan's: AMD K8, VIA K8T800 chipset, Radeon HD 4650 AGP (RV730)
 
Cheers Holger
Comment 16 Alex Deucher 2010-12-13 20:13:12 UTC
Holger, your issue may not be the same as Duncan's.  I'd suggest changing the AGP mode in your bios or adjusting it or disabling it via the kernel command line (radeon.agpmode=-1 to disable, 1,2,4,8 to force the agp mode). It might also be related to the latency setting on the host bridge.  VIA boards tend to set them pretty aggressively.
Comment 17 Holger Lenz 2010-12-13 21:00:06 UTC
(In reply to comment #16)
> Holger, your issue may not be the same as Duncan's.  I'd suggest changing the
> AGP mode in your bios or adjusting it or disabling it via the kernel command
> line (radeon.agpmode=-1 to disable, 1,2,4,8 to force the agp mode). It might
> also be related to the latency setting on the host bridge.  VIA boards tend
> to
> set them pretty aggressively.
You may be right: Still using the patched driver I set radeon.agpmode=-1 and the system came up with X using radeon and swrast. Seems like radeon.agpmode=-1 switches off dri / the r600 classic driver. A few switches to the fb console broke the vt session, but I was always able to switch back to X.
Unfortunately I cannot get AGP to work with the radeon driver. Using really conservative settings in the AGP bridge, the driver still doesn't boot without radeon.agpmode=-1. I used the following BIOS settings to get moderate timings
AGP Mode=4x
VLINK 8x disable
AGP 3.0 calibration cycle enabled
DBI output for AGP transactions disable

The hardware itself should be pretty stable - I ran a heavy Furmark stress test under windows to check for robustness.

Any ideas how to debug the AGP issue?
BR Holger
Comment 18 Holger Lenz 2010-12-13 21:52:09 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Holger, your issue may not be the same as Duncan's.  I'd suggest changing
> the
> > AGP mode in your bios or adjusting it or disabling it via the kernel
> command
> > line (radeon.agpmode=-1 to disable, 1,2,4,8 to force the agp mode). It
> might
> > also be related to the latency setting on the host bridge.  VIA boards tend
> to
> > set them pretty aggressively.
> You may be right: Still using the patched driver I set radeon.agpmode=-1 and
> the system came up with X using radeon and swrast. Seems like
> radeon.agpmode=-1
> switches off dri / the r600 classic driver. A few switches to the fb console
> broke the vt session, but I was always able to switch back to X.
> Unfortunately I cannot get AGP to work with the radeon driver. Using really
> conservative settings in the AGP bridge, the driver still doesn't boot
> without
> radeon.agpmode=-1. I used the following BIOS settings to get moderate timings
> AGP Mode=4x
> VLINK 8x disable
> AGP 3.0 calibration cycle enabled
> DBI output for AGP transactions disable
> 
> The hardware itself should be pretty stable - I ran a heavy Furmark stress
> test
New info before getting some sleep: The system was using swrast above because it didn't use KMS. If I force KMS by using radeon.modeset=1 on the kernel command line, radeon and r600 will be loaded even if I set radeon.agpmode=-1. The system will then freeze after loading X and showing a few windows. So the improvement w/o AGP is not freezing immediately.
Comment 19 Duncan 2010-12-14 03:05:58 UTC
Some additional notes, FWIW.  May help compare the two installations, mine now working with either the original reversion or the new patch, his, not.

1) I've been using 100% KMS since the choice was still in staging.  Last time I tried radeon.modeset=0 (see comment #9) I had problems and immediately switched back to KMS, which worked.  My conclusion then was that UMS was being disabled or at least strongly deprecated, which was fine with me as I prefer KMS, but it seems that was incorrect and it's still available.

2) No AGP fast-write here at all.  AFAIK it's a chipset errata on AMD 8xxx.

3) 256 MB AGP aperture, half of which is of course reserved for the AGPGART IOMMU on the AMD 8xxx chipset.  The BIOS allows larger (upto 1GB, which happens to be video memory size as well), but that was a reasonably late addition to the BIOS (256 was the previous limit), and for whatever reason I couldn't get it to work in testing, so I've stuck with 256 MB (128 MB of it AGPGART IOMMU), which /does/ work.

BTW, /should/ a larger aperture work?  Given that half of it's AGPGART IOMMU, should larger == better?  I've not tested larger recently, maybe it works now?

4) dmesg says AGP v3 device in 8x mode.  Years ago I had problems with 8x and had to use 4x max, but last I tried, 4x and 8x worked but 1x and 2x were broken.

5) I've been using the classic r6xx/r7xx driver, not the new gallium stuff, which I've not gotten to work yet.  (On Gentoo, I'm naturally compiling my own.  At one point it compiled but I wasn't aware then how to switch to it.  Later I figured that out, but had to turn off LLVM  to get it to compile at all, and I wasn't surprised when that didn't work.  So I've never actually had gallium up and running successfully.)

6) I was trying xorg-server 1.9.2.902 (AFAIK, 1.9.3-rc?), but am now back on 1.9.2 stable, as the pre-release throttled DOSBOX performance something terrible. (I only found out and downgraded yesterday, and haven't filed an X bug yet.)

7) Further to comment #0/Description, there was a kde image cache bug in early kde 4.5, fixed with 4.5.4 (maybe 4.5.3).  The comic-strip-plasmoid thing was related to that, I believe, and is now fixed.

8) It's a bit late to be mentioning this now but something else that could conceivably be a factor.  Apparently on this old hardware, since kernel 2.6.31, ACPI and various hardware sensor addresses conflict, and I have to boot with acpi_enforce_resources=lax or most of my sensors don't work.  I have that compiled in as part of my CONFIG_CMDLINE, so tend to forget about it.  See kernel bug #13939 .  The board was long since past its BIOS update period even back then and I've been unable to find documentation on what sensors drivers might work in place of the ones that broke, so lax enforcement remains my only known option if I want sensors output, which I do.  Might that somehow play into this graphics thing or are they entirely unrelated?

9) Further to comment #13.  I still can't get over how nice it is to (with the patch applied) have a properly glitch-free OpenGL accelerated X. This is the /first/ time I've had perfect glitch-free graphics since I upgraded from the old r2xx based card. In an improvement since comment #11, kwin now comes up in OpenGL mode without effects disabled (back then I had to manually enable them once up and running) and invert works fine now too.  2.6.37 has actually been very stable all around, without the usual assortment of other bugs in the md/raid system, etc, too.
Comment 20 Holger Lenz 2010-12-14 21:08:06 UTC
Thanks a lot Duncan, for all the info.
You helped me eliminate several variables.
AGP still won't work, but I'll just see how far I get with PCI(E) mode
I'm now always forcing KMS by using radeon.modeset=1

Looking at the dmesg, the following two errors seem to prevent X from loading the r600 module:

This (1)
 [drm] Forcing AGP to PCIE mode
[   17.311865] radeon 0000:01:00.0: PCI: Disallowing DAC for device
[   17.311867] radeon: No suitable DMA available.

and this (2)
[drm:radeon_ttm_backend_bind] *ERROR* failed to bind 1 pages at 0x00000000
[   17.485886] radeon 0000:01:00.0: object_init failed for (4096, 0x00000002)
[   17.485889] radeon 0000:01:00.0: (-12) create WB bo failed
[   17.485892] radeon 0000:01:00.0: disabling GPU acceleration
[   17.502518] radeon 0000:01:00.0: f7229e00 unpin not necessary

any ideas, how to fix these?
Comment 21 Alex Deucher 2010-12-14 21:17:59 UTC
(In reply to comment #19)
> 
> 2) No AGP fast-write here at all.  AFAIK it's a chipset errata on AMD 8xxx.

It's buggy on all boards.  We don't enable it at all with kms.

> 
> 3) 256 MB AGP aperture, half of which is of course reserved for the AGPGART
> IOMMU on the AMD 8xxx chipset.  The BIOS allows larger (upto 1GB, which
> happens
> to be video memory size as well), but that was a reasonably late addition to
> the BIOS (256 was the previous limit), and for whatever reason I couldn't get
> it to work in testing, so I've stuck with 256 MB (128 MB of it AGPGART
> IOMMU),
> which /does/ work.
> 
> BTW, /should/ a larger aperture work?  Given that half of it's AGPGART IOMMU,
> should larger == better?  I've not tested larger recently, maybe it works
> now?
> 

Bigger isn't necessarily better, it just means more pages can be mapped into the gart.  We default to 512 MB on non-AGP systems.

> 4) dmesg says AGP v3 device in 8x mode.  Years ago I had problems with 8x and
> had to use 4x max, but last I tried, 4x and 8x worked but 1x and 2x were
> broken.
> 

AGP v3 devices only support 4x and 8x modes.
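
A trivial C sketch of that rule, useful when deciding which radeon.agpmode values are worth trying on an AGP v3 setup. The function name is hypothetical, and the check covers only the v3 case stated above.

```c
#include <stdbool.h>

/* Rates an AGP v3 device can run at, per the statement above: 4x and 8x. */
static bool agp_v3_rate_supported(int rate)
{
    return rate == 4 || rate == 8;
}
```

On such a system, forcing 1x or 2x is expected to fail, which is consistent with the behaviour reported in comment #19.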
Comment 22 Alex Deucher 2010-12-14 21:22:30 UTC
(In reply to comment #20)
> any ideas, how to fix these?

Try adding radeon.no_wb=1 to your kernel command as well.

Also, some boards set unrealistic latency settings on the AGP bridge the card is plugged into.  Often increasing the latency with setpci will improve stability.
lspci -vv will show you what it's currently set to.
Comment 23 Holger Lenz 2010-12-14 21:40:30 UTC
> Try adding radeon.no_wb=1 to your kernel command as well.
> 
> Also, some boards set unrealistic latency settings on the AGP bridge the card
> is plugged into.  Often increasing the latency with setpci will improve
> stability.
> lspci -vv will show you what it's currently set to.

No luck with radeon.no_wb=1, unfortunately.
Attached lspci -vv - no idea if the timings are aggressive or not.
Comment 24 Holger Lenz 2010-12-14 21:41:46 UTC
Created attachment 40192
lspci -vv on the VIA/K8 system
Comment 25 Alex Deucher 2010-12-14 21:44:12 UTC
Look fine:

00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
	Subsystem: ASUSTeK Computer Inc. A8V Deluxe
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 64


Some set the latency to really low values like 8.
Comment 26 Holger Lenz 2010-12-14 22:11:15 UTC
(In reply to comment #20)
> and this (2)
> [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 1 pages at 0x00000000
> [   17.485886] radeon 0000:01:00.0: object_init failed for (4096, 0x00000002)
> [   17.485889] radeon 0000:01:00.0: (-12) create WB bo failed
> [   17.485892] radeon 0000:01:00.0: disabling GPU acceleration
> [   17.502518] radeon 0000:01:00.0: f7229e00 unpin not necessary
> 
So, radeon.no_wb=1 didn't help. Is there a way to get enough debugging info to nail the bug / create a workaround?
Thx and goodnight Holger
Comment 27 Alex Deucher 2010-12-15 00:11:29 UTC
Can you attach your dmesg with agp enabled using 2.6.37 with the patch on this bug applied?  Also, what agp aperture sizes were available in the bios? Can you try changing the agp size in the bios and see if that helps?
Comment 28 Duncan 2010-12-15 06:07:51 UTC
(In reply to comment #21)
> (In reply to comment #19)
> > 
> > 2) No AGP fast-write here at all.  AFAIK it's a chipset errata on AMD 8xxx.
> 
> It's buggy on all boards.  We don't enable it at all with kms.

Is there, anywhere, any documentation with reasonable information on what settings in the xorg.conf and radeon manpages no longer apply in xorg.conf with kms, and what the parallel radeon.* kernel commandline options are?

Either updating the manpages, or a separate kms manpage (with suitable SEE ALSO section mentions from existing manpages), would be /wonderful/.  Or a kernel Documentation/radeon.txt or kms.txt file.  (And, couldn't all the kernel display/graphics related Docs be moved into a display/graphics/video subdir?  Seems about time.)

Because I really feel like a blind man simply poking things to see what happens, reverse engineering what shouldn't need to be, if you will.  Plus chance comments like this, giving me info I had no idea about.  Surely I'm not alone, and a good many "bugs" could be eliminated if people only knew about various available settings.

> > BTW, /should/ a larger aperture work?  Given that half of it's AGPGART
> IOMMU,
> > should larger == better?
> 
> Bigger isn't necessarily better, it just means more pages can be mapped into
> the gart.  We default to 512 MB on non-AGP systems.

Thanks.  Useful information.

> > 4) dmesg says AGP v3 device in 8x mode.  [L]ast I tried,
> > 4x and 8x worked but 1x and 2x were broken.
> 
> AGP v3 devices only support 4x and 8x modes.

Now if only that were documented in the manpages... =:^(
It does explain why modes below 4x wouldn't work, tho. =:^)
Comment 29 Duncan 2010-12-15 12:37:52 UTC
The patch above was committed as f3886f85:

commit f3886f85cfde578f1d0ba6e40ac5f9d70043923b
Author: Alex Deucher <alexdeucher@gmail.com>
Date:   Wed Dec 8 10:05:34 2010 -0500

    drm/radeon/kms: don't apply 7xx HDP flush workaround on AGP

    It should be required for all 7xx asics, but seems to cause
    problems on some AGP 7xx chips.

    Fixes:
    https://bugzilla.kernel.org/show_bug.cgi?id=19002

    Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
    Reported-and-Tested-by: Duncan <1i5t5.duncan@cox.net>
    Cc: stable@kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>
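
The effect of the commit, as its subject line describes it, can be sketched as a small decision function in C. This is a simplified model under stated assumptions, not the code from r600.c: struct gpu_dev, its fields, the enum, and pick_hdp_flush() are all hypothetical names, and the real driver may test further conditions that are not shown in this thread.

```c
#include <stdbool.h>

/* Hypothetical, simplified device description for illustration only. */
struct gpu_dev {
    bool is_r7xx;  /* family lies in the RV770..RV740 range */
    bool is_agp;   /* card is attached via AGP */
};

enum hdp_flush_method {
    HDP_FLUSH_REGISTER_WRITE,  /* older method: write the HDP flush-control register */
    HDP_FLUSH_R7XX_WORKAROUND  /* 7xx-specific path introduced by 44437579 */
};

/* After the fix, the 7xx workaround is skipped on AGP cards. */
static enum hdp_flush_method pick_hdp_flush(const struct gpu_dev *dev)
{
    if (dev->is_r7xx && !dev->is_agp)
        return HDP_FLUSH_R7XX_WORKAROUND;
    return HDP_FLUSH_REGISTER_WRITE;
}
```

For an AGP RV730 such as the one in this report, the function returns HDP_FLUSH_REGISTER_WRITE, which is the same behaviour the manual revert produced.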


Re-verified working.

I'm updated and on the mainline Linus kernel without patches again as of v2.6.37-rc5-332-g0fcdcfb.  Nice to be back, as I've been unable to go without first the revert and then the patch above, since shortly after 2.6.35.  =:^)

The bug as filed can thus be closed, from my viewpoint.  But as there's another one being investigated, I'll leave it open for now.

Should the remaining open bug be re-filed as a separate bug now and this one closed?  If it is, feel free to CC me if desired.  It's possible the contrast between my working system and HL's not working system might continue to be helpful.
Comment 30 Holger Lenz 2010-12-15 13:30:45 UTC
(In reply to comment #29)
> The patch above was committed as f3886f85:
...
> Should the remaining open bug be re-filed as a separate bug now and this one
> closed?  If it is, feel free to CC me if desired.  It's possible the contrast
> between my working system and HL's not working system might continue to be
> helpful.
I suggest to close the current one and open a new one - there are actually 2:
- Radeon AGP broken on my system and
- Driver not loading (in PCI mode) because of the ttm_backend_bind issue
May even be worth opening 2 new bugs unless you think they are closely related.
Depends on your defect management process.
Alex?
Comment 31 Alex Deucher 2010-12-15 15:45:13 UTC
(In reply to comment #28)
> Is there, anywhere, any documentation with reasonable information on what
> settings in the xorg.conf and radeon manpages no longer apply in xorg.conf
> with
> kms, and what the parallel radeon.* kernel commandline options are?
> 
> Either updating the manpages, or a separate kms manpage (with suitable SEE
> ALSO
> section mentions from existing manpages), would be /wonderful/.  Or a kernel
> Documentation/radeon.txt or kms.txt file.  (And, couldn't all the kernel
> display/graphics related Docs be moved into a display/graphics/video subdir? 
> Seems about time.)
> 

I updated the radeon man page last week to reflect what options are KMS vs. UMS.  Grab the latest xf86-video-ati from git master.

> 
> > > BTW, /should/ a larger aperture work?  Given that half of it's AGPGART
> IOMMU,
> > > should larger == better?
> > 
> > Bigger isn't necessarily better, it just means more pages can be mapped
> into
> > the gart.  We default to 512 MB on non-AGP systems.
> 
> Thanks.  Useful information.
> 
> > > 4) dmesg says AGP v3 device in 8x mode.  [L]ast I tried,
> > > 4x and 8x worked but 1x and 2x were broken.
> > 
> > AGP v3 devices only support 4x and 8x modes.
> 
> Now if only that were documented in the manpages... =:^(
> It does explain why modes below 4x wouldn't work, tho. =:^)

It's not really radeon specific, but it would probably be nice to mention it somewhere.

(In reply to comment #30)
> I suggest to close the current one and open a new one - there are actually 2:
> - Radeon AGP broken on my system and
> - Driver not loading (in PCI mode) because of the ttm_backend_bind issue
> May even be worth opening 2 new bugs unless you think they are closely
> related.

Sure.
Comment 32 Florian Mickler 2010-12-15 20:14:51 UTC
If you do so, please also post references to the two other bugreports here and mention this bug in the 2 new bugreports. 

I'm closing this as fixed then.
Comment 33 Holger Lenz 2010-12-15 20:55:33 UTC
YES! I can confirm the fix.
I have the radeon+r600 driver working in AGP mode stable now.
The remaining issues I have been seeing are related to loading the driver:
If I do the following procedure, then radeon/r600 works no matter what I do to memory holes or AGP speeds:
- Boot to runlevel 3 (ugly 4 char fbcon)
- log in as root
- modprobe radeon (get nice 132 char fbcon)
- init 5
- enjoy X in native resolution and glxgears with 540fps 

Question is, why is radeon not loaded before X is started? Is there a conflicting framebuffer device (I have VGA framebuffer compiled in, no modules)? Can I force loading radeon on the kernel command line?
Comment 34 Alex Deucher 2010-12-15 22:02:10 UTC
(In reply to comment #33)
> Question is, why is radeon not loaded before X is started?

Depends on your distro's initrd and modprobe config.  It should load automatically at boot on most recent distros that support kms.
Comment 35 Holger Lenz 2010-12-15 22:19:00 UTC
(In reply to comment #34)
> (In reply to comment #33)
> > Question is, why is radeon not loaded before X is started?
> 
> Depends on your distro's initrd and modprobe config.  It should load
> automatically at boot on most recent distros that support kms.
OK, I'll just add radeon and any helpers it needs to the initrd and be done with it.

The situation is as follows:
- in AGP mode radeon needs to be loaded manually before X starts (then uses r600). Probably tries to do this automatically and fails somehow,
- in PCI mode, X with radeon will be loaded automatically (!) The error described above (ttm_backend_bind) disables r600 and makes X use radeon+swrast

I have not figured this out yet - what to complain about in a new bug report.
