Bug 5823

Summary: ATI Radeon R300 - complete freeze
Product: Drivers
Reporter: Vladimir Kondratiev (vladimir.kondratiev)
Component: Video(DRI - non Intel)
Assignee: Steven Christenson (exvor)
Status: REJECTED WILL_NOT_FIX
Severity: low
CC: airlied, akpm, benh, protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.15
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: my config for 2.6.15, My lspci

Description Vladimir Kondratiev 2006-01-04 08:47:20 UTC
Most recent kernel where this bug did not occur: 2.6.13
Distribution:
Fedora 4
Hardware Environment:
IBM T42p
Software Environment:
Vanilla kernel compiled with 
gcc (GCC) 4.1.0 20051222 (Red Hat 4.1.0-0.12)
Nothing running, idle system in login prompt.
Problem Description:
Within several minutes after boot, the kernel freezes. No oops message on the
console. The system does not react to SysRq. Reproducible 100% of the time.
Vanilla 2.6.13 (Linux version 2.6.13 (root@vkondra-mobl) (gcc version 4.0.1
20050822 (Red Hat 4.0.1-10))) runs without problems on the same system.
Steps to reproduce:
Boot and wait for several minutes.
I am ready to provide any additional information and do any experiments to
investigate this problem.
Comment 1 Vladimir Kondratiev 2006-01-04 12:21:43 UTC
Created attachment 6932 [details]
my config for 2.6.15
Comment 2 Vladimir Kondratiev 2006-01-04 12:24:41 UTC
I built gcc 4.0.2 (official stable release) from sources and recompiled the
2.6.15 kernel with it. Also, I removed the ieee80211* and ipw2200 modules since
they were not present in my 2.6.13.

Result is the same - complete freeze.

Any ideas what to do next?
Comment 3 Andrew Morton 2006-01-04 12:35:56 UTC
Can you enable the NMI watchdog, see if that catches anything?
Comment 4 Vladimir Kondratiev 2006-01-04 22:13:09 UTC
With NMI, no luck. On boot, the kernel says:

Local APIC disabled by BIOS -- you can enable it with "lapic"
mapped APIC to ffffd000 (01806000)

but if I append "lapic", the laptop won't boot. Of course, I tried with
nmi_watchdog=[12]; the NMI counter does not run, and in dmesg I see the message:

Testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!

I continued to play with modules, and I found the culprit:
if I remove the drm modules (drm.ko and radeon.ko), it works! I am writing this
on a 2.6.15 that has survived 6 hours. I hope it will work much longer.

Thus, either "radeon.ko" or "drm.ko" requires attention.
Comment 5 Andrew Morton 2006-01-04 22:26:30 UTC
cc'ing David...
Comment 6 Dave Airlie 2006-01-04 22:34:00 UTC
can you give me an lspci? my guess is this is caused by a newer kernel enabling
some feature in X and X crashing out ... you are running X? can you attach an
Xorg.0.log...

thanks.
Comment 7 Vladimir Kondratiev 2006-01-04 23:40:33 UTC
Created attachment 6937 [details]
My lspci
Comment 8 Dave Airlie 2006-01-04 23:48:48 UTC
yes you are using an M10, recent support for r300 chips was added to the kernel,
X.org may not be the most stable on these, even for 2D, with a DRM loaded.

Try removing the Load "dri" option from your xorg.conf and see if that helps..
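
For anyone who wants to script that change, here is a minimal sketch in Python. It assumes the module is loaded with a plain Load "dri" line, as in a typical xorg.conf Module section; the function name disable_dri is made up for illustration. It works on the file contents as a string and comments the line out rather than deleting it, so the change is easy to revert.

```python
import re

# Matches an active (uncommented) Load "dri" line; leading whitespace is allowed.
DRI_LOAD = re.compile(r'\s*Load\s+"dri"\s*', re.IGNORECASE)


def disable_dri(xorg_conf_text):
    """Return the xorg.conf text with every active Load "dri" line commented out.

    disable_dri is a hypothetical helper, not part of any X.org tooling.
    """
    out = []
    for line in xorg_conf_text.splitlines():
        if DRI_LOAD.fullmatch(line):
            # Prefix with '#', the xorg.conf comment character.
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"
```

Read the file, pass its contents through disable_dri, write the result back (keeping a backup copy), then restart X.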
Comment 9 Vladimir Kondratiev 2006-01-05 00:12:28 UTC
Dave,
I doubt it is X crashing. With 2.6.13, I run X for month with DRM enabled
without any problems. What is interesting is that drm with 2.6.15 reports 2 devices:
Jan  4 17:53:47 vkondra-mobl kernel: [drm] Initialized drm 1.0.0 20040925
Jan  4 17:53:47 vkondra-mobl kernel: ACPI: PCI Interrupt 0000:01:00.0[A] -> Link
[LNKA] -> GSI 11 (level, low) -> IRQ 11
Jan  4 17:53:47 vkondra-mobl kernel: [drm] Initialized radeon 1.19.0 20050911 on
minor 0: 
Jan  4 17:53:47 vkondra-mobl kernel: agpgart: Found an AGP 2.0 compliant device
at 0000:00:00.0.
Jan  4 17:53:47 vkondra-mobl kernel: agpgart: Putting AGP V2 device at
0000:00:00.0 into 1x mode
Jan  4 17:53:47 vkondra-mobl kernel: agpgart: Putting AGP V2 device at
0000:01:00.0 into 1x mode
Jan  4 17:53:47 vkondra-mobl kernel: [drm] Loading R300 Microcode

while on 2.6.13 it prints just
Jan  4 18:05:07 vkondra-mobl kernel: [drm] Initialized drm 1.0.0 20040925

Devices in question are:
00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 03)
        Subsystem: IBM Unknown device 0529
        Flags: bus master, fast devsel, latency 0
        Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Capabilities: [e4] Vendor Specific Information
        Capabilities: [a0] AGP version 2.0

01:00.0 VGA compatible controller: ATI Technologies Inc M10 NT [FireGL Mobility
T2] (rev 80) (prog-if 00 [VGA])
        Subsystem: IBM Unknown device 054f
        Flags: bus master, fast Back2Back, 66MHz, medium devsel, latency 66, IRQ 11
        Memory at e0000000 (32-bit, prefetchable) [size=128M]
        I/O ports at 3000 [size=256]
        Memory at c0100000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at c0120000 [disabled] [size=128K]
        Capabilities: [58] AGP version 2.0
        Capabilities: [50] Power Management version 2

It may be that drm is confused by the first one.
Comment 10 Vladimir Kondratiev 2006-01-05 00:14:40 UTC
You are correct, this is an M10. I'll try without "dri" in xorg.conf,
but it will be some time later. I have to take a break to do my main job.
Comment 11 Dave Airlie 2006-01-05 00:37:05 UTC
the 2.6.13 kernel didn't support the radeon M10 chip, so the drm module loads
and does nothing, the radeon module doesn't load.. you'll notice in the 2.6.15
case the radeon module actually reports some info...

So it is X that is crashing and it is because X is now using the DRM... 
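
A rough way to tell the two cases apart from a saved kernel log is to look for the radeon initialization message. The sketch below is in Python and assumes the message has the form shown in the 2.6.15 log earlier in this report ("[drm] Initialized radeon 1.19.0 ..."); the function name radeon_drm_version is made up for illustration.

```python
import re

# Message format taken from the 2.6.15 log quoted in this report.
RADEON_INIT = re.compile(r"\[drm\] Initialized radeon (\S+)")


def radeon_drm_version(log_text):
    """Return the radeon DRM version found in the log text, or None.

    None means the radeon module never reported initialization, which is
    the 2.6.13 situation described above. Hypothetical helper.
    """
    match = RADEON_INIT.search(log_text)
    return match.group(1) if match else None
```

A log that contains only the "[drm] Initialized drm ..." line gives None, meaning the core drm module loaded but radeon did not bind to the card.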
Comment 12 Vladimir Kondratiev 2006-01-07 11:08:16 UTC
Disabling DRI in xorg.conf works.

I changed the title to reflect the root cause. Also, severity is not "blocker"
any more since a simple workaround exists (disable DRI in xorg.conf).
Comment 13 Steven Christenson 2006-02-24 10:36:05 UTC
This problem is not resolved by the simple no-DRI option.


I'm running an ATI mobile Radeon X600. This is a PCIe card, and I get a complete
freeze with the x86 kernel 2.6.15.4.  This occurs with both the open source
radeon driver in Xorg and the proprietary ATI drivers.  Strangely, this does
not occur on Fedora 4 64-bit with their shipped kernel; of course, they have
DRI enabled in the kernel, which you need to disable to get 3D acceleration.
When I figure out how to recompile a kernel in Fedora I'll check this. BTW, I
can compile a kernel in Slackware and on my own LFS system, go figure :(  The
system I'm running on is fairly new as well; it's a Gateway MX7525, and the
specs are on their page. I have the timer issue as well, which is documented
here, with time running at 2x, and I suspect this may be why I get a total
freeze with X.  Running noapictimer as a kernel option disables hardware,
notably the PCMCIA device.

NOTE: Compiling the kernel from www.kernel.org on Fedora 4 64-bit causes a
freeze with the proprietary ATI drivers and with the open source radeon driver.
This is a total crash with no logs generated in the /var/log/Xorg.log files,
and the system is unresponsive to anything other than a reboot, AKA a kernel
panic or something similar. Another strange note is that the mouse still moves
even though the system responds to no commands and cannot be logged into
remotely.

Is there something that needs to be disabled when going through menuconfig, or
is this a real bug???  I'm clueless, and after 2 weeks and 4 distros I'm no
closer to a resolution, other than trying to fix the other issues in the
shipped Fedora 4 64-bit kernel.

thanks
Comment 14 Steven Christenson 2006-03-05 11:52:44 UTC
UPDATE 
------


To resolve this issue, do the following:

Go into menuconfig when building the kernel, disable the new Radeon
framebuffer driver, and use the old VESA one.

With this framebuffer driver on, using either radeon or fglrx in X will cause
the system to halt when starting X.  Simply recompile with the ATI Radeon
framebuffer driver off and it resolves the issue.
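
To check an existing kernel .config before rebuilding, something like the following Python sketch can help. It assumes the Radeon framebuffer option is CONFIG_FB_RADEON (the symbol quoted later in this report) and that disabled options are either absent or written as "# CONFIG_... is not set"; the function name radeonfb_setting is made up for illustration.

```python
def radeonfb_setting(config_text):
    """Return 'y' or 'm' if CONFIG_FB_RADEON is enabled in the .config text.

    Returns None when the option is absent or listed as
    '# CONFIG_FB_RADEON is not set'. Hypothetical helper.
    """
    for line in config_text.splitlines():
        line = line.strip()
        # The trailing '=' keeps CONFIG_FB_RADEON_I2C and similar from matching.
        if line.startswith("CONFIG_FB_RADEON="):
            return line.split("=", 1)[1]
    return None
```

A result of 'y' or 'm' means radeonfb is still built and the kernel needs to be reconfigured for this workaround.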
Comment 15 Andrew Morton 2007-01-31 00:48:49 UTC
+benh

Ben, do you still look after the radeon driver?  It's being bad.

Steven, is this bug still present in 2.6.20-rc7?
Comment 16 Steven Christenson 2007-01-31 13:17:44 UTC
Wow, ya know, I am not sure, as I have bought a new laptop that uses a non-ATI
chip because I am tired of fighting with Mesa for control of 3D.   I still do
have the original lappy that had this problem; I could load up a distro and try
with a new kernel if you want to know.  I submitted this almost a year ago.
Comment 17 Benjamin Herrenschmidt 2007-01-31 13:42:52 UTC
I must admit to slacking a bit there... too much stuff to do. I do have some
work-in-progress updates, bits from Solomon Peachy and some bits from myself, I
also need to merge in some updates from X (we found workarounds for various
issues in X that I never had the time to move to radeonfb).

I'm hoping to have some time in February or March to do some serious work on it,
but I can't promise. I would happily hand over the maintainership if we could
find somebody capable of taking that over though.

Comment 18 Andrew Morton 2007-01-31 13:58:35 UTC
Is OK - I'm not aware of many people hurting from this - one, or perhaps two (I'm
waiting to hear back from #2).

And no, I'm afraid fbdev developers aren't growing on trees, especially
after Tony's mysterious disappearance :(
Comment 19 Vladimir Kondratiev 2007-02-01 00:08:05 UTC
I am currently running the same hardware as in the original report, kernel
2.6.19, Xorg 7.1.1.
I have DRI enabled. Precisely, in .config
CONFIG_FB_RADEON=m
CONFIG_FB_RADEON_I2C=y
and
        Load  "dri"
in /etc/X11/xorg.conf
No hangs; however, I must note that Google Earth runs very slowly.
Comment 20 Natalie Protasevich 2007-07-04 12:27:15 UTC
Any updates on this problem?
Vladimir, how does it work for you now? Have you tried the latest kernels?
Thanks.
Comment 21 Steven Christenson 2007-09-06 17:51:13 UTC
I am changing the severity of this bug to low as it does not seem to impact very many users.  There also seems to be a lack of understanding of where exactly this is occurring, or even whether there are multiple things that cause this issue and on what type of platform they occur. I must admit that I myself have provided little information to help with this, and I apologize.  One thing I do know is that it does occur when the radeonfb driver is used along with the ATI drivers in X.org.  I am not 100% sure this is what the OP created this bug for, and it's possible that my experience should have been filed as a separate bug.  I have not tested this on recent kernels, and it's possible it is not a kernel bug but an X.org bug, or the other way around.

I will leave this open in case it is of any interest in the future, or I may close it if I file a more generic bug regarding this issue, which may be better as it could close this and some others that have the same issue.


If you have any thoughts please shoot me an email. 
 

 
Comment 22 Natalie Protasevich 2007-09-06 19:22:31 UTC
Yes, the bugs that appear in the OS and Xorg and their interaction are so subtle and hard to reproduce and debug. Is this freeze still happening with 2.6.23+?
Comment 23 Benjamin Herrenschmidt 2007-09-07 10:03:24 UTC
At this stage, I'm tempted to say radeonfb is a lost cause, and fixing it isn't a very productive use of anybody's time. The mode setting is moving from X.org into the DRM, and we'll probably end up merging the useful bits of radeonfb, such as power management (along with other improvements to those bits), with that new DRM mode setting & deprecating radeonfb altogether soon. That should get rid of all those nasty interaction issues.

Note also AMD/ATI's recent announcement about providing specs/docs & support to X.org developers, which means that we -might- get some help tracking down that sort of problem in the future.

Unfortunately, neither solution is very short term, but that's the best I can say at this stage, unless somebody with time to waste can try to port back some of the stuff in X.org to match the memory maps with radeonfb -and- debug all the regressions that such a thing would cause.
Comment 24 Steven Christenson 2007-09-07 19:41:33 UTC
I am not sure, Natalie, but I would guess that it would still be there, as there has been little to no change with this driver.  If you really wanna know, I can load up something on that laptop this weekend and give you some feedback.  I agree, Benjamin, there is a workaround and I would rather see work being done on the DRM than anything else.  I still find ATI drivers rather frustrating when dealing with systems that do not have them already set up, mostly because of the Mesa driver wanting to display 3D instead of the hardware.