Bug 14535
Summary: | Memory corruption detected in low memory | ||
---|---|---|---|
Product: | Drivers | Reporter: | erki85 |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | airlied, alan, alexdeucher, auxsvr, gary.pajer, hmh, jcristau, matifamin, razamatan, secsaba, shy, torsten |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31-5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel patch to block bad behaviour |
Description
erki85
2009-11-03 04:17:14 UTC
Same problems with newer xorg and xf86-video-ati. With KMS enabled no such things happen, but everything is red in games and even glxgears.

We have multiple reports of filesystem corruption on DRI radeon. Kernels 2.6.30 and later with mesa 7.6 (gallium disabled) cause the issue. Kernel 2.6.30 with mesa 7.5 doesn't cause the issue. It is likely this bug.

What is the status of this issue? It causes filesystem corruption and data loss, so it is a very nasty one. Related (gives more data about the bug, and the reporter is a potential tester for fixes): http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=550977

I still have this problem as well, but I originally ran into this with a 2.6.24 kernel (with OpenVZ support). I first blamed it on my old + patched kernel, but upgrading the kernel did not resolve the problems. So I'd expect it to be either a bug in Mesa or some kernel bug only exposed with latest Mesa changes.

Can you try mesa git master or the 7.7 branch? Dave fixed some potential issues there:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=554043bff72ced41b2a5e03e61cbc087bb41bd3d
http://cgit.freedesktop.org/mesa/mesa/commit/?id=42f2880ffd0b847df7cb56b7f7f0747287e0b08f

I tried mesa 7.7 on Kubuntu 9.10, kernel 2.6.31-17, and the bug persists there. See details in Ubuntu bug report #474928: https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/474928?comments=all

Did you confirm the patches were in that build?

No. I didn't know to do that. How can I check? In the meantime, I also tried mesa 7.6.0 (default Ubuntu 9.10 package) with kernel 2.6.32-02063203-generic. No change.

Tried mesa-git (+ati-dri-git and libgl-git) with xf86-video-ati-git today. Xorg 1.7.3.902. With KMS enabled, glxgears's gears are red and green again (no blue) and running etracer just produces a blank screen, although I can restart X and continue working normally. With KMS disabled, glxgears no longer displays the error message about radeon_tcl.c, but running etracer locks up the system.
I can sort of see the first screen, which has "Press any key to continue", but the rest that should be there is garbled. When I press any key X stops and I cannot close it. REISUB works though this time (didn't before). When I ran HOMM4 in wine, I got the usual "Memory corruption detected in low memory" thingie. Couldn't restart X, couldn't REISUB.

I tried mesa 7.7-1 (from Debian experimental) on Debian sid with kernel 2.6.32-trunk-686. With RV200 (uses R100 microcode) glxgears and etracer work fine (no system lockup). But with RV280 (uses R200 microcode) glxgears locks up the system. I also tried mesa 7.6.1-1 (from Debian sid). With both RV200 and RV280 glxgears locks up the system.

> --- Comment #9 from Shyouzou Sugitani <shy@m3.catvmics.ne.jp> 2010-01-19 01:26:24 ---
> I tried mesa 7.7-1 (from Debian experimental) on Debian sid with kernel
> 2.6.32-trunk-686.
fwiw these mesa packages are built from mesa_7_7_branch commit 6d6c9c66.
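One way to approach the "How can I check?" question above, given that the build commit is now known: ask git whether each fix commit is an ancestor of the commit the packages were built from. This is a minimal sketch, assuming a local clone of the mesa repository; the helper names and the `repo` path are hypothetical. A fix that was cherry-picked onto the 7.7 branch gets a new commit id, so a negative answer does not prove the fix is missing.

```python
import subprocess

# Commit ids quoted in this report: the two Mesa fixes and the packaged build.
FIX_COMMITS = [
    "554043bff72ced41b2a5e03e61cbc087bb41bd3d",
    "42f2880ffd0b847df7cb56b7f7f0747287e0b08f",
]
BUILD_COMMIT = "6d6c9c66"


def ancestor_check_cmd(fix, build, repo="."):
    """Build the git command that tests whether `fix` is an ancestor of `build`."""
    return ["git", "-C", repo, "merge-base", "--is-ancestor", fix, build]


def build_contains(fix, build, repo="."):
    """Return True only if git reports `fix` as an ancestor of `build`.

    git exits with 0 for "is an ancestor", 1 for "is not", and another
    non-zero status on errors (for example an unknown commit id), so
    anything other than 0 is treated as "not confirmed".
    """
    result = subprocess.run(ancestor_check_cmd(fix, build, repo),
                            capture_output=True)
    return result.returncode == 0
```

Calling `build_contains(fix, BUILD_COMMIT, repo="/path/to/mesa")` for each entry in `FIX_COMMITS` would then report which fixes are reachable from the packaged commit; the path is a placeholder for wherever the clone lives.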
Tried 2.6.33-rc4 with the same software as in my previous comment and now everything seems to be a bit better. KMS works, glxgears has normal colours again. Couldn't test wine+HOMM though because it didn't start. Hitman2 worked though. Etracer started normally, but when clicking the Play button, it exited with "drmRadeonCmdBuffer: -12. Kernel failed to parse or rejected command stream. See dmesg for more info."; dmesg says "[drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation !". But I haven't yet got memory or fs corruption with that setup. With KMS disabled, though, the etracer first screen still doesn't look normal and after a keypress the system hangs (REISUB out, no corruption).

Actually never mind my last two comments. Turns out that the difference was between early and late start KMS. So it seems the kernel upgrade did not change anything either.

Has anyone had success by reducing color depth to 16? I just did that, and I got rid of this (when running glxgears):

*********************************WARN_ONCE*********************************
File radeon_tcl.c function radeon_run_tcl_render line 499
Rendering was 405 commands larger than predicted size. We might overflow command buffer.
***************************************************************************

Honestly, I'm afraid to run googleearth. I've fsck'ed the system so many times ... I'm worried that my nine lives may have run out.
:)

Additional Comment to #14: Here's the relevant part of my xorg.conf

Section "Monitor"
    Identifier "Monitor0"
    VendorName "Monitor Vendor"
    ModelName "Monitor Model"
EndSection

Section "Device"
    Identifier "Card0"
    Driver "radeon"
    VendorName "ATI Technologies Inc"
    BoardName "Radeon Mobility M7 LW [Radeon Mobility 7500]"
    BusID "PCI:1:0:0"
    Option "AccelMethod" "EXA"
    Option "DRI" "true"
EndSection

Section "Screen"
    Identifier "Screen0"
    Monitor "Monitor0"
    Device "Card0"
    Defaultdepth 16
    Virtual 1024 768
EndSection

Same corrupted filesystem problem with Radeon Mobility 7500 on a Thinkpad T42 laptop:
    kernel - 2.6.32
    mesa - 7.7
    xf86-video-ati - 6.12.4
    ati-dri - 7.7
With bpp 16 the display is very dim (but the cursor is fully bright) and the problem is still there. With bpp 15 the problem is gone, but video players display only a green window. IMHO this is a SEVERE bug and a big regression if a user-space graphical application that overwrites memory and causes filesystem corruption can bring the whole system to its knees.

The Severity of this bug should be changed from Normal to at least High, as it causes both system crashes and data loss. Perhaps it should be changed to Blocking, as it is impossible to do any 3D graphics development on affected systems. I develop apps that use 3D graphics, so I am blocked. AFAICT, someone with authority must do this.

After one month of tip-toeing around in my system, taking care not to use anything that uses 3D, yesterday I "accidentally" created a 3D plot; this bug corrupted my file system, the computer crashed, and I had to run fsck from a liveCD. I am a scientist, and I need 3D capabilities. I'm considering a switch to BSD, but that sounds painful.

Yes, it should have a higher priority. But setting it would be pointless, since it is not even assigned to anyone: it would not make anyone move faster to fix the problem. I would raise it anyway, but I don't have enough 'bugzilla powers' to do it.
What you can do is to file bugs in the *distros* for which there are not any bugs about this yet, and mark *those* as critical (causes filesystem corruption) and blocking. That might save someone's data.

It's assigned to the non Intel DRI list, which means ATI should be picking up on it. Unfortunately bugzilla has no magical powers to make them care or notice it. Your distro however may well do, or may well employ folks working on this.

I've tried to reproduce this locally and haven't had any luck using Fedora running in user mode setting. It would help if someone could try with an old kernel and see it still happens.

Alternatively can someone test this with
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-radeon-testing

We haven't touched the non-kms radeon driver much. The only thing that seems to be happening is that mesa is getting better at creating fully used command buffers and the kernel has gotten worse at giving out large kmallocs (64k). There are fixes for this in drm-radeon-testing to avoid the larger mallocs, which will hopefully help in some way. I'll see if I can spot a codepath where we send crap to the GPU, but we currently test the offsets userspace gives us and validate them for crap to avoid just this.

(In reply to comment #20)
> I've tried to reproduce this locally and haven't had any luck using Fedora
> running in user mode setting. It would help if someone could try with an old
> kernel and see it still happens.
>
The user at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=550562#136 reports kernel BUGs with 2.6.26 and mesa 7.6 (log excerpt at http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=136;filename=kernel.oops;att=1;bug=550562). I've asked him to try drm-radeon-testing.

Is anyone who is seeing this using page flipping?
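The check mentioned above (testing the offsets userspace supplies before anything reaches the GPU) amounts to a bounds test. The following is a rough sketch of that idea only, not the radeon driver's actual code; the function and parameter names are hypothetical, and the "aperture" stands for whatever address range the GPU is allowed to touch.

```python
def offset_is_valid(offset, size, aperture_start, aperture_size):
    """Return True if [offset, offset + size) lies wholly inside the aperture.

    All arguments are plain integers (byte addresses and byte counts).
    Illustrative only: a real kernel check would also have to guard
    against integer overflow in fixed-width arithmetic.
    """
    if size <= 0 or offset < 0:
        return False
    end = offset + size
    aperture_end = aperture_start + aperture_size
    return aperture_start <= offset and end <= aperture_end
```

A request that points below the start of the permitted range is rejected, which is the kind of stray write into low memory this report is about; a request that starts inside the range but runs past its end is rejected as well.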
I'm still not having any luck here reproducing it on a T42 with mesa 7.6.1 from the branch and a 2.6.32.3 kernel.

It would be really nice if the people seeing hangs and crashes that aren't
    (a) filesystem corruption
    (b) low memory corruption
could not talk about it any more.

Then it would really help if we can get someone running
    git version of the kernel (preferably from drm-radeon-testing)
    libdrm/mesa from git master
    ati ddx from git master
to reproduce either issue a or b with open software (i.e. Windows games aren't something I own). Then tell me what you did and I'll go figure it out.

I'm more than willing to help, but I need some help with implementation and reporting. And it'll take a couple of days to find time.

Me, summary: Kubuntu 9.10, Thinkpad T41, Radeon Mobility 7500. Excerpt from xorg.conf in Comment #15 above. I've tried mesa 7.6.0, 7.7, and 7.5, and kernels 2.6.32, 2.6.31-19, 2.6.31-14. Mostly from packages that someone else compiled, and not every possible combination. I get file system corruption almost always. The exception: Mesa 7.5 appears to be corruption-free, but it introduced something annoying but unrelated to this issue ... can't recall what right now. I'm considering downgrading to 7.5 again.

1. Page flipping: I'm using EXA, so I think that means no page flipping.
2. Need an open app? I've gotten crashes with just about every 3D program I've tried. For testing, I've been using google earth and vpython (www.vpython.org). I'll find another if you want.

Okay, I've reproduced it with Google earth, thanks to Gary for pointing it out. I've tested on drm-radeon-testing and it doesn't happen, so the upstream fixes for the buffer allocation failing must have fixed it. However that patch is majorly intrusive, so I'll see if I can actually figure out what is going wrong and fix that for stable.

Created attachment 25180
kernel patch to block bad behaviour
Block the badness from mesa.
Please test the patch in #26, it should fix it. Not even d-r-t fixes it, contrary to what was previously reported.

Dave

Also xscreensaver-demo is causing file-system corruption. Does this kernel patch fix that issue as well? Will this patch be applied to the 2.6.32 kernel? As Arch Linux is a rolling distro I can try it if so.

mesa fixes are now in the mesa master and mesa 7.7 branches.

Simon in comment #28, yes, any GL app that causes low memory corruption should be fixed by this. It'll go into the stable kernel as well once I push it.

Ouch. Dave, is there any possibility that userspace-generated content (GL requests, textures, whatever) could influence the bogus DMA details, so that it could be used to, e.g., overwrite specific areas of memory?

The drm has a command stream checker to keep this from happening, but there can be obscure cases like this one that are not immediately obvious.

give me the drum -s report so that i can investigate internal of hardware structure... Your DMA controller might be doing some wrong work.. waiting for your Response

i just noticed this behavior on 2.6.38 and 2.6.39 on my gentoo box. i have an rs690 chipset, run kms, use the radeon kernel drivers and run compiz on xorg. b/c i run compiz, all i have to do is be in an X session and do stuff and eventually my root partition (ext3, but moved to ext4 after trying to reformat after a full wipe of the disk... i was worried the disk was going bad) gets corrupted. i'm completely new to debugging issues at this level, so please advise on what you guys need.

Closing as fixed, as the original bug filed here was fixed.

razamatan - if your bug is still present please open a new bug