Bug 32402

Summary: Oops associated with radeon_unpin_work_func
Product: Drivers
Reporter: Stuart Foster (smf-linux)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: CLOSED CODE_FIX
Severity: high
CC: adriano.vilela, alexdeucher, cheako911, d.haid, florian, fredlwm+others, lukenshiro, mario.kleiner, rossi.f, smf-linux, thomas
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.39
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Console screen on oops.
lspci -v details
KMS-BSOD
SystemInformation.txt
SoftwareInformation.txt
proposed fix.
proposed fix 2. against 2.6.39

Description Stuart Foster 2011-03-31 23:13:02 UTC
If this system (an LFS build) is left idle for several hours with xscreensaver running (I don't currently know which screen saver app; I am assuming one of the Mesa 3D ones), the system will oops. The system locks up solid and there is no trace in messages.old on reboot. The attached JPG is a photo of the screen at system failure. I first noticed the problem in 2.6.38-rc8 but have waited until now to report it in the hope of obtaining a more repeatable scenario.

Version info:

Xorg server 1.10.0
Xorg ati driver 6.14.1
libdrm-2.4.24
Mesa-7.10.1

Processor  AMD Phenom(tm) II X4 940 Processor

Linux Andromeda 2.6.38.2 #1 SMP Mon Mar 28 22:17:13 BST 2011 i686 athlon-4 i386 GNU/Linux
 
Gnu C                  4.6.0
Gnu make               3.81
binutils               2.21
util-linux             2.14.1
mount                  support
module-init-tools      3.5
e2fsprogs              1.41.4
Linux C Library        2.13
Dynamic linker (ldd)   2.13
Linux C++ Library      6.0.15
Procps                 3.2.7
Net-tools              1.60
Kbd                    1.15
Sh-utils               7.4
Modules Loaded         md4 md5 hmac cryptomgr aead nls_cp437 cifs crypto_hash crypto_algapi microcode nfs fscache nfsd lockd sunrpc exportfs fuse usbhid firewire_ohci sr_mod sg firewire_core cdrom psmouse i2c_piix4 crc_itu_t button vboxnetflt fifo splitter ohci_hcd ehci_hcd usbcore nls_base vboxnetadp snd_hda_codec_hdmi snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore brd loop asus_atk0110 powernow_k8 processor thermal_sys mperf pata_atiixp vboxdrv sky2 rtc eeprom unix
Comment 1 Stuart Foster 2011-03-31 23:20:17 UTC
Created attachment 52822 [details]
Console screen on oops.
Comment 2 Stuart Foster 2011-04-01 08:12:41 UTC
Created attachment 52952 [details]
lspci -v details
Comment 3 Mike Mestnik 2011-04-04 22:59:26 UTC
Very similar oops after pressing Esc in Google Chrome to exit full-screen video.

This was the URL:
http://www.animemusicvideos.org/members/members_vidpreview.php?v=177083

As one could assume, I maximized the video and then watched it to completion. After I pressed the Esc key I got this BSOD.
Comment 4 Mike Mestnik 2011-04-04 23:11:44 UTC
Created attachment 53492 [details]
KMS-BSOD

ttm_bo_ref -> do_invalid_op
Comment 5 Mike Mestnik 2011-04-04 23:14:18 UTC
Created attachment 53502 [details]
SystemInformation.txt
Comment 6 Mike Mestnik 2011-04-04 23:17:14 UTC
Created attachment 53512 [details]
SoftwareInformation.txt
Comment 7 Alex Deucher 2011-04-06 14:26:32 UTC
Does reverting 69a07f0b117a40fcc1a479358d8e1f41793617f2 help?
Comment 8 Alex Deucher 2011-04-06 14:27:43 UTC
Possibly related to this:
http://lists.freedesktop.org/archives/dri-devel/2011-April/009939.html
Comment 9 Stuart Foster 2011-04-07 07:37:49 UTC
(In reply to comment #7)
> Does reverting 69a07f0b117a40fcc1a479358d8e1f41793617f2 help?

Applied git revert to 2.6.39-rc2. Problem still present.
Comment 10 Adriano 2011-04-14 14:11:32 UTC
I'm having this problem too with kernel 2.6.38 series. As soon as my screen saver kicks in, my entire desktop gets corrupted. If I wait long enough for my screen to stand by or suspend (just the screen, not the computer), then the computer completely freezes. I'm not sure if I can add attachments to bugs reported by others; if so, I would be happy to attach a picture with the error messages.

Please, let me know how I can help with this issue.

Thank you,

Adriano
Comment 11 Adriano 2011-04-16 21:08:04 UTC
An update: at least in my case, the problem seems to be related to page flipping in the radeon driver. I have disabled page flipping in my xorg.conf file and haven't seen the problem since.
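
For readers who want to try the same workaround, here is a minimal sketch of an xorg.conf Device section that turns page flipping off. It assumes the option in question is the radeon driver's EnablePageFlip (documented in the radeon(4) man page); the comment above does not say which setting was actually changed, and the Identifier value is a hypothetical placeholder.

```conf
# Sketch only: assumes the workaround is the radeon(4) "EnablePageFlip" option.
# "Radeon0" is a made-up identifier; match it to your existing Device section.
Section "Device"
    Identifier "Radeon0"
    Driver     "radeon"
    Option     "EnablePageFlip" "off"
EndSection
```

The X server has to be restarted for the change to take effect. This only avoids the page-flip code path; it does not fix the underlying kernel bug.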

Here are some related links:
https://bugs.freedesktop.org/show_bug.cgi?id=35452
https://bugs.gentoo.org/show_bug.cgi?id=359569

I would first see the heavy screen corruption described in the second link above, and only later the kernel oops described in this bug report.

I'm sorry if my problem isn't directly related to this bug report, but the oops message is very similar to the one I was getting.

Thank you,

Adriano
Comment 12 Mike Mestnik 2011-04-16 21:28:03 UTC
On 04/16/11 16:08, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=32402
>
> --- Comment #11 from Adriano <adriano.vilela@yahoo.com> 2011-04-16 21:08:04 ---
> An update: at least in my case, the problem seems to be related to page
> flipping in the radeon driver. I have disabled page flipping in my xorg.conf
> file and haven't seen the problem since.
>
Thank you for the update. I haven't seen this problem in a while, and
since then I've upgraded the kernel and xorg/mesa.

I have another issue, which I'm sure is unrelated, where my system locks
up (no network, no I/O) if I'm not running at least 3 Facebook games:
deeprealms, castle_age and sororitylife.

> Here's some related links:
> https://bugs.freedesktop.org/show_bug.cgi?id=35452
> https://bugs.gentoo.org/show_bug.cgi?id=359569
>
> I would first see the heavy screen corruption described in the second link
> above, and only later the kernel oops described in this bug report.
>
> I'm sorry if my problem isn't directly related to this bug report, but the
> oops
> message is very similar to the one I was getting.
>
> Thank you,
>
> Adriano
>
Comment 13 Stuart Foster 2011-05-19 13:04:45 UTC
Just a quick update to report that the fault is also present in the 2.6.39 kernel.
Comment 14 Dave Airlie 2011-05-29 08:30:09 UTC
Created attachment 59922 [details]
proposed fix.

Hopefully this fixes it; let me know.
Comment 15 Dave Airlie 2011-05-29 08:30:51 UTC
*** Bug 29942 has been marked as a duplicate of this bug. ***
Comment 16 Dave Airlie 2011-05-29 10:09:15 UTC
Created attachment 59942 [details]
proposed fix 2. against 2.6.39

try again + fix error paths.
Comment 17 Dave Airlie 2011-05-29 10:24:05 UTC
*** Bug 33222 has been marked as a duplicate of this bug. ***
Comment 18 Stuart Foster 2011-05-30 08:00:45 UTC
(In reply to comment #16)
> Created an attachment (id=59942) [details]
> proposed fix 2. against 2.6.39
> 
> try again + fix error paths.

Ran the patch on 2.6.39 overnight (~9 hours) and there have been no problems. The problem looks fixed for me.

thanks

Stuart
Comment 19 Florian Mickler 2011-06-14 11:21:05 UTC
A patch referencing this bug report has been merged in v3.0-rc3:

commit 498c555f56a02ec1059bc150cde84411ba0ac010
Author: Dave Airlie <airlied@redhat.com>
Date:   Sun May 29 17:48:32 2011 +1000

    drm/radeon: fix oops in ttm reserve when pageflipping (v2)
Comment 20 lukenshiro 2011-06-26 12:57:58 UTC
*** Bug 38232 has been marked as a duplicate of this bug. ***
Comment 21 Frédéric L. W. Meunier 2011-07-10 23:13:42 UTC
Will this patch be pushed to the 2.6.39 series? I don't see it in 2.6.39.3. I got a similar crash with 2.6.39.2, but strangely I never experienced the kernel panic before moving to x86_64, where I have to use an old Flash version (10.2p3 111710) because Adobe only makes releases for x86. Aside from the Flash version, the configuration was identical.