Bug 19372

Summary: 2.6.36-rc6: WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x35a/0x3c0
Product: Drivers
Reporter: Maciej Rutecki (maciej.rutecki)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Severity: normal
CC: adobriyan, florian, kernel, maciej.rutecki, realhangman, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 16444
Attachments: dmesg log

Description Maciej Rutecki 2010-09-30 18:31:19 UTC
Subject    : 2.6.36-rc6: WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x35a/0x3c0
Submitter  : Alexey Dobriyan <adobriyan@gmail.com>
Date       : 2010-09-29 21:29
Message-ID : 20100929212923.GA5578@core2.telecom.by
References : http://marc.info/?l=linux-kernel&m=128579579400315&w=2

This entry is being used for tracking a regression from 2.6.35. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Michel Dänzer 2010-11-19 09:48:37 UTC
Is this really a regression from 2.6.35? Could also be a userspace issue.
Comment 2 Florian Mickler 2011-01-30 21:22:00 UTC
Well, normally userspace shouldn't be able to wreck the kernel. (I know, graphics are somewhat difficult in that respect.. but still)

Alexey, does this still happen on 2.6.37 or something newer?
Comment 3 Alexey Dobriyan 2011-02-27 11:29:01 UTC
Do not use that card anymore, happened once IIRC.
Comment 4 Honza Stodola 2011-03-03 22:27:00 UTC
Created attachment 50022
dmesg log

Hi, it happened to me with 2.6.38-rc7 several times.

Seems to be quite easy to reproduce on my system:
1. Start the system (Gentoo) and log in to KDE
2. Start Stellarium in windowed mode
3. Resize the window

The crash usually occurs when the window is almost maximized (about 1800 x 1000 px). Sometimes the lockup happens only once and the system seems fine afterwards; sometimes there are two or more lockups, and on occasion the display has shut off.

kernel message:

WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:248 radeon_fence_wait+0x39e/0x400()
Hardware name: H55M-USB3
GPU lockup (waiting for 0x00002922 last fence id 0x00002921)
Modules linked in: sit tunnel4 ipv6 coretemp it87 hwmon_vid iptable_mangle iptable_nat nf_nat kvm_intel kvm snd_hda_codec_realtek snd_hda_intel snd_usb_audio snd_hda_codec snd_pcm snd_timer snd_hwdep snd_usbmidi_lib snd_rawmidi snd r8169 i2c_i801 mii soundcore snd_page_alloc
Pid: 2336, comm: stellarium Not tainted 2.6.38-rc7 #1
Call Trace:
 [<ffffffff81039ffb>] ? warn_slowpath_common+0x7b/0xc0
 [<ffffffff8103a0f5>] ? warn_slowpath_fmt+0x45/0x50
 [<ffffffff8129821e>] ? radeon_fence_wait+0x39e/0x400
 [<ffffffff81055210>] ? autoremove_wake_function+0x0/0x30
 [<ffffffff812609cd>] ? ttm_bo_wait+0x10d/0x1c0
 [<ffffffff812b0e8b>] ? radeon_gem_wait_idle_ioctl+0x8b/0x110
 [<ffffffff8124aa9c>] ? drm_ioctl+0x38c/0x450
 [<ffffffff810a2136>] ? __pte_alloc+0xc6/0xd0
 [<ffffffff812b0e00>] ? radeon_gem_wait_idle_ioctl+0x0/0x110
 [<ffffffff810a4ccd>] ? handle_mm_fault+0xfd/0x220
 [<ffffffff810242e9>] ? do_page_fault+0x199/0x410
 [<ffffffff810a9daf>] ? mmap_region+0x1df/0x4b0
 [<ffffffff810d1711>] ? do_vfs_ioctl+0x91/0x510
 [<ffffffff810d1bd9>] ? sys_ioctl+0x49/0x80
 [<ffffffff810024fb>] ? system_call_fastpath+0x16/0x1b


01:00.0 VGA compatible controller: ATI Technologies Inc Juniper [Radeon HD 5700 Series] (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. Device 2140
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at fbcc0000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at ee00 [size=256]
        [virtual] Expansion ROM at fbc00000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Kernel driver in use: radeon
Comment 5 Benjamin Scherrer 2011-03-04 18:55:32 UTC

The same happens here with kernel and while watching YouTube videos in Opera or Iceweasel. I'm on Debian stable with a Radeon HD 3450.

radeon 0000:01:00.0: GPU lockup CP stall for more than 10035msec
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:235 radeon_fence_wait+0x235/0x2d3()
Hardware name: MS-7376
GPU lockup (waiting for 0x00008D65 last fence id 0x00008D63)
Modules linked in: xt_limit xt_tcpudp iptable_mangle ipt_LOG ipt_MASQUERADE nf_nat xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables aes_generic fuse loop arc4 ecb crypto_blkcipher cryptomgr aead crypto_algapi rt73usb crc_itu_t rt2x00usb rt2x00lib mac80211 cfg80211 hid_cherry usbhid [last unloaded: scsi_wait_scan]
Pid: 2071, comm: Xorg Not tainted #1
Call Trace:
 [<ffffffff810322e0>] ? warn_slowpath_common+0x78/0x8c
 [<ffffffff81032393>] ? warn_slowpath_fmt+0x45/0x4a
 [<ffffffff811e8407>] ? radeon_fence_wait+0x235/0x2d3
 [<ffffffff81046eeb>] ? autoremove_wake_function+0x0/0x2a
 [<ffffffff811bcb3e>] ? ttm_bo_wait+0xc7/0x16e
 [<ffffffff811f992b>] ? radeon_gem_wait_idle_ioctl+0x7a/0xdf
 [<ffffffff811ab980>] ? drm_ioctl+0x236/0x2ea
 [<ffffffff811f98b1>] ? radeon_gem_wait_idle_ioctl+0x0/0xdf
 [<ffffffff8100a545>] ? save_i387_xstate+0x12e/0x1bd
 [<ffffffff81001906>] ? do_signal+0x58b/0x679
 [<ffffffff8109db9d>] ? do_vfs_ioctl+0x418/0x465
 [<ffffffff81001c1f>] ? sys_rt_sigreturn+0x1c7/0x228
 [<ffffffff8109dc26>] ? sys_ioctl+0x3c/0x5c
 [<ffffffff81001e6b>] ? system_call_fastpath+0x16/0x1b
---[ end trace 9d3e75f9935ec99b ]---
[drm] Disabling audio support
radeon 0000:01:00.0: GPU softreset 
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xE57024E0
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00110103
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200010C0
radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:00.0:   R_008010_GRBM_STATUS=0xA0003030
radeon 0000:01:00.0:   R_008014_GRBM_STATUS2=0x00000003
radeon 0000:01:00.0:   R_000E50_SRBM_STATUS=0x200080C0
radeon 0000:01:00.0: GPU reset succeed
[drm] ring test succeeded in 1 usecs
[drm] ib test succeeded in 1 usecs
[drm] Enabling audio support
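
For anyone comparing logs like the two in this report, here is a minimal Python sketch that pulls the fence ids out of the "GPU lockup" warning text. It is not part of the kernel or of any radeon tool; the name `parse_gpu_lockups` is made up for illustration, and the regular expression only assumes the message format quoted above. The computed gap ignores sequence-number wraparound.

```python
import re

# Matches the warning text quoted in this report, e.g.
# "GPU lockup (waiting for 0x00008D65 last fence id 0x00008D63)"
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for 0x([0-9A-Fa-f]+) last fence id 0x([0-9A-Fa-f]+)\)"
)


def parse_gpu_lockups(log_text):
    """Return one (waiting_for, last_fence, gap) tuple per lockup line.

    Hypothetical helper: gap is a plain difference and does not
    account for the fence sequence number wrapping around.
    """
    results = []
    for match in LOCKUP_RE.finditer(log_text):
        waiting_for = int(match.group(1), 16)
        last_fence = int(match.group(2), 16)
        results.append((waiting_for, last_fence, waiting_for - last_fence))
    return results


# The two warning lines from comments 4 and 5 of this report.
sample = """\
GPU lockup (waiting for 0x00002922 last fence id 0x00002921)
GPU lockup (waiting for 0x00008D65 last fence id 0x00008D63)
"""

for waiting_for, last_fence, gap in parse_gpu_lockups(sample):
    print(f"waiting for {waiting_for:#x}, last fence id {last_fence:#x}, gap {gap}")
```

Feeding it the full output of dmesg should work the same way, since lines that do not match the pattern are simply skipped.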
Comment 6 Florian Mickler 2011-03-04 20:55:16 UTC
If userspace is feeding bad commands to the GPU, it might hiccup. There is not much you can do about that on the kernel side (besides sanity checking the whole command stream; I'm not sure what is currently done in that regard). But even if you manage to do that, userspace could still try to exploit bugs in the GPU silicon.

If everything chugs along as well as before, I'd say the kernel has done its homework. Under that assumption, this bug will be closed as a userspace bug and thus invalid for the kernel.

On the other hand, if you manage to find a kernel > 2.6.32 which does not exhibit the problem, we should probably revisit that assumption.

You might try upgrading your userspace graphics stack, and you could take a look at bugzilla.freedesktop.org and file a bug there. I guess there are many of those already, as the symptom (GPU lockup) would fit a lot of userspace bugs. But if you are able to reproduce this, it might even have a chance of getting fixed over there.