Bug 28622

Summary: radeon video lockup
Product: Drivers Reporter: Daniel Poelzleithner (bugzilla.kernel.org)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: RESOLVED OBSOLETE    
Severity: high CC: alan, alexdeucher, florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.39 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 21782    
Attachments: 2.6.39 lockup

Description Daniel Poelzleithner 2011-02-08 17:48:21 UTC
Radeon got very unstable from 2.6.36 to 2.6.37 on my system. This is at least one lockup I'm having when switching X servers. KMS is enabled. Disabling KMS will crash video within about 30-300 seconds.

[81121.199055] INFO: task Xorg:1791 blocked for more than 120 seconds.
[81121.199059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[81121.199061] Xorg          D ffff8801261549d8     0  1791   1743 0x00400004
[81121.199066]  ffff880123b4bb78 0000000000000082 ffff880123b4bd70 0000000000000000
[81121.199069]  00000000000139c0 ffff880126154640 ffff8801261549d8 ffff880123b4bfd8
[81121.199072]  ffff8801261549e0 00000000000139c0 ffff880123b4a010 00000000000139c0
[81121.199076] Call Trace:
[81121.199083]  [<ffffffff8115e687>] ? inode_init_always+0x1c7/0x1d0
[81121.199088]  [<ffffffff815671e7>] __mutex_lock_slowpath+0xf7/0x180
[81121.199091]  [<ffffffff815670cb>] mutex_lock+0x2b/0x50
[81121.199127]  [<ffffffffa0157924>] radeon_bo_create+0x114/0x2c0 [radeon]
[81121.199146]  [<ffffffffa016d30b>] radeon_gem_object_create+0x8b/0x110 [radeon]
[81121.199163]  [<ffffffffa016d3e8>] radeon_gem_create_ioctl+0x58/0xd0 [radeon]
[81121.199180]  [<ffffffffa00458d3>] drm_ioctl+0x433/0x4f0 [drm]
[81121.199198]  [<ffffffffa016d390>] ? radeon_gem_create_ioctl+0x0/0xd0 [radeon]
[81121.199202]  [<ffffffff81157a29>] do_vfs_ioctl+0xa9/0x5a0
[81121.199205]  [<ffffffff81157fa1>] sys_ioctl+0x81/0xa0
[81121.199209]  [<ffffffff8100b002>] system_call_fastpath+0x16/0x1b
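
For anyone collecting several of these reports from a long dmesg capture, a minimal sketch of pulling out the hung-task header lines is shown below. It assumes the message format seen in the log above; the function name `find_hung_tasks` and the regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Pattern for the hung-task header line as it appears in the log above.
# The exact wording may differ between kernel versions (assumption).
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return a list of (timestamp, task name, pid, seconds) tuples."""
    results = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            results.append(
                (
                    float(match.group("ts")),
                    match.group("comm"),
                    int(match.group("pid")),
                    int(match.group("secs")),
                )
            )
    return results


sample = "[81121.199055] INFO: task Xorg:1791 blocked for more than 120 seconds."
for ts, comm, pid, secs in find_hung_tasks(sample):
    print(f"{comm} (pid {pid}) blocked > {secs}s at {ts}")
```

Running this over a saved dmesg file should make it easier to see whether the same task is blocked every time or whether different processes are affected.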
Comment 1 Alex Deucher 2011-02-08 18:00:25 UTC
(In reply to comment #0)
> Radeon got very unstable from 2.6.36 to 2.6.37 on my system. This is at least
> one lockup I'm having when switching X servers. KMS is enabled. Disabling KMS
> will crash video within about 30-300 seconds.

What do you mean by "switching X servers"?  Are you trying to mix KMS and UMS?
Comment 2 Daniel Poelzleithner 2011-02-08 18:24:19 UTC
No, I mean starting a second X server session.

btw:

01:00.0 VGA compatible controller: ATI Technologies Inc RV670PRO [Radeon HD 3850] (prog-if 00 [VGA controller])
	Subsystem: ATI Technologies Inc Device e630
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 4 bytes
	Interrupt: pin A routed to IRQ 41
	Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at fdee0000 (64-bit, non-prefetchable) [size=64K]
	Region 4: I/O ports at de00 [size=256]
	[virtual] Expansion ROM at fde00000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: radeon
	Kernel modules: radeon
Comment 3 Alex Deucher 2011-02-08 18:28:26 UTC
Any chance you could bisect?
Comment 4 Daniel Poelzleithner 2011-02-08 19:04:57 UTC
It's quite hard to reproduce. Sometimes it does not happen for 3-4 days, and then it locks up. Which verbose level is good for reporting? Maybe a higher one will give more hints.
Comment 5 Rafael J. Wysocki 2011-02-13 10:56:38 UTC
On Sunday, February 13, 2011, david@lang.hm wrote:
> On Sun, 13 Feb 2011, Rafael J. Wysocki wrote:
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.36 and 2.6.37.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.36 and 2.6.37.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> >
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=28622
> > Subject             : radeon video lockup
> > Submitter   : Daniel Poelzleithner <bugzilla.kernel.org@poelzi.org>
> > Date                : 2011-02-08 17:48 (5 days old)
> 
> hmm, I've occasionally been experiencing what sounds like a similar 
> issue.
> 
> the screen locks, it doesn't respond to the keyboard (but caps lock still 
> works), frequently the mouse pointer still moves, but you can't click on 
> anything.
> 
> it usually happens to me a few min after starting firefox (with 10-15 
> windows holding a couple hundred tabs), so it could also be a firefox bug 
> if it can capture the input and not release it.
> 
> I've experienced this on my T61 laptop, and my work machine (Radeon X1300 
> series)
> 
> I'm currently running ubuntu 10.10 which is 2.6.35-25, so if it is the 
> same problem, it wasn't introduced between 2.6.36 and 2.6.37.
> 
> like the OP of the bug report, this is not something that happens 
> frequently, but I don't do much that stresses the video, so starting all 
> these tabs is about as hard as I drive it.
Comment 6 Alex Deucher 2011-02-13 22:20:26 UTC
(In reply to comment #5)
> > 
> > I've experienced this on my T61 laptop, and my work machine (Radeon X1300 
> > series)
> > 
> > I'm currently running ubuntu 10.10 which is 2.6.35-25, so if it is the 
> > same problem, it wasn't introduced between 2.6.36 and 2.6.37.
> > 
> > like the OP of the bug report, this is not something that happens 
> > frequently, but I don't do much that stresses the video, so starting all 
> > these tabs is about as hard as I drive it.

That is probably a different issue (maybe also ddx or mesa related).  Possibly fixed by this drm patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b7d8cce5b558e0c0aa6898c9865356481598b46d
Comment 7 Daniel Poelzleithner 2011-03-13 19:13:07 UTC
Not sure if this is related, but I got a lockup with a different traceback today:

[48841.194099] INFO: task vlc:22611 blocked for more than 120 seconds.
[48841.194108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[48841.194111] vlc           D ffff880001e44c58     0 22611   2846 0x00000000
[48841.194116]  ffff880001e29c18 0000000000000082 ffff880001e29b68 ffffffff00000000
[48841.194119]  00000000000139c0 ffff880001e448c0 ffff880001e44c58 ffff880001e29fd8
[48841.194122]  ffff880001e44c60 00000000000139c0 ffff880001e28010 00000000000139c0
[48841.194126] Call Trace:
[48841.194133]  [<ffffffff8156cb67>] __mutex_lock_slowpath+0xf7/0x180
[48841.194137]  [<ffffffff8156ca4b>] mutex_lock+0x2b/0x50
[48841.194173]  [<ffffffffa00cd44e>] radeon_bo_unref+0x3e/0x80 [radeon]
[48841.194190]  [<ffffffffa0013ff0>] ? drm_gem_object_free+0x0/0x40 [drm]
[48841.194208]  [<ffffffffa00e2c25>] radeon_gem_object_free+0x35/0x50 [radeon]
[48841.194219]  [<ffffffffa0014019>] drm_gem_object_free+0x29/0x40 [drm]
[48841.194223]  [<ffffffff812b3e17>] kref_put+0x37/0x70
[48841.194233]  [<ffffffffa001463e>] drm_gem_close_ioctl+0xbe/0x100 [drm]
[48841.194243]  [<ffffffffa00128d3>] drm_ioctl+0x433/0x4f0 [drm]
[48841.194253]  [<ffffffffa0014580>] ? drm_gem_close_ioctl+0x0/0x100 [drm]
[48841.194258]  [<ffffffff8111be40>] ? __remove_shared_vm_struct+0x40/0x60
[48841.194262]  [<ffffffff81127511>] ? free_pages_and_swap_cache+0x21/0xd0
[48841.194266]  [<ffffffff812b044a>] ? cpumask_any_but+0x2a/0x40
[48841.194269]  [<ffffffff81157a29>] do_vfs_ioctl+0xa9/0x5a0
[48841.194272]  [<ffffffff8111b85e>] ? remove_vma+0x6e/0x90
[48841.194274]  [<ffffffff8111d968>] ? do_munmap+0x318/0x3b0
[48841.194277]  [<ffffffff81157fa1>] sys_ioctl+0x81/0xa0
[48841.194281]  [<ffffffff8100b002>] system_call_fastpath+0x16/0x1b
Comment 8 Florian Mickler 2011-03-29 21:13:02 UTC
Is this still a problem on 2.6.38.y / 2.6.39-rc* ?
Comment 9 Daniel Poelzleithner 2011-06-04 00:34:36 UTC
2.6.38 was slightly more stable than 2.6.39 is.

the current lockup shows:

[17052.294941] runnable tasks:
[17052.294942]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[17052.294943] ----------------------------------------------------------------------------------------------------------
[17052.294967] 
[17240.142256] SysRq : Show Blocked State
[17240.142266]   task                        PC stack   pid father
[17240.142285] Xorg            D ffff8800c68b69d8     0  1979   1917 0x00400004
[17240.142290]  ffff8800c640d598 0000000000000082 ffff880000000000 ffff8800c640d538
[17240.142293]  0000000000000282 ffff8800c640dfd8 ffff8800c640c000 ffff8800c640dfd8
[17240.142296]  ffffffff81a0b020 ffff8800c68b6640 ffffffff81bf1f40 00000001c640d5d0
[17240.142299] Call Trace:
[17240.142306]  [<ffffffff81580613>] schedule_timeout+0x173/0x2e0
[17240.142311]  [<ffffffff8106d280>] ? cascade+0xa0/0xa0
[17240.142348]  [<ffffffffa010e05e>] radeon_fence_wait+0x1ee/0x3d0 [radeon]
[17240.142352]  [<ffffffff81080040>] ? wake_up_bit+0x40/0x40
[17240.142367]  [<ffffffffa010eac1>] radeon_sync_obj_wait+0x11/0x20 [radeon]
[17240.142377]  [<ffffffffa009a7dd>] ttm_bo_wait+0xfd/0x1b0 [ttm]
[17240.142385]  [<ffffffffa009d376>] ttm_bo_move_accel_cleanup+0x226/0x2a0 [ttm]
[17240.142399]  [<ffffffffa010ed80>] radeon_move_blit.clone.0+0x120/0x180 [radeon]
[17240.142414]  [<ffffffffa010f0db>] radeon_bo_move+0xbb/0x180 [radeon]
[17240.142422]  [<ffffffffa009b54e>] ttm_bo_handle_move_mem+0x12e/0x360 [ttm]
[17240.142425]  [<ffffffff81582400>] ? _raw_spin_unlock_irqrestore+0x10/0x30
[17240.142432]  [<ffffffffa009b8d0>] ttm_bo_evict+0x150/0x360 [ttm]
[17240.142439]  [<ffffffffa009fe3b>] ? ttm_eu_list_ref_sub+0x3b/0x50 [ttm]
[17240.142445]  [<ffffffffa009bc86>] ttm_mem_evict_first+0x1a6/0x250 [ttm]
[17240.142452]  [<ffffffffa009c4cd>] ttm_bo_mem_space+0x2fd/0x390 [ttm]
[17240.142459]  [<ffffffffa009c64c>] ttm_bo_move_buffer+0xec/0x160 [ttm]
[17240.142477]  [<ffffffffa001d332>] ? drm_mm_kmalloc+0x32/0xd0 [drm]
[17240.142484]  [<ffffffffa009c756>] ttm_bo_validate+0x96/0x120 [ttm]
[17240.142490]  [<ffffffffa009cade>] ttm_bo_init+0x2fe/0x380 [ttm]
[17240.142505]  [<ffffffffa010fb1d>] radeon_bo_create+0x16d/0x270 [radeon]
[17240.142520]  [<ffffffffa010f840>] ? radeon_create_ttm_backend_entry+0x50/0x50 [radeon]
[17240.142537]  [<ffffffffa01265ad>] radeon_gem_object_create+0x5d/0x100 [radeon]
[17240.142554]  [<ffffffffa0126a08>] radeon_gem_create_ioctl+0x58/0xd0 [radeon]
[17240.142570]  [<ffffffffa0126dd9>] ? radeon_gem_wait_idle_ioctl+0xf9/0x120 [radeon]
[17240.142581]  [<ffffffffa001216c>] drm_ioctl+0x3ec/0x4d0 [drm]
[17240.142598]  [<ffffffffa01269b0>] ? radeon_gem_pwrite_ioctl+0x30/0x30 [radeon]
[17240.142602]  [<ffffffff81084123>] ? __hrtimer_start_range_ns+0x193/0x460
[17240.142606]  [<ffffffff8115fdef>] do_vfs_ioctl+0x8f/0x530
[17240.142610]  [<ffffffff8114ef70>] ? vfs_read+0x120/0x180
[17240.142613]  [<ffffffff81160321>] sys_ioctl+0x91/0xa0
[17240.142616]  [<ffffffff8158a2c2>] system_call_fastpath+0x16/0x1b
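
To compare traces like the one above with the earlier ones, a small sketch that lists each frame with its owning module may help. It assumes the `[<address>] function+0xoff/0xsize [module]` frame layout seen in these logs; `parse_frames` is a hypothetical helper, not an existing tool. Frames prefixed with `?` are ones the stack unwinder considers unreliable, so they are skipped by default.

```python
import re

# One stack frame as printed in the traces above. Frames without a
# trailing "[module]" belong to the core kernel image.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<maybe>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text, include_unreliable=False):
    """Return (function, module) pairs in the order they appear."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue
        if match.group("maybe") and not include_unreliable:
            continue  # skip "?" frames unless explicitly requested
        frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


sample = """\
[17240.142306]  [<ffffffff81580613>] schedule_timeout+0x173/0x2e0
[17240.142311]  [<ffffffff8106d280>] ? cascade+0xa0/0xa0
[17240.142348]  [<ffffffffa010e05e>] radeon_fence_wait+0x1ee/0x3d0 [radeon]
[17240.142377]  [<ffffffffa009a7dd>] ttm_bo_wait+0xfd/0x1b0 [ttm]
"""
for func, module in parse_frames(sample):
    print(f"{module:8s} {func}")
```

Applied to the three traces in this report, this kind of listing shows the first two blocked in `mutex_lock` and this one waiting in `radeon_fence_wait`, all on the buffer-object create or free paths.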


I attach the full dmesg output.
Comment 10 Daniel Poelzleithner 2011-06-04 00:36:02 UTC
Created attachment 60712 [details]
2.6.39 lockup
Comment 11 Alex Deucher 2012-07-02 20:21:02 UTC
Is this still an issue with a more recent kernel (3.x)?