Bug 79591

Summary: possible circular locking dependency detected
Product: Drivers
Reporter: Stefan Ringel
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: high
CC: martin.peres
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.16.0-0.rc3.git3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: drm/nouveau/therm: fix a potential deadlock in the therm monitoring code

Description Stefan Ringel 2014-07-07 08:11:24 UTC
[  278.448193] ======================================================
[  278.448194] [ INFO: possible circular locking dependency detected ]
[  278.448197] 3.16.0-0.rc3.git3.1.fc21.x86_64 #1 Not tainted
[  278.448198] -------------------------------------------------------
[  278.448200] Xorg.bin/1249 is trying to acquire lock:
[  278.448201]  (&(&priv->lock)->rlock#2){-.-...}, at: [<ffffffffa0108618>] nouveau_therm_update+0x48/0x350 [nouveau]
[  278.448251] 
but task is already holding lock:
[  278.448253]  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: [<ffffffffa010a284>] alarm_timer_callback+0x54/0xe0 [nouveau]
[  278.448273] 
which lock already depends on the new lock.

[  278.448275] 
the existing dependency chain (in reverse order) is:
[  278.448276] 
-> #1 (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}:
[  278.448279]        [<ffffffff81102104>] lock_acquire+0xa4/0x1d0
[  278.448283]        [<ffffffff818113f7>] _raw_spin_lock_irqsave+0x57/0xa0
[  278.448287]        [<ffffffffa010a284>] alarm_timer_callback+0x54/0xe0 [nouveau]
[  278.448303]        [<ffffffffa010c3b8>] nv04_timer_alarm_trigger+0x138/0x190 [nouveau]
[  278.448319]        [<ffffffffa010c470>] nv04_timer_alarm+0x60/0xd0 [nouveau]
[  278.448334]        [<ffffffffa01088d7>] nouveau_therm_update+0x307/0x350 [nouveau]
[  278.448349]        [<ffffffffa010893a>] nouveau_therm_alarm+0x1a/0x20 [nouveau]
[  278.448365]        [<ffffffffa010c3b8>] nv04_timer_alarm_trigger+0x138/0x190 [nouveau]
[  278.448380]        [<ffffffffa010c54b>] nv04_timer_intr+0x6b/0x90 [nouveau]
[  278.448395]        [<ffffffffa0105bf1>] nouveau_mc_intr+0x141/0x1c0 [nouveau]
[  278.448410]        [<ffffffff81116127>] handle_irq_event_percpu+0x77/0x340
[  278.448413]        [<ffffffff8111642d>] handle_irq_event+0x3d/0x60
[  278.448415]        [<ffffffff811193e6>] handle_edge_irq+0x66/0x130
[  278.448418]        [<ffffffff8101c3e4>] handle_irq+0x84/0x150
[  278.448421]        [<ffffffff8181482d>] do_IRQ+0x4d/0xe0
[  278.448423]        [<ffffffff81812472>] ret_from_intr+0x0/0x1a
[  278.448426]        [<ffffffff81415aba>] debug_dma_assert_idle+0xea/0x220
[  278.448429]        [<ffffffff811f3d75>] do_wp_page+0xe5/0x970
[  278.448432]        [<ffffffff811f6c9c>] handle_mm_fault+0x8ec/0xfd0
[  278.448434]        [<ffffffff81064379>] __do_page_fault+0x239/0x620
[  278.448437]        [<ffffffff81064782>] do_page_fault+0x22/0x30
[  278.448439]        [<ffffffff818139f8>] page_fault+0x28/0x30
[  278.448441] 
-> #0 (&(&priv->lock)->rlock#2){-.-...}:
[  278.448444]        [<ffffffff8110163b>] __lock_acquire+0x1abb/0x1ca0
[  278.448446]        [<ffffffff81102104>] lock_acquire+0xa4/0x1d0
[  278.448448]        [<ffffffff818113f7>] _raw_spin_lock_irqsave+0x57/0xa0
[  278.448450]        [<ffffffffa0108618>] nouveau_therm_update+0x48/0x350 [nouveau]
[  278.448465]        [<ffffffffa010893a>] nouveau_therm_alarm+0x1a/0x20 [nouveau]
[  278.448480]        [<ffffffffa010c3b8>] nv04_timer_alarm_trigger+0x138/0x190 [nouveau]
[  278.448496]        [<ffffffffa010c470>] nv04_timer_alarm+0x60/0xd0 [nouveau]
[  278.448511]        [<ffffffffa010a30d>] alarm_timer_callback+0xdd/0xe0 [nouveau]
[  278.448526]        [<ffffffffa010c3b8>] nv04_timer_alarm_trigger+0x138/0x190 [nouveau]
[  278.448542]        [<ffffffffa010c54b>] nv04_timer_intr+0x6b/0x90 [nouveau]
[  278.448557]        [<ffffffffa0105bf1>] nouveau_mc_intr+0x141/0x1c0 [nouveau]
[  278.448572]        [<ffffffff81116127>] handle_irq_event_percpu+0x77/0x340
[  278.448574]        [<ffffffff8111642d>] handle_irq_event+0x3d/0x60
[  278.448576]        [<ffffffff811193e6>] handle_edge_irq+0x66/0x130
[  278.448578]        [<ffffffff8101c3e4>] handle_irq+0x84/0x150
[  278.448581]        [<ffffffff8181482d>] do_IRQ+0x4d/0xe0
[  278.448583]        [<ffffffff81812472>] ret_from_intr+0x0/0x1a
[  278.448585]        [<ffffffff81137c72>] __module_text_address+0x12/0x70
[  278.448588]        [<ffffffff8113c196>] is_module_text_address+0x16/0x30
[  278.448590]        [<ffffffff810c566a>] __kernel_text_address+0x3a/0x90
[  278.448592]        [<ffffffff8101da72>] print_context_stack+0x62/0x100
[  278.448594]        [<ffffffff8101c620>] dump_trace+0x170/0x350
[  278.448596]        [<ffffffff8102b47b>] save_stack_trace+0x2b/0x50
[  278.448599]        [<ffffffff81413529>] dma_entry_alloc+0x59/0x90
[  278.448601]        [<ffffffff81413b8f>] debug_dma_alloc_coherent+0x2f/0x90
[  278.448603]        [<ffffffffa008f755>] ttm_dma_populate+0x545/0xaa0 [ttm]
[  278.448613]        [<ffffffffa015727c>] nouveau_ttm_tt_populate+0x14c/0x170 [nouveau]
[  278.448639]        [<ffffffffa0084d80>] ttm_tt_bind+0x40/0x80 [ttm]
[  278.448644]        [<ffffffffa008748f>] ttm_bo_handle_move_mem+0x5bf/0x650 [ttm]
[  278.448649]        [<ffffffffa00883ef>] ttm_bo_validate+0x2df/0x300 [ttm]
[  278.448654]        [<ffffffffa0088663>] ttm_bo_init+0x253/0x3b0 [ttm]
[  278.448658]        [<ffffffffa0157c82>] nouveau_bo_new+0x202/0x310 [nouveau]
[  278.448677]        [<ffffffffa015a42b>] nouveau_gem_new+0x6b/0x160 [nouveau]
[  278.448698]        [<ffffffffa015a5d6>] nouveau_gem_ioctl_new+0xb6/0x220 [nouveau]
[  278.448718]        [<ffffffffa003dcdf>] drm_ioctl+0x1df/0x6a0 [drm]
[  278.448733]        [<ffffffffa0151a45>] nouveau_drm_ioctl+0x65/0xa0 [nouveau]
[  278.448753]        [<ffffffff812628d0>] do_vfs_ioctl+0x2f0/0x520
[  278.448756]        [<ffffffff81262b81>] SyS_ioctl+0x81/0xa0
[  278.448758]        [<ffffffff818118e9>] system_call_fastpath+0x16/0x1b
[  278.448760] 
other info that might help us debug this:

[  278.448762]  Possible unsafe locking scenario:

[  278.448764]        CPU0                    CPU1
[  278.448765]        ----                    ----
[  278.448766]   lock(&(&priv->sensor.alarm_program_lock)->rlock);
[  278.448767]                                lock(&(&priv->lock)->rlock#2);
[  278.448770]                                lock(&(&priv->sensor.alarm_program_lock)->rlock);
[  278.448771]   lock(&(&priv->lock)->rlock#2);
[  278.448773] 
 *** DEADLOCK ***

[  278.448775] 2 locks held by Xorg.bin/1249:
[  278.448776]  #0:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffffa00886e1>] ttm_bo_init+0x2d1/0x3b0 [ttm]
[  278.448783]  #1:  (&(&priv->sensor.alarm_program_lock)->rlock){-.-...}, at: [<ffffffffa010a284>] alarm_timer_callback+0x54/0xe0 [nouveau]
[  278.448800] 
stack backtrace:
[  278.448803] CPU: 0 PID: 1249 Comm: Xorg.bin Not tainted 3.16.0-0.rc3.git3.1.fc21.x86_64 #1
[  278.448804] Hardware name: System manufacturer System Product Name/M4A78LT-M, BIOS 0802    08/24/2010
[  278.448806]  0000000000000000 000000007a5a7c22 ffff88011aa03b00 ffffffff81807cec
[  278.448809]  ffffffff82bc2ef0 ffff88011aa03b40 ffffffff8180508c ffff88011aa03ba0
[  278.448812]  ffff8801185b9a40 ffff8801185b99d0 0000000000000002 ffff8801185ba5a8
[  278.448815] Call Trace:
[  278.448816]  <IRQ>  [<ffffffff81807cec>] dump_stack+0x4d/0x66
[  278.448823]  [<ffffffff8180508c>] print_circular_bug+0x201/0x20f
[  278.448825]  [<ffffffff8110163b>] __lock_acquire+0x1abb/0x1ca0
[  278.448828]  [<ffffffff810242de>] ? native_sched_clock+0x2e/0xb0
[  278.448831]  [<ffffffff81102104>] lock_acquire+0xa4/0x1d0
[  278.448847]  [<ffffffffa0108618>] ? nouveau_therm_update+0x48/0x350 [nouveau]
[  278.448850]  [<ffffffff818113f7>] _raw_spin_lock_irqsave+0x57/0xa0
[  278.448866]  [<ffffffffa0108618>] ? nouveau_therm_update+0x48/0x350 [nouveau]
[  278.448882]  [<ffffffffa0108618>] nouveau_therm_update+0x48/0x350 [nouveau]
[  278.448898]  [<ffffffffa010893a>] nouveau_therm_alarm+0x1a/0x20 [nouveau]
[  278.448915]  [<ffffffffa010c3b8>] nv04_timer_alarm_trigger+0x138/0x190 [nouveau]
[  278.448931]  [<ffffffffa010c470>] nv04_timer_alarm+0x60/0xd0 [nouveau]
[  278.448948]  [<ffffffffa010a30d>] alarm_timer_callback+0xdd/0xe0 [nouveau]
[  278.448964]  [<ffffffffa010c3b8>] nv04_timer_alarm_trigger+0x138/0x190 [nouveau]
[  278.448981]  [<ffffffffa010c54b>] nv04_timer_intr+0x6b/0x90 [nouveau]
[  278.448998]  [<ffffffffa0105bf1>] nouveau_mc_intr+0x141/0x1c0 [nouveau]
[  278.449000]  [<ffffffff81116127>] handle_irq_event_percpu+0x77/0x340
[  278.449003]  [<ffffffff8111642d>] handle_irq_event+0x3d/0x60
[  278.449005]  [<ffffffff811193e6>] handle_edge_irq+0x66/0x130
[  278.449007]  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
[  278.449010]  [<ffffffff810e2145>] ? irqtime_account_irq+0xc5/0xd0
[  278.449012]  [<ffffffff8181482d>] do_IRQ+0x4d/0xe0
[  278.449015]  [<ffffffff81812472>] common_interrupt+0x72/0x72
[  278.449016]  <EOI>  [<ffffffffa0084000>] ? 0xffffffffa0083fff
[  278.449023]  [<ffffffff81137b39>] ? __module_address+0x29/0x150
[  278.449026]  [<ffffffff81137c03>] ? __module_address+0xf3/0x150
[  278.449029]  [<ffffffff81137c72>] __module_text_address+0x12/0x70
[  278.449031]  [<ffffffff8113c196>] is_module_text_address+0x16/0x30
[  278.449034]  [<ffffffff810c566a>] __kernel_text_address+0x3a/0x90
[  278.449036]  [<ffffffff8101da72>] print_context_stack+0x62/0x100
[  278.449038]  [<ffffffff8101c620>] dump_trace+0x170/0x350
[  278.449041]  [<ffffffff8102b47b>] save_stack_trace+0x2b/0x50
[  278.449043]  [<ffffffff81413529>] dma_entry_alloc+0x59/0x90
[  278.449045]  [<ffffffff81413b8f>] debug_dma_alloc_coherent+0x2f/0x90
[  278.449051]  [<ffffffffa008f755>] ttm_dma_populate+0x545/0xaa0 [ttm]
[  278.449072]  [<ffffffffa015727c>] nouveau_ttm_tt_populate+0x14c/0x170 [nouveau]
[  278.449078]  [<ffffffffa0084d80>] ttm_tt_bind+0x40/0x80 [ttm]
[  278.449084]  [<ffffffffa008748f>] ttm_bo_handle_move_mem+0x5bf/0x650 [ttm]
[  278.449089]  [<ffffffffa0087c59>] ? ttm_bo_mem_space+0x179/0x370 [ttm]
[  278.449092]  [<ffffffff810fc24f>] ? lock_release_holdtime.part.28+0xf/0x200
[  278.449098]  [<ffffffffa00883ef>] ttm_bo_validate+0x2df/0x300 [ttm]
[  278.449100]  [<ffffffff810ff72d>] ? trace_hardirqs_on_caller+0x15d/0x200
[  278.449106]  [<ffffffffa0088663>] ttm_bo_init+0x253/0x3b0 [ttm]
[  278.449126]  [<ffffffffa0157c82>] nouveau_bo_new+0x202/0x310 [nouveau]
[  278.449147]  [<ffffffffa0156660>] ? nv10_bo_put_tile_region+0x50/0x50 [nouveau]
[  278.449168]  [<ffffffffa015a42b>] nouveau_gem_new+0x6b/0x160 [nouveau]
[  278.449189]  [<ffffffffa015a5d6>] nouveau_gem_ioctl_new+0xb6/0x220 [nouveau]
[  278.449197]  [<ffffffffa003dcdf>] drm_ioctl+0x1df/0x6a0 [drm]
[  278.449201]  [<ffffffff810ff72d>] ? trace_hardirqs_on_caller+0x15d/0x200
[  278.449203]  [<ffffffff810ff7dd>] ? trace_hardirqs_on+0xd/0x10
[  278.449223]  [<ffffffffa0151a45>] nouveau_drm_ioctl+0x65/0xa0 [nouveau]
[  278.449226]  [<ffffffff812628d0>] do_vfs_ioctl+0x2f0/0x520
[  278.449228]  [<ffffffff81262b81>] SyS_ioctl+0x81/0xa0
[  278.449231]  [<ffffffff8115fb9c>] ? __audit_syscall_entry+0x9c/0xf0
[  278.449234]  [<ffffffff818118e9>] system_call_fastpath+0x16/0x1b
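
The "Possible unsafe locking scenario" in the report above is a lock-order inversion: one path takes priv->lock and then priv->sensor.alarm_program_lock, while another takes them in the opposite order. As a rough illustration only, the user-space sketch below records which lock classes were taken while another was held and flags the first acquisition that reverses a previously seen order. The names toy_acquire, toy_release, seen_order and held are hypothetical and are not part of the kernel or of nouveau. Real lockdep tracks a full dependency graph, IRQ context and much more; this toy only catches a direct two-lock inversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock classes, named after the two locks in the report (illustrative only). */
enum lock_class { THERM_LOCK, ALARM_PROGRAM_LOCK, NUM_LOCK_CLASSES };

/* seen_order[a][b] is true once b has been acquired while a was held. */
bool seen_order[NUM_LOCK_CLASSES][NUM_LOCK_CLASSES];

/* held[c] is true while lock class c is currently held. */
bool held[NUM_LOCK_CLASSES];

/*
 * Record the acquisition of `next`. Returns false if some held lock h was
 * previously acquired while `next` was held, i.e. the order is now inverted.
 */
bool toy_acquire(enum lock_class next)
{
    bool ok = true;

    for (int h = 0; h < NUM_LOCK_CLASSES; h++) {
        if (!held[h] || h == (int)next)
            continue;
        if (seen_order[next][h])
            ok = false;             /* next -> h was seen, now h -> next */
        seen_order[h][next] = true; /* remember the order h -> next */
    }
    held[next] = true;
    return ok;
}

/* Release a lock class; releasing a lock that is not held is a caller bug. */
void toy_release(enum lock_class c)
{
    assert(held[c]);
    held[c] = false;
}
```

Replaying the two chains from the report against this sketch: acquiring THERM_LOCK and then ALARM_PROGRAM_LOCK (chain #1) records the first order, and after releasing both, acquiring ALARM_PROGRAM_LOCK and then THERM_LOCK (chain #0) makes the last toy_acquire call return false.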
Comment 1 Martin Peres 2014-07-07 18:24:25 UTC
Thanks for reporting. I'll have a look at it during the week!

Please do not report Nouveau bugs to the kernel's bugtracker. We have our own which we monitor much more attentively.
Comment 2 Stefan Ringel 2014-07-07 18:28:55 UTC
And which bug tracker is it?
Comment 3 Martin Peres 2014-07-07 19:05:33 UTC
I was referring to freedesktop's bug tracker. Reporting bugs to Nouveau is explained here: http://nouveau.freedesktop.org/wiki/Bugs/

But that's ok, no need to report it once again ;)
Comment 4 Stefan Ringel 2014-07-07 20:16:16 UTC
static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
{
	if (bo->bdev->driver->move_notify)
		bo->bdev->driver->move_notify(bo, NULL);

	if (bo->ttm) {
		ttm_tt_unbind(bo->ttm);
		ttm_tt_destroy(bo->ttm);
		bo->ttm = NULL;
	}
	ttm_bo_mem_put(bo, &bo->mem);

	ww_mutex_unlock (&bo->resv->lock);
}

Must the last line also be changed, as in this commit:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/ttm/ttm_bo.c?id=c75230833ce4fbbfaa257c07b55f97912fb1dc02
Comment 5 Martin Peres 2014-07-12 21:50:56 UTC
Created attachment 142831
drm/nouveau/therm: fix a potential deadlock in the therm monitoring code

Sorry for the wait. Can you try to reproduce the issue with this patch?
Comment 6 Stefan Ringel 2014-07-13 12:08:08 UTC
Looks okay.
Comment 7 Stefan Ringel 2014-07-14 06:47:32 UTC
(In reply to Martin Peres from comment #5)

> Sorry for the wait. Can you try to reproduce the issue with this patch?

Your patch works. Thanks. I cannot reproduce it with this patch.
Comment 8 Martin Peres 2014-07-14 09:56:55 UTC
(In reply to Stefan Ringel from comment #7)
> (In reply to Martin Peres from comment #5)
> 
> > Sorry for the wait. Can you try to reproduce the issue with this patch?
> 
> Your patch works. Thanks. I cannot reproduce it with this patch.

Great! I've asked for inclusion. I'll close this bug.