Bug 41682
Summary: | INFO: possible circular locking dependency detected .. at ext4_evict_inode | ||
---|---|---|---|
Product: | File System | Reporter: | Witold Baryluk (witold.baryluk+kernel) |
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | alan, kernel, penguin-kernel, sedat.dilek, sruffell |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.1.0-rc3+ | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Kernel config |
Description
Witold Baryluk
2011-08-24 21:18:14 UTC
Created attachment 70122 [details]
Kernel config
This build is of mainline kernel, and already have this commit commit c063d8a60fc912ae198f054608ad461a69dc9a19 Merge: 79058c4 dccaf33 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Sun Aug 21 06:59:41 2011 -0700 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: flush any pending end_io requests before DIO reads w/dioread_nolock ext4: fix nomblk_io_submit option so it correctly converts uninit blocks ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode ext4: Fix ext4_should_writeback_data() for no-journal mode So, it includes commit 2581fdc810889fdea97689cb62481201d579c796 Author: Jiaying Zhang <jiayingz@google.com> Date: Sat Aug 13 12:17:13 2011 -0400 ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode which should actually remove deadlock. So probably some other nasty bug was introduced, or my configuration is very strange (and it is). I got similar one. Is this known? Mounting ext3 using CONFIG_EXT4_USE_FOR_EXT23=y . [12693.651048] [12693.651051] ================================= [12693.656823] [ INFO: inconsistent lock state ] [12693.677329] 3.1.0-rc3-ccs #2 [12693.679755] --------------------------------- [12693.683244] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [12693.688326] kswapd0/480 [HC0[0]:SC0[0]:HE1:SE1] takes: [12693.719783] (&sb->s_type->i_mutex_key#8){+.+.?.}, at: [<c112e21a>] ext4_evict_inode+0x1a/0x390 [12693.731590] {RECLAIM_FS-ON-W} state was registered at: [12693.762451] [<c106fab2>] mark_held_locks+0x92/0xf0 [12693.768854] [<c10700f8>] __lockdep_trace_alloc+0x48/0x80 [12693.775581] [<c1070172>] lockdep_trace_alloc+0x42/0x60 [12693.782109] [<c10cc4b7>] kmem_cache_alloc+0x27/0x110 [12693.812808] [<c10e458e>] __d_alloc+0x1e/0x150 [12693.819798] [<c10e46db>] d_alloc+0x1b/0x70 [12693.825203] [<c10da750>] d_alloc_and_lookup+0x20/0x60 [12693.855996] [<c10da94a>] do_lookup+0x16a/0x250 [12693.864001] [<c10db5fc>] path_lookupat+0xac/0x5f0 [12693.870089] [<c10dbb66>] do_path_lookup+0x26/0xc0 [12693.876151] [<c10dbecc>] user_path_at+0x3c/0x80 [12693.906933] [<c10ed550>] sys_pivot_root+0x60/0x230 [12693.913199] [<c13d7e61>] syscall_call+0x7/0xb [12693.919067] irq event stamp: 2056703 [12693.923434] hardirqs last enabled at (2056703): [<c10a5fdc>] free_hot_cold_page+0xdc/0x140 [12693.958561] hardirqs last disabled at (2056702): [<c10a5f6d>] free_hot_cold_page+0x6d/0x140 [12693.968638] softirqs last enabled at (2056670): [<c1044c61>] __do_softirq+0xa1/0x130 [12694.003380] softirqs last disabled at (2056655): [<c1004ce5>] call_on_stack+0x15/0x30 [12694.012930] [12694.012933] other info that might help us debug this: [12694.046175] Possible unsafe locking scenario: [12694.046181] [12694.053930] CPU0 [12694.057055] ---- [12694.060178] lock(&sb->s_type->i_mutex_key); [12694.114766] <Interrupt> [12694.118110] lock(&sb->s_type->i_mutex_key); [12694.124029] [12694.124031] *** DEADLOCK *** [12694.124033] [12694.156906] 2 locks held by kswapd0/480: [12694.162522] #0: (shrinker_rwsem){++++..}, at: [<c10ac5dc>] shrink_slab+0x2c/0x1e0 [12694.172667] #1: (&type->s_umount_key#20){+++++.}, at: [<c10d279a>] grab_super_passive+0x5a/0x80 [12694.208654] [12694.208658] stack backtrace: [12694.214322] Pid: 480, comm: kswapd0 Not tainted 3.1.0-rc3-ccs #2 [12694.246558] Call Trace: [12694.249721] [<c106f2fe>] print_usage_bug+0x16e/0x1b0 [12694.256060] [<c106f9ca>] mark_lock_irq+0x1ea/0x240 [12694.261860] [<c106f510>] ? print_irq_inversion_bug+0x1d0/0x1d0 [12694.293211] [<c10705ac>] mark_lock+0x21c/0x3c0 [12694.298676] [<c1070243>] mark_irqflags+0xb3/0x180 [12694.304386] [<c107100d>] __lock_acquire+0x37d/0x8f0 [12694.310277] [<c106ec27>] ? validate_chain+0x2a7/0x530 [12694.341481] [<c10705ac>] ? mark_lock+0x21c/0x3c0 [12694.347577] [<c10725da>] lock_acquire+0x7a/0xa0 [12694.353106] [<c112e21a>] ? ext4_evict_inode+0x1a/0x390 [12694.359303] [<c13d5965>] mutex_lock_nested+0x45/0x280 [12694.390984] [<c112e21a>] ? ext4_evict_inode+0x1a/0x390 [12694.397262] [<c112e21a>] ? ext4_evict_inode+0x1a/0x390 [12694.403439] [<c10722d7>] ? __lock_release+0x47/0x70 [12694.433772] [<c112e21a>] ext4_evict_inode+0x1a/0x390 [12694.466113] [<c10e746f>] evict+0x11f/0x180 [12694.471237] [<c10e74fc>] dispose_list+0x2c/0x40 [12694.476780] [<c10e77ee>] prune_icache_sb+0xce/0x240 [12694.482694] [<c10d223f>] prune_super+0xaf/0x130 [12694.513169] [<c10ac712>] shrink_slab+0x162/0x1e0 [12694.518857] [<c10ae3e8>] ? shrink_zone+0xf8/0x140 [12694.524569] [<c10aed47>] balance_pgdat+0x3d7/0x440 [12694.530376] [<c106fc50>] ? trace_hardirqs_on_caller+0x90/0x100 [12694.562185] [<c10aef8e>] kswapd+0xbe/0x160 [12694.567365] [<c105a3a5>] kthread+0x75/0x80 [12694.572419] [<c10aeed0>] ? kswapd_try_to_sleep+0x120/0x120 [12694.604239] [<c105a330>] ? kthread_data+0x20/0x20 [12694.610065] [<c13d8b5a>] kernel_thread_helper+0x6/0xd I also have CONFIG_EXT4_USE_FOR_EXT23=y , hower I use it only to mount /boot (ext3), rest of file systems are ext4. So it XT4_USE_FOR_EXT23=y is not an issue IMHO. Your call trace is somehow similar that it is also in ext4_evict_inode code. However deadlock is detected in simpler condition (single CPU, with some bug due re-aquiring lock by same task). Next one, after reboot. (I was trying to revert commit 2581fdc810889fdea97689cb62481201d579c796, however I do not know if I booted using it, or no :/, so assume it is still with this commit). [ 5473.013620] [ 5473.013622] ======================================================= [ 5473.013626] [ INFO: possible circular locking dependency detected ] [ 5473.013630] 3.1.0-rc3-t43-prod-00098-gd8e9618-dirty #11 [ 5473.013632] ------------------------------------------------------- [ 5473.013635] mplayer/13337 is trying to acquire lock: [ 5473.013639] (&fb_info->lock){+.+.+.}, at: [<c1276f37>] fb_release+0x17/0x50 [ 5473.013652] [ 5473.013653] but task is already holding lock: [ 5473.013656] (&mm->mmap_sem){++++++}, at: [<c10bbac2>] sys_munmap+0x22/0x50 [ 5473.013665] [ 5473.013666] which lock already depends on the new lock. [ 5473.013667] [ 5473.013670] [ 5473.013670] the existing dependency chain (in reverse order) is: [ 5473.013673] [ 5473.013674] -> #2 (&mm->mmap_sem){++++++}: [ 5473.013679] [<c1073e58>] lock_acquire+0x78/0xa0 [ 5473.013687] [<c10b2c5b>] might_fault+0x6b/0x90 [ 5473.013691] [<f8bd1a00>] drm_mode_getresources+0x180/0x580 [drm] [ 5473.013715] [<f8bc5143>] drm_ioctl+0x1e3/0x500 [drm] [ 5473.013725] [<c10ec7df>] do_vfs_ioctl+0x7f/0x5b0 [ 5473.013730] [<c10ecd3d>] sys_ioctl+0x2d/0x60 [ 5473.013734] [<c16307dc>] sysenter_do_call+0x12/0x3c [ 5473.013741] [ 5473.013742] -> #1 (&dev->mode_config.mutex){+.+.+.}: [ 5473.013748] [<c1073e58>] lock_acquire+0x78/0xa0 [ 5473.013752] [<c162de94>] mutex_lock_nested+0x44/0x2c0 [ 5473.013757] [<f8c50db3>] drm_fb_helper_set_par+0x33/0xd0 [drm_kms_helper] [ 5473.013764] [<c1276cec>] fb_set_var+0x17c/0x390 [ 5473.013769] [<c1277bec>] do_fb_ioctl+0x25c/0x4b0 [ 5473.013774] [<c1277e8b>] fb_ioctl+0x4b/0x60 [ 5473.013778] [<c10ec7df>] do_vfs_ioctl+0x7f/0x5b0 [ 5473.013783] [<c10ecd3d>] sys_ioctl+0x2d/0x60 [ 5473.013787] [<c16307dc>] sysenter_do_call+0x12/0x3c [ 5473.013791] [ 5473.013792] -> #0 (&fb_info->lock){+.+.+.}: [ 5473.013797] [<c107373c>] __lock_acquire+0x13cc/0x1640 [ 5473.013802] [<c1073e58>] lock_acquire+0x78/0xa0 [ 5473.013806] [<c162de94>] mutex_lock_nested+0x44/0x2c0 [ 5473.013811] [<c1276f37>] fb_release+0x17/0x50 [ 5473.013815] [<c10dde9e>] fput+0xbe/0x1e0 [ 5473.013820] [<c10b94bd>] remove_vma+0x2d/0x60 [ 5473.013825] [<c10ba462>] do_munmap+0x182/0x2d0 [ 5473.013830] [<c10bbacf>] sys_munmap+0x2f/0x50 [ 5473.013834] [<c16307dc>] sysenter_do_call+0x12/0x3c [ 5473.013839] [ 5473.013840] other info that might help us debug this: [ 5473.013841] [ 5473.013843] Chain exists of: [ 5473.013844] &fb_info->lock --> &dev->mode_config.mutex --> &mm->mmap_sem [ 5473.013852] [ 5473.013854] Possible unsafe locking scenario: [ 5473.013855] [ 5473.013857] CPU0 CPU1 [ 5473.013859] ---- ---- [ 5473.013861] lock(&mm->mmap_sem); [ 5473.013865] lock(&dev->mode_config.mutex); [ 5473.013869] lock(&mm->mmap_sem); [ 5473.013873] lock(&fb_info->lock); [ 5473.013877] [ 5473.013878] *** DEADLOCK *** [ 5473.013879] [ 5473.013882] 1 lock held by mplayer/13337: [ 5473.013884] #0: (&mm->mmap_sem){++++++}, at: [<c10bbac2>] sys_munmap+0x22/0x50 [ 5473.013892] [ 5473.013892] stack backtrace: [ 5473.013897] Pid: 13337, comm: mplayer Not tainted 3.1.0-rc3-t43-prod-00098-gd8e9618-dirty #11 [ 5473.013900] Call Trace: [ 5473.013906] [<c1627339>] ? printk+0x18/0x1a [ 5473.013911] [<c16279e2>] print_circular_bug+0x1de/0x1e8 [ 5473.013916] [<c107373c>] __lock_acquire+0x13cc/0x1640 [ 5473.013921] [<c1073e58>] lock_acquire+0x78/0xa0 [ 5473.013925] [<c1276f37>] ? fb_release+0x17/0x50 [ 5473.013929] [<c162de94>] mutex_lock_nested+0x44/0x2c0 [ 5473.013934] [<c1276f37>] ? fb_release+0x17/0x50 [ 5473.013938] [<c10b363c>] ? tlb_finish_mmu+0xc/0x40 [ 5473.013943] [<c10b96ca>] ? unmap_region+0xba/0xd0 [ 5473.013947] [<c1276f37>] fb_release+0x17/0x50 [ 5473.013951] [<c10dde9e>] fput+0xbe/0x1e0 [ 5473.013955] [<c10b94bd>] remove_vma+0x2d/0x60 [ 5473.013960] [<c10ba462>] do_munmap+0x182/0x2d0 [ 5473.013964] [<c10bbacf>] sys_munmap+0x2f/0x50 [ 5473.013969] [<c16307dc>] sysenter_do_call+0x12/0x3c Hmm. Sorry, last one probably have nothing to do with ext4. But, may help. It looks like this is already being worked, but I thought I would add my backtrace, which looks nearly identical to the one from Tetsuo Handa on 2011-08-25 06:49:29. [ 3188.604654] ================================= [ 3188.610014] [ INFO: inconsistent lock state ] [ 3188.610014] 3.1.0-rc3.sruffelldbg #13 [ 3188.610014] --------------------------------- [ 3188.610014] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. [ 3188.610014] kswapd0/26 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 3188.610014] (&sb->s_type->i_mutex_key#11){+.+.?.}, at: [<ffffffff81224a84>] ext4_evict_inode+0x34/0x4a0 [ 3188.610014] {RECLAIM_FS-ON-W} state was registered at: [ 3188.610014] [<ffffffff810a2fb7>] mark_held_locks+0xb7/0x150 [ 3188.610014] [<ffffffff810a30ef>] lockdep_trace_alloc+0x9f/0xd0 [ 3188.610014] [<ffffffff8116427a>] kmem_cache_alloc+0x3a/0x160 [ 3188.610014] [<ffffffff8118d7f7>] __d_alloc+0x37/0x1a0 [ 3188.610014] [<ffffffff8118d9f8>] d_alloc+0x28/0x80 [ 3188.610014] [<ffffffff811801bd>] d_alloc_and_lookup+0x2d/0x90 [ 3188.610014] [<ffffffff81180773>] do_lookup+0x263/0x380 [ 3188.610014] [<ffffffff811825ac>] path_lookupat+0x12c/0x7a0 [ 3188.690010] [<ffffffff81182c51>] do_path_lookup+0x31/0xc0 [ 3188.690010] [<ffffffff81182d7a>] kern_path+0x2a/0x50 [ 3188.690010] [<ffffffff81196ed7>] do_mount+0xa7/0x240 [ 3188.690010] [<ffffffff81197100>] sys_mount+0x90/0xe0 [ 3188.690010] [<ffffffff815ff442>] system_call_fastpath+0x16/0x1b [ 3188.690010] irq event stamp: 1383041 [ 3188.690010] hardirqs last enabled at (1383041): [<ffffffff810e2b51>] __call_rcu+0x101/0x1e0 [ 3188.690010] hardirqs last disabled at (1383040): [<ffffffff810e2acc>] __call_rcu+0x7c/0x1e0 [ 3188.690010] softirqs last enabled at (1380030): [<ffffffff8106d1a6>] __do_softirq+0x146/0x230 [ 3188.690010] softirqs last disabled at (1380015): [<ffffffff8160173c>] call_softirq+0x1c/0x30 [ 3188.690010] [ 3188.690010] other info that might help us debug this: [ 3188.690010] Possible unsafe locking scenario: [ 3188.690010] [ 3188.690010] CPU0 [ 3188.690010] ---- [ 3188.690010] lock(&sb->s_type->i_mutex_key); [ 3188.690010] <Interrupt> [ 3188.690010] lock(&sb->s_type->i_mutex_key); [ 3188.690010] [ 3188.690010] *** DEADLOCK *** [ 3188.690010] [ 3188.690010] 2 locks held by kswapd0/26: [ 3188.690010] #0: (shrinker_rwsem){++++..}, at: [<ffffffff8112da8c>] shrink_slab+0x3c/0x2f0 [ 3188.690010] #1: (&type->s_umount_key#22){+++++.}, at: [<ffffffff81177b94>] grab_super_passive+0x54/0xa0 [ 3188.690010] [ 3188.690010] stack backtrace: [ 3188.690010] Pid: 26, comm: kswapd0 Not tainted 3.1.0-rc3.sruffelldbg #13 [ 3188.690010] Call Trace: [ 3188.690010] [<ffffffff815f6797>] ? _raw_spin_unlock_irqrestore+0x67/0x70 [ 3188.690010] [<ffffffff810a2630>] print_usage_bug+0x1a0/0x1f0 [ 3188.690010] [<ffffffff810a39b0>] ? print_irq_inversion_bug+0x220/0x220 [ 3188.690010] [<ffffffff810a2797>] mark_lock_irq+0x117/0x2d0 [ 3188.690010] [<ffffffff810a2add>] mark_lock+0x18d/0x410 [ 3188.690010] [<ffffffff8109248f>] ? local_clock+0x6f/0x80 [ 3188.690010] [<ffffffff810a2e16>] mark_irqflags+0xb6/0x1a0 [ 3188.690010] [<ffffffff810a65ec>] __lock_acquire+0x20c/0x930 [ 3188.690010] [<ffffffff810a73e4>] lock_acquire+0xa4/0x130 [ 3188.690010] [<ffffffff81224a84>] ? ext4_evict_inode+0x34/0x4a0 [ 3188.690010] [<ffffffff8109248f>] ? local_clock+0x6f/0x80 [ 3188.690010] [<ffffffff815f4637>] mutex_lock_nested+0x57/0x340 [ 3188.690010] [<ffffffff81224a84>] ? ext4_evict_inode+0x34/0x4a0 [ 3188.690010] [<ffffffff810a08a5>] ? lock_release_holdtime+0x35/0x180 [ 3188.690010] [<ffffffff81224a84>] ? ext4_evict_inode+0x34/0x4a0 [ 3188.690010] [<ffffffff81190001>] ? evict+0x91/0x1a0 [ 3188.690010] [<ffffffff81224a84>] ext4_evict_inode+0x34/0x4a0 [ 3188.690010] [<ffffffff8119000f>] evict+0x9f/0x1a0 [ 3188.690010] [<ffffffff811906c7>] dispose_list+0x47/0x60 [ 3188.690010] [<ffffffff81190832>] prune_icache_sb+0x152/0x320 [ 3188.690010] [<ffffffff81177d2b>] prune_super+0x14b/0x1a0 [ 3188.690010] [<ffffffff8112dbe4>] shrink_slab+0x194/0x2f0 [ 3188.690010] [<ffffffff8112d6d5>] ? shrink_zone+0x175/0x270 [ 3188.690010] [<ffffffff815f67cb>] ? _raw_spin_unlock+0x2b/0x40 [ 3188.690010] [<ffffffff8112e896>] balance_pgdat+0x736/0x940 [ 3188.690010] [<ffffffff8112eaa0>] ? balance_pgdat+0x940/0x940 [ 3188.690010] [<ffffffff8112eba9>] kswapd+0x109/0x220 [ 3188.690010] [<ffffffff8112eaa0>] ? balance_pgdat+0x940/0x940 [ 3188.690010] [<ffffffff8108ab66>] kthread+0xb6/0xc0 [ 3188.690010] [<ffffffff810a3335>] ? trace_hardirqs_on_caller+0x125/0x1b0 [ 3188.690010] [<ffffffff81601644>] kernel_thread_helper+0x4/0x10 [ 3188.690010] [<ffffffff815f6a34>] ? retint_restore_args+0x13/0x13 [ 3188.690010] [<ffffffff8108aab0>] ? __init_kthread_worker+0x70/0x70 [ 3188.690010] [<ffffffff81601640>] ? gs_change+0x13/0x13 Should be fixed by "ext4: fix potential deadlock in ext4_evict_inode()" [1]. - Sedat - [1] http://git.us.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commit;h=a840f388e87f75c18c51e1d868fdcd2ef23e9cea I believe I saw some fix for this (or one of this deadlocks) in mainline kernel. Cannot find it now. I applied Ts'o patch, and indeed deadlock is not occurring. However I have some problems (crashes, lockups) with testing other subsystems now (wirless), so cannot be 100% sure if everything is fixed. Official patch from Ted: "ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining" [1] See also "[GIT PULL] ext4 fixes for 3.1-rc5" [2]. ( Patch is not in mainline yet. ) [1] https://github.com/tytso/ext4/commit/8c0bec2151a47906bf779c6715a10ce04453ab77 [2] http://lkml.org/lkml/2011/8/31/262 Probably this comment isn't needed, but I too have applied the patch and haven't had any other strange behavior on 3.1-rc4. Cannot reproduce anymore at 3.2.4 or 3.3-rc2+. I think this can be closed. |