Bug 219022
Summary: | UBIFS/ext4: Deadlock happens while getting other inodes in the inode evicting process under inode lru traversing context | ||
---|---|---|---|
Product: | File System | Reporter: | Zhihao Cheng (chengzhihao1) |
Component: | Other | Assignee: | fs_other |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: |
diff
a.c diff_2 a_2.c |
Created attachment 306554 [details]
a.c
Description (ext4):
1. File A has inode i_reg and an ea inode i_ea.
2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
3. Then the following two processes run like this:

        PA                                      PB
 echo 2 > /proc/sys/vm/drop_caches
  shrink_slab
   prune_dcache_sb
   // i_reg is added into lru, lru->i_ea->i_reg
   prune_icache_sb
    list_lru_walk_one
     inode_lru_isolate
      i_ea->i_state |= I_FREEING // set inode state
     inode_lru_isolate
      __iget(i_reg)
      spin_unlock(&i_reg->i_lock)
      spin_unlock(lru_lock)
                                            rm file A
                                             i_reg->nlink = 0
      iput(i_reg) // i_reg->nlink is 0, do evict
       ext4_evict_inode
        ext4_xattr_delete_inode
         ext4_xattr_inode_dec_ref_all
          ext4_xattr_inode_iget
           ext4_iget(i_ea->i_ino)
            iget_locked
             find_inode_fast
              __wait_on_freeing_inode(i_ea) ----→ AA deadlock
    dispose_list // cannot be executed by prune_icache_sb
     wake_up_bit(&i_ea->i_state)

Reproducer:
CONFIG_DETECT_HUNG_TASK=y
CONFIG_EXT4_FS=y
rootfs=xfs (non-ext4)
-smp 1 // single core, so that all inodes are put into the same lru list
1. Apply diff_2 and compile the kernel
2. gcc -oaa a_2.c -lpthread
3. ./aa

[ 68.021371] Add 14 lru
[ 68.035222] Add 13 lru
[ 68.035744] <1164> try isolate 14 0 0
[ 68.036507] add inode 14 into dispose list
[ 68.037317] <1164> try isolate 13 0 1
[ 68.042992] wait unlink file_a
[ 69.027971] wait unlink file_a done
[ 69.028901] <1164> prepare to relase ia
[ 69.030480] <1164> get xattr 14
[ 69.031318] <1164> wait inode 14 I_FREEEING
[ 92.432041] INFO: task bb:1164 blocked for more than 15 seconds.
[ 92.433606] Not tainted 6.10.0-rc7-00038-gbf66a9390c24-dirty #715
[ 92.435248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 92.437157] task:bb state:D stack:0 pid:1164 tgid:1028 ppid:890 flags:0x00004000
[ 92.437171] Call Trace:
[ 92.437175] <TASK>
[ 92.437182] __schedule+0x591/0x1270
[ 92.437232] schedule+0x37/0x160
[ 92.437238] __wait_on_freeing_inode+0xd4/0x130
[ 92.437250] ? __pfx_wake_bit_function+0x10/0x10
[ 92.437262] find_inode_fast+0xde/0x1b0
[ 92.437271] iget_locked+0x7d/0x360
[ 92.437279] __ext4_iget+0x19e/0x1710
[ 92.437289] ? __wake_up_klogd+0x69/0xe0
[ 92.437300] ? vprintk_emit+0x2fd/0x470
[ 92.437307] ext4_xattr_inode_iget+0x4a/0x1a0
[ 92.437315] ext4_xattr_inode_dec_ref_all+0xce/0x580
[ 92.437324] ext4_xattr_delete_inode+0x445/0x580
[ 92.437333] ext4_evict_inode+0x33b/0xa90
[ 92.437343] evict+0x12c/0x2e0
[ 92.437351] iput+0x21e/0x3b0
[ 92.437359] inode_lru_isolate+0x407/0x520
[ 92.437368] __list_lru_walk_one+0xc8/0x300
[ 92.437375] ? __pfx_inode_lru_isolate+0x10/0x10
[ 92.437385] ? __pfx_inode_lru_isolate+0x10/0x10
[ 92.437392] list_lru_walk_one+0x6d/0xb0
[ 92.437416] prune_icache_sb+0x52/0x90
[ 92.437426] super_cache_scan+0x14a/0x200
[ 92.437435] do_shrink_slab+0x1c8/0x5c0
[ 92.437446] shrink_slab+0x5c1/0x7b0
[ 92.437458] drop_slab+0xc9/0x1a0
[ 92.437466] drop_caches_sysctl_handler+0xd2/0x140
[ 92.437476] proc_sys_call_handler+0x1d0/0x2f0
[ 92.437485] proc_sys_write+0x1b/0x30
[ 92.437492] vfs_write+0x243/0x6c0
[ 92.437503] ksys_write+0x7f/0x170
[ 92.437512] __x64_sys_write+0x21/0x30
[ 92.437521] x64_sys_call+0x4531/0x4560
[ 92.437531] do_syscall_64+0xa7/0x230
[ 92.437542] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Created attachment 306560 [details]
diff_2
Created attachment 306561 [details]
a_2.c
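
The ext4 sequence above can be modelled in user space to show why it is an AA deadlock: the same task that marks i_ea as being freed later waits for that mark to be cleared, and the only code that clears it (dispose_list) sits further down its own call path. The following is a minimal sketch under stated assumptions, not kernel code and not the attached a_2.c. `struct model_inode`, `model_wait_on_freeing()` and `model_aa_deadlock()` are hypothetical names, the boolean `freeing` stands in for I_FREEING, the condition variable stands in for wake_up_bit(), and a timeout replaces the indefinite sleep so the program terminates.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for an inode; not the kernel struct. */
struct model_inode {
    pthread_mutex_t lock;
    pthread_cond_t  wake;     /* stands in for wake_up_bit() */
    bool            freeing;  /* stands in for I_FREEING */
};

/*
 * Models __wait_on_freeing_inode(), except that it gives up after
 * timeout_ms. Returns 0 if the flag was cleared, ETIMEDOUT otherwise.
 */
static int model_wait_on_freeing(struct model_inode *in, long timeout_ms)
{
    struct timespec dl;
    int ret = 0;

    clock_gettime(CLOCK_REALTIME, &dl);
    dl.tv_sec  += timeout_ms / 1000;
    dl.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (dl.tv_nsec >= 1000000000L) {
        dl.tv_sec++;
        dl.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&in->lock);
    while (in->freeing && ret == 0)
        ret = pthread_cond_timedwait(&in->wake, &in->lock, &dl);
    ret = in->freeing ? ETIMEDOUT : 0;
    pthread_mutex_unlock(&in->lock);
    return ret;
}

/* One task plays the whole PA column of the diagram. */
static int model_aa_deadlock(long timeout_ms)
{
    struct model_inode i_ea = { .freeing = false };
    int ret;

    assert(timeout_ms >= 0);
    pthread_mutex_init(&i_ea.lock, NULL);
    pthread_cond_init(&i_ea.wake, NULL);

    /* inode_lru_isolate(i_ea): mark the inode, queue it for disposal. */
    pthread_mutex_lock(&i_ea.lock);
    i_ea.freeing = true;
    pthread_mutex_unlock(&i_ea.lock);

    /* iput(i_reg) -> evict -> ... -> iget_locked(i_ea): wait on our own mark. */
    ret = model_wait_on_freeing(&i_ea, timeout_ms);

    /* dispose_list(): only reached because the model wait timed out. */
    pthread_mutex_lock(&i_ea.lock);
    i_ea.freeing = false;
    pthread_cond_broadcast(&i_ea.wake);
    pthread_mutex_unlock(&i_ea.lock);

    pthread_cond_destroy(&i_ea.wake);
    pthread_mutex_destroy(&i_ea.lock);
    return ret;
}
```

Calling `model_aa_deadlock(50)` from a small `main()` (built with `gcc -pthread`) returns ETIMEDOUT, because nothing can clear the flag while the only task that would clear it is the one waiting. In the kernel there is no timeout, so the task stays in state D as shown in the hung task report.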
Created attachment 306553 [details]
diff

Description:
The inode reclaiming process (see function prune_icache_sb) first collects all reclaimable inodes and marks them with the I_FREEING flag. From then on, other processes block if they try to get these inodes (see function find_inode_fast). The reclaiming process then destroys the inodes.
In the deleted-inode writing function ubifs_jnl_write_inode(), UBIFS holds BASEHD's wbuf->io_mutex while traversing all xattr inodes, which can race with the inode reclaiming process (the reclaiming process may try to lock BASEHD's wbuf->io_mutex in the inode evicting function). An ABBA deadlock then happens as follows:

1. File A has inode ia and a xattr (with inode ixa), regular file B has inode ib and a xattr.
2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
3. Then the following three processes run like this:

        PA                        PB                                PC
                echo 2 > /proc/sys/vm/drop_caches
                // ib and ia are added into lru, lru->ixa->ib->ia
                 shrink_slab
                  prune_icache_sb
                   list_lru_walk_one
                    inode_lru_isolate
                     ixa->inode->i_state |= I_FREEING // set inode state
                    inode_lru_isolate
                     __iget(ib)
                     spin_unlock(&ib->i_lock)
                     spin_unlock(lru_lock)
                                                             rm file B
                                                              ib->nlink = 0
                                                              iput(ib)
 rm file A
  iput(ia)
   ubifs_evict_inode(ia)
    ubifs_jnl_delete_inode(ia)
     ubifs_jnl_write_inode(ia)
      make_reservation(BASEHD) // Lock wbuf->io_mutex
      ubifs_iget(ixa->i_ino)
       iget_locked
        find_inode_fast
         __wait_on_freeing_inode(ixa)
          |          iput(ib) // ib->nlink is 0, do evict
          |           ubifs_evict_inode
          |            ubifs_jnl_delete_inode(ib)
          ↓             ubifs_jnl_write_inode
     ABBA deadlock ←-----make_reservation(BASEHD)
                   dispose_list // cannot be executed by prune_icache_sb
                    wake_up_bit(&inode->i_state)

Reproducer:
CONFIG_DETECT_HUNG_TASK=y
CONFIG_MTD_NAND_NANDSIM=m
CONFIG_MTD_UBI=m
CONFIG_UBIFS_FS=m
-smp 1 // single core, so that all inodes are put into the same lru list
1. Apply diff and compile the kernel
2. gcc -oaa a.c -lpthread
3. ./aa

[ 45.237580] Add 66 lru
[ 46.255128] Add 67 lru
[ 46.257548] Add 65 lru
[ 46.258042] <1545> try isolate 66
[ 46.258735] add inode 66 into dispose list
[ 46.259552] <1545> try isolate 67
[ 46.262521] wait unlink file_b
[ 47.246328] wait unlink file_b done
[ 47.247295] wait unlink file_a
[ 49.239449] <1504> wait inode 66 I_FREEEING
[ 49.292337] wait unlink file_a done
[ 49.293586] <1545> prepare to relase ib
[ 76.623348] INFO: task aa:1504 blocked for more than 15 seconds.
[ 76.624921] Not tainted 6.10.0-rc7-00026-g828740a7415c-dirty #680
[ 76.626583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 76.628535] task:aa state:D stack:0 pid:1504 tgid:1504 ppid:1404 flags:0x00000002
[ 76.628549] Call Trace:
[ 76.628553] <TASK>
[ 76.628561] __schedule+0x591/0x1270
[ 76.628593] schedule+0x37/0x160
[ 76.628600] __wait_on_freeing_inode+0xd4/0x130
[ 76.628611] ? __pfx_wake_bit_function+0x10/0x10
[ 76.628623] find_inode_fast+0xde/0x1b0
[ 76.628632] iget_locked+0x7d/0x360
[ 76.628643] ubifs_iget+0x4f/0x7b0 [ubifs]
[ 76.628725] ubifs_jnl_write_inode+0x1da/0x790 [ubifs]
[ 76.628794] ? xas_load+0x15/0x200
[ 76.628803] ? xa_load+0x93/0xf0
[ 76.628811] ? __inode_wait_for_writeback+0x8b/0x120
[ 76.628821] ubifs_jnl_delete_inode+0x50/0x190 [ubifs]
[ 76.628888] ubifs_evict_inode+0x161/0x1e0 [ubifs]
[ 76.628963] evict+0x12c/0x2e0
[ 76.628971] iput+0x21e/0x3b0
[ 76.628978] do_unlinkat+0x167/0x4a0
[ 76.628989] __x64_sys_unlink+0x3b/0x60
[ 76.628997] x64_sys_call+0x24de/0x4560
[ 76.629007] do_syscall_64+0xa7/0x230
[ 76.629018] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 76.629029] RIP: 0033:0x7ff513501b77
[ 76.629049] RSP: 002b:00007ffdf6d8f288 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[ 76.629058] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff513501b77
[ 76.629063] RDX: 00007ff5137d1740 RSI: 00007ff50c0008c0 RDI: 0000000000400ead
[ 76.629067] RBP: 00007ffdf6d8f3b0 R08: 00007ff513e07700 R09: 0000000000000000
[ 76.629071] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000400850
[ 76.629075] R13: 00007ffdf6d8f490 R14: 0000000000000000 R15: 0000000000000000
[ 76.629082] </TASK>
[ 76.629085] INFO: task bb:1545 blocked for more than 15 seconds.
[ 76.630599] Not tainted 6.10.0-rc7-00026-g828740a7415c-dirty #680
[ 76.632262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 76.634185] task:bb state:D stack:0 pid:1545 tgid:1504 ppid:1404 flags:0x00004002
[ 76.634194] Call Trace:
[ 76.634197] <TASK>
[ 76.634201] __schedule+0x591/0x1270
[ 76.634208] ? stack_depot_save+0x16/0x30
[ 76.634240] schedule+0x37/0x160
[ 76.634246] schedule_preempt_disabled+0x25/0x50
[ 76.634253] __mutex_lock.constprop.0+0x4a3/0x990
[ 76.634262] ? __link_object+0x194/0x240
[ 76.634274] __mutex_lock_slowpath+0x1f/0x30
[ 76.634281] mutex_lock+0x56/0x70
[ 76.634289] make_reservation+0xe1/0xb00 [ubifs]
[ 76.634357] ubifs_jnl_write_inode+0x10a/0x790 [ubifs]
[ 76.634425] ? prb_read_valid+0x23/0x40
[ 76.634434] ? console_unlock+0x5c/0x180
[ 76.634439] ? __irq_work_queue_local+0x51/0x1b0
[ 76.634447] ? xas_load+0x15/0x200
[ 76.634455] ? xa_load+0x93/0xf0
[ 76.634463] ? __inode_wait_for_writeback+0x8b/0x120
[ 76.634471] ubifs_jnl_delete_inode+0x50/0x190 [ubifs]
[ 76.634537] ubifs_evict_inode+0x161/0x1e0 [ubifs]
[ 76.634604] evict+0x12c/0x2e0
[ 76.634611] iput+0x21e/0x3b0
[ 76.634619] inode_lru_isolate+0x424/0x540
[ 76.634628] __list_lru_walk_one+0xc8/0x300
[ 76.634635] ? __pfx_inode_lru_isolate+0x10/0x10
[ 76.634645] ? __pfx_inode_lru_isolate+0x10/0x10
[ 76.634652] list_lru_walk_one+0x6d/0xb0
[ 76.634660] prune_icache_sb+0x52/0x90
[ 76.634670] super_cache_scan+0x14a/0x200
[ 76.634679] do_shrink_slab+0x1c8/0x5c0
[ 76.634691] shrink_slab+0x5c1/0x7b0
[ 76.634702] drop_slab+0xc9/0x1a0
[ 76.634711] drop_caches_sysctl_handler+0xd2/0x140
[ 76.634721] proc_sys_call_handler+0x1d0/0x2f0
[ 76.634730] proc_sys_write+0x1b/0x30
[ 76.634736] vfs_write+0x243/0x6c0
[ 76.634748] ksys_write+0x7f/0x170
[ 76.634756] __x64_sys_write+0x21/0x30
[ 76.634765] x64_sys_call+0x4531/0x4560
[ 76.634773] do_syscall_64+0xa7/0x230
[ 76.634782] entry_SYSCALL_64_after_hwframe+0x76/0x7e
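
The UBIFS case can be modelled the same way in user space. The sketch below is an illustration under stated assumptions, not kernel code and not the attached a.c. `struct abba_model`, `pa_unlink_a()`, `pb_reclaim()` and `model_abba_deadlock()` are hypothetical names, `io_mutex` stands in for BASEHD's wbuf->io_mutex, `ixa_freeing` stands in for I_FREEING on ixa, semaphores force the interleaving from the diagram, and timeouts replace the indefinite sleeps so the program terminates. PC is omitted because it only drops ib's link count.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space state; names mirror the report, not kernel types. */
struct abba_model {
    pthread_mutex_t io_mutex;    /* stands in for BASEHD wbuf->io_mutex */
    pthread_mutex_t state_lock;  /* protects ixa_freeing */
    pthread_cond_t  state_wake;  /* stands in for wake_up_bit() */
    bool            ixa_freeing; /* stands in for I_FREEING on ixa */
    sem_t           marked, held, pb_done; /* force the interleaving */
    long            timeout_ms;
    int             pa_result, pb_result;
};

static struct timespec deadline_after(long ms)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* PA: rm file A -> make_reservation(BASEHD) -> ubifs_iget(ixa). */
static void *pa_unlink_a(void *arg)
{
    struct abba_model *m = arg;
    struct timespec dl;
    int ret = 0;

    sem_wait(&m->marked);              /* run after ixa has been marked */
    pthread_mutex_lock(&m->io_mutex);  /* make_reservation(BASEHD) */
    sem_post(&m->held);

    dl = deadline_after(m->timeout_ms);
    pthread_mutex_lock(&m->state_lock);
    while (m->ixa_freeing && ret == 0) /* __wait_on_freeing_inode(ixa) */
        ret = pthread_cond_timedwait(&m->state_wake, &m->state_lock, &dl);
    m->pa_result = m->ixa_freeing ? ETIMEDOUT : 0;
    pthread_mutex_unlock(&m->state_lock);

    sem_wait(&m->pb_done);             /* keep io_mutex until PB gave up */
    pthread_mutex_unlock(&m->io_mutex);
    return NULL;
}

/* PB: drop_caches -> inode_lru_isolate(ixa), then iput(ib) -> evict. */
static void *pb_reclaim(void *arg)
{
    struct abba_model *m = arg;
    struct timespec dl;

    pthread_mutex_lock(&m->state_lock);
    m->ixa_freeing = true;             /* ixa goes onto the dispose list */
    pthread_mutex_unlock(&m->state_lock);
    sem_post(&m->marked);

    sem_wait(&m->held);                /* PA now owns io_mutex */
    dl = deadline_after(m->timeout_ms);
    m->pb_result = pthread_mutex_timedlock(&m->io_mutex, &dl);
    if (m->pb_result == 0)
        pthread_mutex_unlock(&m->io_mutex);
    sem_post(&m->pb_done);
    /* dispose_list() and wake_up_bit() would follow, but are never reached. */
    return NULL;
}

/* Returns true when both sides timed out, i.e. the model deadlocked. */
static bool model_abba_deadlock(long timeout_ms)
{
    struct abba_model m = { .timeout_ms = timeout_ms };
    pthread_t pa, pb;
    int rc;

    pthread_mutex_init(&m.io_mutex, NULL);
    pthread_mutex_init(&m.state_lock, NULL);
    pthread_cond_init(&m.state_wake, NULL);
    sem_init(&m.marked, 0, 0);
    sem_init(&m.held, 0, 0);
    sem_init(&m.pb_done, 0, 0);

    rc = pthread_create(&pa, NULL, pa_unlink_a, &m);
    assert(rc == 0);
    rc = pthread_create(&pb, NULL, pb_reclaim, &m);
    assert(rc == 0);
    (void)rc;
    pthread_join(pa, NULL);
    pthread_join(pb, NULL);
    return m.pa_result == ETIMEDOUT && m.pb_result == ETIMEDOUT;
}
```

Calling `model_abba_deadlock(50)` from a small `main()` (built with `gcc -pthread`) returns true: PA holds the mutex and waits for the flag, while PB, the only task that would clear the flag, waits for the mutex. The model makes each side hold its resource until the other has given up, which is what the missing timeouts amount to in the kernel and matches the two hung task reports above.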