A deadlock may happens in the following scenario, backtrace is: PID: 257 TASK: ecdd0000 CPU: 0 COMMAND: "init" #0 [<c0b420ec>] (__schedule) from [<c0b423c8>] #1 [<c0b423c8>] (schedule) from [<c0b459d4>] #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>] #3 [<c0b44fa0>] (down_read) from [<c044233c>] #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>] #5 [<c0442890>] (f2fs_truncate) from [<c044d408>] #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>] #7 [<c030be18>] (evict) from [<c030a558>] #8 [<c030a558>] (iput) from [<c047c600>] #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>] #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>] #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>] #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>] #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>] #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>] #15 [<c0324294>] (do_fsync) from [<c0324014>] #16 [<c0324014>] (sys_fsync) from [<c0108bc0>] f2fs_sync_node_pages tries to flush dirty inode and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the beginning by checkpoint->block_operations(). There is a similar issue and patch in android.googlesource.com with commit id is 82cc5e607b56abea4ea77023650247f5ae41b515, but the call trace is: - f2fs_sync_node_pages() - if (is_inline_node(page)) > - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); it seems than this patch has not cover current deadlock scenario, the current deadlock call trace is: - f2fs_sync_node_pages() > - if (flush_dirty_inode(page)) - iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); please help check and confirm. thanks!
Thank you for the report. I think this is valid point that we need to fix. I'm testing a RFC patch like this. Thanks. --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1926,8 +1926,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, goto continue_unlock; } - /* flush inline_data, if it's async context. */ - if (do_balance && is_inline_node(page)) { + /* flush inline_data/inode, if it's async context. */ + if (!do_balance) + goto write_node; + + /* flush inline_data */ + if (is_inline_node(page)) { clear_inline_node(page); unlock_page(page); flush_inline_data(sbi, ino_of_node(page)); @@ -1940,7 +1944,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, if (flush_dirty_inode(page)) goto lock_node; } - +write_node: f2fs_wait_on_page_writeback(page, NODE, true, true); if (!clear_page_dirty_for_io(page))
(In reply to Jaegeuk Kim from comment #1) > Thank you for the report. > > I think this is valid point that we need to fix. > I'm testing a RFC patch like this. Thanks. > Could you please check generic/204 testcase? as this fix diff skips codes added in commit 052a82d85a3b ("f2fs: fix to writeout dirty inode during node flush").
That passed. could you check your side?
Passed, we should ask Eric to retest in his enviornment, to make sure there is actually no regression.
(In reply to Jaegeuk Kim from comment #1) > Thank you for the report. > > I think this is valid point that we need to fix. > I'm testing a RFC patch like this. Thanks. > > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1926,8 +1926,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > goto continue_unlock; > } > > - /* flush inline_data, if it's async context. */ > - if (do_balance && is_inline_node(page)) { > + /* flush inline_data/inode, if it's async context. */ > + if (!do_balance) > + goto write_node; > + > + /* flush inline_data */ > + if (is_inline_node(page)) { > clear_inline_node(page); > unlock_page(page); > flush_inline_data(sbi, ino_of_node(page)); > @@ -1940,7 +1944,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > if (flush_dirty_inode(page)) > goto lock_node; > } > - > +write_node: > f2fs_wait_on_page_writeback(page, NODE, true, true); > > if (!clear_page_dirty_for_io(page)) Hi Jaegeuk Kim for comment 1, can you help provide a full diff for this patch, I can not apply it directly because of the following error: > fatal: patch fragment without header at line 3: @@ -1926,8 +1926,12 @@ int > f2fs_sync_node_pages(struct f2fs_sb_info *sbi, or Should I merge it by Manually? thanks a lot~
Hi Jaegeuk Kim Could I merge this patch in my code for fixing the deadlock issue? thanks!
On 2020/7/17 1:15, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=208565 > > Jaegeuk Kim (jaegeuk@kernel.org) changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |jaegeuk@kernel.org > > --- Comment #1 from Jaegeuk Kim (jaegeuk@kernel.org) --- > Thank you for the report. > > I think this is valid point that we need to fix. > I'm testing a RFC patch like this. Thanks. Shouldn't we revert 34c061ad85a2 ("f2fs: Avoid double lock for cp_rwsem during checkpoint") at the same time? > > --- a/fs/f2fs/node.c > +++ b/fs/f2fs/node.c > @@ -1926,8 +1926,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > goto continue_unlock; > } > > - /* flush inline_data, if it's async context. */ > - if (do_balance && is_inline_node(page)) { > + /* flush inline_data/inode, if it's async context. */ > + if (!do_balance) > + goto write_node; > + > + /* flush inline_data */ > + if (is_inline_node(page)) { > clear_inline_node(page); > unlock_page(page); > flush_inline_data(sbi, ino_of_node(page)); > @@ -1940,7 +1944,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, > if (flush_dirty_inode(page)) > goto lock_node; > } > - > +write_node: > f2fs_wait_on_page_writeback(page, NODE, true, true); > > if (!clear_page_dirty_for_io(page)) >
Chao, why do we need to revert that patch? Zhiguo, You can see the patch here. https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=b0f3b87fb3abc42c81d76c6c5795f26dbdb2f04b
On 2020/7/24 11:21, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=208565 > > --- Comment #8 from Jaegeuk Kim (jaegeuk@kernel.org) --- > Chao, why do we need to revert that patch? After applying your new patch, below race condition should no longer happen, right? Thread A Thread B f2fs_write_checkpoint() - block_operations(sbi) - f2fs_lock_all(sbi); - down_write(&sbi->cp_rwsem); - open() - igrab() - write() write inline data - unlink() - f2fs_sync_node_pages() + if (!do_balance) + goto write_node; <---- this avoids running into iput(). - if (is_inline_node(page)) - flush_inline_data() - ilookup() page = f2fs_pagecache_get_page() if (!page) goto iput_out; iput_out: -close() -iput() iput(inode); - f2fs_evict_inode() - f2fs_truncate_blocks() - f2fs_lock_op() - down_read(&sbi->cp_rwsem); > > Zhiguo, > You can see the patch here. > > https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=b0f3b87fb3abc42c81d76c6c5795f26dbdb2f04b >
if we revert it, we're not going to call f2fs_flush_inline_data() in checkpoint path, which actually fixes generic/204?
(In reply to Jaegeuk Kim from comment #10) > if we revert it, we're not going to call f2fs_flush_inline_data() in > checkpoint path, which actually fixes generic/204? Hi Jaeqeuk Kim, I still have a problem: As new patch modified, flush_dirty_inode will be skipped for avoiding dead lock, but when/where flush_dirty_inode would be done? thanks!
It'll be called by node_inode->writepages() -> f2fs_write_node_pages(), which should be done by flusher in vfs.