Bug 208565

Summary: There may be dead lock for cp_rwsem during checkpoint
Product: File System Reporter: Zhiguo.Niu (Zhiguo.Niu)
Component: f2fsAssignee: Default virtual assignee for f2fs (filesystem_f2fs)
Status: CLOSED CODE_FIX    
Severity: high CC: chao, jaegeuk
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.14.181 Subsystem:
Regression: No Bisected commit-id:

Description Zhiguo.Niu 2020-07-15 12:24:37 UTC
A deadlock may happens in the following scenario, backtrace is:

PID: 257    TASK: ecdd0000  CPU: 0   COMMAND: "init"
 #0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
 #1 [<c0b423c8>] (schedule) from [<c0b459d4>]
 #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
 #3 [<c0b44fa0>] (down_read) from [<c044233c>]
 #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
 #5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
 #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
 #7 [<c030be18>] (evict) from [<c030a558>]
 #8 [<c030a558>] (iput) from [<c047c600>]
 #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
#10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
#11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
#12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
#13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
#14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
#15 [<c0324294>] (do_fsync) from [<c0324014>]
#16 [<c0324014>] (sys_fsync) from [<c0108bc0>]

f2fs_sync_node_pages tries to flush dirty inode and calls iput(). This results in deadlock as iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

There is a similar issue and patch in android.googlesource.com with commit id is 82cc5e607b56abea4ea77023650247f5ae41b515, but the call trace is:

- f2fs_sync_node_pages()
 - if (is_inline_node(page))
>  - flush_inline_data()
   - ilookup()
     page = f2fs_pagecache_get_page()
     if (!page)
      goto iput_out;
     iput_out:
			-close()
			-iput()
       iput(inode);
       - f2fs_evict_inode()
        - f2fs_truncate_blocks()
         - f2fs_lock_op()
           - down_read(&sbi->cp_rwsem);

it seems than this patch has not cover current deadlock scenario, the current deadlock call trace is:
- f2fs_sync_node_pages()

>  - if (flush_dirty_inode(page))

  - iput(inode);

       - f2fs_evict_inode()

        - f2fs_truncate_blocks()

         - f2fs_lock_op()

           - down_read(&sbi->cp_rwsem);

please help check and confirm.

thanks!
Comment 1 Jaegeuk Kim 2020-07-16 17:15:05 UTC
Thank you for the report.

I think this is valid point that we need to fix.
I'm testing a RFC patch like this. Thanks.

--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1926,8 +1926,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
                                goto continue_unlock;
                        }
 
-                       /* flush inline_data, if it's async context. */
-                       if (do_balance && is_inline_node(page)) {
+                       /* flush inline_data/inode, if it's async context. */
+                       if (!do_balance)
+                               goto write_node;
+
+                       /* flush inline_data */
+                       if (is_inline_node(page)) {
                                clear_inline_node(page);
                                unlock_page(page);
                                flush_inline_data(sbi, ino_of_node(page));
@@ -1940,7 +1944,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
                                if (flush_dirty_inode(page))
                                        goto lock_node;
                        }
-
+write_node:
                        f2fs_wait_on_page_writeback(page, NODE, true, true);
 
                        if (!clear_page_dirty_for_io(page))
Comment 2 Chao Yu 2020-07-17 01:54:54 UTC
(In reply to Jaegeuk Kim from comment #1)
> Thank you for the report.
> 
> I think this is valid point that we need to fix.
> I'm testing a RFC patch like this. Thanks.
> 

Could you please check generic/204 testcase? as this fix diff skips codes added in commit 052a82d85a3b ("f2fs: fix to writeout dirty inode during node flush").
Comment 3 Jaegeuk Kim 2020-07-17 02:02:21 UTC
That passed. could you check your side?
Comment 4 Chao Yu 2020-07-17 02:26:47 UTC
Passed, we should ask Eric to retest in his enviornment, to make sure there is actually no regression.
Comment 5 Zhiguo.Niu 2020-07-17 02:47:12 UTC
(In reply to Jaegeuk Kim from comment #1)
> Thank you for the report.
> 
> I think this is valid point that we need to fix.
> I'm testing a RFC patch like this. Thanks.
> 
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1926,8 +1926,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>                                 goto continue_unlock;
>                         }
>  
> -                       /* flush inline_data, if it's async context. */
> -                       if (do_balance && is_inline_node(page)) {
> +                       /* flush inline_data/inode, if it's async context. */
> +                       if (!do_balance)
> +                               goto write_node;
> +
> +                       /* flush inline_data */
> +                       if (is_inline_node(page)) {
>                                 clear_inline_node(page);
>                                 unlock_page(page);
>                                 flush_inline_data(sbi, ino_of_node(page));
> @@ -1940,7 +1944,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>                                 if (flush_dirty_inode(page))
>                                         goto lock_node;
>                         }
> -
> +write_node:
>                         f2fs_wait_on_page_writeback(page, NODE, true, true);
>  
>                         if (!clear_page_dirty_for_io(page))

Hi  Jaegeuk Kim

for comment 1, can you help provide a full diff for this patch, I can not apply it directly because of the following error:
> fatal: patch fragment without header at line 3: @@ -1926,8 +1926,12 @@ int
> f2fs_sync_node_pages(struct f2fs_sb_info *sbi,

or Should I merge it by Manually?

thanks a lot~
Comment 6 Zhiguo.Niu 2020-07-23 07:11:05 UTC
Hi Jaegeuk Kim

Could I merge this patch in my code for fixing the deadlock issue?

thanks!
Comment 7 yuchao0 2020-07-24 01:31:17 UTC
On 2020/7/17 1:15, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=208565
> 
> Jaegeuk Kim (jaegeuk@kernel.org) changed:
> 
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                   CC|                            |jaegeuk@kernel.org
> 
> --- Comment #1 from Jaegeuk Kim (jaegeuk@kernel.org) ---
> Thank you for the report.
> 
> I think this is valid point that we need to fix.
> I'm testing a RFC patch like this. Thanks.


Shouldn't we revert 34c061ad85a2 ("f2fs: Avoid double lock for cp_rwsem during checkpoint")
at the same time?

> 
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1926,8 +1926,12 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>                                  goto continue_unlock;
>                          }
> 
> -                       /* flush inline_data, if it's async context. */
> -                       if (do_balance && is_inline_node(page)) {
> +                       /* flush inline_data/inode, if it's async context. */
> +                       if (!do_balance)
> +                               goto write_node;
> +
> +                       /* flush inline_data */
> +                       if (is_inline_node(page)) {
>                                  clear_inline_node(page);
>                                  unlock_page(page);
>                                  flush_inline_data(sbi, ino_of_node(page));
> @@ -1940,7 +1944,7 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>                                  if (flush_dirty_inode(page))
>                                          goto lock_node;
>                          }
> -
> +write_node:
>                          f2fs_wait_on_page_writeback(page, NODE, true, true);
> 
>                          if (!clear_page_dirty_for_io(page))
>
Comment 8 Jaegeuk Kim 2020-07-24 03:21:02 UTC
Chao, why do we need to revert that patch?

Zhiguo,
You can see the patch here.
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=b0f3b87fb3abc42c81d76c6c5795f26dbdb2f04b
Comment 9 yuchao0 2020-07-24 03:38:48 UTC
On 2020/7/24 11:21, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=208565
> 
> --- Comment #8 from Jaegeuk Kim (jaegeuk@kernel.org) ---
> Chao, why do we need to revert that patch?

After applying your new patch, below race condition should no longer
happen, right?

Thread A		Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
  - f2fs_lock_all(sbi);
   - down_write(&sbi->cp_rwsem);

                         - open()
                          - igrab()
                         - write() write inline data
                         - unlink()
- f2fs_sync_node_pages()

+ if (!do_balance)
+	goto write_node;	<---- this avoids running into iput().

  - if (is_inline_node(page))
   - flush_inline_data()
    - ilookup()
      page = f2fs_pagecache_get_page()
      if (!page)
       goto iput_out;
      iput_out:
			-close()
			-iput()
        iput(inode);
        - f2fs_evict_inode()
         - f2fs_truncate_blocks()
          - f2fs_lock_op()
            - down_read(&sbi->cp_rwsem);

> 
> Zhiguo,
> You can see the patch here.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=b0f3b87fb3abc42c81d76c6c5795f26dbdb2f04b
>
Comment 10 Jaegeuk Kim 2020-07-24 03:56:06 UTC
if we revert it, we're not going to call f2fs_flush_inline_data() in checkpoint path, which actually fixes generic/204?
Comment 11 Zhiguo.Niu 2020-07-24 06:36:40 UTC
(In reply to Jaegeuk Kim from comment #10)
> if we revert it, we're not going to call f2fs_flush_inline_data() in
> checkpoint path, which actually fixes generic/204?

Hi Jaeqeuk Kim,

I still have a problem:

As new patch modified, flush_dirty_inode will be skipped for avoiding dead lock,

but when/where flush_dirty_inode would be done? 

thanks!
Comment 12 Jaegeuk Kim 2020-07-24 17:25:37 UTC
It'll be called by node_inode->writepages() -> f2fs_write_node_pages(), which should be done by flusher in vfs.