Bug 217490 - Wrongly judgement for buffer head removing
Summary: Wrongly judgement for buffer head removing
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4 (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-26 12:52 UTC by Zhihao Cheng
Modified: 2023-05-26 14:12 UTC (History)
1 user (show)

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
diff_1 (8.34 KB, patch)
2023-05-26 12:53 UTC, Zhihao Cheng
Details | Diff
test_1.sh (443 bytes, application/x-shellscript)
2023-05-26 12:54 UTC, Zhihao Cheng
Details
diff_2 (10.07 KB, patch)
2023-05-26 12:54 UTC, Zhihao Cheng
Details | Diff
test_2.sh (817 bytes, application/x-shellscript)
2023-05-26 12:55 UTC, Zhihao Cheng
Details
diff_3 (8.20 KB, patch)
2023-05-26 12:55 UTC, Zhihao Cheng
Details | Diff
test_3.sh (553 bytes, application/x-shellscript)
2023-05-26 12:55 UTC, Zhihao Cheng
Details

Description Zhihao Cheng 2023-05-26 12:52:36 UTC
Problem 1:In checkpoint process, missed io_end check for non-dirty buffer head may corrupt the filesystem.

jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
          P1            wb_workfn
jbd2_log_do_checkpoint
                 __block_write_full_page
                  trylock_buffer(bh)
                  test_clear_buffer_dirty(bh)
 if (!buffer_dirty(bh))
  __jbd2_journal_remove_checkpoint(jh)
   if (buffer_write_io_error(bh)) // false
          >> bh IO error occurs <<
 jbd2_cleanup_journal_tail
  __jbd2_update_log_tail
   jbd2_write_superblock

The bh won't be replayed in next mount, image becomes corrupted.

1. Apply diff_1 and compile kernel
2. sh test_1.sh
[   69.605320] assign g_bh, trace dir 12 buffer, add a(14)
[   69.617135] mark buffer dirty
[   69.632237] jbd2_log_do_checkpoint: wait clear dirty
[   70.646949] __block_write_full_page: clear buffer dirty, delay submit
[   70.733846] jbd2_log_do_checkpoint: wait dirty cleared
[   70.849564] __block_write_full_page: delay done
[   70.850340] Buffer I/O error on dev sda, logical block 3510, lost async page write
[   70.856224] Kernel panic - not syncing: DONE
[   70.857436] CPU: 2 PID: 2593 Comm: chmod Not tainted 6.4.0-rc3-00028-g848673e8c63a-dirty #12
[   70.859879] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
[   70.862424] Call Trace:
[   70.863141]  <TASK>
[   70.863755]  dump_stack_lvl+0x86/0xc0
[   70.864798]  dump_stack+0x18/0x30
[   70.865750]  panic+0x4f2/0x530
[   70.866619]  jbd2_log_do_checkpoint+0x76e/0x810
[   70.867898]  __jbd2_log_wait_for_space+0x162/0x3d0
[   70.869235]  add_transaction_credits+0x194/0x480
[   70.870512]  ? kmem_cache_alloc+0x2bd/0x5c0
[   70.871665]  start_this_handle+0x14b/0x950
[   70.872787]  ? kmem_cache_alloc+0x2bd/0x5c0
[   70.873949]  jbd2__journal_start+0x105/0x200
[   70.874899]  __ext4_journal_start_sb+0x2d2/0x320
[   70.875824]  ext4_dirty_inode+0x40/0xa0
[   70.876586]  __mark_inode_dirty+0x93/0x720
[   70.877397]  ext4_setattr+0x1a6/0xf30
[   70.878170]  ? path_lookupat+0xc4/0x1e0
[   70.878941]  notify_change+0x46d/0x770
[   70.879675]  ? chmod_common+0x11e/0x210
[   70.880431]  chmod_common+0x11e/0x210
[   70.881181]  ? putname+0x79/0xa0
[   70.881831]  do_fchmodat+0x6b/0xe0
[   70.882504]  __x64_sys_fchmodat+0x21/0x30
[   70.883315]  do_syscall_64+0x6c/0xf0
[   70.884045]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   70.884785] RIP: 0033:0x7fb087cffdb9
[   70.885307] Code: ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f7 c1 ff fe ff ff 75 38 80 e5 01 75 4b 48 63 ff 89 d2 b8 0c 01 00 0
[   70.887957] RSP: 002b:00007ffddfbb7e48 EFLAGS: 00000246 ORIG_RAX: 000000000000010c
[   70.889055] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fb087cffdb9
[   70.890076] RDX: 00000000000001ed RSI: 000055850a71a0f0 RDI: ffffffffffffff9c
[   70.891086] RBP: 000055850a71a060 R08: 0000000000000000 R09: 0000000000000000
[   70.892072] R10: 0000000000000000 R11: 0000000000000246 R12: 000055850900a734
[   70.893062] R13: 000055850a71a0f0 R14: 000055850a71b320 R15: 0000000000008000
3. reboot
4. fsck.ext4 -fa /dev/sda
/dev/sda: recovering journal
/dev/sda: Unattached inode 14


/dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
	(i.e., without -a or -p options)

Problem 2: In forget process, missed io_end check for non-dirty buffer head may corrupt the filesystem.

jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
       P1         wb_workfn      P2
jbd2_journal_forget
              __block_write_full_page
               trylock_buffer(bh)
               test_clear_buffer_dirty(bh)
 if (!buffer_dirty(bh))
  __jbd2_journal_remove_checkpoint(jh)
   if (buffer_write_io_error(bh)) // false
          >> bh IO error occurs <<
		                 jbd2_log_do_checkpoint
                          jbd2_cleanup_journal_tail
                           jbd2_write_superblock

The bh won't be replayed in next mount, image becomes corrupted.

1. Apply diff_2 and compile kernel
2. sh test_2.sh
[ 3435.871701] ext4_delete_entry: assign g_bh
[ 3435.878143] mark buffer dirty
[ 3435.892735] ck wait buffer remove
[ 3436.897053] jbd2_journal_forget: wait clear dirty
[ 3437.910558] __block_write_full_page: clear buffer dirty, delay submit
[ 3437.998090] jbd2_journal_forget: wait dirty cleared
[ 3437.998628] wait checkpoint finish
[ 3438.003545] ck wait done
[ 3438.111919] __block_write_full_page: delay done
[ 3438.112992] Buffer I/O error on dev sda, logical block 3511, lost async page write
[ 3438.115152] Kernel panic - not syncing: DONE
[ 3438.115513] CPU: 3 PID: 2676 Comm: chown Not tainted 6.4.0-rc3-00028-g848673e8c63a-dirty #51
[ 3438.116211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
[ 3438.116896] Call Trace:
[ 3438.117103]  <TASK>
[ 3438.117294]  dump_stack_lvl+0x86/0xc0
[ 3438.117622]  dump_stack+0x18/0x30
[ 3438.117900]  panic+0x4f2/0x530
[ 3438.118151]  jbd2_log_do_checkpoint+0x84f/0x870
[ 3438.118522]  ? do_syscall_64+0x6c/0xf0
[ 3438.118824]  __jbd2_log_wait_for_space+0x162/0x3d0
[ 3438.119222]  add_transaction_credits+0x194/0x480
[ 3438.119595]  ? kmem_cache_alloc+0x2bd/0x5c0
[ 3438.119941]  start_this_handle+0x14b/0x950
[ 3438.120272]  ? kmem_cache_alloc+0x2bd/0x5c0
[ 3438.120613]  jbd2__journal_start+0x105/0x200
[ 3438.120967]  __ext4_journal_start_sb+0x2d2/0x320
[ 3438.121341]  ext4_setattr+0x37a/0xf30
[ 3438.121649]  notify_change+0x46d/0x770
[ 3438.121955]  ? chown_common+0x1a8/0x310
[ 3438.122271]  chown_common+0x1a8/0x310
[ 3438.122566]  do_fchownat+0xfe/0x170
[ 3438.122858]  __x64_sys_fchownat+0x27/0x40
[ 3438.123190]  do_syscall_64+0x6c/0xf0
[ 3438.123488]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 3438.123920] RIP: 0033:0x7f8d29d013ca
[ 3438.124224] Code: 48 8b 0d c1 da 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 04 01 00 8
[ 3438.125709] RSP: 002b:00007ffd1f18b5a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
[ 3438.126314] RAX: ffffffffffffffda RBX: 00007ffd1f18b7f0 RCX: 00007f8d29d013ca
[ 3438.126861] RDX: 000000000000004a RSI: 000055c2a2fa5070 RDI: 00000000ffffff9c
[ 3438.127421] RBP: 000055c2a2fa3d10 R08: 0000000000000000 R09: 00000000ffffffff
[ 3438.127989] R10: 00000000ffffffff R11: 0000000000000246 R12: 000055c2a2fa3d88
[ 3438.128562] R13: 00000000ffffffff R14: 00000000ffffff9c R15: 0000000000000001
[ 3438.129145]  </TASK>
[ 3438.129406] Kernel Offset: disabled
[ 3438.129697] ---[ end Kernel panic - not syncing: DONE ]---
3. reboot
4. fsck.ext4 -fn /dev/sda
e2fsck 1.43.4 (31-Jan-2017)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 44 ref count is 1, should be 2.  Fix? no

Pass 5: Checking group summary information

/dev/sda: ********** WARNING: Filesystem still has errors **********

/dev/sda: 45/25688 files (0.0% non-contiguous), 8899/102400 blocks

Problem 3: __cp_buffer_busy checks buffer dirty after checking buffer locking state, buffer dirty cleaning could happen between them, which leads bh removed from checkpoint list.

jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
     drop_cache     wb_workfn   P2
journal_shrink_one_cp_list
 __cp_buffer_busy
  jh->b_transaction || buffer_locked(bh)
  // returns false
             __block_write_full_page
              trylock_buffer(bh)
              test_clear_buffer_dirty(bh)
  buffer_dirty(bh) // returns false
 __jbd2_journal_remove_checkpoint
  if (buffer_write_io_error(bh)) // false
          >> bh IO error occurs <<
		                  jbd2_log_do_checkpoint
                           jbd2_cleanup_journal_tail
                            __jbd2_update_log_tail
                             jbd2_write_superblock

The bh won't be replayed in next mount, image becomes corrupted.

1. Apply diff_3 and compile kernel
2. sh test_3.sh
[   34.007674] add_dirent_to_buf: assign g_bh
[   34.019950] mark buffer dirty
[   34.071456] __cp_buffer_busy: wait clear dirty
[   35.094103] __block_write_full_page: clear buffer dirty, delay submit
[   35.174522] __cp_buffer_busy: wait dirty cleared
[   35.296479] __block_write_full_page: delay done
[   35.298304] Buffer I/O error on dev sda, logical block 3510, lost async page write
[   36.145015] Kernel panic - not syncing: DONE
[   36.146237] CPU: 3 PID: 2814 Comm: chmod Not tainted 6.4.0-rc3-00028-g848673e8c63a-dirty #72
[   36.148486] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
[   36.150848] Call Trace:
[   36.151529]  <TASK>
[   36.152112]  dump_stack_lvl+0x86/0xc0
[   36.153177]  dump_stack+0x18/0x30
[   36.154136]  panic+0x4f2/0x530
[   36.154990]  jbd2_log_do_checkpoint+0x73a/0x760
[   36.156251]  ? do_fchmodat+0x6b/0xe0
[   36.157245]  __jbd2_log_wait_for_space+0x162/0x3d0
[   36.158481]  add_transaction_credits+0x194/0x480
[   36.159376]  ? kmem_cache_alloc+0x2bd/0x5c0
[   36.160192]  start_this_handle+0x14b/0x950
[   36.160995]  ? kmem_cache_alloc+0x2bd/0x5c0
[   36.161814]  jbd2__journal_start+0x105/0x200
[   36.162654]  __ext4_journal_start_sb+0x2d2/0x320
[   36.163560]  ext4_dirty_inode+0x40/0xa0
[   36.164322]  __mark_inode_dirty+0x93/0x720
[   36.165126]  ext4_setattr+0x1a6/0xf30
[   36.165850]  ? path_lookupat+0xc4/0x1e0
[   36.166621]  notify_change+0x46d/0x770
[   36.167354]  ? chmod_common+0x11e/0x210
[   36.168069]  chmod_common+0x11e/0x210
[   36.168797]  ? putname+0x79/0xa0
[   36.169444]  do_fchmodat+0x6b/0xe0
[   36.170132]  __x64_sys_fchmodat+0x21/0x30
[   36.170924]  do_syscall_64+0x6c/0xf0
[   36.171630]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   36.172613] RIP: 0033:0x7f2dbdeffdb9
[   36.173172] Code: ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f7 c1 ff fe ff ff 75 38 80 e5 01 75 4b 48 63 ff 89 d2 b8 0c 01 00 0
[   36.175777] RSP: 002b:00007ffe6917fa18 EFLAGS: 00000246 ORIG_RAX: 000000000000010c
[   36.176853] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f2dbdeffdb9
[   36.177854] RDX: 00000000000001ed RSI: 000055e20d17a0f0 RDI: ffffffffffffff9c
[   36.178862] RBP: 000055e20d17a060 R08: 0000000000000000 R09: 0000000000000000
[   36.179857] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e20b40a734
[   36.180867] R13: 000055e20d17a0f0 R14: 000055e20d17b320 R15: 0000000000008000
[   36.181889]  </TASK>
[   36.183237] Kernel Offset: disabled
[   36.183645] ---[ end Kernel panic - not syncing: DONE ]---
3. reboot
4. fsck.ext4 -fa /dev/sda
/dev/sda: recovering journal
/dev/sda: Unattached inode 213


/dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
	(i.e., without -a or -p options)
Comment 1 Zhihao Cheng 2023-05-26 12:53:04 UTC
Created attachment 304327 [details]
diff_1
Comment 2 Zhihao Cheng 2023-05-26 12:54:13 UTC
Created attachment 304328 [details]
test_1.sh
Comment 3 Zhihao Cheng 2023-05-26 12:54:48 UTC
Created attachment 304329 [details]
diff_2
Comment 4 Zhihao Cheng 2023-05-26 12:55:05 UTC
Created attachment 304330 [details]
test_2.sh
Comment 5 Zhihao Cheng 2023-05-26 12:55:27 UTC
Created attachment 304331 [details]
diff_3
Comment 6 Zhihao Cheng 2023-05-26 12:55:43 UTC
Created attachment 304332 [details]
test_3.sh
Comment 7 Bagas Sanjaya 2023-05-26 13:56:18 UTC
Can you send your diffs as proper patches to linux-ext4@vger.kernel.org for review? (Hint: see Documentation/process/submitting-patches.rst for how to contribute patches.)
Comment 8 Zhihao Cheng 2023-05-26 14:12:11 UTC
(In reply to Bagas Sanjaya from comment #7)
> Can you send your diffs as proper patches to linux-ext4@vger.kernel.org for
> review? (Hint: see Documentation/process/submitting-patches.rst for how to
> contribute patches.)

These diffs are used to reproduce above problems, formal fix patches are under writing and being sent out soon.

Note You need to log in before you can comment on or make changes to this bug.