Bug 217171

Summary: ext4: stale buffer loading from last failed mounting
Product: File System Reporter: Zhihao Cheng (chengzhihao1)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: NEW ---    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 6.3.0-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: test.sh
main_diff

Description Zhihao Cheng 2023-03-10 12:35:42 UTC
Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
 ext4_fill_super
 | ext4_load_and_init_journal
 |  ext4_load_journal
 |   jbd2_journal_load
 |    load_superblock
 |     journal_get_superblock
 |      set_buffer_verified(bh) // buffer head is verified
 |   jbd2_journal_recover // failed caused by EIO
 | goto failed_mount3a // skip 'sb->s_root' initialization
 deactivate_locked_super
  kill_block_super
   generic_shutdown_super
    if (sb->s_root)
    // false, skip ext4_put_super->invalidate_bdev->
    // invalidate_mapping_pages->mapping_evict_folio->
    // filemap_release_folio->try_to_free_buffers, which
    // cannot drop buffer head.
   blkdev_put
    blkdev_put_whole
     if (atomic_dec_and_test(&bdev->bd_openers))
     // false, systemd-udev happens to open the device. Then
     // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
     // truncate_inode_folio->truncate_cleanup_folio->
     // folio_invalidate->block_invalidate_folio->
     // filemap_release_folio->try_to_free_buffers will be skipped,
     // dropping buffer head is missed again.

Second mount:
ext4_fill_super
 ext4_load_and_init_journal
  ext4_load_journal
   ext4_get_journal
    jbd2_journal_init_inode
     journal_init_common
      bh = getblk_unmovable
       bh = __find_get_block // Found stale bh in last failed mounting
      journal->j_sb_buffer = bh
   jbd2_journal_load
    load_superblock
     journal_get_superblock
      if (buffer_verified(bh))
      // true, skip journal->j_format_version = 2, value is 0
    jbd2_journal_recover
     do_one_pass
      next_log_block += count_tags(journal, bh)
      // According to journal_tag_bytes(), 'tag_bytes' calculating is
      // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
      // returns false because 'j->j_format_version >= 2' is not true,
      // then we get wrong next_log_block. The do_one_pass may exit
      // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.

The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.
Comment 1 Zhihao Cheng 2023-03-10 12:37:37 UTC
1. Apply main_diff
2. sh test.sh
[  116.709224] mount sda
[  116.711840] load bh ffff88817b33edd0
[  116.712425] inject
[  116.712681] JBD2: recovery failed
[  116.713067] EXT4-fs (sda): error loading journal
[  116.731617] mount sda
[  116.732129] found stale bh ffff88817b33edd0  # stale buffer head from last failed mounting
[  116.732713] Kernel panic - not syncing: not verified
[  116.733366] CPU: 3 PID: 2613 Comm: mount Not tainted 6.3.0-rc1-00003-g737d811674e8-dirty #1125
[  116.734528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraprojec4
[  116.736052] Call Trace:
[  116.736339]  <TASK>
[  116.736600]  dump_stack_lvl+0x7f/0xb0
[  116.736941]  dump_stack+0x18/0x30
[  116.737316]  panic+0x4e6/0x520
[  116.737719]  journal_get_superblock+0x7dc/0x7e0
[  116.738285]  load_superblock+0x16/0x100
[  116.738713]  jbd2_journal_wipe+0x2c/0x140
[  116.739138]  ? ext4_init_journal_params+0xa0/0xe0
[  116.739668]  ext4_load_and_init_journal.isra.0+0x8df/0x11c0
[  116.740291]  ? _printk+0x64/0x90
[  116.740617]  __ext4_fill_super+0x2a15/0x3370
[  116.741098]  ? __ext4_fill_super+0x3370/0x3370
[  116.741667]  ext4_fill_super+0x121/0x2f0
[  116.742073]  get_tree_bdev+0x1d6/0x370
[  116.742493]  ext4_get_tree+0x1d/0x30
[  116.742922]  vfs_get_tree+0x2a/0x110
[  116.743357]  ? capable+0x21/0x30
[  116.743701]  path_mount+0x821/0xfc0
[  116.744077]  do_mount+0x91/0xc0
[  116.744480]  __x64_sys_mount+0xc4/0x160
[  116.744906]  do_syscall_64+0x39/0x80
[  116.745300]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  116.745838] RIP: 0033:0x7fc8f931123a
[  116.746350] Code: 48 8b 0d 51 dc 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b88
[  116.748348] RSP: 002b:00007fffdb903258 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[  116.749172] RAX: ffffffffffffffda RBX: 0000000000733240 RCX: 00007fc8f931123a
[  116.749996] RDX: 0000000000739f90 RSI: 00000000007334a0 RDI: 0000000000733480
[  116.750801] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fffdb9021c8
[  116.751565] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000733480
[  116.752372] R13: 0000000000739f90 R14: 0000000000000001 R15: 0000000000733240
[  116.753137]  </TASK>
[  116.753498] Kernel Offset: disabled
[  116.753822] ---[ end Kernel panic - not syncing: not verified ]---
Comment 2 Zhihao Cheng 2023-03-10 12:38:42 UTC
Created attachment 303917 [details]
test.sh
Comment 3 Zhihao Cheng 2023-03-10 12:38:57 UTC
Created attachment 303918 [details]
main_diff