Bug 216372 - quota: Invalid block number reading from quota file after written faiulre
Summary: quota: Invalid block number reading from quota file after written faiulre
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-17 08:30 UTC by Zhihao Cheng
Modified: 2022-10-27 16:49 UTC (History)
2 users (show)

See Also:
Kernel Version: 6.0.0-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
test.sh (2.72 KB, application/x-shellscript)
2022-08-17 08:30 UTC, Zhihao Cheng
Details
diff (6.97 KB, patch)
2022-08-17 08:31 UTC, Zhihao Cheng
Details | Diff
b.c (1.03 KB, text/plain)
2022-08-17 08:53 UTC, Zhihao Cheng
Details

Description Zhihao Cheng 2022-08-17 08:30:42 UTC
CONFIG_QUOTA=y CONFIG_FAULT_INJECTION=y

Find an environment without ext4 filesystem.

qemu args(Make buffer head is easily released):
-smp 2
-m 2048 

1. Apply diff
2. ./test.sh
[   49.477106] v2_read_file_info: <3> free_blk 0 free_entry 5 dqi_blks 6 isize 6144
[   49.491722] v2_read_file_info: <4> free_blk 0 free_entry 5 dqi_blks 6 isize 6144
[   49.493171] ext4 filesystem being mounted at /root/temp supports timestamps until 2038 (0x7fffffff)
[   50.335459] EXT4-fs (sda): Invalid checksum for backup superblock 8193
[   50.335459] 
[   50.337433] EXT4-fs error (device sda) in ext4_update_backup_sb:174: Filesystem failed CRC
[   50.339314] EXT4-fs (sda): Invalid checksum for backup superblock 24577
[   50.339314] 
[   50.340444] EXT4-fs error (device sda) in ext4_update_backup_sb:174: Filesystem failed CRC
[   52.690568] ext4_quota_write[chown]: write blk 6 phy 4097
[   58.127839] wait offline
[   58.630388] wait done
[   58.630790] Buffer I/O error on dev sda, logical block 4097, lost async page write
[   58.631947] end_buffer_async_write: bh ffff88801b5132d8(blk 4097, ref 2) uptodate 0
[   58.634121] EXT4-fs error (device sda): ext4_check_bdev_write_error:217: comm chown: Error while async write back metadata
[   59.165117] put_free_dqblk[chown]: info->dqi_free_blk is 6
[   65.272115] wait offline
[   65.788928] wait done
[   65.789977] Buffer I/O error on dev sda, logical block 4097, lost async page write
[   65.790713] end_buffer_async_write: bh ffff88801b5132d8(blk 4097, ref 2) uptodate 0
[   66.998850] EXT4-fs error (device sda): ext4_check_bdev_write_error:217: comm ls: Error while async write back metadata
[  104.069217] free_buffer_head: free g_bh ffff88801b5132d8
[  104.155486] get_free_dqblk[chown]: use info->dqi_free_blk 6
[  104.156539] get_free_dqblk[chown]: set info->dqi_free_blk 4294959296
[  104.400851] get_free_dqblk[chown]: use info->dqi_free_blk 4294959296
[  104.402565] get_free_dqblk[chown]: set info->dqi_free_blk 0
[  104.403585] __quota_error: 10 callbacks suppressed
[  104.403593] Quota error (device sda): qtree_write_dquot: Error -8000 occurred while creating quota
[  104.405205] BUG: unable to handle page fault for address: ffffffffffffe120
[  104.405943] #PF: supervisor write access in kernel mode
[  104.406505] #PF: error_code(0x0002) - not-present page
[  104.407070] PGD 2e0b067 P4D 2e0b067 PUD 2e0d067 PMD 0 
[  104.407683] Oops: 0002 [#1] PREEMPT SMP
[  104.408101] CPU: 0 PID: 5974 Comm: chown Not tainted 6.0.0-rc1-00004-gaa59d8788f74-dirty #780
[  104.408970] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc314
[  104.410414] RIP: 0010:_raw_spin_lock+0x3a/0x90
[  104.411202] Code: 05 9a 36 c7 08 01 e8 c5 a3 ff fe 48 83 05 95 36 c7 08 01 48 83 05 95 36 c7 08 01 31 c0 ba 01 00 00 00 48 83 05 36 36 c73
[  104.413329] RSP: 0018:ffff88801c4dfb68 EFLAGS: 00010206
[  104.413953] RAX: 0000000000000000 RBX: ffffffffffffe0c0 RCX: 0000000000000000
[  104.414692] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
[  104.415408] RBP: ffffffffffffe120 R08: 0000000000000000 R09: ffffffff8a0e97a0
[  104.416243] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88801c4dfbe8
[  104.417188] R13: ffffffffffffe120 R14: 0000000000000000 R15: ffff88801c4dfd08
[  104.418120] FS:  00007fbe0bcfb4c0(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[  104.419215] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.419928] CR2: ffffffffffffe120 CR3: 000000001389c000 CR4: 00000000000006f0
[  104.420803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  104.421871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  104.423104] Call Trace:
[  104.423506]  <TASK>
[  104.423878]  dquot_add_inodes+0x28/0x270
[  104.424509]  __dquot_transfer+0x377/0x840
[  104.425177]  ? jbd2_journal_stop+0x20b/0x540
[  104.425944]  ? percpu_counter_add_batch+0x88/0x120
[  104.426665]  ? remove_waiter+0x400/0x490
[  104.427327]  ? dqput+0x234/0x350
[  104.427826]  ? dqget+0x661/0x6a0
[  104.428347]  dquot_transfer+0xde/0x540
[  104.428975]  ext4_setattr+0x405/0x14d0
[  104.429411]  ? ext4_xattr_security_get+0x23/0x30
[  104.429896]  ? __vfs_getxattr+0x60/0xa0
[  104.430298]  notify_change+0x68e/0x9f0
[  104.430698]  ? chown_common+0x300/0x430
[  104.431125]  chown_common+0x300/0x430
[  104.431492]  ? kmemleak_free+0x43/0xa0
[  104.431906]  ? preempt_count_add+0x83/0x130
[  104.432327]  do_fchownat+0x107/0x180
[  104.432711]  __x64_sys_fchownat+0x29/0x40
[  104.433144]  do_syscall_64+0x35/0x80
[  104.433543]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  104.434069] RIP: 0033:0x7fbe0b7013ca
[  104.434444] Code: 48 8b 0d c1 da 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 04 01 00 008
[  104.436400] RSP: 002b:00007fff12476b78 EFLAGS: 00000246 ORIG_RAX: 0000000000000104
[  104.437228] RAX: ffffffffffffffda RBX: 00007fff12476dc0 RCX: 00007fbe0b7013ca
[  104.438169] RDX: 000000000000fffe RSI: 000055cfd716d070 RDI: 00000000ffffff9c
[  104.439148] RBP: 000055cfd716bd10 R08: 0000000000000000 R09: 00000000ffffffff
[  104.440000] R10: 00000000ffffffff R11: 0000000000000246 R12: 000055cfd716bd88
[  104.440863] R13: 00000000ffffffff R14: 00000000ffffff9c R15: 0000000000000001
[  104.441690]  </TASK>
[  104.441980] Modules linked in:
[  104.442355] CR2: ffffffffffffe120
[  104.442743] ---[ end trace 0000000000000000 ]---
[  104.443296] RIP: 0010:_raw_spin_lock+0x3a/0x90
[  104.443852] Code: 05 9a 36 c7 08 01 e8 c5 a3 ff fe 48 83 05 95 36 c7 08 01 48 83 05 95 36 c7 08 01 31 c0 ba 01 00 00 00 48 83 05 36 36 c73
[  104.445966] RSP: 0018:ffff88801c4dfb68 EFLAGS: 00010206
[  104.446610] RAX: 0000000000000000 RBX: ffffffffffffe0c0 RCX: 0000000000000000
[  104.447489] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
[  104.448120] RBP: ffffffffffffe120 R08: 0000000000000000 R09: ffffffff8a0e97a0
[  104.448756] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88801c4dfbe8
[  104.449380] R13: ffffffffffffe120 R14: 0000000000000000 R15: ffff88801c4dfd08
[  104.450036] FS:  00007fbe0bcfb4c0(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[  104.450874] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.451530] CR2: ffffffffffffe120 CR3: 000000001389c000 CR4: 00000000000006f0
[  104.452234] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  104.453081] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Comment 1 Zhihao Cheng 2022-08-17 08:30:55 UTC
Created attachment 301588 [details]
test.sh
Comment 2 Zhihao Cheng 2022-08-17 08:31:07 UTC
Created attachment 301589 [details]
diff
Comment 3 Zhihao Cheng 2022-08-17 08:53:26 UTC
Created attachment 301590 [details]
b.c
Comment 4 Zhihao Cheng 2022-08-17 08:54:04 UTC
(In reply to Zhihao Cheng from comment #0)

0. gcc -o bb b.c
> 
> 1. Apply diff
> 2. ./test.sh
Comment 5 Christian Brauner 2022-08-17 09:17:35 UTC
Is this reproducible with v5.19?
Comment 6 Zhihao Cheng 2022-08-17 12:48:49 UTC
在 2022/8/17 17:17, bugzilla-daemon@kernel.org 写道:
> https://bugzilla.kernel.org/show_bug.cgi?id=216372
> 
> --- Comment #5 from Christian Brauner (brauner@kernel.org) ---
> Is this reproducible with v5.19?
> 

I guess yes. I found this problem firstly in 5.10-lts.
PS: I'm trying to fix it.

  Init: v2_read_file_info: <3> dqi_free_blk 0 dqi_free_entry 5 dqi_blks 
6
 

  Step 1. chown bin f_a -> dquot_acquire -> v2_write_dquot: 

   qtree_write_dquot 

    do_insert_tree 

     find_free_dqentry 

      get_free_dqblk 

       write_blk(info->dqi_blocks) // info->dqi_blocks = 6, failure. The 

            content in physical block (corresponding to blk 6) is 
random.
 

  Step 2. chown root f_a -> dquot_transfer -> dqput_all -> dqput -> 

          ext4_release_dquot -> v2_release_dquot -> qtree_delete_dquot: 

   dquot_release 

    remove_tree 

     free_dqentry 

      put_free_dqblk(6) 

       info->dqi_free_blk = blk    // info->dqi_free_blk = 6 

 

  Step 3. drop cache (buffer head for block 6 is released) 

 

  Step 4. chown bin f_b -> dquot_acquire -> commit_dqblk -> 
v2_write_dquot:
   qtree_write_dquot 

    do_insert_tree 

     find_free_dqentry 

      get_free_dqblk 

       dh = (struct qt_disk_dqdbheader *)buf 

       blk = info->dqi_free_blk     // 6 

       ret = read_blk(info, blk, buf)  // The content of buf is random
       info->dqi_free_blk = le32_to_cpu(dh->dqdh_next_free)  // random 
blk
 

  Step 5. chown bin f_c -> notify_change -> ext4_setattr -> 
dquot_transfer:
   dquot = dqget -> acquire_dquot -> ext4_acquire_dquot -> dquot_acquire 
->
           commit_dqblk -> v2_write_dquot -> dq_insert_tree: 

    do_insert_tree 

     find_free_dqentry 

      get_free_dqblk 

       blk = info->dqi_free_blk    // If blk < 0 and blk is not an error 

                                      code, it will be returned as dquot 

 

   transfer_to[USRQUOTA] = dquot  // A random negative value 

   __dquot_transfer(transfer_to) 

    dquot_add_inodes(transfer_to[cnt]) 

     spin_lock(&dquot->dq_dqb_lock)  // page fault
Comment 7 Jan Kara 2022-08-19 09:27:22 UTC
Thanks for the report. As far as I'm reading the reproducer, it basically corrupts the quota file (by cleverly inserting some IO failures). We have added some more sanity checking of data loaded from quota file after 5.10 (e.g. commit 9bf3d2033129 ("quota: check block number when reading the block in quota file") in 5.16). Now AFAICS we do not check validity of the free blocks list (we should check dqdh_next_free before using it) so your reproducer will still cause more damage than necessary. Will you send a patch to add such checks please?
Comment 8 Zhihao Cheng 2022-08-19 10:07:42 UTC
(In reply to Jan Kara from comment #7)
> Thanks for the report. As far as I'm reading the reproducer, it basically
> corrupts the quota file (by cleverly inserting some IO failures). We have
> added some more sanity checking of data loaded from quota file after 5.10
> (e.g. commit 9bf3d2033129 ("quota: check block number when reading the block
> in quota file") in 5.16). Now AFAICS we do not check validity of the free
> blocks list (we should check dqdh_next_free before using it) so your
> reproducer will still cause more damage than necessary. Will you send a
> patch to add such checks please?

Yes, sure. I will send the fix patch as soon as possible, before that, I need do more verifications to make sure I find all situations.
Comment 9 Jan Kara 2022-10-27 16:49:59 UTC
Patches were merged as:

6c8ea8b8cd47 ("quota: Check next/prev free block number after reading from quota file")
3fc61e0e96a3 ("quota: Replace all block number checking with helper function")
191249f70889 ("quota: Add more checking after reading from quota file")

Closing the bug.

Note You need to log in before you can comment on or make changes to this bug.