Bug 218164
Summary: | ubifs: racing in grabbing LEB when making reservation, cannot reserve XX bytes in jhead, err = -28 | ||
---|---|---|---|
Product: | File System | Reporter: | Zhihao Cheng (chengzhihao1) |
Component: | Other | Assignee: | fs_other |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: |
diff
test.sh diff_fix diff_fix_v2 |
Description
Zhihao Cheng
2023-11-20 01:40:22 UTC
After analyzing, The dirty LEBs could be grabbed by other writeback threads, which fails finding dirty LEBs of GC in current thread, so make_reservation() should try many times to invoke GC&&committing. Here is a reproducer: 1. Apply diff and compile kernel(CONFIG_MTD_NAND_NANDSIM=m CONFIG_PREEMPTION=y, CONFIG_PREEMPT_DYNAMIC=y, CONFIG_PREEMPT_VOLUNTARY=y, CONFIG_PREEMPT_COUNT=y CONFIG_DEBUG_KMEMLEAK=n,CONFIG_KASAN=n) 2. Config qemu(x86, -smp 4, -m 4096) 3. Start qemu && ./test.sh 4. Several hours later(3h~) [ 6771.139766] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: wait nospc_retries 1 [ 6771.139766] [ 6780.522204] UBIFS error (ubi0:0 pid 1112): make_reservation [ubifs]: stuck [ 6780.522204] [ 6780.522221] UBIFS error (ubi0:0 pid 1111): make_reservation [ubifs]: stuck [ 6780.522221] [ 6780.606098] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: wait done, err -28 [ 6780.606098] [ 6780.822298] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: wait nospc_retries 2 [ 6780.822298] [ 6780.827718] UBIFS error (ubi0:0 pid 1115): ubifs_commit_required [ubifs]: delay commit 1 [ 6780.827718] [ 6782.534601] UBIFS error (ubi0:0 pid 1115): make_reservation [ubifs]: stuck [ 6782.534601] [ 6782.535557] UBIFS error (ubi0:0 pid 1110): ubifs_commit_required [ubifs]: delay commit 1 [ 6782.535557] [ 6782.736785] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: wait done, err -11 [ 6782.736785] [ 6782.950270] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: wait nospc_retries 2 [ 6782.950270] [ 6788.901728] UBIFS error (ubi0:0 pid 1110): make_reservation [ubifs]: stuck [ 6788.901728] [ 6788.901741] UBIFS error (ubi0:0 pid 1111): make_reservation [ubifs]: stuck [ 6788.901741] [ 6788.901746] UBIFS error (ubi0:0 pid 1115): make_reservation [ubifs]: stuck [ 6788.901746] [ 6788.901744] UBIFS error (ubi0:0 pid 1112): make_reservation [ubifs]: stuck [ 6788.901744] [ 6788.982114] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: wait done, err -28 [ 6788.982114] [ 6789.198047] UBIFS error (ubi0:0 pid 1113): make_reservation [ubifs]: cannot reserve 216 bytes in jhead 1, error -28 [ 6789.201011] CPU: 3 PID: 1113 Comm: test.sh Not tainted 6.7.0-rc1-00026-g01552fb03014-dirty #328 [ 6789.203588] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 [ 6789.205955] Call Trace: [ 6789.206685] <TASK> [ 6789.207294] dump_stack_lvl+0x86/0xc0 [ 6789.208338] dump_stack+0x18/0x30 [ 6789.209257] make_reservation+0x5ec/0xb90 [ubifs] [ 6789.210620] ubifs_jnl_truncate+0x184/0xcb0 [ubifs] [ 6789.211724] ? ubifs_xattr_get+0x143/0x310 [ubifs] [ 6789.212769] ? down_read+0x1a/0x180 [ 6789.213492] ? unmap_mapping_pages+0x85/0x180 [ 6789.214410] ubifs_setattr+0x4fa/0x770 [ubifs] [ 6789.215374] notify_change+0x478/0x790 [ 6789.216162] ? do_truncate+0xa5/0x120 [ 6789.216948] do_truncate+0xa5/0x120 [ 6789.217683] ? fscrypt_symlink_getattr+0xe0/0xe0 [ 6789.218659] path_openat+0x2d0/0x1a60 [ 6789.219428] do_filp_open+0xd2/0x180 [ 6789.220175] ? kmem_cache_alloc+0x2ef/0x550 [ 6789.221017] ? alloc_fd+0x137/0x2a0 [ 6789.221727] do_sys_openat2+0xc9/0x130 [ 6789.222294] do_sys_open+0x5c/0xa0 [ 6789.222812] __x64_sys_open+0x29/0x40 [ 6789.223376] do_syscall_64+0x6f/0x140 [ 6789.223916] entry_SYSCALL_64_after_hwframe+0x63/0x6b [ 6789.224669] RIP: 0033:0x7f427a8ffe80 [ 6789.225225] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 69 48 2d 00 00 75 10 b8 02 00 004 [ 6789.227907] RSP: 002b:00007ffd0a044b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 [ 6789.229017] RAX: ffffffffffffffda RBX: 000055cb20f53c40 RCX: 00007f427a8ffe80 [ 6789.230064] RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 000055cb20f54030 [ 6789.231126] RBP: 00007ffd0a044c30 R08: 0000000000000020 R09: fefeff2f312e6f6c [ 6789.232193] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000000000 [ 6789.232958] R13: 0000000000000003 R14: 0000000000000001 R15: 000055cb20f54030 [ 6789.233717] </TASK> [ 6789.234005] (pid 1113) Budgeting info: data budget sum 13184, total budget sum 13760 [ 6789.235105] budg_data_growth 12432, budg_dd_growth 752, budg_idx_growth 576 [ 6789.235865] min_idx_lebs 20, old_idx_sz 87472, uncommitted_idx 4608 [ 6789.236569] page_budget 4144, inode_budget 160, dent_budget 312 [ 6789.237316] nospace 0, nospace_rp 0 [ 6789.237712] dark_wm 4608, dead_wm 512, max_idx_node_sz 192 [ 6789.238337] freeable_cnt 0, calc_idx_sz 85464, idx_gc_cnt 0 [ 6789.238955] dirty_pg_cnt 3, dirty_zn_cnt 17, clean_zn_cnt 761 [ 6789.239622] gc_lnum 44, ihead_lnum 47 [ 6789.240053] jhead 0 (GC) LEB 36 [ 6789.240431] jhead 1 (base) LEB 45 [ 6789.240820] jhead 2 (data) LEB 10 [ 6789.241230] bud LEB 10 [ 6789.241510] bud LEB 36 [ 6789.241786] bud LEB 45 [ 6789.242088] bud LEB 46 [ 6789.242370] commit state 0 [ 6789.242654] Budgeting predictions: [ 6789.242949] available: 23088, outstanding 13184, free 10942 [ 6789.243453] (pid 1113) start dumping LEB properties [ 6789.243870] (pid 1113) Lprops statistics: empty_lebs 2, idx_lebs 20 [ 6789.244431] taken_empty_lebs 1, total_free 58368, total_dirty 312328 [ 6789.245004] total_used 219680, total_dark 46080, total_dead 2992 [ 6789.245511] LEB 10 free 1024 dirty 13888 used 448 free + dirty 14912 dark 4608 dead 0 nodes fit 3 flags 0x10 ) [ 6789.246728] LEB 11 free 0 dirty 48 used 15312 free + dirty 48 dark 0 dead 48 nodes fit 0 flags 0x0 ) [ 6789.247864] LEB 12 free 0 dirty 12896 used 2464 free + dirty 12896 flags 0x22 (dirty index) [ 6789.248759] LEB 13 free 0 dirty 160 used 15200 free + dirty 160 dark 0 dead 160 nodes fit 0 flags 0x0 ) [ 6789.249908] LEB 14 free 0 dirty 8752 used 6608 free + dirty 8752 flags 0x22 (dirty index) [ 6789.250780] LEB 15 free 0 dirty 7296 used 8064 free + dirty 7296 flags 0x22 (dirty index) [ 6789.251661] LEB 16 free 0 dirty 14880 used 480 free + dirty 14880 dark 4608 dead 0 nodes fit 3 flags 0x1 ) [ 6789.252752] LEB 17 free 0 dirty 32 used 15328 free + dirty 32 dark 0 dead 32 nodes fit 0 flags 0x0 ) [ 6789.253920] LEB 18 free 0 dirty 112 used 15248 free + dirty 112 dark 0 dead 112 nodes fit 0 flags 0x0 ) [ 6789.255057] LEB 19 free 0 dirty 496 used 14864 free + dirty 496 dark 0 dead 496 nodes fit 0 flags 0x0 ) [ 6789.256190] LEB 20 free 0 dirty 40 used 15320 free + dirty 40 dark 0 dead 40 nodes fit 0 flags 0x0 ) [ 6789.257322] LEB 21 free 6144 dirty 7088 used 2128 free + dirty 13232 flags 0x22 (dirty index) [ 6789.258210] LEB 22 free 0 dirty 13008 used 2352 free + dirty 13008 flags 0x22 (dirty index) [ 6789.259091] LEB 23 free 15360 dirty 0 used 0 free + dirty 15360 dark 4608 dead 0 nodes fit 3 flags 0x4 ) [ 6789.260176] LEB 24 free 0 dirty 15304 used 56 free + dirty 15304 dark 4608 dead 0 nodes fit 3 flags 0x1 ) [ 6789.261247] LEB 25 free 0 dirty 7520 used 7840 free + dirty 7520 flags 0x22 (dirty index) [ 6789.262125] LEB 26 free 0 dirty 240 used 15120 free + dirty 240 dark 0 dead 240 nodes fit 0 flags 0x0 ) [ 6789.263254] LEB 27 free 0 dirty 7968 used 7392 free + dirty 7968 flags 0x22 (dirty index) [ 6789.264126] LEB 28 free 0 dirty 15040 used 320 free + dirty 15040 dark 4608 dead 0 nodes fit 3 flags 0x1 ) [ 6789.265196] LEB 29 free 0 dirty 48 used 15312 free + dirty 48 dark 0 dead 48 nodes fit 0 flags 0x0 ) [ 6789.266341] LEB 30 free 0 dirty 8080 used 7280 free + dirty 8080 flags 0x22 (dirty index) [ 6789.267219] LEB 31 free 0 dirty 416 used 14944 free + dirty 416 dark 0 dead 416 nodes fit 0 flags 0x0 ) [ 6789.268355] LEB 32 free 0 dirty 11776 used 3584 free + dirty 11776 flags 0x22 (dirty index) [ 6789.269215] LEB 33 free 0 dirty 32 used 15328 free + dirty 32 dark 0 dead 32 nodes fit 0 flags 0x0 ) [ 6789.270366] LEB 34 free 0 dirty 400 used 14960 free + dirty 400 dark 0 dead 400 nodes fit 0 flags 0x0 ) [ 6789.271498] LEB 35 free 0 dirty 9240 used 6120 free + dirty 9240 flags 0x22 (dirty index) [ 6789.272363] LEB 36 free 0 dirty 10744 used 4616 free + dirty 10744 dark 4608 dead 0 nodes fit 2 flags 0x10 ) [ 6789.273506] LEB 37 free 0 dirty 104 used 15256 free + dirty 104 dark 0 dead 104 nodes fit 0 flags 0x0 ) [ 6789.274636] LEB 38 free 0 dirty 9200 used 6160 free + dirty 9200 flags 0x22 (dirty index) [ 6789.275493] LEB 39 free 0 dirty 368 used 14992 free + dirty 368 dark 0 dead 368 nodes fit 0 flags 0x0 ) [ 6789.276624] LEB 40 free 0 dirty 13120 used 2240 free + dirty 13120 flags 0x22 (dirty index) [ 6789.277478] LEB 41 free 0 dirty 5504 used 9856 free + dirty 5504 flags 0x22 (dirty index) [ 6789.278324] LEB 42 free 0 dirty 10656 used 4704 free + dirty 10656 flags 0x22 (dirty index) [ 6789.279184] LEB 43 free 0 dirty 13680 used 1680 free + dirty 13680 flags 0x22 (dirty index) [ 6789.280039] LEB 44 free 15360 dirty 0 used 0 free + dirty 15360 dark 4608 dead 0 nodes fit 3 flags 0x10 ) [ 6789.281161] LEB 45 free 3584 dirty 10176 used 1600 free + dirty 13760 dark 4608 dead 0 nodes fit 3 flags 0x10 ) [ 6789.282319] LEB 46 free 0 dirty 15360 used 0 free + dirty 15360 dark 4608 dead 0 nodes fit 3 flags 0x10 ) [ 6789.283515] LEB 47 free 7168 dirty 7968 used 224 free + dirty 15136 flags 0x30 (index, taken) [ 6789.284366] LEB 48 free 0 dirty 13904 used 1456 free + dirty 13904 flags 0x22 (dirty index) [ 6789.285224] LEB 49 free 0 dirty 13976 used 1384 free + dirty 13976 flags 0x22 (dirty index) [ 6789.286070] LEB 50 free 0 dirty 13672 used 1688 free + dirty 13672 flags 0x22 (dirty index) [ 6789.286902] LEB 51 free 9728 dirty 3392 used 2240 free + dirty 13120 flags 0x22 (dirty index) [ 6789.287753] LEB 52 free 0 dirty 496 used 14864 free + dirty 496 dark 0 dead 496 nodes fit 0 flags 0x0 ) [ 6789.288850] LEB 53 free 0 dirty 15248 used 112 free + dirty 15248 dark 4608 dead 0 nodes fit 3 flags 0x1 ) [ 6789.289897] (pid 1113) finish dumping LEB properties [ 6789.290389] ------------[ cut here ]------------ [ 6789.290764] kernel BUG at fs/ubifs/journal.c:423! [ 6789.291186] invalid opcode: 0000 [#1] PREEMPT SMP [ 6789.291159] UBIFS error (ubi0:0 pid 1110): ubifs_commit_required [ubifs]: delay commit 1 [ 6789.291159] [ 6789.291574] CPU: 3 PID: 1113 Comm: test.sh Not tainted 6.7.0-rc1-00026-g01552fb03014-dirty #328 [ 6789.294725] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 [ 6789.295436] RIP: 0010:make_reservation+0x63b/0xb90 [ubifs] [ 6789.295910] Code: 04 00 48 89 df 48 83 05 c2 bb 06 00 01 e8 7d 01 04 00 4c 89 ff 48 83 05 ba bb 06 00 01 e8 dd 7b f4 e0 48 83 05 b5 bb0 [ 6789.297410] RSP: 0018:ffffc90000627a18 EFLAGS: 00010202 [ 6789.297857] RAX: 0000000000000000 RBX: ffff888102318000 RCX: ffffffff830fd4d0 [ 6789.298443] RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff888106174e28 [ 6789.299009] RBP: 0000000000000002 R08: ffff888101ac0010 R09: ffff88813bd2d900 [ 6789.299591] R10: 0000000000000000 R11: ffffffffffb6d221 R12: ffff888107d774f8 [ 6789.300153] R13: 00000000ffffffe4 R14: 00000000000000d8 R15: ffff8881023180c8 [ 6789.300742] FS: 00007f427b7ae700(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 6789.301403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6789.301872] CR2: 00007f71146db000 CR3: 000000010e07f000 CR4: 00000000000006f0 [ 6789.302435] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6789.303009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6789.303604] Call Trace: [ 6789.303806] <TASK> [ 6789.303988] ? show_regs+0x88/0xa0 [ 6789.304286] ? __die_body+0x26/0x90 [ 6789.304568] ? die+0x5c/0xa0 [ 6789.304822] ? do_trap+0x10e/0x140 [ 6789.305106] ? do_error_trap+0x85/0xd0 [ 6789.305412] ? make_reservation+0x63b/0xb90 [ubifs] [ 6789.305834] ? exc_invalid_op+0x68/0x80 [ 6789.306150] ? make_reservation+0x63b/0xb90 [ubifs] [ 6789.306564] ? asm_exc_invalid_op+0x1a/0x20 [ 6789.306921] ? make_reservation+0x63b/0xb90 [ubifs] [ 6789.307348] ? make_reservation+0x633/0xb90 [ubifs] [ 6789.307742] ubifs_jnl_truncate+0x184/0xcb0 [ubifs] [ 6789.308162] ? ubifs_xattr_get+0x143/0x310 [ubifs] [ 6789.308572] ? down_read+0x1a/0x180 [ 6789.308867] ? unmap_mapping_pages+0x85/0x180 [ 6789.309228] ubifs_setattr+0x4fa/0x770 [ubifs] [ 6789.309622] notify_change+0x478/0x790 [ 6789.309946] ? do_truncate+0xa5/0x120 [ 6789.310246] do_truncate+0xa5/0x120 [ 6789.310541] ? fscrypt_symlink_getattr+0xe0/0xe0 [ 6789.310929] path_openat+0x2d0/0x1a60 [ 6789.311247] do_filp_open+0xd2/0x180 [ 6789.311558] ? kmem_cache_alloc+0x2ef/0x550 [ 6789.311899] ? alloc_fd+0x137/0x2a0 [ 6789.312191] do_sys_openat2+0xc9/0x130 [ 6789.312513] do_sys_open+0x5c/0xa0 [ 6789.312797] __x64_sys_open+0x29/0x40 [ 6789.313109] do_syscall_64+0x6f/0x140 [ 6789.313422] entry_SYSCALL_64_after_hwframe+0x63/0x6b [ 6789.313838] RIP: 0033:0x7f427a8ffe80 [ 6789.314136] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 69 48 2d 00 00 75 10 b8 02 00 004 [ 6789.315598] RSP: 002b:00007ffd0a044b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 [ 6789.316203] RAX: ffffffffffffffda RBX: 000055cb20f53c40 RCX: 00007f427a8ffe80 [ 6789.316787] RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 000055cb20f54030 [ 6789.317369] RBP: 00007ffd0a044c30 R08: 0000000000000020 R09: fefeff2f312e6f6c [ 6789.317934] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000000000 [ 6789.318507] R13: 0000000000000003 R14: 0000000000000001 R15: 000055cb20f54030 [ 6789.319071] </TASK> [ 6789.319253] Modules linked in: zstd ubifs ubi nandsim [ 6789.319663] ---[ end trace 0000000000000000 ]--- [ 6789.320032] RIP: 0010:make_reservation+0x63b/0xb90 [ubifs] [ 6789.320469] Code: 04 00 48 89 df 48 83 05 c2 bb 06 00 01 e8 7d 01 04 00 4c 89 ff 48 83 05 ba bb 06 00 01 e8 dd 7b f4 e0 48 83 05 b5 bb0 [ 6789.321939] RSP: 0018:ffffc90000627a18 EFLAGS: 00010202 [ 6789.322368] RAX: 0000000000000000 RBX: ffff888102318000 RCX: ffffffff830fd4d0 [ 6789.322951] RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff888106174e28 [ 6789.323559] RBP: 0000000000000002 R08: ffff888101ac0010 R09: ffff88813bd2d900 [ 6789.324140] R10: 0000000000000000 R11: ffffffffffb6d221 R12: ffff888107d774f8 [ 6789.324717] R13: 00000000ffffffe4 R14: 00000000000000d8 R15: ffff8881023180c8 [ 6789.325311] FS: 00007f427b7ae700(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 6789.325948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6789.326428] CR2: 00007f71146db000 CR3: 000000010e07f000 CR4: 00000000000006f0 [ 6789.327015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6789.327584] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6789.328162] Kernel panic - not syncing: Fatal exception [ 6789.329099] Kernel Offset: disabled [ 6789.329411] ---[ end Kernel panic - not syncing: Fatal exception ]--- Tips: 1. It's hardly to reproduce the problem by fsstress in my environment. 2. It's hardly to reproduce the problem without 'diff' patch, we only met once in real physical machine. 3. Reproducing concurrently in multi machines is recommended. 4. After fixing the problem and applying diff_fix, running for 4 days, the problem never occurred(fsstress && reproducer), the max value of 'cmt_retries' is 7, the max value of 'nospc_retries' is 4. Created attachment 305432 [details]
diff
Created attachment 305433 [details]
test.sh
Created attachment 305434 [details]
diff_fix
Created attachment 305744 [details]
diff_fix_v2
|