Created attachment 283827 [details] vmcore dmesg [ 212.339569] BUG: unable to handle page fault for address: ffffffff00000140 [ 212.341301] #PF: supervisor read access in kernel mode [ 212.342207] #PF: error_code(0x0000) - not-present page [ 212.343113] PGD 280a067 P4D 280a067 PUD 0 [ 212.343844] Oops: 0000 [#1] SMP [ 212.344418] CPU: 0 PID: 60 Comm: kworker/u16:1 Kdump: loaded Not tainted 5.2.0 #13 [ 212.345742] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 212.347732] Workqueue: writeback wb_workfn (flush-ubifs_0_0) [ 212.348762] RIP: 0010:rb_next_postorder+0x2e/0xb0 [ 212.349594] Code: 80 db 03 01 48 85 ff 0f 84 97 00 00 00 48 8b 17 48 83 05 bc 80 db 03 01 48 83 e2 fc 0f 84 82 00 00 00 48 83 05 b2 80 db 03 01 <48> 3b 7a 10 48 89 d0 74 02 f3 c3 48 8b 52 08 48 83 05 a3 80 db 03 [ 212.352845] RSP: 0018:ffffc90000887758 EFLAGS: 00010202 [ 212.353795] RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: 0000000080800001 [ 212.355035] RDX: ffffffff00000130 RSI: 0000000080800024 RDI: ffff888138b08400 [ 212.356276] RBP: ffff888138b08400 R08: ffffea0004a6b920 R09: 0000000000000000 [ 212.357527] R10: ffffc90000887740 R11: 0000000000000001 R12: ffff888128d48000 [ 212.358769] R13: 0000000000000800 R14: 000000000000011e R15: 00000000000007c8 [ 212.360015] FS: 0000000000000000(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000 [ 212.361442] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 212.362451] CR2: ffffffff00000140 CR3: 000000013789d000 CR4: 00000000000006f0 [ 212.363699] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 212.364951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 212.366191] Call Trace: [ 212.366666] destroy_old_idx+0x5d/0xa0 [ubifs] [ 212.367470] ubifs_tnc_start_commit+0x4fe/0x1380 [ubifs] [ 212.368429] do_commit+0x3eb/0x830 [ubifs] [ 212.369172] ubifs_run_commit+0xdc/0x1c0 [ubifs] [ 212.370001] make_reservation+0x62b/0x930 [ubifs] [ 212.370845] ubifs_jnl_write_data+0x266/0x490 [ubifs] [ 212.371741] ? __test_set_page_writeback+0xb1/0x4f0 [ 212.372625] do_writepage+0x1ab/0x2d0 [ubifs] [ 212.373425] ubifs_writepage+0x1c8/0x330 [ubifs] [ 212.374243] __writepage+0x1b/0xa0 [ 212.374849] write_cache_pages+0x2cc/0x6a0 [ 212.375574] ? __wb_calc_thresh+0x160/0x160 [ 212.376315] generic_writepages+0x74/0xc0 [ 212.377046] ? ubifs_release_budget+0x1c6/0x5e0 [ubifs] [ 212.377980] ? ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs] [ 212.379051] ? do_writepages+0x8d/0xb0 [ 212.379719] do_writepages+0x8d/0xb0 [ 212.380367] __writeback_single_inode+0x6d/0x650 [ 212.381187] writeback_sb_inodes+0x305/0x730 [ 212.381945] wb_writeback+0x16c/0x550 [ 212.382598] ? wb_workfn+0x15e/0x730 [ 212.383235] wb_workfn+0x15e/0x730 [ 212.383846] ? put_prev_entity+0x45/0x1c0 [ 212.384566] process_one_work+0x2a1/0x660 [ 212.385278] worker_thread+0x222/0x6d0 [ 212.385944] ? rescuer_thread+0x540/0x540 [ 212.386658] kthread+0x166/0x1c0 [ 212.387237] ? kthread_cancel_work_sync+0x20/0x20 [ 212.388069] ret_from_fork+0x1f/0x30 [ 212.388714] Modules linked in: ubifs ubi nandsim [last unloaded: nandsim] [ 212.389904] CR2: ffffffff00000140
Step 1. Locate oops in source code $ crash vmcore git/linux/vmlinux crash> bt PID: 60 TASK: ffff88813a428000 CPU: 0 COMMAND: "kworker/u16:1" #0 [ffffc90000887418] machine_kexec at ffffffff8107b86e #1 [ffffc90000887468] __crash_kexec at ffffffff812076f5 #2 [ffffc90000887530] crash_kexec at ffffffff81208ca0 #3 [ffffc90000887548] oops_end at ffffffff81032170 #4 [ffffc90000887568] no_context at ffffffff8108e81f #5 [ffffc900008875d0] __bad_area_nosemaphore at ffffffff8108ed5d #6 [ffffc90000887610] bad_area_nosemaphore at ffffffff8108f06a #7 [ffffc90000887618] do_kern_addr_fault at ffffffff8108f15c #8 [ffffc90000887638] __do_page_fault at ffffffff8108fecb #9 [ffffc90000887658] do_page_fault at ffffffff8108ff5c #10 [ffffc90000887688] do_async_page_fault at ffffffff81085b3d #11 [ffffc900008876a0] async_page_fault at ffffffff81e00dde [exception RIP: rb_next_postorder+46] RIP: ffffffff81d89a1e RSP: ffffc90000887758 RFLAGS: 00010202 RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: 0000000080800001 RDX: ffffffff00000130 RSI: 0000000080800024 RDI: ffff888138b08400 RBP: ffff888138b08400 R8: ffffea0004a6b920 R9: 0000000000000000 R10: ffffc90000887740 R11: 0000000000000001 R12: ffff888128d48000 R13: 0000000000000800 R14: 000000000000011e R15: 00000000000007c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #12 [ffffc90000887758] destroy_old_idx at ffffffffa009047d [ubifs] #13 [ffffc90000887778] ubifs_tnc_start_commit at ffffffffa00a4e0e [ubifs] #14 [ffffc90000887828] do_commit at ffffffffa009ad6b [ubifs] #15 [ffffc900008878b8] ubifs_run_commit at ffffffffa009b28c [ubifs] #16 [ffffc900008878d8] make_reservation at ffffffffa007417b [ubifs] #17 [ffffc90000887960] ubifs_jnl_write_data at ffffffffa0075406 [ubifs] #18 [ffffc900008879f0] do_writepage at ffffffffa007901b [ubifs] #19 [ffffc90000887a50] ubifs_writepage at ffffffffa0079308 [ubifs] #20 [ffffc90000887a80] __writepage at ffffffff813197eb #21 [ffffc90000887a98] write_cache_pages at ffffffff8131a90c #22 [ffffc90000887b98] generic_writepages at ffffffff8131ad54 #23 [ffffc90000887c08] do_writepages at ffffffff8131e1bd #24 [ffffc90000887c20] __writeback_single_inode at ffffffff814735ad #25 [ffffc90000887c60] writeback_sb_inodes at ffffffff81474355 #26 [ffffc90000887d38] wb_writeback at ffffffff814749fc #27 [ffffc90000887de0] wb_workfn at ffffffff8147594e #28 [ffffc90000887e78] process_one_work at ffffffff811392b1 #29 [ffffc90000887eb8] worker_thread at ffffffff81139dd2 #30 [ffffc90000887f10] kthread at ffffffff81144556 #31 [ffffc90000887f50] ret_from_fork at ffffffff81e0016f ------------------------------------------------ $ gdb git/linux/vmlinux (gdb) list *(rb_next_postorder+46) 0xffffffff81d89a1e is in rb_next_postorder (lib/rbtree.c:646). 641 if (!node) 642 return NULL; 643 parent = rb_parent(node); 644 645 /* If we're sitting on node, we've already seen our children */ 646 if (parent && node == parent->rb_left && parent->rb_right) { 647 /* If we are the parent's left node, go to the parent's right 648 * node then all the way down to the left */ 649 return rb_left_deepest_node(parent->rb_right); 650 } else
Created attachment 283829 [details] struct ubifs_info* c
Step 2. Find oops reason crash> dis rb_next_postorder // 0x10(%rdx) means parent->rb_left, wrong parent in certain rb_node? The problem rb_node is likely to be rdi. 0xffffffff81d899f0 <rb_next_postorder>: addq $0x1,0x3db80c8(%rip) # 0xffffffff85b41ac0 <__gcov0.rb_next_postorder> 0xffffffff81d899f8 <rb_next_postorder+8>: test %rdi,%rdi 0xffffffff81d899fb <rb_next_postorder+11>: je 0xffffffff81d89a98 <rb_next_postorder+168> 0xffffffff81d89a01 <rb_next_postorder+17>: mov (%rdi),%rdx 0xffffffff81d89a04 <rb_next_postorder+20>: addq $0x1,0x3db80bc(%rip) # 0xffffffff85b41ac8 <__gcov0.rb_next_postorder+8> 0xffffffff81d89a0c <rb_next_postorder+28>: and $0xfffffffffffffffc,%rdx 0xffffffff81d89a10 <rb_next_postorder+32>: je 0xffffffff81d89a98 <rb_next_postorder+168> 0xffffffff81d89a16 <rb_next_postorder+38>: addq $0x1,0x3db80b2(%rip) # 0xffffffff85b41ad0 <__gcov0.rb_next_postorder+16> 0xffffffff81d89a1e <rb_next_postorder+46>: cmp 0x10(%rdx),%rdi 0xffffffff81d89a22 <rb_next_postorder+50>: mov %rdx,%rax 0xffffffff81d89a25 <rb_next_postorder+53>: je 0xffffffff81d89a29 <rb_next_postorder+57> 0xffffffff81d89a27 <rb_next_postorder+55>: repz retq 0xffffffff81d89a29 <rb_next_postorder+57>: mov 0x8(%rdx),%rdx 0xffffffff81d89a2d <rb_next_postorder+61>: addq $0x1,0x3db80a3(%rip) # 0xffffffff85b41ad8 <__gcov0.rb_next_postorder+24> 0xffffffff81d89a35 <rb_next_postorder+69>: test %rdx,%rdx 0xffffffff81d89a38 <rb_next_postorder+72>: je 0xffffffff81d89a27 <rb_next_postorder+55> 0xffffffff81d89a3a <rb_next_postorder+74>: mov 0x3db809f(%rip),%rax # 0xffffffff85b41ae0 <__gcov0.rb_next_postorder+32> 0xffffffff81d89a41 <rb_next_postorder+81>: lea 0x1(%rax),%rsi 0xffffffff81d89a45 <rb_next_postorder+85>: mov 0x10(%rdx),%rax 0xffffffff81d89a49 <rb_next_postorder+89>: test %rax,%rax 0xffffffff81d89a4c <rb_next_postorder+92>: je 0xffffffff81d89a93 <rb_next_postorder+163> 0xffffffff81d89a4e <rb_next_postorder+94>: mov 0x3db8093(%rip),%rdx # 0xffffffff85b41ae8 <__gcov0.rb_next_postorder+40> 0xffffffff81d89a55 <rb_next_postorder+101>: jmp 0xffffffff81d89a5a <rb_next_postorder+106> 0xffffffff81d89a57 <rb_next_postorder+103>: mov %rcx,%rax 0xffffffff81d89a5a <rb_next_postorder+106>: mov 0x10(%rax),%rcx 0xffffffff81d89a5e <rb_next_postorder+110>: add $0x1,%rdx 0xffffffff81d89a62 <rb_next_postorder+114>: mov %rdx,0x3db807f(%rip) # 0xffffffff85b41ae8 <__gcov0.rb_next_postorder+40> 0xffffffff81d89a69 <rb_next_postorder+121>: test %rcx,%rcx 0xffffffff81d89a6c <rb_next_postorder+124>: jne 0xffffffff81d89a57 <rb_next_postorder+103> 0xffffffff81d89a6e <rb_next_postorder+126>: mov 0x8(%rax),%rdx 0xffffffff81d89a72 <rb_next_postorder+130>: mov %rsi,0x3db8067(%rip) # 0xffffffff85b41ae0 <__gcov0.rb_next_postorder+32> 0xffffffff81d89a79 <rb_next_postorder+137>: add $0x1,%rsi 0xffffffff81d89a7d <rb_next_postorder+141>: test %rdx,%rdx 0xffffffff81d89a80 <rb_next_postorder+144>: je 0xffffffff81d89a27 <rb_next_postorder+55> 0xffffffff81d89a82 <rb_next_postorder+146>: mov 0x10(%rdx),%rax 0xffffffff81d89a86 <rb_next_postorder+150>: addq $0x1,0x3db8062(%rip) # 0xffffffff85b41af0 <__gcov0.rb_next_postorder+48> 0xffffffff81d89a8e <rb_next_postorder+158>: test %rax,%rax 0xffffffff81d89a91 <rb_next_postorder+161>: jne 0xffffffff81d89a4e <rb_next_postorder+94> 0xffffffff81d89a93 <rb_next_postorder+163>: mov %rdx,%rax 0xffffffff81d89a96 <rb_next_postorder+166>: jmp 0xffffffff81d89a6e <rb_next_postorder+126> 0xffffffff81d89a98 <rb_next_postorder+168>: xor %eax,%eax 0xffffffff81d89a9a <rb_next_postorder+170>: retq ------------------------------------------------ crash> dis destroy_old_idx // r12 stores ubifs_info *c, and r12 hasn't been changed before Oops (r12 hasn't been changed in rb_first_postorder). 0xffffffffa0090420 <destroy_old_idx>: nopl 0x0(%rax,%rax,1) [FTRACE NOP] 0xffffffffa0090425 <destroy_old_idx+5>: push %r12 0xffffffffa0090427 <destroy_old_idx+7>: push %rbp 0xffffffffa0090428 <destroy_old_idx+8>: mov %rdi,%r12 0xffffffffa009042b <destroy_old_idx+11>: push %rbx 0xffffffffa009042c <destroy_old_idx+12>: add $0x1a0,%rdi 0xffffffffa0090433 <destroy_old_idx+19>: addq $0x1,0x6eca5(%rip) # 0xffffffffa00ff0e0 0xffffffffa009043b <destroy_old_idx+27>: callq 0xffffffff81d89aa0 <rb_first_postorder> 0xffffffffa0090440 <destroy_old_idx+32>: test %rax,%rax 0xffffffffa0090443 <destroy_old_idx+35>: je 0xffffffffa00904ae <destroy_old_idx+142> 0xffffffffa0090445 <destroy_old_idx+37>: mov %rax,%rbx 0xffffffffa0090448 <destroy_old_idx+40>: addq $0x1,0x6ec98(%rip) # 0xffffffffa00ff0e8 0xffffffffa0090450 <destroy_old_idx+48>: addq $0x1,0x6ec98(%rip) # 0xffffffffa00ff0f0 0xffffffffa0090458 <destroy_old_idx+56>: jmp 0xffffffffa009046d <destroy_old_idx+77> 0xffffffffa009045a <destroy_old_idx+58>: mov %rbx,%rdi 0xffffffffa009045d <destroy_old_idx+61>: mov %rbp,%rbx 0xffffffffa0090460 <destroy_old_idx+64>: addq $0x1,0x6eca8(%rip) # 0xffffffffa00ff110 0xffffffffa0090468 <destroy_old_idx+72>: callq 0xffffffff813d76c0 <kfree> 0xffffffffa009046d <destroy_old_idx+77>: mov %rbx,%rdi 0xffffffffa0090470 <destroy_old_idx+80>: addq $0x1,0x6ec80(%rip) # 0xffffffffa00ff0f8 0xffffffffa0090478 <destroy_old_idx+88>: callq 0xffffffff81d899f0 <rb_next_postorder> 0xffffffffa009047d <destroy_old_idx+93>: addq $0x1,0x6ec83(%rip) # 0xffffffffa00ff108 0xffffffffa0090485 <destroy_old_idx+101>: test %rax,%rax 0xffffffffa0090488 <destroy_old_idx+104>: mov %rax,%rbp 0xffffffffa009048b <destroy_old_idx+107>: jne 0xffffffffa009045a <destroy_old_idx+58> 0xffffffffa009048d <destroy_old_idx+109>: mov %rbx,%rdi 0xffffffffa0090490 <destroy_old_idx+112>: callq 0xffffffff813d76c0 <kfree> 0xffffffffa0090495 <destroy_old_idx+117>: pop %rbx 0xffffffffa0090496 <destroy_old_idx+118>: movq $0x0,0x1a0(%r12) 0xffffffffa00904a2 <destroy_old_idx+130>: addq $0x1,0x6ec56(%rip) # 0xffffffffa00ff100 0xffffffffa00904aa <destroy_old_idx+138>: pop %rbp 0xffffffffa00904ab <destroy_old_idx+139>: pop %r12 0xffffffffa00904ad <destroy_old_idx+141>: retq 0xffffffffa00904ae <destroy_old_idx+142>: addq $0x1,0x6ec3a(%rip) # 0xffffffffa00ff0f0 0xffffffffa00904b6 <destroy_old_idx+150>: jmp 0xffffffffa0090495 <destroy_old_idx+117> 0xffffffffa00904b8 <destroy_old_idx+152>: nopl 0x0(%rax,%rax,1) ------------------------------------------------ crash> ubifs_info ffff888128d48000 // see full in ubifs_info.txt old_idx = { rb_node = 0xffff888138b08ba0 }, crash> tree -t rbtree -o ubifs_old_idx.rb -s ubifs_old_idx.rb -N 0xffff888138b08ba0 // see full in tree.txt ffff888129ae4320 rb = { __rb_parent_color = 18446612687065007201, rb_right = 0xffff888138b08400, rb_left = 0xffff888129ae4780 } ffff888129ae4780 rb = { __rb_parent_color = 18446612687064286976, rb_right = 0x0, rb_left = 0x0 } ffff888138b08400 rb = { __rb_parent_color = 18446744069414584624, rb_right = 0x0, rb_left = 0x0 } ------------------------------------------------ Open KASAN, I found several slab-out-of-bounds. After locating them to the source code, oobs happened in line 239 in tnc_commit.c and line 397 in tnc_commit.c: 232 lnum = ubifs_find_dirty_idx_leb(c); 233 if (lnum < 0) 234 /* 235 * There also may be dirt in the index head that could be 236 * filled, however we do not check there at present. 237 */ 238 return lnum; /* Error code */ 239 *p = lnum; 240 dbg_gc("LEB %d", lnum); 392 leb_needed_cnt = get_leb_cnt(c, cnt); 393 dbg_gc("%d znodes remaining, need %d LEBs, have %d", cnt, 394 leb_needed_cnt, c->ileb_cnt); 395 } while (leb_needed_cnt > c->ileb_cnt); 396 397 *p = -1; 398 return 0; So, I want to valid whether the content of rb_node(rdi) has been overwritten by operations on pointer p. ------------------------------------------------ crash> ubifs_info ffff888128d48000 lst = { empty_lebs = 24, taken_empty_lebs = 23, idx_lebs = 11, total_free = 1336832, total_dirty = 5420152, total_used = 4479448, total_dead = 0, total_dark = 3233816 },... gap_lebs = 0xffff888138b083e0, crash> rd 0xffff880138cc6ca0 6 ffff888138b083e0: 000000e7000000cc 000002230000002e ............#... ffff888138b083f0: 000000d700000082 00000297000000d6 ................ ffff888138b08400: ffffffff00000130 0000000000000000 0............... In function layout_in_gaps(tnc_commit.c:356), c->gap_lebs points to an integer array, and this array should be end with -1. Address ffff888138b08400 stores ffffffff00000130, so the area pointed by c->gap_lebs has been written 9 times. Combined with the oobs prompted by KASAN, we guess oobs on address ffff888138b08400 leads to the overwritten on rb_node (Note, KASAN shows using slab-32 to allocate memory for c->gap_lebs). Adress ffff888138b08400 does exist in the tree.txt, and corresponding value in rb_node->__rb_parent_color is 18446744069414584624: crash> eval 18446744069414584624 hexadecimal: ffffffff00000130 decimal: 18446744069414584624 (-4294966992) octal: 1777777777740000000460 binary: 1111111111111111111111111111111100000000000000000000000100110000 18446744069414584624 = ffffffff00000130, ffffffff00000130 is the value overwritten by operations on pointer p. ------------------------------------------------ Conclusion: Number of cycles in layout_in_gaps exceeds the original value of c->lst.idx_lebs, but memory allocated for c->gap_lebs not been enlarged. c->lst.idx_lebs could be increased in do-while.
Created attachment 283835 [details] rb tree in c->old_idx
Step 2. Find logical error Add print message in layout_in_gaps: 356 static int layout_in_gaps(struct ubifs_info *c, int cnt) 357 { 358 int err, leb_needed_cnt, written, *p, time = 0; 359 360 dbg_gc("%d znodes to write", cnt); 361 362 c->gap_lebs = kmalloc_array(c->lst.idx_lebs + 1, sizeof(int), 363 GFP_NOFS); 364 if (!c->gap_lebs) 365 return -ENOMEM; 366 367 pr_warn("Debug_msg: c->gap_lebs mallocs %d*sizeof(int)++++++++++\n", c->lst.idx_lebs+1); 368 p = c->gap_lebs; 369 do { 370 ubifs_assert(c, p < c->gap_lebs + c->lst.idx_lebs); 371 written = layout_leb_in_gaps(c, p); 372 if (written < 0) { 373 err = written; 374 if (err != -ENOSPC) { 375 kfree(c->gap_lebs); 376 c->gap_lebs = NULL; 377 return err; 378 } 379 if (!dbg_is_chk_index(c)) { 380 /* 381 * Do not print scary warnings if the debugging 382 * option which forces in-the-gaps is enabled. 383 */ 384 ubifs_warn(c, "out of space"); 385 ubifs_dump_budg(c, &c->bi); 386 ubifs_dump_lprops(c); 387 } 388 /* Try to commit anyway */ 389 break; 390 } 391 time++; 392 p++; 393 cnt -= written; 394 leb_needed_cnt = get_leb_cnt(c, cnt); 395 dbg_gc("%d znodes remaining, need %d LEBs, have %d", cnt, 396 leb_needed_cnt, c->ileb_cnt); 397 pr_warn("Debug_msg: do-while: Writen c->gap_lebs[%d] ----------\n", time-1); 398 pr_warn("Debug_msg: do-while: c->lst.idx_lebs=%d,leb_needed_cnt=%d,c->ileb_cnt=%d ----------\n", c->lst.idx_lebs, leb_needed_cnt, c->ileb_cnt); 399 } while (leb_needed_cnt > c->ileb_cnt); 400 *p = -1; 401 pr_warn("Debug_msg: Out of do-while: Writen c->gap_lebs[%d] ----------\n", time); 402 pr_warn("Debug_msg: Out of do-while: Executing '*p = -1' ++++++++++\n"); 403 return 0; 404 } ------------------------------------------------ Debug message: [ 18.750317] [nandsim] warning: CONFIG_MTD_PARTITIONED_MASTER must be enabled to expose debugfs stuff [ 20.879263] Debug_msg: c->gap_lebs mallocs 5*sizeof(int)++++++++++ [ 20.882404] Debug_msg: do-while: Writen c->gap_lebs[0] ---------- [ 20.883512] Debug_msg: do-while: c->lst.idx_lebs=4,leb_needed_cnt=13,c->ileb_cnt=2 ---------- [ 20.885278] Debug_msg: do-while: Writen c->gap_lebs[1] ---------- [ 20.886382] Debug_msg: do-while: c->lst.idx_lebs=5,leb_needed_cnt=11,c->ileb_cnt=2 ---------- [ 20.888118] Debug_msg: do-while: Writen c->gap_lebs[2] ---------- [ 20.889225] Debug_msg: do-while: c->lst.idx_lebs=6,leb_needed_cnt=9,c->ileb_cnt=2 ---------- [ 20.890963] Debug_msg: do-while: Writen c->gap_lebs[3] ---------- [ 20.892080] Debug_msg: do-while: c->lst.idx_lebs=7,leb_needed_cnt=8,c->ileb_cnt=2 ---------- [ 20.893817] Debug_msg: do-while: Writen c->gap_lebs[4] ---------- [ 20.894929] Debug_msg: do-while: c->lst.idx_lebs=8,leb_needed_cnt=6,c->ileb_cnt=2 ---------- [ 20.896661] Debug_msg: do-while: Writen c->gap_lebs[5] ---------- [ 20.897773] Debug_msg: do-while: c->lst.idx_lebs=9,leb_needed_cnt=4,c->ileb_cnt=2 ---------- [ 20.899505] Debug_msg: do-while: Writen c->gap_lebs[6] ---------- [ 20.900620] Debug_msg: do-while: c->lst.idx_lebs=10,leb_needed_cnt=3,c->ileb_cnt=2 ---------- [ 20.902363] Debug_msg: do-while: Writen c->gap_lebs[7] ---------- [ 20.903463] Debug_msg: do-while: c->lst.idx_lebs=11,leb_needed_cnt=2,c->ileb_cnt=2 ---------- [ 20.904989] Debug_msg: Out of do-while: Writen c->gap_lebs[8] ---------- [ 20.906194] Debug_msg: Out of do-while: Executing '*p = -1' ++++++++++ [ 20.907442] BUG: unable to handle page fault for address: ffff88820000000c In fact, OOBs happened many times, and those that allocate more memory than slab-32 triggers oops. ------------------------------------------------ I also add external print message in ubifs_find_dirty_idx_leb(called by layout_leb_in_gaps), c->lst.idx_lebs has been increased by layout_leb_in_gaps->ubifs_find_dirty_idx_leb->get_idx_gc_leb->ubifs_change_lp. Conclusion: c->lst.idx_lebs has been increased by layout_leb_in_gaps->ubifs_find_dirty_idx_leb->get_idx_gc_leb->ubifs_change_lp. But allocated memory size for c->gap_lebs is still equal to c->lst.idx_lebs(original value before do-while) * sizeof(int).
(In reply to Zhihao Cheng from comment #0) > Created attachment 283827 [details] > vmcore dmesg > > [ 212.339569] BUG: unable to handle page fault for address: ffffffff00000140 > [ 212.341301] #PF: supervisor read access in kernel mode > [ 212.342207] #PF: error_code(0x0000) - not-present page > [ 212.343113] PGD 280a067 P4D 280a067 PUD 0 > [ 212.343844] Oops: 0000 [#1] SMP > [ 212.344418] CPU: 0 PID: 60 Comm: kworker/u16:1 Kdump: loaded Not tainted > 5.2.0 #13 > [ 212.345742] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 > [ 212.347732] Workqueue: writeback wb_workfn (flush-ubifs_0_0) > [ 212.348762] RIP: 0010:rb_next_postorder+0x2e/0xb0 > [ 212.349594] Code: 80 db 03 01 48 85 ff 0f 84 97 00 00 00 48 8b 17 48 83 > 05 bc 80 db 03 01 48 83 e2 fc 0f 84 82 00 00 00 48 83 05 b2 80 db 03 01 <48> > 3b 7a 10 48 89 d0 74 02 f3 c3 48 8b 52 08 48 83 05 a3 80 db 03 > [ 212.352845] RSP: 0018:ffffc90000887758 EFLAGS: 00010202 > [ 212.353795] RAX: ffff888129ae4700 RBX: ffff888138b08400 RCX: > 0000000080800001 > [ 212.355035] RDX: ffffffff00000130 RSI: 0000000080800024 RDI: > ffff888138b08400 > [ 212.356276] RBP: ffff888138b08400 R08: ffffea0004a6b920 R09: > 0000000000000000 > [ 212.357527] R10: ffffc90000887740 R11: 0000000000000001 R12: > ffff888128d48000 > [ 212.358769] R13: 0000000000000800 R14: 000000000000011e R15: > 00000000000007c8 > [ 212.360015] FS: 0000000000000000(0000) GS:ffff88813ba00000(0000) > knlGS:0000000000000000 > [ 212.361442] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 212.362451] CR2: ffffffff00000140 CR3: 000000013789d000 CR4: > 00000000000006f0 > [ 212.363699] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 212.364951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 212.366191] Call Trace: > [ 212.366666] destroy_old_idx+0x5d/0xa0 [ubifs] > [ 212.367470] ubifs_tnc_start_commit+0x4fe/0x1380 [ubifs] > [ 212.368429] do_commit+0x3eb/0x830 [ubifs] > [ 212.369172] ubifs_run_commit+0xdc/0x1c0 [ubifs] > [ 212.370001] make_reservation+0x62b/0x930 [ubifs] > [ 212.370845] ubifs_jnl_write_data+0x266/0x490 [ubifs] > [ 212.371741] ? __test_set_page_writeback+0xb1/0x4f0 > [ 212.372625] do_writepage+0x1ab/0x2d0 [ubifs] > [ 212.373425] ubifs_writepage+0x1c8/0x330 [ubifs] > [ 212.374243] __writepage+0x1b/0xa0 > [ 212.374849] write_cache_pages+0x2cc/0x6a0 > [ 212.375574] ? __wb_calc_thresh+0x160/0x160 > [ 212.376315] generic_writepages+0x74/0xc0 > [ 212.377046] ? ubifs_release_budget+0x1c6/0x5e0 [ubifs] > [ 212.377980] ? ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs] > [ 212.379051] ? do_writepages+0x8d/0xb0 > [ 212.379719] do_writepages+0x8d/0xb0 > [ 212.380367] __writeback_single_inode+0x6d/0x650 > [ 212.381187] writeback_sb_inodes+0x305/0x730 > [ 212.381945] wb_writeback+0x16c/0x550 > [ 212.382598] ? wb_workfn+0x15e/0x730 > [ 212.383235] wb_workfn+0x15e/0x730 > [ 212.383846] ? put_prev_entity+0x45/0x1c0 > [ 212.384566] process_one_work+0x2a1/0x660 > [ 212.385278] worker_thread+0x222/0x6d0 > [ 212.385944] ? rescuer_thread+0x540/0x540 > [ 212.386658] kthread+0x166/0x1c0 > [ 212.387237] ? kthread_cancel_work_sync+0x20/0x20 > [ 212.388069] ret_from_fork+0x1f/0x30 > [ 212.388714] Modules linked in: ubifs ubi nandsim [last unloaded: nandsim] > [ 212.389904] CR2: ffffffff00000140 Compile mtd-utils/tests/fs-tests/simple/test_2.c and run test_2 on ubi device based on nandsim/mtdram flash in qemu(x86_64). I tried fastmap enabled/disabled on several flash sizes, probability of problem recurrence is very high.
The assertion has never been triggered. > ubifs_assert(c, p < c->gap_lebs + c->lst.idx_lebs)
Created attachment 284453 [details] assertion triggers