Created attachment 281725 [details]
The (compressed) crafted image which causes the crash

- Overview
After mounting the crafted image and running the attached program, I got a segmentation fault. I also tried to reproduce it on a VM, but it only fails under LKL.

- Reproduce
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p poc_01.c.raw -v
(poc_01.c shows its internal program)

- Messages
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p tmp.c.raw -v
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p poc_01.c.raw -v
[ 0.000000] Linux version 5.0.0-rc6+ (jungyeon@copper) (gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)) #1 Mon Mar 11 14:49:22 EDT 2019
[ 0.000000] memblock address range: 0x7face0000000 - 0x7face7fff000
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32319
[ 0.000000] Kernel command line: mem=128M virtio_mmio.device=316@0x1000000:1
[ 0.000000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.000000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.000000] Memory available: 129044k/131068k RAM
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS: 4096
[ 0.000000] lkl: irqs initialized
[ 0.000000] clocksource: lkl: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000004] lkl: time and timers initialized (irq2)
[ 0.000011] pid_max: default: 4096 minimum: 301
[ 0.000074] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.000084] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.002643] printk: console [lkl_console0] enabled
[ 0.002673] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.004396] clocksource: Switched to clocksource lkl
[ 0.004672] virtio-mmio: Registering device virtio-mmio.0 at 0x1000000-0x100013b, IRQ 1.
[ 0.005205] workingset: timestamp_bits=62 max_order=15 bucket_order=0
[ 0.015834] virtio-mmio virtio-mmio.0: Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.
[ 0.016070] virtio_blk virtio0: [vda] 32768 512-byte logical blocks (16.8 MB/16.0 MiB)
[ 0.016903] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
[ 0.017356] Warning: unable to open an initial console.
[ 0.017395] This architecture does not have kernel memory protection.
[ 0.017402] Run /init as init process
[ 0.019260] EXT4-fs (vda): barriers disabled
[ 0.019867] [EXT4 FS bs=1024, gc=2, bpg=8192, ipg=2048, mo=e000c42c, mo2=0002]
[ 0.019890] System zones: 1-2, 66-581, 8193-8194
[ 0.020020] EXT4-fs (vda): mounting with "discard" option, but the device does not support discard
[ 0.020030] EXT4-fs (vda): mounted filesystem with journalled data mode. Opts: errors=remount-ro
v13 = syscall(SYS_open, (long)v2, 65536, 0);
syscall(SYS_getdents64, (long)v13, (long)v1, 2344);
syscall(SYS_fsync, (long)v13);
syscall(SYS_fsync, (long)v13);
syscall(SYS_readlink, (long)v10, (long)v1, 8192);
v15 = syscall(SYS_open, (long)v14, 66, 438);
syscall(SYS_write, (long)v15, (long)v1, 2229);
syscall(SYS_write, (long)v15, (long)v1, 3563);
syscall(SYS_ftruncate, (long)v15, 7336);
syscall(SYS_getdents64, (long)v13, (long)v1, 4633);
syscall(SYS_mkdir, (long)v16, 511);
syscall(SYS_fsync, (long)v13);
syscall(SYS_fsync, (long)v15);
syscall(SYS_unlink, (long)v8);
syscall(SYS_write, (long)v15, (long)v1, 7178);
syscall(SYS_readlink, (long)v14, (long)v1, 8192);
syscall(SYS_utimes, (long)v11, (long)v1);
syscall(SYS_ftruncate, (long)v15, 4018);
syscall(SYS_utimes, (long)v10, (long)v1);
syscall(SYS_ftruncate, (long)v15, 6005);
syscall(SYS_fsync, (long)v15);
syscall(SYS_rmdir, (long)v12);
syscall(SYS_pwrite64, (long)v15, (long)v1, 7752, 4527);
syscall(SYS_getdents64, (long)v13, (long)v1, 3796);
syscall(SYS_mkdir, (long)v17, 511);
syscall(SYS_removexattr, (long)v3, (long)v18);
syscall(SYS_ftruncate, (long)v15, 53);
syscall(SYS_listxattr, (long)v5, (long)v1, 4138);
syscall(SYS_pwrite64, (long)v15, (long)v1, 7728, 1584);
syscall(SYS_fsync, (long)v15);
syscall(SYS_fsync, (long)v15);
syscall(SYS_write, (long)v15, (long)v1, 1974);
syscall(SYS_unlink, (long)v14);
syscall(SYS_write, (long)v15, (long)v1, 1752);
syscall(SYS_getdents64, (long)v13, (long)v1, 1582);
syscall(SYS_pwrite64, (long)v15, (long)v1, 5142, 5178);
syscall(SYS_removexattr, (long)v16, (long)v19);
v20 = syscall(SYS_open, (long)v3, 65536, 0);
syscall(SYS_fsync, (long)v15);
syscall(SYS_symlink, (long)v5, (long)v21);
syscall(SYS_link, (long)v10, (long)v22);
v23 = syscall(SYS_open, (long)v7, 2, 0);
syscall(SYS_ftruncate, (long)v15, 2545);
syscall(SYS_write, (long)v23, (long)v1, 2067);
syscall(SYS_fdatasync, (long)v23);
syscall(SYS_link, (long)v10, (long)v24);
syscall(SYS_symlink, (long)v9, (long)v25);
syscall(SYS_fsync, (long)v15);
syscall(SYS_mkdir, (long)v26, 511);
[ 0.084492] random: fast init done
syscall(SYS_fdatasync, (long)v23);
syscall(SYS_write, (long)v23, (long)v1, 969);
syscall(SYS_readlink, (long)v2, (long)v1, 8192);
syscall(SYS_chmod, (long)v25, 3072);
syscall(SYS_fdatasync, (long)v23);
syscall(SYS_pwrite64, (long)v23, (long)v1, 1520, 1423);
syscall(SYS_fallocate, (long)v15, 65, 5353, 6797);
syscall(SYS_fsync, (long)v23);
syscall(SYS_listxattr, (long)v22, (long)v1, 1808);
syscall(SYS_pwrite64, (long)v23, (long)v1, 4742, 7814);
syscall(SYS_newlstat, (long)v21, (long)v1);
syscall(SYS_fsync, (long)v20);
Segmentation fault (core dumped)
Created attachment 281727 [details] poc_01.c.raw
Created attachment 281729 [details] poc_01.c
https://gts3.org/~jungyeon/ext4-combined
I uploaded the executable file required for this test at the link above.
How is a userspace segfault a kernel bug? Running poc_01.c directly on the mounted image doesn't produce a problem for me. Also, I'm not keen on downloading a random 16MB binary "required" to run this test.
I assume LKL is "Linux Kernel Library", so this runs the ext4 file system in userspace, executing the system calls found in poc_01.c? Are there instructions so we can build ext4-combined from source? That's going to be needed if we are going to be able to run the binary under a debugger, and against a patched kernel to verify the fix. Also, can you give us a stack dump so we might have some kind of hint about what's going on?
Sorry for the lack of explanation. Yes, LKL is the Linux Kernel Library. poc_01.c is a program that issues a sequence of system calls from userspace, and the crafted image is a potentially faulty image used to test error cases.

We are going to release our source code shortly so that you can build ext4-combined yourself. We need to clean up the code before making it public.

I'm attaching the stack dump at the end. The problem is that jh is NULL on entry to this function, so J_ASSERT_JH(jh, jh->b_jcount >= 0) dereferences a NULL pointer. To get the stack dump, I temporarily inserted a BUG_ON that fires when jh is NULL (line 2538 below). I also used Linux version 5.0.0+ for this trace (and in the linked ext4-combined binary).

2534 static void __journal_remove_journal_head(struct buffer_head *bh)
2535 {
2536         struct journal_head *jh = bh2jh(bh);
2537
2538         BUG_ON(jh == NULL);
2539         J_ASSERT_JH(jh, jh->b_jcount >= 0);
2540         J_ASSERT_JH(jh, jh->b_transaction == NULL);
2541         J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
2542         J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
2543         J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
2544         J_ASSERT_BH(bh, buffer_jbd(bh));
2545         J_ASSERT_BH(bh, jh2bh(jh) == bh);
2546         BUFFER_TRACE(bh, "remove journal_head");
2547         if (jh->b_frozen_data) {
2548                 printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__);
2549                 jbd2_free(jh->b_frozen_data, bh->b_size);
2550         }

- Stack dump
[ 0.089081] BUG: failure at fs/jbd2/journal.c:2538/__journal_remove_journal_head()!
[ 0.089096] Kernel panic - not syncing: BUG!
[ 0.089101] Call Trace:
[ 0.089110] (____ptrval____): [<55555559bc94>] .LC81+0x5f/0xfb
[ 0.089118] (____ptrval____): [<5555555c6025>] major_names+0x75/0x80
[ 0.089125] (____ptrval____): [<5555555978f4>] .LC11+0x14/0x20
[ 0.089133] (____ptrval____): [<5555556b1e40>] submit_bh+0x40/0x50
[ 0.089141] (____ptrval____): [<55555580286d>] jbd2_journal_put_journal_head+0x6cd/0x6d0
[ 0.089147] (____ptrval____): [<5555557ec6e8>] __jbd2_journal_refile_buffer+0x2d8/0x3c0
[ 0.089153] (____ptrval____): [<5555557f641a>] __jbd2_journal_remove_checkpoint+0x17a/0x2f0
[ 0.089164] (____ptrval____): [<5555557eff12>] jbd2_journal_commit_transaction+0x2fc2/0x3fc0
[ 0.089173] (____ptrval____): [<555555597353>] .LC18+0x3/0x10
[ 0.089181] (____ptrval____): [<5555555b8fb9>] try_to_wake_up+0x169/0x190
[ 0.089190] (____ptrval____): [<5555558031be>] kjournald2+0x34e/0x400
[ 0.089199] (____ptrval____): [<5555555bfd30>] autoremove_wake_function+0x0/0x40
[ 0.089206] (____ptrval____): [<5555555978f4>] .LC11+0x14/0x20
[ 0.089214] (____ptrval____): [<5555555b3acb>] kthread+0x15b/0x170
[ 0.089221] (____ptrval____): [<555555802e70>] kjournald2+0x0/0x400
[ 0.089228] (____ptrval____): [<5555555b3970>] kthread+0x0/0x170
[ 0.089237] (____ptrval____): [<5555555970ab>] uidhash_table+0x3b/0x40

Thanks.
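For readers following the NULL journal_head discussion, here is a minimal userspace sketch of the failure mode. The two structs are simplified stand-ins rather than the kernel definitions, and bh2jh_model() and check_remove_journal_head() are hypothetical names used only for illustration. The one thing taken from the kernel is the idea that bh2jh() returns the buffer head's b_private pointer, so a buffer head with nothing attached yields a NULL journal_head.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; NOT the real definitions. */
struct journal_head {
    int b_jcount;           /* reference count checked by the first assertion */
};

struct buffer_head {
    void *b_private;        /* jbd2 keeps the journal_head pointer here */
};

/* Models bh2jh(): hand back whatever journal_head is attached, if any. */
static struct journal_head *bh2jh_model(const struct buffer_head *bh)
{
    return bh->b_private;
}

/*
 * Hypothetical guard: returns 0 when a journal_head is attached and looks
 * sane, -1 when none is attached. Without the NULL test, reading
 * jh->b_jcount would dereference NULL, which matches the reported segfault.
 */
static int check_remove_journal_head(const struct buffer_head *bh)
{
    const struct journal_head *jh = bh2jh_model(bh);

    if (jh == NULL)
        return -1;
    assert(jh->b_jcount >= 0);
    return 0;
}
```

Calling check_remove_journal_head() on a buffer head whose b_private is NULL returns -1 instead of crashing. This only illustrates the symptom; it says nothing about why the journal_head went missing in the first place.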
Created attachment 281825 [details]
another test set

I'm attaching another error case that shows the same failure. It uses far fewer system calls (15), so I hope it helps track down this bug.

- Reproduce
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p min_11.c.raw -v
(min_11.c shows its internal program)

- Call stack
[ 0.040743] BUG: failure at fs/jbd2/journal.c:2538/__journal_remove_journal_head()!
[ 0.040754] Kernel panic - not syncing: BUG!
[ 0.040758] Call Trace:
[ 0.040767] (____ptrval____): [<55555559bc94>] .LC81+0x5f/0xfb
[ 0.040775] (____ptrval____): [<5555555c6025>] major_names+0x75/0x80
[ 0.040782] (____ptrval____): [<5555555978f4>] .LC11+0x14/0x20
[ 0.040791] (____ptrval____): [<555555604368>] kmem_cache_free+0x148/0x190
[ 0.040796] (____ptrval____): [<5555555978f4>] .LC11+0x14/0x20
[ 0.040804] (____ptrval____): [<55555580286d>] jbd2_journal_put_journal_head+0x6cd/0x6d0
[ 0.040811] (____ptrval____): [<5555557f641a>] __jbd2_journal_remove_checkpoint+0x17a/0x2f0
[ 0.040822] (____ptrval____): [<5555557f5608>] jbd2_log_do_checkpoint+0x298/0xd10
[ 0.040835] (____ptrval____): [<555555850674>] atomic64_cmpxchg+0x54/0x80
[ 0.040843] (____ptrval____): [<5555557feda3>] jbd2_journal_destroy+0x363/0x840
[ 0.040856] (____ptrval____): [<5555555bfd30>] autoremove_wake_function+0x0/0x40
[ 0.040865] (____ptrval____): [<5555555ada2c>] input_timer_state+0x1c/0x20
[ 0.040873] (____ptrval____): [<5555557cb8ac>] ext4_put_super+0xac/0x7f0
[ 0.040881] (____ptrval____): [<555555616f5b>] generic_shutdown_super+0x13b/0x370
[ 0.040889] (____ptrval____): [<55555561acc5>] kill_block_super+0x55/0x100
[ 0.040897] (____ptrval____): [<555555616abc>] deactivate_locked_super+0x11c/0x170
[ 0.040903] (____ptrval____): [<555555616cb6>] deactivate_super+0x1a6/0x1b0
[ 0.040911] (____ptrval____): [<5555556538fb>] dput+0xcb/0x7c0
[ 0.040919] (____ptrval____): [<55555567d1a9>] cleanup_mnt+0xb9/0x170
[ 0.040929] (____ptrval____): [<55555567d0ed>] __cleanup_mnt+0x3d/0x40
[ 0.040935] (____ptrval____): [<5555555b24ca>] task_work_run+0xba/0xf0
[ 0.040944] (____ptrval____): [<55555559800f>] .LC2+0x3f/0x40
[ 0.040951] (____ptrval____): [<5555555978f4>] .LC11+0x14/0x20
[ 0.040958] (____ptrval____): [<5555555986d6>] .LC19+0x6/0x15
[ 0.040966]
[ 0.040972] ---[ end Kernel panic - not syncing: BUG! ]---
ext4-combined: lib/posix-host.c:302: panic: Assertion `0' failed.
Jungyeon,

One of the things you can do which would be helpful when creating a minimal reproducer is to fix some of the gratuitous corruptions in the file system image, so we can be 100% sure which file system corruption is combining with your test syscall load to trigger the failure.

For example, both of these superblock corruptions, which cause e2fsck to stop dead in its tracks because it views the superblock as being too compromised for automated machine assumptions to be safe, are probably things we can clear and still have ext4-combined dump core:

% e2fsck -fy /tmp/tmp.img
e2fsck 1.45.0 (6-Mar-2019)
Found invalid V2 journal superblock fields (from V1 journal).
Clearing fields beyond the V1 journal superblock...

Corruption found in superblock. (desc_size = 33667).

E2fsck fixed up the first problem automatically, and the second I could fix up using debugfs:

debugfs -w -R "ssv s_desc_size 64" /tmp/tmp.img

I suspect that the root cause is that the block allocation bitmap marks as free a block which is also used by the journal. And if that block gets reallocated so that a directory block (which, being metadata, is accessed via a buffer head) overlaps with the journal block, I can imagine all sorts of hilarity ensuing.

I will attach a proposed patch which should detect this case and block the reuse of a block belonging to the journal. Can you try applying this patch to your LKL ext4-combined program, and see if it traps the file system corruption early enough that the core dump doesn't get triggered?
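To make the suspected failure concrete, the check being described amounts to refusing any allocation whose block range intersects the blocks owned by the journal. The sketch below is a standalone illustration of that overlap test and is not the attached patch. struct blk_extent and range_overlaps_journal() are hypothetical names, and the journal layout is assumed to be available as a plain array of extents.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one contiguous run of journal blocks. */
struct blk_extent {
    uint64_t start;   /* first physical block of the run */
    uint64_t len;     /* number of blocks in the run */
};

/*
 * Returns true if the candidate range [start, start + count) shares at
 * least one block with any journal extent. An empty range never overlaps.
 */
static bool range_overlaps_journal(uint64_t start, uint64_t count,
                                   const struct blk_extent *journal, size_t n)
{
    if (count == 0)
        return false;

    uint64_t end = start + count;                    /* exclusive upper bound */

    for (size_t i = 0; i < n; i++) {
        uint64_t j_start = journal[i].start;
        uint64_t j_end = j_start + journal[i].len;   /* exclusive upper bound */

        /* Two half-open ranges intersect iff each starts before the other ends. */
        if (start < j_end && j_start < end)
            return true;
    }
    return false;
}
```

An allocator using such a test would treat a hit as evidence that the bitmap is corrupted and report an error rather than hand out the block. Where exactly that check belongs in ext4 is a design decision for the real patch.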
Created attachment 281995 [details] possible patch to detect the problem early
Thanks a lot for the patch. As you stated, it also works for the bug reported in #202877.