Bug 202879 - Segmentation fault while running crafted program
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
 
Reported: 2019-03-11 18:57 UTC by Jungyeon
Modified: 2019-04-05 20:12 UTC

Kernel Version: 5.0.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The (compressed) crafted image which causes crash (20.42 KB, application/zip)
2019-03-11 18:57 UTC, Jungyeon
poc_01.c.raw (4.52 KB, image/x-panasonic-rw)
2019-03-11 18:58 UTC, Jungyeon
poc_01.c (6.46 KB, text/x-csrc)
2019-03-11 18:59 UTC, Jungyeon
another test set (5.41 KB, application/gzip)
2019-03-14 18:15 UTC, Jungyeon
possible patch to detect the problem early (2.28 KB, patch)
2019-03-25 03:41 UTC, Theodore Tso

Description Jungyeon 2019-03-11 18:57:42 UTC
Created attachment 281725
The (compressed) crafted image which causes crash

- Overview
After mounting the crafted image and running the attached program, I got this segmentation fault.
I also tried to reproduce it on a VM, but it only fails on LKL.

- Reproduce
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p poc_01.c.raw -v
(poc_01.c shows the system calls that poc_01.c.raw runs)

- Messages
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p tmp.c.raw -v
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p poc_01.c.raw -v
[    0.000000] Linux version 5.0.0-rc6+ (jungyeon@copper) (gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)) #1 Mon Mar 11 14:49:22 EDT 2019
[    0.000000] memblock address range: 0x7face0000000 - 0x7face7fff000
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32319
[    0.000000] Kernel command line: mem=128M virtio_mmio.device=316@0x1000000:1
[    0.000000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.000000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[    0.000000] Memory available: 129044k/131068k RAM
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 4096
[    0.000000] lkl: irqs initialized
[    0.000000] clocksource: lkl: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000004] lkl: time and timers initialized (irq2)
[    0.000011] pid_max: default: 4096 minimum: 301
[    0.000074] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[    0.000084] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[    0.002643] printk: console [lkl_console0] enabled
[    0.002673] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.004396] clocksource: Switched to clocksource lkl
[    0.004672] virtio-mmio: Registering device virtio-mmio.0 at 0x1000000-0x100013b, IRQ 1.
[    0.005205] workingset: timestamp_bits=62 max_order=15 bucket_order=0
[    0.015834] virtio-mmio virtio-mmio.0: Failed to enable 64-bit or 32-bit DMA.  Trying to continue, but this might not work.
[    0.016070] virtio_blk virtio0: [vda] 32768 512-byte logical blocks (16.8 MB/16.0 MiB)
[    0.016903] random: get_random_bytes called from init_oops_id+0x35/0x40 with crng_init=0
[    0.017356] Warning: unable to open an initial console.
[    0.017395] This architecture does not have kernel memory protection.
[    0.017402] Run /init as init process
[    0.019260] EXT4-fs (vda): barriers disabled
[    0.019867] [EXT4 FS bs=1024, gc=2, bpg=8192, ipg=2048, mo=e000c42c, mo2=0002]
[    0.019890] System zones: 1-2, 66-581, 8193-8194
[    0.020020] EXT4-fs (vda): mounting with "discard" option, but the device does not support discard
[    0.020030] EXT4-fs (vda): mounted filesystem with journalled data mode. Opts: errors=remount-ro
	v13 = syscall(SYS_open, (long)v2, 65536, 0);
	syscall(SYS_getdents64, (long)v13, (long)v1, 2344);
	syscall(SYS_fsync, (long)v13);
	syscall(SYS_fsync, (long)v13);
	syscall(SYS_readlink, (long)v10, (long)v1, 8192);
	v15 = syscall(SYS_open, (long)v14, 66, 438);
	syscall(SYS_write, (long)v15, (long)v1, 2229);
	syscall(SYS_write, (long)v15, (long)v1, 3563);
	syscall(SYS_ftruncate, (long)v15, 7336);
	syscall(SYS_getdents64, (long)v13, (long)v1, 4633);
	syscall(SYS_mkdir, (long)v16, 511);
	syscall(SYS_fsync, (long)v13);
	syscall(SYS_fsync, (long)v15);
	syscall(SYS_unlink, (long)v8);
	syscall(SYS_write, (long)v15, (long)v1, 7178);
	syscall(SYS_readlink, (long)v14, (long)v1, 8192);
	syscall(SYS_utimes, (long)v11, (long)v1);
	syscall(SYS_ftruncate, (long)v15, 4018);
	syscall(SYS_utimes, (long)v10, (long)v1);
	syscall(SYS_ftruncate, (long)v15, 6005);
	syscall(SYS_fsync, (long)v15);
	syscall(SYS_rmdir, (long)v12);
	syscall(SYS_pwrite64, (long)v15, (long)v1, 7752, 4527);
	syscall(SYS_getdents64, (long)v13, (long)v1, 3796);
	syscall(SYS_mkdir, (long)v17, 511);
	syscall(SYS_removexattr, (long)v3, (long)v18);
	syscall(SYS_ftruncate, (long)v15, 53);
	syscall(SYS_listxattr, (long)v5, (long)v1, 4138);
	syscall(SYS_pwrite64, (long)v15, (long)v1, 7728, 1584);
	syscall(SYS_fsync, (long)v15);
	syscall(SYS_fsync, (long)v15);
	syscall(SYS_write, (long)v15, (long)v1, 1974);
	syscall(SYS_unlink, (long)v14);
	syscall(SYS_write, (long)v15, (long)v1, 1752);
	syscall(SYS_getdents64, (long)v13, (long)v1, 1582);
	syscall(SYS_pwrite64, (long)v15, (long)v1, 5142, 5178);
	syscall(SYS_removexattr, (long)v16, (long)v19);
	v20 = syscall(SYS_open, (long)v3, 65536, 0);
	syscall(SYS_fsync, (long)v15);
	syscall(SYS_symlink, (long)v5, (long)v21);
	syscall(SYS_link, (long)v10, (long)v22);
	v23 = syscall(SYS_open, (long)v7, 2, 0);
	syscall(SYS_ftruncate, (long)v15, 2545);
	syscall(SYS_write, (long)v23, (long)v1, 2067);
	syscall(SYS_fdatasync, (long)v23);
	syscall(SYS_link, (long)v10, (long)v24);
	syscall(SYS_symlink, (long)v9, (long)v25);
	syscall(SYS_fsync, (long)v15);
	syscall(SYS_mkdir, (long)v26, 511);
[    0.084492] random: fast init done
	syscall(SYS_fdatasync, (long)v23);
	syscall(SYS_write, (long)v23, (long)v1, 969);
	syscall(SYS_readlink, (long)v2, (long)v1, 8192);
	syscall(SYS_chmod, (long)v25, 3072);
	syscall(SYS_fdatasync, (long)v23);
	syscall(SYS_pwrite64, (long)v23, (long)v1, 1520, 1423);
	syscall(SYS_fallocate, (long)v15, 65, 5353, 6797);
	syscall(SYS_fsync, (long)v23);
	syscall(SYS_listxattr, (long)v22, (long)v1, 1808);
	syscall(SYS_pwrite64, (long)v23, (long)v1, 4742, 7814);
	syscall(SYS_newlstat, (long)v21, (long)v1);
	syscall(SYS_fsync, (long)v20);
Segmentation fault (core dumped)
Comment 1 Jungyeon 2019-03-11 18:58:55 UTC
Created attachment 281727
poc_01.c.raw
Comment 2 Jungyeon 2019-03-11 18:59:11 UTC
Created attachment 281729
poc_01.c
Comment 3 Jungyeon 2019-03-13 16:24:58 UTC
https://gts3.org/~jungyeon/ext4-combined

I uploaded the executable required for this test at the link above.
Comment 4 Eric Sandeen 2019-03-13 17:13:51 UTC
How is a userspace segfault a kernel bug?

Running poc-01.c directly on the mounted image doesn't produce a problem for me.

Also I'm not keen on downloading a random 16MB binary "required" to run this test.
Comment 5 Theodore Tso 2019-03-13 17:41:33 UTC
I assume LKL is "Linux Kernel Library", so this is trying to run the ext4 file system executing the system calls found in poc-01.c in userspace?

Are there instructions so we can build ext4-combined from source?  That's going to be needed if we are going to be able to run the binary under a debugger, and against a patched kernel to verify the fix.

Also, can you give us a stack dump so we might have some kind of hint what's going on?
Comment 6 Jungyeon 2019-03-14 00:43:13 UTC
Sorry for my lack of explanation.
Yes, LKL is Linux Kernel Library. poc_01.c is a program that issues a list of system calls in userspace, and the crafted image is a potentially faulty image used to test error cases.

We are going to release our source code shortly so that you can build ext4-combined. We need to clean up the code before making it public.

I'm attaching the stack dump at the end.
The problem is that jh, obtained from bh2jh(bh), is NULL at the start of this function, so J_ASSERT_JH(jh, jh->b_jcount >= 0) dereferences a NULL pointer.
To get the stack dump, I temporarily inserted a BUG_ON that fires when jh is NULL.

Additionally I used Linux version 5.0.0+ for this trace (and in the linked ext4-combined binary)

2534 static void __journal_remove_journal_head(struct buffer_head *bh)
2535 {
2536     struct journal_head *jh = bh2jh(bh);
2537 
2538     BUG_ON(jh == NULL);
2539     J_ASSERT_JH(jh, jh->b_jcount >= 0);
2540     J_ASSERT_JH(jh, jh->b_transaction == NULL);
2541     J_ASSERT_JH(jh, jh->b_next_transaction == NULL);
2542     J_ASSERT_JH(jh, jh->b_cp_transaction == NULL);
2543     J_ASSERT_JH(jh, jh->b_jlist == BJ_None);
2544     J_ASSERT_BH(bh, buffer_jbd(bh));
2545     J_ASSERT_BH(bh, jh2bh(jh) == bh);
2546     BUFFER_TRACE(bh, "remove journal_head");
2547     if (jh->b_frozen_data) {
2548         printk(KERN_WARNING "%s: freeing b_frozen_data\n", __func__);
2549         jbd2_free(jh->b_frozen_data, bh->b_size);
2550     }
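
To make the failure mode concrete, here is a minimal userspace sketch of the relationship described above. It assumes only that a journal_head hangs off the buffer_head's b_private pointer. The struct layouts are simplified stand-ins, and remove_journal_head_checked() is a hypothetical helper written for illustration; it is not the kernel's code and not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct journal_head {
    int b_jcount;            /* reference count checked by the first assertion */
};

struct buffer_head {
    void *b_private;         /* points at the journal_head when one is attached */
};

/* Same idea as the kernel helper: the journal_head is stored in b_private. */
static struct journal_head *bh2jh(struct buffer_head *bh)
{
    return bh->b_private;
}

/*
 * Hypothetical guarded variant of the removal path: report a missing
 * journal_head instead of dereferencing it.
 * Returns 0 on success, -1 if no usable journal_head is attached.
 */
static int remove_journal_head_checked(struct buffer_head *bh)
{
    struct journal_head *jh;

    assert(bh != NULL);      /* the caller must pass a real buffer_head */
    jh = bh2jh(bh);

    if (jh == NULL)
        return -1;           /* unguarded code would fault on jh->b_jcount */
    if (jh->b_jcount < 0)
        return -1;

    bh->b_private = NULL;    /* detach the journal_head from the buffer */
    return 0;
}
```

Calling remove_journal_head_checked() on a buffer_head whose b_private is NULL returns -1. In the unguarded function listed above, the same situation reaches jh->b_jcount with jh == NULL, which matches the segmentation fault seen without the temporary BUG_ON.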


- Stack dump
[    0.089081] BUG: failure at fs/jbd2/journal.c:2538/__journal_remove_journal_head()!
[    0.089096] Kernel panic - not syncing: BUG!
[    0.089101] Call Trace:
[    0.089110] (____ptrval____):  [<55555559bc94>] .LC81+0x5f/0xfb
[    0.089118] (____ptrval____):  [<5555555c6025>] major_names+0x75/0x80
[    0.089125] (____ptrval____):  [<5555555978f4>] .LC11+0x14/0x20
[    0.089133] (____ptrval____):  [<5555556b1e40>] submit_bh+0x40/0x50
[    0.089141] (____ptrval____):  [<55555580286d>] jbd2_journal_put_journal_head+0x6cd/0x6d0
[    0.089147] (____ptrval____):  [<5555557ec6e8>] __jbd2_journal_refile_buffer+0x2d8/0x3c0
[    0.089153] (____ptrval____):  [<5555557f641a>] __jbd2_journal_remove_checkpoint+0x17a/0x2f0
[    0.089164] (____ptrval____):  [<5555557eff12>] jbd2_journal_commit_transaction+0x2fc2/0x3fc0
[    0.089173] (____ptrval____):  [<555555597353>] .LC18+0x3/0x10
[    0.089181] (____ptrval____):  [<5555555b8fb9>] try_to_wake_up+0x169/0x190
[    0.089190] (____ptrval____):  [<5555558031be>] kjournald2+0x34e/0x400
[    0.089199] (____ptrval____):  [<5555555bfd30>] autoremove_wake_function+0x0/0x40
[    0.089206] (____ptrval____):  [<5555555978f4>] .LC11+0x14/0x20
[    0.089214] (____ptrval____):  [<5555555b3acb>] kthread+0x15b/0x170
[    0.089221] (____ptrval____):  [<555555802e70>] kjournald2+0x0/0x400
[    0.089228] (____ptrval____):  [<5555555b3970>] kthread+0x0/0x170
[    0.089237] (____ptrval____):  [<5555555970ab>] uidhash_table+0x3b/0x40

Thanks.
Comment 7 Jungyeon 2019-03-14 18:15:08 UTC
Created attachment 281825
another test set

I'm attaching another error case that shows the same failure.
It uses far fewer system calls (15), so I hope it helps to narrow down this bug.

- Reproduce
./lkl/tools/lkl/ext4-combined -t ext4 -i tmp.img -p min_11.c.raw -v
(min_11.c shows the system calls that min_11.c.raw runs)

- Call stack
[    0.040743] BUG: failure at fs/jbd2/journal.c:2538/__journal_remove_journal_head()!
[    0.040754] Kernel panic - not syncing: BUG!
[    0.040758] Call Trace:
[    0.040767] (____ptrval____):  [<55555559bc94>] .LC81+0x5f/0xfb
[    0.040775] (____ptrval____):  [<5555555c6025>] major_names+0x75/0x80
[    0.040782] (____ptrval____):  [<5555555978f4>] .LC11+0x14/0x20
[    0.040791] (____ptrval____):  [<555555604368>] kmem_cache_free+0x148/0x190
[    0.040796] (____ptrval____):  [<5555555978f4>] .LC11+0x14/0x20
[    0.040804] (____ptrval____):  [<55555580286d>] jbd2_journal_put_journal_head+0x6cd/0x6d0
[    0.040811] (____ptrval____):  [<5555557f641a>] __jbd2_journal_remove_checkpoint+0x17a/0x2f0
[    0.040822] (____ptrval____):  [<5555557f5608>] jbd2_log_do_checkpoint+0x298/0xd10
[    0.040835] (____ptrval____):  [<555555850674>] atomic64_cmpxchg+0x54/0x80
[    0.040843] (____ptrval____):  [<5555557feda3>] jbd2_journal_destroy+0x363/0x840
[    0.040856] (____ptrval____):  [<5555555bfd30>] autoremove_wake_function+0x0/0x40
[    0.040865] (____ptrval____):  [<5555555ada2c>] input_timer_state+0x1c/0x20
[    0.040873] (____ptrval____):  [<5555557cb8ac>] ext4_put_super+0xac/0x7f0
[    0.040881] (____ptrval____):  [<555555616f5b>] generic_shutdown_super+0x13b/0x370
[    0.040889] (____ptrval____):  [<55555561acc5>] kill_block_super+0x55/0x100
[    0.040897] (____ptrval____):  [<555555616abc>] deactivate_locked_super+0x11c/0x170
[    0.040903] (____ptrval____):  [<555555616cb6>] deactivate_super+0x1a6/0x1b0
[    0.040911] (____ptrval____):  [<5555556538fb>] dput+0xcb/0x7c0
[    0.040919] (____ptrval____):  [<55555567d1a9>] cleanup_mnt+0xb9/0x170
[    0.040929] (____ptrval____):  [<55555567d0ed>] __cleanup_mnt+0x3d/0x40
[    0.040935] (____ptrval____):  [<5555555b24ca>] task_work_run+0xba/0xf0
[    0.040944] (____ptrval____):  [<55555559800f>] .LC2+0x3f/0x40
[    0.040951] (____ptrval____):  [<5555555978f4>] .LC11+0x14/0x20
[    0.040958] (____ptrval____):  [<5555555986d6>] .LC19+0x6/0x15
[    0.040966] 
[    0.040972] ---[ end Kernel panic - not syncing: BUG! ]---
ext4-combined: lib/posix-host.c:302: panic: Assertion `0' failed.
Comment 8 Theodore Tso 2019-03-25 03:40:25 UTC
Jungyeon,

One thing that would be helpful when creating a minimal reproducer is to fix some of the gratuitous corruptions in the file system image, so we can be 100% sure which file system corruption is combining with your test syscall load to trigger the failure.

For example, both of these superblock corruptions cause e2fsck to stop dead in its tracks, because it views the superblock as too compromised for automated assumptions to be safe. They are probably things we can clear and still have ext4-combined dump core:

% e2fsck -fy /tmp/tmp.img 
e2fsck 1.45.0 (6-Mar-2019)
Found invalid V2 journal superblock fields (from V1 journal).
Clearing fields beyond the V1 journal superblock...

Corruption found in superblock.  (desc_size = 33667).

E2fsck fixed up the first problem automatically, and the second I could fix up using debugfs: debugfs -w -R "ssv s_desc_size 64" /tmp/tmp.img
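
As a rough illustration of why desc_size = 33667 looks corrupt, the sketch below checks a descriptor size against the kind of sanity rule a checker might apply. The bounds (64 to 1024 bytes, power of two) are my assumption about how ext4 treats the field when the 64bit feature is enabled. The struct and function names are hypothetical, and this is not e2fsck's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_DESC_SIZE_64   64u    /* assumed lower bound with the 64bit feature */
#define MAX_DESC_SIZE    1024u    /* assumed upper bound */

/* Hypothetical cut-down superblock holding only the field of interest. */
struct sb_model {
    uint16_t desc_size;           /* group descriptor size in bytes */
};

static bool is_power_of_2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Returns true if the descriptor size passes the assumed sanity rule. */
static bool desc_size_plausible(const struct sb_model *sb)
{
    assert(sb != NULL);
    return sb->desc_size >= MIN_DESC_SIZE_64 &&
           sb->desc_size <= MAX_DESC_SIZE &&
           is_power_of_2(sb->desc_size);
}
```

Under these assumptions the value 64 written by the debugfs command passes, while 33667 fails both the upper bound and the power-of-two test.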

I suspect that the root cause is that the block allocation bitmap marks a block which is also used by the journal as free.   If that block gets reallocated so that a directory block (which, being metadata, is accessed via a buffer head) overlaps with the journal block, I can imagine all sorts of hilarity ensuing.

I will attach a proposed patch which should detect this case, and block the reuse of a block belonging to the journal.   Can you try applying this patch to your LKL ext4-combined program, and see if it traps the file system corruption early enough that the core dump doesn't get triggered?
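
The idea of refusing to hand out a block that belongs to the journal can be sketched in userspace as follows. This is only a toy model under stated assumptions: the journal occupies one contiguous extent, the bitmap uses one bit per block with a set bit meaning "in use", and all names (journal_extent, alloc_block_checked, and so on) are hypothetical. It is not the attached patch and does not reflect how ext4's allocator is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the blocks occupied by the journal. */
struct journal_extent {
    uint64_t start;          /* first block of the journal */
    uint64_t len;            /* number of blocks */
};

static bool block_in_journal(const struct journal_extent *j, uint64_t blk)
{
    return blk >= j->start && blk - j->start < j->len;
}

/* One bit per block; a set bit means the block is in use. */
static bool bitmap_test(const uint8_t *bitmap, uint64_t blk)
{
    return (bitmap[blk / 8] >> (blk % 8)) & 1u;
}

/*
 * Returns the first block at or after 'goal' that the bitmap marks free and
 * that is not part of the journal, or UINT64_MAX if there is none.
 * *corrupt is set when the bitmap claims a journal block is free.
 */
static uint64_t alloc_block_checked(const uint8_t *bitmap, uint64_t nblocks,
                                    const struct journal_extent *j,
                                    uint64_t goal, bool *corrupt)
{
    assert(bitmap != NULL && j != NULL && corrupt != NULL);
    *corrupt = false;

    for (uint64_t blk = goal; blk < nblocks; blk++) {
        if (bitmap_test(bitmap, blk))
            continue;               /* already allocated */
        if (block_in_journal(j, blk)) {
            *corrupt = true;        /* bitmap and journal disagree */
            continue;               /* never reuse a journal block */
        }
        return blk;
    }
    return UINT64_MAX;
}
```

With a 16-block bitmap where blocks 0-3 are in use and a journal at blocks 4-7 that the bitmap wrongly shows as free, the function skips the journal blocks, flags the inconsistency, and returns block 8. Without such a check, block 4 would be handed out and a new metadata block would overlap the journal.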
Comment 9 Theodore Tso 2019-03-25 03:41:27 UTC
Created attachment 281995
possible patch to detect the problem early
Comment 10 Jungyeon 2019-04-05 20:12:00 UTC
Thanks a lot for the patch. As you stated, it also works for the #202877 reported bug.
