Bug 216283

Summary: FUZZ: BUG() triggered in fs/ext4/extents.c:ext4_ext_insert_extent() when mounting and operating on a crafted image
Product: File System
Reporter: Wenqing Liu (wenqingliu0120)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED CODE_FIX
Severity: normal
CC: joni_lee, luis.henriques, tytso, wenqingliu0120
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.15-5.19-rc8
Subsystem:
Regression: No
Bisected commit-id:
Attachments: poc and .config

Description Wenqing Liu 2022-07-26 19:35:26 UTC
Created attachment 301487
poc and .config

- Overview 
FUZZ: BUG() triggered in fs/ext4/extents.c:ext4_ext_insert_extent() when mounting and operating on a crafted image

- Reproduce 
Tested on kernel 5.15.57, 5.19-rc8

# mkdir test_crash
# cd test_crash 
# unzip tmp15.zip
# mkdir mnt
# ./single_test.sh ext4 15

- Kernel dump
[ 1524.446966] loop5: detected capacity change from 0 to 32768
[ 1524.536425] EXT4-fs (loop5): recovery complete
[ 1524.542174] EXT4-fs (loop5): mounted filesystem with ordered data mode. Quota mode: none.
[ 1524.542850] ext4 filesystem being mounted at /home/wq/test_crashes/mnt supports timestamps until 2038 (0x7fffffff)
[ 1524.849072] ------------[ cut here ]------------
[ 1524.849076] kernel BUG at fs/ext4/extents.c:1006!
[ 1524.849141] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 1524.849167] CPU: 0 PID: 1186 Comm: tmp15 Not tainted 5.19.0-rc8 #1
[ 1524.849193] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 1524.849228] RIP: 0010:ext4_ext_insert_extent+0x3b8e/0x43d0
[ 1524.849259] Code: df ba f7 03 00 00 48 c7 c6 e0 9b f8 8d 41 be 8b ff ff ff e8 e4 20 12 00 e9 da f5 ff ff 4c 89 ff e8 a7 91 cf ff e9 a0 f4 ff ff <0f> 0b e8 1b 91 cf ff e9 49 f4 ff ff 44 0f b7 fa 50 49 c7 c1 e0 8f
[ 1524.849330] RSP: 0018:ffff88812689f5f8 EFLAGS: 00010286
[ 1524.849353] RAX: 00000000ffffffff RBX: ffff888105227aa8 RCX: 0000000000000000
[ 1524.849381] RDX: ffffffffffffffff RSI: 0000000000017ef8 RDI: ffff888173197004
[ 1524.849410] RBP: ffff888105227986 R08: 0000000000000001 R09: ffffed1020af02a9
[ 1524.849438] R10: ffff888105781547 R11: ffffed1020af02a8 R12: 0000000000001013
[ 1524.849466] R13: ffff888103723c58 R14: ffff888173197000 R15: ffff888173197018
[ 1524.849494] FS:  00007f653cb8b540(0000) GS:ffff888293600000(0000) knlGS:0000000000000000
[ 1524.849526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1524.849549] CR2: 000055c5c050c008 CR3: 000000011caae004 CR4: 0000000000370ef0
[ 1524.849579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1524.849610] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1524.849638] Call Trace:
[ 1524.849650]  <TASK>
[ 1524.849661]  ? ext4_discard_preallocations+0xd70/0xd70
[ 1524.849686]  ? ext4_ext_shift_extents+0xc50/0xc50
[ 1524.849707]  ? ext4_ext_search_right+0x822/0xc20
[ 1524.849728]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 1524.849750]  ext4_ext_map_blocks+0xc86/0x3710
[ 1524.849771]  ? ext4_ext_release+0x10/0x10
[ 1524.849789]  ? do_writepages+0x170/0x590
[ 1524.849819]  ? __filemap_fdatawrite_range+0xa7/0xe0
[ 1524.849859]  ? ext4_sync_file+0x18a/0x9b0
[ 1524.849895]  ? do_fsync+0x38/0x70
[ 1524.849928]  ? __x64_sys_fdatasync+0x32/0x50
[ 1524.849965]  ? mpage_process_page_bufs+0xe8/0x5b0
[ 1524.850005]  ? __pagevec_release+0x7f/0x110
[ 1524.850042]  ? down_write_killable+0x130/0x130
[ 1524.850080]  ? ext4_es_lookup_extent+0x3ae/0x960
[ 1524.850104]  ext4_map_blocks+0x600/0x1460
[ 1524.850123]  ? ext4_issue_zeroout+0x190/0x190
[ 1524.850142]  ? __kasan_slab_alloc+0x90/0xc0
[ 1524.850163]  ext4_writepages+0xffa/0x25e0
[ 1524.850182]  ? __ext4_mark_inode_dirty+0x5f0/0x5f0
[ 1524.850204]  ? __stack_depot_save+0x34/0x540
[ 1524.850223]  ? _raw_spin_lock+0x87/0xda
[ 1524.850245]  ? _raw_spin_lock_irqsave+0xf0/0xf0
[ 1524.850283]  ? kmem_cache_free+0xd3/0x3b0
[ 1524.850320]  ? do_syscall_64+0x38/0x90
[ 1524.851019]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1524.851716]  do_writepages+0x170/0x590
[ 1524.852415]  ? page_writeback_cpu_online+0x20/0x20
[ 1524.853162]  ? avc_has_extended_perms+0xe70/0xe70
[ 1524.853848]  ? may_linkat+0x310/0x310
[ 1524.854524]  ? _raw_spin_lock+0x87/0xda
[ 1524.855182]  ? _raw_spin_lock_irqsave+0xf0/0xf0
[ 1524.855824]  ? wbc_attach_and_unlock_inode+0x21/0x590
[ 1524.856449]  filemap_fdatawrite_wbc+0x11d/0x190
[ 1524.857095]  __filemap_fdatawrite_range+0xa7/0xe0
[ 1524.857686]  ? delete_from_page_cache_batch+0x950/0x950
[ 1524.858274]  ? do_faccessat+0x1d2/0x630
[ 1524.858855]  ? kmem_cache_free+0xd3/0x3b0
[ 1524.859434]  file_write_and_wait_range+0x92/0x100
[ 1524.860013]  ext4_sync_file+0x18a/0x9b0
[ 1524.860587]  do_fsync+0x38/0x70
[ 1524.861193]  __x64_sys_fdatasync+0x32/0x50
[ 1524.861744]  do_syscall_64+0x38/0x90
[ 1524.862284]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1524.862832] RIP: 0033:0x7f653cab073d
[ 1524.863388] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 37 0d 00 f7 d8 64 89 01 48
[ 1524.864581] RSP: 002b:00007ffe69164218 EFLAGS: 00000217 ORIG_RAX: 000000000000004b
[ 1524.865245] RAX: ffffffffffffffda RBX: 000055c5c050b720 RCX: 00007f653cab073d
[ 1524.865868] RDX: 00007f653cab073d RSI: ffffffffffffff80 RDI: 0000000000000004
[ 1524.866499] RBP: 00007ffe69168b80 R08: 00007ffe69168c78 R09: 00007ffe69168c78
[ 1524.867134] R10: 00007ffe69168c78 R11: 0000000000000217 R12: 000055c5c050b0a0
[ 1524.867770] R13: 00007ffe69168c70 R14: 0000000000000000 R15: 0000000000000000
[ 1524.868404]  </TASK>
[ 1524.869048] Modules linked in: input_leds joydev serio_raw qemu_fw_cfg xfs autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear qxl drm_ttm_helper ttm drm_kms_helper hid_generic usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops hid drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel psmouse crypto_simd cryptd
[ 1524.871971] ---[ end trace 0000000000000000 ]---
[ 1524.872717] RIP: 0010:ext4_ext_insert_extent+0x3b8e/0x43d0
[ 1524.873715] Code: df ba f7 03 00 00 48 c7 c6 e0 9b f8 8d 41 be 8b ff ff ff e8 e4 20 12 00 e9 da f5 ff ff 4c 89 ff e8 a7 91 cf ff e9 a0 f4 ff ff <0f> 0b e8 1b 91 cf ff e9 49 f4 ff ff 44 0f b7 fa 50 49 c7 c1 e0 8f
[ 1524.875375] RSP: 0018:ffff88812689f5f8 EFLAGS: 00010286
[ 1524.876222] RAX: 00000000ffffffff RBX: ffff888105227aa8 RCX: 0000000000000000
[ 1524.877110] RDX: ffffffffffffffff RSI: 0000000000017ef8 RDI: ffff888173197004
[ 1524.877964] RBP: ffff888105227986 R08: 0000000000000001 R09: ffffed1020af02a9
[ 1524.878829] R10: ffff888105781547 R11: ffffed1020af02a8 R12: 0000000000001013
[ 1524.879684] R13: ffff888103723c58 R14: ffff888173197000 R15: ffff888173197018
[ 1524.880561] FS:  00007f653cb8b540(0000) GS:ffff888293600000(0000) knlGS:0000000000000000
[ 1524.881491] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1524.882372] CR2: 000055c5c050c008 CR3: 000000011caae004 CR4: 0000000000370ef0
[ 1524.883270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1524.884182] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1524.885084] ------------[ cut here ]------------
[ 1524.885972] WARNING: CPU: 0 PID: 1186 at kernel/exit.c:741 do_exit+0x1798/0x2740
[ 1524.886885] Modules linked in: input_leds joydev serio_raw qemu_fw_cfg xfs autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear qxl drm_ttm_helper ttm drm_kms_helper hid_generic usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops hid drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel psmouse crypto_simd cryptd
[ 1524.890831] CPU: 0 PID: 1186 Comm: tmp15 Tainted: G      D           5.19.0-rc8 #1
[ 1524.891858] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 1524.892918] RIP: 0010:do_exit+0x1798/0x2740
[ 1524.893959] Code: c0 74 08 3c 03 0f 8e cc 0c 00 00 8b 83 48 13 00 00 65 01 05 7a 59 aa 74 e9 92 fc ff ff 48 89 df e8 fd 77 28 00 e9 ec ee ff ff <0f> 0b e9 fa e8 ff ff 4c 89 e6 bf 05 06 00 00 e8 d4 72 02 00 e9 b2
[ 1524.896142] RSP: 0018:ffff88812689fe48 EFLAGS: 00010286
[ 1524.897245] RAX: 1ffff11024d12815 RBX: ffff888126893400 RCX: 0000000000000000
[ 1524.898357] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881268940a8
[ 1524.899473] RBP: ffff888126893400 R08: 0000000000000041 R09: ffffed1024d13000
[ 1524.900604] R10: ffff88829362848b R11: ffffed10526c5091 R12: 000000000000000b
[ 1524.901737] R13: ffffffff8de22ac0 R14: ffff888126893400 R15: 0000000000000000
[ 1524.902863] FS:  00007f653cb8b540(0000) GS:ffff888293600000(0000) knlGS:0000000000000000
[ 1524.904014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1524.905156] CR2: 000055c5c050c008 CR3: 000000011caae004 CR4: 0000000000370ef0
[ 1524.906287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1524.907395] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1524.908495] Call Trace:
[ 1524.909570]  <TASK>
[ 1524.910628]  ? file_write_and_wait_range+0x92/0x100
[ 1524.911697]  ? mm_update_next_owner+0x6e0/0x6e0
[ 1524.912777]  ? ext4_sync_file+0x18a/0x9b0
[ 1524.913849]  make_task_dead+0xb0/0xc0
[ 1524.914904]  rewind_stack_and_make_dead+0x17/0x17
[ 1524.915952] RIP: 0033:0x7f653cab073d
[ 1524.916963] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 37 0d 00 f7 d8 64 89 01 48
[ 1524.918998] RSP: 002b:00007ffe69164218 EFLAGS: 00000217 ORIG_RAX: 000000000000004b
[ 1524.920032] RAX: ffffffffffffffda RBX: 000055c5c050b720 RCX: 00007f653cab073d
[ 1524.921062] RDX: 00007f653cab073d RSI: ffffffffffffff80 RDI: 0000000000000004
[ 1524.922070] RBP: 00007ffe69168b80 R08: 00007ffe69168c78 R09: 00007ffe69168c78
[ 1524.923061] R10: 00007ffe69168c78 R11: 0000000000000217 R12: 000055c5c050b0a0
[ 1524.924051] R13: 00007ffe69168c70 R14: 0000000000000000 R15: 0000000000000000
[ 1524.925023]  </TASK>
[ 1524.925975] ---[ end trace 0000000000000000 ]---
Comment 1 Darrick J. Wong 2022-07-26 20:10:26 UTC
If you are going to run some scripted tool to randomly
corrupt the filesystem to find failures, then you have an
ethical and moral responsibility to do some of the work to
narrow down and identify the cause of the failure, not just
throw them at someone to do all the work.

--D
Comment 2 Lukas Czerner 2022-07-27 11:53:15 UTC
On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote:
> If you are going to run some scripted tool to randomly
> corrupt the filesystem to find failures, then you have an
> ethical and moral responsibility to do some of the work to
> narrow down and identify the cause of the failure, not just
> throw them at someone to do all the work.
> 
> --D

While I understand the frustration with the fuzzer bug reports like this
I very much disagree with your statement about ethical and moral
responsibility.

The bug is in the code, it would have been there even if Wenqing Liu
didn't run the tool. We know there are bugs in the code we just don't
know where all of them are. Now, thanks to this report, we know a little
bit more about at least one of them. That's at least a little useful.
But you seem to argue that the reporter should put more work in, or not
bother at all.

That's wrong. Really, Wenqing Liu has no more ethical and moral
responsibility than you finding and fixing the problem regardless of the
bug report.

I think the frustration comes from the fact that it's potentially a lot
of work to untangle and fix the real problem and now when it is out
there we feel obligated to fix it. And while bug reports and tools
generating these can always be better and reporters can always be a bit
more active in narrowing the problem down, you're of course free to
ignore this until you, or anyone else, has a bit of spare time and
energy to investigate.

-Lukas
Comment 3 Dave Chinner 2022-07-27 23:22:31 UTC
On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote:
> On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote:
> > If you are going to run some scripted tool to randomly
> > corrupt the filesystem to find failures, then you have an
> > ethical and moral responsibility to do some of the work to
> > narrow down and identify the cause of the failure, not just
> > throw them at someone to do all the work.
> > 
> > --D
> 
> While I understand the frustration with the fuzzer bug reports like this
> I very much disagree with your statement about ethical and moral
> responsibility.
> 
> The bug is in the code, it would have been there even if Wenqing Liu
> didn't run the tool.

Yes, but it's not just a bug. It's a format parser exploit.

> We know there are bugs in the code we just don't
> know where all of them are. Now, thanks to this report, we know a little
> bit more about at least one of them. That's at least a little useful.
> But you seem to argue that the reporter should put more work in, or not
> bother at all.
> 
> That's wrong. Really, Wenqing Liu has no more ethical and moral
> responsibility than you finding and fixing the problem regardless of the
> bug report.

By this reasoning, the researchers that discovered RetBleed
should have just published their findings without notifying any of the
affected parties.

i.e. your argument implies they have no responsibility and hence are
entitled to say "We aren't responsible for helping anyone understand
the problem or mitigating the impact of the flaw - we've got our
publicity and secured tenure with discovery and publication!"

That's not _responsible disclosure_.

Yup, this is important enough that we actually have a name for it:
responsible disclosure.

And where do those responsibilities come from? You guessed it -
they are based on the ethics and morals that guide us towards doing
what is best for the wider community.

> I think the frustration comes from the fact that it's potentially a lot
> of work to untangle and fix the real problem and now when it is out
> there we feel obligated to fix it. And while bug reports and tools
> generating these can always be better and reporters can always be a bit
> more active in narrowing the problem down, you're of course free to
> ignore this until you, or anyone else, has a bit of spare time and
> energy to investigate.

It has nothing to do with the amount of work, nor does it change the
fact that us developers will need to do most of the work. The
problem here is the lack of responsible disclosure that we see
repeatedly with filesystem flaws found by fuzzing the on-disk
format.

Public reports like this require immediate work to determine the
scope, impact and risk of the problem to decide what needs to be
done next.  All public disclosure does is start a race and force
developers to have to address it immediately.

Responsible disclosure gives developers a short window in which they
can perform that analysis without fear that somebody might already
be actively exploiting the problem discovered by the fuzzer. We can
address the problem without extreme urgency, knowing a day or two
while we wait for private discussion and bug fixing to take place
isn't going to make things worse.

That's the issue with drive-by fuzzer bug reporting - the people
that do this have little clue about the potential impact of the
flaws they are discovering. Those people need to be taught that
their responsibility is not to throw issues over the wall at other
people, but to work closely with the people that can fix the issues
to have a fix for the problem ready at the same time the issue is
disclosed.

IOWs, they have an ethical and moral responsibility to the wider
community to disclose these issues to relevant developers in a
responsible manner and work with them to fix the problems before the
issues are made public.

Once you look at filesystem fuzzing bugs from a security and exploit
perspective, _everything changes_.

Cheers,

Dave.
Comment 4 Theodore Tso 2022-07-28 02:47:06 UTC
On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote:
> On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote:
> > While I understand the frustration with the fuzzer bug reports like this
> > I very much disagree with your statement about ethical and moral
> > responsibility.
> > 
> > The bug is in the code, it would have been there even if Wenqing Liu
> > didn't run the tool.
> 
> i.e. your argument implies they have no responsibility and hence are
> entitled to say "We aren't responsible for helping anyone understand
> the problem or mitigating the impact of the flaw - we've got our
> publicity and secured tenure with discovery and publication!"
> 
> That's not _responsible disclosure_.

So I'm going to disagree here.  I understand that this is the XFS
position, and so a few years back, the Georgia Tech folks who were
responsible for Janus and Hydra decided not to engage with the XFS
community and stopped reporting XFS bugs.  They continued to engage
with the ext4 community, and I found their reports to be helpful.  We
found and fixed quite a few bugs as a result of their work, and I
sponsored them to get some research funding from Google so they could
do more file system fuzzing work, because I thought their work was a
useful contribution.

I don't particularly worry about "responsible disclosure" because I
don't consider fuzzed file system crashes to be a particularly serious
security concern.  There are some crazy container folks who think
containers are just as secure(tm) as VM's, and who advocate allowing
untrusted containers to mount arbitrary file system images and expect
that this not cause the "host" OS to crash or get compromised.  Those
people are insane(tm), and I don't particularly worry about their use
cases.

If you have a Linux laptop with an automounter enabled it's possible
that when you plug in a USB stick containing a corrupted file system,
it could cause the system to crash.  But that requires physical access
to the machine, and if you have physical access, there is no shortage
of problems you could cause in any case.


> Public reports like this require immediate work to determine the
> scope, impact and risk of the problem to decide what needs to be
> done next.  All public disclosure does is start a race and force
> developers to have to address it immediately.

Nope.  I'll address these when I have time, and I don't consider them
to be particularly urgent, for the reasons described above.

I actually consider this fuzzer bug report to be particularly
well-formed.  Unlike Syzkaller, the file system image was in a
separate file, and wasn't embedded in the reproducer.c file in a way
that made it super-inconvenient to extract.  Furthermore, like the
Georgia Tech fuzzing reports, I appreciate that it was filed in
Bugzilla, since it won't easily get lost, with all of the information
that we need.

In any case, I've taken a closer look at this report, and it's
actually quite the interesting problem.  The issue is that we have a
non-leaf node in the extent tree where the eh_entries header field is
zero.  This should never happen:

debugfs: extents <16>
Level Entries       Logical      Physical Length Flags
 0/ 2   1/  1     0 - 98030  9284          98031
 1/ 2   1/  0     0 - 98030  9282          98031 <======
          ^^^
 2/ 2   1/ 84     0 -     0  9730 -  9730      1 
 2/ 2   2/ 84     5 -     7  9739 -  9741      3 
 2/ 2   3/ 84    16 -    17  9750 -  9751      2 
 2/ 2   4/ 84    26 -    26  9768 -  9768      1 
 2/ 2   5/ 84    36 -    36  9787 -  9787      1 

This causes len to go negative in ext4_ext_insert_extent:

[   26.419401] ino 16 len -1 logical 98040 eh_entries 0 eh_max 84 depth 1

... which is what triggers the BUG_ON(len < 0).
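
The arithmetic behind the negative length can be illustrated with a
minimal user-space sketch. This is not the kernel code: the function
names are hypothetical, slots are modelled as plain integers counted
from 0, and the insert position of 1 (the slot after the one the lookup
landed on) is an assumption chosen to match the logged "len -1" with
"eh_entries 0".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified user-space model, not kernel code. Entry slots in a node
 * are numbered from 0, so the last valid slot is entries - 1. For a
 * node that claims zero entries this yields -1.
 */
int last_index_slot(int entries)
{
	return entries - 1;
}

/*
 * Number of existing entries at or after insert_pos that would have to
 * be moved up by one slot to make room for a new entry.
 */
int entries_to_shift(int entries, int insert_pos)
{
	assert(insert_pos >= 0);
	return last_index_slot(entries) - insert_pos + 1;
}

/* True when the computed length would trip a BUG_ON(len < 0) style check. */
bool would_trigger_bug(int entries, int insert_pos)
{
	return entries_to_shift(entries, insert_pos) < 0;
}
```

With zero entries and an insert position of 1, entries_to_shift()
returns -1. With any node that really holds at least one entry, the
result for the same insert position is zero or positive.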

What makes this particularly interesting is that neither the kernel
*nor* e2fsck is flagging this extent tree as corrupt.  So this is an
opportunity to improve both the kernel as well as fsck.ext4.
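
As an illustration of the kind of sanity check this suggests, here is a
hedged user-space sketch. The struct and function names are
hypothetical, the fields are host-endian stand-ins for the on-disk
eh_entries, eh_max and eh_depth, and the code is not taken from any
actual kernel or e2fsprogs patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, host-endian stand-in for the on-disk extent header. */
struct extent_header_model {
	uint16_t entries; /* number of valid entries (eh_entries) */
	uint16_t max;     /* capacity of the node (eh_max) */
	uint16_t depth;   /* 0 for a leaf, > 0 for an index node (eh_depth) */
};

/*
 * Returns 0 when the header looks sane and -1 when the node should be
 * treated as corrupt.
 */
int check_header_model(const struct extent_header_model *hdr)
{
	assert(hdr != NULL);

	if (hdr->max == 0)
		return -1;
	if (hdr->entries > hdr->max)
		return -1;
	/* An index node with no entries points at nothing below it. */
	if (hdr->depth > 0 && hdr->entries == 0)
		return -1;
	return 0;
}
```

The values from the log line above (eh_entries 0, eh_max 84, depth 1)
are rejected by this check, while an empty leaf (depth 0) still passes.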

Again, it's not an *urgent* issue, but it is something that is worth
trying to improve in ext4 from the perspective of improving the
quality of our implementation.  And since it's not an urgent issue,
concerns of "responsible disclosure" don't arise, at least not in my
opinion.

					- Ted
Comment 5 Lukas Czerner 2022-07-28 07:25:21 UTC
On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote:
> On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote:
> > On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote:
> > > If you are going to run some scripted tool to randomly
> > > corrupt the filesystem to find failures, then you have an
> > > ethical and moral responsibility to do some of the work to
> > > narrow down and identify the cause of the failure, not just
> > > throw them at someone to do all the work.
> > > 
> > > --D
> > 
> > While I understand the frustration with the fuzzer bug reports like this
> > I very much disagree with your statement about ethical and moral
> > responsibility.
> > 
> > The bug is in the code, it would have been there even if Wenqing Liu
> > didn't run the tool.
> 
> Yes, but it's not just a bug. It's a format parser exploit.

And what do you think this is exploiting? A bug in a "format parser"
perhaps?

Are you trying to both downplay it to not-a-bug and elevate it to 'security
vulnerability' at the same time ? ;)

> 
> > We know there are bugs in the code we just don't
> > know where all of them are. Now, thanks to this report, we know a little
> > bit more about at least one of them. That's at least a little useful.
> > But you seem to argue that the reporter should put more work in, or not
> > bother at all.
> > 
> > That's wrong. Really, Wenqing Liu has no more ethical and moral
> > responsibility than you finding and fixing the problem regardless of the
> > bug report.
> 
> By this reasoning, the researchers that discovered RetBleed
> should have just published their findings without notify any of the
> affected parties.
> 
> i.e. your argument implies they have no responsibility and hence are
> entitled to say "We aren't responsible for helping anyone understand
> the problem or mitigating the impact of the flaw - we've got our
> publicity and secured tenure with discovery and publication!"
> 
> That's not _responsible disclosure_.

Look, your entire argument hinges on the assumption that this is a
security vulnerability that could be exploited and the report makes the
situation worse. And that's very much debatable. I don't think it is and
Ted described it very well in his comment.

Asking for more information, or even asking the reporter to try to narrow
down the problem is of course fine. But making sweeping claims about
moral and ethical responsibilities is always a little suspicious and
completely bogus in this case IMO.

-Lukas
Comment 6 Dave Chinner 2022-08-01 22:45:57 UTC
On Thu, Jul 28, 2022 at 09:25:10AM +0200, Lukas Czerner wrote:
> On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote:
> > On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote:
> > > On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote:
> > > > If you are going to run some scripted tool to randomly
> > > > corrupt the filesystem to find failures, then you have an
> > > > ethical and moral responsibility to do some of the work to
> > > > narrow down and identify the cause of the failure, not just
> > > > throw them at someone to do all the work.
> > > > 
> > > > --D
> > > 
> > > While I understand the frustration with the fuzzer bug reports like this
> > > I very much disagree with your statement about ethical and moral
> > > responsibility.
> > > 
> > > The bug is in the code, it would have been there even if Wenqing Liu
> > > didn't run the tool.
> > 
> > Yes, but it's not just a bug. It's a format parser exploit.
> 
> And what do you think this is exploiting? A bug in a "format parser"
> perhaps?
> 
> Are you trying both downplay it to not-a-bug and elevate it to 'security
> vulnerability' at the same time ? ;)

How did you come to that conclusion?

"not just a bug" != "not a bug".

i.e. I said the complete opposite of what your comment implies I
said...

> > > We know there are bugs in the code we just don't
> > > know where all of them are. Now, thanks to this report, we know a little
> > > bit more about at least one of them. That's at least a little useful.
> > > But you seem to argue that the reporter should put more work in, or not
> > > bother at all.
> > > 
> > > That's wrong. Really, Wenqing Liu has no more ethical and moral
> > > responsibility than you finding and fixing the problem regardless of the
> > > bug report.
> > 
> > By this reasoning, the researchers that discovered RetBleed
> > should have just published their findings without notify any of the
> > affected parties.
> > 
> > i.e. your argument implies they have no responsibility and hence are
> > entitled to say "We aren't responsible for helping anyone understand
> > the problem or mitigating the impact of the flaw - we've got our
> > publicity and secured tenure with discovery and publication!"
> > 
> > That's not _responsible disclosure_.
> 
> Look, your entire argument hinges on the assumption that this is a
> security vulnerability that could be exploited and the report makes the
> situation worse. And that's very much debatable. I don't think it is and
> Ted described it very well in his comment.

On systems that automount filesystems when you plug in a USB drive
(which most distros do out of the box) then a crash bug during mount
is, at minimum, an annoying DoS vector. And if it can result in a
buffer overflow, then....

> Asking for more information, or even asking reported to try to narrow
> down the problem is of course fine.

Sure, nobody is questioning how we triage these issues - the
question is over how they are reported and the forum under which the
initial triage takes place.

> But making sweeping claims about
> moral and ethical responsibilities is always a little suspicious and
> completely bogus in this case IMO.

Hand waving away the fact that fuzzer crash bugs won't be a security
issue without having done any investigation is pretty much the whole
problem here. This is not responsible behaviour.

Cheers,

Dave.
Comment 7 Theodore Tso 2022-08-02 01:06:54 UTC
On Tue, Aug 02, 2022 at 08:45:51AM +1000, Dave Chinner wrote:
> 
> On systems that automount filesytsems when you plug in a USB drive
> (which most distros do out of the box) then a crash bug during mount
> is, at minimum, an annoying DOS vector. And if it can result in a
> buffer overflow, then....

You need physical access to plug in a USB drive, and if you can do
that, there are numerous potential attack vectors.  eSATA,
Firewire, etc., give the potential hardware device direct access to
the PCI bus and the ability to issue arbitrary DMA requests.  Badly
implemented Thunderbolt devices can have the same vulnerability, and
badly implemented USB controllers have their own entertaining issues.
And if attackers have a bit more unguarded physical access time, there
is no shortage of "evil maid" attacks that can be carried out.

As far as I'm concerned a secure system has automounters disabled, and
competent distributions should disable the automounter when the laptop
is locked.  Enterprise-class server machines running enterprise
distros have no business having the automounter enabled at all, and
careful datacenter managers should fill in the USB ports with epoxy.
For more common sense tips, see:
https://www.youtube.com/watch?v=kd33UVZhnAA

Look, bad guys have the time and energy to run file system fuzzers
(many of which are open source and can be easily found on github).
I'm sure our good friends at the NSA, MSS, and KGB know all of this
already; and the NSO group is apparently happy to make them available
to anyone willing to pay, no matter what their human rights record
might be.

Security by obscurity never works, and as far as I am concerned, I am
grateful when academics run fuzzers and report bugs to us.  Especially
since attacks which require physical access or root privs are going to
have low CVE Security Scores *anyway*.

Cheers,

						- Ted
Comment 8 Dave Chinner 2022-08-02 03:26:05 UTC
On Wed, Jul 27, 2022 at 10:46:53PM -0400, Theodore Ts'o wrote:
> On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote:
> > On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote:
> > > While I understand the frustration with the fuzzer bug reports like this
> > > I very much disagree with your statement about ethical and moral
> > > responsibility.
> > > 
> > > The bug is in the code, it would have been there even if Wenqing Liu
> > > didn't run the tool.
> > 
> > i.e. your argument implies they have no responsibility and hence are
> > entitled to say "We aren't responsible for helping anyone understand
> > the problem or mitigating the impact of the flaw - we've got our
> > publicity and secured tenure with discovery and publication!"
> > 
> > That's not _responsible disclosure_.
> 
> So I'm going to disagree here.  I understand that this is the XFS
> position,

Nope, nothing to do with XFS here - addressing how filesystem
fuzzing is approached and reported is a much wider engineering
and security process problem.

> and so a few years back, the Georgia Tech folks who were
> responsible for Janus and Hydra decided not to engage with the XFS
> community and stopped reporting XFS bugs.

That is at odds with the fact they engaged us repeatedly over a
period of 6 months to report and fix all the bugs the Janus
framework found.

Indeed, the Acknowledgements from the Janus paper read:

"We thank the anonymous reviewers, and our shepherd,
Thorsten Holz, for their helpful feedback. We also thank all
the file system developers, including Theodore Ts’o, Darrick
J. Wong, Dave Chinner, Eric Sandeen, Chao Yu, Wenruo
Qu and Ernesto A. Fernández for handling our bug reports."

Yup, there we all are - ext4, XFS and btrfs all represented.

And, well, we didn't form the opinion that fuzzer bugs should be
disclosed responsibly until early 2021. The interactions with GATech
researchers running the Janus project were back in 2018 and we
addressed all their bug reports quickly and with a minimum of fuss.

It's somewhat disingenuous to claim that a policy that wasn't
formulated until 2021 had a fundamental influence on decisions
made in late 2018....

> They continued to engage
> with the ext4 community, and I found their reports to be helpful.  We
> found and fixed quite a few bugs as a result of their work,

Yup, same with XFS - we fixed them all pretty quickly, and even so
still had half a dozen CVEs raised against those XFS bugs
posthumously by the Linux security community. And I note that ext4
also had about a dozen CVEs raised against the bugs that Janus
found...

I'll also quote from the Hydra paper on their classification of the
bugs they were trying to uncover:

"Memory errors (ME). Memory errors are common in file
systems. Due to their high security impact, [...]"

The evidence at hand tells us that filesystem fuzzer bugs have
security implications. Hence we need to treat them accordingly.

> and I
> sponsored them to get some research funding from Google so they could
> do more file system fuzzing work, because I thought their work was a
> useful contribution.

I guess the funding you are talking about is for the Hydra paper
that GATech published later in 2019? The only upstream developer
mentioned in the acknowledgements is you, and I also note that
funding from Google is disclosed, too. True, they didn't engage with
upstream XFS at all during that work, or since, but I think there's
a completely different reason to what you are implying...

i.e., I don't think the "not engaging with upstream XFS" has
anything to do with reporting and fixing the bugs of the Janus era.
To quote the Hydra paper, from the "experimental setup" section:

"We also tested XFS, GFS2, HFS+, ReiserFS, and VFAT, but found only
memory-safety bugs."

Blink and you miss it, yet it's possibly the most important finding
in the paper: Hydra didn't find any crash inconsistencies, no
logic bugs, nor any POSIX spec violations in XFS.

IOWs, Hydra didn't find any of the problems the fuzzer was supposed
to find in the filesystems it was run on.  There was simply nothing
to report to upstream XFS, and nothing to write about in the paper.

It's hardly a compelling research paper that reports "new algorithm
found no new bugs at all". Yet that's what the result was with Hydra
on XFS.

Let's consider that finding in the wider context of academia
looking into new filesystem fuzzing techniques. If you have a
filesystem that is immune to fuzzing, then it doesn't really help
you prove that you've advanced the state of the fuzzing art, does
it?

Hence a filesystem that is becoming largely immune to randomised
fuzzing techniques then becomes the least appealing research target
for filesystem fuzzing. If a new fuzzer can't find bugs in a complex
filesystem that we all know is full of bugs, it doesn't make for
very compelling research, does it?

Indeed, the Hydra paper spends a lot of time at the start explaining
how fstests doesn't exercise filesystems using semantic fuzzer
techniques that can be used to discover format corruption bugs.
However, it ignores the fact that fstests contains extensive
directed structure corruption based fuzzing tests for XFS. This is
one of the reasons why Hydra didn't find any new format fuzzing bugs
- its semantic algorithms and crafted images didn't exercise
XFS in a way that wasn't already covered by fstests.

IOWs, if a new fuzzer isn't any better than what we already have in
fstests, then the new fuzzer research is going to come up a great
big donut on XFS, as we see with Hydra.

Hence, if we are seeing researchers barely mention XFS because their
new technique is not finding bugs in XFS, and we see them instead
focus on ext4, btrfs, and other filesystems that do actually crash
or have inconsistencies, what does that say about how XFS developers
have been going about fuzz testing and run-time on-disk format
validation? What does that say about ext4, f2fs, btrfs, etc? What
does that say about the researchers' selective presentation of the
results?

IOWs, the lack of upstream XFS community engagement from fuzzer
researchers has nothing to do with interactions with the XFS
community - it has everything to do with the fact we've raised the
bar higher than new fuzzer projects can reach in a short period of
time. If the research doesn't bear fruit on XFS, then the
researchers have no reason to engage the upstream community during
the course of their research.....

The bottom line is that we want filesystems to be immune to fuzzer
fault injection.  Hence if XFS is doing better at rejecting fuzzed
input than other Linux filesystems, then perhaps the XFS developers
are doing something right and perhaps there's something to the
approach they take and processes they have brought to filesystem
fuzzing.

The state of the art is not always defined by academic research
papers....

> I don't particularly worry about "responsible disclosure" because I
> don't consider fuzzed file system crashes to be a particularly serious
> security concern.  There are some crazy container folks who think
> containers are just as secure(tm) as VM's, and who advocate allowing
> untrusted containers to mount arbitrary file system images and expect
> that this not cause the "host" OS to crash or get compromised.  Those
> people are insane(tm), and I don't particularly worry about their use
> cases.

They may be "crazy container" use cases, but anything we can do to
make that safer is a good thing.


But if the filesystem crashes or has a bug that can be exploited
during the mount process....

> If you have a Linux laptop with an automounter enabled it's possible
> that when you plug in a USB stick containing a corrupted file system,
> it could cause the system to crash.  But that requires physical access
> to the machine, and if you have physical access, there is no shortage
> of problems you could cause in any case.

Yes, the real issue is that distros automount filesystems with
"noexec,nosuid,nodev". They use these mount options so that the OS
protects against trojanned permissions and binaries on the untrusted
filesystem, thereby preventing most of the vectors an untrusted
filesystem can use to subvert the security of the system without the
user first making an explicit choice to allow the system to run
untrusted code.

But exploiting an automounter does not require physical access at
all. Anyone who says this is ignoring the elephant in the room:
supply chain attacks.

All it requires is a supply chain to be subverted somewhere, and
the USB drive that contains the drivers for your special hardware
from a manufacturer you trust (and with manufacturer
trust/anti-tamper seals intact) now pwns your machine when you plug
it in.

Did the user do anything wrong? No, not at all. But they could
have a big problem if filesystem developers don't care about
threat models like subverted supply chains and leave the door wide
open even when the user does all the right things...

> > Public reports like this require immediate work to determine the
> > scope, impact and risk of the problem to decide what needs to be
> > done next.  All public disclosure does is start a race and force
> > developers to have to address it immediately.
> 
> Nope.  I'll address these when I have time, and I don't consider them
> to be particularly urgent, for the reasons described above.

Your choice, but....

> I actually consider this fuzzer bug report to be particularly
> well-formed.

.... that's not the issue here, and ....

> In any case, I've taken a closer look at this report, and it's

.... regardless of whether you consider it urgent or not, you have
now gone out of your way to determine the risk the reported problem
now poses.....

> Again, it's not an *urgent* issue,

.... and so finally we have an answer to the risk and scope
question. This should have been known before the bug was made
public.

Giving developers a short window to determine the scope of the
problem before it is made public avoids all the potential problems
of the corruption bug having system security implications. It
generally doesn't take long to determine this (especially when the
reporter has a reproducer), but it needs to be done *before* the
flaw is made public...

Anything that can attract a CVE (and filesystem fuzzer bugs do,
indeed, attract CVEs) needs to be treated as a potential security
issue, not as a normal bug.

Cheers,

Dave.
Comment 9 Lukas Czerner 2022-08-02 09:28:49 UTC
On Tue, Aug 02, 2022 at 08:45:51AM +1000, Dave Chinner wrote:

--- snip ---

> > 
> > Look, your entire argument hinges on the assumption that this is a
> > security vulnerability that could be exploited and the report makes the
> > situation worse. And that's very much debatable. I don't think it is and
> > Ted described it very well in his comment.
> 
> On systems that automount filesystems when you plug in a USB drive
> (which most distros do out of the box) then a crash bug during mount
> is, at minimum, an annoying DOS vector. And if it can result in a
> buffer overflow, then....
> 
> > Asking for more information, or even asking the reporter to try to narrow
> > down the problem is of course fine.
> 
> Sure, nobody is questioning how we triage these issues - the
> question is over how they are reported and the forum under which the
> initial triage takes place.
> 
> > But making sweeping claims about
> > moral and ethical responsibilities is always a little suspicious and
> > completely bogus in this case IMO.
> 
> Hand waving away the fact that fuzzer crash bugs won't be a security
> issue without having done any investigation is pretty much the whole
> problem here. This is not responsible behaviour.

Since it's obvious that the security status of this is disputed,
please feel free to create guidelines stating that fuzzer bugs for xfs
are considered security issues, that reporters should follow the
guidelines of responsible disclosure, and that bugs are not to be
reported publicly.

Problem solved and no moralizing needed.

-Lukas

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Comment 10 Zhang Boyang 2022-08-17 12:42:53 UTC
Hi,

On 2022/8/2 11:25, Dave Chinner wrote:
>> I don't particularly worry about "responsible disclosure" because I
>> don't consider fuzzed file system crashes to be a particularly serious
>> security concern.  There are some crazy container folks who think
>> containers are just as secure(tm) as VM's, and who advocate allowing
>> untrusted containers to mount arbitrary file system images and expect
>> that this not cause the "host" OS to crash or get compromised.  Those
>> people are insane(tm), and I don't particularly worry about their use
>> cases.
> 
> They may be "crazy container" use cases, but anything we can do to
> make that safer is a good thing.
> 
> 
> But if the filesystem crashes or has a bug that can be exploited
> during the mount process....
> 

I think filesystem-safety is very important to consumer devices like
computers or smartphones, at least for those filesystems designed for 
(or widely used for) data exchange, like fat and exfat. Please see my 
comments below.

On the other hand, filesystems designed for internal use like ext4 or xfs
can ignore deliberate manipulation, but users still expect them to deal
with random errors, e.g. you don't want a whole file server to go down
because of a single faulty disk. And this has nothing to do with containers.


>> If you have a Linux laptop with an automounter enabled it's possible
>> that when you plug in a USB stick containing a corrupted file system,
>> it could cause the system to crash.  But that requires physical access
>> to the machine, and if you have physical access, there is no shortage
>> of problems you could cause in any case.
> 
> Yes, the real issue is that distros automount filesystems with
> "noexec,nosuid,nodev". They use these mount options so that the OS
> protects against trojanned permissions and binaries on the untrusted
> filesystem, thereby preventing most of the vectors an untrusted
> filesystem can use to subvert the security of the system without the
> user first making an explicit choice to allow the system to run
> untrusted code.
> 
> But exploiting an automounter does not require physical access at
> all. Anyone who says this is ignoring the elephant in the room:
> supply chain attacks.
> 
> All it requires is a supply chain to be subverted somewhere, and
> the USB drive that contains the drivers for your special hardware
> from a manufacturer you trust (and with manufacturer
> trust/anti-tamper seals intact) now pwns your machine when you plug
> it in.
> 
> Did the user do anything wrong? No, not at all. But they could
> have a big problem if filesystem developers don't care about
> threat models like subverted supply chains and leave the door wide
> open even when the user does all the right things...
> 

Yes, an attack that needs physical access doesn't mean the attacker
needs physical access.

USB sticks (or more generally, external storage devices) are still a
very important way to exchange data between computers (and/or smart
devices), although not as common as before. No safety guarantee here
means there is no way to even read untrusted filesystems without using
virtual machines / DMZ machines. Thus, using untrusted filesystems
natively will become "give root privilege to those who wrote to that
filesystem". That makes me recall the nightmare of autorun.inf worms on
Windows platforms. I think no user/vendor really wants this. At least I'm
sure it would be a scandal for Tesla if their cars could be hacked by
inserting a USB stick.

Best Regards,
Zhang Boyang
Comment 11 Luis Henriques 2022-10-04 09:15:09 UTC
I think that, with commit 29a5b8a137ac ("ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0") merged, this bug can be closed.
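
For anyone who wants to screen images for this class of corruption from
user space, here is a rough sketch of the condition named in that commit
subject (eh_entries == 0 while eh_depth > 0). It is not the kernel patch.
It assumes the usual 12-byte little-endian extent header layout (eh_magic,
eh_entries, eh_max, eh_depth, eh_generation) and magic 0xF30A; the function
name check_extent_header and its error strings are made up for
illustration.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A  # assumed extent header magic (little-endian on disk)
HEADER_FMT = "<HHHHI"    # eh_magic, eh_entries, eh_max, eh_depth, eh_generation


def check_extent_header(raw: bytes):
    """Return None if the header passes these checks, else an error string.

    Hypothetical helper: it mirrors the kind of sanity checks a filesystem
    would want, but it is not taken from the kernel sources.
    """
    if len(raw) < struct.calcsize(HEADER_FMT):
        return "short header"
    magic, entries, max_entries, depth, _generation = struct.unpack_from(
        HEADER_FMT, raw
    )
    if magic != EXT4_EXT_MAGIC:
        return "invalid magic"
    if max_entries == 0:
        return "invalid eh_max"
    if entries > max_entries:
        return "invalid eh_entries"
    # The case from the commit subject: an index node (depth > 0) that
    # claims to hold no entries cannot point at any lower level.
    if depth > 0 and entries == 0:
        return "eh_entries is 0 but eh_depth is > 0"
    return None
```

An empty leaf (eh_depth == 0, eh_entries == 0) is deliberately allowed
here, since a file with no extents is legitimate; only the empty index
node is rejected.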
Comment 12 Theodore Tso 2022-10-04 19:42:53 UTC
Yep, thanks for the patch!