Created attachment 301487 [details] poc and .config - Overview FUZZ: BUG() triggered in fs/ext4/extent.c:ext4_ext_insert_extent()when mount and operate on crafted image - Reproduce Tested on kernel 5.15.57, 5.19-rc8 # mkdir test_crash # cd test_crash # unzip tmp15.zip # mkdir mnt # ./single_test.sh ext4 15 -Kernel dump [ 1524.446966] loop5: detected capacity change from 0 to 32768 [ 1524.536425] EXT4-fs (loop5): recovery complete [ 1524.542174] EXT4-fs (loop5): mounted filesystem with ordered data mode. Quota mode: none. [ 1524.542850] ext4 filesystem being mounted at /home/wq/test_crashes/mnt supports timestamps until 2038 (0x7fffffff) [ 1524.849072] ------------[ cut here ]------------ [ 1524.849076] kernel BUG at fs/ext4/extents.c:1006! [ 1524.849141] invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1524.849167] CPU: 0 PID: 1186 Comm: tmp15 Not tainted 5.19.0-rc8 #1 [ 1524.849193] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 1524.849228] RIP: 0010:ext4_ext_insert_extent+0x3b8e/0x43d0 [ 1524.849259] Code: df ba f7 03 00 00 48 c7 c6 e0 9b f8 8d 41 be 8b ff ff ff e8 e4 20 12 00 e9 da f5 ff ff 4c 89 ff e8 a7 91 cf ff e9 a0 f4 ff ff <0f> 0b e8 1b 91 cf ff e9 49 f4 ff ff 44 0f b7 fa 50 49 c7 c1 e0 8f [ 1524.849330] RSP: 0018:ffff88812689f5f8 EFLAGS: 00010286 [ 1524.849353] RAX: 00000000ffffffff RBX: ffff888105227aa8 RCX: 0000000000000000 [ 1524.849381] RDX: ffffffffffffffff RSI: 0000000000017ef8 RDI: ffff888173197004 [ 1524.849410] RBP: ffff888105227986 R08: 0000000000000001 R09: ffffed1020af02a9 [ 1524.849438] R10: ffff888105781547 R11: ffffed1020af02a8 R12: 0000000000001013 [ 1524.849466] R13: ffff888103723c58 R14: ffff888173197000 R15: ffff888173197018 [ 1524.849494] FS: 00007f653cb8b540(0000) GS:ffff888293600000(0000) knlGS:0000000000000000 [ 1524.849526] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1524.849549] CR2: 000055c5c050c008 CR3: 000000011caae004 CR4: 0000000000370ef0 [ 1524.849579] DR0: 0000000000000000 
DR1: 0000000000000000 DR2: 0000000000000000 [ 1524.849610] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1524.849638] Call Trace: [ 1524.849650] <TASK> [ 1524.849661] ? ext4_discard_preallocations+0xd70/0xd70 [ 1524.849686] ? ext4_ext_shift_extents+0xc50/0xc50 [ 1524.849707] ? ext4_ext_search_right+0x822/0xc20 [ 1524.849728] ? _raw_spin_unlock_irqrestore+0x23/0x40 [ 1524.849750] ext4_ext_map_blocks+0xc86/0x3710 [ 1524.849771] ? ext4_ext_release+0x10/0x10 [ 1524.849789] ? do_writepages+0x170/0x590 [ 1524.849819] ? __filemap_fdatawrite_range+0xa7/0xe0 [ 1524.849859] ? ext4_sync_file+0x18a/0x9b0 [ 1524.849895] ? do_fsync+0x38/0x70 [ 1524.849928] ? __x64_sys_fdatasync+0x32/0x50 [ 1524.849965] ? mpage_process_page_bufs+0xe8/0x5b0 [ 1524.850005] ? __pagevec_release+0x7f/0x110 [ 1524.850042] ? down_write_killable+0x130/0x130 [ 1524.850080] ? ext4_es_lookup_extent+0x3ae/0x960 [ 1524.850104] ext4_map_blocks+0x600/0x1460 [ 1524.850123] ? ext4_issue_zeroout+0x190/0x190 [ 1524.850142] ? __kasan_slab_alloc+0x90/0xc0 [ 1524.850163] ext4_writepages+0xffa/0x25e0 [ 1524.850182] ? __ext4_mark_inode_dirty+0x5f0/0x5f0 [ 1524.850204] ? __stack_depot_save+0x34/0x540 [ 1524.850223] ? _raw_spin_lock+0x87/0xda [ 1524.850245] ? _raw_spin_lock_irqsave+0xf0/0xf0 [ 1524.850283] ? kmem_cache_free+0xd3/0x3b0 [ 1524.850320] ? do_syscall_64+0x38/0x90 [ 1524.851019] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1524.851716] do_writepages+0x170/0x590 [ 1524.852415] ? page_writeback_cpu_online+0x20/0x20 [ 1524.853162] ? avc_has_extended_perms+0xe70/0xe70 [ 1524.853848] ? may_linkat+0x310/0x310 [ 1524.854524] ? _raw_spin_lock+0x87/0xda [ 1524.855182] ? _raw_spin_lock_irqsave+0xf0/0xf0 [ 1524.855824] ? wbc_attach_and_unlock_inode+0x21/0x590 [ 1524.856449] filemap_fdatawrite_wbc+0x11d/0x190 [ 1524.857095] __filemap_fdatawrite_range+0xa7/0xe0 [ 1524.857686] ? delete_from_page_cache_batch+0x950/0x950 [ 1524.858274] ? do_faccessat+0x1d2/0x630 [ 1524.858855] ? 
kmem_cache_free+0xd3/0x3b0 [ 1524.859434] file_write_and_wait_range+0x92/0x100 [ 1524.860013] ext4_sync_file+0x18a/0x9b0 [ 1524.860587] do_fsync+0x38/0x70 [ 1524.861193] __x64_sys_fdatasync+0x32/0x50 [ 1524.861744] do_syscall_64+0x38/0x90 [ 1524.862284] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 1524.862832] RIP: 0033:0x7f653cab073d [ 1524.863388] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 37 0d 00 f7 d8 64 89 01 48 [ 1524.864581] RSP: 002b:00007ffe69164218 EFLAGS: 00000217 ORIG_RAX: 000000000000004b [ 1524.865245] RAX: ffffffffffffffda RBX: 000055c5c050b720 RCX: 00007f653cab073d [ 1524.865868] RDX: 00007f653cab073d RSI: ffffffffffffff80 RDI: 0000000000000004 [ 1524.866499] RBP: 00007ffe69168b80 R08: 00007ffe69168c78 R09: 00007ffe69168c78 [ 1524.867134] R10: 00007ffe69168c78 R11: 0000000000000217 R12: 000055c5c050b0a0 [ 1524.867770] R13: 00007ffe69168c70 R14: 0000000000000000 R15: 0000000000000000 [ 1524.868404] </TASK> [ 1524.869048] Modules linked in: input_leds joydev serio_raw qemu_fw_cfg xfs autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear qxl drm_ttm_helper ttm drm_kms_helper hid_generic usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops hid drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel psmouse crypto_simd cryptd [ 1524.871971] ---[ end trace 0000000000000000 ]--- [ 1524.872717] RIP: 0010:ext4_ext_insert_extent+0x3b8e/0x43d0 [ 1524.873715] Code: df ba f7 03 00 00 48 c7 c6 e0 9b f8 8d 41 be 8b ff ff ff e8 e4 20 12 00 e9 da f5 ff ff 4c 89 ff e8 a7 91 cf ff e9 a0 f4 ff ff <0f> 0b e8 1b 91 cf ff e9 49 f4 ff ff 44 0f b7 fa 50 49 c7 c1 e0 8f [ 1524.875375] RSP: 0018:ffff88812689f5f8 EFLAGS: 00010286 [ 1524.876222] RAX: 00000000ffffffff RBX: ffff888105227aa8 RCX: 0000000000000000 [ 1524.877110] RDX: ffffffffffffffff RSI: 0000000000017ef8 RDI: 
ffff888173197004 [ 1524.877964] RBP: ffff888105227986 R08: 0000000000000001 R09: ffffed1020af02a9 [ 1524.878829] R10: ffff888105781547 R11: ffffed1020af02a8 R12: 0000000000001013 [ 1524.879684] R13: ffff888103723c58 R14: ffff888173197000 R15: ffff888173197018 [ 1524.880561] FS: 00007f653cb8b540(0000) GS:ffff888293600000(0000) knlGS:0000000000000000 [ 1524.881491] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1524.882372] CR2: 000055c5c050c008 CR3: 000000011caae004 CR4: 0000000000370ef0 [ 1524.883270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1524.884182] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1524.885084] ------------[ cut here ]------------ [ 1524.885972] WARNING: CPU: 0 PID: 1186 at kernel/exit.c:741 do_exit+0x1798/0x2740 [ 1524.886885] Modules linked in: input_leds joydev serio_raw qemu_fw_cfg xfs autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear qxl drm_ttm_helper ttm drm_kms_helper hid_generic usbhid syscopyarea sysfillrect sysimgblt fb_sys_fops hid drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel psmouse crypto_simd cryptd [ 1524.890831] CPU: 0 PID: 1186 Comm: tmp15 Tainted: G D 5.19.0-rc8 #1 [ 1524.891858] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 1524.892918] RIP: 0010:do_exit+0x1798/0x2740 [ 1524.893959] Code: c0 74 08 3c 03 0f 8e cc 0c 00 00 8b 83 48 13 00 00 65 01 05 7a 59 aa 74 e9 92 fc ff ff 48 89 df e8 fd 77 28 00 e9 ec ee ff ff <0f> 0b e9 fa e8 ff ff 4c 89 e6 bf 05 06 00 00 e8 d4 72 02 00 e9 b2 [ 1524.896142] RSP: 0018:ffff88812689fe48 EFLAGS: 00010286 [ 1524.897245] RAX: 1ffff11024d12815 RBX: ffff888126893400 RCX: 0000000000000000 [ 1524.898357] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881268940a8 [ 1524.899473] RBP: ffff888126893400 R08: 0000000000000041 R09: ffffed1024d13000 [ 1524.900604] R10: ffff88829362848b R11: ffffed10526c5091 R12: 000000000000000b 
[ 1524.901737] R13: ffffffff8de22ac0 R14: ffff888126893400 R15: 0000000000000000 [ 1524.902863] FS: 00007f653cb8b540(0000) GS:ffff888293600000(0000) knlGS:0000000000000000 [ 1524.904014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1524.905156] CR2: 000055c5c050c008 CR3: 000000011caae004 CR4: 0000000000370ef0 [ 1524.906287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1524.907395] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1524.908495] Call Trace: [ 1524.909570] <TASK> [ 1524.910628] ? file_write_and_wait_range+0x92/0x100 [ 1524.911697] ? mm_update_next_owner+0x6e0/0x6e0 [ 1524.912777] ? ext4_sync_file+0x18a/0x9b0 [ 1524.913849] make_task_dead+0xb0/0xc0 [ 1524.914904] rewind_stack_and_make_dead+0x17/0x17 [ 1524.915952] RIP: 0033:0x7f653cab073d [ 1524.916963] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 37 0d 00 f7 d8 64 89 01 48 [ 1524.918998] RSP: 002b:00007ffe69164218 EFLAGS: 00000217 ORIG_RAX: 000000000000004b [ 1524.920032] RAX: ffffffffffffffda RBX: 000055c5c050b720 RCX: 00007f653cab073d [ 1524.921062] RDX: 00007f653cab073d RSI: ffffffffffffff80 RDI: 0000000000000004 [ 1524.922070] RBP: 00007ffe69168b80 R08: 00007ffe69168c78 R09: 00007ffe69168c78 [ 1524.923061] R10: 00007ffe69168c78 R11: 0000000000000217 R12: 000055c5c050b0a0 [ 1524.924051] R13: 00007ffe69168c70 R14: 0000000000000000 R15: 0000000000000000 [ 1524.925023] </TASK> [ 1524.925975] ---[ end trace 0000000000000000 ]---
If you are going to run some scripted tool to randomly corrupt the filesystem to find failures, then you have an ethical and moral responsibility to do some of the work to narrow down and identify the cause of the failure, not just throw them at someone to do all the work. --D
On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote: > If you are going to run some scripted tool to randomly > corrupt the filesystem to find failures, then you have an > ethical and moral responsibility to do some of the work to > narrow down and identify the cause of the failure, not just > throw them at someone to do all the work. > > --D While I understand the frustration with the fuzzer bug reports like this I very much disagree with your statement about ethical and moral responsibility. The bug is in the code, it would have been there even if Wenqing Liu didn't run the tool. We know there are bugs in the code we just don't know where all of them are. Now, thanks to this report, we know a little bit more about at least one of them. That's at least a little useful. But you seem to argue that the reporter should put more work in, or not bother at all. That's wrong. Really, Wenqing Liu has no more ethical and moral responsibility than you finding and fixing the problem regardless of the bug report. I think the frustration comes from the fact that it's potentially a lot of work to untangle and fix the real problem and now when it is out there we feel obligated to fix it. And while bug reports and tools generating these can always be better and reporters can always be a bit more active in narrowing the problem down, you're of course free to ignore this until you, or anyone else, has a bit of spare time and energy to investigate. -Lukas
On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote: > On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote: > > If you are going to run some scripted tool to randomly > > corrupt the filesystem to find failures, then you have an > > ethical and moral responsibility to do some of the work to > > narrow down and identify the cause of the failure, not just > > throw them at someone to do all the work. > > > > --D > > While I understand the frustration with the fuzzer bug reports like this > I very much disagree with your statement about ethical and moral > responsibility. > > The bug is in the code, it would have been there even if Wenqing Liu > didn't run the tool. Yes, but it's not just a bug. It's a format parser exploit. > We know there are bugs in the code we just don't > know where all of them are. Now, thanks to this report, we know a little > bit more about at least one of them. That's at least a little useful. > But you seem to argue that the reporter should put more work in, or not > bother at all. > > That's wrong. Really, Wenqing Liu has no more ethical and moral > responsibility than you finding and fixing the problem regardless of the > bug report. By this reasoning, the researchers that discovered RetBleed should have just published their findings without notify any of the affected parties. i.e. your argument implies they have no responsibility and hence are entitled to say "We aren't responsible for helping anyone understand the problem or mitigating the impact of the flaw - we've got our publicity and secured tenure with discovery and publication!" That's not _responsible disclosure_. Yup, this is important enough that we actually have a name for it: responsible disclosure. And where do those responsibilities come from? You guessed it - they are based on the ethics and morals that guide us towards doing what is best for the wider community. 
> I think the frustration comes from the fact that it's potentially a lot > of work to untangle and fix the real problem and now when it is out > there we feel obligated to fix it. And while bug reports and tools > generating these can always be better and reporters can always be a bit > more active in narrowing the problem down, you're of course free to > ignore this until you, or anyone else, has a bit of spare time and > energy to investigate. It has nothing to do with the amount of work, nor does it change the fact that us developers will need to do most of the work. The problem here is the lack of responsible disclosure that we see repeatedly with filesystem flaws found by fuzzing the on-disk format. Public reports like this require immediate work to determine the scope, impact and risk of the problem to decide what needs to be done next. All public disclosure does is start a race and force developers to have to address it immediately. Responsible disclosure gives developers a short window in which they can perform that analysis without fear that somebody might already be actively exploiting the problem discovered by the fuzzer. We can address the problem without extreme urgency, knowing a day or two while we wait for private discussion and bug fixing to take place isn't going to make things worse. That's the issue with drive-by fuzzer bug reporting - the people that do this have little clue about the potential impact of the flaws they are discovering. Those people need to be taught that their responsibility is not to throw issues over the wall at other people, but to work closely with the people that can fix the issues to have a fix for the problem ready at the same time the issue is disclosed. IOWs, they have an ethical and moral responsibility to the wider community to disclose these issues to relevant developers in a responsible manner and work with them to fix the problems before the issues are made public.
Once you look at filesystem fuzzing bugs from a security and exploit perspective, _everything changes_. Cheers, Dave.
On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote: > On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote: > > While I understand the frustration with the fuzzer bug reports like this > > I very much disagree with your statement about ethical and moral > > responsibility. > > > > The bug is in the code, it would have been there even if Wenqing Liu > > didn't run the tool. > > i.e. your argument implies they have no responsibility and hence are > entitled to say "We aren't responsible for helping anyone understand > the problem or mitigating the impact of the flaw - we've got our > publicity and secured tenure with discovery and publication!" > > That's not _responsible disclosure_. So I'm going to disagree here. I understand that this is the XFS position, and so a few years back, the Georgia Tech folks who were responsible for Janus and Hydra decided not to engage with the XFS community and stopped reporting XFS bugs. They continued to engage with the ext4 community, and I found their reports to be helpful. We found and fixed quite a few bugs as a result of their work, and I sponsored them to get some research funding from Google so they could do more file system fuzzing work, because I thought their work was a useful contribution. I don't particularly worry about "responsible disclosure" because I don't consider fuzzed file system crashes to be a particularly serious security concern. There are some crazy container folks who think containers are just as secure(tm) as VM's, and who advocate allowing untrusted containers to mount arbitrary file system images and expect that this not cause the "host" OS to crash or get compromised. Those people are insane(tm), and I don't particularly worry about their use cases. If you have a Linux laptop with an automounter enabled it's possible that when you plug in a USB stick containing a corrupted file system, it could cause the system to crash. 
But that requires physical access to the machine, and if you have physical access, there is no shortage of problems you could cause in any case.

> Public reports like this require immediate work to determine the
> scope, impact and risk of the problem to decide what needs to be
> done next. All public disclosure does is start a race and force
> developers to have to address it immediately.

Nope. I'll address these when I have time, and I don't consider them to be particularly urgent, for the reasons described above.

I actually consider this fuzzer bug report to be particularly well-formed. Unlike Syzkaller, the file system image was in a separate file, and wasn't embedded in the reproducer.c file in a way that made it super-inconvenient to extract. Furthermore, like the Georgia Tech fuzzing reports, I appreciate that it was filed in Bugzilla, since it won't easily get lost, with all of the information that we need.

In any case, I've taken a closer look at this report, and it's actually quite the interesting problem. The issue is that we have a non-leaf node in the extent tree where the eh_entries header field is zero. This should never happen:

    debugfs:  extents <16>
    Level Entries     Logical       Physical      Length Flags
     0/ 2   1/  1     0 - 98030     9284           98031
     1/ 2   1/  0     0 - 98030     9282           98031   <======
              ^^^
     2/ 2   1/ 84     0 -     0     9730 - 9730        1
     2/ 2   2/ 84     5 -     7     9739 - 9741        3
     2/ 2   3/ 84    16 -    17     9750 - 9751        2
     2/ 2   4/ 84    26 -    26     9768 - 9768        1
     2/ 2   5/ 84    36 -    36     9787 - 9787        1

This causes len to go negative in ext4_ext_insert_extent:

    [ 26.419401] ino 16 len -1 logical 98040 eh_entries 0 eh_max 84 depth 1

... which is what triggers the BUG_ON(len < 0).

What makes this particularly interesting is that neither the kernel *nor* e2fsck is flagging this extent tree as corrupt. So this is an opportunity to improve both the kernel as well as fsck.ext4.
Again, it's not an *urgent* issue, but it is something that is worth trying to improve in ext4 from the perspective of improving the quality of our implementation. And since it's not an urgent issue, concerns of "responsible disclosure" don't arise, at least not in my opinion. - Ted
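To make the arithmetic in the analysis above concrete, here is a minimal user-space sketch of why a non-leaf node with eh_entries == 0 yields a negative length. It is not kernel code: the class ExtentHeader and the helpers entries_to_shift() and header_looks_sane() are hypothetical names invented for illustration, and the insertion slot of 1 is an assumption chosen because it is what a result of -1 implies. Only the field names and the values (eh_entries 0, eh_max 84, depth 1) come from the debug line quoted above.

```python
from dataclasses import dataclass


@dataclass
class ExtentHeader:
    """Illustrative stand-in for the extent header fields quoted above."""
    eh_entries: int  # number of valid entries in this node
    eh_max: int      # capacity of this node
    eh_depth: int    # 0 for a leaf node, > 0 for an index (non-leaf) node


def entries_to_shift(hdr: ExtentHeader, insert_slot: int) -> int:
    """Entries that must move right to open insert_slot (0-based).

    Models 'last - insert_position + 1', where last = eh_entries - 1.
    """
    last_slot = hdr.eh_entries - 1
    return last_slot - insert_slot + 1


def header_looks_sane(hdr: ExtentHeader) -> bool:
    """Hypothetical validation: reject an index node that claims no entries."""
    if hdr.eh_entries > hdr.eh_max:
        return False
    if hdr.eh_depth > 0 and hdr.eh_entries == 0:
        return False
    return True


# Values taken from the debug line in the report.
corrupt = ExtentHeader(eh_entries=0, eh_max=84, eh_depth=1)
print(entries_to_shift(corrupt, insert_slot=1))  # -1, the value that trips the check
print(header_looks_sane(corrupt))                # False
```

A check along the lines of header_looks_sane(), applied when the node is read, would let the tree be reported as corrupt before any insertion arithmetic runs. Whether the kernel and e2fsck should share exactly this rule is left open here.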
On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote: > On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote: > > On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote: > > > If you are going to run some scripted tool to randomly > > > corrupt the filesystem to find failures, then you have an > > > ethical and moral responsibility to do some of the work to > > > narrow down and identify the cause of the failure, not just > > > throw them at someone to do all the work. > > > > > > --D > > > > While I understand the frustration with the fuzzer bug reports like this > > I very much disagree with your statement about ethical and moral > > responsibility. > > > > The bug is in the code, it would have been there even if Wenqing Liu > > didn't run the tool. > > Yes, but it's not just a bug. It's a format parser exploit. And what do you think this is exploiting? A bug in a "format parser" perhaps? Are you trying both downplay it to not-a-bug and elevate it to 'security vulnerability' at the same time ? ;) > > > We know there are bugs in the code we just don't > > know where all of them are. Now, thanks to this report, we know a little > > bit more about at least one of them. That's at least a little useful. > > But you seem to argue that the reporter should put more work in, or not > > bother at all. > > > > That's wrong. Really, Wenqing Liu has no more ethical and moral > > responsibility than you finding and fixing the problem regardless of the > > bug report. > > By this reasoning, the researchers that discovered RetBleed > should have just published their findings without notify any of the > affected parties. > > i.e. your argument implies they have no responsibility and hence are > entitled to say "We aren't responsible for helping anyone understand > the problem or mitigating the impact of the flaw - we've got our > publicity and secured tenure with discovery and publication!" > > That's not _responsible disclosure_. 
Look, your entire argument hinges on the assumption that this is a security vulnerability that could be exploited and the report makes the situation worse. And that's very much debatable. I don't think it is and Ted described it very well in his comment. Asking for more information, or even asking the reporter to try to narrow down the problem is of course fine. But making sweeping claims about moral and ethical responsibilities is always a little suspicious and completely bogus in this case IMO. -Lukas
On Thu, Jul 28, 2022 at 09:25:10AM +0200, Lukas Czerner wrote: > On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote: > > On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote: > > > On Tue, Jul 26, 2022 at 01:10:24PM -0700, Darrick J. Wong wrote: > > > > If you are going to run some scripted tool to randomly > > > > corrupt the filesystem to find failures, then you have an > > > > ethical and moral responsibility to do some of the work to > > > > narrow down and identify the cause of the failure, not just > > > > throw them at someone to do all the work. > > > > > > > > --D > > > > > > While I understand the frustration with the fuzzer bug reports like this > > > I very much disagree with your statement about ethical and moral > > > responsibility. > > > > > > The bug is in the code, it would have been there even if Wenqing Liu > > > didn't run the tool. > > > > Yes, but it's not just a bug. It's a format parser exploit. > > And what do you think this is exploiting? A bug in a "format parser" > perhaps? > > Are you trying both downplay it to not-a-bug and elevate it to 'security > vulnerability' at the same time ? ;) How did you come to that conclusion? "not just a bug" != "not a bug". i.e. I said the complete opposite of what your comment implies I said... > > > We know there are bugs in the code we just don't > > > know where all of them are. Now, thanks to this report, we know a little > > > bit more about at least one of them. That's at least a little useful. > > > But you seem to argue that the reporter should put more work in, or not > > > bother at all. > > > > > > That's wrong. Really, Wenqing Liu has no more ethical and moral > > > responsibility than you finding and fixing the problem regardless of the > > > bug report. > > > > By this reasoning, the researchers that discovered RetBleed > > should have just published their findings without notify any of the > > affected parties. > > > > i.e. 
your argument implies they have no responsibility and hence are > > entitled to say "We aren't responsible for helping anyone understand > > the problem or mitigating the impact of the flaw - we've got our > > publicity and secured tenure with discovery and publication!" > > > > That's not _responsible disclosure_. > > Look, your entire argument hinges on the assumption that this is a > security vulnerability that could be exploited and the report makes the > situation worse. And that's very much debatable. I don't think it is and > Ted described it very well in his comment. On systems that automount filesystems when you plug in a USB drive (which most distros do out of the box) then a crash bug during mount is, at minimum, an annoying DoS vector. And if it can result in a buffer overflow, then.... > Asking for more information, or even asking the reporter to try to narrow > down the problem is of course fine. Sure, nobody is questioning how we triage these issues - the question is over how they are reported and the forum under which the initial triage takes place. > But making sweeping claims about > moral and ethical responsibilities is always a little suspicious and > completely bogus in this case IMO. Hand waving away the fact that fuzzer crash bugs won't be a security issue without having done any investigation is pretty much the whole problem here. This is not responsible behaviour. Cheers, Dave.
On Tue, Aug 02, 2022 at 08:45:51AM +1000, Dave Chinner wrote: > > On systems that automount filesystems when you plug in a USB drive > (which most distros do out of the box) then a crash bug during mount > is, at minimum, an annoying DOS vector. And if it can result in a > buffer overflow, then.... You need physical access to plug in a USB drive, and if you can do that, the potential attack vectors are numerous. eSATA, Firewire, etc., give the potential hardware device direct access to the PCI bus and the ability to issue arbitrary DMA requests. Badly implemented Thunderbolt devices can have the same vulnerability, and badly implemented USB controllers have their own entertaining issues. And if attackers have a bit more unguarded physical access time, there is no shortage of "evil maid" attacks that can be carried out. As far as I'm concerned a secure system has automounters disabled, and competent distributions should disable the automounter when the laptop is locked. Enterprise-class server machines running enterprise distros have no business having the automounter enabled at all, and careful datacenter managers should fill in the USB ports with epoxy. For more common sense tips, see: https://www.youtube.com/watch?v=kd33UVZhnAA Look, bad guys have the time and energy to run file system fuzzers (many of which are open source and can be easily found on github). I'm sure our good friends at the NSA, MSS, and KGB know all of this already; and the NSO group is apparently happy to make them available to anyone willing to pay, no matter what their human rights record might be. Security by obscurity never works, and as far as I am concerned, I am grateful when academics run fuzzers and report bugs to us. Especially since attacks which require physical access or root privs are going to have low CVE Security Scores *anyway*. Cheers, - Ted
On Wed, Jul 27, 2022 at 10:46:53PM -0400, Theodore Ts'o wrote: > On Thu, Jul 28, 2022 at 09:22:24AM +1000, Dave Chinner wrote: > > On Wed, Jul 27, 2022 at 01:53:07PM +0200, Lukas Czerner wrote: > > > While I understand the frustration with the fuzzer bug reports like this > > > I very much disagree with your statement about ethical and moral > > > responsibility. > > > > > > The bug is in the code, it would have been there even if Wenqing Liu > > > didn't run the tool. > > > > i.e. your argument implies they have no responsibility and hence are > > entitled to say "We aren't responsible for helping anyone understand > > the problem or mitigating the impact of the flaw - we've got our > > publicity and secured tenure with discovery and publication!" > > > > That's not _responsible disclosure_. > > So I'm going to disagree here. I understand that this is the XFS > position, Nope, nothing to do with XFS here - how filesystem fuzzing is approached and reported is a much wider engineering and security process problem. > and so a few years back, the Georgia Tech folks who were > responsible for Janus and Hydra decided not to engage with the XFS > community and stopped reporting XFS bugs. That is at odds with the fact they engaged us repeatedly over a period of 6 months to report and fix all the bugs the Janus framework found. Indeed, the Acknowledgements from the Janus paper read: "We thank the anonymous reviewers, and our shepherd, Thorsten Holz, for their helpful feedback. We also thank all the file system developers, including Theodore Ts’o, Darrick J. Wong, Dave Chinner, Eric Sandeen, Chao Yu, Wenruo Qu and Ernesto A. Fernández for handling our bug reports." Yup, there we all are - ext4, XFS and btrfs all represented. And, well, we didn't form the opinion that fuzzer bugs should be disclosed responsibly until early 2021.
The interactions with GATech researchers running the Janus project were back in 2018 and we addressed all their bug reports quickly and with a minimum of fuss. It's somewhat disingenuous to claim that a policy that wasn't formulated until 2021 had a fundamental influence on decisions made in late 2018.... > They continued to engage > with the ext4 community, and I found their reports to be helpful. We > found and fixed quite a few bugs as a result of their work, Yup, same with XFS - we fixed them all pretty quickly, and even so still had half a dozen CVEs raised against those XFS bugs posthumously by the linux security community. And I note that ext4 also had about a dozen CVEs raised against the bugs that Janus found... I'll also quote from the Hydra paper on their classification of the bugs they were trying to uncover: "Memory errors (ME). Memory errors are common in file systems. Due to their high security impact, [...]" The evidence at hand tells us that filesystem fuzzer bugs have security implications. Hence we need to treat them accordingly. > and I > sponsored them to get some research funding from Google so they could > do more file system fuzzing work, because I thought their work was a > useful contribution. I guess the funding you are talking about is for the Hydra paper that GATech published later in 2019? The only upstream developer mentioned in the acknowledgements is you, and I also note that funding from Google is disclosed, too. True, they didn't engage with upstream XFS at all during that work, or since, but I think there's a completely different reason to what you are implying... i.e., I don't think the "not engaging with upstream XFS" has anything to do with reporting and fixing the bugs of the Janus era. To quote the Hydra paper, from the "experimental setup" section: "We also tested XFS, GFS2, HFS+, ReiserFS, and VFAT, but found only memory-safety bugs."
Blink and you miss it, yet it's possibly the most important finding in the paper: Hydra didn't find any crash inconsistencies, no logic bugs, nor any POSIX spec violations in XFS. IOWs, Hydra didn't find any of the problems the fuzzer was supposed to find in the filesystems it was run on. There was simply nothing to report to upstream XFS, and nothing to write about in the paper. It's hardly a compelling research paper that reports "new algorithm found no new bugs at all". Yet that's what the result was with Hydra on XFS. Let's consider that finding in the wider context of academia looking into new filesystem fuzzing techniques. If you have a filesystem that is immune to fuzzing, then it doesn't really help you prove that you've advanced the state of the fuzzing art, does it? Hence a filesystem that is becoming largely immune to randomised fuzzing techniques then becomes the least appealing research target for filesystem fuzzing. If a new fuzzer can't find bugs in a complex filesystem that we all know is full of bugs, it doesn't make for very compelling research, does it? Indeed, the Hydra paper spends a lot of time at the start explaining how fstests doesn't exercise filesystems using semantic fuzzer techniques that can be used to discover format corruption bugs. However, it ignores the fact that fstests contains extensive directed structure corruption based fuzzing tests for XFS. This is one of the reasons why Hydra didn't find any new format fuzzing bugs - its semantic algorithms and crafted images didn't exercise XFS in a way that wasn't already covered by fstests. IOWs, if a new fuzzer isn't any better than what we already have in fstests, then the new fuzzer research is going to come up a great big donut on XFS, as we see with Hydra.
Hence, if we are seeing researchers barely mention XFS because their new technique is not finding bugs in XFS, and we see them instead focus on ext4, btrfs, and other filesystems that do actually crash or have inconsistencies, what does that say about how XFS developers have been going about fuzz testing and run-time on-disk format validation? What does that say about ext4, f2fs, btrfs, etc? What does that say about the researchers' selective presentation of the results?

IOWs, the lack of upstream XFS community engagement from fuzzer researchers has nothing to do with interactions with the XFS community - it has everything to do with the fact we've raised the bar higher than new fuzzer projects can reach in a short period of time. If the research doesn't bear fruit on XFS, then the researchers have no reason to engage the upstream community during the course of their research.....

The bottom line is that we want filesystems to be immune to fuzzer fault injection. Hence if XFS is doing better at rejecting fuzzed input than other Linux filesystems, then perhaps the XFS developers are doing something right and perhaps there's something to the approach they take and processes they have brought to filesystem fuzzing.

The state of the art is not always defined by academic research papers....

> I don't particularly worry about "responsible disclosure" because I
> don't consider fuzzed file system crashes to be a particularly serious
> security concern. There are some crazy container folks who think
> containers are just as secure(tm) as VM's, and who advocate allowing
> untrusted containers to mount arbitrary file system images and expect
> that this not cause the "host" OS to crash or get compromised. Those
> people are insane(tm), and I don't particularly worry about their use
> cases.

They may be "crazy container" use cases, but anything we can do to make that safer is a good thing.
But if the filesystem crashes or has a bug that can be exploited during the mount process....

> If you have a Linux laptop with an automounter enabled it's possible
> that when you plug in a USB stick containing a corrupted file system,
> it could cause the system to crash. But that requires physical access
> to the machine, and if you have physical access, there is no shortage
> of problems you could cause in any case.

Yes, the real issue is that distros automount filesystems with "noexec,nosuid,nodev". They use these mount options so that the OS protects against trojanned permissions and binaries on the untrusted filesystem, thereby preventing most of the vectors an untrusted filesystem can use to subvert the security of the system without the user first making an explicit choice to allow the system to run untrusted code.

But exploiting an automounter does not require physical access at all. Anyone who says it does is ignoring the elephant in the room: supply chain attacks.

All it requires is a supply chain to be subverted somewhere, and now the USB drive that contains the drivers for your special hardware from a manufacturer you trust (and with manufacturer trust/anti-tamper seals intact) pwns your machine when you plug it in.

Did the user do anything wrong? No, not at all. But they could have a big problem if filesystem developers don't care about threat models like subverted supply chains and leave the door wide open even when the user does all the right things...

> > Public reports like this require immediate work to determine the
> > scope, impact and risk of the problem to decide what needs to be
> > done next. All public disclosure does is start a race and force
> > developers to have to address it immediately.
>
> Nope. I'll address these when I have time, and I don't consider them
> to be particularly urgent, for the reasons described above.

Your choice, but....

> I actually consider this fuzzer bug report to be particularly
> well-formed.

....
that's not the issue here, and ....

> In any case, I've taken a closer look at this report, and it's

.... regardless of whether you consider it urgent or not, you have now gone out of your way to determine the risk the reported problem now poses.....

> Again, it's not an *urgent* issue,

.... and so finally we have an answer to the risk and scope question. This should have been known before the bug was made public.

Giving developers a short window to determine the scope of the problem before it is made public avoids all the potential problems of the corruption bug having system security implications. It generally doesn't take long to determine this (especially when the reporter has a reproducer), but it needs to be done *before* the flaw is made public...

Anything that can attract a CVE (and filesystem fuzzer bugs do, indeed, attract CVEs) needs to be treated as a potential security issue, not as a normal bug.

Cheers,

Dave.
On Tue, Aug 02, 2022 at 08:45:51AM +1000, Dave Chinner wrote:

--- snip ---

> >
> > Look, your entire argument hinges on the assumption that this is a
> > security vulnerability that could be exploited and the report makes the
> > situation worse. And that's very much debatable. I don't think it is and
> > Ted described it very well in his comment.
>
> On systems that automount filesystems when you plug in a USB drive
> (which most distros do out of the box) then a crash bug during mount
> is, at minimum, an annoying DoS vector. And if it can result in a
> buffer overflow, then....
>
> > Asking for more information, or even asking the reporter to try to narrow
> > down the problem is of course fine.
>
> Sure, nobody is questioning how we triage these issues - the
> question is over how they are reported and the forum under which the
> initial triage takes place
>
> > But making sweeping claims about
> > moral and ethical responsibilities is always a little suspicious and
> > completely bogus in this case IMO.
>
> Hand waving away the fact that fuzzer crash bugs won't be a security
> issue without having done any investigation is pretty much the whole
> problem here. This is not responsible behaviour.

Since it's obvious that the security status of this is disputed, please feel free to create guidelines stating that fuzzer bugs for xfs are considered security issues, that reporters should follow the guidelines of responsible disclosure, and that such bugs are not to be reported publicly.

Problem solved and no moralizing needed.

-Lukas

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
Hi,

On 2022/8/2 11:25, Dave Chinner wrote:
>> I don't particularly worry about "responsible disclosure" because I
>> don't consider fuzzed file system crashes to be a particularly serious
>> security concern. There are some crazy container folks who think
>> containers are just as secure(tm) as VM's, and who advocate allowing
>> untrusted containers to mount arbitrary file system images and expect
>> that this not cause the "host" OS to crash or get compromised. Those
>> people are insane(tm), and I don't particularly worry about their use
>> cases.
>
> They may be "crazy container" use cases, but anything we can do to
> make that safer is a good thing.
>
>
> But if the filesystem crashes or has a bug that can be exploited
> during the mount process....
>

I think filesystem safety is very important to consumer devices like computers or smartphones, at least for those filesystems designed for (or widely used for) data exchange, like fat and exfat. Please see my comments below.

On the other hand, filesystems designed for internal use, like ext4 or xfs, can ignore deliberate manipulation, but users still expect them to deal with random errors, e.g. you don't want a whole file server to go down because of a single faulty disk. And this has nothing to do with containers.

>> If you have a Linux laptop with an automounter enabled it's possible
>> that when you plug in a USB stick containing a corrupted file system,
>> it could cause the system to crash. But that requires physical access
>> to the machine, and if you have physical access, there is no shortage
>> of problems you could cause in any case.
>
> Yes, the real issue is that distros automount filesystems with
> "noexec,nosuid,nodev".
> They use these mount options so that the OS
> protects against trojanned permissions and binaries on the untrusted
> filesystem, thereby preventing most of the vectors an untrusted
> filesystem can use to subvert the security of the system without the
> user first making an explicit choice to allow the system to run
> untrusted code.
>
> But exploiting an automounter does not require physical access at
> all. Anyone who says this is ignoring the elephant in the room:
> supply chain attacks.
>
> All it requires is a supply chain to be subverted somewhere, and now
> the USB drive that contains the drivers for your special hardware
> from a manufacturer you trust (and with manufacturer
> trust/anti-tamper seals intact) now pwns your machine when you plug
> it in.
>
> Did the user do anything wrong? No, not at all. But they could
> have a big problem if filesystem developers don't care about
> threat models like subverted supply chains and leave the door wide
> open even when the user does all the right things...
>

Yes, an attack that needs physical access doesn't mean the attacker needs physical access. USB sticks (or more generally, external storage devices) are still a very important way to exchange data between computers (and/or smart devices), although they are not as common as before.

Having no safety guarantee here means there is no way to even read untrusted filesystems without using virtual machines / DMZ machines. Thus, using untrusted filesystems natively becomes "give root privilege to whoever wrote to that filesystem". That makes me recall the nightmare of autorun.inf worms on Windows platforms. I don't think any user or vendor really wants this. At least I'm sure it would be a scandal for Tesla if their cars could be hacked by inserting a USB stick.

Best Regards,
Zhang Boyang
I think that, with commit 29a5b8a137ac ("ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0") merged, this bug can be closed.
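For anyone landing on this bug later, here is a rough userspace sketch of the kind of header sanity check that commit title describes. It is not the kernel patch itself: the struct and function names (`sketch_extent_header`, `sketch_check_header`) are made up for illustration, the fields are assumed to be already converted from on-disk little-endian to host order, and only the field names and the 0xF30A magic follow the documented ext4 extent header layout. The point is that a header claiming depth > 0 (an index node) with zero entries gives the tree walk nothing to follow, so it seems safer to reject it as corrupted up front than to trip a BUG() later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Magic number of an ext4 extent header (host byte order in this sketch). */
#define SKETCH_EXT_MAGIC 0xF30A

/* Illustrative stand-in for the on-disk extent header; hypothetical name. */
struct sketch_extent_header {
	uint16_t eh_magic;      /* must be SKETCH_EXT_MAGIC */
	uint16_t eh_entries;    /* number of valid entries in this node */
	uint16_t eh_max;        /* capacity of this node in entries */
	uint16_t eh_depth;      /* 0 = leaf node, > 0 = index node */
	uint32_t eh_generation;
};

/*
 * Hypothetical validator: returns NULL if the header looks sane,
 * otherwise a short description of the first problem found.
 */
static const char *sketch_check_header(const struct sketch_extent_header *eh)
{
	assert(eh != NULL);

	if (eh->eh_magic != SKETCH_EXT_MAGIC)
		return "invalid magic";
	if (eh->eh_max == 0)
		return "invalid eh_max";
	if (eh->eh_entries > eh->eh_max)
		return "invalid eh_entries";
	/* The case from the commit title: an index node with no entries. */
	if (eh->eh_entries == 0 && eh->eh_depth > 0)
		return "eh_entries is 0 but eh_depth is > 0";

	return NULL;
}
```

Note that an empty leaf (eh_entries == 0, eh_depth == 0) still passes in this sketch, since an inode with no extents mapped yet is a normal state; only the empty index node is treated as corruption.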
Yep, thanks for the patch!