Bug 205713

Summary: Linux Kernel 4.19.83 - debugfs_remove use-after-free
Product: File System
Component: Other
Assignee: fs_other
Reporter: Tristan Madani (tristmd)
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P5
CC: mcgrof, sandeen, zhangbl1105
Hardware: All
OS: Linux
Kernel Version: 4.19.83
Subsystem:
Regression: No
Bisected commit-id:

Description Tristan Madani 2019-11-29 17:37:46 UTC
LINUX KERNEL 4.19.83 - debugfs_remove use-after-free

0x01 - Introduction
===

# Product: Linux Kernel 
# Version: 4.19.83 (and probably other versions)
# Bug: UAF (Read)
# Tested on: GNU/Linux Debian 9 x86_64


0x02 - Details
===

There is a UAF read in the debugfs_remove() function, which is used to remove a file or directory in
debugfs that was previously created with a call to another debugfs function (such as debugfs_create_file()).

Code analysis (fs/debugfs/inode.c):

void debugfs_remove(struct dentry *dentry)
{
	struct dentry *parent;
	int ret;

	if (IS_ERR_OR_NULL(dentry))
		return;

	parent = dentry->d_parent; 		// UAF occurs here
	inode_lock(d_inode(parent));
	ret = __debugfs_remove(dentry, parent);
	inode_unlock(d_inode(parent));
	if (!ret)
		simple_release_fs(&debugfs_mount, &debugfs_mount_count);
}
../..
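
For context, the typical create/remove pairing of this API looks like the following
(an illustrative sketch only; the parent dentry, data pointer and foo_fops shown
here are placeholders, not taken from the reproducer):

	struct dentry *d;

	/* Create a file in debugfs under some parent directory ... */
	d = debugfs_create_file("foo", 0444, parent, data, &foo_fops);
	../..
	/* ... and tear it down again later. */
	debugfs_remove(d);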


0x03 - Crash report
===

hpet1: lost 1 rtc interrupts
==================================================================
BUG: KASAN: use-after-free in debugfs_remove+0xfd/0x120 fs/debugfs/inode.c:687
Read of size 8 at addr ffff8880275e70c0 by task kworker/1:1/24

CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.19.83 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events __blk_release_queue
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x138/0x22e lib/dump_stack.c:113
 print_address_description+0x67/0x26a mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report mm/kasan/report.c:412 [inline]
 kasan_report.cold.7+0x241/0x2fe mm/kasan/report.c:396
 debugfs_remove+0xfd/0x120 fs/debugfs/inode.c:687
 blk_trace_free+0x31/0x130 kernel/trace/blktrace.c:312
 blk_trace_cleanup kernel/trace/blktrace.c:339 [inline]
 __blk_trace_remove+0x70/0xa0 kernel/trace/blktrace.c:352
 blk_trace_shutdown+0x5e/0x80 kernel/trace/blktrace.c:752
 __blk_release_queue+0x21a/0x4b0 block/blk-sysfs.c:855
 process_one_work+0xbcd/0x1b50 kernel/workqueue.c:2153
 worker_thread+0x176/0x1200 kernel/workqueue.c:2296
 kthread+0x347/0x410 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

Allocated by task 4702:			// <--- where it has been allocated
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc mm/kasan/kasan.c:553 [inline]
 kasan_kmalloc+0xbf/0xe0 mm/kasan/kasan.c:531
 slab_post_alloc_hook mm/slab.h:445 [inline]
 slab_alloc_node mm/slub.c:2706 [inline]
 slab_alloc mm/slub.c:2714 [inline]
 kmem_cache_alloc+0xef/0x290 mm/slub.c:2719
 __d_alloc+0xbe/0xc60 fs/dcache.c:1610
 d_alloc+0x8f/0x300 fs/dcache.c:1694
 d_alloc_parallel+0x149/0x1c70 fs/dcache.c:2441
 __lookup_slow+0x1e6/0x510 fs/namei.c:1654
 lookup_one_len+0x1bd/0x200 fs/namei.c:2543
 start_creating fs/debugfs/inode.c:313 [inline]
 start_creating+0xbf/0x1f0 fs/debugfs/inode.c:289
 __debugfs_create_file+0x5e/0x3f0 fs/debugfs/inode.c:352
 do_blk_trace_setup+0x427/0xc80 kernel/trace/blktrace.c:528
 __blk_trace_setup+0xbc/0x170 kernel/trace/blktrace.c:577
 blk_trace_ioctl+0x161/0x2d0 kernel/trace/blktrace.c:716
 blkdev_ioctl+0x3fa/0x1fa0 block/ioctl.c:587
 block_ioctl+0xe9/0x130 fs/block_dev.c:1891
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:501 [inline]
 do_vfs_ioctl+0x1c6/0x16f0 fs/ioctl.c:688
 ksys_ioctl+0x9b/0xc0 fs/ioctl.c:705
 __do_sys_ioctl fs/ioctl.c:712 [inline]
 __se_sys_ioctl fs/ioctl.c:710 [inline]
 __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:710
 do_syscall_64+0x167/0x6c0 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2115:					// <--- where it has been freed
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1371 [inline]
 slab_free_freelist_hook mm/slub.c:1398 [inline]
 slab_free mm/slub.c:2953 [inline]
 kmem_cache_free+0xc8/0x2e0 mm/slub.c:2969
 __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch kernel/rcu/tree.c:2584 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2897 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2864 [inline]
 rcu_process_callbacks+0x133a/0x3350 kernel/rcu/tree.c:2881
 __do_softirq+0x2ae/0xa09 kernel/softirq.c:292

The buggy address belongs to the object at ffff8880275e7080
 which belongs to the cache dentry of size 288			// <--- the object's cache
The buggy address is located 64 bytes inside of
 288-byte region [ffff8880275e7080, ffff8880275e71a0)
The buggy address belongs to the page:
page:ffffea00009d7980 count:1 mapcount:0 mapping:ffff888035a5cc00 index:0xffff8880275e74a0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 ffffea00009d5f08 ffffea00009d2088 ffff888035a5cc00
raw: ffff8880275e74a0 000000000017000a 00000001ffffffff 0000000000000000
hpet1: lost 1 rtc interrupts
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880275e6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880275e7000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880275e7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff8880275e7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880275e7180: fb fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00
==================================================================
Comment 1 zhangbl1105 2020-03-11 08:21:32 UTC
Does someone know how to reproduce this issue? thanks.
Comment 2 Luis Chamberlain 2020-04-02 16:25:13 UTC
Thanks for the bug report! However, the CVE assigned to this is incorrect in that it states the severity of this bug report lies in the core kernel debugfs_remove(), i.e. an intrinsic use-after-free in the core debugfs code in the kernel. We would like to dispute this CVE. Patches are being reviewed upstream; the patch series and the respective reproducer:

https://lkml.kernel.org/r/20200402000002.7442-1-mcgrof@kernel.org               
https://github.com/mcgrof/break-blktrace
Comment 3 Eric Sandeen 2020-04-15 17:40:29 UTC
I'd also be interested to know if Tristan's original bug was hit while using the loop driver or some other block device.

Frankly, bugs like this with so little information are very difficult to work with - it's tough to reverse-engineer what may have happened when that information is not provided.
Comment 4 Luis Chamberlain 2020-04-21 02:02:45 UTC
To remove clutter from the commit log that will be used, I will provide below a bit of explanation of the crash, and a reference back to this bug for those curious about the full details of the analysis.

This uses a new crash, but the same applies to v4.19.83, where the bug was originally reported.
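
For reference, the userspace side of what the annotated trace below exercises boils
down to cycling loop device creation, blktrace setup and loop device removal as fast
as possible. A minimal sketch of that loop (my own approximation, loosely modeled on
the break-blktrace reproducer linked in comment 2; note that the LOOP_CTL_DEL label
used in the trace corresponds to the LOOP_CTL_REMOVE ioctl, and error handling is
omitted):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/loop.h>
#include <linux/blktrace_api.h>

int main(void)
{
	int ctl = open("/dev/loop-control", O_RDWR);

	for (;;) {
		struct blk_user_trace_setup buts;
		int nr, fd;

		nr = ioctl(ctl, LOOP_CTL_ADD, 0);	/* LOOP_CTL_ADD(loop0) */
		fd = open("/dev/loop0", O_RDONLY);

		memset(&buts, 0, sizeof(buts));
		buts.act_mask = 0xffff;
		buts.buf_size = 4096;
		buts.buf_nr = 4;
		ioctl(fd, BLKTRACESETUP, &buts);	/* BLKTRACE_SETUP(loop0) */

		close(fd);
		ioctl(ctl, LOOP_CTL_REMOVE, nr);	/* LOOP_CTL_DEL(loop0) */
	}
}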

    Below is the splat explained in a bit more detail, describing what is
    happening in userspace and in the kernel, along with a print of the CPU
    on which the code runs:
    
    load loopback module
    [   13.603371] == blk_mq_debugfs_register(12) start
    [   13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created
    [   13.604934] == blk_mq_debugfs_register(12) end
    [   13.627382] == blk_mq_debugfs_register(12) start
    [   13.628041] == blk_mq_debugfs_register(12) q->debugfs_dir created
    [   13.629240] == blk_mq_debugfs_register(12) end
    [   13.651667] == blk_mq_debugfs_register(12) start
    [   13.652836] == blk_mq_debugfs_register(12) q->debugfs_dir created
    [   13.655107] == blk_mq_debugfs_register(12) end
    [   13.684917] == blk_mq_debugfs_register(12) start
    [   13.687876] == blk_mq_debugfs_register(12) q->debugfs_dir created
    [   13.691588] == blk_mq_debugfs_register(13) end
    [   13.707320] == blk_mq_debugfs_register(13) start
    [   13.707863] == blk_mq_debugfs_register(13) q->debugfs_dir created
    [   13.708856] == blk_mq_debugfs_register(13) end
    [   13.735623] == blk_mq_debugfs_register(13) start
    [   13.736656] == blk_mq_debugfs_register(13) q->debugfs_dir created
    [   13.738411] == blk_mq_debugfs_register(13) end
    [   13.763326] == blk_mq_debugfs_register(13) start
    [   13.763972] == blk_mq_debugfs_register(13) q->debugfs_dir created
    [   13.765167] == blk_mq_debugfs_register(13) end
    [   13.779510] == blk_mq_debugfs_register(13) start
    [   13.780522] == blk_mq_debugfs_register(13) q->debugfs_dir created
    [   13.782338] == blk_mq_debugfs_register(13) end
    [   13.783521] loop: module loaded

    LOOP_CTL_DEL(loop0) #1
    [   13.803550] = __blk_release_queue(4) start
    [   13.807772] == blk_trace_shutdown(4) start
    [   13.810749] == blk_trace_shutdown(4) end
    [   13.813437] = __blk_release_queue(4) calling blk_mq_debugfs_unregister()
    [   13.817593] ==== blk_mq_debugfs_unregister(4) begin
    [   13.817621] ==== blk_mq_debugfs_unregister(4) debugfs_remove_recursive(q->debugfs_dir)
    [   13.821203] ==== blk_mq_debugfs_unregister(4) end q->debugfs_dir is NULL
    [   13.826166] = __blk_release_queue(4) blk_mq_debugfs_unregister() end
    [   13.832992] = __blk_release_queue(4) end
    
    LOOP_CTL_ADD(loop0) #1
    [   13.843742] == blk_mq_debugfs_register(7) start
    [   13.845569] == blk_mq_debugfs_register(7) q->debugfs_dir created
    [   13.848628] == blk_mq_debugfs_register(7) end
    
    BLKTRACE_SETUP(loop0) #1
    [   13.850924] == blk_trace_ioctl(7, BLKTRACESETUP) start
    [   13.852852] === do_blk_trace_setup(7) start
    [   13.854580] === do_blk_trace_setup(7) creating directory
    [   13.856620] === do_blk_trace_setup(7) using what debugfs_lookup() gave
    [   13.860635] === do_blk_trace_setup(7) end with ret: 0
    [   13.862615] == blk_trace_ioctl(7, BLKTRACESETUP) end
    
    LOOP_CTL_DEL(loop0) #2
    [   13.883304] = __blk_release_queue(7) start
    [   13.885324] == blk_trace_shutdown(7) start
    [   13.887197] == blk_trace_shutdown(7) calling __blk_trace_remove()
    [   13.889807] == __blk_trace_remove(7) start
    [   13.891669] === blk_trace_cleanup(7) start
    [   13.911656] ====== blk_trace_free(7) start

    This is a scheduled __blk_release_queue(). Note that at this point
    we know that the old device is gone as the gendisk removal is
    synchronous.
    
    LOOP_CTL_ADD(loop0) #2
    [   13.912709] == blk_mq_debugfs_register(2) start
    
    This is a *new* device, as we know the last one was removed synchronously.
    
    ---> From LOOP_CTL_DEL(loop0) #2
    [   13.915887] ====== blk_trace_free(7) end
    
    ---> From LOOP_CTL_ADD(loop0) #2
    [   13.918359] debugfs: Directory 'loop0' with parent 'block' already present!
    
    Above we have a case where, upon LOOP_CTL_ADD(loop0) #2, the
    debugfs_create_dir() start_creating() call will have found an existing dentry
    and returned ERR_PTR(-EEXIST), and so q->debugfs_dir will be initialized to
    NULL. This clash happens because an asynchronous __blk_release_queue() has
    been scheduled from the above LOOP_CTL_DEL(loop0) #2 call, and it has not yet
    removed the old loop0 debugfs directory.
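
    As a simplified sketch of the clash (my approximation, not the exact kernel
    code), the registration amounts to an unchecked directory creation:

        /*
         * The stale "loop0" directory from the pending removal still exists,
         * so this create fails, and the result is never checked by the caller.
         */
        q->debugfs_dir = debugfs_create_dir(kobject_name(q->kobj.parent),
                                            blk_debugfs_root);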
    
    [   13.926433] == blk_mq_debugfs_register(2) q->debugfs_dir created
    
    This is a huge assumption, since we never check for errors on
    blk_mq_debugfs_register(), and so the child files we need to
    create *can* simply be created using NULL as the parent, in
    which case debugfs would use debugfs_mount->mnt_root as the parent.
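
    In other words, with q->debugfs_dir left NULL, the subsequent per-queue file
    creation degrades to something like the following (illustrative only; the file
    name and fops here are placeholders):

        /* parent is NULL here, so debugfs silently falls back to using the
         * debugfs root (debugfs_mount->mnt_root) as the parent. */
        debugfs_create_file("state", 0400, q->debugfs_dir, q, &state_fops);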
    
    [   13.930373] == blk_mq_debugfs_register(2) end
    
    BLKTRACE_SETUP(loop0) #2
    [   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
    [   13.936758] === do_blk_trace_setup(2) start
    [   13.938944] === do_blk_trace_setup(2) creating directory
    [   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave

    Here debugfs_lookup() will have found the old debugfs directory, even
    though q->debugfs_dir was empty when initializing this device. So the
    files end up being created where they should have been.
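
    The lookup-then-create pattern in do_blk_trace_setup() that makes this possible
    looks roughly like this (paraphrased and slightly simplified from
    kernel/trace/blktrace.c of that era):

        /*
         * Reuse an existing debugfs directory of the same name if one is
         * found; here the "existing" directory is the previous device's,
         * still pending its scheduled recursive removal.
         */
        dir = debugfs_lookup(buts->name, blk_debugfs_root);
        if (!dir)
            bt->dir = dir = debugfs_create_dir(buts->name, blk_debugfs_root);
        if (!dir)
            goto err;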
    
    ---> From LOOP_CTL_DEL(loop0) #2
    [   13.971046] === blk_trace_cleanup(7) end
    [   13.973175] == __blk_trace_remove(7) end
    [   13.975352] == blk_trace_shutdown(7) end
    [   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
    [   13.980645] ==== blk_mq_debugfs_unregister(7) begin
    [   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
    
    This is a continuation of the old scheduled __blk_release_queue(). It
    is using a different request_queue structure than the one the currently
    active blktrace setup is using.
    
    Here, the dentry on top of which the blktrace setup files were created
    is now gone. In fact, since debugfs_remove_recursive() is used, *all* of
    those blktrace debugfs files should have been removed as well, including
    the relay. This is despite the fact that the removal and the blktrace
    setup are working on two different request_queue structures.
    
    [   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
    [   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
    [   13.993155] = __blk_release_queue(7) end
    
    The scheduled __blk_release_queue() finally completes.
    
    ---> From BLKTRACE_SETUP(loop0) #2
    [   13.995928] === do_blk_trace_setup(2) end with ret: 0
    [   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
    
    At this point the old blktrace *setup* is complete, though, and recall
    that the old dentry is still in use. But we just had our old device
    removed, and since debugfs_remove_recursive() was used, so are all of its
    files. This request_queue, however, is still active.

    LOOP_CTL_DEL(loop0) #3
    [   14.035119] = __blk_release_queue(2) start
    [   14.036925] == blk_trace_shutdown(2) start
    [   14.038518] == blk_trace_shutdown(2) calling __blk_trace_remove()
    [   14.040829] == __blk_trace_remove(2) start
    [   14.042413] === blk_trace_cleanup(2) start
    
    This removal will clean up the blktrace setup. Note the trap though:
    the blktrace setup will already have been cleaned up, since the
    scheduled removal from instance #2 removed the files on the same
    debugfs path.
    
    The __blk_release_queue() is scheduled and asynchronous, so it can
    continue later, and that is where the crash happens, on CPU 2.
    
    LOOP_CTL_ADD(loop0) #3
    [   14.072522] == blk_mq_debugfs_register(6) start
    
    ---> From LOOP_CTL_DEL(loop0) #3
    [   14.075151] ====== blk_trace_free(2) start
    
    ---> From LOOP_CTL_ADD(loop0) #3
    [   14.075882] == blk_mq_debugfs_register(6) q->debugfs_dir created
    
    ---> From LOOP_CTL_DEL(loop0) #3
    The panic happens on CPU 2 when we continue with the old scheduled
    removal of the request_queue, which will run into a blktrace setup
    with NULL pointers.
    
    [   14.078624] BUG: kernel NULL pointer dereference, address: 00000000000000a0
    [   14.084332] == blk_mq_debugfs_register(6) end
    [   14.086971] #PF: supervisor write access in kernel mode
    [   14.086974] #PF: error_code(0x0002) - not-present page
    [   14.086977] PGD 0 P4D 0
    [   14.086984] Oops: 0002 [#1] SMP NOPTI
    [   14.086990] CPU: 2 PID: 287 Comm: kworker/2:2 Tainted: G            E 5.6.0-next-20200403+ #54
    [   14.086991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    [   14.087002] Workqueue: events __blk_release_queue
    [   14.087011] RIP: 0010:down_write+0x15/0x40
    [   14.090300] == blk_trace_ioctl(6, BLKTRACESETUP) start
    [   14.093277] Code: eb ca e8 3e 34 8d ff cc cc cc cc cc cc cc cc cc cc
    cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00
    00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
    [   14.093280] RSP: 0018:ffffc28a00533da8 EFLAGS: 00010246
    [   14.093284] RAX: 0000000000000000 RBX: ffff9f7a24d07980 RCX: ffffff8100000000
    [   14.093286] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
    [   14.093287] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000019
    [   14.093289] R10: 0000000000000774 R11: 0000000000000000 R12: 0000000000000000
    [   14.093291] R13: ffff9f7a2fab0400 R14: ffff9f7a21dd1140 R15: 00000000000000a0
    [   14.093294] FS:  0000000000000000(0000) GS:ffff9f7a2fa80000(0000) knlGS:0000000000000000
    [   14.093296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [   14.093298] CR2: 00000000000000a0 CR3: 00000004293d2003 CR4: 0000000000360ee0
    [   14.093307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [   14.093308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [   14.093310] Call Trace:
    [   14.093324]  simple_recursive_removal+0x4e/0x2e0
    [   14.093330]  ? debugfs_remove+0x60/0x60
    [   14.093334]  debugfs_remove+0x40/0x60
    [   14.093339]  blk_trace_free+0x20/0x70
    [   14.093346]  __blk_trace_remove+0x54/0x90
    [   14.096704] === do_blk_trace_setup(6) start
    [   14.098534]  blk_trace_shutdown+0x74/0x80
    [   14.100958] === do_blk_trace_setup(6) creating directory
    [   14.104575]  __blk_release_queue+0xbe/0x160
    [   14.104580]  process_one_work+0x1b4/0x380
    [   14.104585]  worker_thread+0x50/0x3c0
    [   14.104589]  kthread+0xf9/0x130
    [   14.104593]  ? process_one_work+0x380/0x380
    [   14.104596]  ? kthread_park+0x90/0x90
    [   14.104599]  ret_from_fork+0x1f/0x40
    [   14.104603] Modules linked in: loop(E) xfs(E) libcrc32c(E)
    crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E)
    serio_raw(E) aesni_intel(E) glue_helper(E) virtio_balloon(E) evdev(E)
    crypto_simd(E) pcspkr(E) cryptd(E) i6300esb(E) button(E) ip_tables(E)
    x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E)
    jbd2(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E)
    ata_generic(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) nvme(E) libata(E)
    crc32c_intel(E) usbcore(E) psmouse(E) nvme_core(E) virtio_pci(E)
    scsi_mod(E) virtio_ring(E) t10_pi(E) virtio(E) i2c_piix4(E) floppy(E)
    [   14.107400] === do_blk_trace_setup(6) using what debugfs_lookup() gave
    [   14.108939] CR2: 00000000000000a0
    [   14.110589] === do_blk_trace_setup(6) end with ret: 0
    [   14.111592] ---[ end trace 7a783b33b9614db9 ]---
    
    The root cause of this issue is that debugfs_lookup() can find a
    previous incarnation's directory of the same name which is about to be
    removed by scheduled work that has not yet run. When that happens, the
    files are removed from underneath the nose of blktrace, and when it
    comes time to clean up, those dentries are long gone because of the
    scheduled removal.
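
    For what it is worth, the direction of the upstream fixes (see comment 5
    below) is to stop depending on a name-based debugfs_lookup() here and to
    instead use the dentry already tracked on the request_queue, along the
    lines of (a sketch of the idea only, not the final patch):

        /*
         * Use the queue's own debugfs dir, so a stale directory of the same
         * name left behind by a previous device incarnation can no longer
         * be picked up.
         */
        bt->dir = q->debugfs_dir;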
Comment 5 Luis Chamberlain 2020-06-22 12:15:53 UTC
The fixes for this is now queued up on the block for-next branch, on its way for v5.9. There were quite a bit of scattered fixes required for this, if you are looking to backport this to your kernel be sure to include starting from "blktrace: break out of blktrace setup on concurrent calls" up to "blktrace: ensure our debugfs dir exists". The actual fix for this particular crash however is handled by the patch titled, "blktrace: fix debugfs use after free"