LINUX KERNEL 4.19.83 - debugfs_remove use-after-free

0x01 - Introduction
===

# Product: Linux Kernel
# Version: 4.19.83 (and probably other versions)
# Bug: UAF (Read)
# Tested on: GNU/Linux Debian 9 x86_64

0x02 - Details
===

There is a UAF read in the debugfs_remove() function, which is used to remove a file or directory in debugfs that was previously created with a call to another debugfs function such as debugfs_create_file(). (A minimal sketch of this create/remove pairing follows the crash report below.)

Code analysis (fs/debugfs/inode.c):

void debugfs_remove(struct dentry *dentry)
{
        struct dentry *parent;
        int ret;

        if (IS_ERR_OR_NULL(dentry))
                return;

        parent = dentry->d_parent;      // UAF occurs here

        inode_lock(d_inode(parent));
        ret = __debugfs_remove(dentry, parent);
        inode_unlock(d_inode(parent));
        if (!ret)
                simple_release_fs(&debugfs_mount, &debugfs_mount_count);
}

../..

0x03 - Crash report
===

hpet1: lost 1 rtc interrupts
==================================================================
BUG: KASAN: use-after-free in debugfs_remove+0xfd/0x120 fs/debugfs/inode.c:687
Read of size 8 at addr ffff8880275e70c0 by task kworker/1:1/24

CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.19.83 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events __blk_release_queue
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x138/0x22e lib/dump_stack.c:113
 print_address_description+0x67/0x26a mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report mm/kasan/report.c:412 [inline]
 kasan_report.cold.7+0x241/0x2fe mm/kasan/report.c:396
 debugfs_remove+0xfd/0x120 fs/debugfs/inode.c:687
 blk_trace_free+0x31/0x130 kernel/trace/blktrace.c:312
 blk_trace_cleanup kernel/trace/blktrace.c:339 [inline]
 __blk_trace_remove+0x70/0xa0 kernel/trace/blktrace.c:352
 blk_trace_shutdown+0x5e/0x80 kernel/trace/blktrace.c:752
 __blk_release_queue+0x21a/0x4b0 block/blk-sysfs.c:855
 process_one_work+0xbcd/0x1b50 kernel/workqueue.c:2153
 worker_thread+0x176/0x1200 kernel/workqueue.c:2296
 kthread+0x347/0x410 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

Allocated by task 4702:    // <--- where it has been allocated
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc mm/kasan/kasan.c:553 [inline]
 kasan_kmalloc+0xbf/0xe0 mm/kasan/kasan.c:531
 slab_post_alloc_hook mm/slab.h:445 [inline]
 slab_alloc_node mm/slub.c:2706 [inline]
 slab_alloc mm/slub.c:2714 [inline]
 kmem_cache_alloc+0xef/0x290 mm/slub.c:2719
 __d_alloc+0xbe/0xc60 fs/dcache.c:1610
 d_alloc+0x8f/0x300 fs/dcache.c:1694
 d_alloc_parallel+0x149/0x1c70 fs/dcache.c:2441
 __lookup_slow+0x1e6/0x510 fs/namei.c:1654
 lookup_one_len+0x1bd/0x200 fs/namei.c:2543
 start_creating fs/debugfs/inode.c:313 [inline]
 start_creating+0xbf/0x1f0 fs/debugfs/inode.c:289
 __debugfs_create_file+0x5e/0x3f0 fs/debugfs/inode.c:352
 do_blk_trace_setup+0x427/0xc80 kernel/trace/blktrace.c:528
 __blk_trace_setup+0xbc/0x170 kernel/trace/blktrace.c:577
 blk_trace_ioctl+0x161/0x2d0 kernel/trace/blktrace.c:716
 blkdev_ioctl+0x3fa/0x1fa0 block/ioctl.c:587
 block_ioctl+0xe9/0x130 fs/block_dev.c:1891
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:501 [inline]
 do_vfs_ioctl+0x1c6/0x16f0 fs/ioctl.c:688
 ksys_ioctl+0x9b/0xc0 fs/ioctl.c:705
 __do_sys_ioctl fs/ioctl.c:712 [inline]
 __se_sys_ioctl fs/ioctl.c:710 [inline]
 __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:710
 do_syscall_64+0x167/0x6c0 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2115:    // <--- where it has been freed
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
 slab_free_hook mm/slub.c:1371 [inline]
 slab_free_freelist_hook mm/slub.c:1398 [inline]
 slab_free mm/slub.c:2953 [inline]
 kmem_cache_free+0xc8/0x2e0 mm/slub.c:2969
 __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch kernel/rcu/tree.c:2584 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2897 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2864 [inline]
 rcu_process_callbacks+0x133a/0x3350 kernel/rcu/tree.c:2881
 __do_softirq+0x2ae/0xa09 kernel/softirq.c:292

The buggy address belongs to the object at ffff8880275e7080
 which belongs to the cache dentry of size 288    // <--- the object's cache
The buggy address is located 64 bytes inside of
 288-byte region [ffff8880275e7080, ffff8880275e71a0)
The buggy address belongs to the page:
page:ffffea00009d7980 count:1 mapcount:0 mapping:ffff888035a5cc00 index:0xffff8880275e74a0 compound_mapcount: 0
flags: 0x100000000008100(slab|head)
raw: 0100000000008100 ffffea00009d5f08 ffffea00009d2088 ffff888035a5cc00
raw: ffff8880275e74a0 000000000017000a 00000001ffffffff 0000000000000000
hpet1: lost 1 rtc interrupts
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880275e6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880275e7000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880275e7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff8880275e7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880275e7180: fb fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00
==================================================================
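For context, a minimal sketch of the create/remove pairing debugfs_remove() serves, so the dentry lifetime the UAF violates is explicit. The module, directory and file names here are made up for illustration, and the v4.19-era API is assumed (debugfs_create_u32() still returns a dentry there):

#include <linux/module.h>
#include <linux/debugfs.h>

static struct dentry *example_dir;      /* hypothetical <debugfs>/uaf_example */
static struct dentry *example_file;     /* hypothetical .../uaf_example/value */
static u32 example_value;

static int __init example_init(void)
{
        example_dir = debugfs_create_dir("uaf_example", NULL);
        if (IS_ERR_OR_NULL(example_dir))
                return -ENODEV;

        example_file = debugfs_create_u32("value", 0644, example_dir,
                                          &example_value);
        return 0;
}

static void __exit example_exit(void)
{
        /*
         * debugfs_remove() dereferences dentry->d_parent; if the dentry
         * has already been torn down elsewhere, as in the report above,
         * this is the read that KASAN flags.
         */
        debugfs_remove(example_file);
        debugfs_remove(example_dir);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

In the blktrace case the same pairing is split between do_blk_trace_setup() (create) and blk_trace_free() (remove), as the allocated/freed stacks above show.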
Does anyone know how to reproduce this issue? Thanks.
Thanks for the bug report! However, the CVE assigned to this is incorrect in stating that the severity of this bug lies in the core kernel debugfs_remove(), i.e. that there is an intrinsic use-after-free in the core debugfs code. We would like to dispute this CVE. Patches are being reviewed upstream, and the respective reproducer is available:

https://lkml.kernel.org/r/20200402000002.7442-1-mcgrof@kernel.org
https://github.com/mcgrof/break-blktrace
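For readers who do not want to pull the repository, the kind of churn such a reproducer exercises looks roughly like the userspace sketch below. This is only an illustration of the racing ioctl sequence discussed later in this thread, not the actual break-blktrace code; the device path, iteration count and buffer sizes are arbitrary, and it needs root plus a mounted debugfs:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKTRACESETUP */
#include <linux/blktrace_api.h> /* struct blk_user_trace_setup */
#include <linux/loop.h>         /* LOOP_CTL_ADD, LOOP_CTL_DEL */

int main(void)
{
        struct blk_user_trace_setup buts;
        int ctl, fd, i;

        ctl = open("/dev/loop-control", O_RDWR);
        if (ctl < 0) {
                perror("open /dev/loop-control");
                return 1;
        }

        for (i = 0; i < 100; i++) {
                /* LOOP_CTL_ADD(loop0): create the device */
                if (ioctl(ctl, LOOP_CTL_ADD, 0) < 0 && errno != EEXIST)
                        perror("LOOP_CTL_ADD");

                fd = open("/dev/loop0", O_RDONLY);
                if (fd >= 0) {
                        /* BLKTRACESETUP(loop0): creates the blktrace debugfs files */
                        memset(&buts, 0, sizeof(buts));
                        buts.act_mask = 0xffff;
                        buts.buf_size = 4096;
                        buts.buf_nr = 4;
                        if (ioctl(fd, BLKTRACESETUP, &buts) < 0)
                                perror("BLKTRACESETUP");
                        close(fd);
                }

                /*
                 * LOOP_CTL_DEL(loop0): gendisk removal is synchronous, but
                 * __blk_release_queue() (which tears down the debugfs dir)
                 * runs later from a workqueue; that is the window the race
                 * needs.
                 */
                if (ioctl(ctl, LOOP_CTL_DEL, 0) < 0)
                        perror("LOOP_CTL_DEL");
        }

        close(ctl);
        return 0;
}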
I'd also be interested to know if Tristan's original bug was hit while using the loop driver or some other block device. Frankly, bugs like this with so little information are very difficult to work with - it's tough to reverse-engineer what may have happened when that information is not provided.
To remove clutter from the commit log that will be used, I will provide below a bit of explanation of the crash, and provide a reference to this thread for those curious about the elaborate details of the analysis. This uses a new crash, but the same applies to v4.19.83, where the bug was originally reported. Below is the splat explained in a bit more detail, showing what is happening in userspace and in the kernel, along with a print of the CPU the code runs on:

load loopback module

[ 13.603371] == blk_mq_debugfs_register(12) start
[ 13.604040] == blk_mq_debugfs_register(12) q->debugfs_dir created
[ 13.604934] == blk_mq_debugfs_register(12) end
[ 13.627382] == blk_mq_debugfs_register(12) start
[ 13.628041] == blk_mq_debugfs_register(12) q->debugfs_dir created
[ 13.629240] == blk_mq_debugfs_register(12) end
[ 13.651667] == blk_mq_debugfs_register(12) start
[ 13.652836] == blk_mq_debugfs_register(12) q->debugfs_dir created
[ 13.655107] == blk_mq_debugfs_register(12) end
[ 13.684917] == blk_mq_debugfs_register(12) start
[ 13.687876] == blk_mq_debugfs_register(12) q->debugfs_dir created
[ 13.691588] == blk_mq_debugfs_register(13) end
[ 13.707320] == blk_mq_debugfs_register(13) start
[ 13.707863] == blk_mq_debugfs_register(13) q->debugfs_dir created
[ 13.708856] == blk_mq_debugfs_register(13) end
[ 13.735623] == blk_mq_debugfs_register(13) start
[ 13.736656] == blk_mq_debugfs_register(13) q->debugfs_dir created
[ 13.738411] == blk_mq_debugfs_register(13) end
[ 13.763326] == blk_mq_debugfs_register(13) start
[ 13.763972] == blk_mq_debugfs_register(13) q->debugfs_dir created
[ 13.765167] == blk_mq_debugfs_register(13) end
[ 13.779510] == blk_mq_debugfs_register(13) start
[ 13.780522] == blk_mq_debugfs_register(13) q->debugfs_dir created
[ 13.782338] == blk_mq_debugfs_register(13) end
[ 13.783521] loop: module loaded

LOOP_CTL_DEL(loop0) #1

[ 13.803550] = __blk_release_queue(4) start
[ 13.807772] == blk_trace_shutdown(4) start
[ 13.810749] == blk_trace_shutdown(4) end
[ 13.813437] = __blk_release_queue(4) calling blk_mq_debugfs_unregister()
[ 13.817593] ==== blk_mq_debugfs_unregister(4) begin
[ 13.817621] ==== blk_mq_debugfs_unregister(4) debugfs_remove_recursive(q->debugfs_dir)
[ 13.821203] ==== blk_mq_debugfs_unregister(4) end q->debugfs_dir is NULL
[ 13.826166] = __blk_release_queue(4) blk_mq_debugfs_unregister() end
[ 13.832992] = __blk_release_queue(4) end

LOOP_CTL_ADD(loop0) #1

[ 13.843742] == blk_mq_debugfs_register(7) start
[ 13.845569] == blk_mq_debugfs_register(7) q->debugfs_dir created
[ 13.848628] == blk_mq_debugfs_register(7) end

BLKTRACE_SETUP(loop0) #1

[ 13.850924] == blk_trace_ioctl(7, BLKTRACESETUP) start
[ 13.852852] === do_blk_trace_setup(7) start
[ 13.854580] === do_blk_trace_setup(7) creating directory
[ 13.856620] === do_blk_trace_setup(7) using what debugfs_lookup() gave
[ 13.860635] === do_blk_trace_setup(7) end with ret: 0
[ 13.862615] == blk_trace_ioctl(7, BLKTRACESETUP) end

LOOP_CTL_DEL(loop0) #2

[ 13.883304] = __blk_release_queue(7) start
[ 13.885324] == blk_trace_shutdown(7) start
[ 13.887197] == blk_trace_shutdown(7) calling __blk_trace_remove()
[ 13.889807] == __blk_trace_remove(7) start
[ 13.891669] === blk_trace_cleanup(7) start
[ 13.911656] ====== blk_trace_free(7) start

This is a scheduled __blk_release_queue(). Note that at this point we know that the old device is gone, as the gendisk removal is synchronous.

LOOP_CTL_ADD(loop0) #2

[ 13.912709] == blk_mq_debugfs_register(2) start

This is a *new* device, as we know the last one was removed synchronously.
---> From LOOP_CTL_DEL(loop0) #2
[ 13.915887] ====== blk_trace_free(7) end

---> From LOOP_CTL_ADD(loop0) #2
[ 13.918359] debugfs: Directory 'loop0' with parent 'block' already present!

Above we have a case where, upon LOOP_CTL_ADD(loop0) #2, the debugfs_create_dir() start_creating() call will have found an existing dentry and returned ERR_PTR(-EEXIST), and so q->debugfs_dir will be initialized to NULL. This clash happens because an asynchronous __blk_release_queue() has been scheduled from the above LOOP_CTL_DEL(loop0) #2 call, and it has not yet removed the old loop0 debugfs directory.

[ 13.926433] == blk_mq_debugfs_register(2) q->debugfs_dir created

This is a huge assumption, since we never check for errors on blk_mq_debugfs_register(), and so the child files we need to create *can* simply be created using NULL as the parent, in which case debugfs would use debugfs_mount->mnt_root as the parent.

[ 13.930373] == blk_mq_debugfs_register(2) end

BLKTRACE_SETUP(loop0) #2

[ 13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
[ 13.936758] === do_blk_trace_setup(2) start
[ 13.938944] === do_blk_trace_setup(2) creating directory
[ 13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave

Here debugfs_lookup() will have found the old debugfs directory, even though q->debugfs_dir was empty when this device was initialized. So the files do end up being created where they should have been.

---> From LOOP_CTL_DEL(loop0) #2
[ 13.971046] === blk_trace_cleanup(7) end
[ 13.973175] == __blk_trace_remove(7) end
[ 13.975352] == blk_trace_shutdown(7) end
[ 13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
[ 13.980645] ==== blk_mq_debugfs_unregister(7) begin
[ 13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)

This is a continuation of the old scheduled __blk_release_queue(). It is using a different request_queue structure from the one the currently active blktrace setup is using. At this point the dentry that the blktrace setup files were created on top of is gone. In fact, since debugfs_remove_recursive() is used, *all* of those blktrace debugfs files should have been removed as well, including the relay. This is despite the fact that the removal and the blktrace setup are working on two different request_queue structures.

[ 13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
[ 13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
[ 13.993155] = __blk_release_queue(7) end

The scheduled __blk_release_queue() finally completes.

---> From BLKTRACE_SETUP(loop0) #2
[ 13.995928] === do_blk_trace_setup(2) end with ret: 0
[ 13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end

At this point the blktrace *setup* is complete, though, and recall that it is still using the old dentry. But we just had our old device removed, and since debugfs_remove_recursive() was used, so are all of its files. This request_queue, however, is still active.

LOOP_CTL_DEL(loop0) #3

[ 14.035119] = __blk_release_queue(2) start
[ 14.036925] == blk_trace_shutdown(2) start
[ 14.038518] == blk_trace_shutdown(2) calling __blk_trace_remove()
[ 14.040829] == __blk_trace_remove(2) start
[ 14.042413] === blk_trace_cleanup(2) start

This removal will clean up the blktrace setup. Note the trap, though: the blktrace setup will already have been cleaned up, since the scheduled removal from instance #2 removed the files on the same debugfs path.
__blk_release_queue() is scheduled and asynchronous, so it can continue later, and that is where the crash happens, on CPU 2.

LOOP_CTL_ADD(loop0) #3

[ 14.072522] == blk_mq_debugfs_register(6) start

---> From LOOP_CTL_DEL(loop0) #3
[ 14.075151] ====== blk_trace_free(2) start

---> From LOOP_CTL_ADD(loop0) #3
[ 14.075882] == blk_mq_debugfs_register(6) q->debugfs_dir created

---> From LOOP_CTL_DEL(loop0) #3

The panic happens on CPU 2 when we continue with the old scheduled removal of the request_queue, which runs into a blktrace setup with NULL pointers.

[ 14.078624] BUG: kernel NULL pointer dereference, address: 00000000000000a0
[ 14.084332] == blk_mq_debugfs_register(6) end
[ 14.086971] #PF: supervisor write access in kernel mode
[ 14.086974] #PF: error_code(0x0002) - not-present page
[ 14.086977] PGD 0 P4D 0
[ 14.086984] Oops: 0002 [#1] SMP NOPTI
[ 14.086990] CPU: 2 PID: 287 Comm: kworker/2:2 Tainted: G E 5.6.0-next-20200403+ #54
[ 14.086991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
[ 14.087002] Workqueue: events __blk_release_queue
[ 14.087011] RIP: 0010:down_write+0x15/0x40
[ 14.090300] == blk_trace_ioctl(6, BLKTRACESETUP) start
[ 14.093277] Code: eb ca e8 3e 34 8d ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89 45 08 5d
[ 14.093280] RSP: 0018:ffffc28a00533da8 EFLAGS: 00010246
[ 14.093284] RAX: 0000000000000000 RBX: ffff9f7a24d07980 RCX: ffffff8100000000
[ 14.093286] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
[ 14.093287] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000019
[ 14.093289] R10: 0000000000000774 R11: 0000000000000000 R12: 0000000000000000
[ 14.093291] R13: ffff9f7a2fab0400 R14: ffff9f7a21dd1140 R15: 00000000000000a0
[ 14.093294] FS:  0000000000000000(0000) GS:ffff9f7a2fa80000(0000) knlGS:0000000000000000
[ 14.093296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.093298] CR2: 00000000000000a0 CR3: 00000004293d2003 CR4: 0000000000360ee0
[ 14.093307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.093308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 14.093310] Call Trace:
[ 14.093324]  simple_recursive_removal+0x4e/0x2e0
[ 14.093330]  ? debugfs_remove+0x60/0x60
[ 14.093334]  debugfs_remove+0x40/0x60
[ 14.093339]  blk_trace_free+0x20/0x70
[ 14.093346]  __blk_trace_remove+0x54/0x90
[ 14.096704] === do_blk_trace_setup(6) start
[ 14.098534]  blk_trace_shutdown+0x74/0x80
[ 14.100958] === do_blk_trace_setup(6) creating directory
[ 14.104575]  __blk_release_queue+0xbe/0x160
[ 14.104580]  process_one_work+0x1b4/0x380
[ 14.104585]  worker_thread+0x50/0x3c0
[ 14.104589]  kthread+0xf9/0x130
[ 14.104593]  ? process_one_work+0x380/0x380
[ 14.104596]  ? kthread_park+0x90/0x90
[ 14.104599]  ret_from_fork+0x1f/0x40
[ 14.104603] Modules linked in: loop(E) xfs(E) libcrc32c(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) joydev(E) serio_raw(E) aesni_intel(E) glue_helper(E) virtio_balloon(E) evdev(E) crypto_simd(E) pcspkr(E) cryptd(E) i6300esb(E) button(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) ata_generic(E) uhci_hcd(E) ata_piix(E) ehci_hcd(E) nvme(E) libata(E) crc32c_intel(E) usbcore(E) psmouse(E) nvme_core(E) virtio_pci(E) scsi_mod(E) virtio_ring(E) t10_pi(E) virtio(E) i2c_piix4(E) floppy(E)
[ 14.107400] === do_blk_trace_setup(6) using what debugfs_lookup() gave
[ 14.108939] CR2: 00000000000000a0
[ 14.110589] === do_blk_trace_setup(6) end with ret: 0
[ 14.111592] ---[ end trace 7a783b33b9614db9 ]---

The root cause of this issue is that debugfs_lookup() can find a previous incarnation's directory of the same name, one which is about to be removed by work that has been scheduled but has not yet run. When that happens, the files are taken out from under blktrace's nose, and when it comes time to clean up, those dentries are long gone because of the scheduled removal.
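To make the name-clash mechanism concrete, here is a small, self-contained sketch, illustrative only and not one of the upstream fixes, that shows the same debugfs behaviour in isolation: a second debugfs_create_dir() with the same name fails, while debugfs_lookup() still hands back the old incarnation. The module and directory names are made up:

#include <linux/module.h>
#include <linux/debugfs.h>

static struct dentry *first;

static int __init clash_init(void)
{
        struct dentry *second, *found;

        first = debugfs_create_dir("clash_example", NULL);

        /*
         * A second creation of the same name fails (NULL on v4.19, an
         * ERR_PTR on newer kernels) and logs the "already present!"
         * warning seen in the splat above.
         */
        second = debugfs_create_dir("clash_example", NULL);
        pr_info("clash_example: second create failed: %d\n",
                IS_ERR_OR_NULL(second));

        /*
         * debugfs_lookup() still finds the *old* directory; this is the
         * stale dentry the blktrace setup ends up parenting its files on.
         */
        found = debugfs_lookup("clash_example", NULL);
        pr_info("clash_example: lookup returned old dir: %d\n",
                found == first);
        if (found)
                dput(found);

        return 0;
}

static void __exit clash_exit(void)
{
        debugfs_remove_recursive(first);
}

module_init(clash_init);
module_exit(clash_exit);
MODULE_LICENSE("GPL");

In the blktrace case the old incarnation belongs to a request_queue whose asynchronous __blk_release_queue() has not yet run, so a later debugfs_remove_recursive() on that old queue pulls the rug out from under the new setup.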
The fixes for this are now queued up on the block for-next branch, on their way to v5.9. Quite a few scattered fixes were required; if you are looking to backport this to your kernel, be sure to include everything starting from "blktrace: break out of blktrace setup on concurrent calls" up to "blktrace: ensure our debugfs dir exists". The actual fix for this particular crash, however, is handled by the patch titled "blktrace: fix debugfs use after free".