Bug 205711

Summary: Linux Kernel 5.4.x - __blk_add_trace use-after-free
Product: IO/Storage
Reporter: Tristan Madani (tristmd)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: RESOLVED CODE_FIX
Severity: normal
CC: jack, jeanleo8701, tom.leiming
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.4.0-rc2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: blktrace: Protect q->blktrace with RCU

Description Tristan Madani 2019-11-29 17:31:52 UTC
LINUX KERNEL 5.4.0-rc2 - __blk_add_trace use-after-free

0x01 - Introduction
===

# Product: Linux Kernel 
# Version: 5.4.0-rc2 (and probably other versions)
# Bug: UAF (Read)
# Tested on: GNU/Linux Debian 9 x86_64


0x02 - Details
===

There is a UAF (use-after-free) read in the __blk_add_trace function, which
fills out a blk_io_trace structure and places it in a per-CPU sub-buffer.

Code analysis (kernel/trace/blktrace.c):

static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
		     int op, int op_flags, u32 what, int error, int pdu_len,
		     void *pdu_data, union kernfs_node_id *cgid)
{
	struct task_struct *tsk = current;
	struct ring_buffer_event *event = NULL;
	struct ring_buffer *buffer = NULL;
	struct blk_io_trace *t;
	unsigned long flags = 0;
	unsigned long *sequence;
	pid_t pid;
	int cpu, pc = 0;
	bool blk_tracer = blk_tracer_enabled;
	ssize_t cgid_len = cgid ? sizeof(*cgid) : 0;

	if (unlikely(bt->trace_state != Blktrace_running && !blk_tracer)) // UAF read here
		return;

	what |= ddir_act[op_is_write(op) ? WRITE : READ];
	what |= MASK_TC_BIT(op_flags, SYNC);

[...]


0x03 - Crash report
===

BUG: KASAN: use-after-free in __blk_add_trace.constprop.27+0x801/0xbb0 kernel/trace/blktrace.c:228
Read of size 4 at addr ffff88803019be00 by task jbd2/sda-8/100

CPU: 1 PID: 100 Comm: jbd2/sda-8 Not tainted 5.4.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x75/0xae lib/dump_stack.c:113
 print_address_description.constprop.6+0x16/0x220 mm/kasan/report.c:374
 __kasan_report.cold.9+0x1a/0x40 mm/kasan/report.c:506
 kasan_report+0xe/0x20 mm/kasan/common.c:634
 __blk_add_trace.constprop.27+0x801/0xbb0 kernel/trace/blktrace.c:228
 blk_add_trace_bio.isra.19+0xb1/0x100 kernel/trace/blktrace.c:865
 trace_block_bio_queue include/trace/events/block.h:357 [inline]
 generic_make_request_checks+0xd05/0x1300 block/blk-core.c:966
 generic_make_request+0x7e/0x910 block/blk-core.c:1018
 submit_bio+0xa7/0x3c0 block/blk-core.c:1190
 submit_bh_wbc.isra.56+0x504/0x6e0 fs/buffer.c:3095
 jbd2_journal_commit_transaction+0x2639/0x6365 fs/jbd2/commit.c:727
 kjournald2+0x1fd/0x7f0 fs/jbd2/journal.c:210
 kthread+0x308/0x3c0 kernel/kthread.c:255
 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352

Allocated by task 10077:					// <--- where it has been allocated
 save_stack+0x19/0x80 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 __kasan_kmalloc mm/kasan/common.c:510 [inline]
 __kasan_kmalloc.constprop.6+0xc1/0xd0 mm/kasan/common.c:483
 kmalloc include/linux/slab.h:556 [inline]
 kzalloc include/linux/slab.h:690 [inline]
 do_blk_trace_setup+0x157/0xaa0 kernel/trace/blktrace.c:497
 __blk_trace_setup+0xa8/0x140 kernel/trace/blktrace.c:571
 blk_trace_setup+0x41/0x60 kernel/trace/blktrace.c:589
 sg_ioctl+0xd50/0x2540 drivers/scsi/sg.c:1112
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x1a0/0x1000 fs/ioctl.c:696
 ksys_ioctl+0x77/0xa0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:718
 do_syscall_64+0x9a/0x330 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 340:					// <--- where it has been freed
 save_stack+0x19/0x80 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 kasan_set_free_info mm/kasan/common.c:332 [inline]
 __kasan_slab_free+0x125/0x170 mm/kasan/common.c:471
 slab_free_hook mm/slub.c:1424 [inline]
 slab_free_freelist_hook mm/slub.c:1475 [inline]
 slab_free mm/slub.c:3018 [inline]
 kfree+0x90/0x240 mm/slub.c:3967
 blk_trace_cleanup kernel/trace/blktrace.c:339 [inline]
 __blk_trace_remove+0x5f/0x80 kernel/trace/blktrace.c:352
 blk_trace_remove+0x21/0x30 kernel/trace/blktrace.c:362
 sg_ioctl+0x66f/0x2540 drivers/scsi/sg.c:1121
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x1a0/0x1000 fs/ioctl.c:696
 ksys_ioctl+0x77/0xa0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:718
 do_syscall_64+0x9a/0x330 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88803019be00
 which belongs to the cache kmalloc-128 of size 128			// <--- the object's cache
The buggy address is located 0 bytes inside of
 128-byte region [ffff88803019be00, ffff88803019be80)
The buggy address belongs to the page:
page:ffffea0000c066c0 refcount:1 mapcount:0 mapping:ffff888035c01640 index:0xffff88803019bf00
flags: 0x100000000000200(slab)
raw: 0100000000000200 ffffea0000d51788 ffff888035c00d10 ffff888035c01640
raw: ffff88803019bf00 0000000000100006 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88803019bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
 ffff88803019bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88803019be00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff88803019be80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88803019bf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Comment 1 Jan Kara 2020-01-27 14:23:52 UTC
It seems to me that to properly resolve this race, we need to protect q->blk_trace with RCU and do the teardown and freeing of the structure only after the RCU period expires... Any better idea?
Comment 2 Jan Kara 2020-02-06 14:24:39 UTC
Created attachment 287171 [details]
blktrace: Protect q->blktrace with RCU

I've submitted this patch upstream.
Comment 3 Jan Kara 2020-02-06 14:29:58 UTC
Tristan, I'm not sure how reproducible this is but if you can reasonably reproduce, I'd be happy if you checked whether this patch fixes the problem for you.
Comment 4 Jean Leo 2020-02-07 06:36:59 UTC
(In reply to Jan Kara from comment #3)
> Tristan, I'm not sure how reproducible this is but if you can reasonably
> reproduce, I'd be happy if you checked whether this patch fixes the problem
> for you.

Hi, Jan Kara. I am also trying to fix this problem and am now trying to reproduce it. I think the two functions sysfs_blk_trace_attr_store() and sysfs_blk_trace_attr_show() are okay and do not need to be modified, because mutex_lock(&q->blk_trace_mutex) already provides sufficient protection there. The other modifications are basically similar to your patch.
Comment 5 Ming Lei 2020-02-07 08:50:27 UTC
(In reply to Jan Kara from comment #1)
> It seems to me that to properly resolve this race, we need to protect
> q->blk_trace with RCU and do the teardown and freeing of the structure only
> after the RCU period expires... Any better idea?

IMO, RCU might be the best approach.

It could also be fixed by allocating 'struct blk_trace' statically for the request queue, or by using a percpu_ref, but RCU still looks better and simpler.
Comment 6 Jan Kara 2020-03-26 11:45:02 UTC
The patch is committed upstream as c780e86dd48 ("blktrace: Protect q->blk_trace with RCU"). Closing the bug.