Created attachment 300278 [details] Coverage when processing the 4 frames Hi, While fuzzing bluetooth-next with KASAN I found a reproducable null-pointer-deref in klist_next. I am not sure, whether the bug lies in how the Bluetooth subsystem handles the device or in the device implementation in /drivers/base/core.c It is also triggered without KASAN and in current linux v5.16 mainline kernel, but I obtained the following logs and coverage from bluetooth-next@72279d17df54d5e4e7910b39c61a3f3464e36633. To trigger the bug, the following 4 frames must be sent by the controller (the frames sent by the fuzzer are longer, I removed the unparsed extra bytes): Frame #1: HCI Sent Connect Request (BD_ADDR 00:00:00:00:00:04) 0000 04 04 04 04 00 00 00 00 00 00 10 21 00 Frame #2: HCI_Connection_Complete (BD_ADDR 00:00:00:00:00:04, Handle 0x0010) 0000 04 03 80 00 10 00 04 00 00 00 00 00 00 00 Frame #3: HCI_Connection_Complete (duplicate of previous frame) 0000 04 03 80 00 10 00 04 00 00 00 00 00 00 00 Frame #4: HCI_Disconnection_Complete (Handle 0x0010) 0000 04 05 00 00 10 00 00 ff e8 04 94 05 ff I obtained the kcov-coverage of the bluetooth subsystem from receiving these frames from my fuzzer resolved with addr2line and attached it. This is the kernel backtrace (with KASAN): KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] CPU: 0 PID: 44 Comm: kworker/u3:2 Not tainted 5.16.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: hci0 hci_rx_work RIP: 0010:klist_next+0x44/0x450 Code: 53 48 89 fb 48 83 ec 08 80 3c 02 00 0f 85 7a 03 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 2b 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4e 03 00 00 4c 8d 6b 08 4c 8b 7d 58 48 b8 00 00 RSP: 0018:ffffc900004ef9d8 EFLAGS: 00010212 RAX: dffffc0000000000 RBX: ffffc900004efa40 RCX: ffffffff834b4762 RDX: 000000000000000b RSI: ffffc900004efa40 RDI: 0000000000000058 RBP: 0000000000000000 R08: ffffffff844ee5a0 R09: 0000000000000001 R10: ffffffff834b4703 R11: 0000000000000000 R12: ffffffff834b43a0 R13: 1ffff9200009df44 R14: dffffc0000000000 R15: ffff88800ab18000 FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000004a26000 CR4: 00000000000006f0 Call Trace: <TASK> ? synchronize_rcu_expedited+0xb40/0xb40 ? bt_link_release+0x20/0x20 device_find_child+0xa8/0x160 ? device_for_each_child+0x140/0x140 ? mark_held_locks+0x9e/0xe0 hci_conn_del_sysfs+0xc3/0x100 hci_conn_cleanup+0x328/0x6a0 ? skb_dequeue+0x110/0x1a0 hci_conn_del+0x27e/0x6a0 ? sco_conn_del+0x2c0/0x2c0 hci_disconn_complete_evt+0x72e/0xdc0 hci_event_packet+0x7b2/0xdd0 ? hci_inquiry_complete_evt+0x560/0x560 ? hci_le_ext_adv_term_evt+0x850/0x850 ? lockdep_hardirqs_on_prepare+0x17b/0x400 hci_rx_work+0x4d3/0xb90 process_one_work+0x904/0x1590 ? lock_release+0x6f0/0x6f0 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 ? _raw_spin_lock_irq+0x41/0x50 worker_thread+0x57c/0x1260 ? process_one_work+0x1590/0x1590 kthread+0x3b2/0x490 ? _raw_spin_unlock_irq+0x1f/0x40 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x22/0x30 </TASK> Modules linked in: ---[ end trace b5a710d8df676911 ]--- RIP: 0010:klist_next+0x44/0x450 Code: 53 48 89 fb 48 83 ec 08 80 3c 02 00 0f 85 7a 03 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 2b 48 8d 7d 58 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 4e 03 00 00 4c 8d 6b 08 4c 8b 7d 58 48 b8 00 00 RSP: 0018:ffffc900004ef9d8 EFLAGS: 00010212 RAX: dffffc0000000000 RBX: ffffc900004efa40 RCX: ffffffff834b4762 RDX: 000000000000000b RSI: ffffc900004efa40 RDI: 0000000000000058 RBP: 0000000000000000 R08: ffffffff844ee5a0 R09: 0000000000000001 R10: ffffffff834b4703 R11: 0000000000000000 R12: ffffffff834b43a0 R13: 1ffff9200009df44 R14: dffffc0000000000 R15: ffff88800ab18000 FS: 0000000000000000(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000004a26000 CR4: 00000000000006f0 I tried to pin it down, the problem seems to be that conn->dev->p of the connection that should be cleaned up in hci_conn_del_sysfs is (already) NULL. In the subsequent calls, conn->dev->p->klist_children is used to initialize an iterator which is finally dereferenced in klist_next: void hci_conn_del_sysfs(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev; if (!device_is_registered(&conn->dev)) return; while (1) { struct device *dev; dev = device_find_child(&conn->dev, NULL, __match_tty); struct device *device_find_child(struct device *parent, void *data, int (*match)(struct device *dev, void *data)) { struct klist_iter i; struct device *child; if (!parent) return NULL; klist_iter_init(&parent->p->klist_children, &i); while ((child = next_device(&i))) static struct device *next_device(struct klist_iter *i) { struct klist_node *n = klist_next(i); I guess that happens during the processing of the third frame, which is a duplicate connection_complete event. During the processing of that, a warning is written to the kernel log, because sysfs tries to create a duplicate filename: debugfs: Directory '16' with parent 'hci0' already present! sysfs: cannot create duplicate filename '/devices/virtual/bluetooth/hci0/hci0:16' CPU: 0 PID: 44 Comm: kworker/u3:2 Not tainted 5.16.0-rc8+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: hci0 hci_rx_work Call Trace: <TASK> dump_stack_lvl+0x45/0x59 sysfs_warn_dup.cold+0x17/0x24 sysfs_create_dir_ns+0x1eb/0x260 ? sysfs_create_mount_point+0x80/0x80 ? rwlock_bug.part.0+0x90/0x90 ? do_raw_spin_unlock+0x4f/0x210 kobject_add_internal+0x223/0x920 ? lock_release+0x3b2/0x6f0 kobject_add+0x120/0x190 ? mark_held_locks+0x9e/0xe0 ? kset_create_and_add+0x170/0x170 ? kasan_quarantine_put+0x87/0x1d0 ? lockdep_hardirqs_on+0x79/0x100 ? kasan_quarantine_put+0x87/0x1d0 ? kobject_set_name_vargs+0xad/0x110 ? kobject_get+0x50/0xe0 device_add+0x2cd/0x1b20 ? dev_set_name+0xa6/0xd0 ? device_initialize+0x540/0x540 ? __fw_devlink_link_to_suppliers+0x480/0x480 ? hci_debugfs_create_conn+0x17b/0x1f0 ? hci_debugfs_create_le+0x850/0x850 hci_conn_add_sysfs+0x8d/0xf0 hci_conn_complete_evt+0x4db/0x10b0 ? lock_chain_count+0x20/0x20 ? hci_encrypt_change_evt+0xf20/0xf20 ? lock_is_held_type+0xd7/0x130 hci_event_packet+0x7b2/0xdd0 ? hci_encrypt_change_evt+0xf20/0xf20 ? hci_le_ext_adv_term_evt+0x850/0x850 ? lockdep_hardirqs_on_prepare+0x17b/0x400 hci_rx_work+0x4d3/0xb90 process_one_work+0x904/0x1590 ? lock_release+0x6f0/0x6f0 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 ? _raw_spin_lock_irq+0x41/0x50 worker_thread+0x57c/0x1260 ? process_one_work+0x1590/0x1590 kthread+0x3b2/0x490 ? _raw_spin_unlock_irq+0x1f/0x40 ? set_kthread_struct+0x100/0x100 ret_from_fork+0x22/0x30 </TASK> kobject_add_internal failed for hci0:16 with -EEXIST, don't try to register things with the same name in the same directory. Bluetooth: hci0: failed to register connection device I think conn->dev->p is freed in device_add in the trace that creates the previous warning, due to the third received frame, which is the second of the two duplicate HCI_Connection_Complete events: int device_add(struct device *dev) { struct device *parent; struct kobject *kobj; struct class_interface *class_intf; int error = -EINVAL; struct kobject *glue_dir = NULL; dev = get_device(dev); if (!dev) goto done; if (!dev->p) { error = device_private_init(dev); if (error) goto done; } /* * for statically allocated devices, which should all be converted * some day, we need to initialize the name. We prevent reading back * the name, and force the use of dev_name() */ if (dev->init_name) { dev_set_name(dev, "%s", dev->init_name); dev->init_name = NULL; } /* subsystems can specify simple device enumeration */ if (!dev_name(dev) && dev->bus && dev->bus->dev_name) dev_set_name(dev, "%s%u", dev->bus->dev_name, dev->id); if (!dev_name(dev)) { error = -EINVAL; goto name_error; } [..] name_error: kfree(dev->p); dev->p = NULL; goto done; }
Created attachment 300279 [details] PCAP of the bluetooth frames
Created attachment 300280 [details] Full systemlog
I think I found an underlying issue in driver-core, and submitted a patch: https://lore.kernel.org/lkml/20220120150246.6216-1-soenke.huster@eknoes.de/T/
Submitted a patch to linux-bluetooth: https://lore.kernel.org/linux-bluetooth/20220121173622.192744-1-soenke.huster@eknoes.de/T/#u
Fixed in d5ebaa7c5f6f688959e8d40840b2249ede63b8ed @ bluetooth-next (https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/commit/?id=d5ebaa7c5f6f688959e8d40840b2249ede63b8ed)