Bug 198685 - bluez 5.48 memory use after free crash
Summary: bluez 5.48 memory use after free crash
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Bluetooth (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: linux-bluetooth@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-05 21:02 UTC by Nathaniel McCallum
Modified: 2018-02-09 10:59 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.14.16-300.fc27.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Log from bluetoothd (48.43 KB, text/plain)
2018-02-07 14:24 UTC, Nathaniel McCallum
Details
Reset callbacks when unregistering (1.34 KB, patch)
2018-02-09 10:59 UTC, Luiz Von Dentz
Details | Diff

Description Nathaniel McCallum 2018-02-05 21:02:18 UTC
==7643== Invalid read of size 8
==7643==    at 0x181119: register_notify_io_cb (gatt-client.c:1437)
==7643==    by 0x1AD2EF: complete_notify_request (gatt-client.c:1215)
==7643==    by 0x1AD2EF: enable_ccc_callback (gatt-client.c:1297)
==7643==    by 0x1AA597: handle_rsp (att.c:713)
==7643==    by 0x1AA597: can_read_data (att.c:885)
==7643==    by 0x1B3362: watch_callback (io-glib.c:170)
==7643==    by 0x4E87A66: g_main_dispatch (gmain.c:3177)
==7643==    by 0x4E87A66: g_main_context_dispatch (gmain.c:3830)
==7643==    by 0x4E87E0F: g_main_context_iterate.isra.21 (gmain.c:3903)
==7643==    by 0x4E88121: g_main_loop_run (gmain.c:4099)
==7643==    by 0x1260FA: main (main.c:770)
==7643==  Address 0x7b956e0 is 0 bytes inside a block of size 32 free'd
==7643==    at 0x4C30D18: free (vg_replace_malloc.c:530)
==7643==    by 0x180602: pipe_io_destroy (gatt-client.c:1082)
==7643==    by 0x180667: characteristic_destroy_pipe (gatt-client.c:1104)
==7643==    by 0x180717: characteristic_pipe_hup (gatt-client.c:1119)
==7643==    by 0x1B3362: watch_callback (io-glib.c:170)
==7643==    by 0x4E87A66: g_main_dispatch (gmain.c:3177)
==7643==    by 0x4E87A66: g_main_context_dispatch (gmain.c:3830)
==7643==    by 0x4E87E0F: g_main_context_iterate.isra.21 (gmain.c:3903)
==7643==    by 0x4E88121: g_main_loop_run (gmain.c:4099)
==7643==    by 0x1260FA: main (main.c:770)
==7643==  Block was alloc'd at
==7643==    at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==7643==    by 0x1A576D: btd_malloc (util.c:46)
==7643==    by 0x180923: notify_client_create (gatt-client.c:1330)
==7643==    by 0x180C68: characteristic_acquire_notify (gatt-client.c:1486)
==7643==    by 0x1A1F8A: process_message.isra.7 (object.c:259)
==7643==    by 0x51773DF: _dbus_object_tree_dispatch_and_unlock (dbus-object-tree.c:1020)
==7643==    by 0x5168209: dbus_connection_dispatch (dbus-connection.c:4744)
==7643==    by 0x19E80F: message_dispatch (mainloop.c:72)
==7643==    by 0x4E84436: g_idle_dispatch (gmain.c:5535)
==7643==    by 0x4E87A66: g_main_dispatch (gmain.c:3177)
==7643==    by 0x4E87A66: g_main_context_dispatch (gmain.c:3830)
==7643==    by 0x4E87E0F: g_main_context_iterate.isra.21 (gmain.c:3903)
==7643==    by 0x4E88121: g_main_loop_run (gmain.c:4099)
==7643== 

It looks like characteristic_pipe_hup() frees the chrc but leaves around a reference to it. I run into this every time I call AcquireNotify in the GATT interface.
Comment 1 Luiz Von Dentz 2018-02-06 06:48:27 UTC
I guess you are calling it from a process that is terminated which cleans up the fd before the CCC has been written which is not the way it intended to use, it is still a bug though and it shouldn't cause a crash.
Comment 2 Luiz Von Dentz 2018-02-06 08:51:36 UTC
Im checking this again and apparently this should not have happen as the pipe itself is only created in register_notify_io_cb by characteristic_ready so I wonder if the register_notify_io_cb is being called more than once.
Comment 3 Nathaniel McCallum 2018-02-06 14:27:39 UTC
(In reply to Luiz Von Dentz from comment #1)
> I guess you are calling it from a process that is terminated which cleans up
> the fd before the CCC has been written which is not the way it intended to
> use, it is still a bug though and it shouldn't cause a crash.

No. My process is alive the whole time and doesn't close the file descriptor. So long as I run bluez under valgrind, the NULL check on gatt-client.c:1437 passes and bluez continues to run and my program continues to receive data. So I don't understand why characteristic_pipe_hup() is being called at all.
Comment 4 Luiz Von Dentz 2018-02-06 14:46:59 UTC
Tried with bluetoothctl:

[Vudentz's T460s #1:/service000a/char000b]# notify on
[CHG] Attribute /org/bluez/app/service0x7d6ba0/chrc0x7dae30 Notify pipe acquired
[CHG] Attribute /org/bluez/hci1/dev_00_1B_DC_07_31_88/service000a/char000b Notifying: yes
Notify started

bluetoothd[27959]: src/device.c:gatt_debug() Write Req - handle: 0x0009
bluetoothd[27959]: src/gatt-database.c:gatt_ccc_write_cb() CCC write called for handle: 0x0009
bluetoothd[27959]: src/device.c:gatt_debug() Registered handler for "Service Changed": 1
bluetoothd[27959]: src/gatt-client.c:notify_client_ref() owner :1.586
bluetoothd[27959]: src/device.c:gatt_debug() Write Req - handle: 0x000d
bluetoothd[27959]: src/gatt-database.c:gatt_ccc_write_cb() CCC write called for handle: 0x000d
bluetoothd[27959]: src/gatt-database.c:ccc_write_cb() External CCC write received with value: 0x0001
bluetoothd[27959]: src/gatt-database.c:acquire_notify_reply() AcquireNotify success: fd 44 MTU 517
Comment 5 Nathaniel McCallum 2018-02-06 16:59:00 UTC
Let me be more clear about my usage pattern. We operate in a standard request/response schema. When a request is ready to send to the bluetooth device, the first thing we do is start a timeout callback:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L480

Then we call AcquireNotify:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L484

If this succeeds, we make an EPOLLIN event:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L334

Next, we read the GATT attribute which defines the protocol MTU (possibly different than the BTLE MTU):

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L344

Then we write the request packet:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L287

Which is possibly over multiple frames:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L224

At this point, the only outstanding events are the timeout and the epoll on the fd. When frames are received on input, we try to reassemble the response packets.

If a keep-alive packet is received, we restart the timeout:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L151

If a response is received, we are complete so we close the fd:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L167

If the timeout event fires, we also close the fd:

https://github.com/freeu2f/u2fhid2bt/blob/master/gatt.c#L92

We don't call any dbus methods outside of this workflow. Since the first thing we do is acquire the fd and the last thing we do is close the fd, I think our application is working within the parameters of the bluez API.
Comment 6 Luiz Von Dentz 2018-02-06 20:07:56 UTC
Can you add the output of bluetoothd with debug enabled?

Btw, you don't really need to dup the fd at the client end, at least not with libdbus, or is this something to do with systemd D-Bus library? Another thing that stand out is the fact that there is no check for gatt->snd.not, perhaps the idea is to reacquire on every request? That would be overkill and perhaps you would be better off using StartNotify instead then. Both bluetoothctl and meshctl acquire the notification once and just keep it as along as the connection exists, though if the problem is really the fact that you close and acquire in quick succession we should fix that since nothing should crash bluetootd.
Comment 7 Nathaniel McCallum 2018-02-07 14:09:47 UTC
(In reply to Luiz Von Dentz from comment #6)
> Can you add the output of bluetoothd with debug enabled?

Posting it shortly after this.

> Btw, you don't really need to dup the fd at the client end, at least not
> with libdbus, or is this something to do with systemd D-Bus library?

https://github.com/systemd/systemd/issues/8003

> Another
> thing that stand out is the fact that there is no check for gatt->snd.not,
> perhaps the idea is to reacquire on every request? That would be overkill
> and perhaps you would be better off using StartNotify instead then.

Yes, we reacquire on every request. I did this because I wanted to avoid holding state in case the device goes out of range. I'm not sure what the right way to do this is.

> Both
> bluetoothctl and meshctl acquire the notification once and just keep it as
> along as the connection exists, though if the problem is really the fact
> that you close and acquire in quick succession we should fix that since
> nothing should crash bluetootd.

+1
Comment 8 Nathaniel McCallum 2018-02-07 14:24:47 UTC
Created attachment 274049 [details]
Log from bluetoothd
Comment 9 Nathaniel McCallum 2018-02-07 14:32:08 UTC
The full log is attached. The relevant bit is this:

src/device.c:gatt_debug() Primary services found: 3
src/device.c:gatt_debug() start: 0x0001, end: 0x0003, uuid: 00001801-0000-1000-8000-00805f9b34fb
src/device.c:gatt_debug() start: 0x0014, end: 0x001a, uuid: 00001800-0000-1000-8000-00805f9b34fb
src/device.c:gatt_debug() start: 0x0028, end: 0xffff, uuid: 0000fffd-0000-1000-8000-00805f9b34fb
src/device.c:gatt_debug() Registered handler for "Service Changed": 0
src/device.c:gatt_client_ready_cb() status: success, error: 0
src/gatt-client.c:btd_gatt_client_ready() GATT client ready
src/gatt-client.c:create_services() Exporting objects for GATT services: AC:37:43:BD:2B:0A
src/gatt-client.c:service_create() Exported GATT service: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0001
src/gatt-client.c:characteristic_create() Exported GATT characteristic: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0001/char0002
src/gatt-client.c:service_create() Exported GATT service: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028
src/gatt-client.c:characteristic_create() Exported GATT characteristic: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028/char0029
src/gatt-client.c:characteristic_create() Exported GATT characteristic: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028/char002b
src/gatt-client.c:descriptor_create() Exported GATT characteristic descriptor: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028/char002b/desc002d
src/gatt-client.c:characteristic_create() Exported GATT characteristic: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028/char002e
src/gatt-client.c:characteristic_create() Exported GATT characteristic: /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028/char0030
src/device.c:device_svc_resolved() /org/bluez/hci0/dev_AC_37_43_BD_2B_0A err 0
src/gatt-client.c:notify_client_ref() owner :1.302
src/gatt-client.c:characteristic_create_pipe() AcquireNotify: sender :1.302 io 0x7b299c0
src/gatt-client.c:characteristic_pipe_hup() /org/bluez/hci0/dev_AC_37_43_BD_2B_0A/service0028/char002b: io 0x7b299c0
src/gatt-client.c:notify_client_unref() owner :1.302
src/gatt-client.c:notify_client_free() owner :1.302
src/gatt-client.c:notify_client_ref() owner :1.302
==18279== Invalid read of size 8
==18279==    at 0x181119: register_notify_io_cb (gatt-client.c:1437)
==18279==    by 0x1AD2EF: complete_notify_request (gatt-client.c:1215)
==18279==    by 0x1AD2EF: enable_ccc_callback (gatt-client.c:1297)
==18279==    by 0x1AA597: handle_rsp (att.c:713)
==18279==    by 0x1AA597: can_read_data (att.c:885)
==18279==    by 0x1B3362: watch_callback (io-glib.c:170)
==18279==    by 0x4E87A66: g_main_dispatch (gmain.c:3177)
==18279==    by 0x4E87A66: g_main_context_dispatch (gmain.c:3830)
==18279==    by 0x4E87E0F: g_main_context_iterate.isra.21 (gmain.c:3903)
==18279==    by 0x4E88121: g_main_loop_run (gmain.c:4099)
==18279==    by 0x1260FA: main (main.c:770)
==18279==  Address 0x7b26a10 is 0 bytes inside a block of size 32 free'd
==18279==    at 0x4C30D18: free (vg_replace_malloc.c:530)
==18279==    by 0x180602: pipe_io_destroy (gatt-client.c:1082)
==18279==    by 0x180667: characteristic_destroy_pipe (gatt-client.c:1104)
==18279==    by 0x180717: characteristic_pipe_hup (gatt-client.c:1119)
==18279==    by 0x1B3362: watch_callback (io-glib.c:170)
==18279==    by 0x4E87A66: g_main_dispatch (gmain.c:3177)
==18279==    by 0x4E87A66: g_main_context_dispatch (gmain.c:3830)
==18279==    by 0x4E87E0F: g_main_context_iterate.isra.21 (gmain.c:3903)
==18279==    by 0x4E88121: g_main_loop_run (gmain.c:4099)
==18279==    by 0x1260FA: main (main.c:770)
==18279==  Block was alloc'd at
==18279==    at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==18279==    by 0x1A576D: btd_malloc (util.c:46)
==18279==    by 0x180923: notify_client_create (gatt-client.c:1330)
==18279==    by 0x180C68: characteristic_acquire_notify (gatt-client.c:1486)
==18279==    by 0x1A1F8A: process_message.isra.7 (object.c:259)
==18279==    by 0x51773DF: _dbus_object_tree_dispatch_and_unlock (dbus-object-tree.c:1020)
==18279==    by 0x5168209: dbus_connection_dispatch (dbus-connection.c:4744)
==18279==    by 0x19E80F: message_dispatch (mainloop.c:72)
==18279==    by 0x4E84436: g_idle_dispatch (gmain.c:5535)
==18279==    by 0x4E87A66: g_main_dispatch (gmain.c:3177)
==18279==    by 0x4E87A66: g_main_context_dispatch (gmain.c:3830)
==18279==    by 0x4E87E0F: g_main_context_iterate.isra.21 (gmain.c:3903)
==18279==    by 0x4E88121: g_main_loop_run (gmain.c:4099)
==18279==
Comment 10 Luiz Von Dentz 2018-02-09 10:59:28 UTC
Created attachment 274079 [details]
Reset callbacks when unregistering

This should prevent the callback to be called, Im still not sure what is happening but it looks like the bt_gatt_client_unregister_notify is called while CCC has not been written.

Note You need to log in before you can comment on or make changes to this bug.