Bug 204275

Summary: bluetoothd consumes 100% cpu on keyboard disconnect
Product: Drivers Reporter: Steven Newbury (steve)
Component: BluetoothAssignee: linux-bluetooth (linux-bluetooth)
Status: NEW ---    
Severity: normal CC: luiz.dentz
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.x + Subsystem:
Regression: No Bisected commit-id:
Attachments: Patch to fix high CPU usage with HID keyboard and other devices
Remove sec_watch, instead attempt immediate connection

Description Steven Newbury 2019-07-22 13:36:29 UTC
Many bluetooth HID keyboard profile devices disconnect on idle to save power, I have two such devices:

Logitech Y-X5A77 keyboard

Sony PS3 bluetooth remote control

When the device disconnects on idle the g_io_channels get disconnected

bluetoothd[3167]: profiles/input/device.c:ctrl_watch_cb() Device 00:07:61:F6:C9:7D disconnected
bluetoothd[3167]: profiles/input/device.c:intr_watch_cb() Device 00:07:61:F6:C9:7D disconnected

But, something isn't closed properly (possibly g_io_add_watch) since bluetoothd then spins at 100% CPU in g_main_loop_run() until the device reconnects, at which point every works normally again until the device next idles.

This bug has been present for at least 2 years, I'm certain it's much longer:

https://bugs.launchpad.net/ubuntu/+source/bluez/+bug/1717796
Comment 1 Steven Newbury 2019-07-22 14:22:28 UTC
I've tried adding g_source_remove()s for ctrl_watch and intr_watch before shutting down the io channels but it doesn't make any difference.
Comment 2 Steven Newbury 2019-07-22 19:36:06 UTC
strace results in continuous repeating:

poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=42, events=POLLIN}, {fd=43, events=POLLIN}, {fd=44, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}, {fd=49, events=POLLIN}, {fd=50, events=POLLIN}, {fd=51, events=POLLIN}, {fd=52, events=POLLIN}, {fd=53, events=POLLIN}, {fd=55, events=POLLOUT}], 28, -1) = 1 ([{fd=55, revents=POLLNVAL}])
Comment 3 Steven Newbury 2019-07-22 20:54:45 UTC
Ahhh...

I was already aware that on first connect the HID input device wasn't created, but I hadn't realised it's part of the same bug.

What's happening is that on initial connect:

 dev->sec_watch = g_io_add_watch(idev->intr_io, G_IO_OUT,
                                                        encrypt_notify, idev);

creates the notify callback.  But it never triggers.  On second connection the callback gets triggered, but the connect code gets run again so another refcount is added.  This means the intr channel never gets closed which means it's stuck on the last event.
Comment 4 Steven Newbury 2019-07-22 21:00:23 UTC
I flipped the G_IO_OUT to G_IO_IN and it started working on first connect.  I don't know why it was G_IO_OUT?
Comment 5 Steven Newbury 2019-07-22 23:56:16 UTC
Created attachment 283925 [details]
Patch to fix high CPU usage with HID keyboard and other devices
Comment 6 Luiz Von Dentz 2019-07-24 11:14:41 UTC
Ive sent a fix upstream:

https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.dentz@gmail.com/T/#u

Let me know if that works.
Comment 7 Steven Newbury 2019-07-24 15:31:20 UTC
(In reply to Luiz Von Dentz from comment #6)
> Ive sent a fix upstream:
> 
> https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.
> dentz@gmail.com/T/#u
> 
> Let me know if that works.

That will prevent the channel getting left dangling, but it doesn't address the issue of the callback not happening on initial connect, which is AFAICT what results in that part of the bug for me.

Is it really supposed to be waiting for G_IO_OUT?  It seems it only gets triggered for me when a second connection is attempted on the dangling channel, presumably during handshake..?
Comment 8 Steven Newbury 2019-07-25 09:04:46 UTC
Luiz, did you see (In reply to Luiz Von Dentz from comment #6)
> Ive sent a fix upstream:
> 
> https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.
> dentz@gmail.com/T/#u
> 
> Let me know if that works.

Luiz, did you see the patch I attached?  Perhaps it's better to remove the old watcher first than not add another if it exists, but it shouldn't happen anyway.  The explicit remove should happen to catch the case where the connection is broken before being fully initialised but the callback should get triggered once data starts to flow over the encrypted channel.

In my tests, allowing the channel to fully close by removing the sec_watch prevents the keyboard from ever connecting, this is because the callback never gets triggered.  I believe the bug is caused by the refcount sec_watch creates not being automatically removed because the callback is left pending, it doesn't get removed on channel shutdown, but has the side-effect of making it work with the channel being already open when the device reconnects because it triggers the callback.

What made it work every time is changing sec_watch to wait for G_IO_IN instead of G_IO_OUT, but presumably it is G_IO_OUT for a reason?
Comment 9 Luiz Von Dentz 2019-07-25 13:13:34 UTC
Im not sure we even need the watch in the first place, the kernel should block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it might be possible to get rid of sec_watch altogether.
Comment 10 Steven Newbury 2019-07-29 13:52:47 UTC
(In reply to Luiz Von Dentz from comment #9)
> Im not sure we even need the watch in the first place, the kernel should
> block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> might be possible to get rid of sec_watch altogether.

Going through the history it originally worked that way, I'm not sure why it was changed..?
Comment 11 Luiz Von Dentz 2019-07-29 13:56:22 UTC
(In reply to Steven Newbury from comment #10)
> (In reply to Luiz Von Dentz from comment #9)
> > Im not sure we even need the watch in the first place, the kernel should
> > block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> > might be possible to get rid of sec_watch altogether.
> 
> Going through the history it originally worked that way, I'm not sure why it
> was changed..?

Well lets try to remove it then and see if that fix the problem.
Comment 12 Steven Newbury 2019-08-01 18:03:14 UTC
(In reply to Luiz Von Dentz from comment #11)
> (In reply to Steven Newbury from comment #10)
> > (In reply to Luiz Von Dentz from comment #9)
> > > Im not sure we even need the watch in the first place, the kernel should
> > > block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> > > might be possible to get rid of sec_watch altogether.
> > 
> > Going through the history it originally worked that way, I'm not sure why
> it
> > was changed..?
> 
> Well lets try to remove it then and see if that fix the problem.

That works.  I'll attach a patch on here.
Comment 13 Steven Newbury 2019-08-01 18:12:12 UTC
Created attachment 284073 [details]
Remove sec_watch, instead attempt immediate connection
Comment 14 Luiz Von Dentz 2019-08-10 07:40:08 UTC
(In reply to Steven Newbury from comment #13)
> Created attachment 284073 [details]
> Remove sec_watch, instead attempt immediate connection

Could you please send as a patch to linux-bluetooth?