Bug 204275 - bluetoothd consumes 100% cpu on keyboard disconnect
Summary: bluetoothd consumes 100% cpu on keyboard disconnect
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Bluetooth (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: linux-bluetooth@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-22 13:36 UTC by Steven Newbury
Modified: 2019-08-10 07:40 UTC (History)
1 user (show)

See Also:
Kernel Version: 3.x +
Tree: Mainline
Regression: No


Attachments
Patch to fix high CPU usage with HID keyboard and other devices (2.08 KB, patch)
2019-07-22 23:56 UTC, Steven Newbury
Details | Diff
Remove sec_watch, instead attempt immediate connection (1.42 KB, patch)
2019-08-01 18:12 UTC, Steven Newbury
Details | Diff

Description Steven Newbury 2019-07-22 13:36:29 UTC
Many bluetooth HID keyboard profile devices disconnect on idle to save power, I have two such devices:

Logitech Y-X5A77 keyboard

Sony PS3 bluetooth remote control

When the device disconnects on idle the g_io_channels get disconnected

bluetoothd[3167]: profiles/input/device.c:ctrl_watch_cb() Device 00:07:61:F6:C9:7D disconnected
bluetoothd[3167]: profiles/input/device.c:intr_watch_cb() Device 00:07:61:F6:C9:7D disconnected

But, something isn't closed properly (possibly g_io_add_watch) since bluetoothd then spins at 100% CPU in g_main_loop_run() until the device reconnects, at which point every works normally again until the device next idles.

This bug has been present for at least 2 years, I'm certain it's much longer:

https://bugs.launchpad.net/ubuntu/+source/bluez/+bug/1717796
Comment 1 Steven Newbury 2019-07-22 14:22:28 UTC
I've tried adding g_source_remove()s for ctrl_watch and intr_watch before shutting down the io channels but it doesn't make any difference.
Comment 2 Steven Newbury 2019-07-22 19:36:06 UTC
strace results in continuous repeating:

poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=18, events=POLLIN}, {fd=19, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=42, events=POLLIN}, {fd=43, events=POLLIN}, {fd=44, events=POLLIN}, {fd=45, events=POLLIN}, {fd=46, events=POLLIN}, {fd=47, events=POLLIN}, {fd=48, events=POLLIN}, {fd=49, events=POLLIN}, {fd=50, events=POLLIN}, {fd=51, events=POLLIN}, {fd=52, events=POLLIN}, {fd=53, events=POLLIN}, {fd=55, events=POLLOUT}], 28, -1) = 1 ([{fd=55, revents=POLLNVAL}])
Comment 3 Steven Newbury 2019-07-22 20:54:45 UTC
Ahhh...

I was already aware that on first connect the HID input device wasn't created, but I hadn't realised it's part of the same bug.

What's happening is that on initial connect:

 dev->sec_watch = g_io_add_watch(idev->intr_io, G_IO_OUT,
                                                        encrypt_notify, idev);

creates the notify callback.  But it never triggers.  On second connection the callback gets triggered, but the connect code gets run again so another refcount is added.  This means the intr channel never gets closed which means it's stuck on the last event.
Comment 4 Steven Newbury 2019-07-22 21:00:23 UTC
I flipped the G_IO_OUT to G_IO_IN and it started working on first connect.  I don't know why it was G_IO_OUT?
Comment 5 Steven Newbury 2019-07-22 23:56:16 UTC
Created attachment 283925 [details]
Patch to fix high CPU usage with HID keyboard and other devices
Comment 6 Luiz Von Dentz 2019-07-24 11:14:41 UTC
Ive sent a fix upstream:

https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.dentz@gmail.com/T/#u

Let me know if that works.
Comment 7 Steven Newbury 2019-07-24 15:31:20 UTC
(In reply to Luiz Von Dentz from comment #6)
> Ive sent a fix upstream:
> 
> https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.
> dentz@gmail.com/T/#u
> 
> Let me know if that works.

That will prevent the channel getting left dangling, but it doesn't address the issue of the callback not happening on initial connect, which is AFAICT what results in that part of the bug for me.

Is it really supposed to be waiting for G_IO_OUT?  It seems it only gets triggered for me when a second connection is attempted on the dangling channel, presumably during handshake..?
Comment 8 Steven Newbury 2019-07-25 09:04:46 UTC
Luiz, did you see (In reply to Luiz Von Dentz from comment #6)
> Ive sent a fix upstream:
> 
> https://lore.kernel.org/linux-bluetooth/20190724110151.4258-1-luiz.
> dentz@gmail.com/T/#u
> 
> Let me know if that works.

Luiz, did you see the patch I attached?  Perhaps it's better to remove the old watcher first than not add another if it exists, but it shouldn't happen anyway.  The explicit remove should happen to catch the case where the connection is broken before being fully initialised but the callback should get triggered once data starts to flow over the encrypted channel.

In my tests, allowing the channel to fully close by removing the sec_watch prevents the keyboard from ever connecting, this is because the callback never gets triggered.  I believe the bug is caused by the refcount sec_watch creates not being automatically removed because the callback is left pending, it doesn't get removed on channel shutdown, but has the side-effect of making it work with the channel being already open when the device reconnects because it triggers the callback.

What made it work every time is changing sec_watch to wait for G_IO_IN instead of G_IO_OUT, but presumably it is G_IO_OUT for a reason?
Comment 9 Luiz Von Dentz 2019-07-25 13:13:34 UTC
Im not sure we even need the watch in the first place, the kernel should block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it might be possible to get rid of sec_watch altogether.
Comment 10 Steven Newbury 2019-07-29 13:52:47 UTC
(In reply to Luiz Von Dentz from comment #9)
> Im not sure we even need the watch in the first place, the kernel should
> block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> might be possible to get rid of sec_watch altogether.

Going through the history it originally worked that way, I'm not sure why it was changed..?
Comment 11 Luiz Von Dentz 2019-07-29 13:56:22 UTC
(In reply to Steven Newbury from comment #10)
> (In reply to Luiz Von Dentz from comment #9)
> > Im not sure we even need the watch in the first place, the kernel should
> > block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> > might be possible to get rid of sec_watch altogether.
> 
> Going through the history it originally worked that way, I'm not sure why it
> was changed..?

Well lets try to remove it then and see if that fix the problem.
Comment 12 Steven Newbury 2019-08-01 18:03:14 UTC
(In reply to Luiz Von Dentz from comment #11)
> (In reply to Steven Newbury from comment #10)
> > (In reply to Luiz Von Dentz from comment #9)
> > > Im not sure we even need the watch in the first place, the kernel should
> > > block any in/out until the connection is encrypted (BT_SK_SUSPEND) so it
> > > might be possible to get rid of sec_watch altogether.
> > 
> > Going through the history it originally worked that way, I'm not sure why
> it
> > was changed..?
> 
> Well lets try to remove it then and see if that fix the problem.

That works.  I'll attach a patch on here.
Comment 13 Steven Newbury 2019-08-01 18:12:12 UTC
Created attachment 284073 [details]
Remove sec_watch, instead attempt immediate connection
Comment 14 Luiz Von Dentz 2019-08-10 07:40:08 UTC
(In reply to Steven Newbury from comment #13)
> Created attachment 284073 [details]
> Remove sec_watch, instead attempt immediate connection

Could you please send as a patch to linux-bluetooth?

Note You need to log in before you can comment on or make changes to this bug.