Created attachment 90741 [details]
udevadm monitor log
On coming out of suspend my usb bluetooth adaptor is being reset by the system.
In linux 3.7 the usb devices are being removed from the sysfs tree first, and then the various 'child' devices (like my bluetooth mouse & keyboard related devices) afterwards. This is causing the udev events for the input devices to have 'orphaned' sysfs paths in the udev events.
This in term means the Xorg evdev driver does not recognise the events, and so doesn't see the removal of in the put devices.
This has been picked by some downstream distributions, e.g. see this thread by Google Chrome developers:
Back on linux 3.2 this was not the case. The usb adaptor was reset, but device removal was orderly: first the input devices (will full paths in the udev events), then the usb devices walking up the tree.
The attachment is a fragment captured by udevadm monitor when running 3.7. See how the usb devices are moved first, and then the input/bluetooth related stuff with path-heads removed (paths are now /hci0/... instead of /devices/...)
On Tue, Jan 08, 2013 at 01:24:19PM +0000, email@example.com wrote:
> Summary: usb device removed from sysfs before input children
Please send this to the firstname.lastname@example.org and linux-bluetooth
The problem is caused by two separate issues:
1) There is a race between
hidp_session() in bluetooth/hidp/core.c
hci_unregister_dev() in bluetooth/hci_core.c
Really the latter needs to wait for the former to terminate before pulling the plug on the hci device (it is hidp_session that eventually removes the 'input' device).
2) the power_supply system seems to be triggering a lookup of battery 'capacity' in the input device. This calls into the driver, which deep down invokes more code in the 'hidp' module: hidp_get_raw_report()
This function tries to communicate with the bluetooth device (keyboard in my case), but because the device is being 'torn down' such communication fails. Accordingly hidp_get_raw_report has a 5 second timeout ... so this function stalls for 5 seconds until that timeout occurs.
I would guess that for some reason, removing the 'hid' or associated 'input' device is leading to the power_supply code wanting to lookup the battery capacity which introduces a 5 second wait. This would delay hidp_session by 5 seconds ... in the mean time allowing hci_unregister_dev() to win the race and hence the hci device is removed before the input & hid devices.
device_del is done on the battery device, called from hidinput_cleanup_battery in hid-input.c.
This will generate a uevent which has the power_supply attributes added, which in turn invokes power_supply_show_property for 'capacity' which invokes the hidp_get_raw_report (which stalls for 5 seconds).
Back in the days of linux 3.2, hid-input.c did not have any support for batteries, so this chain of calls was never made, hence the race was won by hidp_session() because it didn't have the 5 second delay injected.
Now with battery support added, the problem is manifest.
I think hidp_get_raw_report should abort if the devices is being 'torn down' (and so not go into the 5 second wait). But also the race between the two functions needs sorting, since sooner or later someone else will get unlucky.
Created attachment 92151 [details]
Make hidp_get_raw_report abort if the session is terminating
This patch avoids the 5 second penalty in hidp_get_raw_report by making it terminate if the session is terminating anyway.
Created attachment 92161 [details]
udevadm monitor report with hidp_get_raw_report patch applied
With patch 92151 applied, the 5 second penalty is avoided. As this log shows, things are much better with the input/event devices being removed in their proper order.
There are still some bits out of order though - this is because of the race condition in the code between the hci code and the hidp_session code.
However, with this patch the events are in order enough for userspace (i.e. Xorg evdev driver) to be happy - the keyboard/mouse is disconnected from the Xserver, making room for a new connection later.
i have a similar crash when connecting to my BT mouse on 3.5.0-23 (and I tried also on 3.6), but I don't need to go to suspend or back.
[ 26.920280] Call Trace:
[ 26.920311] [<ffffffff8103fa39>] default_spin_lock_flags+0x9/0x10
[ 26.920388] [<ffffffff81683bee>] _raw_spin_lock_irqsave+0x2e/0x40
[ 26.920465] [<ffffffff81076871>] finish_wait+0x41/0x80
[ 26.920531] [<ffffffffa0644d05>] hidp_session+0x5c5/0x8f0 [hidp]
[ 26.920606] [<ffffffff81087cc0>] ? try_to_wake_up+0x2a0/0x2a0
[ 26.920678] [<ffffffff81087cc0>] ? try_to_wake_up+0x2a0/0x2a0
[ 26.920750] [<ffffffff810768b0>] ? finish_wait+0x80/0x80
[ 26.920817] [<ffffffffa0644740>] ? hidp_input_report.isra.4+0x290/0x290 [hidp]
[ 26.920906] [<ffffffff81075f13>] kthread+0x93/0xa0
[ 26.920967] [<ffffffff8168ce64>] kernel_thread_helper+0x4/0x10
[ 26.921039] [<ffffffff81075e80>] ? kthread_freezable_should_stop+0x70/0x70
[ 26.921125] [<ffffffff8168ce60>] ? gs_change+0x13/0x13
Created attachment 92431 [details]
Created attachment 92581 [details]
Fix race between hidp_session and hci code
This patch, in conjunction with 92151, should remove the problem and ensure clean sysfs device removal in the right order.
Thanks for the patch, Karl. Do you plan on submitting it for inclusion in the mainline kernel?
There are now much better and more comprehensive patches that cover this and other related issues in hidp that are in bluetooth-next and imminently being pulled across to linux-next