Bug 52471
Summary: | usb device removed from sysfs before input children devices | ||
---|---|---|---|
Product: | Drivers | Reporter: | karlrelton000+ubuntu |
Component: | USB | Assignee: | Greg Kroah-Hartman (greg) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | alan, chrmhoffmann, clancy.kieran+kernel, joseph.salisbury, linux-bluetooth |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.7 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
udevadm monitor log
Make hidp_get_raw_report abort if the session is terminating udevadm monitor report with hidp_get_raw_report patch applied full dmesg Fix race between hidp_session and hci code |
Description
karlrelton000+ubuntu
2013-01-08 13:24:19 UTC
On Tue, Jan 08, 2013 at 01:24:19PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > Summary: usb device removed from sysfs before input children > devices Please send this to the linux-usb@vger.kernel.org and linux-bluetooth mailing lists. The problem is caused by two separate issues: 1) There is a race between hidp_session() in bluetooth/hidp/core.c and hci_unregister_dev() in bluetooth/hci_core.c Really the latter needs to wait for the former to terminate before pulling the plug on the hci device (it is hidp_session that eventually removes the 'input' device). 2) the power_supply system seems to be triggering a lookup of battery 'capacity' in the input device. This calls into the driver, which deep down invokes more code in the 'hidp' module: hidp_get_raw_report() This function tries to communicate with the bluetooth device (keyboard in my case), but because the device is being 'torn down' such communication fails. Accordingly hidp_get_raw_report has a 5 second timeout ... so this function stalls for 5 seconds until that timeout occurs. --- I would guess that for some reason, removing the 'hid' or associated 'input' device is leading to the power_supply code wanting to lookup the battery capacity which introduces a 5 second wait. This would delay hidp_session by 5 seconds ... in the mean time allowing hci_unregister_dev() to win the race and hence the hci device is removed before the input & hid devices. device_del is done on the battery device, called from hidinput_cleanup_battery in hid-input.c. This will generate a uevent which has the power_supply attributes added, which in turn invokes power_supply_show_property for 'capacity' which invokes the hidp_get_raw_report (which stalls for 5 seconds). Back in the days of linux 3.2, hid-input.c did not have any support for batteries, so this chain of calls was never made, hence the race was won by hidp_session() because it didn't have the 5 second delay injected. Now with battery support added, the problem is manifest. -- I think hidp_get_raw_report should abort if the devices is being 'torn down' (and so not go into the 5 second wait). But also the race between the two functions needs sorting, since sooner or later someone else will get unlucky. Created attachment 92151 [details]
Make hidp_get_raw_report abort if the session is terminating
This patch avoids the 5 second penalty in hidp_get_raw_report by making it terminate if the session is terminating anyway.
Created attachment 92161 [details]
udevadm monitor report with hidp_get_raw_report patch applied
With patch 92151 applied, the 5 second penalty is avoided. As this log shows, things are much better with the input/event devices being removed in their proper order.
There are still some bits out of order though - this is because of the race condition in the code between the hci code and the hidp_session code.
However, with this patch the events are in order enough for userspace (i.e. Xorg evdev driver) to be happy - the keyboard/mouse is disconnected from the Xserver, making room for a new connection later.
Hi, i have a similar crash when connecting to my BT mouse on 3.5.0-23 (and I tried also on 3.6), but I don't need to go to suspend or back. [ 26.920280] Call Trace: [ 26.920311] [<ffffffff8103fa39>] default_spin_lock_flags+0x9/0x10 [ 26.920388] [<ffffffff81683bee>] _raw_spin_lock_irqsave+0x2e/0x40 [ 26.920465] [<ffffffff81076871>] finish_wait+0x41/0x80 [ 26.920531] [<ffffffffa0644d05>] hidp_session+0x5c5/0x8f0 [hidp] [ 26.920606] [<ffffffff81087cc0>] ? try_to_wake_up+0x2a0/0x2a0 [ 26.920678] [<ffffffff81087cc0>] ? try_to_wake_up+0x2a0/0x2a0 [ 26.920750] [<ffffffff810768b0>] ? finish_wait+0x80/0x80 [ 26.920817] [<ffffffffa0644740>] ? hidp_input_report.isra.4+0x290/0x290 [hidp] [ 26.920906] [<ffffffff81075f13>] kthread+0x93/0xa0 [ 26.920967] [<ffffffff8168ce64>] kernel_thread_helper+0x4/0x10 [ 26.921039] [<ffffffff81075e80>] ? kthread_freezable_should_stop+0x70/0x70 [ 26.921125] [<ffffffff8168ce60>] ? gs_change+0x13/0x13 Chris Created attachment 92431 [details]
full dmesg
Created attachment 92581 [details]
Fix race between hidp_session and hci code
This patch, in conjunction with 92151, should remove the problem and ensure clean sysfs device removal in the right order.
Thanks for the patch, Karl. Do you plan on submitting it for inclusion in the mainline kernel? There are now much better and more comprehensive patches that cover this and other related issues in hidp that are in bluetooth-next and imminently being pulled across to linux-next |