Bug 52471 - usb device removed from sysfs before input children devices
Summary: usb device removed from sysfs before input children devices
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Greg Kroah-Hartman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-08 13:24 UTC by karllinuxtest.relton
Modified: 2013-11-19 14:08 UTC (History)
5 users (show)

See Also:
Kernel Version: 3.7
Tree: Mainline
Regression: Yes


Attachments
udevadm monitor log (1.34 KB, text/plain)
2013-01-08 13:24 UTC, karllinuxtest.relton
Details
Make hidp_get_raw_report abort if the session is terminating (1.43 KB, patch)
2013-01-29 19:29 UTC, karllinuxtest.relton
Details | Diff
udevadm monitor report with hidp_get_raw_report patch applied (2.67 KB, text/plain)
2013-01-29 19:33 UTC, karllinuxtest.relton
Details
full dmesg (79.43 KB, application/octet-stream)
2013-02-02 18:38 UTC, chrmhoffmann
Details
Fix race between hidp_session and hci code (2.05 KB, patch)
2013-02-05 21:14 UTC, karllinuxtest.relton
Details | Diff

Description karllinuxtest.relton 2013-01-08 13:24:19 UTC
Created attachment 90741 [details]
udevadm monitor log

On coming out of suspend my usb bluetooth adaptor is being reset by the system.

In linux 3.7 the usb devices are being removed from the sysfs tree first, and then the various 'child' devices (like my bluetooth mouse & keyboard related devices) afterwards. This is causing the udev events for the input devices to have 'orphaned' sysfs paths in the udev events.

This in term means the Xorg evdev driver does not recognise the events, and so doesn't see the removal of in the put devices.

This has been picked by some downstream distributions, e.g. see this thread by Google Chrome developers:

http://code.google.com/p/chromium-os/issues/detail?id=33813

Back on linux 3.2 this was not the case. The usb adaptor was reset, but device removal was orderly: first the input devices (will full paths in the udev events), then the usb devices walking up the tree.


The attachment is a fragment captured by udevadm monitor when running 3.7. See how the usb devices are moved first, and then the input/bluetooth related stuff with path-heads removed (paths are now /hci0/... instead of /devices/...)
Comment 1 Greg Kroah-Hartman 2013-01-08 14:22:57 UTC
On Tue, Jan 08, 2013 at 01:24:19PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
>            Summary: usb device removed from sysfs before input children
>                     devices

Please send this to the linux-usb@vger.kernel.org and linux-bluetooth
mailing lists.
Comment 2 karllinuxtest.relton 2013-01-24 13:58:13 UTC
The problem is caused by two separate issues:

1) There is a race between 
hidp_session() in bluetooth/hidp/core.c
and
hci_unregister_dev() in bluetooth/hci_core.c

Really the latter needs to wait for the former to terminate before pulling the plug on the hci device (it is hidp_session that eventually removes the 'input' device).


2) the power_supply system seems to be triggering a lookup of battery 'capacity' in the input device. This calls into the driver, which deep down invokes more code in the 'hidp' module: hidp_get_raw_report()

This function tries to communicate with the bluetooth device (keyboard in my case), but because the device is being 'torn down' such communication fails. Accordingly hidp_get_raw_report has a 5 second timeout ... so this function stalls for 5 seconds until that timeout occurs.

---

I would guess that for some reason, removing the 'hid' or associated 'input' device is leading to the power_supply code wanting to lookup the battery capacity which introduces a 5 second wait. This would delay hidp_session by 5 seconds ... in the mean time allowing hci_unregister_dev() to win the race and hence the hci device is removed before the input & hid devices.
Comment 3 karllinuxtest.relton 2013-01-24 16:09:45 UTC
device_del is done on the battery device, called from hidinput_cleanup_battery in hid-input.c.

This will generate a uevent which has the power_supply attributes added, which in turn invokes power_supply_show_property for 'capacity' which invokes the hidp_get_raw_report (which stalls for 5 seconds).


Back in the days of linux 3.2, hid-input.c did not have any support for batteries, so this chain of calls was never made, hence the race was won by hidp_session() because it didn't have the 5 second delay injected.

Now with battery support added, the problem is manifest.

--

I think hidp_get_raw_report should abort if the devices is being 'torn down' (and so not go into the 5 second wait). But also the race between the two functions needs sorting, since sooner or later someone else will get unlucky.
Comment 4 karllinuxtest.relton 2013-01-29 19:29:38 UTC
Created attachment 92151 [details]
Make hidp_get_raw_report abort if the session is  terminating

This patch avoids the 5 second penalty in hidp_get_raw_report by making it terminate if the session is terminating anyway.
Comment 5 karllinuxtest.relton 2013-01-29 19:33:07 UTC
Created attachment 92161 [details]
udevadm monitor report with hidp_get_raw_report patch applied

With patch 92151 applied, the 5 second penalty is avoided. As this log shows, things are much better with the input/event devices being removed in their proper order.

There are still some bits out of order though - this is because of the race condition in the code between the hci code and the hidp_session code.

However, with this patch the events are in order enough for userspace (i.e. Xorg evdev driver) to be happy - the keyboard/mouse is disconnected from the Xserver, making room for a new connection later.
Comment 6 chrmhoffmann 2013-02-02 18:38:14 UTC
Hi, 

i have a similar crash when connecting to my BT mouse on 3.5.0-23 (and I tried also on 3.6), but I don't need to go to suspend or back. 

[   26.920280] Call Trace:
[   26.920311]  [<ffffffff8103fa39>] default_spin_lock_flags+0x9/0x10
[   26.920388]  [<ffffffff81683bee>] _raw_spin_lock_irqsave+0x2e/0x40
[   26.920465]  [<ffffffff81076871>] finish_wait+0x41/0x80
[   26.920531]  [<ffffffffa0644d05>] hidp_session+0x5c5/0x8f0 [hidp]
[   26.920606]  [<ffffffff81087cc0>] ? try_to_wake_up+0x2a0/0x2a0
[   26.920678]  [<ffffffff81087cc0>] ? try_to_wake_up+0x2a0/0x2a0
[   26.920750]  [<ffffffff810768b0>] ? finish_wait+0x80/0x80
[   26.920817]  [<ffffffffa0644740>] ? hidp_input_report.isra.4+0x290/0x290 [hidp]
[   26.920906]  [<ffffffff81075f13>] kthread+0x93/0xa0
[   26.920967]  [<ffffffff8168ce64>] kernel_thread_helper+0x4/0x10
[   26.921039]  [<ffffffff81075e80>] ? kthread_freezable_should_stop+0x70/0x70
[   26.921125]  [<ffffffff8168ce60>] ? gs_change+0x13/0x13

Chris
Comment 7 chrmhoffmann 2013-02-02 18:38:49 UTC
Created attachment 92431 [details]
full dmesg
Comment 8 karllinuxtest.relton 2013-02-05 21:14:01 UTC
Created attachment 92581 [details]
Fix race between hidp_session and hci code

This patch, in conjunction with 92151, should remove the problem and ensure clean sysfs device removal in the right order.
Comment 9 Joseph Salisbury 2013-04-18 15:37:39 UTC
Thanks for the patch, Karl.  Do you plan on submitting it for inclusion in the mainline kernel?
Comment 10 karllinuxtest.relton 2013-04-18 17:54:41 UTC
There are now much better and more comprehensive patches that cover this and other related issues in hidp that are in bluetooth-next and imminently being pulled across to linux-next

Note You need to log in before you can comment on or make changes to this bug.