Bug 6706
Summary: | modprobe -r ohci1394 hangs or panics | |
---|---|---|---
Product: | Drivers | Reporter: | Stefan Richter (stefanr)
Component: | IEEE1394 | Assignee: | Stefan Richter (stefanr)
Status: | CLOSED CODE_FIX | |
Severity: | normal | |
Priority: | P2 | |
Hardware: | i386 | |
OS: | Linux | |
Kernel Version: | 2.6.15, 2.6.16 | Subsystem: |
Regression: | --- | Bisected commit-id: |
Attachments: | effectively reverts 2.6.16 patch "Hold the device's parent's lock during probe and remove"; an unsane fix; another ugly workaround; previous patch revised | |
Description
Stefan Richter
2006-06-18 03:40:46 UTC
Didn't happen anymore when I tested again with Linux 2.6.19-rc2. I am too lazy to find out which patch fixed that. On a quick glance, this fix in 2.6.18 looks like it:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=445151932e869fd76b23bccff75ae2a600ccf3c9

Alas, it's not fixed. I never get a panic, but the deadlock between the modprobe process and the knodemgrd kthread does still happen if the timing is right. Tested with 2.6.19-rc.

    # modprobe ohci1394 && modprobe -r ohci1394
works.

    # modprobe ohci1394 && sleep 1 && modprobe -r ohci1394
gets stuck in uninterruptible sleep on kthread_stop(). This is trying to stop the knodemgrd, which meanwhile sleeps uninterruptibly on bus_rescan_devices_helper().

Call trace of the modprobe -r context:
    kthread_stop in kernel/kthread.c
    nodemgr_remove_host in drivers/ieee1394/nodemgr.c
    __unregister_host in drivers/ieee1394/highlevel.c
    highlevel_remove_host in drivers/ieee1394/highlevel.c
    hpsb_remove_host in drivers/ieee1394/hosts.c
    ohci1394_pci_remove in drivers/ieee1394/ohci1394.c
    pci_device_remove in pci/pci-driver.c
    __device_release_driver in drivers/base/dd.c
    driver_detach in drivers/base/dd.c

Call trace of the knodemgrd context:
    bus_rescan_devices_helper in drivers/base/bus.c
    bus_rescan_devices in drivers/base/bus.c
    nodemgr_node_probe in drivers/ieee1394/nodemgr.c
    nodemgr_host_thread in drivers/ieee1394/nodemgr.c

It seems the following is the culprit: Since Linux 2.6.16, bus_rescan_devices_helper takes down(&dev->parent->sem) if a parent device exists. This is true for all devices that are managed by nodemgr. (FireWire ud's have ud's or ne's as parent, and FireWire ne's have hosts as parent.) And yes, the call in driver_detach to __device_release_driver is enclosed in down(&dev->sem).

Created attachment 9564 [details]
effectively reverts 2.6.16 patch "Hold the device's parent's lock during probe and remove"

As expected, reverting the dev->parent->sem changes made in Linux 2.6.16 avoids
the deadlock. Of course we cannot simply revert it without wreaking havoc in
the USB subsystem.
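
The circular wait described in the two call traces can be modelled in user space as a tiny wait-for graph. This is only an illustrative sketch, not kernel code: the task enum, `has_deadlock()` and `reported_scenario_deadlocks()` are hypothetical names, and the model assumes that the semaphore knodemgrd wants is the one the modprobe -r context already holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task ids for the two contexts in the report. */
enum { TASK_MODPROBE, TASK_KNODEMGRD, NUM_TASKS };
#define NO_TASK (-1)

/*
 * waits_for[t] names the task that t is blocked on, or NO_TASK.
 * Returns true if following the edges from any task leads back to it.
 */
static bool has_deadlock(const int waits_for[NUM_TASKS])
{
    assert(waits_for != NULL);

    for (int start = 0; start < NUM_TASKS; start++) {
        int cur = waits_for[start];

        for (int steps = 0; cur != NO_TASK && steps < NUM_TASKS; steps++) {
            if (cur == start)
                return true;
            cur = waits_for[cur];
        }
    }
    return false;
}

/*
 * Model of "modprobe ohci1394 && sleep 1 && modprobe -r ohci1394".
 * modprobe -r always waits for knodemgrd to exit (kthread_stop).
 * knodemgrd only waits for modprobe if the rescan takes a semaphore
 * that the modprobe -r context holds (assumed here for 2.6.16+).
 */
static bool reported_scenario_deadlocks(bool rescan_takes_parent_sem)
{
    int waits_for[NUM_TASKS];

    waits_for[TASK_MODPROBE] = TASK_KNODEMGRD;
    waits_for[TASK_KNODEMGRD] =
        rescan_takes_parent_sem ? TASK_MODPROBE : NO_TASK;

    return has_deadlock(waits_for);
}
```

In this model, `reported_scenario_deadlocks(true)` reports a cycle, while `reported_scenario_deadlocks(false)` does not, which matches the observation that reverting the parent-semaphore change avoids the hang.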
Created attachment 9566 [details]
an unsane fix

This prevents the deadlock. The lines in // are not required.

Note, although the deadlock is now gone for good according to repeated tests, there is now a new oops in eth1394, logged as bug 7550. This could have been the lock-up which I observed with kernels before 2.6.16.

Discussion on the mailing lists: http://lkml.org/lkml/2006/11/18/140

Created attachment 9577 [details]
another ugly workaround

This patch creates a dummy driver and binds it to all fw-host devices. That
way, bus_rescan_devices_helper will skip fw-host devices and won't block on
their parent's device semaphore.
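
Why binding a dummy driver helps can be shown with a small user-space model of the rescan decision. This is a hedged sketch, not the driver core's code: `struct model_device`, `rescan_would_block()`, `fw_host_rescan_blocks()` and the device name "fw-host0" are hypothetical, and the model assumes that the rescan only touches the parent's semaphore for devices that have no driver bound, and that the fw-host's parent is the PCI device being removed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a driver-core device. */
struct model_device {
    const char *name;
    const char *driver;           /* NULL while no driver is bound */
    struct model_device *parent;  /* NULL for a root device */
    bool sem_held;                /* another task holds this device's semaphore */
};

/*
 * Models the rescan helper's decision: a device that already has a
 * driver is skipped; an unbound device needs its parent's semaphore
 * and therefore blocks while that semaphore is held elsewhere.
 */
static bool rescan_would_block(const struct model_device *dev)
{
    if (dev->driver != NULL)
        return false;
    return dev->parent != NULL && dev->parent->sem_held;
}

/*
 * Scenario from the report: modprobe -r holds the semaphore of the
 * host's parent while knodemgrd rescans the bus.
 */
static bool fw_host_rescan_blocks(bool bind_dummy_driver)
{
    struct model_device pci = { "ohci1394 PCI device", "ohci1394", NULL, true };
    struct model_device host = {
        "fw-host0",                           /* hypothetical device name */
        bind_dummy_driver ? "ieee1394" : NULL,
        &pci,
        false
    };

    return rescan_would_block(&host);
}
```

Under these assumptions, `fw_host_rescan_blocks(false)` is true (the unbound fw-host makes the rescan wait on the held semaphore), and `fw_host_rescan_blocks(true)` is false because the bound device is skipped.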
Created attachment 9585 [details]
previous patch revised
Binds a dummy driver named "ieee1394" to fw-host devices. This prevents
fw-hosts from being scanned by the driver core and thus prevents the deadlock.

The fix was merged and will appear in Linux 2.6.20-rc1.