Bug 214843 - firewire: Controller hotplug blocks on userspace applications
Summary: firewire: Controller hotplug blocks on userspace applications
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-10-27 11:47 UTC by Hector Martin
Modified: 2021-10-27 11:47 UTC
0 users

See Also:
Kernel Version: 5.14.2-rt-rt21
Subsystem:
Regression: No
Bisected commit-id:


Description Hector Martin 2021-10-27 11:47:17 UTC
When a FireWire controller (e.g. one in a Thunderbolt adapter) is yanked from the PCIe bus while in use, the FireWire core blocks in the removal path until every userspace app that has the device open closes it:

[ 1598.543455] INFO: task irq/35-pciehp:937 blocked for more than 122 seconds.
[ 1598.543460]       Not tainted 5.14.2-rt-rt21 #1
[ 1598.543462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1598.543463] task:irq/35-pciehp   state:D stack:    0 pid:  937 ppid:     2 flags:0x00004000
[ 1598.543468] Call Trace:
[ 1598.543470]  __schedule+0x306/0xb10
[ 1598.543475]  ? lock_release+0x193/0x290
[ 1598.543481]  schedule+0x5f/0xd0
[ 1598.543483]  schedule_timeout+0xe8/0x120
[ 1598.543487]  ? rcu_read_lock_sched_held+0xd/0x70
[ 1598.543489]  ? wait_for_completion+0x7c/0xe0
[ 1598.543492]  ? lock_release+0x193/0x290
[ 1598.543495]  ? trace_hardirqs_on+0x1b/0xf0
[ 1598.543501]  wait_for_completion+0x84/0xe0
[ 1598.543505]  fw_core_remove_card+0x178/0x1c0 [firewire_core]
<snip garbage>
[ 1598.543600]  pci_remove+0x55/0x220 [firewire_ohci]
[ 1598.543605]  pci_device_remove+0x36/0xa0
[ 1598.543610]  __device_release_driver+0x17b/0x230
[ 1598.543615]  device_release_driver+0x21/0x30
[ 1598.543618]  pci_stop_bus_device+0x6a/0x90
[ 1598.543622]  pci_stop_bus_device+0x27/0x90
[ 1598.543626]  pci_stop_bus_device+0x27/0x90
[ 1598.543629]  pci_stop_and_remove_bus_device+0x9/0x20
[ 1598.543633]  pciehp_unconfigure_device+0x6f/0xf0
[ 1598.543638]  pciehp_disable_slot+0x64/0x100
[ 1598.543642]  pciehp_handle_presence_or_link_change+0xe2/0x320
[ 1598.543647]  pciehp_ist+0x174/0x180
[ 1598.543651]  ? disable_irq_nosync+0x10/0x10
[ 1598.543656]  irq_thread_fn+0x1b/0x60
[ 1598.543660]  irq_thread+0xd0/0x170
[ 1598.543664]  ? irq_finalize_oneshot.part.0+0xd0/0xd0
[ 1598.543668]  ? irq_thread_check_affinity+0x80/0x80
[ 1598.543672]  kthread+0x161/0x180
[ 1598.543676]  ? set_kthread_struct+0x40/0x40
[ 1598.543679]  ret_from_fork+0x22/0x30
[ 1598.543812] INFO: lockdep is turned off.

This can be reproduced by just running `cat > /dev/fw0` and unplugging the host controller. The pciehp thread will hang with that call stack until the `cat` is killed. Once that happens, the disconnection completes and the controller can be reconnected.
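
The hang can be pictured with a toy userspace model. This is only a sketch, not the firewire-core code: the `Card` class, `remove_card`, and the reference counting are assumptions made for illustration. The only thing taken from the trace above is that removal sits in a completion wait until the device is closed.

```python
import threading

class Card:
    """Toy stand-in for a FireWire card (hypothetical, not struct fw_card)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 1                  # reference held by the driver
        self.done = threading.Event()   # plays the role of a completion

    def get(self):
        with self._lock:
            self._refs += 1

    def put(self):
        with self._lock:
            self._refs -= 1
            last = self._refs == 0
        if last:
            self.done.set()             # last reference gone: signal completion

def remove_card(card, timeout=None):
    """Drop the driver's reference, then wait until all others are dropped."""
    card.put()
    return card.done.wait(timeout)

card = Card()
card.get()                              # userspace opens the device (the `cat`)
remover_finished = threading.Event()

def hotplug_thread():
    remove_card(card)                   # blocks while the device is still open
    remover_finished.set()

t = threading.Thread(target=hotplug_thread, daemon=True)
t.start()
blocked = not remover_finished.wait(0.2)  # removal is stuck while the fd is open
card.put()                                # the `cat` is killed, fd is closed
t.join(5)
completed = remover_finished.is_set()
print(blocked, completed)
```

In this model the removal thread only makes progress once the last holder lets go, which matches the observed behavior of the unplug finishing as soon as `cat` exits.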

It would be much preferable if the file descriptors for the open device were orphaned (every operation returning -ENODEV), so that the core could complete the shutdown process without waiting for userspace.
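
As a rough illustration of the requested behavior, here is a minimal userspace sketch of orphaning. The `Device` and `Handle` classes are hypothetical and do not correspond to any kernel structure. The point is only that removal detaches the open handles and returns immediately, while later operations on those handles fail with ENODEV.

```python
import errno
import os

class Handle:
    """Hypothetical open file handle on a device."""
    def __init__(self, device):
        self.device = device

    def write(self, data):
        if self.device is None:         # orphaned: the device is gone
            raise OSError(errno.ENODEV, os.strerror(errno.ENODEV))
        return len(data)

    def close(self):
        if self.device is not None:
            self.device.handles.remove(self)
            self.device = None

class Device:
    """Hypothetical device that orphans its handles on removal."""
    def __init__(self):
        self.handles = []

    def open(self):
        handle = Handle(self)
        self.handles.append(handle)
        return handle

    def remove(self):
        # Detach every open handle instead of waiting for it to be closed.
        for handle in self.handles:
            handle.device = None
        orphaned = len(self.handles)
        self.handles.clear()
        return orphaned

dev = Device()
handle = dev.open()
written = handle.write(b"x")            # works while the device is present
orphaned = dev.remove()                 # returns at once, no wait on userspace
try:
    handle.write(b"x")
    err = None
except OSError as exc:
    err = exc.errno
handle.close()                          # closing an orphaned handle is a no-op
print(written, orphaned, err == errno.ENODEV)
```

Userspace still has to close its descriptor eventually, but that close no longer gates the hotplug path.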

(Note: this is an RT kernel, but this also reproduces on vanilla.)
