Bug 60672 - kernel panic when remove and rescan pci device via sysfs/pci interface in the same time.
Status: NEEDINFO
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-01 05:45 UTC by Gu Zheng
Modified: 2014-06-06 20:44 UTC

See Also:
Kernel Version: 3.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments
panic dmesg (85.59 KB, text/plain)
2013-08-01 05:45 UTC, Gu Zheng
pci tree infos (2.27 KB, text/plain)
2013-08-01 05:46 UTC, Gu Zheng

Description Gu Zheng 2013-08-01 05:45:46 UTC
Created attachment 107070
panic dmesg

There is a potential deadlock when the pci-sysfs user interfaces of devices
in different bus hierarchies are manipulated simultaneously, as follows:

path1: sysfs remove device:               | path2: sysfs rescan device:
sysfs_schedule_callback_work()            | sysfs_write_file()
  remove_callback()                       |   flush_write_buffer()
*1* mutex_lock(&pci_remove_rescan_mutex)  |*2*  sysfs_get_active(attr_sd)
      ...                                 |     dev_attr_store()
        device_remove_file()              |       dev_rescan_store()
          ...                             |*4*      mutex_lock(&pci_remove_rescan_mutex)
*3*       sysfs_deactivate(sd)            |         ...
            wait_for_completion()         |*5*  sysfs_put_active(attr_sd)
*6* mutex_unlock(&pci_remove_rescan_mutex)|

If path1 takes pci_remove_rescan_mutex at *1*, and path2 then starts and
reaches *2* before path1 reaches *3*, the two paths deadlock:
path1 holds the mutex and waits at *3* for path2 to drop the sysfs_dirent's
s_active count at *5*, but path2 is blocked at *4* trying to take
pci_remove_rescan_mutex. path1 will not release the mutex until it reaches
*6*, and it cannot get there because it is blocked at *3*.
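
The interleaving above can be modelled in user space. The following is only
a minimal sketch, not kernel code: a threading.Lock stands in for
pci_remove_rescan_mutex, an Event stands in for s_active draining to zero,
and the function name simulate() and the result keys are made up for this
illustration. The timeouts exist only so that the demonstration terminates;
in the kernel both waits are unbounded.

```python
import threading


def simulate(timeout=0.5):
    """Replay the *1*..*6* interleaving and report what each path achieved."""
    rescan_mutex = threading.Lock()      # stands in for pci_remove_rescan_mutex
    active_dropped = threading.Event()   # stands in for s_active reaching zero
    path1_locked = threading.Event()     # helper: path1 has passed *1*
    path2_active = threading.Event()     # helper: path2 has passed *2*
    result = {}

    def path1_remove():
        with rescan_mutex:                           # *1* take the mutex
            path1_locked.set()
            path2_active.wait()                      # force the bad ordering
            # *3* sysfs_deactivate(): wait for active references to drain
            result["path1_drained"] = active_dropped.wait(timeout)
        # *6* the mutex is released only when the with-block exits

    def path2_rescan():
        path1_locked.wait()
        path2_active.set()                           # *2* sysfs_get_active()
        # *4* dev_rescan_store() wants the mutex that path1 still holds
        got = rescan_mutex.acquire(timeout=timeout / 2)
        result["path2_got_mutex"] = got
        if got:
            rescan_mutex.release()
            active_dropped.set()                     # *5* sysfs_put_active()

    threads = [threading.Thread(target=path1_remove),
               threading.Thread(target=path2_rescan)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(simulate())
```

Under this forced ordering neither path makes progress before its timeout
expires: path2 never gets the mutex, so it never reaches *5*, and path1
never sees the active count drop.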

Running the following two scripts simultaneously can produce this issue:
1. 
for i in `seq 20`; do echo $i; echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove; echo -n 1 > /sys/bus/pci/devices/0000\:00\:09.0/rescan ; done

2.
for i in `seq 20000`; do echo $i; sleep 0.1;  echo -n 1 >  /sys/bus/pci/devices/0000\:1c\:00.1/remove ; echo -n 1 >  /sys/bus/pci/devices/0000\:1c\:00.0/remove ;echo -n 1 >  /sys/bus/pci/devices/0000\:1a\:01.0/rescan; done
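
For convenience, the two loops can also be driven from a single process.
This is a rough sketch under stated assumptions, not the reproducer that was
actually used: the helper names (sysfs_path, run_loop, run_both) are
hypothetical, the device addresses are the ones from the scripts above and
will differ on other machines, and the real writes need root. It defaults to
a dry run that only records the paths it would write to.

```python
import threading
import time

SYSFS_PCI = "/sys/bus/pci/devices"

# (device address, sysfs attribute) pairs taken from the two shell loops.
SCRIPT1 = [("0000:10:00.0", "remove"), ("0000:00:09.0", "rescan")]
SCRIPT2 = [("0000:1c:00.1", "remove"), ("0000:1c:00.0", "remove"),
           ("0000:1a:01.0", "rescan")]


def sysfs_path(bdf, action):
    """Build the sysfs attribute path for a device address and an action."""
    return f"{SYSFS_PCI}/{bdf}/{action}"


def run_loop(steps, iterations, delay=0.0, dry_run=True):
    """Write "1" to each attribute in steps, repeated iterations times."""
    log = []
    for _ in range(iterations):
        if delay and not dry_run:
            time.sleep(delay)                # mirrors "sleep 0.1" in script 2
        for bdf, action in steps:
            path = sysfs_path(bdf, action)
            log.append(path)
            if dry_run:
                continue
            try:
                with open(path, "w") as f:
                    f.write("1")
            except OSError:
                pass                         # device may already be removed
    return log


def run_both(dry_run=True):
    """Run both loops concurrently, like starting the two scripts together."""
    jobs = [(SCRIPT1, 20, 0.0), (SCRIPT2, 20000, 0.1)]
    threads = [threading.Thread(target=run_loop, args=(s, n, d, dry_run))
               for s, n, d in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling run_both(dry_run=False) as root would issue the same writes as the
two shell loops; on an affected kernel that is expected to hang or panic the
machine, so it should only be tried on a test box.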

The panic dmesg output and the PCI tree information are attached.
Comment 1 Gu Zheng 2013-08-01 05:46:45 UTC
Created attachment 107071
pci tree infos
Comment 2 Bjorn Helgaas 2014-06-04 00:46:22 UTC
Gu, has this been fixed?  I suspect that Rafael's work, e.g.,

  f41b32613138 ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
  d42f5da23400 ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock

might have fixed this.
Comment 3 Gu Zheng 2014-06-04 00:56:24 UTC
(In reply to Bjorn Helgaas from comment #2)
Hi Bjorn,
I'll confirm this in the coming week; the box is being used for other work right now.

Thanks,
Gu
> Gu, has this been fixed?  I suspect that Rafael's work, e.g.,
> 
>   f41b32613138 ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
>   d42f5da23400 ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
Yeah, that code looks like it should fix this issue.
> 
> might have fixed this.
