Bug 60695 - pci_remove_rescan_mutex lockdep warning on rescan
Status: NEEDINFO
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL: http://lkml.kernel.org/r/2381207.2TMy...
Reported: 2013-08-05 15:51 UTC by Bjorn Helgaas
Modified: 2014-06-07 16:13 UTC

Kernel Version: 3.11.0-rc2
Regression: No


Attachments
lockdep dependency chain (9.42 KB, text/plain)
2013-08-05 15:51 UTC, Bjorn Helgaas
dmesg boot + remove + rescan (102.35 KB, text/plain)
2013-08-05 20:39 UTC, Peter Wu
lspci -vv (37.91 KB, text/plain)
2013-08-05 20:44 UTC, Peter Wu
dmesg (v3.15-rc6) (6.80 KB, text/plain)
2014-06-07 08:40 UTC, Peter Wu

Description Bjorn Helgaas 2013-08-05 15:51:25 UTC
Created attachment 107100
lockdep dependency chain

Probably related to bug 60672.
Reported by Peter Wu <lekensteyn@gmail.com>:

When trying to rescan for PCI devices (after removing a child), I get a
lockdep warning in my logs. The commands were:

    # tee /sys/bus/pci/devices/0000\:03\:00.0/remove <<<1
    1
    # tee /sys/bus/pci/devices/0000\:02\:00.0/rescan <<<1
    1

I did not experience actual issues, just letting you know about this. I can
reproduce this on every reboot:
1. Boot
2. (pci-stub owns the device)
4. remove from parent bus
5. rescan
6. Lockdep warning found.
(7. pci-stub claims device again)
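The remove and rescan steps above can be sketched as a small shell helper. This is only an illustration, not part of the original report: the function names and the SYSFS_PCI variable are hypothetical, and SYSFS_PCI exists solely so the helper can be pointed at a scratch directory for a dry run. On a real system it must run as root.

```shell
#!/bin/sh
# Hypothetical helper, assuming the standard sysfs PCI layout.
# SYSFS_PCI is an assumption added here so the script can be dry-run
# against a scratch directory instead of the real /sys tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Remove a child device, then rescan below its parent bridge.
#   $1: child device address,  e.g. 0000:03:00.0
#   $2: parent bridge address, e.g. 0000:02:00.0
pci_remove_rescan() {
    child="$1"
    parent="$2"
    echo 1 > "$SYSFS_PCI/$child/remove" || return 1
    echo 1 > "$SYSFS_PCI/$parent/rescan" || return 1
}

# Succeeds if the kernel log text on stdin contains the lockdep
# report header quoted in this bug.
lockdep_warned() {
    grep -q 'possible circular locking dependency detected'
}
```

A possible invocation, using the addresses from this report:

    pci_remove_rescan 0000:03:00.0 0000:02:00.0 && dmesg | lockdep_warned && echo "lockdep warning found"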

Interestingly, I can only reproduce this after freshly rebooting.
Repeating steps 4 and 5 does not trigger a new lockdep warning.
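A likely explanation for the once-per-boot behaviour, not confirmed anywhere in this report: lockdep disables further lock validation after it prints its first report, so later remove/rescan cycles are no longer checked until the next reboot. A minimal sketch for checking that state follows. It assumes a kernel built with lockdep and procfs, where /proc/lockdep_stats carries a "debug_locks:" line; the function name is hypothetical, and the optional path argument is only there to allow testing against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) while lockdep is still validating,
# fails once the debug_locks counter has dropped to 0.
#   $1: optional path to a lockdep_stats file (defaults to the procfs one)
lockdep_active() {
    stats="${1:-/proc/lockdep_stats}"
    # Expected line shape: " debug_locks:    1"
    val=$(awk '$1 == "debug_locks:" { print $2 }' "$stats") || return 2
    [ "$val" = "1" ]
}
```

For example, running `lockdep_active || echo "lockdep is off; reboot before retesting"` as root before step 4 would show whether a missing warning is meaningful.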

Regards,
Peter

======================================================
[ INFO: possible circular locking dependency detected ]
3.11.0-rc2-cold-00096-gae2ad35-dirty #1 Tainted: G           O
-------------------------------------------------------
tee/29902 is trying to acquire lock:
 (pci_remove_rescan_mutex){+.+.+.}, at: [<ffffffff8134be96>] dev_rescan_store+0x56/0x80

but task is already holding lock:
 (s_active#296){++++.+}, at: [<ffffffff811fddad>] sysfs_write_file+0xcd/0x170

which lock already depends on the new lock.
Comment 1 Peter Wu 2013-08-05 20:39:59 UTC
Created attachment 107106
dmesg boot + remove + rescan

As requested in the mail, a full dmesg.

At 32.959667, I ran:

    sudo tee /sys/bus/pci/devices/0000\:03\:00.0/remove <<<1

At 41.208461, I ran:

    sudo tee /sys/bus/pci/devices/0000\:02\:00.0/rescan <<<1

Note, the PCI ID of this 03:00.0 device is different than the one from my previous mail in which the device was owned by the pci-stub driver. Due to a hardware (?) bug, I have to kick the EEPROM to change the wrong PCI ID 10ec:8129 to 10ec:8169. For this lockdep issue, that should not matter though. I am mentioning it for completeness and to take away any possible confusion.

The kernel is 3.11.0-rc2-cold-00096-gae2ad35-dirty which is mainline plus some r8169 patches that I was working on (for dumping/changing EEPROM).
Comment 2 Peter Wu 2013-08-05 20:44:06 UTC
Created attachment 107107
lspci -vv

`lspci -vv` attached, `lspci -tvnn` below:

-[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller [8086:0100]
           +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0102]
           +-16.0  Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 [8086:1c3a]
           +-1a.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 [8086:1c2d]
           +-1b.0  Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller [8086:1c20]
           +-1c.0-[01]--
           +-1c.3-[02-03]----00.0-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169]
           +-1c.4-[04]----00.0  Etron Technology, Inc. EJ168 USB 3.0 Host Controller [1b6f:7023]
           +-1c.5-[05]----00.0  Etron Technology, Inc. EJ168 USB 3.0 Host Controller [1b6f:7023]
           +-1c.6-[06]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller [10ec:8168]
           +-1c.7-[07]----00.0  Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller [1b4b:9172]
           +-1d.0  Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 [8086:1c26]
           +-1f.0  Intel Corporation Z68 Express Chipset Family LPC Controller [8086:1c44]
           +-1f.2  Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller [8086:1c02]
           \-1f.3  Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller [8086:1c22]
Comment 3 Bjorn Helgaas 2014-06-06 20:48:19 UTC
Probably fixed by these commits:

  f41b32613138 ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
  d42f5da23400 ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock

Peter, can you test v3.15 (which will probably be released on 6/8 or so) and figure out whether this is fixed or still broken?
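Before testing, it can help to confirm that the kernel being built actually contains the two commits named above. A minimal sketch, assuming a local clone of the mainline kernel tree; the function name and the repository path in the usage line are hypothetical, while `git merge-base --is-ancestor` is a standard git subcommand.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if commit $2 is reachable from ref $3
# in the git repository at path $1.
commit_in_release() {
    repo="$1"
    commit="$2"
    ref="$3"
    git -C "$repo" merge-base --is-ancestor "$commit" "$ref"
}
```

For example, with a clone in ~/linux (an assumed path):

    commit_in_release ~/linux f41b32613138 v3.15-rc6 && echo "f41b32613138 is included"
    commit_in_release ~/linux d42f5da23400 v3.15-rc6 && echo "d42f5da23400 is included"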
Comment 4 Peter Wu 2014-06-07 08:40:22 UTC
Created attachment 138451
dmesg (v3.15-rc6)

Bjorn, tested as follows on v3.15-rc6 (plus "r8169: add ethtool eeprom change/dump feature" patch):

sudo tee /sys/bus/pci/devices/0000\:04\:00.0/remove <<<1
sudo tee /sys/bus/pci/devices/0000\:03\:00.0/rescan <<<1

The original problem is gone, but a new lockdep issue now emerges. See the attached dmesg.
