Yinghai reported the following lockdep warning:

  mlx4_core 0000:02:00.0: NOP command IRQ test passed

  =============================================
  [ INFO: possible recursive locking detected ]
  3.10.0-rc1-yh-00114-gf59c98e-dirty #1588 Not tainted
  ---------------------------------------------
  kworker/0:1/2285 is trying to acquire lock:
   ((&wfc.work)){+.+.+.}, at: [<ffffffff810ab745>] flush_work+0x5/0x280

  but task is already holding lock:
   ((&wfc.work)){+.+.+.}, at: [<ffffffff810aabe2>] process_one_work+0x202/0x490

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock((&wfc.work));
    lock((&wfc.work));

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  3 locks held by kworker/0:1/2285:
   #0: (events){.+.+.+}, at: [<ffffffff810aabe2>] process_one_work+0x202/0x490
   #1: ((&wfc.work)){+.+.+.}, at: [<ffffffff810aabe2>] process_one_work+0x202/0x490
   #2: (&__lockdep_no_validate__){......}, at: [<ffffffff81765eea>] device_attach+0x2a/0xc0

  stack backtrace:
  CPU: 0 PID: 2285 Comm: kworker/0:1 Not tainted 3.10.0-rc1-yh-00114-gf59c98e-dirty #1588
  Hardware name: Oracle Corporation unknown / , BIOS 11016600 05/17/2011
  Workqueue: events work_for_cpu_fn
   ffffffff83350bc0 ffff881025c11778 ffffffff82093a74 ffff881025c11838
   ffffffff810ed194 ffff881025c117b8 ffff881025c38000 0000b787702301dc
   ffff881000000000 0000000000000002 ffffffff8322cba0 ffff881025c11878
  Call Trace:
   [<ffffffff82093a74>] dump_stack+0x19/0x1b
   [<ffffffff810ed194>] validate_chain.isra.19+0x8f4/0x1210
   [<ffffffff810f0c40>] __lock_acquire+0xac0/0xce0
   [<ffffffff810f150a>] lock_acquire+0xda/0x130
   [<ffffffff810ab78c>] flush_work+0x4c/0x280
   [<ffffffff810aba42>] work_on_cpu+0x82/0x90
   [<ffffffff8151ebcf>] pci_device_probe+0xaf/0x110
   [<ffffffff8176608d>] driver_probe_device+0xdd/0x220
   [<ffffffff817662b3>] __device_attach+0x33/0x50
   [<ffffffff817640b6>] bus_for_each_drv+0x56/0xa0
   [<ffffffff81765f48>] device_attach+0x88/0xc0
   [<ffffffff81515b49>] pci_bus_add_device+0x39/0x60
   [<ffffffff81540605>] pci_bus_add_vf+0x25/0x40
   [<ffffffff81540834>] pci_bus_add_device_vfs+0xa4/0xe0
   [<ffffffff81c1faa6>] __mlx4_init_one+0xa96/0xc90
   [<ffffffff81c1fd0d>] mlx4_init_one+0x4d/0x60
   [<ffffffff8151e2db>] local_pci_probe+0x4b/0x80
   [<ffffffff810a7958>] work_for_cpu_fn+0x18/0x30
   [<ffffffff810aac6b>] process_one_work+0x28b/0x490

This problem occurs when drivers call pci_enable_sriov() from their
.probe() method, via this path:

  pci_call_probe
    work_on_cpu(cpu, local_pci_probe, ...)    # .probe for PF
      driver .probe
        pci_enable_sriov
          ...
            pci_bus_add_device                # add new VF
              ...
                pci_call_probe
                  work_on_cpu(cpu, local_pci_probe, ...)    # .probe() for VF

The inner work_on_cpu() flushes a second wfc.work while the kworker is
still running the outer one. Both work items belong to the same lock
class, so lockdep reports possible recursive locking.

Drivers that support SR-IOV should implement the .sriov_configure()
method and enable SR-IOV there, because then users can use the sysfs
interface to configure each instance of a PF differently. But some
drivers enable SR-IOV in .probe() so they can continue supporting module
parameters for the number of VFs.

Drivers that use SR-IOV but don't yet implement .sriov_configure()
include: be, cxgb4, efx, enic, mlx4, and vxge.
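
For illustration only, the shape of a .sriov_configure() callback can be
sketched in userspace C. This is not code from any of the drivers listed
above. The driver name foo, struct mock_pci_dev and the mock_* helpers are
hypothetical stand-ins for struct pci_dev, pci_enable_sriov() and
pci_disable_sriov(), and the errno values the mocks return are
illustrative rather than what the PCI core returns. The sketch assumes the
usual convention that a request for 0 VFs disables SR-IOV and that a
successful enable returns the number of VFs enabled.

```c
/* Userspace sketch; mock_* names are stand-ins, not kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-in for struct pci_dev (the real one has many more fields). */
struct mock_pci_dev {
	int total_vfs;	/* VFs the device can support */
	int num_vfs;	/* VFs currently enabled */
};

/* Stand-in for pci_enable_sriov(): returns 0 or a negative errno. */
static int mock_pci_enable_sriov(struct mock_pci_dev *dev, int nr_virtfn)
{
	assert(dev != NULL);
	if (nr_virtfn <= 0 || nr_virtfn > dev->total_vfs)
		return -EINVAL;
	if (dev->num_vfs)
		return -EBUSY;	/* illustrative: VFs are already enabled */
	dev->num_vfs = nr_virtfn;
	return 0;
}

/* Stand-in for pci_disable_sriov(). */
static void mock_pci_disable_sriov(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	dev->num_vfs = 0;
}

/*
 * Hypothetical callback in the shape of .sriov_configure():
 * num_vfs == 0 disables SR-IOV; otherwise return the number of VFs
 * enabled, or a negative errno on failure.
 */
static int foo_sriov_configure(struct mock_pci_dev *dev, int num_vfs)
{
	int err;

	if (num_vfs == 0) {
		mock_pci_disable_sriov(dev);
		return 0;
	}

	err = mock_pci_enable_sriov(dev, num_vfs);
	if (err)
		return err;

	return num_vfs;
}
```

Because the VFs are enabled from this callback rather than from .probe(),
the VF .probe() calls in such a design would not be nested inside the
work_on_cpu() call that runs the PF .probe().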