Bug 12046

Summary: circular locking dependency detected [by iwlagn, on thinkpad x200]
Product: Drivers
Reporter: arrow zhang (arrow.ebd)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX
Severity: normal
CC: arrow.ebd, johannes, reinette.chatre, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11808
Attachments: patch removing notify function

Description arrow zhang 2008-11-16 06:16:52 UTC
Latest working kernel version: v2.6.27.5
Distribution: archlinux
Hardware Environment: thinkpad x200
Problem Description: the following warning appears when I run "ifconfig wlan0 down" just after system startup.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.28-rc5 #17
-------------------------------------------------------
ifconfig/3218 is trying to acquire lock:
 (iwlagn){--..}, at: [<c023599a>] flush_workqueue+0x0/0x74

but task is already holding lock:
 (rtnl_mutex){--..}, at: [<c0456b5c>] rtnl_lock+0xf/0x11

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (rtnl_mutex){--..}:
       [<c0246391>] __lock_acquire+0xf20/0x11e7
       [<c02466b3>] lock_acquire+0x5b/0x81
       [<c0510c23>] mutex_lock_nested+0xc5/0x24a
       [<c0456b5c>] rtnl_lock+0xf/0x11
       [<c04f5638>] ieee80211_notify_mac+0x10/0x49
       [<f885a72b>] iwl_bg_alive_start+0x38c/0x3d9 [iwlagn]
       [<c0235247>] run_workqueue+0xd6/0x1a4
       [<c02353cb>] worker_thread+0xb6/0xc2
       [<c02381d9>] kthread+0x3b/0x61
       [<c020491b>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #1 (&(&priv->alive_start)->work){--..}:
       [<c0246391>] __lock_acquire+0xf20/0x11e7
       [<c02466b3>] lock_acquire+0x5b/0x81
       [<c0235241>] run_workqueue+0xd0/0x1a4
       [<c02353cb>] worker_thread+0xb6/0xc2
       [<c02381d9>] kthread+0x3b/0x61
       [<c020491b>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (iwlagn){--..}:
       [<c0246128>] __lock_acquire+0xcb7/0x11e7
       [<c02466b3>] lock_acquire+0x5b/0x81
       [<c02359d1>] flush_workqueue+0x37/0x74
       [<f8854cd0>] iwl4965_mac_stop+0xeb/0x1a8 [iwlagn]
       [<c04f8907>] ieee80211_stop+0x3a5/0x3ee
       [<c044f536>] dev_close+0x67/0x86
       [<c044f260>] dev_change_flags+0xa2/0x157
       [<c0497805>] devinet_ioctl+0x21a/0x515
       [<c0498698>] inet_ioctl+0x8e/0xa7
       [<c044397e>] sock_ioctl+0x1bd/0x1e1
       [<c02955af>] vfs_ioctl+0x22/0x69
       [<c0295995>] do_vfs_ioctl+0x39f/0x3d0
       [<c02959f2>] sys_ioctl+0x2c/0x45
       [<c0203a4b>] sysenter_do_call+0x12/0x3f
       [<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by ifconfig/3218:
 #0:  (rtnl_mutex){--..}, at: [<c0456b5c>] rtnl_lock+0xf/0x11

stack backtrace:
Pid: 3218, comm: ifconfig Not tainted 2.6.28-rc5 #17
Call Trace:
 [<c050f88f>] ? printk+0xf/0x11
 [<c024516a>] print_circular_bug_tail+0x94/0x9f
 [<c0246128>] __lock_acquire+0xcb7/0x11e7
 [<c05121c4>] ? _spin_unlock_irqrestore+0x40/0x50
 [<c02466b3>] lock_acquire+0x5b/0x81
 [<c023599a>] ? flush_workqueue+0x0/0x74
 [<c02359d1>] flush_workqueue+0x37/0x74
 [<c023599a>] ? flush_workqueue+0x0/0x74
 [<f8854cd0>] iwl4965_mac_stop+0xeb/0x1a8 [iwlagn]
 [<c04f8907>] ieee80211_stop+0x3a5/0x3ee
 [<c044f536>] dev_close+0x67/0x86
 [<c044f260>] dev_change_flags+0xa2/0x157
 [<c0497805>] devinet_ioctl+0x21a/0x515
 [<c0498698>] inet_ioctl+0x8e/0xa7
 [<c044397e>] sock_ioctl+0x1bd/0x1e1
 [<c04437c1>] ? sock_ioctl+0x0/0x1e1
 [<c02955af>] vfs_ioctl+0x22/0x69
 [<c0295995>] do_vfs_ioctl+0x39f/0x3d0
 [<c0342815>] ? _raw_spin_unlock+0x74/0x78
 [<c05121f1>] ? _spin_unlock+0x1d/0x20
 [<c022e20e>] ? cap_set_effective+0x43/0x4d
 [<c028a5ec>] ? sys_faccessat+0x169/0x173
 [<c0203a87>] ? sysenter_exit+0xf/0x1a
 [<c02959f2>] sys_ioctl+0x2c/0x45
 [<c0203a4b>] sysenter_do_call+0x12/0x3f
iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, 1.3.27kds
iwlagn: Copyright(c) 2003-2008 Intel Corporation
iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:03:00.0: setting latency timer to 64
iwlagn: Detected Intel Wireless WiFi Link 5100AGN REV=0x54
iwlagn: Tunable channels: 13 802.11bg, 24 802.11a channels
phy2: Selected rate control algorithm 'iwl-agn-rs'
iwlagn 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlagn 0000:03:00.0: irq 43 for MSI/MSI-X
iwlagn 0000:03:00.0: firmware: requesting iwlwifi-5000-1.ucode
Registered led device: iwl-phy2:radio
Registered led device: iwl-phy2:assoc
Registered led device: iwl-phy2:RX
Registered led device: iwl-phy2:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
Comment 1 Oleg Nesterov 2008-11-16 17:53:02 UTC
Commit db7fb86b0ca565cf3537401612581a8158025cc2 ?
Comment 2 Johannes Berg 2008-11-16 18:14:52 UTC
Well, that commit just tried to fix things... We've decided that it's invalid to call anything that needs the rtnl from the mac80211 provided workqueue, so iwlwifi (both iwlagn and iwl3945) will need to use schedule_work() here instead of queue_work().

To be honest, this notify thing is causing us no end of trouble for very little gain. I think we should just remove it again.

There are two use cases for that notify:
 (1) when the firmware crashed
 (2) at resume time

(2) needs to be fixed differently, by completely reinitializing the hardware; this notify function only works for station and IBSS modes, not for an AP or mesh network or ...

(1) is subject to the same conditions as (2), and as such should probably be implemented by power-cycling the hardware completely and unregistering all mac80211 state etc. After all, hopefully this is rare.

Why the driver is calling the notify function when it's starting up is beyond me; that seems entirely pointless.
Comment 3 Johannes Berg 2008-11-16 18:40:47 UTC
Created attachment 18882 [details]
patch removing notify function
Comment 4 arrow zhang 2008-11-20 04:46:19 UTC
Fine, the circular locking warning has disappeared, thanks.
Should I close the bug now, or wait until mainline has the fix?
Comment 5 Rafael J. Wysocki 2008-11-22 13:44:33 UTC
Fixed by commit 8e3bad65a59915f2ddc40f62a180ad81695d8440.