Bug 13245 - possible circular locking dependency detected
Summary: possible circular locking dependency detected
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
: 13296 (view as bug list)
Depends on:
Blocks: 7216 13070
  Show dependency tree
 
Reported: 2009-05-04 16:56 UTC by Miles Lane
Modified: 2009-05-30 21:36 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.30-rc4-git1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
INFO message text (8.56 KB, text/plain)
2009-05-04 16:56 UTC, Miles Lane
Details
PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs (5.65 KB, patch)
2009-05-23 23:21 UTC, Rafael J. Wysocki
Details | Diff

Description Miles Lane 2009-05-04 16:56:58 UTC
Created attachment 21214 [details]
INFO message text

As far as I can tell, the system is running normally after resume.  
Also, I only see this INFO message on the first suspend/resume cycle.  Subsequent suspends do not trigger the message.
Comment 1 Johannes Berg 2009-05-04 17:22:32 UTC
Hi all,

This is effectively a copy of
http://bugzilla.kernel.org/show_bug.cgi?id=13245 but with a few more
people. I hope this works and bugme picks it up.

The report is below.

The problem here is effectively that something runs off the events
workqueue, and ends up, with some indirection that I think doesn't
matter much, using cpu_add_remove_lock, either while being on the events
workqueue or in this case depending on a lock that was used from the
events workqueue.

It seems that the problem is in disable_nonboot_cpus, which via the
notifier indirection calls cleanup_workqueue_thread() which flushes the
workqueue so needs the fake "events" lock.

On the other hand, via some indirection, things that run off the events
end up using (depending on something that uses) cpu_add_remove_lock.

Thoughts?

[ 1231.558889] [ INFO: possible circular locking dependency detected ]
[ 1231.558895] 2.6.30-rc4-git1 #5
[ 1231.558899] -------------------------------------------------------
[ 1231.558903] pm-suspend/4741 is trying to acquire lock:
[ 1231.558908]  (events){+.+.+.}, at: [<ffffffff80249feb>] cleanup_workqueue_thread+0x1e/0xf8
[ 1231.558927] 
[ 1231.558929] but task is already holding lock:
[ 1231.558932]  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8023a8a5>] disable_nonboot_cpus+0x33/0x11d
[ 1231.558947] 
[ 1231.558948] which lock already depends on the new lock.
[ 1231.558951] 
[ 1231.558954] 
[ 1231.558955] the existing dependency chain (in reverse order) is:
[ 1231.558959] 
[ 1231.558961] -> #5 (cpu_add_remove_lock){+.+.+.}:
[ 1231.558969]        [<ffffffff8025dc42>] __lock_acquire+0xa88/0xc22
[ 1231.558980]        [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.558988]        [<ffffffff8058a217>] __mutex_lock_common+0x5a/0x429
[ 1231.558999]        [<ffffffff8058a680>] mutex_lock_nested+0x30/0x35
[ 1231.559008]        [<ffffffff8023a80e>] cpu_maps_update_begin+0x12/0x14
[ 1231.559017]        [<ffffffff8024a34b>] __create_workqueue_key+0x161/0x226
[ 1231.559026]        [<ffffffff80275cc8>] stop_machine_create+0x3a/0x9b
[ 1231.559035]        [<ffffffff80275d42>] stop_machine+0x19/0x4b
[ 1231.559044]        [<ffffffffa0115238>] 0xffffffffa0115238
[ 1231.559061]        [<ffffffff8020906b>] do_one_initcall+0x65/0x153
[ 1231.559072]        [<ffffffff80266b50>] sys_init_module+0xab/0x1d3
[ 1231.559080]        [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b
[ 1231.559089]        [<ffffffffffffffff>] 0xffffffffffffffff
[ 1231.559097] 
[ 1231.559099] -> #4 (setup_lock){+.+.+.}:
[ 1231.559105]        [<ffffffff8025dc42>] __lock_acquire+0xa88/0xc22
[ 1231.559114]        [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.559122]        [<ffffffff8058a217>] __mutex_lock_common+0x5a/0x429
[ 1231.559132]        [<ffffffff8058a680>] mutex_lock_nested+0x30/0x35
[ 1231.559141]        [<ffffffff80275ca0>] stop_machine_create+0x12/0x9b
[ 1231.559150]        [<ffffffff8023a888>] disable_nonboot_cpus+0x16/0x11d
[ 1231.559159]        [<ffffffff80267993>] suspend_devices_and_enter+0xd6/0x1b1
[ 1231.559168]        [<ffffffff80267bfa>] enter_state+0x163/0x1c9
[ 1231.559176]        [<ffffffff80267d17>] state_store+0xb7/0xd8
[ 1231.559184]        [<ffffffff803bd78b>] kobj_attr_store+0x17/0x19
[ 1231.559194]        [<ffffffff803137b3>] sysfs_write_file+0xe4/0x119
[ 1231.559203]        [<ffffffff802c237a>] vfs_write+0xab/0x105
[ 1231.559212]        [<ffffffff802c2498>] sys_write+0x47/0x70
[ 1231.559219]        [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b
[ 1231.559228]        [<ffffffffffffffff>] 0xffffffffffffffff
[ 1231.559236] 
[ 1231.559238] -> #3 (dpm_list_mtx){+.+.+.}:
[ 1231.559245]        [<ffffffff8025dc42>] __lock_acquire+0xa88/0xc22
[ 1231.559253]        [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.559262]        [<ffffffff8058a217>] __mutex_lock_common+0x5a/0x429
[ 1231.559272]        [<ffffffff8058a680>] mutex_lock_nested+0x30/0x35
[ 1231.559281]        [<ffffffff8046002d>] device_pm_add+0x1e/0xb5
[ 1231.559292]        [<ffffffff80459fae>] device_add+0x37c/0x557
[ 1231.559301]        [<ffffffffa008f453>] wiphy_register+0x148/0x1ec [cfg80211]
[ 1231.559320]        [<ffffffffa00c62f8>] ieee80211_register_hw+0xe9/0x3aa [mac80211]
[ 1231.559348]        [<ffffffffa01776e2>] iwl3945_pci_probe+0xb5c/0xcce [iwl3945]
[ 1231.559368]        [<ffffffff803d1a5f>] local_pci_probe+0x12/0x16
[ 1231.559378]        [<ffffffff803d274e>] pci_device_probe+0x5f/0x89
[ 1231.559388]        [<ffffffff8045bf49>] driver_probe_device+0x9e/0x143
[ 1231.559398]        [<ffffffff8045c046>] __driver_attach+0x58/0x7b
[ 1231.559406]        [<ffffffff8045b83c>] bus_for_each_dev+0x54/0x8a
[ 1231.559415]        [<ffffffff8045bdbb>] driver_attach+0x1c/0x1e
[ 1231.559423]        [<ffffffff8045b178>] bus_add_driver+0xb2/0x1e6
[ 1231.559432]        [<ffffffff8045c300>] driver_register+0xb6/0x121
[ 1231.559439]        [<ffffffff803d2b76>] __pci_register_driver+0x61/0xcf
[ 1231.559448]        [<ffffffffa018505c>] 0xffffffffa018505c
[ 1231.559456]        [<ffffffff8020906b>] do_one_initcall+0x65/0x153
[ 1231.559464]        [<ffffffff80266b50>] sys_init_module+0xab/0x1d3
[ 1231.559473]        [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b
[ 1231.559481]        [<ffffffffffffffff>] 0xffffffffffffffff
[ 1231.559489] 
[ 1231.559491] -> #2 (cfg80211_mutex){+.+.+.}:
[ 1231.559498]        [<ffffffff8025dc42>] __lock_acquire+0xa88/0xc22
[ 1231.559507]        [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.559515]        [<ffffffff8058a217>] __mutex_lock_common+0x5a/0x429
[ 1231.559524]        [<ffffffff8058a680>] mutex_lock_nested+0x30/0x35
[ 1231.559533]        [<ffffffffa0090a0b>] reg_todo+0x5d/0x4d8 [cfg80211]
[ 1231.559553]        [<ffffffff80249bf9>] worker_thread+0x24b/0x367
[ 1231.559561]        [<ffffffff8024d94d>] kthread+0x56/0x83
[ 1231.559570]        [<ffffffff8020c37a>] child_rip+0xa/0x20
[ 1231.559577]        [<ffffffffffffffff>] 0xffffffffffffffff
[ 1231.559585] 
[ 1231.559587] -> #1 (reg_work){+.+.+.}:
[ 1231.559593]        [<ffffffff8025dc42>] __lock_acquire+0xa88/0xc22
[ 1231.559603]        [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.559611]        [<ffffffff80249bf0>] worker_thread+0x242/0x367
[ 1231.559619]        [<ffffffff8024d94d>] kthread+0x56/0x83
[ 1231.559627]        [<ffffffff8020c37a>] child_rip+0xa/0x20
[ 1231.559634]        [<ffffffffffffffff>] 0xffffffffffffffff
[ 1231.559658] 
[ 1231.559659] -> #0 (events){+.+.+.}:
[ 1231.559666]        [<ffffffff8025db36>] __lock_acquire+0x97c/0xc22
[ 1231.559675]        [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.559683]        [<ffffffff8024a012>] cleanup_workqueue_thread+0x45/0xf8
[ 1231.559691]        [<ffffffff8057c347>] workqueue_cpu_callback+0xc2/0x105
[ 1231.559703]        [<ffffffff8058e6a1>] notifier_call_chain+0x5e/0x92
[ 1231.559712]        [<ffffffff80251c78>] raw_notifier_call_chain+0xf/0x11
[ 1231.559722]        [<ffffffff8057a9db>] _cpu_down+0x278/0x295
[ 1231.559730]        [<ffffffff8023a8ea>] disable_nonboot_cpus+0x78/0x11d
[ 1231.559740]        [<ffffffff80267993>] suspend_devices_and_enter+0xd6/0x1b1
[ 1231.559749]        [<ffffffff80267bfa>] enter_state+0x163/0x1c9
[ 1231.559757]        [<ffffffff80267d17>] state_store+0xb7/0xd8
[ 1231.559765]        [<ffffffff803bd78b>] kobj_attr_store+0x17/0x19
[ 1231.559773]        [<ffffffff803137b3>] sysfs_write_file+0xe4/0x119
[ 1231.559781]        [<ffffffff802c237a>] vfs_write+0xab/0x105
[ 1231.559788]        [<ffffffff802c2498>] sys_write+0x47/0x70
[ 1231.559796]        [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b
[ 1231.559804]        [<ffffffffffffffff>] 0xffffffffffffffff
[ 1231.559813] 
[ 1231.559814] other info that might help us debug this:
[ 1231.559817] 
[ 1231.559822] 4 locks held by pm-suspend/4741:
[ 1231.559825]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff80313707>] sysfs_write_file+0x38/0x119
[ 1231.559838]  #1:  (pm_mutex){+.+.+.}, at: [<ffffffff80267c57>] enter_state+0x1c0/0x1c9
[ 1231.559851]  #2:  (dpm_list_mtx){+.+.+.}, at: [<ffffffff8045f533>] device_pm_lock+0x12/0x14
[ 1231.559864]  #3:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8023a8a5>] disable_nonboot_cpus+0x33/0x11d
[ 1231.559878] 
[ 1231.559880] stack backtrace:
[ 1231.559886] Pid: 4741, comm: pm-suspend Not tainted 2.6.30-rc4-git1 #5
[ 1231.559892] Call Trace:
[ 1231.559895]  [<ffffffff8025ce44>] print_circular_bug_tail+0x71/0x7c
[ 1231.559895]  [<ffffffff8025db36>] __lock_acquire+0x97c/0xc22
[ 1231.559895]  [<ffffffff8025ded2>] lock_acquire+0xf6/0x11a
[ 1231.559895]  [<ffffffff80249feb>] ? cleanup_workqueue_thread+0x1e/0xf8
[ 1231.559895]  [<ffffffff8024a012>] cleanup_workqueue_thread+0x45/0xf8
[ 1231.559895]  [<ffffffff80249feb>] ? cleanup_workqueue_thread+0x1e/0xf8
[ 1231.559895]  [<ffffffff8057c347>] workqueue_cpu_callback+0xc2/0x105
[ 1231.559895]  [<ffffffff8058e6a1>] notifier_call_chain+0x5e/0x92
[ 1231.559895]  [<ffffffff80251c78>] raw_notifier_call_chain+0xf/0x11
[ 1231.559895]  [<ffffffff8057a9db>] _cpu_down+0x278/0x295
[ 1231.559895]  [<ffffffff8023a8ea>] disable_nonboot_cpus+0x78/0x11d
[ 1231.559895]  [<ffffffff80267993>] suspend_devices_and_enter+0xd6/0x1b1
[ 1231.559895]  [<ffffffff80267bfa>] enter_state+0x163/0x1c9
[ 1231.559895]  [<ffffffff80267d17>] state_store+0xb7/0xd8
[ 1231.559895]  [<ffffffff803bd78b>] kobj_attr_store+0x17/0x19
[ 1231.559895]  [<ffffffff803137b3>] sysfs_write_file+0xe4/0x119
[ 1231.559895]  [<ffffffff802c237a>] vfs_write+0xab/0x105
[ 1231.559895]  [<ffffffff802c2498>] sys_write+0x47/0x70
[ 1231.559895]  [<ffffffff8020b26b>] system_call_fastpath+0x16/0x1b
Comment 2 Miles Lane 2009-05-17 06:48:44 UTC
Yes, I am still seeing this with 2.6.30-rc6-git1.
Comment 3 Rafael J. Wysocki 2009-05-17 10:17:12 UTC
On Sunday 17 May 2009, Ming Lei wrote:
> I also can reproduce it on 2.6.30-rc5-next-20090515.
> 
> 2009/5/17 Rafael J. Wysocki <rjw@sisk.pl>:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.29.  Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=13245
> > Subject         : possible circular locking dependency detected
> > Submitter       : Miles Lane <miles.lane@gmail.com>
> > Date            : 2009-05-04 16:56 (13 days old)
Comment 4 Rafael J. Wysocki 2009-05-17 10:31:05 UTC
*** Bug 13296 has been marked as a duplicate of this bug. ***
Comment 5 Tomas Janousek 2009-05-17 12:39:28 UTC
Is it possible that this causes my system to hang after "Suspending consoles" ?
Comment 6 Johannes Berg 2009-05-17 12:59:28 UTC
(In reply to comment #5)
> Is it possible that this causes my system to hang after "Suspending consoles"
> ?

I'm not quite sure but I don't think so -- this might be a false positive.
Comment 7 Johannes Berg 2009-05-22 10:49:12 UTC
I have provided a shortened deadlock scenario here:
http://thread.gmane.org/gmane.linux.kernel/835569/focus=838839

Oleg and me independently arrived at the conclusion that the commits

	PM: Change hibernation code ordering
	4aecd6718939eb3c4145b248369b65f7483a8a02

	PM: Change suspend code ordering
	900af0d973856d6feb6fc088c2d0d3fde57707d3

are the ones causing this problem (for hibernate/suspend respectively).
Comment 8 Rafael J. Wysocki 2009-05-23 23:21:30 UTC
Created attachment 21511 [details]
PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs

The attached patch should fix the problem.  Miles please test it.
Comment 9 Rafael J. Wysocki 2009-05-24 21:45:26 UTC
Handled-By : Rafael J. Wysocki <rjw@sisk.pl>
Patch : http://patchwork.kernel.org/patch/25557/
Comment 10 Rafael J. Wysocki 2009-05-30 21:36:44 UTC
Fixed by commit 32bdfac5462d777f35b00838893c4f87baf23efe .

Note You need to log in before you can comment on or make changes to this bug.