Kernel 2.6.33-rc3 64bits Acer Aspire 1511LMi Bluewhite 64 13 (64bits port of Slackware 13) While suspending to ram, I got the following warning: ... PM: Entering mem sleep Suspending console(s) (use no_console_suspend to debug) ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.33-rc3 #2 ------------------------------------------------------- hibernate.sh/3424 is trying to acquire lock: ((&(&rfkill->poll_work)->work)){+.+...}, at: [<ffffffff81049944>] __cancel_work_timer+0x74 /0x210 but task is already holding lock: (s_active){++++.+}, at: [<ffffffff8110ae82>] sysfs_get_active_two+0x22/0x60 which lock already depends on the new lock. I didn't experienced this warning with previous versions. the existing dependency chain (in reverse order) is: -> #2 (s_active){++++.+}: [<ffffffff81060f9c>] __lock_acquire+0x165c/0x1d10 [<ffffffff810616ab>] lock_acquire+0x5b/0x80 [<ffffffff8110ab41>] sysfs_addrm_finish+0xf1/0x180 [<ffffffff81108bf6>] sysfs_hash_and_remove+0x56/0x80 [<ffffffff8110be4f>] sysfs_remove_group+0x4f/0xe0 [<ffffffff812d53f4>] dpm_sysfs_remove+0x14/0x20 [<ffffffff812ceb25>] device_del+0x45/0x1a0 [<ffffffff812cec91>] device_unregister+0x11/0x20 [<ffffffff812d9539>] _request_firmware+0x2f9/0x4e0 [<ffffffff812d97ae>] request_firmware+0xe/0x10 [<ffffffff8132f618>] b43_do_request_fw+0x98/0x230 [<ffffffff81330ba7>] b43_try_request_fw+0x97/0x360 [<ffffffff81332b7a>] b43_wireless_core_init+0x1ca/0x10e0 [<ffffffff81333c40>] b43_op_start+0x1b0/0x1d0 [<ffffffff81565809>] ieee80211_open+0x3d9/0x6d0 [<ffffffff814558aa>] dev_open+0xca/0x110 [<ffffffff81454aed>] dev_change_flags+0x9d/0x1e0 [<ffffffff814b3879>] devinet_ioctl+0x8e9/0x8f0 [<ffffffff814b4c14>] inet_ioctl+0x94/0xc0 [<ffffffff8144010d>] sock_do_ioctl+0x2d/0x60 [<ffffffff814401b6>] sock_ioctl+0x76/0x280 [<ffffffff810bff15>] vfs_ioctl+0x35/0xd0 [<ffffffff810c00ea>] do_vfs_ioctl+0x8a/0x5c0 [<ffffffff810c066a>] sys_ioctl+0x4a/0x80 [<ffffffff81002468>] system_call_fastpath+0x16/0x1b -> #1 (&wl->mutex){+.+.+.}: [<ffffffff81060f9c>] __lock_acquire+0x165c/0x1d10 [<ffffffff810616ab>] lock_acquire+0x5b/0x80 [<ffffffff8158851b>] mutex_lock_nested+0x5b/0x380 [<ffffffff8134a244>] b43_rfkill_poll+0x44/0x130 [<ffffffff815675c2>] ieee80211_rfkill_poll+0x22/0x40 [<ffffffff8153e204>] cfg80211_rfkill_poll+0x14/0x20 [<ffffffff81574333>] rfkill_poll+0x23/0x40 [<ffffffff81048fd5>] worker_thread+0x1e5/0x340 [<ffffffff8104d21e>] kthread+0x9e/0xb0 [<ffffffff81003074>] kernel_thread_helper+0x4/0x10 -> #0 ((&(&rfkill->poll_work)->work)){+.+...}: [<ffffffff810615ac>] __lock_acquire+0x1c6c/0x1d10 [<ffffffff810616ab>] lock_acquire+0x5b/0x80 [<ffffffff81049976>] __cancel_work_timer+0xa6/0x210 [<ffffffff81049aed>] cancel_delayed_work_sync+0xd/0x10 [<ffffffff8157422f>] rfkill_pause_polling+0x1f/0x30 [<ffffffff8157424d>] rfkill_suspend+0xd/0x20 [<ffffffff812d5b43>] legacy_suspend+0x33/0x80 [<ffffffff812d6070>] dpm_suspend_start+0x460/0x4f0 [<ffffffff8106da5f>] suspend_devices_and_enter+0x5f/0x200 [<ffffffff8106dd29>] enter_state+0x129/0x150 [<ffffffff8106d33a>] state_store+0xba/0x100 [<ffffffff8123f057>] kobj_attr_store+0x17/0x20 [<ffffffff811094c4>] sysfs_write_file+0xd4/0x150 [<ffffffff810b15e6>] vfs_write+0xc6/0x150 [<ffffffff810b1760>] sys_write+0x50/0x90 [<ffffffff81002468>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 4 locks held by hibernate.sh/3424: #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81109434>] sysfs_write_file+0x44/0x150 #1: (s_active){++++.+}, at: [<ffffffff8110ae9d>] sysfs_get_active_two+0x3d/0x60 #2: (s_active){++++.+}, at: [<ffffffff8110ae82>] sysfs_get_active_two+0x22/0x60 #3: (pm_mutex){+.+.+.}, at: [<ffffffff8106dc49>] enter_state+0x49/0x150 stack backtrace: Pid: 3424, comm: hibernate.sh Not tainted 2.6.33-rc3 #2 Call Trace: [<ffffffff8105f2d0>] print_circular_bug+0xf0/0x100 [<ffffffff81049944>] ? __cancel_work_timer+0x74/0x210 [<ffffffff810615ac>] __lock_acquire+0x1c6c/0x1d10 [<ffffffff8105e7bf>] ? mark_held_locks+0x6f/0x90 [<ffffffff810616ab>] lock_acquire+0x5b/0x80 [<ffffffff81049944>] ? __cancel_work_timer+0x74/0x210 [<ffffffff81049976>] __cancel_work_timer+0xa6/0x210 [<ffffffff81049944>] ? __cancel_work_timer+0x74/0x210 [<ffffffff8105e7bf>] ? mark_held_locks+0x6f/0x90 [<ffffffff815887b5>] ? mutex_lock_nested+0x2f5/0x380 [<ffffffff81574240>] ? rfkill_suspend+0x0/0x20 [<ffffffff81049aed>] cancel_delayed_work_sync+0xd/0x10 [<ffffffff8157422f>] rfkill_pause_polling+0x1f/0x30 [<ffffffff8157424d>] rfkill_suspend+0xd/0x20 [<ffffffff812d5b43>] legacy_suspend+0x33/0x80 [<ffffffff812d6070>] dpm_suspend_start+0x460/0x4f0 [<ffffffff8106da5f>] suspend_devices_and_enter+0x5f/0x200 [<ffffffff8106dd29>] enter_state+0x129/0x150 [<ffffffff8106d33a>] state_store+0xba/0x100 [<ffffffff8123f057>] kobj_attr_store+0x17/0x20 [<ffffffff811094c4>] sysfs_write_file+0xd4/0x150 [<ffffffff810b15e6>] vfs_write+0xc6/0x150 [<ffffffff810b1760>] sys_write+0x50/0x90 [<ffffffff81002468>] system_call_fastpath+0x16/0x1b parport_pc 00:0b: disabled b43-pci-bridge 0000:02:08.0: PCI INT A disabled ... PM: suspend of devices complete after 567.248 msecs PM: late suspend of devices complete after 39.686 msecs ACPI: Preparing to enter system sleep state S3 Back to C! ACPI: Waking up from system sleep state S3 ... The hibernation script called stops the network with: /etc/rc.d/rc.inet1 stop before going to sleep.
Update: still present in -rc4.
On Tuesday 26 January 2010, Christian Casteyde wrote: > Yes, it's still present in 33-rc5.
When I suspend to RAM or disk with my b43-wireless system, I get a different locking problem. I don't know if the two are related. ============================================= [ INFO: possible recursive locking detected ] 2.6.33-rc5-Linus-dirty #173 --------------------------------------------- sh/3488 is trying to acquire lock: (s_active){++++.+}, at: [<ffffffff81167413>] sysfs_addrm_finish+0x43/0x70 but task is already holding lock: (s_active){++++.+}, at: [<ffffffff8116771d>] sysfs_get_active_two+0x3d/0x60 other info that might help us debug this: 4 locks held by sh/3488: #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81165b7f>] sysfs_write_file+0x3f/0x160 #1: (s_active){++++.+}, at: [<ffffffff8116771d>] sysfs_get_active_two+0x3d/0x60 #2: (s_active){++++.+}, at: [<ffffffff81167702>] sysfs_get_active_two+0x22/0x60 #3: (dbs_mutex){+.+.+.}, at: [<ffffffff81271517>] cpufreq_governor_dbs+0xe7/0x480 stack backtrace: Pid: 3488, comm: sh Not tainted 2.6.33-rc5-Linus-dirty #173 Call Trace: [<ffffffff8107c36b>] __lock_acquire+0xf6b/0x1d30 [<ffffffff81078e9f>] ? lockdep_init_map+0x5f/0x5d0 [<ffffffff8107d1cb>] lock_acquire+0x9b/0x120 [<ffffffff81167413>] ? sysfs_addrm_finish+0x43/0x70 [<ffffffff81166ba3>] sysfs_deactivate+0xc3/0x110 [<ffffffff81167413>] ? sysfs_addrm_finish+0x43/0x70 [<ffffffff81167413>] sysfs_addrm_finish+0x43/0x70 [<ffffffff81165206>] sysfs_hash_and_remove+0x56/0x80 [<ffffffff8116895f>] sysfs_remove_group+0x4f/0xf0 [<ffffffff8127152b>] cpufreq_governor_dbs+0xfb/0x480 [<ffffffff8107a8dd>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffff8107a92d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8126e314>] __cpufreq_governor+0x94/0x160 [<ffffffff8126f84f>] __cpufreq_set_policy+0x11f/0x180 [<ffffffff8126fc66>] store_scaling_governor+0xc6/0x200 [<ffffffff81270530>] ? handle_update+0x0/0x10 [<ffffffff8126f702>] store+0x62/0x90 [<ffffffff81165c21>] sysfs_write_file+0xe1/0x160 [<ffffffff8110b0c8>] vfs_write+0xb8/0x180 [<ffffffff8110b26c>] sys_write+0x4c/0x80 [<ffffffff81002dab>] system_call_fastpath+0x16/0x1b
This is a result of adding some new debug code to sysfs, AFAICS. It has been reported for several times already.
I don't know if Larry's trace is related to the same problem, but indeed mine seems to be due to the new debug code in sysfs. I bisected it between -rc3 and -rc4 and got this first bad commit: commit 846f99749ab68bbc7f75c74fec305de675b1a1bf Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sat Jan 2 13:37:12 2010 -0800 sysfs: Add lockdep annotations for the sysfs active reference Holding locks over device_del -> kobject_del -> sysfs_deactivate can cause deadlocks if those same locks are grabbed in sysfs show or store methods. The I model s_active count + completion as a sleeping read/write lock. I describe to lockdep sysfs_get_active as a read_trylock, sysfs_put_active as a read_unlock, and sysfs_deactivate as a write_lock and write_unlock pair. This seems to capture the essence for purposes of finding deadlocks, and in my testing gives finds real issues and ignores non-issues. This brings us back to holding locks over kobject_del is a problem that ideally we should find a way of addressing, but at least lockdep can tell us about the problems instead of requiring developers to debug rare strange system deadlocks, that happen when sysfs files are removed while being written to. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> I append the bisect log. Maybe this should not be considered as a regression, since showing a potential bug doesn't introduce it: it was latent in previous versions and only visible now. Moreover, this is not blocking since I do not run kernel with debug options normally.
Created attachment 24792 [details] git bisect log
On Monday 01 February 2010, Christian Casteyde wrote: > Yes, still applies to 2.6.32-rc6.
On Monday 08 February 2010, Christian Casteyde wrote: > Still present in 2.6.33-rc7. > > Bonjour, > Le dimanche 7 février 2010, vous avez écrit : > > This message has been generated automatically as a part of a summary report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.32. Please verify if it still should be listed and let the > > tracking team know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14999 > > Subject : possible circular locking dependency detected in > rfkill at > > suspend Submitter : Christian Casteyde <casteyde.christian@free.fr> > > Date : 2010-01-06 21:52 (33 days old)
On Monday 15 February 2010, Eric W. Biederman wrote: > "Rafael J. Wysocki" <rjw@sisk.pl> writes: > > > This message has been generated automatically as a part of a summary report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.32. Please verify if it still should be listed and let the > tracking team > > know (either way). > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14999 > > Subject : possible circular locking dependency detected in > rfkill at suspend > > Submitter : Christian Casteyde <casteyde.christian@free.fr> > > Date : 2010-01-06 21:52 (40 days old) > > I have a set of patches to clear up the sysfs lockdep false positives. > http://lkml.org/lkml/2010/2/11/329 > > It requires sprinking a bunch of sysfs_attr_init on the few dynamically > allocated sysfs attributes, and I haven't found them all yet. Otherwise > it seems to be working well. > > Greg should pick up those patches shortly and they should be appearing > in linux-next and then ultimately in 2.6.34.
On Saturday 27 February 2010, Christian Casteyde wrote: > Still present in 2.6.33.
Christian, have you tested linux-next? Or could you test 2.6.34-rc1 when it is available?
I've tested 2.6.34-rc1, but I cannot say anything because the wireless adapter doesn't work at all with this kernel. Please see http://bugzilla.kernel.org/show_bug.cgi?id=15505 for more details. I'll recheck this bug once #15505 is fixed, since I obviously need the driver to be started to check suspend/release correctly.
With 2.6.34-rc1 + the fix for #15505, I don't get the warning anymore. I think it was fixed in -rc1 therefore.