Bug 13136

Summary: possible circular locking dependency detected while suspending laptop to S3
Product: Other Reporter: Christian Casteyde (casteyde.christian)
Component: OtherAssignee: other_other
Status: CLOSED CODE_FIX    
Severity: normal CC: rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.30-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 13070    

Description Christian Casteyde 2009-04-18 11:17:46 UTC
AMD64 X2 4200, 1.2GB, laptop Acer Aspire 1511LMi
Distro: Bluewhite64 12.1 (64bits port of Slackware 12.1)
Last known working version: 2.6.29.1
Kernel compiled with debug options (lock, memory, etc. checking)

I tried suspend / resume from ACPI event "lid" and it works, but just before going to S3 the following warning:

eth1: associated
ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Clocksource tsc unstable (delta = -183207445 ns)
*** INFO FROM REPORTER: here is the last message after normal boot ***
*** Now starting to suspend ***

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.30-rc2 #1
-------------------------------------------------------
X/4338 is trying to acquire lock:
 (events){+.+.+.}, at: [<ffffffff8024ac00>] flush_workqueue+0x0/0x70

but task is already holding lock:
 (hid_open_mut){+.+...}, at: [<ffffffff80531efc>] usbhid_close+0x2c/0x90

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #5 (hid_open_mut){+.+...}:
       [<ffffffff8025e561>] validate_chain+0xbd1/0x1200
       [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
       [<ffffffff8025f634>] lock_acquire+0x54/0x70
       [<ffffffff806a9368>] mutex_lock_nested+0x58/0x360
       [<ffffffff80531f89>] usbhid_open+0x29/0xb0
       [<ffffffff8052db95>] hidinput_open+0x15/0x20
       [<ffffffff80504709>] input_open_device+0x99/0xc0
       [<ffffffff80507a15>] mousedev_open_device+0x75/0x100
       [<ffffffff80507a7a>] mousedev_open_device+0xda/0x100
       [<ffffffff80507bdc>] mousedev_open+0x13c/0x1c0
       [<ffffffff8050633e>] input_open_file+0xce/0x1a0
       [<ffffffff802a1aa5>] chrdev_open+0xe5/0x1d0
       [<ffffffff8029ca5d>] __dentry_open+0x10d/0x390
       [<ffffffff8029dd6e>] nameidata_to_filp+0x4e/0x60
       [<ffffffff802aacaf>] do_filp_open+0x1df/0x8b0
       [<ffffffff8029c86e>] do_sys_open+0x5e/0xf0
       [<ffffffff8029c92b>] sys_open+0x1b/0x20
       [<ffffffff8020b268>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #4 (&dev->mutex){+.+.+.}:
       [<ffffffff8025e561>] validate_chain+0xbd1/0x1200
       [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
       [<ffffffff8025f634>] lock_acquire+0x54/0x70
       [<ffffffff806a8f5e>] mutex_lock_interruptible_nested+0x6e/0x420
       [<ffffffff80504949>] input_register_handle+0x39/0xc0
       [<ffffffff803e81a0>] kbd_connect+0x60/0xb0
       [<ffffffff80504265>] input_attach_handler+0x65/0xa0
       [<ffffffff805069bb>] input_register_handler+0xbb/0x110
       [<ffffffff80944539>] kbd_init+0x99/0xe0
       [<ffffffff80944767>] vty_init+0x187/0x1d0
       [<ffffffff80943db2>] tty_init+0xe2/0x110
       [<ffffffff8020903a>] _stext+0x3a/0x160
       [<ffffffff8092b8fd>] kernel_init+0x9d/0x100
       [<ffffffff8020c16a>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #3 (input_mutex){+.+.+.}:
       [<ffffffff8025e561>] validate_chain+0xbd1/0x1200
       [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
       [<ffffffff8025f634>] lock_acquire+0x54/0x70
       [<ffffffff806a8f5e>] mutex_lock_interruptible_nested+0x6e/0x420
       [<ffffffff80506680>] input_register_device+0x170/0x210
       [<ffffffff805073ac>] input_register_polled_device+0x9c/0xb0
       [<ffffffff8049e59e>] b43_rfkill_init+0x17e/0x1b0
       [<ffffffff8048d5c1>] b43_op_start+0xc1/0x1a0
       [<ffffffff80688b5d>] ieee80211_open+0x2dd/0x7b0
       [<ffffffff8059ca32>] dev_open+0x72/0xc0
       [<ffffffff8059a6a1>] dev_change_flags+0x91/0x1b0
       [<ffffffff805ef84b>] devinet_ioctl+0x5ab/0x740
       [<ffffffff805f05dd>] inet_ioctl+0x6d/0x90
       [<ffffffff8058bb4a>] sock_ioctl+0xca/0x270
       [<ffffffff802ac181>] vfs_ioctl+0x31/0x90
       [<ffffffff802ac514>] do_vfs_ioctl+0x334/0x530
       [<ffffffff802ac75a>] sys_ioctl+0x4a/0x80
       [<ffffffff8020b268>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #2 (rtnl_mutex){+.+.+.}:
       [<ffffffff8025e561>] validate_chain+0xbd1/0x1200
       [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
       [<ffffffff8025f634>] lock_acquire+0x54/0x70
       [<ffffffff806a9368>] mutex_lock_nested+0x58/0x360
       [<ffffffff805a54e2>] rtnl_lock+0x12/0x20
       [<ffffffff805a6ab9>] linkwatch_event+0x9/0x40
       [<ffffffff8024b1ed>] worker_thread+0x1fd/0x2e0
       [<ffffffff8024e4d8>] kthread+0x58/0x90
       [<ffffffff8020c16a>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 ((linkwatch_work).work){+.+.+.}:
       [<ffffffff8025e561>] validate_chain+0xbd1/0x1200
       [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
       [<ffffffff8025f634>] lock_acquire+0x54/0x70
       [<ffffffff8024b1e7>] worker_thread+0x1f7/0x2e0
       [<ffffffff8024e4d8>] kthread+0x58/0x90
       [<ffffffff8020c16a>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (events){+.+.+.}:
       [<ffffffff8025df72>] validate_chain+0x5e2/0x1200
       [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
       [<ffffffff8025f634>] lock_acquire+0x54/0x70
       [<ffffffff8024ac48>] flush_workqueue+0x48/0x70
       [<ffffffff8024ac80>] flush_scheduled_work+0x10/0x20
       [<ffffffff80531f33>] usbhid_close+0x63/0x90
       [<ffffffff8052b785>] hidinput_close+0x15/0x20
       [<ffffffff80504807>] input_close_device+0x57/0x90
       [<ffffffff805078bc>] mousedev_close_device+0x3c/0xc0
       [<ffffffff8050792f>] mousedev_close_device+0xaf/0xc0
       [<ffffffff80507977>] mousedev_release+0x37/0x60
       [<ffffffff8029f98c>] __fput+0xcc/0x260
       [<ffffffff8029fe36>] fput+0x16/0x20
       [<ffffffff8029c6c8>] filp_close+0x58/0x90
       [<ffffffff8029c7a1>] sys_close+0xa1/0x110
       [<ffffffff8020b268>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

4 locks held by X/4338:
 #0:  (&mousedev->mutex/31){+.+...}, at: [<ffffffff8050789b>] mousedev_close_device+0x1b/0xc0
 #1:  (&mousedev->mutex#2){+.+...}, at: [<ffffffff8050789b>] mousedev_close_device+0x1b/0xc0
 #2:  (&dev->mutex){+.+.+.}, at: [<ffffffff805047dc>] input_close_device+0x2c/0x90
 #3:  (hid_open_mut){+.+...}, at: [<ffffffff80531efc>] usbhid_close+0x2c/0x90

stack backtrace:
Pid: 4338, comm: X Not tainted 2.6.30-rc2 #1
Call Trace:
 [<ffffffff8025d7d4>] print_circular_bug_tail+0x94/0xf0
 [<ffffffff8025df72>] validate_chain+0x5e2/0x1200
 [<ffffffff8025eee9>] __lock_acquire+0x359/0xa50
 [<ffffffff8025f634>] lock_acquire+0x54/0x70
 [<ffffffff8024ac00>] ? flush_workqueue+0x0/0x70
 [<ffffffff8024ac48>] flush_workqueue+0x48/0x70
 [<ffffffff8024ac00>] ? flush_workqueue+0x0/0x70
 [<ffffffff8025c46d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8024ac80>] flush_scheduled_work+0x10/0x20
 [<ffffffff80531f33>] usbhid_close+0x63/0x90
 [<ffffffff8052b785>] hidinput_close+0x15/0x20
 [<ffffffff80504807>] input_close_device+0x57/0x90
 [<ffffffff805078bc>] mousedev_close_device+0x3c/0xc0
 [<ffffffff8050792f>] mousedev_close_device+0xaf/0xc0
 [<ffffffff80507977>] mousedev_release+0x37/0x60
 [<ffffffff8029f98c>] __fput+0xcc/0x260
 [<ffffffff8029fe36>] fput+0x16/0x20
 [<ffffffff8029c6c8>] filp_close+0x58/0x90
 [<ffffffff8029c7a1>] sys_close+0xa1/0x110
 [<ffffffff8020b268>] system_call_fastpath+0x16/0x1b
eth1: deauthenticating by local choice (reason=3)
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.01 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Suspending console(s) (use no_console_suspend to debug)
parport_pc 00:0b: disabled
b43-pci-bridge 0000:02:08.0: PCI INT A disabled
ACPI handle has no context!
Intel ICH Modem 0000:00:06.1: PCI INT B disabled
Intel ICH 0000:00:06.0: PCI INT A disabled
ehci_hcd 0000:00:02.2: PCI INT C disabled
ehci_hcd 0000:00:02.2: PME# disabled
ohci_hcd 0000:00:02.1: PCI INT B disabled
ohci_hcd 0000:00:02.1: PME# disabled
ohci_hcd 0000:00:02.0: PCI INT A disabled
ohci_hcd 0000:00:02.0: PME# disabled
ACPI: Preparing to enter system sleep state S3
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106

*** NOTE FROM REPORTER : Thawing the laptop now (reopening the lid) ***
Back to C!
ACPI: Waking up from system sleep state S3
ohci_hcd 0000:00:02.0: restoring config space at offset 0x1 (was 0xb00007, writing 0xb00003)
ohci_hcd 0000:00:02.1: restoring config space at offset 0x1 (was 0xb00007, writing 0xb00003)
ehci_hcd 0000:00:02.2: restoring config space at offset 0xd (was 0x80, writing 0x44)
ehci_hcd 0000:00:02.2: restoring config space at offset 0x1 (was 0xb00006, writing 0xb00002)
Intel ICH 0000:00:06.0: restoring config space at offset 0x1 (was 0xb00007, writing 0xb00003)
Intel ICH Modem 0000:00:06.1: restoring config space at offset 0x1 (was 0xb00007, writing 0xb00003)
pci 0000:00:0a.0: restoring config space at offset 0x9 (was 0xfff0, writing 0x63f06000)
pci 0000:00:0a.0: restoring config space at offset 0x7 (was 0x2803030, writing 0x82803030)
pci 0000:00:0a.0: restoring config space at offset 0x6 (was 0x80020200, writing 0x80060200)

etc.

For more info, just to make suspend working on this laptop, the following sequence is issued on the lid event (from a custom hibernate.sh script):
...
       # Switch to virtual terminal 1 to avoid graphics corruption in X
        chvt 1

        # Prevent double power event:
        touch /var/state/hibernating

        # Stops network:
        /etc/rc.d/rc.inet1 eth1_stop

        # Flush all data to disk (just in case)
        LD_PRELOAD="" sync

        # Save the system time
        hwclock --systohc --utc

        # Suspend
        echo -n mem > /sys/power/state

        # 2.6.25 does not send lid event at resume:
        rm -rf /var/state/hibernating

        # Restore the system time
        hwclock --hctosys --utc

        # Reset the graphic card
        vbetool post

        # Reset the CD/DVD drive
        hdparm -w /dev/hdc

        # Reset audio
        alsactl restore

        # Starts network:
        /usr/sbin/wifi &

        # Switch back to X
        chvt 7
...

So suspension is done from the console even if X was started.
Using nv free driver, no tainting.

Regards
Comment 1 Christian Casteyde 2009-04-21 17:59:56 UTC
Your patch seems to work Oliver.
I didn't reproduced the behaviour after several suspend/resume cycle.

The patch was (applied manually, was rejected on vanilla -rc2, but it's OK otherwise):

commit 4bdc818cca662000cf195b1e83e8a0f8a3f0b9c6
Author: Oliver Neukum <oneukum@linux-d698.(none)>
Date:   Tue Apr 21 15:33:41 2009 +0200

    avoid deadlock in usbhid_close by cancelling workqueues

diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 4306cb1..900ce18 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -662,8 +662,8 @@ void usbhid_close(struct hid_device *hid)
        spin_lock_irq(&usbhid->lock);
        if (!--hid->open) {
                spin_unlock_irq(&usbhid->lock);
+               hid_cancel_delayed_stuff(usbhid);
                usb_kill_urb(usbhid->urbin);
-               flush_scheduled_work();
                usbhid->intf->needs_remote_wakeup = 0;
        } else {
                spin_unlock_irq(&usbhid->lock);
Comment 2 Rafael J. Wysocki 2009-04-25 21:15:03 UTC
Handled-By : Oliver Neukum <oliver@neukum.org>
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=13136#c1
Comment 3 Rafael J. Wysocki 2009-05-16 21:52:57 UTC
Fixed by commit 89092ddd7aca598e3b8874f7a49216e7917d8f3f .