Bug 36532

Summary: possible recursive locking detected at suspend in acpi_os_acquire_lock - Acer Aspire 7750G
Product: Power Management Reporter: Christian Casteyde (casteyde.christian)
Component: Hibernation/SuspendAssignee: power-management_other
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: acpi-bugzilla, clemens, lenb, maciej.rutecki, penguin-kernel, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.0-rc1 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216    
Attachments: dmesg output while suspend

Description Christian Casteyde 2011-06-02 13:46:05 UTC
Acer Aspire 7750G
Core i7 in 64bits mode
Slackware64 13.37

With kmemcheck and lock correctness proving feature on, at least on 3.0-rc1 (not tested with 2.6.39 and previous kernels do not work on my new laptop), I get the following at suspend to RAM:

atl1c 0000:02:00.0: wake-up capability enabled by ACPI
PM: late suspend of devices complete after 94.846 msecs
ACPI: Preparing to enter system sleep state S3
PM: Saving platform NVS memory
Disabling non-boot CPUs ...
Extended CMOS year: 2000

=============================================
[ INFO: possible recursive locking detected ]
3.0.0-rc1 #6
---------------------------------------------
pm-suspend/3053 is trying to acquire lock:
 (&(lock)->rlock){-.-...}, at: [<ffffffff813103e3>] acpi_os_acquire_lock+0x9/0xb

but task is already holding lock:
 (&(lock)->rlock){-.-...}, at: [<ffffffff813103e3>] acpi_os_acquire_lock+0x9/0xb

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(lock)->rlock);
  lock(&(lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by pm-suspend/3053:
 #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff81188fbf>] sysfs_write_file+0x3f/0x160
 #1:  (s_active#58){.+.+.+}, at: [<ffffffff81189048>] sysfs_write_file+0xc8/0x160
 #2:  (pm_mutex){+.+.+.}, at: [<ffffffff810bf9bc>] enter_state+0x3c/0x110
 #3:  (&(lock)->rlock){-.-...}, at: [<ffffffff813103e3>] acpi_os_acquire_lock+0x9/0xb

stack backtrace:
Pid: 3053, comm: pm-suspend Not tainted 3.0.0-rc1 #6
Call Trace:
 [<ffffffff810af041>] __lock_acquire+0x1081/0x1db0
 [<ffffffff812c6b8d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff817fc8e9>] ? error_exit+0x29/0xb0
 [<ffffffff81326726>] ? acpi_hw_enable_runtime_gpe_block+0x43/0x43
 [<ffffffff810b02d7>] lock_acquire+0x57/0x70
 [<ffffffff813103e3>] ? acpi_os_acquire_lock+0x9/0xb
 [<ffffffff817fb69d>] _raw_spin_lock_irqsave+0x4d/0x90
 [<ffffffff813103e3>] ? acpi_os_acquire_lock+0x9/0xb
 [<ffffffff813103e3>] acpi_os_acquire_lock+0x9/0xb
 [<ffffffff8131fd97>] acpi_ev_walk_gpe_list+0x23/0x8d
 [<ffffffff81326dbb>] acpi_hw_clear_acpi_status+0x39/0x56
 [<ffffffff81327161>] acpi_enter_sleep_state+0x8b/0x1c1
 [<ffffffff81039fc9>] ? native_sched_clock+0x29/0x80
 [<ffffffff8103a5cd>] ? save_sched_clock_state+0x1d/0x30
 [<ffffffff8104c25a>] do_suspend_lowlevel+0x9a/0x9c
 [<ffffffff8104c137>] ? acpi_suspend_lowlevel+0x197/0x1c0
 [<ffffffff813114d2>] acpi_suspend_enter+0x2f/0x8e
 [<ffffffff810bf932>] suspend_devices_and_enter+0x172/0x1c0
 [<ffffffff810bfa56>] enter_state+0xd6/0x110
 [<ffffffff810befd6>] state_store+0xc6/0x100
 [<ffffffff812bcd57>] kobj_attr_store+0x17/0x20
 [<ffffffff81189061>] sysfs_write_file+0xe1/0x160
 [<ffffffff811248f6>] vfs_write+0xc6/0x170
 [<ffffffff81124bff>] sys_write+0x4f/0xa0
 [<ffffffff817fcbfb>] system_call_fastpath+0x16/0x1b
ACPI: Low-level resume complete
PM: Restoring platform NVS memory
Extended CMOS year: 2000
ACPI: Waking up from system sleep state S3
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff88019386f758)
00000000000000002b33cb81ffffffff000000000188ffff0000000000000000
 i i i i i i i i i i i i i i i i i i i i u u u u u u u u u u u u
                                                 ^

Pid: 3053, comm: pm-suspend Not tainted 3.0.0-rc1 #6 Acer Aspire 7750G/JE70_HR
RIP: 0010:[<ffffffff8108f3b3>]  [<ffffffff8108f3b3>] process_one_work+0x83/0x450
RSP: 0018:ffff8801c205bd80  EFLAGS: 00010012
RAX: ffff880100000000 RBX: ffff8801c02739c0 RCX: 0000000000000000
RDX: ffff8801c79d5800 RSI: ffff88019386f710 RDI: ffff8801c02739c0
RBP: ffff8801c205be10 R08: ffff8801c02739c0 R09: 0000000000000002
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88019386f710
R13: ffff8801c79d5800 R14: ffff8801c780ca00 R15: 0000000000000013
FS:  00007fe515a74720(0000) GS:ffff8801c7800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8801c67f70d8 CR3: 000000019e2c8000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
 [<ffffffff810927cc>] worker_thread+0x15c/0x330
 [<ffffffff81097396>] kthread+0xa6/0xb0
 [<ffffffff817fd9d4>] kernel_thread_helper+0x4/0x10
 [<ffffffffffffffff>] 0xffffffffffffffff
pcieport 0000:00:01.0: restoring config space at offset 0x7 (was 0x20003030, writing 0x3030)
pcieport 0000:00:01.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100407)
pci 0000:00:16.0: restoring config space at offset 0x1 (was 0x180006, writing 0x100006)
ehci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2900006, writing 0x2900002)
HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100006, writing 0x100002)
...

Note the second splash is already reported on a seperate bugzilla entry.
Comment 1 Len Brown 2011-06-07 02:07:24 UTC
If this is a deadlock, then your system will never reach suspend.
does your system reach suspend?

it is possible that the tools are getting a false positive on this one.
a cursory look at acpi_gbl_gpe_lock seems to show a proper
release for every acquire.
Comment 2 Christian Casteyde 2011-06-07 18:33:39 UTC
Yes, it managed to reach suspend.
It also managed to wake up after.
Full dmesg output appended for info.
Comment 3 Christian Casteyde 2011-06-07 18:36:19 UTC
Created attachment 61122 [details]
dmesg output while suspend
Comment 4 Len Brown 2011-06-14 01:32:46 UTC
*** Bug 37242 has been marked as a duplicate of this bug. ***
Comment 5 Tetsuo Handa 2011-06-21 07:55:42 UTC
Same except that mine has a "lockdep: fixing up alternatives." line
after the "Disabling non-boot CPUs ..." line.
The machine can reach power off.

[  130.472652] ACPI: Preparing to enter system sleep state S5
[  130.474450] Disabling non-boot CPUs ...
[  130.478419] lockdep: fixing up alternatives.
[  130.513228] Power down.
[  130.514314] 
[  130.514315] =============================================
[  130.515138] [ INFO: possible recursive locking detected ]
[  130.515138] 3.0.0-rc4-ccs #3
[  130.515138] ---------------------------------------------
[  130.515138] halt/3843 is trying to acquire lock:
[  130.515138]  (&(lock)->rlock){......}, at: [<c11e1659>] acpi_os_acquire_lock+0x8/0xa
[  130.515138] 
[  130.515138] but task is already holding lock:
[  130.515138]  (&(lock)->rlock){......}, at: [<c11e1659>] acpi_os_acquire_lock+0x8/0xa
[  130.515138] 
[  130.515138] other info that might help us debug this:
[  130.515138]  Possible unsafe locking scenario:
[  130.515138] 
[  130.515138]        CPU0
[  130.515138]        ----
[  130.515138]   lock(&(lock)->rlock);
[  130.515138]   lock(&(lock)->rlock);
[  130.515138] 
[  130.515138]  *** DEADLOCK ***
[  130.515138] 
[  130.515138]  May be due to missing lock nesting notation
[  130.515138] 
[  130.515138] 2 locks held by halt/3843:
[  130.515138]  #0:  (reboot_mutex){+.+.+.}, at: [<c1050dc9>] sys_reboot+0xb9/0x1a0
[  130.515138]  #1:  (&(lock)->rlock){......}, at: [<c11e1659>] acpi_os_acquire_lock+0x8/0xa
[  130.515138] 
[  130.515138] stack backtrace:
[  130.515138] Pid: 3843, comm: halt Not tainted 3.0.0-rc4-ccs #3
[  130.515138] Call Trace:
[  130.515138]  [<c106df30>] print_deadlock_bug+0xd0/0xe0
[  130.515138]  [<c106dffe>] check_deadlock+0xbe/0x160
[  130.515138]  [<c106ec5b>] validate_chain+0x27b/0x530
[  130.515138]  [<c1070f04>] __lock_acquire+0x2a4/0x8f0
[  130.515138]  [<c1070f3c>] ? __lock_acquire+0x2dc/0x8f0
[  130.515138]  [<c10725ca>] lock_acquire+0x7a/0xa0
[  130.515138]  [<c11e1659>] ? acpi_os_acquire_lock+0x8/0xa
[  130.515138]  [<c138fcf4>] _raw_spin_lock_irqsave+0x44/0x80
[  130.515138]  [<c11e1659>] ? acpi_os_acquire_lock+0x8/0xa
[  130.515138]  [<c11e1659>] acpi_os_acquire_lock+0x8/0xa
[  130.515138]  [<c11ef803>] acpi_ev_walk_gpe_list+0x1b/0x6f
[  130.515138]  [<c11f5ca7>] ? acpi_hw_disable_gpe_block+0x34/0x34
[  130.515138]  [<c11f5efc>] acpi_hw_clear_acpi_status+0x32/0x46
[  130.515138]  [<c11f6317>] acpi_enter_sleep_state+0x83/0x1ac
[  130.515138]  [<c1020995>] ? disable_IO_APIC+0xb5/0xd0
[  130.515138]  [<c103fcc8>] ? printk+0x18/0x20
[  130.515138]  [<c11e297b>] acpi_power_off+0x21/0x26
[  130.515138]  [<c101b048>] native_machine_power_off+0x18/0x30
[  130.515138]  [<c101b069>] machine_power_off+0x9/0x10
[  130.515138]  [<c1050cfe>] kernel_power_off+0x3e/0x50
[  130.515138]  [<c1050e1d>] sys_reboot+0x10d/0x1a0
[  130.515138]  [<c104dbe3>] ? kill_something_info+0x13/0x170
[  130.515138]  [<c10722c7>] ? __lock_release+0x47/0x70
[  130.515138]  [<c104dbe3>] ? kill_something_info+0x13/0x170
[  130.515138]  [<c104dc38>] ? kill_something_info+0x68/0x170
[  130.515138]  [<c104dbe3>] ? kill_something_info+0x13/0x170
[  130.515138]  [<e087bd58>] ? ccs_signal_acl+0x28/0x90 [ccsecurity]
[  130.515138]  [<c104f821>] ? sys_kill+0x81/0x90
[  130.515138]  [<c10ebb13>] ? mntput+0x13/0x20
[  130.515138]  [<c10d1d67>] ? __fput+0x147/0x220
[  130.515138]  [<c10d1e59>] ? fput+0x19/0x20
[  130.515138]  [<c10cfbf9>] ? filp_close+0x49/0x70
[  130.515138]  [<c13907e4>] ? restore_all+0xf/0xf
[  130.515138]  [<c13907b1>] syscall_call+0x7/0xb
Comment 6 Clemens Ladisch 2011-07-05 06:50:30 UTC
Same message here when powering off (which actually still works fine):

=============================================
[ INFO: possible recursive locking detected ]
3.0.0-rc5+ #290
---------------------------------------------
poweroff/1547 is trying to acquire lock:
 (&(lock)->rlock){......}, at: [<ffffffff8122243c>] acpi_os_acquire_lock+0x9/0xb

but task is already holding lock:
 (&(lock)->rlock){......}, at: [<ffffffff8122243c>] acpi_os_acquire_lock+0x9/0xb

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(lock)->rlock);
  lock(&(lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by poweroff/1547:
 #0:  (reboot_mutex){+.+.+.}, at: [<ffffffff81046b47>] sys_reboot+0x91/0x16a
 #1:  (&(lock)->rlock){......}, at: [<ffffffff8122243c>] acpi_os_acquire_lock+0x9/0xb

stack backtrace:
Pid: 1547, comm: poweroff Not tainted 3.0.0-rc5+ #290
Call Trace:
 [<ffffffff81060147>] validate_chain+0x6b2/0xe4c
 [<ffffffff8105d537>] ? save_trace+0x3f/0xb1
 [<ffffffff81061160>] __lock_acquire+0x87f/0x8f0
 [<ffffffff81238f23>] ? acpi_hw_enable_runtime_gpe_block+0x43/0x43
 [<ffffffff810615fa>] lock_acquire+0x57/0x6d
 [<ffffffff8122243c>] ? acpi_os_acquire_lock+0x9/0xb
 [<ffffffff81480ae6>] _raw_spin_lock_irqsave+0x37/0x49
 [<ffffffff8122243c>] ? acpi_os_acquire_lock+0x9/0xb
 [<ffffffff812391f5>] ? acpi_hw_write+0x49/0x50
 [<ffffffff8122243c>] acpi_os_acquire_lock+0x9/0xb
 [<ffffffff812328d9>] acpi_ev_walk_gpe_list+0x23/0x87
 [<ffffffff8123945a>] acpi_hw_clear_acpi_status+0x39/0x55
 [<ffffffff81239a01>] acpi_enter_sleep_state+0x8b/0x1c2
 [<ffffffff8147d9b4>] ? printk+0x3c/0x3e
 [<ffffffff81223dd6>] acpi_power_off+0x29/0x2b
 [<ffffffff81015f2d>] native_machine_power_off+0x22/0x24
 [<ffffffff81015efd>] machine_power_off+0xa/0xc
 [<ffffffff8104692c>] kernel_power_off+0x43/0x45
 [<ffffffff81046bb9>] sys_reboot+0x103/0x16a
 ...
Comment 7 Clemens Ladisch 2011-07-05 06:55:25 UTC
When I change acpi_os_acquire_lock into an inline function, I get the same message, but the acquired/held locks are in acpi_ev_walk_gpe_list() and acpi_hw_clear_acpi_status().
Apparently, lockdep thinks that acpi_gbl_hardware_lock and acpi_gbl_gpe_lock are the same; probably because the ACPI code wraps them in the same type/function.
Comment 8 Rafael J. Wysocki 2011-07-05 19:29:34 UTC
Please try https://patchwork.kernel.org/patch/943152/
Comment 9 Clemens Ladisch 2011-07-06 07:01:04 UTC
Yes; same problem.  That patch works fine.
Comment 10 Len Brown 2011-07-30 06:30:18 UTC
closed by this patch in v3.0:


commit 07e49a7a31153a95caa270d8ad7350a0bcd4d511
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Wed Jul 6 20:44:25 2011 +0200

    ACPI: Fix lockdep false positives in acpi_power_off()
    
    All ACPICA locks are allocated by the same function,
    acpi_os_create_lock(), with the help of a local variable called
    "lock".  Thus, when lockdep is enabled, it uses "lock" as the
    name of all those locks and regards them as instances of the same
    lock, which causes it to report possible locking problems with them
    when there aren't any.
    
    To work around this problem, define acpi_os_create_lock() as a macro
    and make it pass its argument to spin_lock_init(), so that lockdep
    uses it as the name of the new lock.  Define this macron in a
    Linux-specific file, to minimize the resulting modifications of
    the OS-independent ACPICA parts.
    
    This change is based on an earlier patch from Andrea Righi and it
    addresses a regression from 2.6.39 tracked as
    https://bugzilla.kernel.org/show_bug.cgi?id=38152
    
    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
    Tested-by: Andrea Righi <andrea@betterlinux.com>
    Reviewed-by: Florian Mickler <florian@mickler.org>
    Signed-off-by: Len Brown <len.brown@intel.com>