Latest working kernel version: 2.6.24-rc8
Earliest failing kernel version: n/a
Distribution: ubuntu/unstable
Hardware Environment: Lenovo ThinkPad T61 + dock station
Software Environment: any
Problem Description: Undocking Lenovo ThinkPad T61 causes oops
Steps to reproduce: Just push undock button.

[79721.755165] BUG: unable to handle kernel paging request at 000000340000000e
[79721.755165] IP: [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755165] PGD 7203a067 PUD 0
[79721.755165] Oops: 0000 [1] SMP
[79721.755165] CPU 1
[79721.755165] Modules linked in: usb_storage i915 drm rfcomm l2cap bluetooth fuse arc4 ecb crypto_blkcipher ehci_hcd uhci_hcd e1000e snd_hda_intel intel_agp thinkpad_acpi [last unloaded: cfg80211]
[79721.755165] Pid: 69, comm: kacpi_notify Not tainted 2.6.24-git9 #4
[79721.755165] RIP: 0010:[<ffffffff8034d147>] [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755165] RSP: 0018:ffff81007d197d28 EFLAGS: 00010246
[79721.755165] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[79721.755165] RDX: 0000000000017567 RSI: 000000100002c0d0 RDI: 0000003400000006
[79721.755165] RBP: 0000000000000000 R08: ffff810003c00000 R09: 0000000000000000
[79721.755165] R10: 0000000000000002 R11: 0000000000000001 R12: 0000003400000006
[79721.755165] R13: 0000000000000000 R14: ffff81007d197d68 R15: ffff81007d197d70
[79721.755165] FS: 0000000000000000(0000) GS:ffff81007d00e380(0000) knlGS:0000000000000000
[79721.755165] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[79721.755165] CR2: 000000340000000e CR3: 000000007200a000 CR4: 00000000000006a0
[79721.755165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[79721.755165] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[79721.755165] Process kacpi_notify (pid: 69, threadinfo ffff81007d196000, task ffff81007d18ef20)
[79721.755165] Stack: ffffffff8034e19f ffff8100519e8800 ffff81007d197de0 0000000000000001
[79721.755165] 0000000000000000 0000000000000001 ffffffff80357d3d ffff81007d02cdc0
[79721.755165] 0000000000000000 0000003400000006 0000000000000000 ffffffff8058cbaa
[79721.755165] Call Trace:
[79721.755165] [<ffffffff8034e19f>] acpi_get_next_object+0x3b/0x99
[79721.755165] [<ffffffff80357d3d>] acpi_bus_trim+0x57/0x109
[79721.755165] [<ffffffff8035b28b>] hotplug_dock_devices+0x97/0x117
[79721.755166] [<ffffffff8035b3a9>] handle_eject_request+0x4e/0xd3
[79721.755166] [<ffffffff8033e04a>] acpi_os_execute_notify+0x0/0x2c
[79721.755166] [<ffffffff80356088>] acpi_bus_get_device+0x1d/0x2e
[79721.755166] [<ffffffff803560b0>] acpi_bus_notify+0x17/0x4c
[79721.755166] [<ffffffff8033e04a>] acpi_os_execute_notify+0x0/0x2c
[79721.755166] [<ffffffff80343a1d>] acpi_ev_notify_dispatch+0x57/0x60
[79721.755166] [<ffffffff8033e06d>] acpi_os_execute_notify+0x23/0x2c
[79721.755166] [<ffffffff802460bf>] run_workqueue+0xbf/0x160
[79721.755166] [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166] [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166] [<ffffffff80246b9f>] worker_thread+0x9f/0x100
[79721.755166] [<ffffffff80249e50>] autoremove_wake_function+0x0/0x30
[79721.755166] [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166] [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166] [<ffffffff80249a4b>] kthread+0x4b/0x80
[79721.755166] [<ffffffff8020d128>] child_rip+0xa/0x12
[79721.755166] [<ffffffff80249a00>] kthread+0x0/0x80
[79721.755166] [<ffffffff8020d11e>] child_rip+0x0/0x12
[79721.755166]
[79721.755166]
[79721.755166] Code: 00 c3 49 ff c1 48 83 c6 04 41 ff c8 45 85 c0 75 a8 31 c0 c6 06 00 c3 48 8d 47 ff 48 83 f8 fd 76 08 48 8b 05 dc 49 39 00 c3 31 c0 <80> 7f 08 0f 48 0f 44 c7 c3 48 89 f8 c3 31 c0 48 85 ff 74 0d f6
[79721.755166] RIP [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755166] RSP <ffff81007d197d28>
[79721.755166] CR2: 000000340000000e
[79721.755166] ---[ end trace 05a1f30122eb8809 ]---
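A note for anyone reading the oops above: the marked bytes in the Code line, `<80> 7f 08 0f`, decode on x86-64 as `cmpb $0x0f, 0x8(%rdi)`, a one-byte compare at offset 8 from RDI. The sketch below only checks that this matches the register dump; the variable names are illustrative and the values are copied from the oops, nothing else is assumed.

```python
# Values copied from the register dump in the oops above.
rdi = 0x0000003400000006   # RDI at the time of the fault
cr2 = 0x000000340000000E   # CR2: the address the CPU failed to read

# Displacement taken from the decoded instruction: cmpb $0x0f, 0x8(%rdi)
displacement = 0x08

# The faulting access is the memory operand of that compare.
fault_address = rdi + displacement
matches = fault_address == cr2

print(hex(fault_address), matches)
```

This only confirms where the access faulted (a read through the value held in RDI and R12); it does not explain how that value ended up being passed down from acpi_bus_trim.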
Created attachment 14682 [details] acpidump output
Does 2.6.24 work?
I did not test 2.6.24. 2.6.24-rc8 was OK, and since there are no big changes between 2.6.24-rc8 and 2.6.24, I assume that 2.6.24 works as well. If you want me to check it to be sure, I can do that, but it will take some time.
2.6.24 is OK, at least it does not happen every time.
This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline. Regressions list annotation: Handled-By : Len Brown <lenb@kernel.org>
I did not see this oops in 2.6.25-rc1, but I did not test it extensively. I reboot often due to various other problems, so I do not use undocking before suspend as heavily as before.
Well, in 2.6.25-rc2-git2 it happened again. It seems that it does not happen every time, only sometimes.
It looks like this trace precedes the undocking oops. I.e., I can see this trace and after it, undocking produces the oops mentioned above.

[86836.756886] ACPI: \_SB_.GDCK - docking
[83791.360019] ata4.00: configured for UDMA/33
[86837.177682] acpi IBM0079:01: Suspicious device_add during suspend
[86837.177689] Pid: 71, comm: kacpi_notify Not tainted 2.6.25-rc2-git2 #6
[86837.177693]
[86837.177694] Call Trace:
[86837.177705] [<ffffffff80397070>] pm_sleep_lock+0x10/0x20
[86837.177717] [<ffffffff803905ac>] device_add+0x5c/0x590
[86837.177723] [<ffffffff80359b83>] acpi_ut_release_mutex+0x5f/0x63
[86837.177730] [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177737] [<ffffffff8035c8d8>] acpi_add_single_object+0xafa/0xcc1
[86837.177745] [<ffffffff8028ef64>] kmem_cache_free+0x14/0xb0
[86837.177754] [<ffffffff80342cb2>] acpi_os_release_object+0x9/0xd
[86837.177761] [<ffffffff80358fea>] acpi_ut_update_ref_count+0x50/0x9d
[86837.177768] [<ffffffff80359112>] acpi_ut_update_object_reference+0xdb/0x136
[86837.177776] [<ffffffff8035ccb1>] acpi_bus_add+0x1c/0x32
[86837.177783] [<ffffffff8036025c>] hotplug_dock_devices+0xec/0x117
[86837.177788] [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177794] [<ffffffff80342eb6>] acpi_os_execute_deferred+0x0/0x2c
[86837.177801] [<ffffffff803609fc>] dock_notify+0x7e/0xcb
[86837.177807] [<ffffffff80348819>] acpi_ev_notify_dispatch+0x57/0x60
[86837.177813] [<ffffffff80342ed9>] acpi_os_execute_deferred+0x23/0x2c
[86837.177820] [<ffffffff8024630f>] run_workqueue+0xbf/0x160
[86837.177826] [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177831] [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177837] [<ffffffff80246def>] worker_thread+0x9f/0x100
[86837.177844] [<ffffffff80249f50>] autoremove_wake_function+0x0/0x30
[86837.177850] [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177856] [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177860] [<ffffffff80249b5b>] kthread+0x4b/0x80
[86837.177868] [<ffffffff8020d128>] child_rip+0xa/0x12
[86837.177874] [<ffffffff80249b10>] kthread+0x0/0x80
[86837.177879] [<ffffffff8020d11e>] child_rip+0x0/0x12
[86837.177883]
[86837.177885] ACPI: Error adding device IBM0079:01<7>PM: Writing back config space on device 0000:03:00.0 at offset 1 (was 100106, writing 100102)
Does it happen during a suspend?
The stack dump would be easier to interpret if the kernel were built with CONFIG_FRAME_POINTER. Apparently acpi_add_single_object() is the culprit, and it was called from within a workqueue. It may have been a coincidence that the workqueue was running during a suspend.
Created attachment 14961 [details]
Patch removing the locking of devices from the suspend core

Lukas, can you please test with the attached patch applied?
To any ACPI experts reading this bug report: Is it truly necessary to register a new device (acpi_add_single_object) while the system is suspending or resuming? Would it be okay to block the device_add() call until after the resume is finished, instead of failing it?
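To make the two options in the question above concrete, here is a minimal userspace sketch in Python. It is an analogy only, not kernel code: `resume_done`, `device_add_failing` and `device_add_blocking` are hypothetical names invented for illustration, and a `threading.Event` stands in for whatever the PM core would use to signal that resume has finished.

```python
import threading

# Hypothetical stand-in for "the system has finished resuming".
resume_done = threading.Event()
registered = []

def device_add_failing(name):
    """Refuse the registration while suspend/resume is in progress."""
    if not resume_done.is_set():
        return False
    registered.append(name)
    return True

def device_add_blocking(name, timeout=5.0):
    """Wait until resume has finished, then register the device."""
    if not resume_done.wait(timeout):
        return False  # gave up waiting; nothing was registered
    registered.append(name)
    return True

# While resume is still running, the failing variant rejects the device.
failed = device_add_failing("IBM0079:01")

# The blocking variant parks the caller in a worker until resume completes.
result = {}
worker = threading.Thread(
    target=lambda: result.update(ok=device_add_blocking("IBM0079:01")))
worker.start()
resume_done.set()   # resume finished
worker.join()

print(failed, result["ok"], registered)
```

The trade-off the sketch hints at is that a blocking call ties up the thread that issued it, which matters if that thread is a shared workqueue worker, as in the trace in this report.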
Created attachment 14964 [details]
Patch removing the locking of devices from the suspend core (updated)

The previously attached patch was incomplete. Please test this one instead.
(In reply to comment #9)
> Does it happen during a suspend?
>

comment #8 happens during resume. (I suspend at home and resume at work in a dock station)

(why the hell I did not receive any mails as this bug was updated?)
Created attachment 14965 [details]
Patch removing the locking of devices from the suspend core (updated 2x)

Both previously attached patches were incomplete.
(In reply to comment #10)
> It may have been a coincidence that the
> workqueue was running during a suspend.

I think it is because it does not happen every time.
(In reply to comment #14)
> (In reply to comment #9)
> > Does it happen during a suspend?
> >
> comment #8 happens during resume. (I suspend at home and resume at work in a
> dock station)

Ah, thanks, that's important.

> (why the hell I did not receive any mails as this bug was updated?)

Bugzilla problem, I guess.

Regressions list annotation:
Patch : http://marc.info/?l=linux-acpi&m=120389632114090&w=2
Fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7a8d37a37380e2b1500592d40b7ec384dbebe7a0