Bug 9874

Summary: Undocking Lenovo ThinkPad T61 causes oops
Product: ACPI
Reporter: Lukas Hejtmanek (xhejtman)
Component: Config-Hotplug
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, bunk, stern
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.24-git4
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    
Attachments: acpidump output
Patch removing the locking of devices from the suspend core
Patch removing the locking of devices from the suspend core (updated)
Patch removing the locking of devices from the suspend core (updated 2x)

Description Lukas Hejtmanek 2008-02-02 02:07:41 UTC
Latest working kernel version: 2.6.24-rc8
Earliest failing kernel version: n/a
Distribution: ubuntu/unstable
Hardware Environment: Lenovo ThinkPad T61 + dock station
Software Environment: any
Problem Description: Undocking Lenovo ThinkPad T61 causes oops

Steps to reproduce: Just push undock button.

[79721.755165] BUG: unable to handle kernel paging request at 000000340000000e
[79721.755165] IP: [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755165] PGD 7203a067 PUD 0
[79721.755165] Oops: 0000 [1] SMP
[79721.755165] CPU 1
[79721.755165] Modules linked in: usb_storage i915 drm rfcomm l2cap bluetooth fuse arc4 ecb crypto_blkcipher ehci_hcd uhci_hcd e1000e snd_hda_intel intel_agp thinkpad_acpi [last unloaded: cfg80211]
[79721.755165] Pid: 69, comm: kacpi_notify Not tainted 2.6.24-git9 #4
[79721.755165] RIP: 0010:[<ffffffff8034d147>]  [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755165] RSP: 0018:ffff81007d197d28  EFLAGS: 00010246
[79721.755165] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[79721.755165] RDX: 0000000000017567 RSI: 000000100002c0d0 RDI: 0000003400000006
[79721.755165] RBP: 0000000000000000 R08: ffff810003c00000 R09: 0000000000000000
[79721.755165] R10: 0000000000000002 R11: 0000000000000001 R12: 0000003400000006
[79721.755165] R13: 0000000000000000 R14: ffff81007d197d68 R15: ffff81007d197d70
[79721.755165] FS:  0000000000000000(0000) GS:ffff81007d00e380(0000) knlGS:0000000000000000
[79721.755165] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[79721.755165] CR2: 000000340000000e CR3: 000000007200a000 CR4: 00000000000006a0
[79721.755165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[79721.755165] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[79721.755165] Process kacpi_notify (pid: 69, threadinfo ffff81007d196000, task ffff81007d18ef20)
[79721.755165] Stack:  ffffffff8034e19f ffff8100519e8800 ffff81007d197de0 0000000000000001
[79721.755165]  0000000000000000 0000000000000001 ffffffff80357d3d ffff81007d02cdc0
[79721.755165]  0000000000000000 0000003400000006 0000000000000000 ffffffff8058cbaa
[79721.755165] Call Trace:
[79721.755165]  [<ffffffff8034e19f>] acpi_get_next_object+0x3b/0x99
[79721.755165]  [<ffffffff80357d3d>] acpi_bus_trim+0x57/0x109
[79721.755165]  [<ffffffff8035b28b>] hotplug_dock_devices+0x97/0x117
[79721.755166]  [<ffffffff8035b3a9>] handle_eject_request+0x4e/0xd3
[79721.755166]  [<ffffffff8033e04a>] acpi_os_execute_notify+0x0/0x2c
[79721.755166]  [<ffffffff80356088>] acpi_bus_get_device+0x1d/0x2e
[79721.755166]  [<ffffffff803560b0>] acpi_bus_notify+0x17/0x4c
[79721.755166]  [<ffffffff8033e04a>] acpi_os_execute_notify+0x0/0x2c
[79721.755166]  [<ffffffff80343a1d>] acpi_ev_notify_dispatch+0x57/0x60
[79721.755166]  [<ffffffff8033e06d>] acpi_os_execute_notify+0x23/0x2c
[79721.755166]  [<ffffffff802460bf>] run_workqueue+0xbf/0x160
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80246b9f>] worker_thread+0x9f/0x100
[79721.755166]  [<ffffffff80249e50>] autoremove_wake_function+0x0/0x30
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80249a4b>] kthread+0x4b/0x80
[79721.755166]  [<ffffffff8020d128>] child_rip+0xa/0x12
[79721.755166]  [<ffffffff80249a00>] kthread+0x0/0x80
[79721.755166]  [<ffffffff8020d11e>] child_rip+0x0/0x12
[79721.755166]
[79721.755166]
[79721.755166] Code: 00 c3 49 ff c1 48 83 c6 04 41 ff c8 45 85 c0 75 a8 31 c0 c6 06 00 c3 48 8d 47 ff 48 83 f8 fd 76 08 48 8b 05 dc 49 39 00 c3 31 c0 <80> 7f 08 0f 48 0f 44 c7 c3 48 89 f8 c3 31 c0 48 85 ff 74 0d f6
[79721.755166] RIP  [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755166]  RSP <ffff81007d197d28>
[79721.755166] CR2: 000000340000000e
[79721.755166] ---[ end trace 05a1f30122eb8809 ]---
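A note on reading the oops above, offered as a sketch rather than a diagnosis: the byte marked in the Code: line (`<80> 7f 08 0f`) decodes on x86-64 as `cmpb $0x0f, 0x8(%rdi)`, so the faulting access should be at RDI + 8. The short Python check below uses only values copied from the register dump; the variable names are made up for illustration.

```python
# Values copied from the register dump in the oops above.
RDI = 0x0000003400000006  # first argument, i.e. the handle being mapped
CR2 = 0x000000340000000E  # address the CPU failed to read

# "<80> 7f 08 0f" decodes as: cmpb $0x0f, 0x8(%rdi)
# so the instruction reads one byte at RDI plus an 8-byte displacement.
DISPLACEMENT = 0x08

fault_address = RDI + DISPLACEMENT
offset_into_object = CR2 - RDI

print(hex(fault_address))       # compare with CR2 in the dump
print(hex(offset_into_object))  # offset of the field being read
```

The computed address matches CR2, which is consistent with the handle in RDI (also held in R12) being a bogus pointer rather than the code itself being corrupted. Whether the handle was stale or overwritten cannot be told from the dump alone.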
Comment 1 Lukas Hejtmanek 2008-02-02 02:08:59 UTC
Created attachment 14682 [details]
acpidump output
Comment 2 Len Brown 2008-02-04 10:40:17 UTC
does 2.6.24 work?
Comment 3 Lukas Hejtmanek 2008-02-04 10:45:18 UTC
I did not test 2.6.24. 2.6.24-rc8 was OK, and since there are no big changes between 2.6.24-rc8 and 2.6.24, I assume that 2.6.24 works as well. If you want me to check to be sure, I can do that, but it will take some time.
Comment 4 Lukas Hejtmanek 2008-02-07 01:41:43 UTC
2.6.24 is OK, at least it does not happen every time.
Comment 5 Rafael J. Wysocki 2008-02-13 11:42:15 UTC
This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.

Regressions list annotation:
Handled-By : Len Brown <lenb@kernel.org>
Comment 6 Lukas Hejtmanek 2008-02-15 04:57:11 UTC
I did not see this oops in 2.6.25-rc1, but I have not done much testing. I reboot often due to various other problems, so I do not undock before suspend as often as I used to.
Comment 7 Lukas Hejtmanek 2008-02-20 14:48:35 UTC
Well, in 2.6.25-rc2-git2 it happened again. It seems that it does not happen every time, only sometimes.
Comment 8 Lukas Hejtmanek 2008-02-22 09:37:23 UTC
It looks like this trace precedes the undocking oops. I.e., I can see this trace and after it, undocking produces the oops mentioned above.

[86836.756886] ACPI: \_SB_.GDCK - docking
[83791.360019] ata4.00: configured for UDMA/33
[86837.177682] acpi IBM0079:01: Suspicious device_add during suspend
[86837.177689] Pid: 71, comm: kacpi_notify Not tainted 2.6.25-rc2-git2 #6
[86837.177693]
[86837.177694] Call Trace:
[86837.177705]  [<ffffffff80397070>] pm_sleep_lock+0x10/0x20
[86837.177717]  [<ffffffff803905ac>] device_add+0x5c/0x590
[86837.177723]  [<ffffffff80359b83>] acpi_ut_release_mutex+0x5f/0x63
[86837.177730]  [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177737]  [<ffffffff8035c8d8>] acpi_add_single_object+0xafa/0xcc1
[86837.177745]  [<ffffffff8028ef64>] kmem_cache_free+0x14/0xb0
[86837.177754]  [<ffffffff80342cb2>] acpi_os_release_object+0x9/0xd
[86837.177761]  [<ffffffff80358fea>] acpi_ut_update_ref_count+0x50/0x9d
[86837.177768]  [<ffffffff80359112>] acpi_ut_update_object_reference+0xdb/0x136
[86837.177776]  [<ffffffff8035ccb1>] acpi_bus_add+0x1c/0x32
[86837.177783]  [<ffffffff8036025c>] hotplug_dock_devices+0xec/0x117
[86837.177788]  [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177794]  [<ffffffff80342eb6>] acpi_os_execute_deferred+0x0/0x2c
[86837.177801]  [<ffffffff803609fc>] dock_notify+0x7e/0xcb
[86837.177807]  [<ffffffff80348819>] acpi_ev_notify_dispatch+0x57/0x60
[86837.177813]  [<ffffffff80342ed9>] acpi_os_execute_deferred+0x23/0x2c
[86837.177820]  [<ffffffff8024630f>] run_workqueue+0xbf/0x160
[86837.177826]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177831]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177837]  [<ffffffff80246def>] worker_thread+0x9f/0x100
[86837.177844]  [<ffffffff80249f50>] autoremove_wake_function+0x0/0x30
[86837.177850]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177856]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177860]  [<ffffffff80249b5b>] kthread+0x4b/0x80
[86837.177868]  [<ffffffff8020d128>] child_rip+0xa/0x12
[86837.177874]  [<ffffffff80249b10>] kthread+0x0/0x80
[86837.177879]  [<ffffffff8020d11e>] child_rip+0x0/0x12
[86837.177883]
[86837.177885] ACPI: Error adding device IBM0079:01<7>PM: Writing back config space on device 0000:03:00.0 at offset 1 (was 100106, writing 100102)
Comment 9 Rafael J. Wysocki 2008-02-22 17:00:51 UTC
Does it happen during a suspend?
Comment 10 Alan Stern 2008-02-22 20:32:52 UTC
The stack dump would be easier to interpret if the kernel were built with CONFIG_FRAME_POINTER.  Apparently acpi_add_single_object() is the culprit, and it was called from within a workqueue.  It may have been a coincidence that the workqueue was running during a suspend.
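
As a rough aid for reading traces like the ones in this report: without frame pointers, the dump lists every value on the stack that looks like a kernel text address, so some entries are leftovers rather than real callers. One common heuristic, assumed here and not guaranteed, is that an entry with offset +0x0 is a stored function pointer and not a return address. The Python sketch below applies that heuristic; `parse_trace` and the `likely_stale` field are hypothetical names for this illustration.

```python
import re

# Matches entries such as: [<ffffffff803905ac>] device_add+0x5c/0x590
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace(text):
    """Return one dict per trace entry, flagging +0x0 entries as suspect."""
    entries = []
    for match in TRACE_RE.finditer(text):
        offset = int(match.group("off"), 16)
        entries.append(
            {
                "symbol": match.group("sym"),
                "offset": offset,
                # Heuristic only: a return address normally points past a
                # call instruction, so offset 0 suggests a function pointer.
                "likely_stale": offset == 0,
            }
        )
    return entries


# A few lines copied from the trace in comment #8.
SAMPLE = """\
[86837.177705]  [<ffffffff80397070>] pm_sleep_lock+0x10/0x20
[86837.177717]  [<ffffffff803905ac>] device_add+0x5c/0x590
[86837.177730]  [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177737]  [<ffffffff8035c8d8>] acpi_add_single_object+0xafa/0xcc1
"""

entries = parse_trace(SAMPLE)
plausible = [e["symbol"] for e in entries if not e["likely_stale"]]
print(plausible)
```

Entries with a non-zero offset can still be stale, so this only narrows the list; a kernel built with CONFIG_FRAME_POINTER remains the more reliable way to get a trustworthy trace.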
Comment 11 Rafael J. Wysocki 2008-02-23 13:00:08 UTC
Created attachment 14961 [details]
Patch removing the locking of devices from the suspend core

Lukas, can you please test with the attached patch applied?
Comment 12 Alan Stern 2008-02-23 15:27:02 UTC
To any ACPI experts reading this bug report:  Is it truly necessary to register a new device (acpi_add_single_object) while the system is suspending or resuming?  

Would it be okay to block the device_add() call until after the resume is finished, instead of failing it?
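
The two behaviours being compared here can be shown with a toy model. This is plain Python threading, not kernel code: `ToyPmCore`, `device_add_failing` and `device_add_blocking` are hypothetical names, and the model ignores the possibility that the resume path itself waits on the blocked caller.

```python
import threading


class ToyPmCore:
    """Toy stand-in for a PM core that gates device registration."""

    def __init__(self):
        self._awake = threading.Event()
        self._awake.set()  # system starts out fully resumed
        self.devices = []

    def begin_suspend(self):
        self._awake.clear()

    def end_resume(self):
        self._awake.set()

    def device_add_failing(self, name):
        # Behaviour under discussion: refuse registration during suspend/resume.
        if not self._awake.is_set():
            return False
        self.devices.append(name)
        return True

    def device_add_blocking(self, name, timeout=None):
        # Proposed alternative: wait until the resume has finished.
        if not self._awake.wait(timeout):
            return False
        self.devices.append(name)
        return True


core = ToyPmCore()
core.begin_suspend()

# Failing variant: the device is lost if it shows up mid-transition.
failed_result = core.device_add_failing("IBM0079:01")

# Blocking variant: the caller sleeps, then succeeds once resume completes.
worker = threading.Thread(target=core.device_add_blocking, args=("IBM0079:01",))
worker.start()
core.end_resume()
worker.join()

print(failed_result, core.devices)
```

In the toy, the failing variant drops the device while the blocking variant registers it after the resume ends. Whether blocking is safe in the real suspend core depends on what else the blocked thread is expected to do during resume, which this sketch does not model.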
Comment 13 Rafael J. Wysocki 2008-02-24 12:31:59 UTC
Created attachment 14964 [details]
Patch removing the locking of devices from the suspend core (updated)

The previously attached patch was incomplete.  Please test this one instead.
Comment 14 Lukas Hejtmanek 2008-02-24 15:06:43 UTC
(In reply to comment #9)
> Does it happen during a suspend?
> 

comment #8 happens during resume. (I suspend at home and resume at work in a dock station)

(why the hell I did not receive any mails as this bug was updated?)
Comment 15 Rafael J. Wysocki 2008-02-24 15:08:29 UTC
Created attachment 14965 [details]
Patch removing the locking of devices from the suspend core (updated 2x)

Both previously attached patches were incomplete.
Comment 16 Lukas Hejtmanek 2008-02-24 15:10:16 UTC
(In reply to comment #10)
> It may have been a coincidence that the
> workqueue was running during a suspend.

I think it is a coincidence, because it does not happen every time.
Comment 17 Rafael J. Wysocki 2008-02-24 17:16:49 UTC
(In reply to comment #14)
> (In reply to comment #9)
> > Does it happen during a suspend?
> > 
> comment #8 happens during resume. (I suspend at home and resume at work in a
> dock station)

Ah, thanks, that's important.

> (why the hell I did not receive any mails as this bug was updated?)

Bugzilla problem, I guess.

Regressions list annotation:
Patch : http://marc.info/?l=linux-acpi&m=120389632114090&w=2