Bug 9874 - Undocking Lenovo ThinkPad T61 causes oops
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: Config-Hotplug
Hardware: All Linux
Importance: P1 normal
Assigned To: Rafael J. Wysocki
Depends on:
Blocks: 9832
 
Reported: 2008-02-02 02:07 UTC by Lukas Hejtmanek
Modified: 2008-03-05 14:55 UTC

See Also:
Kernel Version: 2.6.24-git4
Tree: Mainline
Regression: Yes


Attachments
acpidump output (316.12 KB, text/plain)
2008-02-02 02:08 UTC, Lukas Hejtmanek
Patch removing the locking of devices from the suspend core (4.76 KB, patch)
2008-02-23 13:00 UTC, Rafael J. Wysocki
Patch removing the locking of devices from the suspend core (updated) (7.46 KB, patch)
2008-02-24 12:31 UTC, Rafael J. Wysocki
Patch removing the locking of devices from the suspend core (updated 2x) (8.27 KB, patch)
2008-02-24 15:08 UTC, Rafael J. Wysocki

Description Lukas Hejtmanek 2008-02-02 02:07:41 UTC
Latest working kernel version: 2.6.24-rc8
Earliest failing kernel version: n/a
Distribution: ubuntu/unstable
Hardware Environment: Lenovo ThinkPad T61 + dock station
Software Environment: any
Problem Description: Undocking Lenovo ThinkPad T61 causes oops

Steps to reproduce: Just push undock button.

[79721.755165] BUG: unable to handle kernel paging request at 000000340000000e
[79721.755165] IP: [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755165] PGD 7203a067 PUD 0
[79721.755165] Oops: 0000 [1] SMP
[79721.755165] CPU 1
[79721.755165] Modules linked in: usb_storage i915 drm rfcomm l2cap bluetooth fuse arc4 ecb crypto_blkcipher ehci_hcd uhci_hcd e1000e snd_hda_intel intel_agp thinkpad_acpi [last unloaded: cfg80211]
[79721.755165] Pid: 69, comm: kacpi_notify Not tainted 2.6.24-git9 #4
[79721.755165] RIP: 0010:[<ffffffff8034d147>]  [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755165] RSP: 0018:ffff81007d197d28  EFLAGS: 00010246
[79721.755165] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[79721.755165] RDX: 0000000000017567 RSI: 000000100002c0d0 RDI: 0000003400000006
[79721.755165] RBP: 0000000000000000 R08: ffff810003c00000 R09: 0000000000000000
[79721.755165] R10: 0000000000000002 R11: 0000000000000001 R12: 0000003400000006
[79721.755165] R13: 0000000000000000 R14: ffff81007d197d68 R15: ffff81007d197d70
[79721.755165] FS:  0000000000000000(0000) GS:ffff81007d00e380(0000) knlGS:0000000000000000
[79721.755165] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[79721.755165] CR2: 000000340000000e CR3: 000000007200a000 CR4: 00000000000006a0
[79721.755165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[79721.755165] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[79721.755165] Process kacpi_notify (pid: 69, threadinfo ffff81007d196000, task ffff81007d18ef20)
[79721.755165] Stack:  ffffffff8034e19f ffff8100519e8800 ffff81007d197de0 0000000000000001
[79721.755165]  0000000000000000 0000000000000001 ffffffff80357d3d ffff81007d02cdc0
[79721.755165]  0000000000000000 0000003400000006 0000000000000000 ffffffff8058cbaa
[79721.755165] Call Trace:
[79721.755165]  [<ffffffff8034e19f>] acpi_get_next_object+0x3b/0x99
[79721.755165]  [<ffffffff80357d3d>] acpi_bus_trim+0x57/0x109
[79721.755165]  [<ffffffff8035b28b>] hotplug_dock_devices+0x97/0x117
[79721.755166]  [<ffffffff8035b3a9>] handle_eject_request+0x4e/0xd3
[79721.755166]  [<ffffffff8033e04a>] acpi_os_execute_notify+0x0/0x2c
[79721.755166]  [<ffffffff80356088>] acpi_bus_get_device+0x1d/0x2e
[79721.755166]  [<ffffffff803560b0>] acpi_bus_notify+0x17/0x4c
[79721.755166]  [<ffffffff8033e04a>] acpi_os_execute_notify+0x0/0x2c
[79721.755166]  [<ffffffff80343a1d>] acpi_ev_notify_dispatch+0x57/0x60
[79721.755166]  [<ffffffff8033e06d>] acpi_os_execute_notify+0x23/0x2c
[79721.755166]  [<ffffffff802460bf>] run_workqueue+0xbf/0x160
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80246b9f>] worker_thread+0x9f/0x100
[79721.755166]  [<ffffffff80249e50>] autoremove_wake_function+0x0/0x30
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80246b00>] worker_thread+0x0/0x100
[79721.755166]  [<ffffffff80249a4b>] kthread+0x4b/0x80
[79721.755166]  [<ffffffff8020d128>] child_rip+0xa/0x12
[79721.755166]  [<ffffffff80249a00>] kthread+0x0/0x80
[79721.755166]  [<ffffffff8020d11e>] child_rip+0x0/0x12
[79721.755166]
[79721.755166]
[79721.755166] Code: 00 c3 49 ff c1 48 83 c6 04 41 ff c8 45 85 c0 75 a8 31 c0 c6 06 00 c3 48 8d 47 ff 48 83 f8 fd 76 08 48 8b 05 dc 49 39 00 c3 31 c0 <80> 7f 08 0f 48 0f 44 c7 c3 48 89 f8 c3 31 c0 48 85 ff 74 0d f6
[79721.755166] RIP  [<ffffffff8034d147>] acpi_ns_map_handle_to_node+0x14/0x1d
[79721.755166]  RSP <ffff81007d197d28>
[79721.755166] CR2: 000000340000000e
[79721.755166] ---[ end trace 05a1f30122eb8809 ]---
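
A rough reading of the register dump above, offered as a sketch rather than a diagnosis: the Code: line marks the faulting instruction at the byte <80>, and the bytes 80 7f 08 0f are the x86-64 encoding of cmpb $0xf,0x8(%rdi). RDI holds 0x3400000006, so the access lands at RDI + 8 = 0x340000000e, which matches CR2. That value does not look like a kernel pointer (the other kernel addresses in this dump look like ffff81007d197d28), so the handle passed to acpi_ns_map_handle_to_node() appears to be garbage. The Python below only redoes that arithmetic; the helper names are made up for illustration and are not part of any kernel tool.

```python
# Values copied from the oops above.
RDI = 0x0000003400000006  # first argument register (the ACPI handle)
CR2 = 0x000000340000000E  # faulting address reported by the CPU

# Bytes starting at the faulting instruction (marked <80> in the Code: line).
FAULTING_BYTES = bytes([0x80, 0x7F, 0x08, 0x0F])


def decode_cmpb_disp8(insn: bytes):
    """Decode only the form seen here: opcode 80 /7 ib with an 8-bit displacement.

    Returns (base_register_number, displacement, immediate).
    Hypothetical helper, not a general x86 decoder.
    """
    opcode, modrm, disp8, imm8 = insn[:4]
    mod = modrm >> 6          # 01 means "base register + disp8"
    reg = (modrm >> 3) & 7    # /7 selects CMP within opcode group 80
    rm = modrm & 7            # base register number (7 is RDI without a REX prefix)
    if opcode != 0x80 or reg != 7 or mod != 1 or rm == 4:
        raise ValueError("not a plain cmpb $imm8, disp8(%reg) instruction")
    # The displacement is a signed byte.
    disp = disp8 - 256 if disp8 >= 0x80 else disp8
    return rm, disp, imm8


def is_upper_half_address(addr: int) -> bool:
    """True if addr lies in the upper canonical half used for x86-64 kernel mappings."""
    return addr >= 0xFFFF800000000000


base_reg, displacement, immediate = decode_cmpb_disp8(FAULTING_BYTES)
fault_address = RDI + displacement

print(f"base register number: {base_reg}, displacement: {displacement:#x}")
print(f"computed fault address: {fault_address:#018x}")
print(f"matches CR2: {fault_address == CR2}")
print(f"RDI looks like a kernel pointer: {is_upper_half_address(RDI)}")
```

This says nothing about where the bad handle came from; it only confirms that the fault is a dereference of the value in RDI.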
Comment 1 Lukas Hejtmanek 2008-02-02 02:08:59 UTC
Created attachment 14682 [details]
acpidump output
Comment 2 Len Brown 2008-02-04 10:40:17 UTC
does 2.6.24 work?
Comment 3 Lukas Hejtmanek 2008-02-04 10:45:18 UTC
I did not test 2.6.24. 2.6.24-rc8 was OK, and since there are no big changes between 2.6.24-rc8 and 2.6.24, I assume that 2.6.24 works as well. If you want me to check it to be sure, I can do that, but it will take some time.
Comment 4 Lukas Hejtmanek 2008-02-07 01:41:43 UTC
2.6.24 is OK, at least it does not happen every time.
Comment 5 Rafael J. Wysocki 2008-02-13 11:42:15 UTC
This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.

Regressions list annotation:
Handled-By : Len Brown <lenb@kernel.org>
Comment 6 Lukas Hejtmanek 2008-02-15 04:57:11 UTC
I did not see this oops in 2.6.25-rc1, but I have not tested it much. I reboot often due to various other problems, so I do not undock before suspend as often as I used to.
Comment 7 Lukas Hejtmanek 2008-02-20 14:48:35 UTC
Well, it happened again in 2.6.25-rc2-git2. It seems that it does not always happen, only sometimes.
Comment 8 Lukas Hejtmanek 2008-02-22 09:37:23 UTC
It looks like this trace precedes the undocking oops. I.e., I can see this trace and after it, undocking produces the oops mentioned above.

[86836.756886] ACPI: \_SB_.GDCK - docking
[83791.360019] ata4.00: configured for UDMA/33
[86837.177682] acpi IBM0079:01: Suspicious device_add during suspend
[86837.177689] Pid: 71, comm: kacpi_notify Not tainted 2.6.25-rc2-git2 #6
[86837.177693]
[86837.177694] Call Trace:
[86837.177705]  [<ffffffff80397070>] pm_sleep_lock+0x10/0x20
[86837.177717]  [<ffffffff803905ac>] device_add+0x5c/0x590
[86837.177723]  [<ffffffff80359b83>] acpi_ut_release_mutex+0x5f/0x63
[86837.177730]  [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177737]  [<ffffffff8035c8d8>] acpi_add_single_object+0xafa/0xcc1
[86837.177745]  [<ffffffff8028ef64>] kmem_cache_free+0x14/0xb0
[86837.177754]  [<ffffffff80342cb2>] acpi_os_release_object+0x9/0xd
[86837.177761]  [<ffffffff80358fea>] acpi_ut_update_ref_count+0x50/0x9d
[86837.177768]  [<ffffffff80359112>] acpi_ut_update_object_reference+0xdb/0x136
[86837.177776]  [<ffffffff8035ccb1>] acpi_bus_add+0x1c/0x32
[86837.177783]  [<ffffffff8036025c>] hotplug_dock_devices+0xec/0x117
[86837.177788]  [<ffffffff8035b9c4>] acpi_bus_data_handler+0x0/0x1
[86837.177794]  [<ffffffff80342eb6>] acpi_os_execute_deferred+0x0/0x2c
[86837.177801]  [<ffffffff803609fc>] dock_notify+0x7e/0xcb
[86837.177807]  [<ffffffff80348819>] acpi_ev_notify_dispatch+0x57/0x60
[86837.177813]  [<ffffffff80342ed9>] acpi_os_execute_deferred+0x23/0x2c
[86837.177820]  [<ffffffff8024630f>] run_workqueue+0xbf/0x160
[86837.177826]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177831]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177837]  [<ffffffff80246def>] worker_thread+0x9f/0x100
[86837.177844]  [<ffffffff80249f50>] autoremove_wake_function+0x0/0x30
[86837.177850]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177856]  [<ffffffff80246d50>] worker_thread+0x0/0x100
[86837.177860]  [<ffffffff80249b5b>] kthread+0x4b/0x80
[86837.177868]  [<ffffffff8020d128>] child_rip+0xa/0x12
[86837.177874]  [<ffffffff80249b10>] kthread+0x0/0x80
[86837.177879]  [<ffffffff8020d11e>] child_rip+0x0/0x12
[86837.177883]
[86837.177885] ACPI: Error adding device IBM0079:01<7>PM: Writing back config space on device 0000:03:00.0 at offset 1 (was 100106, writing 100102)
Comment 9 Rafael J. Wysocki 2008-02-22 17:00:51 UTC
Does it happen during a suspend?
Comment 10 Alan Stern 2008-02-22 20:32:52 UTC
The stack dump would be easier to interpret if the kernel were built with CONFIG_FRAME_POINTER.  Apparently acpi_add_single_object() is the culprit, and it was called from within a workqueue.  It may have been a coincidence that the workqueue was running during a suspend.
Comment 11 Rafael J. Wysocki 2008-02-23 13:00:08 UTC
Created attachment 14961 [details]
Patch removing the locking of devices from the suspend core

Lukas, can you please test with the attached patch applied?
Comment 12 Alan Stern 2008-02-23 15:27:02 UTC
To any ACPI experts reading this bug report:  Is it truly necessary to register a new device (acpi_add_single_object) while the system is suspending or resuming?  

Would it be okay to block the device_add() call until after the resume is finished, instead of failing it?
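
The difference between the two behaviours asked about here (fail the registration versus hold it until the resume is finished) can be shown with a toy userspace model. This is not kernel code and is not what any of the attached patches do. SleepGate and its methods are hypothetical names, and the model says nothing about whether blocking would be safe for the ACPI notify workqueue.

```python
import threading


class SleepGate:
    """Toy model of a 'system is suspending or resuming' gate (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_transition = False
        self.devices = []

    def begin_transition(self):
        # Called when a suspend or resume starts.
        with self._cond:
            self._in_transition = True

    def end_transition(self):
        # Called when the resume has finished; wakes any blocked callers.
        with self._cond:
            self._in_transition = False
            self._cond.notify_all()

    def add_or_fail(self, name):
        """Policy 1: refuse registration while a transition is in progress."""
        with self._cond:
            if self._in_transition:
                return False
            self.devices.append(name)
            return True

    def add_blocking(self, name, timeout=None):
        """Policy 2: wait until the transition is over, then register."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self._in_transition, timeout):
                return False  # timed out while still in transition
            self.devices.append(name)
            return True


gate = SleepGate()
gate.begin_transition()

# Failing policy: the device is rejected during the transition.
rejected = not gate.add_or_fail("IBM0079:01")

# Blocking policy: the caller sleeps until end_transition() is called.
worker = threading.Thread(target=gate.add_blocking, args=("IBM0079:01",))
worker.start()
gate.end_transition()
worker.join()

print("rejected while in transition:", rejected)
print("devices after resume:", gate.devices)
```

With the blocking policy the device is registered once the transition ends, instead of being lost. Whether the real caller can afford to sleep that long is exactly the open question in this comment.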
Comment 13 Rafael J. Wysocki 2008-02-24 12:31:59 UTC
Created attachment 14964 [details]
Patch removing the locking of devices from the suspend core (updated)

The previously attached patch was incomplete.  Please test this one instead.
Comment 14 Lukas Hejtmanek 2008-02-24 15:06:43 UTC
(In reply to comment #9)
> Does it happen during a suspend?
> 

comment #8 happens during resume. (I suspend at home and resume at work in a dock station)

(why the hell I did not receive any mails as this bug was updated?)
Comment 15 Rafael J. Wysocki 2008-02-24 15:08:29 UTC
Created attachment 14965 [details]
Patch removing the locking of devices from the suspend core (updated 2x)

Both previously attached patches were incomplete.
Comment 16 Lukas Hejtmanek 2008-02-24 15:10:16 UTC
(In reply to comment #10)
> It may have been a coincidence that the
> workqueue was running during a suspend.

I think it is a coincidence, because it does not happen every time.
Comment 17 Rafael J. Wysocki 2008-02-24 17:16:49 UTC
(In reply to comment #14)
> (In reply to comment #9)
> > Does it happen during a suspend?
> > 
> comment #8 happens during resume. (I suspend at home and resume at work in a
> dock station)

Ah, thanks, that's important.

> (why the hell I did not receive any mails as this bug was updated?)

Bugzilla problem, I guess.

Regressions list annotation:
Patch : http://marc.info/?l=linux-acpi&m=120389632114090&w=2 


