Created attachment 21908 [details] syslog messages
The system fails when a battery is removed from the laptop. Hardware: ThinkPad R400 (7443-C1G) with a main battery and a second battery in the ultrabay. Steps to reproduce: replace the DVD-RW drive with an ultrabay battery, then try to remove that battery. 2.6.29 works fine.
Will you please attach the output of acpidump? From the description it seems that 2.6.29 works well. Will you please use git-bisect to identify the first bad commit that causes the regression? Thanks.
Created attachment 21927 [details] TP-R400 - acpidump
Hi, here is the output from acpidump. I'm too busy right now to look for the commit that caused this regression; I'll try it at the weekend, maybe.
One more time.
Hi, I think I found the cause of the problem. When the battery in the ultrabay is undocked, kernel 2.6.29 says:
ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - undocking
but 2.6.30 says:
ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - docking
If I run `echo 1 > /sys/devices/platform/dock.1/undock` before removing the battery (dock.1 is the battery bay), then everything is fine. Something in the kernel's dock handling is wrong, but I don't know what.
ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - docking
ACPI: Battery Slot [BAT1] (battery present)
------------[ cut here ]------------
WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0xa1/0xb0()

What happens after this warning? Does the system continue to function?
(In reply to comment #4)
> ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - docking
> ACPI: Battery Slot [BAT1] (battery present)
> ------------[ cut here ]------------
> WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0xa1/0xb0()
>
> what happens after this warning, does the system continue to fuction?

Partially, but it is unusable: inserting a battery is ignored and something is broken. I cannot switch virtual consoles (it freezes), cpufreqd segfaults, and the only thing that helps is the SysRq key.
hah, it's true. there is a deadlock in the ACPI hotplug mechanism. thanks for finding this, patch will be attached later. :)
Created attachment 21977 [details] patch: fix a deadlock in hotplug case
Please apply this patch and see if it helps.
*** Bug 13466 has been marked as a duplicate of this bug. ***
Patch doesn't help, there is a problem with NULL pointer, see log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff80267539>] queue_work_on+0x29/0x80
PGD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:01/power_supply/BAT1/energy_full
CPU 1
Modules linked in: i915 drm i2c_algo_bit ipv6 sco bridge stp llc bnep l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hda_codec_conexant snd_pcm_oss snd_mixer_oss pcmcia snd_hda_intel snd_hda_codec snd_hwdep arc4 joydev snd_pcm ecb snd_timer sdhci_pci sdhci ohci1394 thinkpad_acpi nvram wmi iwlagn snd video output mmc_core ieee1394 iwlcore rfkill led_class mac80211 soundcore snd_page_alloc yenta_socket rsrc_nonstatic pcmcia_core serio_raw ricoh_mmc sg cfg80211 psmouse uhci_hcd i2c_i801 i2c_core ehci_hcd iTCO_wdt iTCO_vendor_support usbcore pcspkr heci(C) intel_agp e1000e evdev thermal fan button battery ac aes_x86_64 aes_generic dm_crypt dm_mod fuse vboxdrv cpufreq_powersave cpufreq_conservative cpufreq_ondemand acpi_cpufreq freq_table processor coretemp input_polldev rtc_cmos rtc_core rtc_lib ext3 jbd mbcache sd_mod ahci libata scsi_mod
Pid: 16, comm: kacpi_notify Tainted: G C 2.6.30-ARCH #1 7443C1G
RIP: 0010:[<ffffffff80267539>] [<ffffffff80267539>] queue_work_on+0x29/0x80
RSP: 0018:ffff88007b103c80 EFLAGS: 00010246
RAX: ffff88006c0062d8 RBX: ffff88006c0062c0 RCX: 0000000000000000
RDX: ffff88006c0062d0 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffffffff80401faf R08: ffff880001037c80 R09: 0000000000000000
R10: 0000000000000001 R11: 00000000ffffffff R12: ffff8800789de6c0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000040
FS: 0000000000000000(0000) GS:ffff88000102a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kacpi_notify (pid: 16, threadinfo ffff88007b102000, task ffff88007b08e500)
Stack:
 49555f080f0cd041 00000000f4f6e2b5 495056921ca00041 ffffffff8026773a
 ffffffff80401faf ffff8800789de6c0 0000000000000001 00000000f4f6e2b5
 0000000000000020 ffffffff803fb22c 504c304943505f42 0000000000000246
Call Trace:
 [<ffffffff8026773a>] ? queue_work+0x3a/0x90
 [<ffffffff80401faf>] ? acpi_dock_deferred_cb+0x0/0x1a6
 [<ffffffff803fb22c>] ? __acpi_os_execute+0x124/0x174
 [<ffffffff80401825>] ? acpi_dock_notifier_call+0xf9/0x13d
 [<ffffffff8027228f>] ? notifier_call_chain+0x4f/0xa0
 [<ffffffff8027279e>] ? __blocking_notifier_call_chain+0x6e/0xc0
 [<ffffffff803faed0>] ? acpi_os_execute_deferred+0x0/0x6b
 [<ffffffff803fd737>] ? acpi_bus_notify+0x35/0x91
 [<ffffffff8040de26>] ? acpi_ev_notify_dispatch+0x53/0x8d
 [<ffffffff803faf1b>] ? acpi_os_execute_deferred+0x4b/0x6b
 [<ffffffff80266431>] ? worker_thread+0x161/0x300
 [<ffffffff8026c6d0>] ? autoremove_wake_function+0x0/0x60
 [<ffffffff802662d0>] ? worker_thread+0x0/0x300
 [<ffffffff8026c0b4>] ? kthread+0x64/0xc0
 [<ffffffff8024ae30>] ? schedule_tail+0x30/0x80
 [<ffffffff8020d4fa>] ? child_rip+0xa/0x20
 [<ffffffff8026c050>] ? kthread+0x0/0xc0
 [<ffffffff8020d4f0>] ? child_rip+0x0/0x20
Code: 00 00 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 f0 0f ba 2a 00 19 c9 85 c9 75 31 48 8d 42 08 48 39 42 08 75 49 <44> 8b 46 20 45 85 c0 75 38 48 63 ff 48 8b 06 48 89 d6 48 03 04
RIP [<ffffffff80267539>] queue_work_on+0x29/0x80
 RSP <ffff88007b103c80>
CR2: 0000000000000020
---[ end trace 31e2bbece5b888dc ]---
note: kacpi_notify[16] exited with preempt_count 1
Created attachment 22002 [details] refreshed patch
Please try this refreshed patch instead.
(In reply to comment #11)
> Created an attachment (id=22002) [details]
> refreshed patch
>
> please try this refreshed patch instead

It seems to work for me. No failures when inserting or removing a battery, HDD, or CD-ROM drive in the ultrabay.
good news. Mark this bug as Resolved.
Here's the actual deadlock for future reference:

<user removes battery, platform generates system-level notify>
  acpi_bus_notify
  blocking_notifier_call_chain
  acpi_dock_notifier_call
  acpi_os_hotplug_execute(acpi_dock_deferred_cb, ...)
  __acpi_os_execute(..., acpi_dock_deferred_cb, ...)
  schedule_work(<acpi_dock_deferred_cb>)
  queue_work(keventd_wq, <acpi_dock_deferred_cb>)

<acpi_dock_deferred_cb queued for execution in keventd_wq>
  worker_thread(keventd_wq)
  acpi_os_execute_hp_deferred
  acpi_dock_deferred_cb
  dock_notify
  hotplug_dock_devices
  dock_remove_acpi_device
  acpi_bus_trim
  acpi_bus_remove
  device_release_driver
  acpi_device_remove
  acpi_battery_remove
  sysfs_remove_battery
  power_supply_unregister
  flush_scheduled_work
  flush_workqueue(keventd_wq)

Now we're waiting for keventd_wq to be flushed, but it won't be considered flushed until acpi_os_execute_hp_deferred() completes.
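The self-flush pattern in that trace can be reproduced in a small userspace model. This is only a sketch in Python, not kernel code: `SingleThreadWorkqueue`, `remove_device`, and `demo` are hypothetical names invented for illustration, and the flush takes a timeout so the demo reports the stall instead of hanging the way the real `flush_workqueue(keventd_wq)` does. The last case shows one possible way to break the cycle (running the removal on a separate queue); the thread does not say whether the attached patch takes that approach.

```python
import queue
import threading


class SingleThreadWorkqueue:
    """Toy stand-in for a workqueue served by one worker thread (hypothetical model)."""

    def __init__(self):
        self._items = queue.Queue()
        self._pending = 0  # items queued or currently running
        self._cond = threading.Condition()
        threading.Thread(target=self._run, daemon=True).start()

    def queue_work(self, fn):
        with self._cond:
            self._pending += 1
        self._items.put(fn)

    def flush(self, timeout):
        """Wait until all queued items have finished; False means we gave up."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)

    def _run(self):
        while True:
            fn = self._items.get()
            try:
                fn()
            finally:
                # An item only counts as done after its function returns.
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()


def demo():
    shared_wq = SingleThreadWorkqueue()   # plays the role of keventd_wq
    hotplug_wq = SingleThreadWorkqueue()  # assumed separate queue, for contrast
    results = {}

    def remove_device():
        # Runs on shared_wq's worker and flushes shared_wq itself: the item
        # waits for its own completion, so the flush can never succeed.
        results["flush_from_work_item"] = shared_wq.flush(timeout=0.2)

    shared_wq.queue_work(remove_device)
    # Flushing from a thread that is not the worker completes normally.
    results["flush_from_outside"] = shared_wq.flush(timeout=2.0)

    def remove_device_on_other_queue():
        # Same flush, but issued from a different queue's worker.
        results["flush_from_other_queue"] = shared_wq.flush(timeout=2.0)

    hotplug_wq.queue_work(remove_device_on_other_queue)
    hotplug_wq.flush(timeout=2.0)
    return results
```

Calling `demo()` returns a dictionary with one boolean per scenario; the flush issued from inside the work item is the only one that times out, which corresponds to the point in the trace where `flush_scheduled_work` is reached from within `acpi_os_execute_hp_deferred`.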
patch applied to acpi-test tree
shipped in linux-2.6.31-rc1 closed