Bug 13533

Summary: 2.6.30 fails when removing second battery - Thinkpad R400
Product: ACPI
Reporter: Vojtech Gondzala (vojtech.gondzala)
Component: Power-Battery
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, bjorn.helgaas, lenb, pm, rui.zhang, vojtech.gondzala, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  syslog messages
  TP-R400 - acpidump
  patch: fix a deadlock in hotplug case
  refreshed patch

Description Vojtech Gondzala 2009-06-14 08:45:26 UTC
Created attachment 21908
syslog messages

The system fails when a battery is removed from the laptop.

Hardware:
ThinkPad R400 (7443-C1G) with a main battery and a second battery in the UltraBay.

Steps to reproduce:
Replace the DVD-RW drive with an UltraBay battery, then try to remove that battery.

2.6.29 works fine.
Comment 1 ykzhao 2009-06-15 00:54:52 UTC
Will you please attach the output of acpidump?

From the description it seems that 2.6.29 works well. Will you please use git-bisect to identify the first bad commit, the one that causes the regression?
Thanks.
Comment 2 Vojtech Gondzala 2009-06-15 16:47:48 UTC
Created attachment 21927
TP-R400 - acpidump

Hi,
here is the output from acpidump.

I'm too busy right now to look for the commit that caused this regression. I may try it at the weekend.
Comment 3 Vojtech Gondzala 2009-06-15 20:38:15 UTC
Hi again,

I think I found the cause of the problem.
When the battery in the UltraBay is undocked, kernel 2.6.29 says:
ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - undocking
but 2.6.30 says:
ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - docking

If I run the command `echo 1 > /sys/devices/platform/dock.1/undock` before removing the battery (dock.1 is the battery bay), then everything works. Something is wrong with the dock code in the kernel, but I don't know what.
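
For anyone who wants to script this workaround, here is a minimal Python sketch. It assumes the `dock.1` path from this report (dock numbering is machine-specific) and needs root to write to sysfs. `request_undock` is a hypothetical helper name, not an existing tool.

```python
from pathlib import Path

# Path taken from this report; dock numbering differs between machines.
DEFAULT_UNDOCK = Path("/sys/devices/platform/dock.1/undock")


def request_undock(undock_file: Path = DEFAULT_UNDOCK) -> None:
    """Ask the dock driver to undock the bay before the device is pulled out.

    Equivalent to `echo 1 > .../undock`; must run as root on real hardware.
    """
    if not undock_file.exists():
        raise FileNotFoundError(f"no undock attribute at {undock_file}")
    undock_file.write_text("1\n")
```

Call `request_undock()` first, then remove the battery from the bay.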
Comment 4 Len Brown 2009-06-16 01:46:52 UTC
ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - docking
ACPI: Battery Slot [BAT1] (battery present)
------------[ cut here ]------------
 WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0xa1/0xb0()

What happens after this warning? Does the system continue to function?
Comment 5 Vojtech Gondzala 2009-06-16 04:42:52 UTC
(In reply to comment #4)
> ACPI: \_SB_.PCI0.LPC_.EC__.BAT1 - docking
> ACPI: Battery Slot [BAT1] (battery present)
> ------------[ cut here ]------------
>  WARNING: at kernel/workqueue.c:371 flush_cpu_workqueue+0xa1/0xb0()
> 
> what happens after this warning, does the system continue to function?

Partially. The system is unusable: inserting a battery is ignored and other things are broken. I cannot switch virtual consoles (it freezes) and cpufreqd segfaults. The only thing that helps is the SysRq key.
Comment 6 Zhang Rui 2009-06-17 06:07:10 UTC
Hah, it's true. There is a deadlock in the ACPI hotplug mechanism.
Thanks for finding this; a patch will be attached later. :)
Comment 7 Zhang Rui 2009-06-18 05:36:42 UTC
Created attachment 21977
patch: fix a deadlock in hotplug case

please apply this patch and see if it helps.
Comment 8 Zhang Rui 2009-06-18 06:19:24 UTC
*** Bug 13466 has been marked as a duplicate of this bug. ***
Comment 9 Vojtech Gondzala 2009-06-18 16:41:28 UTC
The patch doesn't help. There is now a NULL pointer dereference, see the log:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff80267539>] queue_work_on+0x29/0x80                         
PGD 0                                                                    
Oops: 0000 [#1] PREEMPT SMP                                              
last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:01/power_supply/BAT1/energy_full                                                                                      
CPU 1                                                                                                    
Modules linked in: i915 drm i2c_algo_bit ipv6 sco bridge stp llc bnep l2cap bluetooth snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hda_codec_conexant snd_pcm_oss snd_mixer_oss pcmcia snd_hda_intel snd_hda_codec snd_hwdep arc4 joydev snd_pcm ecb snd_timer sdhci_pci sdhci ohci1394 thinkpad_acpi nvram wmi iwlagn snd video output mmc_core ieee1394 iwlcore rfkill led_class mac80211 soundcore snd_page_alloc yenta_socket rsrc_nonstatic pcmcia_core serio_raw ricoh_mmc sg cfg80211 psmouse uhci_hcd i2c_i801 i2c_core ehci_hcd iTCO_wdt iTCO_vendor_support usbcore pcspkr heci(C) intel_agp e1000e evdev thermal fan button battery ac aes_x86_64 aes_generic dm_crypt dm_mod fuse vboxdrv cpufreq_powersave cpufreq_conservative cpufreq_ondemand acpi_cpufreq freq_table processor coretemp input_polldev rtc_cmos rtc_core rtc_lib ext3 jbd mbcache sd_mod ahci libata scsi_mod                                                         
Pid: 16, comm: kacpi_notify Tainted: G         C 2.6.30-ARCH #1 7443C1G                                  
RIP: 0010:[<ffffffff80267539>]  [<ffffffff80267539>] queue_work_on+0x29/0x80                             
RSP: 0018:ffff88007b103c80  EFLAGS: 00010246                                                             
RAX: ffff88006c0062d8 RBX: ffff88006c0062c0 RCX: 0000000000000000                                        
RDX: ffff88006c0062d0 RSI: 0000000000000000 RDI: 0000000000000001                                        
RBP: ffffffff80401faf R08: ffff880001037c80 R09: 0000000000000000                                        
R10: 0000000000000001 R11: 00000000ffffffff R12: ffff8800789de6c0                                        
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000040                                        
FS:  0000000000000000(0000) GS:ffff88000102a000(0000) knlGS:0000000000000000                             
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b                                                        
CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0                                        
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                        
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400                                        
Process kacpi_notify (pid: 16, threadinfo ffff88007b102000, task ffff88007b08e500)                       
Stack:                                                                                                   
 49555f080f0cd041 00000000f4f6e2b5 495056921ca00041 ffffffff8026773a                                     
 ffffffff80401faf ffff8800789de6c0 0000000000000001 00000000f4f6e2b5                                     
 0000000000000020 ffffffff803fb22c 504c304943505f42 0000000000000246                                     
Call Trace:                                                                                              
 [<ffffffff8026773a>] ? queue_work+0x3a/0x90                                                             
 [<ffffffff80401faf>] ? acpi_dock_deferred_cb+0x0/0x1a6                                                  
 [<ffffffff803fb22c>] ? __acpi_os_execute+0x124/0x174
 [<ffffffff80401825>] ? acpi_dock_notifier_call+0xf9/0x13d
 [<ffffffff8027228f>] ? notifier_call_chain+0x4f/0xa0
 [<ffffffff8027279e>] ? __blocking_notifier_call_chain+0x6e/0xc0
 [<ffffffff803faed0>] ? acpi_os_execute_deferred+0x0/0x6b
 [<ffffffff803fd737>] ? acpi_bus_notify+0x35/0x91
 [<ffffffff8040de26>] ? acpi_ev_notify_dispatch+0x53/0x8d
 [<ffffffff803faf1b>] ? acpi_os_execute_deferred+0x4b/0x6b
 [<ffffffff80266431>] ? worker_thread+0x161/0x300
 [<ffffffff8026c6d0>] ? autoremove_wake_function+0x0/0x60
 [<ffffffff802662d0>] ? worker_thread+0x0/0x300
 [<ffffffff8026c0b4>] ? kthread+0x64/0xc0
 [<ffffffff8024ae30>] ? schedule_tail+0x30/0x80
 [<ffffffff8020d4fa>] ? child_rip+0xa/0x20
 [<ffffffff8026c050>] ? kthread+0x0/0xc0
 [<ffffffff8020d4f0>] ? child_rip+0x0/0x20
Code: 00 00 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 f0 0f ba 2a 00 19 c9 85 c9 75 31 48 8d 42 08 48 39 42 08 75 49 <44> 8b 46 20 45 85 c0 75 38 48 63 ff 48 8b 06 48 89 d6 48 03 04
RIP  [<ffffffff80267539>] queue_work_on+0x29/0x80
 RSP <ffff88007b103c80>
CR2: 0000000000000020
---[ end trace 31e2bbece5b888dc ]---
note: kacpi_notify[16] exited with preempt_count 1
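
One way to read the oops above: the fault address in CR2 is 0x20, and the faulting instruction bytes `<44> 8b 46 20` decode on x86-64 as a 32-bit load from `0x20(%rsi)`, with RSI equal to zero. RSI normally carries the second function argument, so this looks like a NULL workqueue pointer reaching `queue_work_on()`. That reading is an inference from the dump and is not confirmed anywhere in this report. The Python sketch below, with hypothetical helper names, pulls the registers out of the dump and checks the arithmetic.

```python
import re

# Three register lines copied from the oops in this comment.
OOPS_EXCERPT = """\
RAX: ffff88006c0062d8 RBX: ffff88006c0062c0 RCX: 0000000000000000
RDX: ffff88006c0062d0 RSI: 0000000000000000 RDI: 0000000000000001
CR2: 0000000000000020 CR3: 0000000000201000 CR4: 00000000000006e0
"""

# Matches "NAME: <16 hex digits>" pairs such as "RSI: 0000000000000000".
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b")


def parse_registers(text):
    """Return a dict mapping register names to integer values."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(text)}


def fault_is_base_plus_offset(regs, base, offset):
    """True if the fault address (CR2) equals the base register plus offset."""
    return regs["CR2"] == regs[base] + offset


regs = parse_registers(OOPS_EXCERPT)
# 0x20 is the displacement byte at the end of "<44> 8b 46 20".
null_base = regs["RSI"] == 0 and fault_is_base_plus_offset(regs, "RSI", 0x20)
```

If `null_base` is true, the dump is consistent with a field at offset 0x20 being read through a NULL pointer held in RSI.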
Comment 10 Zhang Rui 2009-06-19 02:09:45 UTC
*** Bug 13466 has been marked as a duplicate of this bug. ***
Comment 11 Zhang Rui 2009-06-19 03:01:09 UTC
Created attachment 22002
refreshed patch

please try this refreshed patch instead
Comment 12 Vojtech Gondzala 2009-06-19 04:56:53 UTC
(In reply to comment #11)
> Created an attachment (id=22002)
> refreshed patch
> 
> please try this refreshed patch instead

It seems to work for me. No failures when inserting or removing a battery, HDD, or CD-ROM drive in the UltraBay.
Comment 13 Zhang Rui 2009-06-19 05:09:51 UTC
Good news.
Marking this bug as resolved.
Comment 14 Bjorn Helgaas 2009-06-23 21:57:39 UTC
Here's the actual deadlock for future reference:

  <user removes battery, platform generates system-level notify>
    acpi_bus_notify
      blocking_notifier_call_chain
        acpi_dock_notifier_call
          acpi_os_hotplug_execute(acpi_dock_deferred_cb, ...)
            __acpi_os_execute(..., acpi_dock_deferred_cb, ...)
              schedule_work(<acpi_dock_deferred_cb>)
                queue_work(keventd_wq, <acpi_dock_deferred_cb>)
                  <acpi_dock_deferred_cb queued for execution in keventd_wq>

  worker_thread(keventd_wq)
    acpi_os_execute_hp_deferred
      acpi_dock_deferred_cb
        dock_notify
          hotplug_dock_devices
            dock_remove_acpi_device
              acpi_bus_trim
                acpi_bus_remove
                  device_release_driver
                    acpi_device_remove
                      acpi_battery_remove
                        sysfs_remove_battery
                          power_supply_unregister
                            flush_scheduled_work
                              flush_workqueue(keventd_wq)

Now we're waiting for keventd_wq to be flushed, but it won't be
considered flushed until acpi_os_execute_hp_deferred() completes.
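
The self-flush described above can be modelled in a few lines. This is a toy, single-threaded Python model and not kernel code: `ToyWorkqueue`, `hotplug_wq`, and the helper functions are hypothetical names. Instead of hanging the way the real `flush_workqueue()` call did, `flush()` here raises when it is called from a work item running on the same queue.

```python
from collections import deque


class SelfFlushError(RuntimeError):
    """Raised where a real single-threaded workqueue would wait forever."""


class ToyWorkqueue:
    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.running = None  # work item currently executing, if any

    def queue_work(self, func):
        self.pending.append(func)

    def flush(self):
        # A flush waits for every queued item, including the one running now.
        # If the caller *is* that item, it would be waiting for itself.
        if self.running is not None:
            raise SelfFlushError(
                f"{self.running.__name__} flushed {self.name} from inside it"
            )
        self.run_pending()

    def run_pending(self):
        while self.pending:
            self.running = self.pending.popleft()
            try:
                self.running()
            finally:
                self.running = None


keventd_wq = ToyWorkqueue("keventd_wq")
hotplug_wq = ToyWorkqueue("hotplug_wq")  # hypothetical separate queue


def battery_remove():
    # Stands in for power_supply_unregister() -> flush_scheduled_work().
    keventd_wq.flush()


def simulate(queue):
    """Run the removal handler on `queue` and report what happens."""
    queue.queue_work(battery_remove)
    try:
        queue.run_pending()
    except SelfFlushError:
        return "deadlock"
    return "ok"
```

In this model `simulate(keventd_wq)` reports the self-wait, while running the same handler on a different queue completes. Whether the attached patch takes that approach is not shown in this thread.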
Comment 15 Len Brown 2009-06-24 03:25:10 UTC
patch applied to acpi-test tree
Comment 16 Len Brown 2009-06-25 16:06:53 UTC
shipped in linux-2.6.31-rc1
closed