Bug 16244
Summary: | Battery insert while system is asleep causes sometimes a crash | ||
---|---|---|---|
Product: | ACPI | Reporter: | Maxim Levitsky (maximlevitsky) |
Component: | Power-Battery | Assignee: | Lan Tianyu (tianyu.lan) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | andi-bz, astarikovskiy, hmh, lenb, ming.m.lin, rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.35-rc3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | acpidemp; patch-16244-do-not-invoke-power_supply_changed-twice; patch-16244-do-not-invoke-power_supply_changed-twice; debug patch; debug patch v2; the kernel log; debug patch; another dmesg of a crash with the debug patch; battery_debug.patch |
Description
Maxim Levitsky
2010-06-18 00:30:55 UTC
Created attachment 26833 [details]
acpidemp
Created attachment 26978 [details]
patch-16244-do-not-invoke-power_supply_changed-twice
please try this patch and see if it helps.
please re-open the bug report if the patch doesn't help.
I can't test this patch now. really sorry.. but I will test as soon as I can. Big thanks for this.
Rui, please move the first half of the patch two lines down, or you might dereference a null pointer.
Created attachment 27034 [details]
patch-16244-do-not-invoke-power_supply_changed-twice
how about this one?
marking bug as RESOLVED as there is a proposed fix in comment #6
I finally have time to continue my work in linux. I applied that patch (+ WARN_ON to see if this condition triggers). I will see if this problem still exists, and hope that this patch fixes it.
Patch isn't upstream yet?
patch is upstream in 2.6.35-rc6 post git1
commit 153e500f516329f439856f52ccbf61d1fd1a946a
Author: Zhang Rui <rui.zhang@intel.com>
Date: Wed Jul 7 09:11:57 2010 +0800
ACPI battery: don't invoke power_supply_changed twice when battery is hot-added
closed.
Unfortunately just got another kernel kaboom: This is latest git. Same thing, system was suspended, then I decided to move it to another room. I inserted the battery, disconnected power, and turned the system on....
<2>[ 1434.270090] kernel BUG at /home/maxim/software/kernel/linux-2.6/kernel/workqueue.c:312!
<0>[ 1434.270096] invalid opcode: 0000 [#1] PREEMPT SMP
<0>[ 1434.270101] last sysfs file: /sys/power/state
<4>[ 1434.270105] CPU 0
<4>[ 1434.270107] Modules linked in: ir_lirc_codec lirc_dev ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ene_ir ir_rc5_decoder ir_nec_decoder ir_core nvidia(P) af_packet nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc iwl3945 iwlcore uvcvideo mac80211 videodev usb_storage v4l2_compat_ioctl32 snd_hda_codec_realtek usb_libusual cpufreq_powersave cpufreq_conservative uhci_hcd snd_hda_intel sdhci_pci cpufreq_userspace ehci_hcd acpi_cpufreq mperf sdhci usbcore snd_hda_codec iTCO_wdt cfg80211 snd_hwdep mmc_core r852 tg3 joydev snd_pcm iTCO_vendor_support sm_common nand coretemp nand_ids nand_ecc psmouse battery ac video libphy mtd snd_page_alloc serio_raw evdev sg [last unloaded: ir_core]
<4>[ 1434.270169]
<4>[ 1434.270173] Pid: 145, comm: kacpi_notify Tainted: P 2.6.35-rc6+ #98 Nettiling/Aspire 5720
<4>[ 1434.270179] RIP: 0010:[<ffffffff81056364>] [<ffffffff81056364>] queue_work_on+0x44/0x50
<4>[ 1434.270191] RSP: 0018:ffff88007f19bcd0 EFLAGS: 00010213
<4>[ 1434.270195] RAX: ffff88007e374908 RBX: ffff88007f8195a0 RCX: 0000000000000000
<4>[ 1434.270200] RDX: ffff88007e374900 RSI: ffff88007f8195a0 RDI: 0000000000000000
<4>[ 1434.270204] RBP: ffff88007f19bcd0 R08: 0000000000000000 R09: 0000000000000000
<4>[ 1434.270209] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007e374900
<4>[ 1434.270214] R13: ffff88007f0a4000 R14: 0000000000000080 R15: ffff88007f1a0000
<4>[ 1434.270220] FS: 0000000000000000(0000) GS:ffff880002400000(0000) knlGS:0000000000000000
<4>[ 1434.270225] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[ 1434.270230] CR2: 00007f22b96a56b0 CR3: 0000000001559000 CR4: 00000000000006f0
<4>[ 1434.270234] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 1434.270239] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 1434.270245] Process kacpi_notify (pid: 145, threadinfo ffff88007f19a000, task ffff88007f1a0000)
<0>[ 1434.270250] Stack:
<4>[ 1434.270252] ffff88007f19bcf0 ffffffff810563b9 ffff88007e374898 ffff88004f0b1800
<4>[ 1434.270258] <0> ffff88007f19bd00 ffffffff81056403 ffff88007f19bd30 ffffffff812c49ee
<4>[ 1434.270265] <0> ffff88004f0b1800 ffff88007f0a4000 0000000000000080 ffff88007e374800
<0>[ 1434.270272] Call Trace:
<4>[ 1434.270277] [<ffffffff810563b9>] queue_work+0x29/0x60
<4>[ 1434.270283] [<ffffffff81056403>] schedule_work+0x13/0x20
<4>[ 1434.270290] [<ffffffff812c49ee>] power_supply_changed+0x1e/0x70
<4>[ 1434.270299] [<ffffffffa007c6e8>] acpi_battery_notify+0x7d/0x86 [battery]
<4>[ 1434.270307] [<ffffffff81211629>] ? acpi_os_execute_deferred+0x0/0x31
<4>[ 1434.270313] [<ffffffff8121466e>] acpi_device_notify+0x14/0x16
<4>[ 1434.270320] [<ffffffff81222ff8>] acpi_ev_notify_dispatch+0x62/0x7a
<4>[ 1434.270326] [<ffffffff8121164d>] acpi_os_execute_deferred+0x24/0x31
<4>[ 1434.270332] [<ffffffff81055df0>] worker_thread+0x220/0x390
<4>[ 1434.270338] [<ffffffff81055d9e>] ? worker_thread+0x1ce/0x390
<4>[ 1434.270345] [<ffffffff8105ac00>] ? autoremove_wake_function+0x0/0x40
<4>[ 1434.270351] [<ffffffff81055bd0>] ? worker_thread+0x0/0x390
<4>[ 1434.270356] [<ffffffff8105a77e>] kthread+0xae/0xc0
<4>[ 1434.270363] [<ffffffff81003bd4>] kernel_thread_helper+0x4/0x10
<4>[ 1434.270369] [<ffffffff8105a6d0>] ? kthread+0x0/0xc0
<4>[ 1434.270374] [<ffffffff81003bd0>] ? kernel_thread_helper+0x0/0x10
<0>[ 1434.270378] Code: 75 29 44 8b 46 20 45 85 c0 75 24 48 8b 06 48 63 ff 48 89 d6 48 03 04 fd 60 f7 5a 81 48 89 c7 e8 03 ff ff ff b8 01 00 00 00 c9 c3 <0f> 0b eb fe 8b 3d 92 96 55 00 eb d4 55 48 89 f2 48 89 e5 48 8b
<1>[ 1434.270416] RIP [<ffffffff81056364>] queue_work_on+0x44/0x50
<4>[ 1434.270422] RSP <ffff88007f19bcd0>
<4>[ 1434.270427] ---[ end trace 76a5caa882a7a1de ]---
Created attachment 27460 [details]
debug patch
please apply this patch on top of the latest git tree, say 2.6.36-rc1, and attach the dmesg output after resume.
BTW: the crash should go away after applying this patch, but that's not a fix, I just need to get the dmesg output after resume. :)
Just one question: should I do an ordinary suspend/resume, or try to reproduce the bug by inserting/removing the battery? BTW I still can't reproduce that reliably. Thanks!
Created attachment 27463 [details]
debug patch v2
then please try this patch.
Insert the battery before resume, repeat this several times until you get "Rui: bug reproduced!" in dmesg, and then attach the dmesg output.
I tried that. Don't see the 'Rui: bug reproduced!' at all. I do see a message about invoking power_supply_changed twice. I will post a proper log soon. Don't have much time now.
System blew up again, now with your debug patch.
Created attachment 29032 [details]
the kernel log
Created attachment 32802 [details]
debug patch
please apply this patch on top of the latest git kernel and see if it helps.
If not, please attach the full log when the problem occurs.
bug closed as there is no response from the bug reporter. please feel free to re-open it if the problem still exists in the latest upstream kernel after applying the debug patch.
I am also observing hangs on 2.6.35.11 when battery is either plugged or unplugged (i.e. "plugged" state changes) during S3 sleep. My box (a ThinkPad T43) is known good, has stable firmware, and works perfectly with 2.6.32.y and earlier kernels. I will reopen the bug after I check things a little more.
I do not have enough access rights to reopen this bug. Please reopen it.
The "debug" patch cures the problem, after a typo is fixed so that it compiles. The printk is never triggered here, but the system does not hang anymore, i.e., unsafe use of delayed work queues in the power supply class is to blame.
This really needs to be fixed for the 2.6.35-longterm, regardless of whether it has been fixed directly or indirectly in mainline. This bug can cause data loss: it managed to hang the box here with unflushed buffers (maybe ext4 in 2.6.35 does not flush delayed allocation buffers on PM sync, or the hang happened after stuff started thawing and syslog tried to write data).
Cc'ing 2.6.35-longterm maintainer, to make sure it does not fall through the cracks.
Created attachment 52892 [details]
another dmesg of a crash with the debug patch
Well that still happens, really
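A note for readers of the trace in this report: the BUG fires in queue_work_on, reached from power_supply_changed via schedule_work. The sketch below is a userspace toy model, not kernel code, and it rests on an assumption that is not confirmed anywhere in this report: that the check hit at workqueue.c:312 rejects a work item whose pending flag is clear while its list node is still linked, or was never initialized. All names (`toy_work`, `toy_queue_work`, `toy_selfcheck`, and so on) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy circular list node, shaped like a doubly linked list head. */
struct toy_list {
    struct toy_list *next, *prev;
};

/* Hypothetical stand-in for a work item: a pending flag plus a list node. */
struct toy_work {
    bool pending;
    struct toy_list entry;
};

enum toy_result { TOY_QUEUED, TOY_ALREADY_PENDING, TOY_WOULD_BUG };

static void toy_list_init(struct toy_list *n) { n->next = n; n->prev = n; }
static bool toy_list_empty(const struct toy_list *n) { return n->next == n; }

static void toy_list_add_tail(struct toy_list *n, struct toy_list *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Proper initialisation: flag clear, node points at itself (empty). */
static void toy_init_work(struct toy_work *w)
{
    w->pending = false;
    toy_list_init(&w->entry);
}

/*
 * Queue a work item. A second request while the item is pending is a
 * harmless no-op. A clear flag combined with a non-empty node is the
 * inconsistent state this sketch ASSUMES the kernel check rejects.
 */
static enum toy_result toy_queue_work(struct toy_list *queue, struct toy_work *w)
{
    if (w->pending)
        return TOY_ALREADY_PENDING;
    if (!toy_list_empty(&w->entry))
        return TOY_WOULD_BUG;
    w->pending = true;
    toy_list_add_tail(&w->entry, queue);
    return TOY_QUEUED;
}

/* Walks through the three cases the model distinguishes. */
void toy_selfcheck(void)
{
    struct toy_list queue;
    struct toy_work good, raw;

    toy_list_init(&queue);
    toy_init_work(&good);

    /* Queueing twice in a row is fine in this model. */
    assert(toy_queue_work(&queue, &good) == TOY_QUEUED);
    assert(toy_queue_work(&queue, &good) == TOY_ALREADY_PENDING);

    /* A zero-filled item that was never initialised looks "linked". */
    memset(&raw, 0, sizeof raw);
    assert(toy_queue_work(&queue, &raw) == TOY_WOULD_BUG);

    /* Clearing the flag while the node is still queued is also inconsistent. */
    good.pending = false;
    assert(toy_queue_work(&queue, &good) == TOY_WOULD_BUG);
}
```

Calling `toy_selfcheck()` from a `main` exercises all three cases. If the assumption holds, the model suggests that calling the notification twice is not dangerous by itself, while touching a work item that is uninitialized or being re-initialized would be; whether that is what happens in the battery driver is not established by this thread.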
Created attachment 66522 [details]
battery_debug.patch
Please try this patch.
Lan: I did track it down to: "unsafe use of delayed work queues in the power supply class is to blame" (see comment #21). THAT _also_ needs to be fixed, otherwise you will still get a crash if AC state is different on suspend/resume.
Tianyu, what's the current status of this bug?
Waiting for feedback on the debug patch in comment #23.
Hi Maxim: the patch has been accepted upstream, so you can now test whether this bug still exists in the newest kernel.
I don't get it anymore, but it always was very rare. That's why I had no chance to find out what triggers it. But yes, I guess it can be closed, at least until I get it again.
(In reply to comment #27)
> I don't get it anymore, but it always was very rare. That's why I had no chance
> to find out what triggers it.
> But yes, I guess it can be closed, at least until I get it again.
Good to know. Bug closed