Bug 35642 - On resume, I sometimes get a kernel oops with led_trigger_unregister_simple
Summary: On resume, I sometimes get a kernel oops with led_trigger_unregister_simple
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Battery
Hardware: All Linux
Importance: P1 normal
Assignee: Lan Tianyu
URL:
Keywords:
Depends on:
Blocks: 7216 32012
Reported: 2011-05-23 00:15 UTC by Sven-Hendrik Haase
Modified: 2016-02-22 04:11 UTC
CC List: 8 users

See Also:
Kernel Version: 2.6.39
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
relevant kernel.log section (27.54 KB, text/plain)
2011-05-23 00:15 UTC, Sven-Hendrik Haase
lspci -vvv (17.92 KB, text/plain)
2011-05-23 00:22 UTC, Sven-Hendrik Haase
lspci -vvv (15.96 KB, text/plain)
2011-05-24 20:24 UTC, Matz Radloff
/proc/acpi/battery/ while no battery is inserted (429 bytes, text/plain)
2011-05-25 06:41 UTC, Sven-Hendrik Haase
debug patch (2.54 KB, patch)
2011-06-21 01:44 UTC, Lan Tianyu
debug patch (1.27 KB, patch)
2011-06-21 02:40 UTC, Lan Tianyu
bko-35642-dmesg-led_trigger_unregister_simple.log (30.32 KB, text/plain)
2011-09-02 08:23 UTC, Leho Kraav
bko-35642-dmesg-led_trigger_unregister_simple-post-patch.log (16.73 KB, text/plain)
2011-09-02 13:34 UTC, Leho Kraav

Description Sven-Hendrik Haase 2011-05-23 00:15:05 UTC
Created attachment 59042 [details]
relevant kernel.log section

On kernel 2.6.39, I sometimes get the attached oops during resume. The log I attached contains the complete sleep-resume cycle and as you can see, I'm suspending into memory. I'm certainly no kernel hacker but from what the trace lets me guess, conky (which also tries to get battery level for me) explodes the kernel on resume since it can't find a battery? My battery is usually not inserted.

It appears to be a regression as I cannot remember seeing this in 2.6.28 or earlier.

Relevant system info and software: 
software: conky, kde4, kde4 battery monitor
uname -a: Linux smith 2.6.39-ARCH #1 SMP PREEMPT Fri May 20 11:33:59 CEST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz GenuineIntel GNU/Linux
distro: Arch Linux x86_64
Comment 1 Sven-Hendrik Haase 2011-05-23 00:22:20 UTC
Created attachment 59052 [details]
lspci -vvv
Comment 2 Len Brown 2011-05-24 01:57:53 UTC
so if conky is not running, this OOPS doesn't happen?
It looks like it accesses a proc battery interface.
However, I'm surprised that such an interface
exists if the battery is not present.

please show the contents of that interface.

please verify that this was not a problem in 2.6.38
Comment 3 Matz Radloff 2011-05-24 20:24:45 UTC
Created attachment 59252 [details]
lspci -vvv

The same problem seems to happen on my system, too. It's strange that it occurs kind of randomly after a few suspend/sleep cycles.

conky, kde4
Linux 2.6.38-ARCH #1 SMP PREEMPT
x86_64 Intel(R) Core(TM)2 Duo CPU T5800 @ 2.00GHz GenuineIntel GNU/Linux
Arch Linux x86_64
Comment 4 Sven-Hendrik Haase 2011-05-25 06:40:30 UTC
I was unable to recreate the oops after multiple attempts without conky. The problem does seem to be caused by conky accessing the ACPI battery interface during suspend/resume. The timing dependence also makes it hard to recreate even with conky running.

For reference, this is what conky does: http://git.omp.am/?p=conky.git;a=blob;f=src/linux.cc;h=880405fe46c10943c94e6e0985418a8409935f52;hb=HEAD#l2080
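
To illustrate the kind of access in question, here is a minimal sketch of a monitor polling the battery interface. It is not conky's code. The default path is the BAT0 state file under /proc/acpi/battery/, and the "key: value" line format and the "present" key are assumptions about that file's layout.

```python
from pathlib import Path


def read_battery_state(path="/proc/acpi/battery/BAT0/state"):
    """Return the file's "key: value" lines as a dict, or None if unreadable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None  # interface absent, or the read failed

    state = {}
    for line in text.splitlines():
        # Assumed layout: one "key: value" pair per line.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


if __name__ == "__main__":
    state = read_battery_state()
    if state is None:
        print("battery interface not available")
    else:
        print("present:", state.get("present", "unknown"))
```

A monitor that repeats a read like this on a timer will sooner or later hit the interface while the system is in the middle of resuming.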

I attached a file that shows the contents of my /proc/acpi/battery/ when the battery is not inserted. It also shows a BAT1 (BAT0 being my primary battery), which I could use if I swapped the optical drive for an additional battery bay; no battery has ever been inserted there.

Apparently the problem also exists in 2.6.38 as Matz has pointed out.
Comment 5 Sven-Hendrik Haase 2011-05-25 06:41:01 UTC
Created attachment 59322 [details]
/proc/acpi/battery/ while no battery is inserted
Comment 6 Sven-Hendrik Haase 2011-05-25 06:59:51 UTC
Actually, I lied, this is what conky really does in the tagged version that I use: http://git.omp.am/?p=conky.git;a=blob;f=src/linux.c;h=ce5f73335bbf7e306a31264c984e4ed93b8d4283;hb=0b3fbed04520af4b228aa42723e02b5831f1d0c2#l1961
Comment 7 Lan Tianyu 2011-06-21 01:44:28 UTC
Created attachment 63012 [details]
debug patch

Please try this patch. I guess this problem was introduced by commit 25be5821521640eb00b7eb219ffe59664510d073.
    battery_notify refreshes sysfs without checking whether the battery exists. It invokes sysfs_add_battery every time, which is not reasonable. This may cause sysfs_remove_battery to invoke power_supply_unregister when the battery doesn't exist.
    sysfs_remove_battery is invoked from both acpi_battery_update and battery_notify, and acpi_battery_update is invoked when "/proc/acpi/battery/BAT0/state" is accessed. So when the system resumes from suspend with conky running, sysfs_remove_battery may be invoked from both paths simultaneously, and power_supply_unregister has a chance to be invoked twice. This may lead to the problem and produce such a message.
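
The double-unregister scenario can be modelled outside the kernel. The sketch below is a toy userspace model, not driver code and not the attached patch: `ToyBattery`, `remove_unguarded`, `remove_guarded` and `run_two_callers` are hypothetical names, and using a lock around the check is an assumption about the general technique rather than a description of the actual fix. A barrier forces the bad interleaving in the unguarded case so that it shows up on every run.

```python
import threading


class ToyBattery:
    """Toy stand-in for the driver's per-battery state; not kernel code."""

    def __init__(self):
        self.registered = True      # models a registered power supply
        self.unregister_log = []    # one entry per teardown actually run
        self.lock = threading.Lock()


def remove_unguarded(battery, caller, barrier):
    # Check-then-act without a lock. The barrier holds each caller between
    # the check and the teardown, so both pass the check before either
    # clears the flag.
    if battery.registered:
        barrier.wait()
        battery.unregister_log.append(caller)
        battery.registered = False


def remove_guarded(battery, caller, barrier):
    barrier.wait()  # release both callers at roughly the same moment
    # The check and the teardown form one critical section, so the second
    # caller sees registered == False and returns early.
    with battery.lock:
        if not battery.registered:
            return
        battery.unregister_log.append(caller)
        battery.registered = False


def run_two_callers(remove):
    """Run `remove` from two threads and return the teardown log."""
    battery = ToyBattery()
    barrier = threading.Barrier(2)
    callers = ("acpi_battery_update", "battery_notify")
    threads = [
        threading.Thread(target=remove, args=(battery, name, barrier))
        for name in callers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return battery.unregister_log


if __name__ == "__main__":
    print("unguarded teardowns:", len(run_two_callers(remove_unguarded)))
    print("guarded teardowns:  ", len(run_two_callers(remove_guarded)))
```

In the unguarded variant both callers run the teardown; in the guarded variant only the first one does. In the real driver the interleaving depends on timing, which would fit the oops appearing only on some resumes.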
Comment 8 Lan Tianyu 2011-06-21 02:40:28 UTC
Created attachment 63022 [details]
debug patch

Please try this one.
Comment 9 Sven-Hendrik Haase 2011-06-21 22:07:37 UTC
With a patched kernel I'm completely unable to recreate the issue so far. As it didn't happen every time before, this might just be bad luck, though. I will announce any problems here but so far it looks fine.
Comment 10 Sven-Hendrik Haase 2011-06-24 23:13:36 UTC
It's a few days in and the problem does indeed appear to be fixed.
Comment 11 Florian Mickler 2011-06-25 08:52:27 UTC
Patch: https://bugzilla.kernel.org/attachment.cgi?id=63022
Comment 12 Florian Mickler 2011-06-25 08:55:47 UTC
Tianyu Lan, are you submitting this patch for upstream inclusion?
Comment 13 Lan Tianyu 2011-06-26 11:52:20 UTC
No, I will submit it later with other patches.
Comment 16 Len Brown 2011-07-31 18:36:59 UTC
these two patches are staged in acpi-test for v3.1
Comment 17 Florian Mickler 2011-08-08 08:13:32 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 9c921c22a7f33397a6774d7fa076db9b6a0fd669
Author: Lan Tianyu <tianyu.lan@intel.com>
Date:   Thu Jun 30 11:34:12 2011 +0800

    ACPI / Battery: Resolve the race condition in the sysfs_remove_battery()
Comment 18 Florian Mickler 2011-08-08 08:14:53 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 69d94ec6d83d84044252d9ba03f6a8970816e350
Author: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date:   Sat Aug 6 01:34:08 2011 +0300

    Battery: sysfs_remove_battery(): possible circular locking
Comment 19 Florian Mickler 2011-08-08 08:18:21 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 6e17fb6aa1a67afa1827ae317c3594040f055730
Author: Lan Tianyu <tianyu.lan@intel.com>
Date:   Thu Jun 30 11:33:58 2011 +0800

    ACPI / Battery: Add the check before refresh sysfs in the battery_notify()
Comment 20 Leho Kraav 2011-09-02 08:23:23 UTC
Created attachment 71302 [details]
bko-35642-dmesg-led_trigger_unregister_simple.log

it looks like i might've bumped into this today with acer travelmate 8172. i suspended with AC power disconnected, then reconnected AC before resuming and received this BUG.

$ uname -a
Linux travelmate 3.0.2-pf #4 SMP PREEMPT Fri Aug 26 12:39:19 EEST 2011 i686 Intel(R) Core(TM) i3 CPU U 330 @ 1.20GHz GenuineIntel GNU/Linux

patches from 3.1-rc1 seem to apply cleanly when applied in the correct order. so far i'm at compile-test stage, will put into production shortly.

if there's anything known that prevents these patches from working correctly on the 3.0 series, i'd appreciate the info.
Comment 21 Leho Kraav 2011-09-02 09:12:08 UTC
initial test cycle seems to confirm that patch is successful. will keep monitoring the situation.
Comment 22 Leho Kraav 2011-09-02 13:34:05 UTC
Created attachment 71322 [details]
bko-35642-dmesg-led_trigger_unregister_simple-post-patch.log

i can now report that my oopses continue. the patches seem to have no apparent effect after all. dmesg attached.
Comment 23 Diego Viola 2016-02-22 04:11:36 UTC
I'm having a similar issue here I think?

Bug 112351

Could someone please help?
