Bug 9624 - reboot on AC plug event - 2.6.23-rc4 regression - Thinkpad R51
Status: REJECTED UNREPRODUCIBLE
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All Linux
Importance: P1 high
Assigned To: ykzhao
Depends on:
Blocks: 7216 9243
Reported: 2007-12-22 16:09 UTC by Rafael J. Wysocki
Modified: 2008-06-13 19:41 UTC
4 users

See Also:
Kernel Version: 2.6.24-rc2
Tree: Mainline
Regression: Yes


Attachments
dmesg (25.12 KB, text/plain)
2007-12-26 13:26 UTC, Matt Mackall
acpidump (230.97 KB, text/plain)
2007-12-26 13:26 UTC, Matt Mackall

Description Rafael J. Wysocki 2007-12-22 16:09:20 UTC
Subject         : ACPI or radeon: spontaneous reboot regression
Submitter       : Matt Mackall <mpm@selenic.com>
References      : http://lkml.org/lkml/2007/12/22/139
Comment 1 Fu Michael 2007-12-24 17:25:28 UTC
Matt, would you please provide:

1) acpidump
2) dmesg

thanks.
Comment 2 Matt Mackall 2007-12-26 13:26:36 UTC
Created attachment 14191
dmesg
Comment 3 Matt Mackall 2007-12-26 13:26:58 UTC
Created attachment 14192
acpidump
Comment 4 Matt Mackall 2007-12-26 16:29:59 UTC
Looks like this patch is to blame:

ACPI: EC: Fix "no battery" regression

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=c019b1933015ee31366eeaa085bad3ee9516991c

Reverting it against tip makes the reboots go away.

The original bug: http://bugzilla.kernel.org/show_bug.cgi?id=8886
Comment 5 Zhang Rui 2007-12-26 17:23:07 UTC
Please attach your acpidump output.
Comment 6 Matt Mackall 2007-12-26 22:05:35 UTC
Is the attachment in comment #3 not what you wanted?
Comment 7 Alexey Starikovskiy 2007-12-27 01:46:47 UTC
Matt,
Reverting the patch you've mentioned is like disabling interrupts -- sure, it will 'cure' the broken interrupt handler by never calling it... By reverting the patch you disable notifications from the embedded controller to all ACPI drivers -- thermal, battery and AC adapter at least.
Could you please try to run the latest -rc kernel and disable modules mentioned above one by one? 
Comment 8 Len Brown 2007-12-27 21:11:52 UTC
So 2.6.23-rc3 and before worked fine,
but 2.6.23-rc4 and later (e.g. 2.6.23, 2.6.24-rc6) all fail this way?
Comment 9 Matt Mackall 2007-12-27 21:19:36 UTC
Yes, bisection shows kernels shortly after 23-rc3 failing. Will try to fiddle with thermal, battery, and AC settings tomorrow.
Comment 10 Matt Mackall 2008-01-03 17:42:37 UTC
Ok, a few tomorrows later... I can't reproduce it after disabling ACPI_THERMAL; I haven't tried the others.
Comment 11 Len Brown 2008-01-07 15:35:31 UTC
Please re-enable ACPI_THERMAL in the build,
and verify that boot-time thermal.off=1
is sufficient to prevent the reboot.

If yes, does thermal.nocrt=1 also stop the reboot,
and does it produce any additional dmesg output upon the AC plug event?

please paste the contents of /proc/acpi/thermal_zone/*/* here

to eliminate any user-space interactions, please see
if the reboot occurs when you do AC plug events when
booted only up to single user mode.
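For reference, a minimal Python sketch that collects the files requested above and labels each one with its zone and file name, so the output can be pasted into the bug as-is. It assumes the legacy procfs interface (/proc/acpi/thermal_zone) of 2.6-era kernels; the function name dump_thermal_zones is hypothetical and not part of any kernel tool.

```python
import os


def dump_thermal_zones(root="/proc/acpi/thermal_zone"):
    """Return {"<zone>/<file>": contents} for every file under root.

    Rough equivalent of `cat /proc/acpi/thermal_zone/*/*`, but keeps the
    zone and file name next to each value. (Hypothetical helper.)
    """
    result = {}
    if not os.path.isdir(root):
        # Kernels built without the procfs ACPI interface have no such dir.
        return result
    for zone in sorted(os.listdir(root)):
        zone_dir = os.path.join(root, zone)
        if not os.path.isdir(zone_dir):
            continue
        for name in sorted(os.listdir(zone_dir)):
            path = os.path.join(zone_dir, name)
            key = f"{zone}/{name}"
            try:
                with open(path) as f:
                    result[key] = f.read().strip()
            except OSError as exc:
                # Some entries may be write-only or refuse reads.
                result[key] = f"<unreadable: {exc}>"
    return result


if __name__ == "__main__":
    for key, value in dump_thermal_zones().items():
        print(f"== {key}\n{value}")
```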

Comment 12 Matt Mackall 2008-01-07 15:45:37 UTC
Will do. I've noticed another related problem: recent kernels usually reboot immediately upon resume from s2ram. Disabling thermal doesn't help.
Comment 13 Matt Mackall 2008-01-12 10:28:10 UTC
Ok, ACPI_THERMAL was not the problem. With ACPI_THERMAL disabled in config, pulling the power would cause the system to reboot after about 4 seconds. This was missed by my plug/unplug tests.

The problem appears to be with ACPI_PROCESSOR. Disabling all ACPI options but ACPI_PROCESSOR exhibits the problem, enabling all but ACPI_PROCESSOR doesn't.
Comment 14 Matt Mackall 2008-01-14 13:53:40 UTC
A couple more observations:

In 2.6.23-rc3, Gnome battery charge monitor properly displayed the state of the AC adapter when plugging/unplugging. 2.6.24-rc7 doesn't notice changes most of the time.

Also, at some point along the road, my Fn-F5 to toggle Bluetooth stopped working.
Comment 15 Len Brown 2008-01-14 21:57:32 UTC
It sounds like just about everything is breaking on this box.
Maybe something low-level, such as the EC, is toasted.
Can you exclude any ibm or thinkpad drivers from the config?

Are ACPI events working?  (kill acpid, cat /proc/acpi/event,
and press the power button a few times should show event strings)
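A minimal Python sketch of the same check, assuming each line of /proc/acpi/event carries the four whitespace-separated fields that 2.6-era kernels print (device class, bus id, event type, event data, the last two in hex). The names AcpiEvent, parse_event and watch are hypothetical, and acpid still has to be stopped first, as described above, or it competes for the events.

```python
from typing import NamedTuple, Optional


class AcpiEvent(NamedTuple):
    device_class: str  # e.g. "button/power"
    bus_id: str        # short ACPI device name
    event_type: int    # printed as 8 hex digits
    data: int          # printed as 8 hex digits


def parse_event(line: str) -> Optional[AcpiEvent]:
    """Parse one "class bus_id type data" line; None if it doesn't fit."""
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        return AcpiEvent(parts[0], parts[1], int(parts[2], 16), int(parts[3], 16))
    except ValueError:
        return None


def watch(path: str = "/proc/acpi/event") -> None:
    """Print events as they arrive (blocks; stop with Ctrl-C)."""
    with open(path) as f:
        for line in f:
            event = parse_event(line)
            if event is not None:
                print(event)
```

Pressing the power button while watch() runs should print one parsed event per press if ACPI events are being delivered.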

Re: the processor driver being related to the failure:
is it the same with "idle=poll"?  That will disable the C-state stuff.
The other part is cpufreq use of the processor driver,
which you can disable via CONFIG_CPU_FREQ=n
Comment 16 Matt Mackall 2008-01-16 16:11:31 UTC
We can rule out hardware issues as switching back to 2.6.23-rc3 eliminates the problem. This is a purely software regression.

Removing Thinkpad ACPI bits fixes the Bluetooth button, so I think that's an unrelated issue.

idle=poll and CONFIG_CPU_FREQ=n don't help. Turning off CONFIG_ACPI_PROCESSOR seems to basically be disabling my cpufreq setup anyway as it seems to depend on CONFIG_X86_ACPI_CPUFREQ to actually do anything.

Comment 17 Matt Mackall 2008-01-30 14:58:26 UTC
Any further thoughts, guys? This is a big impediment to me hacking on new kernels.
Comment 18 Rafael J. Wysocki 2008-01-30 15:32:42 UTC
Well, I'd run a bisection at this point ...
Comment 19 Matt Mackall 2008-01-30 15:35:15 UTC
I did that a month ago and arrived at the patch mentioned in #4.
Comment 20 Rafael J. Wysocki 2008-01-30 15:46:54 UTC
Does reverting it also fix the s2ram issue?
Comment 21 Matt Mackall 2008-01-30 16:27:47 UTC
Ignore the s2ram issue. Message #10 was wrong, so the observation about s2ram was still with a broken ACPI system.

With the patch reverted, the machine is unresponsive to ACPI events. So it's not a real fix. The good kernel responds to ACPI events, properly suspends and resumes, and cpufreqd works. The bad kernel reboots on ACPI events. My test was for rebooting on ACPI events so when it hit the patch that re-enabled them, it appeared to be the culprit.

Perhaps I need to bisect from a different direction.
Comment 22 Shaohua 2008-05-13 19:40:10 UTC
Since there is a delay of about 4 seconds before the reboot after you unplug AC, we can collect some information about what the ACPI side is doing.
Can you enable the ACPI debug output (echo 0xffffffff > /sys/module/acpi/parameters/debug_level), then, just after the AC is unplugged, run 'dmesg > logfile; sync' and attach the logfile here?
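The two steps can also be scripted. A minimal Python sketch, assuming root privileges and a kernel built with ACPI debugging support (CONFIG_ACPI_DEBUG), which is what provides the debug_level parameter; note that the value has to be written into the file (the shell equivalent is echo, not cat). The helper names set_acpi_debug_level and capture_dmesg are hypothetical.

```python
import os
import subprocess

DEBUG_LEVEL_PATH = "/sys/module/acpi/parameters/debug_level"


def set_acpi_debug_level(level: int, path: str = DEBUG_LEVEL_PATH) -> None:
    """Write level as 0x-prefixed hex, like `echo 0xffffffff > path`."""
    if not 0 <= level <= 0xFFFFFFFF:
        raise ValueError("debug level must fit in 32 bits")
    with open(path, "w") as f:
        f.write(f"{level:#010x}\n")


def capture_dmesg(logfile: str) -> None:
    """Equivalent of `dmesg > logfile; sync`."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True,
                         check=True).stdout
    with open(logfile, "w") as f:
        f.write(out)
    os.sync()  # flush to disk in case the machine reboots right after
```

The sync matters here because the machine reboots a few seconds after the unplug, and an unflushed log would be lost.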
Comment 23 Matt Mackall 2008-05-14 09:25:48 UTC
This laptop's LCD inverter died in February.
Comment 24 ykzhao 2008-05-19 00:34:52 UTC
Hi, Matt
    Can you continue with the test requested in comment #22 on your laptop?
    If you can, please also try the following test.
    a. boot the system with thermal.nocrt=1
    b. kill the process that is using /proc/acpi/event (the PID can be obtained with "lsof /proc/acpi/event")
    c. plug/unplug the AC adapter and see whether the system will be rebooted.
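If lsof is not installed, step b's lookup can be done by scanning the /proc/<pid>/fd symlinks instead. A minimal Python sketch, assuming a Linux procfs and enough privileges to read other processes' fd directories (run it as root; otherwise those processes are silently skipped). pids_holding is a hypothetical helper name.

```python
import os
from typing import List


def pids_holding(target: str) -> List[int]:
    """Return PIDs with an open descriptor on target, like `lsof target`."""
    target = os.path.realpath(target)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission to look
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor closed while we were scanning
    return sorted(pids)
```

Each PID returned can then be stopped with kill before plugging and unplugging the adapter.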

    Please also boot the system normally and attach the output of "cat /proc/acpi/thermal_zone/THM0/*".
    Thanks.
    
Comment 25 Len Brown 2008-06-13 19:41:00 UTC
> This laptop's LCD inverter died in February.

Please re-open if this laptop becomes available for debugging.
