Subject : ACPI or radeon: spontaneous reboot regression
Submitter : Matt Mackall <email@example.com>
References : http://lkml.org/lkml/2007/12/22/139
Matt, would you please provide:
Created attachment 14191
Created attachment 14192
Looks like this patch is to blame:
ACPI: EC: Fix "no battery" regression
Reverting it against tip makes the reboots go away.
The original bug: http://bugzilla.kernel.org/show_bug.cgi?id=8886
Please attach your acpidump output.
Is the attachment in comment #3 not what you wanted?
Reversing the patch you've mentioned is like disabling interrupts -- sure it will 'cure' the broken interrupt handler by not calling it... By reverting the patch you disable notifications from embedded controller to all ACPI drivers -- thermal, battery and AC adapter at least.
Could you please try to run the latest -rc kernel and disable modules mentioned above one by one?
So 2.6.23-rc3 and before worked fine,
but 2.6.23-rc4 and later (eg 2.6.23, 2.6.24-rc6) all fail this way?
Yes, bisection shows kernels shortly after 23-rc3 failing. Will try to fiddle with thermal, battery, and AC settings tomorrow.
Ok, a few tomorrows later.. I can't reproduce it after disabling ACPI_THERMAL, haven't tried the others.
Please re-enable ACPI_THERMAL in the build,
and verify that boot-time thermal.off=1
is sufficient to prevent the reboot.
if yes, does thermal.nocrt=1 also stop the reboot
and result in any additional dmesg upon the AC plug event?
please paste the contents of /proc/acpi/thermal_zone/*/* here
to eliminate any user-space interactions, please see
if the reboot occurs when you do AC plug events when
booted only up to single user mode.
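The request for the contents of /proc/acpi/thermal_zone/*/* is easier to read back if each file is labelled with its path. A minimal sketch, assuming a POSIX shell; the function name and the optional directory argument are illustrative and not part of any ACPI tooling:

```shell
# Print every regular file two levels below a thermal zone tree,
# each preceded by a header line giving its path.
# $1: base directory (defaults to the path named in the request above)
dump_thermal_zones() {
    base="${1:-/proc/acpi/thermal_zone}"
    for f in "$base"/*/*; do
        [ -f "$f" ] || continue      # skip unmatched globs and directories
        printf '== %s ==\n' "$f"
        cat "$f"
    done
}
```

Running `dump_thermal_zones > thermal.txt` would give a single file suitable for pasting into the report.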
Will do. I've noticed another related problem: recent kernels usually reboot immediately upon resume from s2ram. Disabling thermal doesn't help.
Ok, ACPI_THERMAL was not the problem. With ACPI_THERMAL disabled in config, pulling the power would cause the system to reboot after about 4 seconds. This was missed by my plug/unplug tests.
The problem appears to be with ACPI_PROCESSOR. Disabling all ACPI options but ACPI_PROCESSOR exhibits the problem, enabling all but ACPI_PROCESSOR doesn't.
A couple more observations:
In 2.6.23-rc3, Gnome battery charge monitor properly displayed the state of the AC adapter when plugging/unplugging. 2.6.24-rc7 doesn't notice changes most of the time.
Also, at some point along the road, my Fn-F5 to toggle Bluetooth stopped working.
It sounds like just about everything is breaking on this box.
Maybe something low level, such as the EC, is toasted.
Can you exclude any ibm or thinkpad drivers from the config?
Are ACPI events working? (kill acpid, cat /proc/acpi/event,
and press the power button a few times should show event strings)
Re: processor driver is related to the failure
same with "idle=poll"? That will disable the C-state stuff.
The other part is cpufreq use of the processor driver,
which you can disable via CONFIG_CPU_FREQ=n
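When toggling options such as CONFIG_CPU_FREQ or CONFIG_ACPI_PROCESSOR, it can help to confirm what the running kernel was actually built with. A small sketch, assuming the distribution installs the build config as /boot/config-<release> (a common convention, not guaranteed); the helper name is hypothetical:

```shell
# Return 0 if a config option is set to y or m in a kernel config file.
# $1: option name, e.g. CONFIG_CPU_FREQ
# $2: config file (default assumes the /boot/config-<release> convention)
config_enabled() {
    opt="$1"
    cfg="${2:-/boot/config-$(uname -r)}"
    grep -Eq "^${opt}=(y|m)$" "$cfg"
}
```

For example, `config_enabled CONFIG_CPU_FREQ || echo "cpufreq is off"` reports whether the option was really disabled in the kernel under test.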
We can rule out hardware issues as switching back to 2.6.23-rc3 eliminates the problem. This is a purely software regression.
Removing Thinkpad ACPI bits fixes the Bluetooth button, so I think that's an unrelated issue.
idle=poll and CONFIG_CPU_FREQ=n don't help. Turning off CONFIG_ACPI_PROCESSOR seems to basically be disabling my cpufreq setup anyway as it seems to depend on CONFIG_X86_ACPI_CPUFREQ to actually do anything.
Any further thoughts, guys? This is a big impediment to me hacking on new kernels.
Well, I'd run a bisection at this point ...
I did that a month ago and arrived at the patch mentioned in #4.
Does reverting it also fix the s2ram issue?
Ignore the s2ram issue. Message #10 was wrong, so the observation about s2ram was still with a broken ACPI system.
With the patch reverted, the machine is unresponsive to ACPI events. So it's not a real fix. The good kernel responds to ACPI events, properly suspends and resumes, and cpufreqd works. The bad kernel reboots on ACPI events. My test was for rebooting on ACPI events so when it hit the patch that re-enabled them, it appeared to be the culprit.
Perhaps I need to bisect from a different direction.
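The bisection went astray because "no reboot" was treated as "good" even on kernels where ACPI events never arrived. One way to avoid that is to judge each kernel on two observations and skip the ones that cannot be tested. The sketch below uses the exit-status convention of `git bisect run` (0 = good, 125 = skip, other low values = bad); the function name and its yes/no arguments are illustrative. For a manual kernel bisection, the equivalent of status 125 is `git bisect skip`.

```shell
# Map two observations about a test kernel to a bisect verdict.
# $1: "yes" if ACPI events are delivered (e.g. strings show up in /proc/acpi/event)
# $2: "yes" if the machine rebooted during the AC plug/unplug test
bisect_verdict() {
    events_work="$1"
    rebooted="$2"
    if [ "$events_work" != "yes" ]; then
        return 125   # events never fire, so the reboot test proves nothing: skip
    fi
    if [ "$rebooted" = "yes" ]; then
        return 1     # bad: events arrive and trigger the reboot
    fi
    return 0         # good: events arrive and the machine stays up
}
```

With this split, a kernel carrying the revert would be skipped instead of being counted as good.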
Since there is a delay of about 4 seconds before the reboot after you unplug AC, we can get some info about what the ACPI side is doing.
Can you enable the ACPI debug option (echo 0xffffffff > /sys/module/acpi/parameters/debug_level)? Then, just after AC is unplugged, type 'dmesg > logfile; sync' and attach the logfile here.
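The debug request can be wrapped in a small helper so the mask is written with echo-style output and a failed write is noticed. A sketch, assuming the parameter is exposed at /sys/module/acpi/parameters/debug_level and the kernel was built with ACPI debug output; the function name and the overridable path argument are illustrative:

```shell
# Write an ACPI debug mask to the acpi module parameter file.
# $1: mask (defaults to 0xffffffff, i.e. everything)
# $2: parameter file (default is an assumption about the sysfs layout)
set_acpi_debug_level() {
    mask="${1:-0xffffffff}"
    param="${2:-/sys/module/acpi/parameters/debug_level}"
    if [ ! -w "$param" ]; then
        echo "cannot write $param (not root, or ACPI debug not available)" >&2
        return 1
    fi
    printf '%s\n' "$mask" > "$param"
}

# Intended sequence, as root:
#   set_acpi_debug_level          # before unplugging AC
#   dmesg > logfile; sync         # immediately after unplugging AC
```

The sync matters here because the machine is expected to reboot within seconds, before the log would otherwise reach disk.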
This laptop's LCD inverter died in February.
Can you still run the test requested in comment #22 on your laptop?
If you can, please also try the following test.
a. boot the system with thermal.nocrt=1
b. kill the process that is using /proc/acpi/event (the PID can be obtained with "lsof /proc/acpi/event")
c. plug/unplug the AC adapter and see whether the system will be rebooted.
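Before running steps b and c, it is worth confirming that the boot parameter from step a actually took effect. A minimal sketch, assuming the kernel command line is readable at /proc/cmdline; the helper name and the overridable file argument are hypothetical:

```shell
# Return 0 if a parameter appears as a whole word on the kernel command line.
# $1: parameter, e.g. thermal.nocrt=1
# $2: command-line file (defaults to /proc/cmdline)
has_boot_param() {
    want="$1"
    cmdline="${2:-/proc/cmdline}"
    # One word per line, then match the whole line as a fixed string.
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$want"
}

# Step a check:  has_boot_param thermal.nocrt=1 || echo "thermal.nocrt=1 not set"
# Step b:        lsof /proc/acpi/event   # shows the PID of the reader to kill
```

Matching whole words avoids a false positive from a different value such as thermal.nocrt=0.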
Please boot the system normally and attach the output of "cat /proc/acpi/thermal_zone/THM0/*"
> This laptop's LCD inverter died in February.
Please re-open if this laptop becomes available for debugging.