Bug 13155

Summary: random freeze when EC interrupts are enabled
Product: ACPI
Reporter: Vitus Jensen (vjensen)
Component: EC
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED INSUFFICIENT_DATA
Severity: normal
CC: lenb, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30-rc3
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
    bootlog from unmodified 2.6.29-gentoo-r1
    patch to prohibit EC interrupts
    2.6.30-gentoo-r5, powertop output
    2.6.30-gentoo-r5, dmesg
    2 dumps of /proc/interrupts, 10 seconds interval
    linux-acpi, 2.6.33-rc2, dmesg
    hack GPEs away
    A port of the Embedded Controller driver v2.0 to 2.6.36

Description Vitus Jensen 2009-04-23 08:33:21 UTC
Latest working kernel version: 2.6.23-gentoo-r6
Earliest failing kernel version: 2.6.27
Hardware Environment: Thinkpad R51e, 1843-6NG, Pentium-M 1.73
Software Environment: gentoo stable

Since the ec_intr=0 option was removed, EC interrupt mode freezes the system at random points, usually after 1-5 minutes of work and usually with the hard disk LED on.  The freeze can be avoided by not using the EC (booting without acpi=rsdt) or by hacking the EC interrupt away.  The console shows no GPE storm messages (even with ACPI_EC_STORM_THRESHOLD = 1), so the automatic storm detection does not catch this situation.
Comment 1 Vitus Jensen 2009-04-23 08:40:53 UTC
Created attachment 21091
bootlog from unmodified 2.6.29-gentoo-r1

bootlog showing EC interrupts getting enabled during boot
Comment 2 Vitus Jensen 2009-04-23 08:55:48 UTC
Created attachment 21092
patch to prohibit EC interrupts

A minimal patch against 2.6.30-rc3 to remove the freezes.  Instead of disabling EC interrupts it just sets EC_FLAGS_GPE_STORM.
Comment 3 Alexey Starikovskiy 2009-04-27 07:58:20 UTC
Please check whether the last patch from #12949 changes the situation.
Comment 4 Vitus Jensen 2009-05-03 20:16:57 UTC
I tried to apply and edit http://bugzilla.kernel.org/attachment.cgi?id=21105 to 2.6.29-rc3 but failed.  Using 2.6.30-rc3 as base I just had to remove 2 set_flags() lines and got a bootable kernel but it still shows freezes after some time.

The patched kernel shows some 2000 occurrences of acpi interrupts in /proc/interrupts, but I'm in doubt whether to report more details because the patch wasn't applied cleanly.  Please advise.
Comment 5 Len Brown 2009-08-13 03:15:04 UTC
closed due to inactivity for 3 months
please re-open if this is still a problem in 2.6.30.stable
Comment 6 Vitus Jensen 2009-09-01 06:25:48 UTC
Created attachment 22945
2.6.30-gentoo-r5, powertop output

Running 2.6.30-gentoo-r5 the machine no longer freezes, at least not during the last 30 minutes of browsing and kernel compiling.  But it produces ~80000 ints/s, which prevents any power saving.

ACPI events (AC on/off), sleep button and suspend/resume all work.
Comment 7 Vitus Jensen 2009-09-01 06:26:46 UTC
Created attachment 22946
2.6.30-gentoo-r5, dmesg
Comment 8 Vitus Jensen 2009-09-02 09:16:22 UTC
Created attachment 22974
2 dumps of /proc/interrupts, 10 seconds interval

OK, so it's not 80000 ints/s but 80000 wakeups per second.  In 2.6.27 (wireless-testing plus ec_intr patch) it's about 40/s.
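
The comparison of two /proc/interrupts dumps taken a known interval apart can be scripted. The following is only a minimal sketch: the function names are hypothetical, and the sample dumps are invented for illustration and are not the contents of the attachment. It assumes the usual layout of /proc/interrupts (a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description).

```python
# Sketch: compare two saved /proc/interrupts dumps and print per-IRQ rates.
# The sample dumps below are invented for illustration; they are NOT the
# data from the attachment.


def parse_interrupts(text):
    """Return {label: (total_count, description)} for one dump."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header row with CPU names, or a blank line
        label, rest = line.split(":", 1)
        fields = rest.split()
        total = 0
        i = 0
        # Leading numeric fields are the per-CPU counters; sum them.
        while i < len(fields) and fields[i].isdigit():
            total += int(fields[i])
            i += 1
        result[label.strip()] = (total, " ".join(fields[i:]))
    return result


def interrupt_rates(before, after, seconds):
    """Return {label: interrupts per second} for IRQs present in both dumps."""
    first = parse_interrupts(before)
    second = parse_interrupts(after)
    return {
        label: (second[label][0] - first[label][0]) / seconds
        for label in first
        if label in second
    }


SAMPLE_BEFORE = """\
           CPU0
  0:     100000   IO-APIC-edge      timer
  9:       2000   IO-APIC-fasteoi   acpi
"""

SAMPLE_AFTER = """\
           CPU0
  0:     101200   IO-APIC-edge      timer
  9:       2000   IO-APIC-fasteoi   acpi
"""

if __name__ == "__main__":
    for label, rate in interrupt_rates(SAMPLE_BEFORE, SAMPLE_AFTER, 10).items():
        print(f"{label:>4}: {rate:8.1f} /s")
```

A rate of zero on the acpi line over the interval would mean the EC raised no interrupts in that window. This measures interrupts only; it says nothing about the wakeup figure reported by powertop.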
Comment 9 Vitus Jensen 2009-09-02 09:19:02 UTC
Tried commit 8aeb0a352af7eb26863e53c203eeb852fd4590c3 from the acpi-test branch at kernel.org but this shows the very same picture: acpi events work, no freezes so far but around 80000 wakeups/s.

Is this still related to the freezes?  Or should I create a new bugzilla entry?
Comment 10 Alexey Starikovskiy 2009-09-02 10:40:11 UTC
Vitus,
From comment #8, you don't have any acpi interrupts during the period, and overall 1200 interrupts is quite low.
Comment 11 Vitus Jensen 2009-10-01 05:12:55 UTC
Updating git repository to 2.6.31acpi-ge56d953 fixed the wakeups, only ~50/s now (20% acpi interrupts).  There are some issues with that kernel but EC isn't one of them.  So close this bug.
Comment 12 Vitus Jensen 2010-01-01 13:11:03 UTC
Because I needed a current kernel I tried current gentoo's 2.6.32-gentoo-r1 and sadly found the same freezes as always.  Retried it with tag "v2.6.32" from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git and 2.6.33-rc2 from linux-acpi-2.6 (commit 1201b2a9bec0413188ada1443ece1a52da6dbff4) and the thinkpad froze after seconds (boot to console, emerge -1 some-ebuild) or hours (rsync to webdav).  As I can't reproduce the freeze at will I assume that it was never fixed.  Newer kernels don't reach C3 or generate a lot of wakeups so for real work I'm staying at my hacked 2.6.27.

I was happy to catch the freeze both on linux-2.6.git and linux-acpi-2.6 with DEBUG enabled in ec.c and while watching the debug output.  There is nothing special to notice :-(  The last line may be "ACPI: EC: transaction end" or some message from cron or my sudo shell.  Then all of a sudden the hard disk LED stays on, the keyboard doesn't react anymore, dead.

My next step would be to hack the EC interrupts away.  But as I can't use those kernels for work anyway is there something easier to try?  Are you at all interested in the non-interrupt way?
Comment 13 Vitus Jensen 2010-01-01 13:13:45 UTC
Created attachment 24396
linux-acpi, 2.6.33-rc2, dmesg

output from 2.6.33-rc2.  DEBUG enabled in drivers/acpi/ec.c
Comment 14 Zhang Rui 2010-10-22 03:04:54 UTC
does the problem still exist in the latest upstream kernel? say 2.6.35 or 2.6.36-rc
Comment 15 Vitus Jensen 2010-10-22 05:42:12 UTC
(In reply to comment #14)
> does the problem still exist in the latest upstream kernel? say 2.6.35 or
> 2.6.36-rc

I pretty much gave up on this thinkpad and stayed at my patched 2.6.27-wireless-testing kernel.

There is an opportunity to test other kernels tomorrow at this time.  Is there any easy hack to disable EC interrupts in 2.6.35?  Just in case the hangs are still there, to validate that EC intr is the culprit?
Comment 16 Vitus Jensen 2010-10-23 21:07:03 UTC
Testing tag 2.6.36 from kernel.org: no freezes in 3 hours work (compile, browse, suspend to RAM).  Seems fixed.

This kernel only seldom reaches C3, so for real work I will continue on 2.6.27 but the freeze is gone :-)
Comment 17 Zhang Rui 2010-10-25 00:15:18 UTC
good news. As Alexey is no longer working on the ACPI EC driver, I'll close it as CODE_FIXED. Please feel free to re-open it if you can reproduce the problem in the latest git kernel.
Comment 18 Vitus Jensen 2010-10-30 13:12:13 UTC
OK, I reopen it.  When using the machine the next time I had the usual freeze while hacking away in emacs.  No power connected, no wlan, resumed and edited for around 1 hour: harddisk LED on, total freeze.

To try something I hacked GPEs away as in the simple patch attached (not much time, no internet).  I still have no idea what triggers the freeze so I compiled kernels for several hours: no freeze.  Ported ec.c from v2.6.27 (Embedded Controller Driver v2.0) to 2.6.36 today and did some surfing, always with the simple patch: no freeze.  I'm now using Embedded Controller Driver v2.0 and will continue to do so unless you advise differently.
Comment 19 Vitus Jensen 2010-10-30 13:15:05 UTC
Created attachment 35512
hack GPEs away

This is just to remove GPE from 2.6.36.  It triggers double suspends on the R51e so it's probably not a great idea :D
Comment 20 Vitus Jensen 2011-01-26 22:20:01 UTC
Created attachment 45222
A port of the Embedded Controller driver v2.0 to 2.6.36

This is the stable solution to the freeze problem as mentioned in comment #18: a port of the Embedded Controller Driver (ec.c) from 2.6.27 with an additional hack to disable the EC interrupt after its first occurrence, as ec_intr=0 wasn't available in that version (see #if 0 part).

I have been running 2.6.36 with this patch (and CONFIG_HZ_100=y) since 30 October 2010.  Battery rundowns, switching from AC to battery and vice versa, suspend to RAM etc.  Very stable, not a single freeze or anything unusual :-)
Comment 21 Zhang Rui 2011-03-21 07:38:55 UTC
does this problem still exist in the latest upstream kernel?
Comment 22 Vitus Jensen 2011-03-30 04:46:36 UTC
Are you referring to the ACPICA changes merged into v2.6.38?  I will try that version, either tomorrow or Saturday.
Comment 23 Vitus Jensen 2011-04-03 13:50:39 UTC
Updated the 2.6.36 configuration to 2.6.38, installed kernel and modules, rebooted.

My machine automatically boots only into a text system, and instead of starting X11 I tried to re-install the thinkpad modules.  But "emerge -1 tp_smapi" froze the machine while still scanning dependencies: harddisk LED on, no reaction to keyboard.  As usual.  So yes, the problem still exists.

Is there again a possibility for users to disable EC interrupts?  Or something else I could try?
Comment 24 Zhang Rui 2012-01-18 01:46:36 UTC
It's great that kernel bugzilla is back.

can you please verify if the problem still exists in the latest upstream kernel?
if the problem still exists, can you please re-describe the symptom in the latest kernel?
Comment 25 Zhang Rui 2012-05-24 07:18:19 UTC
bug closed as there is no response from the bug reporter.
Comment 26 Vitus Jensen 2012-11-11 01:31:42 UTC
The last kernel I used was 3.4 with halt=mwait parameter.  This combination did not require any change on the EC driver.

I sold the laptop now because of wlan problems, high power consumption and not really needing it.