Bug 32862 - acer_wmi partially crashes ACPI/EC (Aspire 8930G)
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Platform_x86
Hardware: All Linux
Importance: P1 high
Assigned To: Carlos Corbacho
Depends on:
Blocks: 27352
Reported: 2011-04-07 17:44 UTC by Hector Martin
Modified: 2011-05-30 08:36 UTC
CC List: 7 users

See Also:
Kernel Version: 2.6.38-gentoo-r1
Tree: Mainline
Regression: Yes


Attachments
acpidump (182.64 KB, application/octet-stream)
2011-04-13 07:36 UTC, Hector Martin
acer-wmi-remove-commandline-init.patch (419 bytes, patch)
2011-04-14 04:32 UTC, Lee, Chun-Yi
acer-wmi-set-u32-when-init.patch (715 bytes, patch)
2011-04-15 06:39 UTC, Lee, Chun-Yi
0001-acer-wmi-does-not-allow-negative-number-set-to-init.patch (1.46 KB, patch)
2011-04-16 04:05 UTC, Lee, Chun-Yi
dmidecode log (10.86 KB, text/plain)
2011-05-01 17:42 UTC, Hector Martin
0001-acer-wmi-check-the-existence-of-internal-3G-device.patch (1.21 KB, patch)
2011-05-04 03:54 UTC, Lee, Chun-Yi

Description Hector Martin 2011-04-07 17:44:53 UTC
laptop model: Acer Aspire 8930G

When the acer-wmi module is loaded, parts of ACPI go haywire. The immediate symptom is that backlight control stops working, logging messages like the following:
kernel: ACPI Error: Method parse/execution failed [\_SB_.PCI0.PEGP.EVGA.LCD_._BCM] (Node ffff88013fc2b820), AE_AML_INFINITE_LOOP (20110112/psparse-536)
kernel: ACPI Error: Evaluating _BCM failed (20110112/video-365)
ACPI: Failed to switch the brightness

I suspect it's waiting for some kind of state from the EC which never arrives.

Not loading acer_wmi solves the issue and brightness control works since that uses the ACPI video driver anyway (however, this might have other negative consequences regarding e.g. rfkill as others have mentioned before).

However, the issue is more severe than this. After shutting down and restarting the computer, or rebooting, the machine hangs during POST with no video and no backlight. This persists even if more reboots are attempted. The only way to clear this state is to disconnect AC power and remove the battery. This is why I suspect the EC is hanging.

acer-wmi on 2.6.37-gentoo didn't have these issues.
Comment 1 Len Brown 2011-04-12 01:23:44 UTC
Can you bisect which commit to the acer_wmi driver caused this regression?
Comment 2 Lee, Chun-Yi 2011-04-13 01:57:54 UTC
Hi Martin, 

Could you please attach the acpidump to this bug? Just run:
  acpidump > acpidump.dat

And, 
Could you please try enabling "EC raw mode" when you load the acer-wmi driver? Just run:
 - modprobe acer-wmi ec_raw_mode=1
OR
 - add the following line to a file under /etc/modprobe.d:
   options acer-wmi ec_raw_mode=1

Please enable EC raw mode and check whether the behaviour improves.
Comment 3 Hector Martin 2011-04-13 07:35:37 UTC
I've tried ec_raw_mode=1 and it doesn't help. In fact, it seems to do nothing, since I get this message:
acer-wmi: No WMID EC raw mode enable method

I'm not sure I'll be able to get a proper chance to bisect the kernel this week; that might have to wait until this weekend or so.

Attaching acpidump.
Comment 4 Hector Martin 2011-04-13 07:36:10 UTC
Created attachment 54232 [details]
acpidump
Comment 5 Lee, Chun-Yi 2011-04-13 08:02:00 UTC
(In reply to comment #3)
> I've tried ec_raw_mode=1 and it doesn't help. In fact, it seems to do nothing,
> since I get this message:
> acer-wmi: No WMID EC raw mode enable method
> 

hmm....
That means this issue is not caused by the launch manager mode that I added to acer-wmi in the 2.6.38 kernel.

> I'm not sure I'll be able to get a proper chance to bisect the kernel this
> week; that might have to wait until this weekend or so.
> 
> Attaching acpidump.

Thanks for your acpidump file. I will look at it and try to find the root cause.
Comment 6 Lee, Chun-Yi 2011-04-13 08:32:44 UTC
Hi Martin, 

Could you please attach the dmesg output or messages log from when you load the acer-wmi driver?
Did you see the following message in the log?
       Brightness must be controlled by generic video driver
Comment 7 Hector Martin 2011-04-13 08:53:24 UTC
Yes, I get:

acer-wmi: Acer Laptop ACPI-WMI Extras
acer-wmi: Brightness must be controlled by generic video driver

Brightness has been handled by the generic driver for a while, even with acer-wmi loaded (with 2.6.37 too). However, somehow acer-wmi does something that confuses the EC and breaks both BIOS POST on subsequent bootups and the ACPI video brightness handling. So it's probably not related to brightness support in acer-wmi, but rather something else that it does that triggers the issue.
Comment 8 Lee, Chun-Yi 2011-04-13 09:43:19 UTC
There is an infinite loop in _BCM whenever ECMC is non-zero. I am tracing why ECMC isn't being set back to 0:

                        Method (_BCM, 1, NotSerialized)
                        {  
                            P8XH (Zero, 0xF1)
                            Store (Zero, BPRS)
                            Sleep (0x0A)
                            Name (BBCM, Package (0x0A)
                            {  
                                Zero,
                                One,
                                0x02,
                                0x03,
                                0x04,
                                0x05,
                                0x06,
                                0x07,
                                0x08,
                                0x09
                            })
                            P8XH (Zero, 0xF2)
                            Divide (Arg0, 0x0A, Local0, Local2)
                            Store (Subtract (Local2, One), Local3)
                            Store (DerefOf (Index (BBCM, Local3)), Local4)
                            P8XH (Zero, 0xF3)
                            While (ECMC) {}	/* infinite loop */
                            P8XH (Zero, 0xF4)
                            Store (Local4, DAT0)
                            Sleep (0x0A)
                            Store (0x4D, ECMC)
                        }
Comment 9 Lee, Chun-Yi 2011-04-14 04:32:37 UTC
Created attachment 54352 [details]
acer-wmi-remove-commandline-init.patch

Hi Martin, 

Could you please try this test patch? It removes acer-wmi's attempt to sync the command-line device states to the EC during initialization.

I traced the DSDT and the acer-wmi init function: when acer-wmi tries to change a device state by evaluating the WMBA method, the method requests ECW1 to write the data and also sets the ECMC SystemIO register to non-zero.

The odd thing is that something must set ECMC back to zero. I think the EC code/driver is supposed to reset ECMC to 0, but it didn't.

And,
This issue didn't happen before the 2.6.38 kernel. I guess that's because Matthew fixed the capitalisation of WMID_GUID1 in 2.6.38, so the acer-wmi driver started making REAL WMI calls to the DSDT method and fell into the trap.
Comment 10 Lee, Chun-Yi 2011-04-15 06:39:56 UTC
Created attachment 54432 [details]
acer-wmi-set-u32-when-init.patch

A new test patch.
I found that set_block also modifies device state, so I commented it out for this test.
Comment 11 Hector Martin 2011-04-15 15:35:11 UTC
I've tested the second patch. The problem doesn't occur now, brightness control continues to work after loading acer-wmi.
Comment 12 Lee, Chun-Yi 2011-04-16 04:05:55 UTC
Created attachment 54492 [details]
0001-acer-wmi-does-not-allow-negative-number-set-to-init.patch

Martin, 

Thanks for testing. That confirms this issue was triggered by Matthew's fix to the capitalisation of WMID_GUID1.

After tracing the DSDT and the acer-wmi init function, I found that acer-wmi feeds a negative value to the EC through the WMI method, and the AML code in the DSDT does not appear to check for it.
So I have submitted this patch upstream to add the check to the acer-wmi driver.

Sorry, I don't have an Aspire 8930G to test on.
Please try this patch; of course, revert my earlier test patch first.

I think this patch should fix the issue. If NOT, that means the EC code/driver doesn't reset the ECMC flag to 0 after the EC consumes the value from WMI, and we will need help from an EC expert to trace this issue further.
Comment 13 Hector Martin 2011-04-20 14:06:18 UTC
Tested the patch, I can confirm it solves the issue. Thanks!
Comment 14 Lee, Chun-Yi 2011-04-21 03:57:33 UTC
(In reply to comment #13)
> Tested the patch, I can confirm it solves the issue. Thanks!

Thanks. I will also send this patch to the 2.6.38 stable tree after Matthew accepts it.

By the way, 
The opensource Kinect driver is really cool. :-)
Comment 15 Rafael J. Wysocki 2011-04-30 20:06:34 UTC
Handled-By : Lee, Chun-Yi <jlee@novell.com>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=54492
Comment 16 Hector Martin 2011-05-01 10:47:02 UTC
Hmm, actually, I just spontaneously experienced the bug again, with the patch. It turns out that not supplying the -1 values on initialization only fixed the most obvious case (that was making the EC lock up *always* as soon as acer-wmi is loaded).

I did some testing and it turns out that making any changes to the threeg file or rfkill3 (which is acer-threeg) will trigger the problem, so in fact the problem was making the threeg call on boot at all, not the -1 value in particular. It's worth noting that this laptop doesn't have a 3G modem at all. So I guess something decided to set the rfkills and that triggered the bug again.
Comment 17 Rafael J. Wysocki 2011-05-01 11:23:56 UTC
Ignore-Patch : https://bugzilla.kernel.org/attachment.cgi?id=54492
Comment 18 Lee, Chun-Yi 2011-05-01 16:55:34 UTC
(In reply to comment #16)
> Hmm, actually, I just spontaneously experienced the bug again, with the patch.
> It turns out that not supplying the -1 values on initialization only fixed the
> most obvious case (that was making the EC lock up *always* as soon as acer-wmi
> is loaded).
> 

I think the "not-allow-negative-number" patch should still be included in the next acer-wmi; it prevents the acer-wmi driver from sending bad numbers to the BIOS/EC.

> I did some testing and it turns out that making any changes to the threeg file
> or rfkill3 (which is acer-threeg) will trigger the problem, so in fact the
> problem was making the threeg call on boot at all, not the -1 value in
> particular. It's worth noting that this laptop doesn't have a 3G modem at all.
> So I guess something decided to set the rfkills and that triggered the bug
> again.

Thanks for the information.

That means we need to find another way to detect whether a 3G modem is present. Unfortunately, the only method I know is checking for "type aa" in the DMI table, which I already implemented in acer-wmi, and it looks like the Acer 8930G has no "type aa" entry in its DMI.

I assume your machine has Wi-Fi and Bluetooth but no internal 3G module.
Could you please attach the dmidecode output to this bug?
 - dmidecode > dmidecode.log

If you have time, I would appreciate it if you could also:
 - Remove the Wi-Fi module, then run "dmidecode > dmidecode_without_wifi.log"
 - Remove the Bluetooth module, then run "dmidecode > dmidecode_without_bt.log"

Then we can compare the dmidecode logs above. If we are LUCKY, we may find another OEM-specific type, like "type aa", that maps to the device presence state.

Of course, we still don't know which bit represents the 3G modem. The original acer-wmi design enables the 3G capability by default; I think we need to disable it until we find a way to detect whether an internal 3G device exists.
Comment 19 Hector Martin 2011-05-01 17:42:44 UTC
Created attachment 56092 [details]
dmidecode log

Attached is the dmidecode. I can't remove the Bluetooth module without taking apart the entire laptop, which I'd rather avoid. I did test without the WLAN module, but there was no change in the dmidecode output.
Comment 20 Lee, Chun-Yi 2011-05-02 07:24:42 UTC
hmm.... 

Thanks for Martin's test and log.
It's bad news that the DMI information didn't change after removing the WLAN module.

I am looking for another way to detect the presence of the internal 3G module. My plan is:
 - check the WMID2 result again on my Acer TravelMate 8572.
 - poke the Acer people in Taiwan, hoping they can share some information with us.
Comment 21 Lee, Chun-Yi 2011-05-04 03:54:44 UTC
Created attachment 56422 [details]
0001-acer-wmi-check-the-existence-of-internal-3G-device.patch

OK, 
This patch works on my Acer TravelMate 8572 to detect whether the internal 3G device exists.

Martin, 
Could you please test it on your Aspire 8930G?

Thanks
Comment 22 Lee, Chun-Yi 2011-05-04 03:55:45 UTC
Also, we still need the first patch: 0001-acer-wmi-does-not-allow-negative-number-set-to-init.patch

Please don't remove it.
Comment 23 Hector Martin 2011-05-14 22:56:21 UTC
Sorry for taking a while to test, I've been pretty busy for the past few days.

I've tested the latest patch and I can confirm that it correctly disables the 3G functionality on my laptop (I can't test the case where 3G is present since I don't have that). This was applied on top of 0001-acer-wmi-does-not-allow-negative-number-set-to-init.patch since, as Lee says, that's a good idea anyway.
Comment 24 Lee, Chun-Yi 2011-05-19 07:05:55 UTC
(In reply to comment #23)
> Sorry for taking a while to test, I've been pretty busy for the past few days.
> 
> I've tested the latest patch and I can confirm that it correctly disables the
> 3G functionality on my laptop (I can't test the case where 3G is present since
> I don't have that). This was applied on top of
> 0001-acer-wmi-does-not-allow-negative-number-set-to-init.patch since, as Lee
> says, that's a good idea anyway.

Thanks for testing and confirming that the 2 patches work fine on your machine. I will submit both patches to Matthew.
Comment 26 Florian Mickler 2011-05-23 17:20:24 UTC
First line should read: 
Patch: https://bugzilla.kernel.org/attachment.cgi?id=54492
Comment 27 Florian Mickler 2011-05-30 07:54:50 UTC
A patch referencing this bug report has been merged in v3.0-rc1:

commit a8d1a266eee5f8b822449fe19d1735189377ef47
Author: Lee, Chun-Yi <joeyli.kernel@gmail.com>
Date:   Sun May 22 07:33:52 2011 +0800

    acer-wmi: check the existence of internal 3G device when set capability
Comment 28 Florian Mickler 2011-05-30 08:36:41 UTC
The other patch is also merged: 

commit c2647b5e99c8ff1b3f535c7c84564cdc53214edf
Author: Lee, Chun-Yi <joeyli.kernel@gmail.com>
Date:   Fri Apr 15 18:42:47 2011 +0800

    acer-wmi: does not allow negative number set to initial device state
