Bug 78421

Summary: Kernel oops in hp_wmi_get_hw_state on HP EliteBook 6930p
Product: ACPI
Reporter: Thomas Richter (thor)
Component: ACPICA-Core
Assignee: Lv Zheng (lv.zheng)
Status: CLOSED UNREPRODUCIBLE
Severity: blocking
CC: Robert.Moore, rui.zhang, tianyu.lan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.12.20
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Kernel log from successful boot
[PATCH] ACPICA: Dispatcher: Fix wrong decoding of access_field.
DSDT extracted
The minimized the DSDT
The code I used to build the simulation environment
The hand-cut ASL used to trigger issues
Updated simulator code

Description Thomas Richter 2014-06-19 18:53:29 UTC
Hi folks,

booting an HP EliteBook 6930p with a 3.12.20 kernel results in a kernel oops once in a while. Unfortunately, the oops does not show up in /var/log/messages, so I can only report what I wrote down from the screen by hand.

Apparently, the problem is a double divide-by-zero error generated from within acpi_ex_insert_into_field+0xc7, called from acpi_ex_write_data_to_field.

From the kernel sources, I suspect that this division by zero is due to the ACPI_ROUND_UP_TO() macro, line 919ff in exfldio.c. It seems that access_bit_width is zero when this happens.
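
A minimal sketch of why a zero access_bit_width would fault, assuming ACPI_ROUND_UP_TO() behaves like an ordinary round-up integer division. The ROUND_UP_TO macro below is a simplified stand-in, and round_up_to_checked() is a hypothetical guard used only for illustration; it is neither ACPICA code nor a proposed fix.

```c
#include <stdint.h>

/* Round-up division: how many `boundary`-sized units are needed to
 * hold `value`. A boundary of 0 divides by zero, which is the
 * suspected source of the fault. */
#define ROUND_UP_TO(value, boundary) (((value) + ((boundary) - 1)) / (boundary))

/* Hypothetical guarded variant: when the boundary (access width) is 0,
 * it returns 0 and sets *ok to 0 instead of dividing. */
static uint32_t round_up_to_checked(uint32_t value, uint32_t boundary, int *ok)
{
    if (boundary == 0) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return ROUND_UP_TO(value, boundary);
}
```

Such a guard would only hide the symptom; it says nothing about why the width ended up as zero.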

Unfortunately, there is no kernel log at this time. I'm attaching a log from a successful boot (nothing changed, it sometimes works and sometimes crashes!).

Greetings,

Thomas
Comment 1 Thomas Richter 2014-06-19 18:54:21 UTC
Created attachment 140451 [details]
Kernel log from successful boot
Comment 2 Lv Zheng 2014-06-23 05:39:20 UTC
(In reply to Thomas Richter from comment #0)
> From the kernel sources, I suspect that this division by zero is due to the
> ACPI_ROUND_UP_TO() macro, line 919ff in exfldio.c. It seems that
> access_bit_width is zero when this happens.

The "division by zero" is only the symptom; we need to find its root cause.
I'm not prepared to offer code that "fixes" an issue whose root cause is unknown, because such code is likely to end up breaking code that currently works.

Can you please tell me the last log entry before the panic?
That way I can tell which boot step caused this issue.
Please also upload the acpidump of your machine so that I can learn the use case.

Thanks in advance.
Comment 3 Thomas Richter 2014-06-23 06:24:42 UTC
Here is my manual crash dump:

acpi_ex_insert_into_field is called by acpi_ex_write_data_to_field, which is called by acpi_ds_init_object_from_op. Unfortunately, this is as far as my notes go.
Comment 4 Lv Zheng 2014-06-26 07:05:08 UTC
Created attachment 140981 [details]
[PATCH] ACPICA: Dispatcher: Fix wrong decoding of access_field.

There seems to be an issue in the current code.
Could you give it a try?
I don't have any DSDTs that encode N-byte-access Fields using the AccessField facility.

Also I still need the information:
1. acpidump,
2. the user actions that can trigger this error.
Comment 5 Thomas Richter 2014-06-30 16:09:33 UTC
Comment on attachment 140981 [details]
[PATCH] ACPICA: Dispatcher: Fix wrong decoding of access_field.

Thanks for the patch, I'll give it a try. Unfortunately, I have no idea how to reproduce the bug. It sometimes happens. The only "special" thing I did last time was that I shut down the machine with a muted audio output (thus, nothing special at all).

I'll send you the output of acpidump in a private mail, it's a bit too long.
Comment 6 Lv Zheng 2014-07-01 03:27:03 UTC
Created attachment 141631 [details]
DSDT extracted

Decompiling/compiling the DSDT in the acpidump breaks iasl.
Cc Bob to check this issue first.
Maybe the root cause is just hiding in this issue.
Or this is a known issue.
Comment 7 Thomas Richter 2014-07-21 16:45:21 UTC
More on this. I just had a kernel oops again, though a different one than last time. To remind you, this is an HP EliteBook 6930p running kernel 3.12.23, with the above patches applied. This time, I took a picture of the oops, so I can give a better report.

Here we go:

wq_worker_sleeping+0x8/0x90
__schedule+0x383/0x5b0
do_exit+0x672/0x990
oops_end+0x6d/0xa0
no_context+0x249/0x274
wake_up_klogd+0x2f/0x40
__do_page_fault+0xb2/0x4a0
printk+0x4f/0x54
acpi_os_printf+0x43/0x48
page_fault+0x22/0x30
acpi_ns_detach_object+0xe/0x62
acpi_ns_delete_namespace_subtree+0x3d/0x78
acpi_ds_terminate_control_method+0x7b/0x114
acpi_ps_parse_aml+0x14e/0x275
acpi_ps_execute_method+0x1bb/0x26b
acpi_ns_evaluate+0x1b9/0x249
acpi_evaluate_object+0x11d/0x22c
find_guid+0x4f/0x80 [wmi]
wmi_evaluate_method+0x133/0x140 [wmi]
acpi_ut_delete_internal_object_list+0x11/0x29
hp_wmi_perform_query+0xa4/0x190 [hp_wmi]
hp_wmi_get_hw_state+0x2b/0x50 [hp_wmi]
hp_wmi_notify+0xe1/0x2d0 [hp_wmi]
acpi_wmi_notify+0x59/0xc0 [wmi]
__acpi_os_execute+0x9f/0xd0
acpi_ev_notify_dispatch+0x35/0x4e
acpi_os_execute_deferred+0x1d/0x29
process_one_work+0x138/0x3c0
worker_thread+0x116/0x370
manage_workers.isra.29+0x280/0x280
kthread+0xb3/0xc0
kthread_create_on_node+0x110/0x110
ret_from_fork+0x7c/0xb0
kthread_create_on_node+0x110/0x110
Code: 00 00 00 00 65 48 8b 04 25 80 b8 00 00 4b 8b 80 9b 02 00 00 4b 8b 40  c8 48 c1 e8 02 83 e0 01 c3 0f 1f 40 00 48 8b 87 98 02 00 00 <48> 8b 40  d8 c3 0f 1f 40 00 48 83 ec ba 08 00 00 00 48 c7 44
Comment 8 Lv Zheng 2014-07-22 00:26:39 UTC
(In reply to Thomas Richter from comment #7)
> More on this. I just had a kernel oops again, though a different one than
> last time. To remind you, this is an HP EliteBook 6930p running kernel
> 3.12.23, with the above patches applied. This time, I took a picture of the
> oops, so I can give a better report.
> 
> Here we go:
> 
> wq_worker_sleeping+0x8/0x90
> __schedule+0x383/0x5b0
> do_exit+0x672/0x990
> oops_end+0x6d/0xa0
> no_context+0x249/0x274
> wake_up_klogd+0x2f/0x40
> __do_page_fault+0xb2/0x4a0
> printk+0x4f/0x54
> acpi_os_printf+0x43/0x48
> page_fault+0x22/0x30
> acpi_ns_detach_object+0xe/0x62
> acpi_ns_delete_namespace_subtree+0x3d/0x78

This is dangerous for current ACPICA. :-)
Seems to be a known bug.
I'll check your tables.

Thanks and best regards
-Lv

> acpi_ds_terminate_control_method+0x7b/0x114
> acpi_ps_parse_aml+0x14e/0x275
> acpi_ps_execute_method+0x1bb/0x26b
> acpi_ns_evaluate+0x1b9/0x249
> acpi_evaluate_object+0x11d/0x22c
> find_guid+0x4f/0x80 [wmi]
> wmi_evaluate_method+0x133/0x140 [wmi]
> acpi_ut_delete_internal_object_list+0x11/0x29
> hp_wmi_perform_query+0xa4/0x190 [hp_wmi]
> hp_wmi_get_hw_state+0x2b/0x50 [hp_wmi]
> hp_wmi_notify+0xe1/0x2d0 [hp_wmi]
> acpi_wmi_notify+0x59/0xc0 [wmi]
> __acpi_os_execute+0x9f/0xd0
> acpi_ev_notify_dispatch+0x35/0x4e
> acpi_os_execute_deferred+0x1d/0x29
> process_one_work+0x138/0x3c0
> worker_thread+0x116/0x370
> manage_workers.isra.29+0x280/0x280
> kthread+0xb3/0xc0
> kthread_create_on_node+0x110/0x110
> ret_from_fork+0x7c/0xb0
> kthread_create_on_node+0x110/0x110
> Code: 00 00 00 00 65 48 8b 04 25 80 b8 00 00 4b 8b 80 9b 02 00 00 4b 8b 40 
> c8 48 c1 e8 02 83 e0 01 c3 0f 1f 40 00 48 8b 87 98 02 00 00 <48> 8b 40  d8
> c3 0f 1f 40 00 48 83 ec ba 08 00 00 00 48 c7 44
Comment 9 Lv Zheng 2014-07-22 08:52:44 UTC
> This is dangerous for current ACPICA. :-)
> Seems to be known bug.
> I'll check your tables.

Sorry, this is not a known bug.
At the end of the WMI evaluation, ACPICA crashed.
I'm trying to figure out the control method name.
Comment 10 Lv Zheng 2014-09-11 06:40:37 UTC
Created attachment 149731 [details]
The minimized the DSDT

Hi,

Sorry for the delayed reply.
It took time to create a simulation environment for this issue.

Let's check your use case first.

A. Let's find the WMI device first.
The WMI device is PNP0C14, which is the WMID ACPI namespace node in your namespace.

B. Let's find the notification value next.
The Notify() opcodes invoked for this device are:
Device (WMID)/Method (WGWE)
	Notify (\_SB.WMID, 0x80) // Status Change
X Device (WMID)/Method (WMBA)
	Notify (\_SB.WMID, 0xA0) // Device-Specific
X Device (WMID)/Method (WMAC)
	Notify (\_SB.WMID, 0xA0) // Device-Specific
Device (WMID)/Method (_WED)
	Notify (\_SB.WMID, 0x80) // Status Change
WMBA and WMAC are not invoked in any decompiled ASL.
So it should be a 0x80 notification, which is raised by WGWE and _WED.

C. Let's check the _WED result next.
The _WED evaluation result is the concatenation of WEI and WED, the values passed to the WGWE function.
The DSDT has a facility that allows at most 2 WEI/WED pairs to be queued up:
1. WGWE takes 2 arguments: Arg0 is WEI, Arg1 is WED.
If there is no WEI/WED pending, WGWE stores Arg0/Arg1 to WEI1/WED1; otherwise, it stores them to WEI2/WED2.
2. _WED is a standard WMI interface. Each time it is evaluated, it checks whether the WEI2/WED2 pair is 0; if not, it triggers the notification again.
So all WMI notifications are triggered by WGWE invocations.
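
The queueing described in item C can be modelled in a few lines of C. This is only a sketch: the field names follow the DSDT (WEI1/WED1, WEI2/WED2), but the struct, the function names, the notification counter, and the way wed() moves slot 2 into slot 1 are assumptions made for illustration rather than a transcription of the firmware.

```c
#include <stdint.h>

/* Simplified model of the two-slot WMI event queue. */
struct wmi_queue {
    uint32_t wei1, wed1;   /* first pending event ID / event data  */
    uint32_t wei2, wed2;   /* second pending event ID / event data */
    int notify_count;      /* number of Notify (WMID, 0x80) raised */
};

/* WGWE: store into slot 1 if nothing is pending, otherwise into
 * slot 2, then raise the 0x80 notification. */
static void wgwe(struct wmi_queue *q, uint32_t wei, uint32_t wed)
{
    if (q->wei1 == 0) {
        q->wei1 = wei;
        q->wed1 = wed;
    } else {
        q->wei2 = wei;
        q->wed2 = wed;
    }
    q->notify_count++;
}

/* _WED: hand the first pending event to the caller. If a second
 * event is queued, move it to slot 1 (assumed) and notify again. */
static void wed(struct wmi_queue *q, uint32_t *wei, uint32_t *wed_out)
{
    *wei = q->wei1;
    *wed_out = q->wed1;
    if (q->wei2 != 0 || q->wed2 != 0) {
        q->wei1 = q->wei2;
        q->wed1 = q->wed2;
        q->wei2 = 0;
        q->wed2 = 0;
        q->notify_count++;
    } else {
        q->wei1 = 0;
        q->wed1 = 0;
    }
}
```

In this model every notification can be traced back to a WGWE call, either directly or through the re-notification in wed().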

D. Let's check the HP WMI notification value last.
The WGWE will be invoked for the following cases:
Method (_L18)
	\_SB.WMID.WGWE (0x01, 0x00) -> HPWMI_DOCK_EVENT
Method (_L02)
	\_SB.WMID.WGWE (0x04, 0x00) -> HPWMI_BEZEL_BUTTON
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS
Method (_WAK) -> Method (\HWAK)
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS
Device (BAT0)/Method (_STA)
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS
Device (BAT0)/Method (_PSR)
	\_SB.WMID.WGWE (0x03, 0x00) -> HPWMI_SMART_ADAPTER
X Device (WMID)/Method (WMAA) -> Method (WHCM)
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS
The HPWMI_xxx value is decoded from the _WED evaluation result. WMAA is not invoked in any decompiled ASL.

From your trace, I can learn the following.
Linux invokes hp_wmi_get_hw_state and hp_wmi_perform_query in sequence only in the HPWMI_WIRELESS handling code.
So the conclusion is that this use case is triggered by frequently switching the wireless function key on your platform.
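
The event IDs listed above can be written down as a small C model to make the argument explicit. The values come from the WGWE calls in the DSDT; the enum and the helper event_queries_hw_state() are written for this sketch and are not copied from the hp-wmi driver.

```c
/* Event IDs passed to WGWE as the first argument (WEI). */
enum hpwmi_event {
    HPWMI_DOCK_EVENT    = 0x01,
    HPWMI_SMART_ADAPTER = 0x03,
    HPWMI_BEZEL_BUTTON  = 0x04,
    HPWMI_WIRELESS      = 0x05
};

/* Returns 1 if handling this event leads to hp_wmi_get_hw_state(),
 * i.e. to the code path seen in the oops, and 0 otherwise. */
static int event_queries_hw_state(unsigned int event_id)
{
    return event_id == HPWMI_WIRELESS;
}
```

Only WGWE (0x05, 0x00) passes this filter, which is why the remaining analysis concentrates on callers that use 0x05.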

Please see minimized DSDT attached.
Comment 11 Lv Zheng 2014-09-11 06:44:01 UTC
Created attachment 149741 [details]
The code I used to build the simulation environment

I built a simulation environment for this case.
I triggered _L02 to see what would happen in Linux.
But in the end the crash did not reproduce in my simulation environment.

I'm afraid that unless you can provide more information, I cannot find the cause from your limited description. :-(
Comment 12 Lv Zheng 2014-09-11 06:49:21 UTC
But there is one existing bug I know of.

acpi_install_notify_handler() may have issues; it may not be safe for Linux modules.
And HP_WMI is a module.
So if you insmod/rmmod frequently while switching wireless, this can be triggered.
If this is exactly your case, I can offer an ACPICA patch to protect against it.
I'll need more time to review the notify handler implementation.
Comment 13 Thomas Richter 2014-09-11 07:14:49 UTC
If you need more information, please let me know what I could possibly offer. All I can say at this time is that I just observed the crash. I turned on the machine and got the crash on the screen, with no user interaction. The HP has a touch-button to activate/deactivate the wifi, but I did not touch that at all; probably something else in the machine triggered the wifi on/off. It might be that the wifi router became available right at the same moment, i.e. I turned it on just seconds before. I also haven't seen this *particular* crash for quite a while, though the day before yesterday the machine hung again, but in the desktop, and I wasn't able to collect any data since it was completely unresponsive.
Comment 14 Lv Zheng 2014-09-14 02:07:04 UTC
There are 3 ways to trigger your case:
The wireless button
The battery status change
The system wakeup
The dock might also be an indirect triggering source.
I suppose you haven't done anything involving sleep or the dock.
So could you try to blacklist the ACPI battery drivers by adding the boot parameters modprobe.blacklist=battery modprobe.blacklist=sbs, and see if the issue can be avoided?
Comment 15 Lv Zheng 2014-09-14 02:13:23 UTC
(In reply to Thomas Richter from comment #13)
> I also haven't seen this
> *particular* crash since quite a while, though the day before yesterday the
> machine hung again, but in the desktop and I wasn't able to collect any data
> since it was completely unresponsive.

Is it possible to collect information through a network console after that?
Comment 16 Lv Zheng 2014-09-14 02:18:57 UTC
(In reply to Thomas Richter from comment #13)
> I also haven't seen this *particular* crash since quite a while

Are you still suffering from the crash in comment 1 when using a recent upstream kernel?
Comment 17 Lv Zheng 2014-09-14 02:22:24 UTC
(In reply to Thomas Richter from comment #13)
> The HP has a touch-button to activate/deactivate the wifi, but I did not
> touch
> that at all, probably something else in the machine triggered the wifi
> on/off.

Could you reproduce this issue by frequently switching wifi using this button?
Comment 18 Thomas Richter 2014-09-14 11:13:10 UTC
I have not been able to see this particular bug recently, but that does not mean anything. It is a very sporadic bug. What is probably remarkable is that this machine takes unexpectedly long to create all the entries in /dev (it sits idle for probably five seconds before resuming the boot process). Whenever the bug appeared, I did not touch or move the machine, i.e. I did not press any buttons, nor did I remove it from its dock. I have not tried to trigger this bug by pressing the wifi switch, but will do so and let you know whether that causes an ill effect. I will see whether I can get any output through the netconsole, but it doesn't seem likely (the kernel is hung completely at that point).
Comment 19 Thomas Richter 2014-09-14 11:47:55 UTC
Continuously tapping on the wifi switch during boot did not trigger the kernel hang; wifi just remained off. That said, the laptop BIOS offers a (rather dreadful) "automatic" wifi switch that turns the wifi off whenever it is in the dock. I turned the switch off in the BIOS, but wifi is nevertheless off unless I force-enable it with "rfkill". This step, however, happens very late in the boot process, whereas the mentioned kernel hang happens whenever the system "waits for /dev to be fully populated", i.e. long before I force-enable the wifi.
Comment 20 Lv Zheng 2014-09-14 14:16:13 UTC
(In reply to Thomas Richter from comment #19)
> I recently have not been able to see this particular bug.

If it is a battery bug (see comment below), it is possible that the bug has been fixed in the upstream kernel.

> whereas the mentioned kernel hang happens whenever the the system "waits for
> /dev to be fully populated".

This can be caused by any driver issue. If you mean the same "kernel hang" as the one shown in comment 7, my investigation shows that the oops is __only possible__ when the following control methods are evaluated:

Method (_L02)
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS
Method (_WAK) -> Method (\HWAK)
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS
Device (BAT0)/Method (_STA)
	\_SB.WMID.WGWE (0x05, 0x00) -> HPWMI_WIRELESS

If the "kernel hang of comment 7" happens during bus/driver probing, then I guess this is a battery driver bug, because during this period _L02 won't be invoked (according to your comment, the wifi switch is not touched) and _WAK won't be invoked (this happens at boot time, not on resume). I'll try to run the BAT0._STA method in the simulation environment to see if it can be reproduced here.
Have you tried to blacklist the battery driver?

Hmm... If you still have the picture, would you mind uploading it?

Thanks and best regards
-Lv
Comment 21 Lv Zheng 2014-09-14 14:25:20 UTC
One more thing I can do for you:
Besides the 3 triggering sources above, _WED itself is also a triggering source:
Device (WMID)/Method (_WED)
	Notify (\_SB.WMID, 0x80) // Status Change
The side effects of the _WED implementation in this ACPI namespace need to be evaluated.
_WED itself can trigger another "queued" notification (see comment 10, list item C).
I'll also discuss this implementation with the others to see if our interpreter architecture is safe with this kind of _WED implementation.

Thanks and best regards
-Lv
Comment 22 Lv Zheng 2014-09-17 05:55:10 UTC
Hi, Thomas

I was asked to split this thread into 2 bug reports.
I just changed the title to focus this thread on the 2nd panic log.
If you still see a kernel hang related to the 1st panic log, please open a new bug for it.
Comment 23 Thomas Richter 2014-09-17 08:53:34 UTC
Thanks, I'll let you know as soon as the machine starts to hang again. So far I have not been able to reproduce the problem. Is there anything in particular I can do with the battery to trigger the problem (as in: removing it at the wrong time and then seeing what happens)?
Comment 24 Lv Zheng 2014-09-18 00:10:59 UTC
It seems you are using a 32-bit kernel. Right?

Then I guess these lines are wrong:
int query = 0x6e;
int query = BIT(r + 8) | ((!blocked) << r);
int wireless = 0;
int state = 0;
...

Let me dig deeper.
Comment 25 Lv Zheng 2014-09-18 10:27:37 UTC
I did more investigation around WMI and ACPI.
http://msdn.microsoft.com/en-us/library/windows/hardware/Dn614028(v=vs.85).aspx

Let me prove that I have nothing to do with this bug.

The hp-wmi and wmi modules in the Linux kernel do the following:

During PNP0C14 (\_SB.WMID) probing, its \_SB.WMID._WDG method is evaluated to obtain a mapping from GUIDs to 2-character IDs. The ID is used to form the name of the control method that is evaluated to obtain a WMI data block or to handle a WMI event.
For your platform, the result is:
  * 5FB7F034-2C63-45E9-BE91-3D44E2C707E4 - AA
  * 95F24279-4D7B-4334-9387-ACCDC67EF61C - 80 E
    2B814318-4BE8-4707-9D84-A190A859B5D0 - A0 E
    05901221-D566-11D1-B2F0-00A0C9062910 - AB
    1F4C91EB-DC5C-460B-951D-C7CB9B4B8D5E - BA
    2D114B49-2DFB-4130-B8FE-4A3C09E75133 - BC
    988D08E3-68F4-4C35-AF3E-6A1B8106F83C - BD
    14EA9746-CE1F-4098-A0E0-7045CB4DA745 - BE
    322F2028-0F84-4901-988E-015176049E2D - BF
    8232DE3D-663D-4327-A8F4-E293ADB9BF05 - BG
    8F1F6436-9F42-42C8-BADC-0E9424F20C9A - BH
    8F1F6435-9F42-42C8-BADC-0E9424F20C9A - BI
    7391A661-223A-47DB-A77A-7BE84C60822D - AC
    DF4E63B6-3BBC-4858-9737-C74F82F821F3 - BJ
Among the above GUIDs, hp-wmi will use the following 2 GUIDs:
  95F24279-4D7B-4334-9387-ACCDC67EF61C HPWMI_EVENT_GUID
  5FB7F034-2C63-45e9-BE91-3D44E2C707E4 HPWMI_BIOS_GUID
They are marked in the above list.

For HPWMI_EVENT_GUID, 80 is the "notify_id"; hp-wmi registers an ACPI notify handler to handle the Notify (\_SB.WMID, 0x80) notification. When registering, if \_SB.WMID.WE80 exists, OSPM evaluates it to tell the BIOS that it can handle 0x80. In this case, WE80 doesn't exist.

For HPWMI_BIOS_GUID, AA is the "object_id"; hp-wmi evaluates \_SB.WMID.WMAA in hp_wmi_perform_query(), which is where you saw the kernel hang.
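
As a sketch of how the 2-character ID turns into a method name (WM + AA gives WMAA, WE + 80 gives WE80), the helper below concatenates a prefix and an ID. wmi_method_name() is a hypothetical function written for this illustration; it is not the code the Linux wmi module uses.

```c
#include <stdio.h>
#include <string.h>

/* Build a 4-character ACPI method name from a 2-character prefix
 * ("WM" for the data block/method object, "WE" for the event enable
 * method) and the 2-character ID taken from the _WDG mapping.
 * Returns 0 on success, -1 if either input is not 2 characters long. */
static int wmi_method_name(const char *prefix, const char *id, char out[5])
{
    if (strlen(prefix) != 2 || strlen(id) != 2)
        return -1;
    snprintf(out, 5, "%s%s", prefix, id);
    return 0;
}
```

With the mapping above, the object ID AA yields WMAA and the notify ID 80 yields WE80.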

In the decompiled ASL, we can see that only the following control methods raise the 0x80 notification:
  \_SB.WMID.WGWE
    Notify (\_SB.WMID, 0x80) // Status Change
  \_SB.WMID._WED
    Notify (\_SB.WMID, 0x80) // Status Change

According to our previous investigation, the 0x80 notification raised in \_SB.WMID._WED only re-raises the queued notification passed from \_SB.WMID.WGWE, so that at most 2 notifications can be queued up. Thus \_SB.WMID.WGWE is the only entry point for the 0x80 notification. This function takes 2 parameters: Arg0 is WEIx (event ID), Arg1 is WEDx (event data).

When Notify (\_SB.WMID, 0x80) is executed, hp_wmi_notify() is invoked. In this function, \_SB.WMID._WED is evaluated before the WMI event is dispatched, to obtain the additional data that belongs to this event (event ID and event data), and \_SB.WMID.WMAA is evaluated to request the WMI data block.

Since hp_wmi_perform_query() is invoked from hp_wmi_get_hw_state(), we can tell that the event ID is HPWMI_WIRELESS (0x05), because hp_wmi_get_hw_state() is only invoked for the HPWMI_WIRELESS event. This means \_SB.WMID.WGWE (WEI=0x05, WED=?) can be the entry point that triggers the notification in your case.

In the decompiled ASL, we can see only the following control methods will invoke WGWE with WEI=0x05:
  \_WAK -> \HWAK
    \_SB.WMID.WGWE (0x05, 0x00)
  \_SB.WMID.WMAA -> \_SB.WMID.WHCM
    \_SB.WMID.WGWE (0x05, 0x00)
  \_GPE._L02
    \_SB.WMID.WGWE (0x05, 0x00)
  \_SB.BAT0._STA
    \_SB.WMID.WGWE (0x05, 0x00)
Let's check them 1 by 1.

\_WAK is invoked by Linux during system wakeup, which is obviously not your case. So let's ignore \_WAK as the triggering source of your case.

\_SB.WMID.WMAA is evaluated by hp_wmi_perform_query() as mentioned above. hp_wmi_get_sw_state() invokes it as hp_wmi_perform_query(HPWMI_WIRELESS_QUERY, 0, &wireless, sizeof(wireless), sizeof(wireless)); the parameters are passed to \_SB.WMID.WMAA as the Arg2 Buffer object, formatted as "struct bios_args":
  struct bios_args {
    u32 signature;
    u32 command;
    u32 commandtype;
    u32 datasize;
    u32 data;
  };
  args.signature = 0x55434553;
  args.command = write (0) ? 0x2 : 0x1;
  args.commandtype = query (HPWMI_WIRELESS_QUERY);
  args.datasize = sizeof (int wireless);
  args.data = &wireless;
The Arg2 of \_SB.WMID.WMAA is passed to \_SB.WMID.WHCM as Arg1:
  Method (WMAA, 3, NotSerialized)
  {
    Store ("WMAA Enter", Debug)
    Return (WHCM (Arg1, Arg2))
  }
\_SB.WMID.WHCM contains the following code:
  CreateDWordField (Arg1, 0x00, SNIN)
  CreateDWordField (Arg1, 0x04, COMD)
  CreateDWordField (Arg1, 0x08, CMTP)
  CreateDWordField (Arg1, 0x0C, DASI)
  Store (DASI, Local5)
  CreateField (Arg1, 0x00, Multiply (Add (Local5, 0x10), 0x08), 
               DAIN)
So the bios_args structure maps onto the ASL fields as:
  SNIN - signature   - 0x55434553
  COMD - command     - 0x01
  CMTP - commandtype - 0x05
  DASI - datasize    - 4
  DAIN - data        - wireless
So "SNIN=0x55434553, COMD=0x01, CMTP=0x05". Note that \_SB.WMID.WGWE (0x05, 0x00) is invoked in \_SB.WMID.WMAA only under the condition "SNIN=0x55434553, COMD=0x02, CMTP=0x05". This condition can only be triggered by hp_wmi_perform_query(HPWMI_WIRELESS_QUERY, 1, ...), which is not your case. So let's ignore \_SB.WMID.WMAA as the triggering source of your case. We can then cut the unrelated code out of \_SB.WMID.WHCM to make it simpler, keeping only the code for your case.
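
The correspondence between struct bios_args and the CreateDWordField offsets can be checked mechanically. The sketch below uses fixed-width user-space types in place of the kernel's u32; dain_bits() and wireless_read_args() are helper names invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Input buffer layout as quoted above (u32 written as uint32_t). */
struct bios_args {
    uint32_t signature;    /* SNIN, offset 0x00 */
    uint32_t command;      /* COMD, offset 0x04 */
    uint32_t commandtype;  /* CMTP, offset 0x08 */
    uint32_t datasize;     /* DASI, offset 0x0C */
    uint32_t data;         /* payload, offset 0x10 */
};

/* The ASL byte offsets must match the C struct layout. */
_Static_assert(offsetof(struct bios_args, signature)   == 0x00, "SNIN");
_Static_assert(offsetof(struct bios_args, command)     == 0x04, "COMD");
_Static_assert(offsetof(struct bios_args, commandtype) == 0x08, "CMTP");
_Static_assert(offsetof(struct bios_args, datasize)    == 0x0C, "DASI");

/* Bit length of DAIN, following
 * CreateField (Arg1, 0x00, Multiply (Add (DASI, 0x10), 0x08), DAIN):
 * the 0x10-byte header plus DASI bytes of data, in bits. */
static uint32_t dain_bits(uint32_t datasize)
{
    return (datasize + 0x10) * 0x08;
}

/* Arguments for a read (command 0x1) of the wireless state
 * (commandtype 0x05), matching the values derived above. */
static struct bios_args wireless_read_args(void)
{
    struct bios_args args = {
        .signature   = 0x55434553,
        .command     = 0x1,              /* a write would use 0x2 */
        .commandtype = 0x05,
        .datasize    = sizeof(uint32_t),
        .data        = 0,
    };
    return args;
}
```

For DASI = 4 the DAIN field covers (4 + 0x10) * 8 = 160 bits, which is exactly the 20-byte structure.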

\_GPE._L02 is evaluated when GPE02 fires.
  Method (_L02, 0, NotSerialized)
  {
    Store (0x00, GPEC)
    Store (SSCI, Local0)
    If (Local0)
    {
      Store (0x00, SSCI)
      If (LEqual (Local0, 0x01))
      {
        ...
      }
      If (LAnd (LGreaterEqual (Local0, 0x04),
                LLessEqual (Local0, 0x05)))
      {
        \_SB.WMID.WGWE (Local0, 0x00)
      }
      If (LEqual (Local0, 0x07))
      {
        ...
      }
      If (LEqual (Local0, 0x03))
      {
        ...
      }
      If (LEqual (Local0, 0x02))
      {
        ...
      }
    }
  }
This GPE seems to be used for the HP front bezel, and 0x05 seems to be the wireless switch. In order to trigger the HPWMI_WIRELESS notification, the value read from \SSCI must be 0x05. I included \_GPE._L02 in the hand-cut ASL, with the \SSCI hardware access modified to always return 0x05 so that the HPWMI_WIRELESS notification is triggered manually. Real hardware access to \SSCI will not trigger a kernel hang like yours, so this is safe to do.

\_SB.BAT0._STA reads the battery status from the EC; the last status is stored in BT0P. Each time the battery status changes, \_SB.WMID.WGWE (0x05, 0x00) is invoked.
  Method (_STA, 0, NotSerialized)
  {
    Store (\_SB.PCI0.LPCB.EC0.BSTA (0x01), Local0)
    If (XOr (BT0P, Local0))
    {
      Store (Local0, BT0P)
      Store (Local0, Local1)
      If (LNotEqual (Local1, 0x1F))
      {
        Store (0x00, Local1)
      }
      \_SB.SSMI (0xEA3A, 0x00, Local1, 0x00, 0x00)
      \_SB.WMID.WGWE (0x05, 0x00)
    }
    Return (Local0)
  }
In order to trigger the HPWMI_WIRELESS notification, the value returned by \_SB.PCI0.LPCB.EC0.BSTA must change each time it is evaluated. I included \_SB.BAT0._STA in the hand-cut ASL, with \_SB.PCI0.LPCB.EC0.BSTA modified to avoid EC hardware accesses and to alternate its return value between 0x0F and 0x1F, so that the HPWMI_WIRELESS notification is triggered manually. Real EC accesses will not trigger a kernel hang like yours, so this is safe to do.
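
The modified \_SB.BAT0._STA can be pictured with the following C model. It is a sketch of the simulated behaviour only: struct bat0_model, sim_bsta() and the call counter are names invented here, and the \_SB.SSMI call from the ASL is left out.

```c
#include <stdint.h>

struct bat0_model {
    uint32_t bt0p;       /* last status seen (BT0P in the ASL)       */
    uint32_t next_bsta;  /* value the simulated BSTA returns next    */
    int wgwe_calls;      /* number of WGWE (0x05, 0x00) invocations  */
};

/* Simulated EC0.BSTA: alternates between 0x1F and 0x0F instead of
 * reading the embedded controller. */
static uint32_t sim_bsta(struct bat0_model *m)
{
    uint32_t value = m->next_bsta;
    m->next_bsta = (value == 0x1F) ? 0x0F : 0x1F;
    return value;
}

/* Model of _STA: a change against the stored status updates BT0P
 * and raises the wireless event through WGWE (0x05, 0x00). */
static uint32_t bat0_sta(struct bat0_model *m)
{
    uint32_t local0 = sim_bsta(m);

    if (m->bt0p ^ local0) {
        m->bt0p = local0;
        m->wgwe_calls++;
    }
    return local0;
}
```

Because the simulated status flips on every call, every evaluation of _STA in this model raises one wireless event.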

Now I can execute \_GPE._L02 or execute \_SB.BAT0._STA in the simulation environment to reproduce your issue using the hand-cut ASL. I'll upload it later.
The code path is:
  ASL: \_GPE._L02 or \_SB.BAT0._STA
    ASL: \_SB.WMID.WGWE (0x05, 0x00)
      ASL: Notify (\_SB.WMID, 0x80)
        Linux: hp_wmi_notify(HPWMI_WIRELESS)
          Linux: wmi_get_event_data()
            ASL: \_SB.WMID._WED
          Linux: wmi_evaluate_method
            ASL: \_SB.WMID.WMAA
            Linux: Kernel hang here

Then let's see what happens in \_SB.WMID.WMAA and whether evaluating it can result in a kernel hang. \_SB.WMID.WMAA invokes \_SB.WMID.WHCM.

This function first converts bios_args into SNIN/COMD/CMTP/DASI/DAIN as mentioned above.

This function also prepares bios_return into Local1:
  struct bios_return {
    u32 sigpass;
    u32 return_code;
  };
This is the 8-byte header; the output data follows it.
  If (LEqual (Arg0, 0x03))
  {
    Store (0x80, Local0)
  }
  Store (Buffer (Add (0x08, Local0)) {}, Local1)
  CreateDWordField (Local1, 0x00, SNOU)
  CreateDWordField (Local1, 0x04, RTCD)
Note that the output data size 0x80 is determined by the method ID (0x03). The method ID is 0x03 because Arg0 is actually the Arg1 passed to \_SB.WMID.WMAA, which is the 3rd parameter of wmi_evaluate_method(HPWMI_BIOS_GUID, 0, 0x3, ...):
  params[1].integer.value = method_id;

Then this function performs the WMI query \_SB.WGWS for (SNIN=0x55434553, COMD=0x01, CMTP=0x05):
  If (LEqual (SNIN, 0x55434553))
  {
    Store (0x03, RTCD)
    If (LEqual (COMD, 0x01))
    {
      Store (0x04, RTCD)
      If (LEqual (CMTP, 0x05))
      {
        Store (^WGWS (), Local2)
        Store (0x00, RTCD)
      }
    }
  }
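
The nested checks above amount to a small decision function for the return code. The C sketch below mirrors them; whcm_return_code() is a hypothetical name, and since the excerpt does not show what RTCD holds before the checks, that value is passed in as a parameter.

```c
#include <stdint.h>

/* Mirror of the RTCD assignments in the trimmed WHCM excerpt.
 * `initial` stands for whatever RTCD held before the checks. */
static uint32_t whcm_return_code(uint32_t snin, uint32_t comd,
                                 uint32_t cmtp, uint32_t initial)
{
    uint32_t rtcd = initial;

    if (snin == 0x55434553) {
        rtcd = 0x03;            /* signature accepted */
        if (comd == 0x01) {
            rtcd = 0x04;        /* read command accepted */
            if (cmtp == 0x05) {
                /* ^WGWS () would be evaluated here */
                rtcd = 0x00;    /* query handled */
            }
        }
    }
    return rtcd;
}
```

For the wireless read query (COMD=0x01, CMTP=0x05) the function returns 0x00; a mismatch at any level leaves the code set by the last check that passed.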

From \_SB.WGWS we can learn that the request is processed using \ASMB, \EAX, \EBX, \ECX and \EDX. In \_SB.SSMI and \_SB.WMID.WHCM:
  Store (DAIN, ASMB)
  ShiftLeft (Arg0, 0x10, EAX)
  Store (Arg1, EBX)
  Store (Arg2, ECX)
  Store (Arg3, EDX)
  Store (0x00, REFS)
Then the ASL code invokes \_SB.PCI0.GSWS to wait for this to complete. After completion, \EBX contains the return code, \ECX contains the length of the output data, and the data can be retrieved by reading \ASMB. The return code, the output data size and the output data are copied to Local1 and returned to OSPM. The bios_return is formatted as:
  SNOU - sigpass     - 0x53534150
  RTCD - return_code - \EBX
  DAOU - data        - \ECX size data from ASMB

So in the simulation environment, all "hardware accesses" are simulated, and no crash has been found so far.

I suspected there would be bugs in the following output data copying code:
  Store (0x00, Local5)
  While (LLess (Local5, Local3))
  {
    Store (DerefOf (Index (DerefOf (Index (Local2, 0x02)), Local5)), 
            Index (Local1, Add (Local5, 0x08)))
    Increment (Local5)
  }
But this doesn't trigger anything wrong in the simulation environment. If the input/output data size exceeded the \ASMB size, the ACPICA region access code should catch it: it raises an exception when an access exceeds the size of the region or the buffer.

I also suspected there might be bugs in hp_wmi_perform_query(), where accessing the input/output data could exceed sizeof(bios_args.data):
  memcpy(&args.data, buffer, insize);
But I found that every input data size is either 4 or 0:
  Function                  CMTP                       COMD  DASI (insize)    outsize
  hp_wmi_display_state      HPWMI_DISPLAY_QUERY        1     sizeof(int)      sizeof(int)
  hp_wmi_hddtemp_state      HPWMI_HDDTEMP_QUERY        1     sizeof(int)      sizeof(int)
  hp_wmi_als_state          HPWMI_ALS_QUERY            1     sizeof(int)      sizeof(int)
  hp_wmi_dock_state         HPWMI_HARDWARE_QUERY       1     sizeof(int)      sizeof(int)
  hp_wmi_tablet_state       HPWMI_HARDWARE_QUERY       1     sizeof(int)      sizeof(int)
  hp_wmi_bios_2009_later    HPWMI_FEATURE_QUERY        1     sizeof(int)      sizeof(int)
  hp_wmi_set_block          HPWMI_WIRELESS_QUERY       2     sizeof(int)      0
  hp_wmi_get_sw_state       HPWMI_WIRELESS_QUERY       1     sizeof(int)      sizeof(int)
  hp_wmi_get_hw_state       HPWMI_WIRELESS_QUERY       1     sizeof(int)      sizeof(int)
  hp_wmi_rfkill2_set_block  HPWMI_WIRELESS2_QUERY      2     sizeof(char [4]) 0
  hp_wmi_rfkill2_refresh    HPWMI_WIRELESS2_QUERY      1     0                sizeof(struct bios_rfkill2_state)
  hp_wmi_post_code_state    HPWMI_POSTCODEERROR_QUERY  1     sizeof(int)      sizeof(int)
  set_als                   HPWMI_ALS_QUERY            2     sizeof(u32)      sizeof(u32)
  set_postcode              HPWMI_POSTCODEERROR_QUERY  2     sizeof(u32)      sizeof(u32)
  hp_wmi_notify             HPWMI_HOTKEY_QUERY         1     sizeof(int)      sizeof(int)
  hp_wmi_rfkill_setup       HPWMI_WIRELESS_QUERY       1     sizeof(int)      sizeof(int)
  hp_wmi_rfkill2_setup      HPWMI_WIRELESS2_QUERY      1     0                sizeof(struct bios_rfkill2_state)
And the output data size is carefully checked:
  actual_outsize = min(outsize, (int)(obj->Buffer.Length - sizeof(*bios_return)));
  memcpy(buffer, (u8 *)obj->Buffer.Pointer + sizeof(*bios_return), actual_outsize);
  memset((u8 *)buffer + actual_outsize, 0, outsize - actual_outsize);
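
The clamping in those three lines can be exercised outside the kernel. The sketch below is a user-space model, not the driver code: copy_query_output() is a hypothetical name, the firmware result is passed as a plain byte array, and the early return for buffers shorter than the header is an extra guard added for this example.

```c
#include <stdint.h>
#include <string.h>

/* 8-byte header that precedes the output data. */
struct bios_return {
    uint32_t sigpass;
    uint32_t return_code;
};

/* Copy at most `outsize` payload bytes that follow the header in a
 * firmware buffer of `buf_len` bytes, and zero-fill whatever the
 * firmware did not provide. Returns the number of bytes copied, or
 * -1 if the arguments make no sense. */
static int copy_query_output(const uint8_t *fw_buf, size_t buf_len,
                             void *out, int outsize)
{
    int avail, actual;

    if (outsize < 0 || buf_len < sizeof(struct bios_return))
        return -1;

    avail = (int)(buf_len - sizeof(struct bios_return));
    actual = outsize < avail ? outsize : avail;

    memcpy(out, fw_buf + sizeof(struct bios_return), (size_t)actual);
    memset((uint8_t *)out + actual, 0, (size_t)(outsize - actual));
    return actual;
}
```

With a 10-byte firmware buffer and a 4-byte request, only 2 payload bytes are copied and the remaining 2 are zeroed, so the caller's buffer is never overrun.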
Comment 26 Lv Zheng 2014-09-18 10:33:34 UTC
(In reply to Lv Zheng from comment #24)
> It seems you are using a 32-bit kernel. Right?
> 
> Then I guess these lines are wrong:
> int query = 0x6e;
> int query = BIT(r + 8) | ((!blocked) << r);
> int wireless = 0;
> int state = 0;
> ...
> 

This doesn't seem to be a problem on either 32-bit or 64-bit kernels: the 64-bit Linux kernel build is LP64 compliant, so sizeof(int) is always 4.
Comment 27 Lv Zheng 2014-09-18 10:36:56 UTC
The other thing we cannot simulate is parallelism.
But this case is either executed in the single-threaded Notify work queue or protected by Mutex (MSMI, 0x00); in addition, the named objects created by these control methods force a serialized execution. So there is no race between 2 WMI queries.

I really can't see any possibility of such a crash for now, so I'm going to close this bug. If you can reproduce it and provide more information, please feel free to reopen it.
Comment 28 Lv Zheng 2014-09-18 10:39:17 UTC
Created attachment 150781 [details]
The hand-cut ASL used to trigger issues

The updated ASL, including \_GPE._L02 and \_SB.BAT0._STA and the code related to the HPWMI_WIRELESS event.
All hardware code is modified to trigger the possible crash code path.
Comment 29 Lv Zheng 2014-09-18 10:42:06 UTC
Created attachment 150791 [details]
Updated simulator code

Apply this code on top of the recent ACPICA.

make iasl
generate/unix/bin/iasl -tc <the hand-cut ASL>
make acpiexec
generate/unix/bin/acpiexec -es <the compiled hand-cut AML>

In interactive mode, typing
ex \_GPE._L02
ex \_SB.BAT0._STA
triggers the 2 cases in the simulation environment.
Comment 30 Lv Zheng 2014-09-18 10:42:58 UTC
Close due to insufficient data.
Feel free to reopen it when you have more useful information.
Comment 31 Thomas Richter 2014-09-18 11:30:55 UTC
(In reply to Lv Zheng from comment #24)
> It seems you are using a 32-bit kernel. Right?
> 

I'm afraid not. This is a 64-bit kernel.
Comment 32 Lv Zheng 2014-09-18 13:01:06 UTC
(In reply to Thomas Richter from comment #31)
> (In reply to Lv Zheng from comment #24)
> > It seems you are using a 32-bit kernel. Right?
> > 
> 
> I'm afraid not. This is a 64-bit kernel.

I thought so because of this line:
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
which is different from my 64-bit kernel.

But this doesn't matter any more.
No invalid memory access can be triggered in this case, even on a 32-bit kernel...

So let's close it for now.
If you find better debugging material, please come back and reopen it.