Bug 70891 - 6 second boot delay if processing SSDT tables
Summary: 6 second boot delay if processing SSDT tables
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Lv Zheng
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-20 06:01 UTC by Lv Zheng
Modified: 2016-12-31 06:37 UTC (History)
7 users (show)

See Also:
Kernel Version: 3.12
Tree: Mainline
Regression: No


Attachments
acpidump output (423.35 KB, application/octet-stream)
2014-02-20 06:35 UTC, Gareth Williams
Details
bugzilla.kernel.org 70891 debugging for initialization processes. (1.69 KB, patch)
2014-02-24 03:24 UTC, Lv Zheng
Details | Diff
ACPICA: Remove indent divergences to reduce the difficulties of maintaining the revised ACPICA commits. (3.30 KB, patch)
2014-02-24 04:40 UTC, Lv Zheng
Details | Diff
ACPICA: Tables: Fix the issues in handling virtual addressed tables. (3.87 KB, patch)
2014-02-24 04:42 UTC, Lv Zheng
Details | Diff
ACPICA: Tables: Fix the issue that multiple ACPI_FREE()s are invoked for same pointer around acpi_tb_add_table(). (2.80 KB, patch)
2014-02-24 04:44 UTC, Lv Zheng
Details | Diff
ACPICA: Dispatcher: Add method execution instrumentation support. (1.48 KB, patch)
2014-02-28 07:14 UTC, Lv Zheng
Details | Diff
Dmesg with instrumentation patch (178.67 KB, text/plain)
2014-02-28 09:50 UTC, Gareth Williams
Details
ACPICA: Dispatcher: Add control method tracing once support. (3.98 KB, patch)
2014-03-07 02:43 UTC, Lv Zheng
Details | Diff
ACPICA: Events: Add EC access instrumentation support. (2.61 KB, patch)
2014-03-07 02:44 UTC, Lv Zheng
Details | Diff
ACPICA: Executer: Add Sleep/Stall opcode instrumentation support. (2.20 KB, patch)
2014-03-07 02:44 UTC, Lv Zheng
Details | Diff
ACPICA: Dispatcher: Add control method tracing once support. (4.23 KB, patch)
2014-03-08 01:27 UTC, Lv Zheng
Details | Diff
Dmesg with _SB_.PCI0.LPCB.EC0_._REG Debug (173.41 KB, text/plain)
2014-03-08 14:19 UTC, Gareth Williams
Details
My DMESG output (60.84 KB, text/plain)
2014-03-09 20:24 UTC, Lv Zheng
Details
ACPICA: Workaround to match method where backslash cannot be specified by the bootloader. (1.00 KB, patch)
2014-03-09 20:25 UTC, Lv Zheng
Details | Diff
dmesg with acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG (162.45 KB, text/plain)
2014-03-09 21:19 UTC, Gareth Williams
Details
ACPI: Enable EC driver debugging support. (631 bytes, patch)
2014-03-10 02:18 UTC, Lv Zheng
Details | Diff
Dmesg with EC logging (773.38 KB, text/plain)
2014-03-10 21:11 UTC, Gareth Williams
Details
HP EliteBook 840 G1 dmidecode (10.63 KB, text/plain)
2014-05-27 06:32 UTC, Steffen Weber
Details
dmidecode output from HP Pavilion (14.65 KB, text/plain)
2014-05-27 06:37 UTC, Gareth Williams
Details
[PATCH 1/8] ACPICA: Events: Fix edge-triggered GPE by disabling before acknowledging it. (2.48 KB, patch)
2014-06-10 03:09 UTC, Lv Zheng
Details | Diff
[PATCH 2/8] ACPI: Update interrupt handler to honor new ACPI_REENABLE_GPE bit. (1.04 KB, patch)
2014-06-10 03:14 UTC, Lv Zheng
Details | Diff
[PATCH 3/8] ACPI/EC: Add detailed fields debugging support of EC_SC(R). (2.20 KB, patch)
2014-06-10 03:16 UTC, Lv Zheng
Details | Diff
[PATCH 4/8] ACPI/EC: Add asynchronous command byte write support. (6.73 KB, patch)
2014-06-10 03:18 UTC, Lv Zheng
Details | Diff
[PATCH 5/8] ACPI/EC: Remove duplicated ec_wait_ibf0() waiter. (2.93 KB, patch)
2014-06-10 03:19 UTC, Lv Zheng
Details | Diff
[PATCH 6/8] ACPI/EC: Add reference counting for query handlers. (4.21 KB, patch)
2014-06-10 03:23 UTC, Lv Zheng
Details | Diff
[PATCH 7/8] ACPI/EC: Move "struct transaction" to drivers/acpi/internal.h. (5.63 KB, patch)
2014-06-10 03:27 UTC, Lv Zheng
Details | Diff
[PATCH 8/8] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support. (11.20 KB, patch)
2014-06-10 03:29 UTC, Lv Zheng
Details | Diff
dmesg of Linux 3.15.0 + 9 patches (163.02 KB, text/plain)
2014-06-10 19:05 UTC, Steffen Weber
Details
HP EliteBook 840 G1 acpidump (751.74 KB, text/plain)
2014-06-11 13:49 UTC, Steffen Weber
Details
[PATCH 01/11] ACPICA: Events: Fix edge-triggered GPE by disabling before acknowledging it. (2.48 KB, patch)
2014-06-11 14:39 UTC, Lv Zheng
Details | Diff
[PATCH 02/11] ACPI/EC: Add detailed fields debugging support of EC_SC(R). (2.56 KB, patch)
2014-06-11 14:41 UTC, Lv Zheng
Details | Diff
[PATCH 03/11] ACPI/EC: Fix an issue that advance_transaction() processes staled status. (2.20 KB, patch)
2014-06-11 14:42 UTC, Lv Zheng
Details | Diff
[PATCH 04/11] ACPI/EC: Add reference counting for query handlers. (4.21 KB, patch)
2014-06-11 14:44 UTC, Lv Zheng
Details | Diff
[PATCH 05/11] ACPI/EC: Add asynchronous command byte write support. (6.43 KB, patch)
2014-06-11 14:45 UTC, Lv Zheng
Details | Diff
[PATCH 06/11] ACPI/EC: Remove duplicated ec_wait_ibf0() waiter. (2.88 KB, patch)
2014-06-11 14:47 UTC, Lv Zheng
Details | Diff
[PATCH 07/11] ACPI/EC: Fix a race condition in ec_transaction_completed(). (3.83 KB, patch)
2014-06-11 14:48 UTC, Lv Zheng
Details | Diff
[PATCH 08/11] ACPI/EC: Move "struct transaction" to drivers/acpi/internal.h. (5.60 KB, patch)
2014-06-11 14:49 UTC, Lv Zheng
Details | Diff
[PATCH 09/11] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support. (11.43 KB, patch)
2014-06-11 14:54 UTC, Lv Zheng
Details | Diff
[PATCH 10/11] ACPI/EC: Enable debbuging messages. (903 bytes, patch)
2014-06-11 14:55 UTC, Lv Zheng
Details | Diff
[PATCH 11/11] ACPI/EC: Enable detailed register bit fields debugging. (958 bytes, patch)
2014-06-11 14:56 UTC, Lv Zheng
Details | Diff
[PATCH 09/11] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support. (11.78 KB, patch)
2014-06-11 17:31 UTC, Lv Zheng
Details | Diff
dmesg of Linux 3.15.0 + 11 patches (223.74 KB, text/plain)
2014-06-11 18:02 UTC, Steffen Weber
Details
dmesg of Linux 3.15.0 without any patches but DEBUG enabled (166.85 KB, text/x-log)
2014-06-12 07:41 UTC, Steffen Weber
Details
dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11 (182.05 KB, text/x-log)
2014-06-12 08:26 UTC, Steffen Weber
Details
attachment-26689-0.html (2.72 KB, text/html)
2014-06-12 12:12 UTC, Gareth Williams
Details
dmesg of Linux 3.15.0 old config with ACPI DEBUG (147.24 KB, text/plain)
2014-06-12 13:55 UTC, Steffen Weber
Details
dmesg of Linux 3.15.0 old config with all 11 patches (158.88 KB, text/plain)
2014-06-12 13:55 UTC, Steffen Weber
Details
dmesg with kernel 3.15 and patches applied (224.16 KB, text/plain)
2014-06-12 18:59 UTC, Gareth Williams
Details
dmesg with kernel 3.15 vanilla (49.85 KB, text/plain)
2014-06-12 19:37 UTC, Gareth Williams
Details
dmesg with kernel 3.15 with ec patches but no debug patches (49.85 KB, text/plain)
2014-06-12 21:00 UTC, Gareth Williams
Details

Description Lv Zheng 2014-02-20 06:01:42 UTC
Hi,

On bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1055228 ,
Gareth reported 6 second boot delay on 3.12 kernel. Delay does not
happen if acpi_no_auto_ssdt option is used, nor if kernel is compiled
with various debug facilities like LOCKDEP (fedora kernel-debug
variant). We tried to compile kernel with just CONFIG_ACPI_DEBUG=y, on
that kernel issue was reproducible, but after enable printing messages,
again issue disappeared (we tried acpi.debug_layer=0xffffffff together
with acip.debug_level=0x0307ff4f and acip.debug_level=0x03000000,
links to logs are provided in bugzilla).

On affected kernels interesting dmesg part looks like below:

[    0.212035] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[    0.217930] ACPI: SSDT 0000000092e6ac18 003D3 (v01  PmRef  Cpu0Cst 00003001 INTL 20121220)
[    0.218512] ACPI: Dynamic OEM Table Load:
[    0.218514] ACPI: SSDT           (null) 003D3 (v01  PmRef  Cpu0Cst 00003001 INTL 20121220)
[    0.222109] ACPI: SSDT 0000000092e6a618 005AA (v01  PmRef    ApIst 00003000 INTL 20121220)
[    0.222792] ACPI: Dynamic OEM Table Load:
[    0.222799] ACPI: SSDT           (null) 005AA (v01  PmRef    ApIst 00003000 INTL 20121220)
[    0.226920] ACPI: SSDT 0000000092e69d98 00119 (v01  PmRef    ApCst 00003000 INTL 20121220)
[    0.227495] ACPI: Dynamic OEM Table Load:
[    0.227497] ACPI: SSDT           (null) 00119 (v01  PmRef    ApCst 00003000 INTL 20121220)
[    6.812869] ACPI: Interpreter enabled

With acpi_no_auto_ssdt option interesting part looks like this:

[    0.203230] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[    0.203595] ACPI Error: [SGMD] Namespace lookup failure, AE_NOT_FOUND (20130725/psargs-359)
[    0.203601] ACPI Error: Method parse/execution failed [\_SB_._OSC] (Node ffff880253859438), AE_NOT_FOUND (20130725/psparse-536)
[    0.206093] ACPI Error: [\_PR_.CPU0._PPC] Namespace lookup failure, AE_NOT_FOUND (20130725/psargs-359)
[    0.206102] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.SASU] (Node ffff8802538570a0), AE_NOT_FOUND (20130725/psparse-536)
[    0.206108] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.ECMI] (Node ffff880253857078), AE_NOT_FOUND (20130725/psparse-536)
[    0.206113] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_._REG] (Node ffff880253856910), AE_NOT_FOUND (20130725/psparse-536)
[    0.206122] Fail in evaluating the _REG object of EC device. Broken bios is suspected.
[    0.206127] ACPI: Interpreter enabled

Any help to solving this bug would be appreciate. I'm CCing Gareth, he
can provide more informations if needed.
Comment 1 Gareth Williams 2014-02-20 06:35:40 UTC
Created attachment 126811 [details]
acpidump output

As requested, I've attached an acpidump output.  I hope it's in the correct format.  I've downgraded the kernel as far as 3.10 and this issue is still present.  I'll have to compile a kernel if you want me to go any earlier.
Comment 2 Lv Zheng 2014-02-21 07:50:48 UTC
The problem is caused by the dynamic reloading of the same table during initialization.
The code is not well tested in such an environment.

Let me do some local testing and request you to validate after that.

Thanks for your information.
Comment 3 Lv Zheng 2014-02-24 03:24:33 UTC
Created attachment 127221 [details]
bugzilla.kernel.org 70891 debugging for initialization processes.

This seems not to be a table loading issue, but an EC operation region issue.

Let's first try to figure out which ACPI initialization process has caused this issue.

Please:
1. apply this patch to the latest kernel,
2. build the kernel using the configuration that caused this issue,
3. boot the built kernel without acpi_no_auto_ssdt,
4. if there is still 6 seconds delayed in the startup process, post the dmesg output here.

Thanks.
Comment 4 Lv Zheng 2014-02-24 04:35:57 UTC
And let me post some known fixes for current ACPICA table management implementation.
You may give them a try to see if they can fix the issue.
Comment 5 Lv Zheng 2014-02-24 04:40:26 UTC
Created attachment 127231 [details]
ACPICA: Remove indent divergences to reduce the difficulties of maintaining the revised ACPICA commits.

This is the preparation of the next 2 linuxized ACPICA fixes.
Comment 6 Lv Zheng 2014-02-24 04:42:28 UTC
Created attachment 127241 [details]
ACPICA: Tables: Fix the issues in handling virtual addressed tables.

This is a fix related to the dynamic loaded tables.
Comment 7 Lv Zheng 2014-02-24 04:44:25 UTC
Created attachment 127251 [details]
ACPICA: Tables: Fix the issue that multiple ACPI_FREE()s are invoked for same pointer around acpi_tb_add_table().

This is another fix related to the dynamic table loading.
Comment 8 Lv Zheng 2014-02-24 04:49:53 UTC
Here is the test requests about the last 3 patches.  Please:

1. apply the following patches to the latest kernel,
    "Attachment 127231 [details]"
    "Attachment 127241 [details]"
    "Attachment 127251 [details]"
2. build the kernel using the configuration that caused this issue,
3. boot the built kernel without acpi_no_auto_ssdt,
4. post the dmesg output here.

It might be able to fix the issue if the issue is related to the dynamic table loading.
But since the issue might be caused by EC opregion not dynamic table loading,
the test result of "Comment 3" is still important.

Thanks.
Comment 9 Gareth Williams 2014-02-25 08:01:33 UTC
The dmesg logs are large therefore are tar.gz'd and available at:-
https://dl.dropboxusercontent.com/u/90149727/bugzilla-70891-1.tar.gz

I've added a few extra dmesg from various kernels (without debug and without the patches - so they are probably useless!) as the delay varies wildly between 3.12 and 3.13 kernels and also between cold and warm boots.

If you need me to try any other combination (eg 3.14 kernel or vanilla 3.13), just give me a shout.
Comment 10 Lv Zheng 2014-02-26 04:11:43 UTC
Let me shows the result in your test here:

arch-3.12.9-2.cold.txt:
  [    0.314279] ACPI: SSDT           (null) 00119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    7.411438] ACPI: Interpreter enabled
arch-3.12.9-2.warm.txt:
  [    0.314278] ACPI: SSDT           (null) 00119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    7.418568] ACPI: Interpreter enabled
fed20-3.13.3-201.cold.txt:
  [    0.232446] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    0.309612] ACPI: Interpreter enabled
fed20-3.13.3-201.warm.txt:
  [    0.232643] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    1.311722] ACPI: Interpreter enabled
arch-3.13.5-1.cold.txt:
  [    0.320782] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    4.478035] ACPI: Interpreter enabled
arch-3.13.5-1.patched.cold.txt:
  [    0.321213] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    1.481476] ACPI: Interpreter enabled
arch-3.13.5-1.patched.warm.txt
  [    0.320909] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
  [    1.937851] ACPI: Interpreter enabled

The dmesg files are named as follows: dist-ver[.patched][.debug].boot.txt
From your README, I can learn:
1. dist:
   arch - Arch Linux (packaged and compiled)
   fed20 - Fedora 20 (packaged kernel)
2. ver: the kernel version
3. debug: boot cmd line:
          acpi.debug_layer=0xffffffff
          acpi.debug_level=0x03000000
          log_buf_len=64M
4. boot:
   warm/cold - cold boot or reboot
               (there seems to be a difference in delay
                depending on how the system was booted)
I'm curious about what the "patched" means.  In the ".patched" dmesgs, the boot time seems to be improved from 4.3sec to 1.1sec by what you've patched.  So what does the "patched" mean?

It is even faster (0.1sec) in the dmesg of booting with the fedora kernel.  So what's the .config of this kernel?

I didn't examine the ACPICA debugging enabled dmesg outputs since.  I'll try to look deeper into it and decode the DSDT to see what have been executed in the failed control methods.
Comment 11 Gareth Williams 2014-02-26 11:38:29 UTC
By 'patched', I meant that the Arch kernel was rebuilt with the three patches you provided in comment 8.

The config for the Fedora 20 kernel is:-
https://dl.dropboxusercontent.com/u/90149727/config-3.13.3-201.fc20.x86_64

I've just copied the Fedora 20 config to the Arch and compiled the kernel.  Just as with the default Fedora 20 kernel, this new kernel displays a faster boot time.  (I then applied the three patches above, but that slowed down the boot marginally).

I've ran diffconfig (from the kernel source's scripts directory) against the original Fedora and Arch configs, but the result is huge.  I'm certain most of those are irelevant, but it will take some time to weed out the irelevant ones and find the possible culprit.

I could start by concentrating on the config entries with certain keywords (eg ACPI).  If you could suggest which ones would be best, it may help.

As this issue was initially reported as a Fedora bug, but seems to have improved considerably since Fedora released a new kernel, is this still a kernel bug or an Arch packaging bug? I'd hate to waste your time on an packaging bug!
Comment 12 Lv Zheng 2014-02-28 07:13:29 UTC
H(In reply to Gareth Williams from comment #11)
> By 'patched', I meant that the Arch kernel was rebuilt with the three
> patches you provided in comment 8.

The problem seems not related to dynamic table loading.

> I could start by concentrating on the config entries with certain keywords
> (eg ACPI).  If you could suggest which ones would be best, it may help.
> 
> As this issue was initially reported as a Fedora bug, but seems to have
> improved considerably since Fedora released a new kernel, is this still a
> kernel bug or an Arch packaging bug? I'd hate to waste your time on an
> packaging bug!


That's OK.  The bug reports are always welcomed as they can help to improve software.

Let me wrote something about what I've investigated.

From the debugging message of the "context switch" that is added by you, I can see acpi hotplug is triggered during the acpi objects initialization.  I'm not sure if this is an issue and can light shadows.

By decoding the DSDT and exmining in it, I found there are many things (graphics, battery) invoked in the _REG of EC operation region.  But the _REG of EC opregion is not invoked in acpi_initialize_objects(), but in acpi_boot_ec_enable().  So this might not be an ACPICA bug.

IMO, mutex debugging information might not be needed in this case.  As it's hardly for us to determine whether the mutex acquiring/releasing messages are caused by the initilization steps (there will be plenty of such messages) or by the dead locks.  I'd prefer to ask you do some control method instrumentation to figure out which control method has consumed the time most.  I'll post a debugging patch and write some test requests later.
Comment 13 Lv Zheng 2014-02-28 07:14:44 UTC
Created attachment 127631 [details]
ACPICA: Dispatcher: Add method execution instrumentation support.

This patch can be used to trace control method execution begin/end time.
Comment 14 Lv Zheng 2014-02-28 07:18:02 UTC
Here is the test request:

1. Please apply the following patches to the kernel that appears to bootup slowly:
   attachment 127221 [details]
   attachment 127631 [details]
2. Using the kernel config that can reproduce the slow bootup to build the kernel
3. Boot the built kernel and post dmesg here.

If the issue disppeared, please unapply the 2 patches and only apply the following patch and try again:
   attachment 127221 [details]

Thanks in advance.
Comment 15 Gareth Williams 2014-02-28 09:50:33 UTC
Created attachment 127671 [details]
Dmesg with instrumentation patch

Dmesg with instrumentation patch applied.  That took 10 seconds to boot the kernel.
Comment 16 Lv Zheng 2014-03-06 07:15:43 UTC
Hi,

Let me post some investigation result.
According to the following log entries:
[    0.264004] ACPI: acpi_initialize_objects executing...

[    0.338131] ACPI: acpi_boot_ec_enable executing...

[    0.338187] ACPI: Begin method [0xffff88025384cf78:\_SB_.PCI0.LPCB.EC0_._REG] execution.
[    0.338204] ACPI: Begin method [0xffff88025384ce10:\_SB_.PCI0.LPCB.EC0_.GBAS] execution.
[    0.338508] ACPI: Begin method [0xffff88025384c438:\_SB_.PCI0.LPCB.EC0_.ECMI] execution.
[    0.338519] ACPI: Begin method [0xffff88025384c1f8:\_SB_.PCI0.LPCB.EC0_.OSTD] execution.
[    0.340339] ACPI: Begin method [0xffff88025384c3f0:\_SB_.PCI0.LPCB.EC0_.SASU] execution.
[    0.354250] ACPI: Begin method [0xffff88025384c240:\_SB_.PCI0.LPCB.EC0_.ULID] execution.
[    0.354400] ACPI: Begin method [0xffff88025384c870:\_SB_.PCI0.LPCB.EC0_._Q92] execution.
[    0.354644] ACPI: Begin method [0xffff88025384d3a8:\_SB_.PCI0.ACEL.INIT] execution.
[    0.354649] ACPI: Begin method [0xffff88025384d090:\_SB_.PCI0.ACEL._STA] execution.
[    0.364202] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    0.870765] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    1.377475] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    1.884184] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    2.394203] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    2.900890] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    3.407585] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    3.914244] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    4.420936] ACPI: Begin method [0xffff88025384c5a0:\_SB_.PCI0.LPCB.EC0_.SMRD] execution.
[    4.427698] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    4.934249] ACPI: Begin method [0xffff88025384d3f0:\_SB_.PCI0.ACEL.ADJT] execution.
[    4.934254] ACPI: Begin method [0xffff88025384d090:\_SB_.PCI0.ACEL._STA] execution.
[    4.934385] ACPI: Begin method [0xffff88025384e240:\_SB_.LID0._LID] execution.
[    4.934673] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    5.440903] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    5.950876] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    6.457514] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    6.964135] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution.
[    7.470766] ACPI: Begin method [0xffff88025384e240:\_SB_.LID0._LID] execution.

The delays happen in SMWR, each time it is called, 0.5 seconds will be consumed.
The called SMWR is listed below:

\_SB_.PCI0.LPCB.EC0_._REG
	\_SB_.PCI0.LPCB.EC0_.GBAS
	\_SB_.PCI0.LPCB.EC0_.ECMI
		\_SB_.PCI0.LPCB.EC0_.OSTD
		\_SB_.PCI0.LPCB.EC0_.SASU
		\_SB_.PCI0.LPCB.EC0_.ULID
		\_SB_.PCI0.LPCB.EC0_._Q92
		\_SB_.PCI0.ACEL.INIT
			\_SB_.PCI0.ACEL._STA
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x20, 0x5F)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x21, 0x32)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x32, 0x0D)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x33, 0x02)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x30, 0x95)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x36, 0x07)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x37, 0x05)
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x34, 0x7F)
			\_SB_.PCI0.LPCB.EC0_.SMRD
			\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x24, Local0)
			\_SB_.PCI0.ACEL.ADJT
				\_SB_.PCI0.ACEL._STA
				\_SB_.LID0._LID
				\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x36, 0x13)
				\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x37, 0x02)
				\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x34, 0x2A)
				\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x24, Zero)
				\_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x22, 0x20)
				\_SB_.LID0._LID

The SMWR function is listed below (I deleted the part of ASL since we always have Arg0=0xC6, Arg1=0x50):
DefinitionBlock ("dsdt.aml", "DSDT", 1, "HPQOEM", "2163    ", 0x00000000)
{
    Scope (_SB)
    {
        Device (PCI0)
        {
            Device (LPCB)
            {
                Device (EC0)
                {
                    OperationRegion (ERAM, EmbeddedControl, Zero, 0xFF)
                    Field (ERAM, ByteAcc, Lock, Preserve)
                    {
                        SMPR,   8, 
                        SMST,   8, 
                        SMAD,   8, 
                        SMCM,   8, 
                        SMD0,   256, 
                    }
                    Field (ERAM, ByteAcc, NoLock, Preserve)
                    {
                        Offset (0x04), 
                        SMW0,   16
                    }
                    Field (ERAM, ByteAcc, NoLock, Preserve)
                    {
                        Offset (0x04), 
                        SMB0,   8
                    }

                    Mutex (MUT0, 0x00)
                    Method (SMWR, 4, NotSerialized)
                    {
                        Acquire (MUT0, 0xFFFF)
                        Store (0x04, Local0)
                        While (LGreater (Local0, One))
                        {
                            Store (Arg3, SMB0)
                            And (SMST, 0x40, SMST)
                            Store (Arg2, SMCM)
                            Store (Arg1, SMAD)
                            Store (Arg0, SMPR)
                            Store (Zero, Local3)
                            While (LNot (And (SMST, 0xBF, Local1)))
                            {
                                Sleep (0x02)
                                Increment (Local3)
                                If (LEqual (Local3, 0x32))
                                {
                                    And (SMST, 0x40, SMST)
                                    Store (Arg2, SMCM)
                                    Store (Arg1, SMAD)
                                    Store (Arg0, SMPR)
                                    Store (Zero, Local3)
                                }
                            }

                            If (LEqual (Local1, 0x80))
                            {
                                Store (Zero, Local0)
                            }
                            Else
                            {
                                Decrement (Local0)
                            }
                        }

                        If (Local0)
                        {
                            Store (And (Local1, 0x1F), Local0)
                        }

                        Release (MUT0)
                        Return (Local0)
                    }
                }
            }
        }
    }
}
It looks like a BIOS issue, but we may dig futher.
We may try to see which EC operation has caused the delay.
Let me think about the next debugging step.
I'll post debugging patch later.
Comment 17 Lv Zheng 2014-03-07 02:43:05 UTC
Created attachment 128341 [details]
ACPICA: Dispatcher: Add control method tracing once support.

In order not to inflating your dmesg size, I added trace once support.
Comment 18 Lv Zheng 2014-03-07 02:44:01 UTC
Created attachment 128351 [details]
ACPICA: Events: Add EC access instrumentation support.

Then we can trace EC operation region accesses with bounded logging overhead.
Comment 19 Lv Zheng 2014-03-07 02:44:43 UTC
Created attachment 128361 [details]
ACPICA: Executer: Add Sleep/Stall opcode instrumentation support.

Also we can trace Sleep execution with bounded logging overhead.
Comment 20 Lv Zheng 2014-03-07 02:49:42 UTC
Here is the test request:

1. Please apply the following patches to the kernel that appears to bootup slowly:
   attachment 127631 [details]
   attachment 128341 [details]
   attachment 128351 [details]
   attachment 128361 [details]
2. Using the kernel config that can reproduce the slow bootup to build the kernel;
3. Boot the built kernel with "acpi_trace_once=\_SB_.PCI0.LPCB.EC0_._REG";
4. If the dmesg output is not too big, please post dmesg here;
5. If the dmesg output is too big, please boot the kernel with "acpi_trace_once=\_SB_.PCI0.LPCB.EC0_.SMWR" and post the dmesg here.

And don't forget to add under scores "_" after _SB and EC0. They are required.

Thanks in advance.
Comment 21 Gareth Williams 2014-03-07 19:52:15 UTC
Attachment 128341 [details] fails at the second hunk.
I've checked the patch and the hunk lines the patch expects to find aren't in the original source (kernel 3.13.5)
Was the patch against that version?

I edited the patch so that:-

extern const char *acpi_gbl_trace_once_method;

is added to the acpifx.h file and although the patch then worked, the kernel failed to compile later.
Comment 22 Gareth Williams 2014-03-07 19:52:57 UTC
Attachment 128341 [details] fails at the second hunk.
I've checked the patch and the hunk lines the patch expects to find aren't in the original source (kernel 3.13.5)
Was the patch against that version?

I edited the patch so that:-

    extern const char *acpi_gbl_trace_once_method;

is added to the acpifx.h file and although the patch then worked, the kernel failed to compile later.
Comment 23 Lv Zheng 2014-03-08 01:27:44 UTC
Created attachment 128571 [details]
ACPICA: Dispatcher: Add control method tracing once support.

The attachment 128341 [details] is not suitable for your kernel version.
Please use this instead.

Sorry for the mistake.
Comment 24 Gareth Williams 2014-03-08 14:19:45 UTC
Created attachment 128591 [details]
Dmesg with _SB_.PCI0.LPCB.EC0_._REG Debug

Thanks for the new patch - it worked fine.  Should I be on a different kernel? It's no major issue for me to be on any version (within reason!) if it helps.
I've attached the dmesg log.
Comment 25 Lv Zheng 2014-03-09 03:29:49 UTC
I'm working on linux-pm/linux-next branch where latest ACPICA patches are shipped.  But you don't need to do that.

I checked the dmesg, there is a mistake.
You shouldn't use
acpi_trace_once=_SB_.PCI0_LPCB.EC0_._REG
You should use
acpi_trace_once=\_SB_.PCI0_LPCB.EC0_._REG

The backslash is required, otherwise the EC0._REG method executioin couldn't be matched and the tracing thus couldn't turned on.

Please correct the boot parameter and try again.

Thanks
Comment 26 Gareth Williams 2014-03-09 16:12:44 UTC
I've hit a problem:-

I've tried to pass the above kernel paramters both by editing grub.cfg and by editing the parameters directly at boot time in grub.  However, I get the following results:-

Entered (after acpi_trace_once=)     Results (in dmesg and /proc/cmdline)
=======                              =======
\_SB_.PCI0_LPCB.EC0_._REG            _SB_.PCI0_LPCB.EC0_._REG
\\_SB_.PCI0_LPCB.EC0_._REG           \\_SB_.PCI0_LPCB.EC0_._REG
\\\_SB_.PCI0_LPCB.EC0_._REG          \\_SB_.PCI0_LPCB.EC0_._REG
'\_SB_.PCI0_LPCB.EC0_._REG'          \\_SB_.PCI0_LPCB.EC0_._REG
"\_SB_.PCI0_LPCB.EC0_._REG"          \\_SB_.PCI0_LPCB.EC0_._REG

The reason you had no backslash was because I only entered one (I remember doing so as I have a non US keyboard, therefore had to search for the correct key to press to give me a backslash in grub) and assumed, incorrectly, that it would be passed to the kernel.

How do I insert a single backslash into the kernel parameters?

Thanks in advance.
Comment 27 Lv Zheng 2014-03-09 20:24:16 UTC
Created attachment 128741 [details]
My DMESG output

I didn't face such issues.
You might be using grub2 while I'm using grub.

My grub entry is:
title		Debian GUN/Linux, acpica (Robert Moore)
root		(hd0,1)
kernel		/vmlinuz-acpica-z530 root=/dev/sda6 threadirqs acpi.debug_layer=0x00000040 acpi_trace_once=\_SB_.PCI0.LPCB.BAT1._STA
initrd		/initrd.img-3.2.0-2-686-pae

This attachment is my dmesg output.

The following lines are useful for us to dig your reported issue deeper.
[    0.164199] ACPI: Begin method [0xc0010978:\_SB_.PCI0.LPCB.BAT1._STA] execution.
[    0.164387] ACPI: Tracing once enabled: c0010978:\_SB_.PCI0.LPCB.BAT1._STA
[    0.164571] ACPI: Begin method [0xc00104f8:\_SB_.PCI0.LPCB.ECOK] execution.
[    0.164781] ACPI: End method [0xc00104f8] execution.
[    0.164963] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.BAL0)
[    0.168096] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.BAL0, 0x00000011)
[    0.168239] ACPI: Begin Sleep(20 ms)
[    0.192021] ACPI: End Sleep(20 ms)
[    0.192223] ACPI: Tracing once disabled: c0010978
[    0.192400] ACPI: End method [0xc0010978] execution.

So I have no idea what is happening to your grub.
Anyway, we neednt' work on grub2 issues at bugzilla.kernel.org.
We can simply work it around.
I'll post a workaround patch later in this thread.
Comment 28 Lv Zheng 2014-03-09 20:25:38 UTC
Created attachment 128751 [details]
ACPICA: Workaround to match method where backslash cannot be specified by the bootloader.

This patch can be used to workaround your issue.

Let me describe the test steps later.
Comment 29 Lv Zheng 2014-03-09 20:30:47 UTC
Here is the test request for kernel 3.13.5:

1. Please apply the following patches to the kernel that appears to bootup slowly:
   attachment 127631 [details]
   attachment 128571 [details]
   attachment 128351 [details]
   attachment 128361 [details]
   attachment 128751 [details]
2. Using the kernel config that can reproduce the slow bootup to build the kernel;
3. Boot the built kernel with "acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG";
4. If the dmesg output is not too big, please post dmesg here (you can check if it is working before posting according to the dmesg output example I posted in comment 27);
5. If the dmesg output is too big, please boot the kernel with "acpi_trace_once=_SB_.PCI0.LPCB.EC0_.SMWR" and post the dmesg here.

And don't forget to add under scores "_" after _SB and EC0. They are required.

Thanks in advance.
Comment 30 Gareth Williams 2014-03-09 21:19:21 UTC
Created attachment 128761 [details]
dmesg with acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG
Comment 31 Lv Zheng 2014-03-10 02:15:47 UTC
This time, it was not SMWR, but OSTD has consumed 0.5seconds.
[    0.837768] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG)
[    1.337293] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG, 0x00000000)

This function is much more simpler.
As we know ECOK should equal to one and OSYS should be the latest Windows, this function can be simplified as:
DefinitionBlock ("dsdt.aml", "DSDT", 1, "HPQOEM", "2163    ", 0x00000000)
{
    Scope (_SB)
    {
        Device (PCI0)
        {
            Device (LPCB)
            {
                Device (EC0)
                {
                    Name (_HID, EisaId ("PNP0C09"))  // _HID: Hardware ID
                    OperationRegion (ERAM, EmbeddedControl, Zero, 0xFF)
                    Field (ERAM, ByteAcc, Lock, Preserve)
                    {
                        Offset (0x41), 
                        W7OS,   1, 
                        QWOS,   1, 
                            ,   1, 
                        SUSE,   1, 
                        RFLG,   1, 
                            ,   1, 
                            ,   1, 
                    }
                    Method (OSTD, 0, NotSerialized)
                    {
                        Store (Zero, W7OS)
                        Store (Zero, SUSE)
                        Store (Zero, RFLG)
                        Store (One, W7OS)
                    }
                }
            }
        }
    }
}

The log also shows:
[    0.338235] ACPI: Begin method [0xffff88024f84c1f8:\_SB_.PCI0.LPCB.EC0_.OSTD] execution.

Store (Zero, W7OS):
[    0.338244] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS)
[    0.338341] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000001)
[    0.338344] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000000)
[    0.338536] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS)

Store (Zero, SUSE):
[    0.338542] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.SUSE)
[    0.837581] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.SUSE, 0x00000000)
[    0.837584] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.SUSE, 0x00000000)
[    0.837762] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.SUSE)

Store (Zero, RFLG):
[    0.837768] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG)
[    1.337293] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG, 0x00000000) <- 0.5seconds delay
[    1.337296] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.RFLG, 0x00000000)
[    1.337462] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.RFLG)

Store (One, W7OS):
[    1.337488] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS)
[    1.337592] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000000)
[    1.337595] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000001)
[    1.337772] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS)
[    1.337778] ACPI: End method [0xffff88024f84c1f8] execution.

Since the delay is various each time you boot and obviously has occurred in the EC driver where EC operation region has happened, my conclusion is this is not an ACPICA issue, but is an issue either caused by the EC driver or by the EC firmware.

Since no other platform has reported such issue, it seems the culprit is most likely the EC firmware of your platform.  I can help to confirm this by enabling EC debugging information.  Let me enable EC driver debugging to confirm.  I'll post a debugging patch later.
Comment 32 Lv Zheng 2014-03-10 02:18:34 UTC
Created attachment 128791 [details]
ACPI: Enable EC driver debugging support.

Enabling EC driver debugging.
Comment 33 Lv Zheng 2014-03-10 02:21:27 UTC
Here is the test request:

1. Please apply the following patches to the kernel that appears to bootup slowly:
   attachment 127631 [details]
   attachment 128571 [details]
   attachment 128351 [details]
   attachment 128361 [details]
   attachment 128751 [details]
   attachment 128791 [details]
2. Using the kernel config that can reproduce the slow bootup to build the kernel;
3. Boot the built kernel with "acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG";
4. If the dmesg output is not too big, please post dmesg here.

And don't forget to add under scores "_" after _SB and EC0. They are required.

Thanks in advance.
Comment 34 Gareth Williams 2014-03-10 21:11:45 UTC
Created attachment 128871 [details]
Dmesg with EC logging
Comment 35 Lv Zheng 2014-03-11 00:35:27 UTC
The following "message" is accompanied each time with the delay:
[   xx.xxxxxx] ACPI : EC: controller reset, restart transaction

Looks like a driver issue.
Give me sometime to learn the code and fix it.

Thanks for reporting.
Comment 36 Lan Tianyu 2014-03-13 04:32:50 UTC
Please try with kernel param "ec.ec_delay=50". This should be able to decrease the delay.
Comment 37 Gareth Williams 2014-03-13 06:46:43 UTC
This is dmesg with ec.ec_delay=50

The delay is approx 2.6 seconds.

[    0.260663] ACPI: Added _OSI(Module Device)
[    0.260665] ACPI: Added _OSI(Processor Device)
[    0.260667] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.260668] ACPI: Added _OSI(Processor Aggregator Device)
[    0.265447] ACPI: Executed 1 blocks of module-level executable AML code
[    0.281431] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[    0.293676] ACPI: SSDT 0000000092e6ac18 0003D3 (v01  PmRef  Cpu0Cst 00003001 INTL 20121220)
[    0.294276] ACPI: Dynamic OEM Table Load:
[    0.294278] ACPI: SSDT           (null) 0003D3 (v01  PmRef  Cpu0Cst 00003001 INTL 20121220)
[    0.307180] ACPI: SSDT 0000000092e6a618 0005AA (v01  PmRef    ApIst 00003000 INTL 20121220)
[    0.307874] ACPI: Dynamic OEM Table Load:
[    0.307876] ACPI: SSDT           (null) 0005AA (v01  PmRef    ApIst 00003000 INTL 20121220)
[    0.320321] ACPI: SSDT 0000000092e69d98 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
[    0.320914] ACPI: Dynamic OEM Table Load:
[    0.320916] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20121220)
[    2.974744] ACPI: Interpreter enabled
[    2.974756] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20131115/hwxface-580)
[    2.974764] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20131115/hwxface-580)
[    2.974791] ACPI: (supports S0 S3 S4 S5)
[    2.974793] ACPI: Using IOAPIC for interrupt routing
[    2.974833] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    2.974998] ACPI: No dock devices found.
[    2.984644] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
[    2.984651] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    2.984750] \_SB_.PCI0:_OSC invalid UUID
[    2.984751] _OSC request data:1 1f 0 
[    2.984757] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM
Comment 38 Gareth Williams 2014-03-13 06:52:50 UTC
I've just rebooted without the ec.ec_delay=50 kernel parameter and the boot time was similar - delay of 2.5 seconds.
Comment 39 Lv Zheng 2014-03-13 06:58:17 UTC
(In reply to Gareth Williams from comment #38)
> I've just rebooted without the ec.ec_delay=50 kernel parameter and the boot
> time was similar - delay of 2.5 seconds.

Tuning ec_delay cannot fix the real issue.
Let's continue to find if any gaps have caused such frequent controller resets.
I think this is a good chance and you can help us to improve the Linux quality.
Comment 40 Gareth Williams 2014-05-22 05:21:33 UTC
Is there any progress on this issue? Fedora are going through a mass bug update and the downstream bug for this #1055228 is one they've earmarked for closure if there is no progress.  I've just booted with kernel 3.14.4-200 and the kernel took 7.562 seconds.  dmesg showed the delay of 5 seconds at the same point, so the newer kernel hasn't fixed the issue.
Comment 41 Lv Zheng 2014-05-27 06:20:28 UTC
Hi,

I'm going to work on this.
Could you offer a dmidecode output here so that I can see if any quirks are enabled for your platform.

Thanks in advance.
Comment 42 Steffen Weber 2014-05-27 06:32:03 UTC
Created attachment 137461 [details]
HP EliteBook 840 G1 dmidecode

On my HP EliteBook 840 G1 running vanilla Linux 3.14.4 I'm only seeing a 1.5 second delay before the message "ACPI: Interpreter enabled" appears. But maybe my dmidecode output can help you to spot some similarities (see attachment).

...
[    0.112242] ACPI: Added _OSI(Module Device)
[    0.112350] ACPI: Added _OSI(Processor Device)
[    0.112459] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.112568] ACPI: Added _OSI(Processor Aggregator Device)
[    0.125802] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[    0.141887] ACPI: SSDT 00000000bbde2918 0003F3 (v01  PmRef  Cpu0Cst 00003001 INTL 20110112)
[    0.142617] ACPI: Dynamic OEM Table Load:
[    0.142803] ACPI: SSDT           (null) 0003F3 (v01  PmRef  Cpu0Cst 00003001 INTL 20110112)
[    0.148143] ACPI: SSDT 00000000bbde3698 0005DB (v01  PmRef    ApIst 00003000 INTL 20110112)
[    0.148939] ACPI: Dynamic OEM Table Load:
[    0.149122] ACPI: SSDT           (null) 0005DB (v01  PmRef    ApIst 00003000 INTL 20110112)
[    0.152011] ACPI: SSDT 00000000bbde1d98 000119 (v01  PmRef    ApCst 00003000 INTL 20110112)
[    0.152735] ACPI: Dynamic OEM Table Load:
[    0.152917] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20110112)
[    1.664299] ACPI: Interpreter enabled
...
Comment 43 Gareth Williams 2014-05-27 06:37:35 UTC
Created attachment 137471 [details]
dmidecode output from HP Pavilion

Attached the dmidecode output as it was rather large to paste inline.

Thanks in advance.
Comment 44 Lv Zheng 2014-05-30 06:46:05 UTC
A write to the command register doesn't flag the CMD bit:
[    0.335165] ACPI : EC: transaction start (cmd=0x80, addr=0x41)
[    0.335166] ACPI : EC: <--- command = 0x80
[    0.335168] ACPI : EC: ---> status = 0x00
If we could reset at this point immediately, then there wouldn't be such a delay.
Though I found lots of possible improvements can be done to the EC driver, let me start from fixing this issue first.
Comment 45 Lv Zheng 2014-06-10 03:07:27 UTC
Hi,

The CMD field should be used by firmware, it's hard for OSPM to check this field...
I hope this was not your firmware bug.

Recently I did some improvements in EC driver, it acts faster than the original one.
Let us give it a try to find any possibilities to fix it in the OSPM side.
Comment 46 Lv Zheng 2014-06-10 03:09:44 UTC
Created attachment 138701 [details]
[PATCH 1/8] ACPICA: Events: Fix edge-triggered GPE by disabling before acknowledging it.

This is an ACPICA fix to be more robust to GPE handlings.
Comment 47 Lv Zheng 2014-06-10 03:14:53 UTC
Created attachment 138711 [details]
[PATCH 2/8] ACPI: Update interrupt handler to honor new ACPI_REENABLE_GPE bit.

A small fix that is not related to this issue, but is part of my improvements.
Comment 48 Lv Zheng 2014-06-10 03:16:47 UTC
Created attachment 138721 [details]
[PATCH 3/8] ACPI/EC: Add detailed fields debugging support of EC_SC(R).

This makes EC debugging log more useful.
Comment 49 Lv Zheng 2014-06-10 03:18:02 UTC
Created attachment 138731 [details]
[PATCH 4/8] ACPI/EC: Add asynchronous command byte write support.

The big improvement to improve EC throughput.
Comment 50 Lv Zheng 2014-06-10 03:19:28 UTC
Created attachment 138741 [details]
[PATCH 5/8] ACPI/EC: Remove duplicated ec_wait_ibf0() waiter.

I was thinking the out-of-order waiter could trigger something wrong on your platform.  So this patch is important.
Comment 51 Lv Zheng 2014-06-10 03:23:49 UTC
Created attachment 138751 [details]
[PATCH 6/8] ACPI/EC: Add reference counting for query handlers.

Removing kmalloc can help me to reduce one useless work queue scheduling, moving some code to the interrupt context.
Comment 52 Lv Zheng 2014-06-10 03:27:27 UTC
Created attachment 138761 [details]
[PATCH 7/8] ACPI/EC: Move "struct transaction" to drivers/acpi/internal.h.

This is a preparation for the last patch.
Comment 53 Lv Zheng 2014-06-10 03:29:20 UTC
Created attachment 138771 [details]
[PATCH 8/8] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support.

Improvement for EVT_SCI interrupts.
Comment 54 Lv Zheng 2014-06-10 03:34:11 UTC
This time we only need to track EC drivers, so no need to apply ACPICA interpreter tracer patches.

Here is the test request:
1. Please apply the following patches to the latest kernel that appears to bootup slowly:
   attachment 138701 [details]
   attachment 138711 [details]
   attachment 138721 [details]
   attachment 138731 [details]
   attachment 138741 [details]
   attachment 138751 [details]
   attachment 138761 [details]
   attachment 138771 [details]
   attachment 128791 [details]
2. Using the kernel config that can reproduce the slow bootup to build the kernel;
3. Please post dmesg here.

Thanks in advance.
Comment 55 Steffen Weber 2014-06-10 19:05:49 UTC
Created attachment 138961 [details]
dmesg of Linux 3.15.0 + 9 patches
Comment 56 Lv Zheng 2014-06-10 21:44:13 UTC
(In reply to Steffen Weber from comment #55)
> Created attachment 138961 [details]
dmesg of Linux 3.15.0 + 9 patches

It seems bug has been solved as there is no following log in your dmesg again:
[   xx.xxxxxx] ACPI : EC: controller reset, restart transaction

Before closing this bug, I'd wait for Gareth Williams to confirm as he is the original reporter.

Thanks!
Comment 57 Lv Zheng 2014-06-10 21:56:11 UTC
Hi, Steffen Weber

You issue is different from the one Gareth Williams has reported.
He is sufferring from abnormal EC transaction aborts.
In your case:
[    0.187052] ACPI : EC: transaction end
[    0.592774] ACPI : EC: transaction start (cmd=0x80, addr=0xe7)
[    0.612285] ACPI : EC: transaction end
[    0.715032] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    0.731218] ACPI : EC: transaction end
[    0.833319] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    0.850160] ACPI : EC: transaction end
[    0.951600] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    0.968079] ACPI : EC: transaction end
[    1.069886] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    1.086994] ACPI : EC: transaction end
[    1.189133] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    1.205930] ACPI : EC: transaction end
[    1.309369] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    1.325862] ACPI : EC: transaction end
[    1.429587] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    1.445792] ACPI : EC: transaction end
[    1.549807] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
[    1.566753] ACPI : EC: transaction end
[    1.670002] ACPI : EC: transaction start (cmd=0x80, addr=0x44)
There is simply something delays between EC transactions.
You need to follow Comment 33 to perform a test to see if it is abnormal.
Maybe this is just caused by some Stall/Sleep AML code.

Thanks.
Comment 58 Lv Zheng 2014-06-11 06:52:21 UTC
(In reply to Lv Zheng from comment #56)
> (In reply to Steffen Weber from comment #55)
> > Created attachment 138961 [details]
> dmesg of Linux 3.15.0 + 9 patches
> 
> It seems bug has been solved as there is no following log in your dmesg
> again:
> [   xx.xxxxxx] ACPI : EC: controller reset, restart transaction
> 
> Before closing this bug, I'd wait for Gareth Williams to confirm as he is
> the original reporter.
> 
> Thanks!

I noticed the patches has caused many out of control QR_SC command.
Let me refresh one of them.

Thanks
Comment 59 Lv Zheng 2014-06-11 07:25:16 UTC
(In reply to Steffen Weber from comment #55)
> Created attachment 138961 [details]
> dmesg of Linux 3.15.0 + 9 patches

Hi, Steffen

I see another issue that is not related to your reported one.
There are many such logs in your dmesg:

[   40.475141] ACPI : EC: ~~~> IRQ
[   40.475151] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   40.475154] ACPI : EC: EC_SC(W) = 0x84
[   40.476154] ACPI : EC: ~~~> IRQ
[   40.476164] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[   40.476169] ACPI : EC: EC_DATA(R) = 0x0d

I was thinking it was caused by my cleanup.
But I checked my code, there wasn't something wrong.

It seems there is no _Q0D method in your ACPI namespace.
Which seems like a firmware bug.
Could you upload acpidump of your platform to confirm?

Thanks
Comment 60 Steffen Weber 2014-06-11 13:49:07 UTC
Created attachment 139101 [details]
HP EliteBook 840 G1 acpidump
Comment 61 Steffen Weber 2014-06-11 13:53:39 UTC
(In reply to Lv Zheng from comment #57)
> [...]
> 
> There is simply something delays between EC transactions.
> You need to follow Comment 33 to perform a test to see if it is abnormal.
> Maybe this is just caused by some Stall/Sleep AML code.

I've applied the patches from comment #33 to Linux 3.15.0 but compiling them fails. Could you please provide updated patches or tell me which Linux version they've been created for?

  CC      drivers/acpi/acpica/utglobal.o
In file included from drivers/acpi/acpica/accommon.h:59:0,
                 from drivers/acpi/acpica/utglobal.c:48:
drivers/acpi/acpica/acglobal.h:174:62: error: macro "ACPI_INIT_GLOBAL" requires 3 arguments, but only 2 given
 const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL);
                                                              ^
drivers/acpi/acpica/acglobal.h:175:77: error: macro "ACPI_INIT_GLOBAL" requires 3 arguments, but only 2 given
 union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object, NULL);
                                                                             ^
drivers/acpi/acpica/acglobal.h:175:28: error: conflicting types for ‘ACPI_INIT_GLOBAL’
 union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object, NULL);
                            ^
drivers/acpi/acpica/acglobal.h:174:13: note: previous declaration of ‘ACPI_INIT_GLOBAL’ was here
 const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL);
             ^
scripts/Makefile.build:318: recipe for target 'drivers/acpi/acpica/utglobal.o' failed
make[3]: *** [drivers/acpi/acpica/utglobal.o] Error 1
scripts/Makefile.build:465: recipe for target 'drivers/acpi/acpica' failed
make[2]: *** [drivers/acpi/acpica] Error 2
scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed
make[1]: *** [drivers/acpi] Error 2
Makefile:878: recipe for target 'drivers' failed
make: *** [drivers] Error 2
Comment 62 Lv Zheng 2014-06-11 14:11:26 UTC
(In reply to Steffen Weber from comment #61)
> (In reply to Lv Zheng from comment #57)
> > [...]
> > 
> > There is simply something delays between EC transactions.
> > You need to follow Comment 33 to perform a test to see if it is abnormal.
> > Maybe this is just caused by some Stall/Sleep AML code.
> 
> I've applied the patches from comment #33 to Linux 3.15.0 but compiling them
> fails. Could you please provide updated patches or tell me which Linux
> version they've been created for?
> 
>   CC      drivers/acpi/acpica/utglobal.o
> In file included from drivers/acpi/acpica/accommon.h:59:0,
>                  from drivers/acpi/acpica/utglobal.c:48:
> drivers/acpi/acpica/acglobal.h:174:62: error: macro "ACPI_INIT_GLOBAL"
> requires 3 arguments, but only 2 given
>  const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL);
>                                                               ^
> drivers/acpi/acpica/acglobal.h:175:77: error: macro "ACPI_INIT_GLOBAL"
> requires 3 arguments, but only 2 given
>  union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object,
> NULL);
>                                                                             
> ^
> drivers/acpi/acpica/acglobal.h:175:28: error: conflicting types for
> ‘ACPI_INIT_GLOBAL’
>  union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object,
> NULL);
>                             ^
> drivers/acpi/acpica/acglobal.h:174:13: note: previous declaration of
> ‘ACPI_INIT_GLOBAL’ was here
>  const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL);
>              ^
> scripts/Makefile.build:318: recipe for target
> 'drivers/acpi/acpica/utglobal.o' failed
> make[3]: *** [drivers/acpi/acpica/utglobal.o] Error 1
> scripts/Makefile.build:465: recipe for target 'drivers/acpi/acpica' failed
> make[2]: *** [drivers/acpi/acpica] Error 2
> scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed
> make[1]: *** [drivers/acpi] Error 2
> Makefile:878: recipe for target 'drivers' failed
> make: *** [drivers] Error 2

OK.
So please file a new bug of 1.5 unknown delay for your platform.
Let us stick on EC bugs in this thread.
I may also need your help.
The dmesg you've posted doesn't seem to be good and I improved my patchset, so need you post a dmesg again for me to confirm the improvements are valid.

Thanks
Comment 63 Lv Zheng 2014-06-11 14:14:21 UTC
I purified the patchset:

1. splits the 2 race condition fixes into seperate patches. They might be the root causes of the 6 second delay.
2. improves the last patch, allowing command to be inserted between 2 QR_EC. Steffen's dmesg showed a case that if OSPM doesn't have a handler for the EC notification, we should still reserve chances for commands to be executed.

Let me post the patches and write some testing requests.
Comment 64 Lv Zheng 2014-06-11 14:37:51 UTC
(In reply to Steffen Weber from comment #60)
> Created attachment 139101 [details]
> HP EliteBook 840 G1 acpidump

The grep result of decompiled .dsl files showed no _Q0D implemented:
                    Method (_Q03, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q04, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q05, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q06, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q08, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q0A, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q09, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q0B, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q0C, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q18, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q19, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q20, 0, NotSerialized)  // _Qxx: EC Query
                    Method (_Q21, 0, NotSerialized)  // _Qxx: EC Query
Comment 65 Lv Zheng 2014-06-11 14:39:42 UTC
Created attachment 139121 [details]
[PATCH 01/11] ACPICA: Events: Fix edge-triggered GPE by disabling before acknowledging it.

No content updates for the original patch.
Comment 66 Lv Zheng 2014-06-11 14:41:10 UTC
Created attachment 139131 [details]
[PATCH 02/11] ACPI/EC: Add detailed fields debugging support of EC_SC(R).

Register field debugging is protected by new macro in order not to trigger regressions.
Comment 67 Lv Zheng 2014-06-11 14:42:54 UTC
Created attachment 139141 [details]
[PATCH 03/11] ACPI/EC: Fix an issue that advance_transaction() processes staled status.

The race condition fix is split from original patch 4.
This might be one of the root causes that have caused this 6s-delay issue.
Comment 68 Lv Zheng 2014-06-11 14:44:21 UTC
Created attachment 139151 [details]
[PATCH 04/11] ACPI/EC: Add reference counting for query handlers.

An improvement to reduce useless dynamic allocation.
Comment 69 Lv Zheng 2014-06-11 14:45:56 UTC
Created attachment 139161 [details]
[PATCH 05/11] ACPI/EC: Add asynchronous command byte write support.

The updated patch 4.
The 1st race condition fix is removed from this patch.
This is a preparation for the 2nd race condition fix.
Comment 70 Lv Zheng 2014-06-11 14:47:11 UTC
Created attachment 139171 [details]
[PATCH 06/11] ACPI/EC: Remove duplicated ec_wait_ibf0() waiter.

Removing some design entropy so that the 2nd race condition fix can be obvious and easier.
Comment 71 Lv Zheng 2014-06-11 14:48:38 UTC
Created attachment 139181 [details]
[PATCH 07/11] ACPI/EC: Fix a race condition in ec_transaction_completed().

Another race condition fix.
This might be one of the root causes that have caused the 6s-delay issue.
Comment 72 Lv Zheng 2014-06-11 14:49:52 UTC
Created attachment 139191 [details]
[PATCH 08/11] ACPI/EC: Move "struct transaction" to drivers/acpi/internal.h.

The preparation of the last improvement.
Comment 73 Lv Zheng 2014-06-11 14:54:18 UTC
Created attachment 139201 [details]
[PATCH 09/11] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support.

Unless necessary, drivers should try to utilize the interrupt context and the user process context.
There should only be limited cases a driver need to create new context.

This patch is improved to handle the case in the dmesg posted by Steffan.
If a _Qxx method is not provided by the BIOS, we can see one line warning message and the normal command transactions will not be starving due to the unhandled EVT_SCI.
Comment 74 Lv Zheng 2014-06-11 14:55:52 UTC
Created attachment 139211 [details]
[PATCH 10/11] ACPI/EC: Enable debbuging messages.

EC debugging patch need to be updated due to the updates in the patch 2.
Comment 75 Lv Zheng 2014-06-11 14:56:58 UTC
Created attachment 139221 [details]
[PATCH 11/11] ACPI/EC: Enable detailed register bit fields debugging.

To utilize new debugging facility.
Comment 76 Lv Zheng 2014-06-11 15:03:34 UTC
Here is the test request for Gareth Williams:
1. Please apply the following patches to the latest kernel that appears to bootup slowly:
   attachment 139121 [details]
   attachment 139131 [details]
   attachment 139141 [details]
   attachment 139151 [details]
   attachment 139161 [details]
   attachment 139171 [details]
   attachment 139181 [details]
   attachment 139211 [details]
   attachment 139221 [details]
2. Using the kernel config that can reproduce the slow bootup to build the kernel;
3. Please post dmesg here.
I need to confirm if 139141 and 139181 fixes can eliminate the 6s-delay.

Here is the test request for Steffen Weber:
1. Please apply the following patches to the latest kernel:
   attachment 139121 [details]
   attachment 139131 [details]
   attachment 139141 [details]
   attachment 139151 [details]
   attachment 139161 [details]
   attachment 139171 [details]
   attachment 139181 [details]
   attachment 139191 [details]
   attachment 139201 [details]
   attachment 139211 [details]
   attachment 139221 [details]
2. Using the kernel config that can reproduce EVT_SCI storm to build the kernel;
3. Please post dmesg here.
I need to confirm if the improvement made in 139201 can effectively allowing commands to be executed along with the EVT_SCI storm.

Thanks in advance.
Comment 77 Steffen Weber 2014-06-11 16:17:50 UTC
(In reply to Lv Zheng from comment #62)
> So please file a new bug of 1.5 unknown delay for your platform.
> Let us stick on EC bugs in this thread.

I'll do that later today, thanks!

(In reply to Lv Zheng from comment #76)
> Here is the test request for Steffen Weber:
> 1. Please apply the following patches to the latest kernel:
>    attachment 139121 [details]
>    attachment 139131 [details]
>    attachment 139141 [details]
>    attachment 139151 [details]
>    attachment 139161 [details]
>    attachment 139171 [details]
>    attachment 139181 [details]
>    attachment 139191 [details]
>    attachment 139201 [details]
>    attachment 139211 [details]
>    attachment 139221 [details]
> 2. Using the kernel config that can reproduce EVT_SCI storm to build the
> kernel;
> 3. Please post dmesg here.
> I need to confirm if the improvement made in 139201 can effectively allowing
> commands to be executed along with the EVT_SCI storm.

I've applied the patches to Linux 3.15.0 but got the following compile error:

  CC      drivers/acpi/ec.o
drivers/acpi/ec.c: In function ‘complete_transaction’:
drivers/acpi/ec.c:196:9: error: ‘struct acpi_ec’ has no member named ‘ctrl’
       ec->ctrl.command == ACPI_EC_COMMAND_QUERY) {
         ^
drivers/acpi/ec.c:197:18: error: ‘struct acpi_ec’ has no member named ‘pending_query’
    query_bit = ec->pending_query;
                  ^
drivers/acpi/ec.c: In function ‘prepare_control_transaction’:
drivers/acpi/ec.c:225:12: error: ‘struct acpi_ec’ has no member named ‘ctrl’
  memset(&ec->ctrl, 0, sizeof (struct acpi_ec_transaction));
            ^
drivers/acpi/ec.c:226:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’
  ec->ctrl.command = command;
    ^
drivers/acpi/ec.c:227:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’
  ec->ctrl.rdata = rdata;
    ^
drivers/acpi/ec.c:228:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’
  ec->ctrl.rlen = rlen;
    ^
drivers/acpi/ec.c: In function ‘advance_transaction’:
drivers/acpi/ec.c:256:10: error: ‘struct acpi_ec’ has no member named ‘ctrl’
   t = &ec->ctrl;
          ^
drivers/acpi/ec.c:259:14: error: ‘struct acpi_ec’ has no member named ‘pending_query’
           &ec->pending_query, 1);
              ^
scripts/Makefile.build:318: recipe for target 'drivers/acpi/ec.o' failed
make[2]: *** [drivers/acpi/ec.o] Error 1
scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed
make[1]: *** [drivers/acpi] Error 2
Makefile:878: recipe for target 'drivers' failed
make: *** [drivers] Error 2
Comment 78 Lv Zheng 2014-06-11 17:31:04 UTC
Created attachment 139231 [details]
[PATCH 09/11] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support.

Sorry, I made a mistake when refreshing patches with quilt.
Please use this instead of attachment 139201 [details] and try again.
Comment 79 Lv Zheng 2014-06-11 17:33:14 UTC
(In reply to Steffen Weber from comment #77)
> (In reply to Lv Zheng from comment #62)
> 
> I've applied the patches to Linux 3.15.0 but got the following compile error:
> 
>   CC      drivers/acpi/ec.o
> drivers/acpi/ec.c: In function ‘complete_transaction’:
> drivers/acpi/ec.c:196:9: error: ‘struct acpi_ec’ has no member named ‘ctrl’
>        ec->ctrl.command == ACPI_EC_COMMAND_QUERY) {
>          ^
> drivers/acpi/ec.c:197:18: error: ‘struct acpi_ec’ has no member named
> ‘pending_query’
>     query_bit = ec->pending_query;
>                   ^
> drivers/acpi/ec.c: In function ‘prepare_control_transaction’:
> drivers/acpi/ec.c:225:12: error: ‘struct acpi_ec’ has no member named ‘ctrl’
>   memset(&ec->ctrl, 0, sizeof (struct acpi_ec_transaction));
>             ^
> drivers/acpi/ec.c:226:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’
>   ec->ctrl.command = command;
>     ^
> drivers/acpi/ec.c:227:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’
>   ec->ctrl.rdata = rdata;
>     ^
> drivers/acpi/ec.c:228:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’
>   ec->ctrl.rlen = rlen;
>     ^
> drivers/acpi/ec.c: In function ‘advance_transaction’:
> drivers/acpi/ec.c:256:10: error: ‘struct acpi_ec’ has no member named ‘ctrl’
>    t = &ec->ctrl;
>           ^
> drivers/acpi/ec.c:259:14: error: ‘struct acpi_ec’ has no member named
> ‘pending_query’
>            &ec->pending_query, 1);
>               ^
> scripts/Makefile.build:318: recipe for target 'drivers/acpi/ec.o' failed
> make[2]: *** [drivers/acpi/ec.o] Error 1
> scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed
> make[1]: *** [drivers/acpi] Error 2
> Makefile:878: recipe for target 'drivers' failed
> make: *** [drivers] Error 2

Here is the updated test request:
1. Please apply the following patches to the latest kernel:
   attachment 139121 [details]
   attachment 139131 [details]
   attachment 139141 [details]
   attachment 139151 [details]
   attachment 139161 [details]
   attachment 139171 [details]
   attachment 139181 [details]
   attachment 139191 [details]
   attachment 139231 [details]
   attachment 139211 [details]
   attachment 139221 [details]
2. Using the kernel config that can reproduce EVT_SCI storm to build the kernel;
3. Please post dmesg here.

Thanks in advance.
Comment 80 Steffen Weber 2014-06-11 18:02:42 UTC
Created attachment 139241 [details]
dmesg of Linux 3.15.0 + 11 patches

The patched kernel took a long time to boot, hope it helps! :)
Comment 81 Steffen Weber 2014-06-11 18:16:23 UTC
> So please file a new bug of 1.5 unknown delay for your platform.

I've just filed bug #77681.
Comment 82 Lv Zheng 2014-06-11 21:35:52 UTC
(In reply to Steffen Weber from comment #80)
> Created attachment 139241 [details]
dmesg of Linux 3.15.0 + 11 patches

The
> patched kernel took a long time to boot, hope it helps! :)

Does the previous one - "comment 55" boot?
And can you work in the booted system without seeing any siginificant "no response" period?
Comment 83 Lv Zheng 2014-06-11 21:37:22 UTC
(In reply to Steffen Weber from comment #81)
> > So please file a new bug of 1.5 unknown delay for your platform.

I've just filed bug #77681.

OK.
I'll rebase the patches and ask you again there after finishing necessary EC work.
Comment 84 Lv Zheng 2014-06-11 21:43:47 UTC
(In reply to Steffen Weber from comment #80)
> Created attachment 139241 [details]
dmesg of Linux 3.15.0 + 11 patches

The
> patched kernel took a long time to boot, hope it helps! :)

Maybe I didn't understand you correctly.
Which one do you mean in comment 55:
1. The patchset has improved the system bootup time.
2. The patchset has prevented system from response correctly.
I posted 2nd 11 patches revision because I was thinking you meant 2.
If you meant 1, then the query 0x0d logs could be ignored and I should revert changes back to the previous revision. :-)

Could you also do such an experiment:
1. Please only apply the following patch to the tested kernel:
   attachment 128791 [details]
   This is one line patch, if you failed to apply, try to edit the drivers/acpi/ec.c file yourself.
2. Using the kernel config that can reproduce EVT_SCI storm to build the kernel;
3. Please post dmesg here.
Comment 85 Steffen Weber 2014-06-12 07:40:41 UTC
In comment 55 I applied the 9 patches from comment 54. The kernel did boot up. IIRC it was about as fast as without the patches but I just saved dmesg and rebooted into the unpatched kernel after 1-2 minutes, so I cannot say that with certainty.

Yesterday in comment 80 I applied the patches from comment 79. The kernel did boot up, but the boot process took a very long time (e.g. the rootfs was mounted after 39 seconds instead of after 6 seconds, see dmesg).

So both patchsets have not improved system bootup time. The 1.5 second delay exists with both patchsets. I don't know about any other delays in my boot process that could be improved.
Comment 86 Steffen Weber 2014-06-12 07:41:22 UTC
Created attachment 139381 [details]
dmesg of Linux 3.15.0 without any patches but DEBUG enabled
Comment 87 Lv Zheng 2014-06-12 07:51:56 UTC
(In reply to Steffen Weber from comment #86)
> Created attachment 139381 [details]
> dmesg of Linux 3.15.0 without any patches but DEBUG enabled

Hi, I didn't see 0x0d query in this dmesg.
Looks not like to be generated by the same BIOS.
Could you only try these patches, without patching the last 2:
   attachment 139121 [details]
   attachment 139131 [details]
   attachment 139141 [details]
   attachment 139151 [details]
   attachment 139161 [details]
   attachment 139171 [details]
   attachment 139181 [details]
   attachment 139211 [details]
   attachment 139221 [details]
and upload the dmsg here?

Thanks in advance.
Comment 88 Steffen Weber 2014-06-12 08:26:40 UTC
Created attachment 139401 [details]
dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11
Comment 89 Steffen Weber 2014-06-12 08:36:13 UTC
I forgot that yesterday evening while debugging a Wi-Fi problem I changed the kernel configuration slightly and enabled CONFIG_HP_WMI and CONFIG_ACPI_WMI. Maybe this had an impact?
Comment 90 Lv Zheng 2014-06-12 11:21:25 UTC
(In reply to Steffen Weber from comment #88)
> Created attachment 139401 [details]
> dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11

There is nothing wrong in this patch.
Comment 91 Lv Zheng 2014-06-12 11:36:36 UTC
(In reply to Steffen Weber from comment #85)
> In comment 55 I applied the 9 patches from comment 54. The kernel did boot
> up. IIRC it was about as fast as without the patches but I just saved dmesg
> and rebooted into the unpatched kernel after 1-2 minutes, so I cannot say
> that with certainty.

The situation is:
In attachment 138961 [details]:
This is the last command:
[   11.302053] ACPI : EC: transaction start (cmd=0x80, addr=0x86)
[   11.302054] ACPI : EC: ~~~> TASK
[   11.302058] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[   11.302059] ACPI : EC: EC_SC(W) = 0x84
[   11.302061] ACPI : EC: transaction paused
It is blocked due to the EVT_SCI=1, Q=0x0D storm.
So all EC related functionality should be failed from then on.

> Yesterday in comment 80 I applied the patches from comment 79. The kernel
> did boot up, but the boot process took a very long time (e.g. the rootfs was
> mounted after 39 seconds instead of after 6 seconds, see dmesg).

But in attachment 139241 [details]:
EC commands can be inserted between QR_EC (EC_SC(W) = 0x84) from this line:
[    1.842566] ACPI : EC: transaction start (cmd=0x81, addr=0xd9)
[    1.842569] ACPI : EC: ~~~> TASK
[    1.842572] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    1.842573] ACPI : EC: EC_SC(W) = 0x84
[    1.842576] ACPI : EC: transaction paused
[    1.843555] ACPI : EC: ~~~> IRQ
[    1.843559] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1
[    1.843562] ACPI : EC: EC_DATA(R) = 0x0d
[    1.843564] ACPI : EC: no handler for query (0x d)
[    4.454375] ACPI : EC: ~~~> IRQ
[    4.454381] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[    4.454382] ACPI : EC: EC_SC(W) = 0x81
[    4.454404] ACPI : EC: transaction unpaused
[    4.455390] ACPI : EC: ~~~> IRQ
[    4.455396] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0
[    4.455397] ACPI : EC: EC_DATA(W) = 0xd9
[    4.455432] ACPI : EC: ~~~> TASK
[    4.455435] ACPI : EC: EC_SC(R) = 0x22 SCI_EVT=1 BURST=0 CMD=0 IBF=1 OBF=0
[    4.456416] ACPI : EC: ~~~> IRQ
[    4.456422] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    4.456423] ACPI : EC: EC_DATA(W) = 0x14
[    4.456443] ACPI : EC: ~~~> TASK
[    4.456446] ACPI : EC: EC_SC(R) = 0x22 SCI_EVT=1 BURST=0 CMD=0 IBF=1 OBF=0
[    4.457435] ACPI : EC: ~~~> IRQ
[    4.457441] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    4.457463] ACPI : EC: transaction end
[    4.457489] ACPI : EC: transaction start (cmd=0x81, addr=0xda)
[    4.457490] ACPI : EC: ~~~> TASK
[    4.457493] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[    4.457494] ACPI : EC: EC_SC(W) = 0x84
[    4.457497] ACPI : EC: transaction paused
Which looks good.
So I want to without applying this patchset, how old kernel survives the EVT_SCI=1 Q=0x0d storm.

> So both patchsets have not improved system bootup time. The 1.5 second delay
> exists with both patchsets. I don't know about any other delays in my boot
> process that could be improved.

Your issue is not related to EC, this patchset is supposed to fix the issue reported by Gareth.
I asked you to test here, because you can improve my patchset to survive the EVT_SCI=1 Q=0x0d storm.
But in attachment 139381 [details], I couldn't see the storm again. :-(
Can you redo the test mentioned in the comment 84?
Please try to find old configuration that can trigger EVT_SCI=1 Q=0x0d storm.
You should be able to find logs like:
[   18.302676] ACPI : EC: push gpe query to the queue
[   18.302685] ACPI : EC: <--- command = 0x84
[   18.303692] ACPI : EC: push gpe query to the queue
!!! [   18.303693] ACPI : EC: push query execution (0x d) on queue
[   18.303696] ACPI : EC: start query execution
[   18.303773] ACPI : EC: stop query execution

Please make sure that there are many lines like the one marked by "!!!" and then post the useful dmesg here.

Thanks in advance.
Comment 92 Lv Zheng 2014-06-12 11:39:22 UTC
(In reply to Lv Zheng from comment #90)
> (In reply to Steffen Weber from comment #88)
> > Created attachment 139401 [details]
> > dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11
> 
> There is nothing wrong in this patch.

And in order to confirm the storm was not triggered by my patches.
Please use this configuration, and try the whole 11 patches again.
Post the dmesg here.

I'll wait and pray that an OSPM change will not force the EC firmware to flood Q=0x0d queries.
Comment 93 Lv Zheng 2014-06-12 11:46:54 UTC
(In reply to Lv Zheng from comment #87)
> (In reply to Steffen Weber from comment #86)
> > Created attachment 139381 [details]
> > dmesg of Linux 3.15.0 without any patches but DEBUG enabled
> 
> Hi, I didn't see 0x0d query in this dmesg.
> Looks not like to be generated by the same BIOS.
> Could you only try these patches, without patching the last 2:
>    attachment 139121 [details]
>    attachment 139131 [details]
>    attachment 139141 [details]
>    attachment 139151 [details]
>    attachment 139161 [details]
>    attachment 139171 [details]
>    attachment 139181 [details]
>    attachment 139211 [details]
>    attachment 139221 [details]
> and upload the dmsg here?

Ping Gareth!!!!
Could you give these patches a try?
I've started review the process for these patches.
If they couldn't fix your issue, your issue might be a firmware bug that cannot be fixed as we didn't find other serious EC state machine problems in the current code.
Please let us know the result so that we can see the root cause clearly.
Comment 94 Gareth Williams 2014-06-12 12:12:31 UTC
Created attachment 139421 [details]
attachment-26689-0.html

I'm sorry I've not replied sooner, but I didn't want to spam this bug
with my apologies.



 



I've been away from the offending HP Pavilion laptop for a few days
(work related) but have been monitoring the action on webmail and/or
smartphone, feeling rather redundant.



 



I've now created a Digital Ocean VPS droplet which is currently
building the kernel for me (with the patches and also the relevant
.config from the Arch distro I'm running on the HP laptop). 
Hopefully, that will build soon and I'll be able to test it on the HP
tonight (UK time).



 




At Thursday, 12-06-2014 on 12:46 bugzilla-daemon@bugzilla.kernel.org
wrote:




https://bugzilla.kernel.org/show_bug.cgi?id=70891

--- Comment #93 from Lv Zheng  ---
(In reply to Lv Zheng from comment #87)
> (In reply to Steffen Weber from comment #86)
> > Created attachment 139381 [details]
> > dmesg of Linux 3.15.0 without any patches but DEBUG enabled
> 
> Hi, I didn't see 0x0d query in this dmesg.
> Looks not like to be generated by the same BIOS.
> Could you only try these patches, without patching the last 2:
>    attachment 139121 [details]
>    attachment 139131 [details]
>    attachment 139141 [details]
>    attachment 139151 [details]
>    attachment 139161 [details]
>    attachment 139171 [details]
>    attachment 139181 [details]
>    attachment 139211 [details]
>    attachment 139221 [details]
> and upload the dmsg here?

Ping Gareth!!!!
Could you give these patches a try?
I've started review the process for these patches.
If they couldn't fix your issue, your issue might be a firmware bug
that cannot
be fixed as we didn't find other serious EC state machine problems in
the
current code.
Please let us know the result so that we can see the root cause
clearly.
Comment 95 Steffen Weber 2014-06-12 13:55:29 UTC
Created attachment 139441 [details]
dmesg of Linux 3.15.0 old config with ACPI DEBUG
Comment 96 Steffen Weber 2014-06-12 13:55:59 UTC
Created attachment 139451 [details]
dmesg of Linux 3.15.0 old config with all 11 patches
Comment 97 Steffen Weber 2014-06-12 13:59:43 UTC
I don't see the SCI_EVT=0 storm anymore. Don't know why.
Comment 98 Gareth Williams 2014-06-12 18:59:29 UTC
Created attachment 139481 [details]
dmesg with kernel 3.15 and patches applied

The results look good. 'ACPI: Interpreter enabled' shows up in less than half a second, which is brilliant.

I'll confirm this is the result of the patches by installing a vanilla 3.15 kernel to compare timing.
Comment 99 Gareth Williams 2014-06-12 19:37:20 UTC
Created attachment 139491 [details]
dmesg with kernel 3.15 vanilla

Without the patches there is a 1.1 second delay before the 'ACPI: Interpreter Enabled' message appears.
Comment 100 Gareth Williams 2014-06-12 21:00:32 UTC
Created attachment 139501 [details]
dmesg with kernel 3.15 with ec patches but no debug patches

Early in this whole issue, the kernel would boot quite quickly if kernel debug was enabled.

I've therefore applied the ec patches but not the last two (debug) patches.  The results are promising as 'ACPI: Enable Interpreter' still shows up in under half a second.

I think you've cracked it :-)
Comment 101 Lv Zheng 2014-06-13 00:29:27 UTC
(In reply to Steffen Weber from comment #96)
> Created attachment 139451 [details]
> dmesg of Linux 3.15.0 old config with all 11 patches

Looks great!
So the 39s boot is caused by the storm not the patch.
Comment 102 Lv Zheng 2014-06-13 00:31:33 UTC
(In reply to Steffen Weber from comment #97)
> I don't see the SCI_EVT=0 storm anymore. Don't know why.

OK, so if you could see this again, you could notify me using email:
lv.zheng@intel.com

Actually, I'm appreciated your help because this case did expose an issue in my patch.

Thanks!
Comment 103 Lv Zheng 2014-06-13 00:32:49 UTC
(In reply to Gareth Williams from comment #99)
> Created attachment 139491 [details]
> dmesg with kernel 3.15 vanilla
> 
> Without the patches there is a 1.1 second delay before the 'ACPI:
> Interpreter Enabled' message appears.

Great!
I'll add tested-by into the patches.

Thanks for the reporting and testing.
Comment 104 Lv Zheng 2014-06-13 06:56:51 UTC
Hi,

Bug-fixes patches are sent to the upstream with an additional improvement done for the attachment 139181 [details] since the published patchset doesn't include the asynchronous EVT_SCI and the reference counting improvements.
The improvements are going to be sent in another patchset.

Marking this bug as RESOLVED.
Comment 105 Gareth Williams 2014-06-13 14:50:54 UTC
Many thanks for all of your help/work.  I'll wait for this to filter through to Fedora and Arch.  I'll also anotate the downstream bug.
Thanks again :-)
Gareth
Comment 106 Lv Zheng 2014-07-08 04:18:37 UTC
Closing due to the merging of this patcheset.
It is now in linux-pm.git/bleeding-edge branch, and will appear in linux.git v3.16-rc5.

Note You need to log in before you can comment on or make changes to this bug.