Hi, On bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1055228 , Gareth reported 6 second boot delay on 3.12 kernel. Delay does not happen if acpi_no_auto_ssdt option is used, nor if kernel is compiled with various debug facilities like LOCKDEP (fedora kernel-debug variant). We tried to compile kernel with just CONFIG_ACPI_DEBUG=y, on that kernel issue was reproducible, but after enable printing messages, again issue disappeared (we tried acpi.debug_layer=0xffffffff together with acip.debug_level=0x0307ff4f and acip.debug_level=0x03000000, links to logs are provided in bugzilla). On affected kernels interesting dmesg part looks like below: [ 0.212035] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored [ 0.217930] ACPI: SSDT 0000000092e6ac18 003D3 (v01 PmRef Cpu0Cst 00003001 INTL 20121220) [ 0.218512] ACPI: Dynamic OEM Table Load: [ 0.218514] ACPI: SSDT (null) 003D3 (v01 PmRef Cpu0Cst 00003001 INTL 20121220) [ 0.222109] ACPI: SSDT 0000000092e6a618 005AA (v01 PmRef ApIst 00003000 INTL 20121220) [ 0.222792] ACPI: Dynamic OEM Table Load: [ 0.222799] ACPI: SSDT (null) 005AA (v01 PmRef ApIst 00003000 INTL 20121220) [ 0.226920] ACPI: SSDT 0000000092e69d98 00119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 0.227495] ACPI: Dynamic OEM Table Load: [ 0.227497] ACPI: SSDT (null) 00119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 6.812869] ACPI: Interpreter enabled With acpi_no_auto_ssdt option interesting part looks like this: [ 0.203230] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored [ 0.203595] ACPI Error: [SGMD] Namespace lookup failure, AE_NOT_FOUND (20130725/psargs-359) [ 0.203601] ACPI Error: Method parse/execution failed [\_SB_._OSC] (Node ffff880253859438), AE_NOT_FOUND (20130725/psparse-536) [ 0.206093] ACPI Error: [\_PR_.CPU0._PPC] Namespace lookup failure, AE_NOT_FOUND (20130725/psargs-359) [ 0.206102] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.SASU] (Node ffff8802538570a0), AE_NOT_FOUND (20130725/psparse-536) [ 0.206108] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_.ECMI] (Node ffff880253857078), AE_NOT_FOUND (20130725/psparse-536) [ 0.206113] ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPCB.EC0_._REG] (Node ffff880253856910), AE_NOT_FOUND (20130725/psparse-536) [ 0.206122] Fail in evaluating the _REG object of EC device. Broken bios is suspected. [ 0.206127] ACPI: Interpreter enabled Any help to solving this bug would be appreciate. I'm CCing Gareth, he can provide more informations if needed.
Created attachment 126811 [details] acpidump output As requested, I've attached an acpidump output. I hope it's in the correct format. I've downgraded the kernel as far as 3.10 and this issue is still present. I'll have to compile a kernel if you want me to go any earlier.
The problem is caused by the dynamic reloading of the same table during initialization. The code is not well tested in such an environment. Let me do some local testing and request you to validate after that. Thanks for your information.
Created attachment 127221 [details] bugzilla.kernel.org 70891 debugging for initialization processes. This seems not to be a table loading issue, but an EC operation region issue. Let's first try to figure out which ACPI initialization process has caused this issue. Please: 1. apply this patch to the latest kernel, 2. build the kernel using the configuration that caused this issue, 3. boot the built kernel without acpi_no_auto_ssdt, 4. if there is still 6 seconds delayed in the startup process, post the dmesg output here. Thanks.
And let me post some known fixes for current ACPICA table management implementation. You may give them a try to see if they can fix the issue.
Created attachment 127231 [details] ACPICA: Remove indent divergences to reduce the difficulties of maintaining the revised ACPICA commits. This is the preparation of the next 2 linuxized ACPICA fixes.
Created attachment 127241 [details] ACPICA: Tables: Fix the issues in handling virtual addressed tables. This is a fix related to the dynamic loaded tables.
Created attachment 127251 [details] ACPICA: Tables: Fix the issue that multiple ACPI_FREE()s are invoked for same pointer around acpi_tb_add_table(). This is another fix related to the dynamic table loading.
Here is the test requests about the last 3 patches. Please: 1. apply the following patches to the latest kernel, "Attachment 127231 [details]" "Attachment 127241 [details]" "Attachment 127251 [details]" 2. build the kernel using the configuration that caused this issue, 3. boot the built kernel without acpi_no_auto_ssdt, 4. post the dmesg output here. It might be able to fix the issue if the issue is related to the dynamic table loading. But since the issue might be caused by EC opregion not dynamic table loading, the test result of "Comment 3" is still important. Thanks.
The dmesg logs are large therefore are tar.gz'd and available at:- https://dl.dropboxusercontent.com/u/90149727/bugzilla-70891-1.tar.gz I've added a few extra dmesg from various kernels (without debug and without the patches - so they are probably useless!) as the delay varies wildly between 3.12 and 3.13 kernels and also between cold and warm boots. If you need me to try any other combination (eg 3.14 kernel or vanilla 3.13), just give me a shout.
Let me shows the result in your test here: arch-3.12.9-2.cold.txt: [ 0.314279] ACPI: SSDT (null) 00119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 7.411438] ACPI: Interpreter enabled arch-3.12.9-2.warm.txt: [ 0.314278] ACPI: SSDT (null) 00119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 7.418568] ACPI: Interpreter enabled fed20-3.13.3-201.cold.txt: [ 0.232446] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 0.309612] ACPI: Interpreter enabled fed20-3.13.3-201.warm.txt: [ 0.232643] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 1.311722] ACPI: Interpreter enabled arch-3.13.5-1.cold.txt: [ 0.320782] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 4.478035] ACPI: Interpreter enabled arch-3.13.5-1.patched.cold.txt: [ 0.321213] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 1.481476] ACPI: Interpreter enabled arch-3.13.5-1.patched.warm.txt [ 0.320909] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 1.937851] ACPI: Interpreter enabled The dmesg files are named as follows: dist-ver[.patched][.debug].boot.txt From your README, I can learn: 1. dist: arch - Arch Linux (packaged and compiled) fed20 - Fedora 20 (packaged kernel) 2. ver: the kernel version 3. debug: boot cmd line: acpi.debug_layer=0xffffffff acpi.debug_level=0x03000000 log_buf_len=64M 4. boot: warm/cold - cold boot or reboot (there seems to be a difference in delay depending on how the system was booted) I'm curious about what the "patched" means. In the ".patched" dmesgs, the boot time seems to be improved from 4.3sec to 1.1sec by what you've patched. So what does the "patched" mean? It is even faster (0.1sec) in the dmesg of booting with the fedora kernel. So what's the .config of this kernel? I didn't examine the ACPICA debugging enabled dmesg outputs since. I'll try to look deeper into it and decode the DSDT to see what have been executed in the failed control methods.
By 'patched', I meant that the Arch kernel was rebuilt with the three patches you provided in comment 8. The config for the Fedora 20 kernel is:- https://dl.dropboxusercontent.com/u/90149727/config-3.13.3-201.fc20.x86_64 I've just copied the Fedora 20 config to the Arch and compiled the kernel. Just as with the default Fedora 20 kernel, this new kernel displays a faster boot time. (I then applied the three patches above, but that slowed down the boot marginally). I've ran diffconfig (from the kernel source's scripts directory) against the original Fedora and Arch configs, but the result is huge. I'm certain most of those are irelevant, but it will take some time to weed out the irelevant ones and find the possible culprit. I could start by concentrating on the config entries with certain keywords (eg ACPI). If you could suggest which ones would be best, it may help. As this issue was initially reported as a Fedora bug, but seems to have improved considerably since Fedora released a new kernel, is this still a kernel bug or an Arch packaging bug? I'd hate to waste your time on an packaging bug!
H(In reply to Gareth Williams from comment #11) > By 'patched', I meant that the Arch kernel was rebuilt with the three > patches you provided in comment 8. The problem seems not related to dynamic table loading. > I could start by concentrating on the config entries with certain keywords > (eg ACPI). If you could suggest which ones would be best, it may help. > > As this issue was initially reported as a Fedora bug, but seems to have > improved considerably since Fedora released a new kernel, is this still a > kernel bug or an Arch packaging bug? I'd hate to waste your time on an > packaging bug! That's OK. The bug reports are always welcomed as they can help to improve software. Let me wrote something about what I've investigated. From the debugging message of the "context switch" that is added by you, I can see acpi hotplug is triggered during the acpi objects initialization. I'm not sure if this is an issue and can light shadows. By decoding the DSDT and exmining in it, I found there are many things (graphics, battery) invoked in the _REG of EC operation region. But the _REG of EC opregion is not invoked in acpi_initialize_objects(), but in acpi_boot_ec_enable(). So this might not be an ACPICA bug. IMO, mutex debugging information might not be needed in this case. As it's hardly for us to determine whether the mutex acquiring/releasing messages are caused by the initilization steps (there will be plenty of such messages) or by the dead locks. I'd prefer to ask you do some control method instrumentation to figure out which control method has consumed the time most. I'll post a debugging patch and write some test requests later.
Created attachment 127631 [details] ACPICA: Dispatcher: Add method execution instrumentation support. This patch can be used to trace control method execution begin/end time.
Here is the test request: 1. Please apply the following patches to the kernel that appears to bootup slowly: attachment 127221 [details] attachment 127631 [details] 2. Using the kernel config that can reproduce the slow bootup to build the kernel 3. Boot the built kernel and post dmesg here. If the issue disppeared, please unapply the 2 patches and only apply the following patch and try again: attachment 127221 [details] Thanks in advance.
Created attachment 127671 [details] Dmesg with instrumentation patch Dmesg with instrumentation patch applied. That took 10 seconds to boot the kernel.
Hi, Let me post some investigation result. According to the following log entries: [ 0.264004] ACPI: acpi_initialize_objects executing... [ 0.338131] ACPI: acpi_boot_ec_enable executing... [ 0.338187] ACPI: Begin method [0xffff88025384cf78:\_SB_.PCI0.LPCB.EC0_._REG] execution. [ 0.338204] ACPI: Begin method [0xffff88025384ce10:\_SB_.PCI0.LPCB.EC0_.GBAS] execution. [ 0.338508] ACPI: Begin method [0xffff88025384c438:\_SB_.PCI0.LPCB.EC0_.ECMI] execution. [ 0.338519] ACPI: Begin method [0xffff88025384c1f8:\_SB_.PCI0.LPCB.EC0_.OSTD] execution. [ 0.340339] ACPI: Begin method [0xffff88025384c3f0:\_SB_.PCI0.LPCB.EC0_.SASU] execution. [ 0.354250] ACPI: Begin method [0xffff88025384c240:\_SB_.PCI0.LPCB.EC0_.ULID] execution. [ 0.354400] ACPI: Begin method [0xffff88025384c870:\_SB_.PCI0.LPCB.EC0_._Q92] execution. [ 0.354644] ACPI: Begin method [0xffff88025384d3a8:\_SB_.PCI0.ACEL.INIT] execution. [ 0.354649] ACPI: Begin method [0xffff88025384d090:\_SB_.PCI0.ACEL._STA] execution. [ 0.364202] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 0.870765] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 1.377475] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 1.884184] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 2.394203] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 2.900890] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 3.407585] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 3.914244] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 4.420936] ACPI: Begin method [0xffff88025384c5a0:\_SB_.PCI0.LPCB.EC0_.SMRD] execution. [ 4.427698] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 4.934249] ACPI: Begin method [0xffff88025384d3f0:\_SB_.PCI0.ACEL.ADJT] execution. [ 4.934254] ACPI: Begin method [0xffff88025384d090:\_SB_.PCI0.ACEL._STA] execution. [ 4.934385] ACPI: Begin method [0xffff88025384e240:\_SB_.LID0._LID] execution. [ 4.934673] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 5.440903] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 5.950876] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 6.457514] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 6.964135] ACPI: Begin method [0xffff88025384c558:\_SB_.PCI0.LPCB.EC0_.SMWR] execution. [ 7.470766] ACPI: Begin method [0xffff88025384e240:\_SB_.LID0._LID] execution. The delays happen in SMWR, each time it is called, 0.5 seconds will be consumed. The called SMWR is listed below: \_SB_.PCI0.LPCB.EC0_._REG \_SB_.PCI0.LPCB.EC0_.GBAS \_SB_.PCI0.LPCB.EC0_.ECMI \_SB_.PCI0.LPCB.EC0_.OSTD \_SB_.PCI0.LPCB.EC0_.SASU \_SB_.PCI0.LPCB.EC0_.ULID \_SB_.PCI0.LPCB.EC0_._Q92 \_SB_.PCI0.ACEL.INIT \_SB_.PCI0.ACEL._STA \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x20, 0x5F) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x21, 0x32) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x32, 0x0D) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x33, 0x02) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x30, 0x95) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x36, 0x07) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x37, 0x05) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x34, 0x7F) \_SB_.PCI0.LPCB.EC0_.SMRD \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x24, Local0) \_SB_.PCI0.ACEL.ADJT \_SB_.PCI0.ACEL._STA \_SB_.LID0._LID \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x36, 0x13) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x37, 0x02) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x34, 0x2A) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x24, Zero) \_SB_.PCI0.LPCB.EC0_.SMWR (0xC6, 0x50, 0x22, 0x20) \_SB_.LID0._LID The SMWR function is listed below (I deleted the part of ASL since we always have Arg0=0xC6, Arg1=0x50): DefinitionBlock ("dsdt.aml", "DSDT", 1, "HPQOEM", "2163 ", 0x00000000) { Scope (_SB) { Device (PCI0) { Device (LPCB) { Device (EC0) { OperationRegion (ERAM, EmbeddedControl, Zero, 0xFF) Field (ERAM, ByteAcc, Lock, Preserve) { SMPR, 8, SMST, 8, SMAD, 8, SMCM, 8, SMD0, 256, } Field (ERAM, ByteAcc, NoLock, Preserve) { Offset (0x04), SMW0, 16 } Field (ERAM, ByteAcc, NoLock, Preserve) { Offset (0x04), SMB0, 8 } Mutex (MUT0, 0x00) Method (SMWR, 4, NotSerialized) { Acquire (MUT0, 0xFFFF) Store (0x04, Local0) While (LGreater (Local0, One)) { Store (Arg3, SMB0) And (SMST, 0x40, SMST) Store (Arg2, SMCM) Store (Arg1, SMAD) Store (Arg0, SMPR) Store (Zero, Local3) While (LNot (And (SMST, 0xBF, Local1))) { Sleep (0x02) Increment (Local3) If (LEqual (Local3, 0x32)) { And (SMST, 0x40, SMST) Store (Arg2, SMCM) Store (Arg1, SMAD) Store (Arg0, SMPR) Store (Zero, Local3) } } If (LEqual (Local1, 0x80)) { Store (Zero, Local0) } Else { Decrement (Local0) } } If (Local0) { Store (And (Local1, 0x1F), Local0) } Release (MUT0) Return (Local0) } } } } } } It looks like a BIOS issue, but we may dig futher. We may try to see which EC operation has caused the delay. Let me think about the next debugging step. I'll post debugging patch later.
Created attachment 128341 [details] ACPICA: Dispatcher: Add control method tracing once support. In order not to inflating your dmesg size, I added trace once support.
Created attachment 128351 [details] ACPICA: Events: Add EC access instrumentation support. Then we can trace EC operation region accesses with bounded logging overhead.
Created attachment 128361 [details] ACPICA: Executer: Add Sleep/Stall opcode instrumentation support. Also we can trace Sleep execution with bounded logging overhead.
Here is the test request: 1. Please apply the following patches to the kernel that appears to bootup slowly: attachment 127631 [details] attachment 128341 [details] attachment 128351 [details] attachment 128361 [details] 2. Using the kernel config that can reproduce the slow bootup to build the kernel; 3. Boot the built kernel with "acpi_trace_once=\_SB_.PCI0.LPCB.EC0_._REG"; 4. If the dmesg output is not too big, please post dmesg here; 5. If the dmesg output is too big, please boot the kernel with "acpi_trace_once=\_SB_.PCI0.LPCB.EC0_.SMWR" and post the dmesg here. And don't forget to add under scores "_" after _SB and EC0. They are required. Thanks in advance.
Attachment 128341 [details] fails at the second hunk. I've checked the patch and the hunk lines the patch expects to find aren't in the original source (kernel 3.13.5) Was the patch against that version? I edited the patch so that:- extern const char *acpi_gbl_trace_once_method; is added to the acpifx.h file and although the patch then worked, the kernel failed to compile later.
Created attachment 128571 [details] ACPICA: Dispatcher: Add control method tracing once support. The attachment 128341 [details] is not suitable for your kernel version. Please use this instead. Sorry for the mistake.
Created attachment 128591 [details] Dmesg with _SB_.PCI0.LPCB.EC0_._REG Debug Thanks for the new patch - it worked fine. Should I be on a different kernel? It's no major issue for me to be on any version (within reason!) if it helps. I've attached the dmesg log.
I'm working on linux-pm/linux-next branch where latest ACPICA patches are shipped. But you don't need to do that. I checked the dmesg, there is a mistake. You shouldn't use acpi_trace_once=_SB_.PCI0_LPCB.EC0_._REG You should use acpi_trace_once=\_SB_.PCI0_LPCB.EC0_._REG The backslash is required, otherwise the EC0._REG method executioin couldn't be matched and the tracing thus couldn't turned on. Please correct the boot parameter and try again. Thanks
I've hit a problem:- I've tried to pass the above kernel paramters both by editing grub.cfg and by editing the parameters directly at boot time in grub. However, I get the following results:- Entered (after acpi_trace_once=) Results (in dmesg and /proc/cmdline) ======= ======= \_SB_.PCI0_LPCB.EC0_._REG _SB_.PCI0_LPCB.EC0_._REG \\_SB_.PCI0_LPCB.EC0_._REG \\_SB_.PCI0_LPCB.EC0_._REG \\\_SB_.PCI0_LPCB.EC0_._REG \\_SB_.PCI0_LPCB.EC0_._REG '\_SB_.PCI0_LPCB.EC0_._REG' \\_SB_.PCI0_LPCB.EC0_._REG "\_SB_.PCI0_LPCB.EC0_._REG" \\_SB_.PCI0_LPCB.EC0_._REG The reason you had no backslash was because I only entered one (I remember doing so as I have a non US keyboard, therefore had to search for the correct key to press to give me a backslash in grub) and assumed, incorrectly, that it would be passed to the kernel. How do I insert a single backslash into the kernel parameters? Thanks in advance.
Created attachment 128741 [details] My DMESG output I didn't face such issues. You might be using grub2 while I'm using grub. My grub entry is: title Debian GUN/Linux, acpica (Robert Moore) root (hd0,1) kernel /vmlinuz-acpica-z530 root=/dev/sda6 threadirqs acpi.debug_layer=0x00000040 acpi_trace_once=\_SB_.PCI0.LPCB.BAT1._STA initrd /initrd.img-3.2.0-2-686-pae This attachment is my dmesg output. The following lines are useful for us to dig your reported issue deeper. [ 0.164199] ACPI: Begin method [0xc0010978:\_SB_.PCI0.LPCB.BAT1._STA] execution. [ 0.164387] ACPI: Tracing once enabled: c0010978:\_SB_.PCI0.LPCB.BAT1._STA [ 0.164571] ACPI: Begin method [0xc00104f8:\_SB_.PCI0.LPCB.ECOK] execution. [ 0.164781] ACPI: End method [0xc00104f8] execution. [ 0.164963] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.BAL0) [ 0.168096] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.BAL0, 0x00000011) [ 0.168239] ACPI: Begin Sleep(20 ms) [ 0.192021] ACPI: End Sleep(20 ms) [ 0.192223] ACPI: Tracing once disabled: c0010978 [ 0.192400] ACPI: End method [0xc0010978] execution. So I have no idea what is happening to your grub. Anyway, we neednt' work on grub2 issues at bugzilla.kernel.org. We can simply work it around. I'll post a workaround patch later in this thread.
Created attachment 128751 [details] ACPICA: Workaround to match method where backslash cannot be specified by the bootloader. This patch can be used to workaround your issue. Let me describe the test steps later.
Here is the test request for kernel 3.13.5: 1. Please apply the following patches to the kernel that appears to bootup slowly: attachment 127631 [details] attachment 128571 [details] attachment 128351 [details] attachment 128361 [details] attachment 128751 [details] 2. Using the kernel config that can reproduce the slow bootup to build the kernel; 3. Boot the built kernel with "acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG"; 4. If the dmesg output is not too big, please post dmesg here (you can check if it is working before posting according to the dmesg output example I posted in comment 27); 5. If the dmesg output is too big, please boot the kernel with "acpi_trace_once=_SB_.PCI0.LPCB.EC0_.SMWR" and post the dmesg here. And don't forget to add under scores "_" after _SB and EC0. They are required. Thanks in advance.
Created attachment 128761 [details] dmesg with acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG
This time, it was not SMWR, but OSTD has consumed 0.5seconds. [ 0.837768] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG) [ 1.337293] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG, 0x00000000) This function is much more simpler. As we know ECOK should equal to one and OSYS should be the latest Windows, this function can be simplified as: DefinitionBlock ("dsdt.aml", "DSDT", 1, "HPQOEM", "2163 ", 0x00000000) { Scope (_SB) { Device (PCI0) { Device (LPCB) { Device (EC0) { Name (_HID, EisaId ("PNP0C09")) // _HID: Hardware ID OperationRegion (ERAM, EmbeddedControl, Zero, 0xFF) Field (ERAM, ByteAcc, Lock, Preserve) { Offset (0x41), W7OS, 1, QWOS, 1, , 1, SUSE, 1, RFLG, 1, , 1, , 1, } Method (OSTD, 0, NotSerialized) { Store (Zero, W7OS) Store (Zero, SUSE) Store (Zero, RFLG) Store (One, W7OS) } } } } } } The log also shows: [ 0.338235] ACPI: Begin method [0xffff88024f84c1f8:\_SB_.PCI0.LPCB.EC0_.OSTD] execution. Store (Zero, W7OS): [ 0.338244] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS) [ 0.338341] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000001) [ 0.338344] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000000) [ 0.338536] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS) Store (Zero, SUSE): [ 0.338542] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.SUSE) [ 0.837581] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.SUSE, 0x00000000) [ 0.837584] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.SUSE, 0x00000000) [ 0.837762] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.SUSE) Store (Zero, RFLG): [ 0.837768] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG) [ 1.337293] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.RFLG, 0x00000000) <- 0.5seconds delay [ 1.337296] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.RFLG, 0x00000000) [ 1.337462] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.RFLG) Store (One, W7OS): [ 1.337488] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS) [ 1.337592] ACPI: EC: RD(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000000) [ 1.337595] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS, 0x00000001) [ 1.337772] ACPI: EC: WR(\_SB_.PCI0.LPCB.EC0_.W7OS) [ 1.337778] ACPI: End method [0xffff88024f84c1f8] execution. Since the delay is various each time you boot and obviously has occurred in the EC driver where EC operation region has happened, my conclusion is this is not an ACPICA issue, but is an issue either caused by the EC driver or by the EC firmware. Since no other platform has reported such issue, it seems the culprit is most likely the EC firmware of your platform. I can help to confirm this by enabling EC debugging information. Let me enable EC driver debugging to confirm. I'll post a debugging patch later.
Created attachment 128791 [details] ACPI: Enable EC driver debugging support. Enabling EC driver debugging.
Here is the test request: 1. Please apply the following patches to the kernel that appears to bootup slowly: attachment 127631 [details] attachment 128571 [details] attachment 128351 [details] attachment 128361 [details] attachment 128751 [details] attachment 128791 [details] 2. Using the kernel config that can reproduce the slow bootup to build the kernel; 3. Boot the built kernel with "acpi_trace_once=_SB_.PCI0.LPCB.EC0_._REG"; 4. If the dmesg output is not too big, please post dmesg here. And don't forget to add under scores "_" after _SB and EC0. They are required. Thanks in advance.
Created attachment 128871 [details] Dmesg with EC logging
The following "message" is accompanied each time with the delay: [ xx.xxxxxx] ACPI : EC: controller reset, restart transaction Looks like a driver issue. Give me sometime to learn the code and fix it. Thanks for reporting.
Please try with kernel param "ec.ec_delay=50". This should be able to decrease the delay.
This is dmesg with ec.ec_delay=50 The delay is approx 2.6 seconds. [ 0.260663] ACPI: Added _OSI(Module Device) [ 0.260665] ACPI: Added _OSI(Processor Device) [ 0.260667] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.260668] ACPI: Added _OSI(Processor Aggregator Device) [ 0.265447] ACPI: Executed 1 blocks of module-level executable AML code [ 0.281431] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored [ 0.293676] ACPI: SSDT 0000000092e6ac18 0003D3 (v01 PmRef Cpu0Cst 00003001 INTL 20121220) [ 0.294276] ACPI: Dynamic OEM Table Load: [ 0.294278] ACPI: SSDT (null) 0003D3 (v01 PmRef Cpu0Cst 00003001 INTL 20121220) [ 0.307180] ACPI: SSDT 0000000092e6a618 0005AA (v01 PmRef ApIst 00003000 INTL 20121220) [ 0.307874] ACPI: Dynamic OEM Table Load: [ 0.307876] ACPI: SSDT (null) 0005AA (v01 PmRef ApIst 00003000 INTL 20121220) [ 0.320321] ACPI: SSDT 0000000092e69d98 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 0.320914] ACPI: Dynamic OEM Table Load: [ 0.320916] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20121220) [ 2.974744] ACPI: Interpreter enabled [ 2.974756] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20131115/hwxface-580) [ 2.974764] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20131115/hwxface-580) [ 2.974791] ACPI: (supports S0 S3 S4 S5) [ 2.974793] ACPI: Using IOAPIC for interrupt routing [ 2.974833] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug [ 2.974998] ACPI: No dock devices found. [ 2.984644] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe]) [ 2.984651] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI] [ 2.984750] \_SB_.PCI0:_OSC invalid UUID [ 2.984751] _OSC request data:1 1f 0 [ 2.984757] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM
I've just rebooted without the ec.ec_delay=50 kernel parameter and the boot time was similar - delay of 2.5 seconds.
(In reply to Gareth Williams from comment #38) > I've just rebooted without the ec.ec_delay=50 kernel parameter and the boot > time was similar - delay of 2.5 seconds. Tuning ec_delay cannot fix the real issue. Let's continue to find if any gaps have caused such frequent controller resets. I think this is a good chance and you can help us to improve the Linux quality.
Is there any progress on this issue? Fedora are going through a mass bug update and the downstream bug for this #1055228 is one they've earmarked for closure if there is no progress. I've just booted with kernel 3.14.4-200 and the kernel took 7.562 seconds. dmesg showed the delay of 5 seconds at the same point, so the newer kernel hasn't fixed the issue.
Hi, I'm going to work on this. Could you offer a dmidecode output here so that I can see if any quirks are enabled for your platform. Thanks in advance.
Created attachment 137461 [details] HP EliteBook 840 G1 dmidecode On my HP EliteBook 840 G1 running vanilla Linux 3.14.4 I'm only seeing a 1.5 second delay before the message "ACPI: Interpreter enabled" appears. But maybe my dmidecode output can help you to spot some similarities (see attachment). ... [ 0.112242] ACPI: Added _OSI(Module Device) [ 0.112350] ACPI: Added _OSI(Processor Device) [ 0.112459] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.112568] ACPI: Added _OSI(Processor Aggregator Device) [ 0.125802] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored [ 0.141887] ACPI: SSDT 00000000bbde2918 0003F3 (v01 PmRef Cpu0Cst 00003001 INTL 20110112) [ 0.142617] ACPI: Dynamic OEM Table Load: [ 0.142803] ACPI: SSDT (null) 0003F3 (v01 PmRef Cpu0Cst 00003001 INTL 20110112) [ 0.148143] ACPI: SSDT 00000000bbde3698 0005DB (v01 PmRef ApIst 00003000 INTL 20110112) [ 0.148939] ACPI: Dynamic OEM Table Load: [ 0.149122] ACPI: SSDT (null) 0005DB (v01 PmRef ApIst 00003000 INTL 20110112) [ 0.152011] ACPI: SSDT 00000000bbde1d98 000119 (v01 PmRef ApCst 00003000 INTL 20110112) [ 0.152735] ACPI: Dynamic OEM Table Load: [ 0.152917] ACPI: SSDT (null) 000119 (v01 PmRef ApCst 00003000 INTL 20110112) [ 1.664299] ACPI: Interpreter enabled ...
Created attachment 137471 [details] dmidecode output from HP Pavilion Attached the dmidecode output as it was rather large to paste inline. Thanks in advance.
A write to the command register doesn't flag the CMD bit: [ 0.335165] ACPI : EC: transaction start (cmd=0x80, addr=0x41) [ 0.335166] ACPI : EC: <--- command = 0x80 [ 0.335168] ACPI : EC: ---> status = 0x00 If we could reset at this point immediately, then there wouldn't be such a delay. Though I found lots of possible improvements can be done to the EC driver, let me start from fixing this issue first.
Hi, The CMD field should be used by firmware, it's hard for OSPM to check this field... I hope this was not your firmware bug. Recently I did some improvements in EC driver, it acts faster than the original one. Let us give it a try to find any possibilities to fix it in the OSPM side.
Created attachment 138701 [details] [PATCH 1/8] ACPICA: Events: Fix edge-triggered GPE by disabling before acknowledging it. This is an ACPICA fix to be more robust to GPE handlings.
Created attachment 138711 [details] [PATCH 2/8] ACPI: Update interrupt handler to honor new ACPI_REENABLE_GPE bit. A small fix that is not related to this issue, but is part of my improvements.
Created attachment 138721 [details] [PATCH 3/8] ACPI/EC: Add detailed fields debugging support of EC_SC(R). This makes EC debugging log more useful.
Created attachment 138731 [details] [PATCH 4/8] ACPI/EC: Add asynchronous command byte write support. The big improvement to improve EC throughput.
Created attachment 138741 [details] [PATCH 5/8] ACPI/EC: Remove duplicated ec_wait_ibf0() waiter. I was thinking the out-of-order waiter could trigger something wrong on your platform. So this patch is important.
Created attachment 138751 [details] [PATCH 6/8] ACPI/EC: Add reference counting for query handlers. Removing kmalloc can help me to reduce one useless work queue scheduling, moving some code to the interrupt context.
Created attachment 138761 [details] [PATCH 7/8] ACPI/EC: Move "struct transaction" to drivers/acpi/internal.h. This is a preparation for the last patch.
Created attachment 138771 [details] [PATCH 8/8] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support. Improvement for EVT_SCI interrupts.
This time we only need to track EC drivers, so no need to apply ACPICA interpreter tracer patches. Here is the test request: 1. Please apply the following patches to the latest kernel that appears to bootup slowly: attachment 138701 [details] attachment 138711 [details] attachment 138721 [details] attachment 138731 [details] attachment 138741 [details] attachment 138751 [details] attachment 138761 [details] attachment 138771 [details] attachment 128791 [details] 2. Using the kernel config that can reproduce the slow bootup to build the kernel; 3. Please post dmesg here. Thanks in advance.
Created attachment 138961 [details] dmesg of Linux 3.15.0 + 9 patches
(In reply to Steffen Weber from comment #55) > Created attachment 138961 [details] dmesg of Linux 3.15.0 + 9 patches It seems bug has been solved as there is no following log in your dmesg again: [ xx.xxxxxx] ACPI : EC: controller reset, restart transaction Before closing this bug, I'd wait for Gareth Williams to confirm as he is the original reporter. Thanks!
Hi, Steffen Weber You issue is different from the one Gareth Williams has reported. He is sufferring from abnormal EC transaction aborts. In your case: [ 0.187052] ACPI : EC: transaction end [ 0.592774] ACPI : EC: transaction start (cmd=0x80, addr=0xe7) [ 0.612285] ACPI : EC: transaction end [ 0.715032] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 0.731218] ACPI : EC: transaction end [ 0.833319] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 0.850160] ACPI : EC: transaction end [ 0.951600] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 0.968079] ACPI : EC: transaction end [ 1.069886] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 1.086994] ACPI : EC: transaction end [ 1.189133] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 1.205930] ACPI : EC: transaction end [ 1.309369] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 1.325862] ACPI : EC: transaction end [ 1.429587] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 1.445792] ACPI : EC: transaction end [ 1.549807] ACPI : EC: transaction start (cmd=0x80, addr=0x44) [ 1.566753] ACPI : EC: transaction end [ 1.670002] ACPI : EC: transaction start (cmd=0x80, addr=0x44) There is simply something delays between EC transactions. You need to follow Comment 33 to perform a test to see if it is abnormal. Maybe this is just caused by some Stall/Sleep AML code. Thanks.
(In reply to Lv Zheng from comment #56) > (In reply to Steffen Weber from comment #55) > > Created attachment 138961 [details] > dmesg of Linux 3.15.0 + 9 patches > > It seems bug has been solved as there is no following log in your dmesg > again: > [ xx.xxxxxx] ACPI : EC: controller reset, restart transaction > > Before closing this bug, I'd wait for Gareth Williams to confirm as he is > the original reporter. > > Thanks! I noticed the patches has caused many out of control QR_SC command. Let me refresh one of them. Thanks
(In reply to Steffen Weber from comment #55) > Created attachment 138961 [details] > dmesg of Linux 3.15.0 + 9 patches Hi, Steffen I see another issue that is not related to your reported one. There are many such logs in your dmesg: [ 40.475141] ACPI : EC: ~~~> IRQ [ 40.475151] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 40.475154] ACPI : EC: EC_SC(W) = 0x84 [ 40.476154] ACPI : EC: ~~~> IRQ [ 40.476164] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1 [ 40.476169] ACPI : EC: EC_DATA(R) = 0x0d I was thinking it was caused by my cleanup. But I checked my code, there wasn't something wrong. It seems there is no _Q0D method in your ACPI namespace. Which seems like a firmware bug. Could you upload acpidump of your platform to confirm? Thanks
Created attachment 139101 [details] HP EliteBook 840 G1 acpidump
(In reply to Lv Zheng from comment #57) > [...] > > There is simply something delays between EC transactions. > You need to follow Comment 33 to perform a test to see if it is abnormal. > Maybe this is just caused by some Stall/Sleep AML code. I've applied the patches from comment #33 to Linux 3.15.0 but compiling them fails. Could you please provide updated patches or tell me which Linux version they've been created for? CC drivers/acpi/acpica/utglobal.o In file included from drivers/acpi/acpica/accommon.h:59:0, from drivers/acpi/acpica/utglobal.c:48: drivers/acpi/acpica/acglobal.h:174:62: error: macro "ACPI_INIT_GLOBAL" requires 3 arguments, but only 2 given const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL); ^ drivers/acpi/acpica/acglobal.h:175:77: error: macro "ACPI_INIT_GLOBAL" requires 3 arguments, but only 2 given union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object, NULL); ^ drivers/acpi/acpica/acglobal.h:175:28: error: conflicting types for ‘ACPI_INIT_GLOBAL’ union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object, NULL); ^ drivers/acpi/acpica/acglobal.h:174:13: note: previous declaration of ‘ACPI_INIT_GLOBAL’ was here const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL); ^ scripts/Makefile.build:318: recipe for target 'drivers/acpi/acpica/utglobal.o' failed make[3]: *** [drivers/acpi/acpica/utglobal.o] Error 1 scripts/Makefile.build:465: recipe for target 'drivers/acpi/acpica' failed make[2]: *** [drivers/acpi/acpica] Error 2 scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed make[1]: *** [drivers/acpi] Error 2 Makefile:878: recipe for target 'drivers' failed make: *** [drivers] Error 2
(In reply to Steffen Weber from comment #61) > (In reply to Lv Zheng from comment #57) > > [...] > > > > There is simply something delays between EC transactions. > > You need to follow Comment 33 to perform a test to see if it is abnormal. > > Maybe this is just caused by some Stall/Sleep AML code. > > I've applied the patches from comment #33 to Linux 3.15.0 but compiling them > fails. Could you please provide updated patches or tell me which Linux > version they've been created for? > > CC drivers/acpi/acpica/utglobal.o > In file included from drivers/acpi/acpica/accommon.h:59:0, > from drivers/acpi/acpica/utglobal.c:48: > drivers/acpi/acpica/acglobal.h:174:62: error: macro "ACPI_INIT_GLOBAL" > requires 3 arguments, but only 2 given > const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL); > ^ > drivers/acpi/acpica/acglobal.h:175:77: error: macro "ACPI_INIT_GLOBAL" > requires 3 arguments, but only 2 given > union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object, > NULL); > > ^ > drivers/acpi/acpica/acglobal.h:175:28: error: conflicting types for > ‘ACPI_INIT_GLOBAL’ > union acpi_operand_object *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_object, > NULL); > ^ > drivers/acpi/acpica/acglobal.h:174:13: note: previous declaration of > ‘ACPI_INIT_GLOBAL’ was here > const char *ACPI_INIT_GLOBAL(acpi_gbl_trace_once_method, NULL); > ^ > scripts/Makefile.build:318: recipe for target > 'drivers/acpi/acpica/utglobal.o' failed > make[3]: *** [drivers/acpi/acpica/utglobal.o] Error 1 > scripts/Makefile.build:465: recipe for target 'drivers/acpi/acpica' failed > make[2]: *** [drivers/acpi/acpica] Error 2 > scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed > make[1]: *** [drivers/acpi] Error 2 > Makefile:878: recipe for target 'drivers' failed > make: *** [drivers] Error 2 OK. So please file a new bug of 1.5 unknown delay for your platform. Let us stick on EC bugs in this thread. I may also need your help. The dmesg you've posted doesn't seem to be good and I improved my patchset, so need you post a dmesg again for me to confirm the improvements are valid. Thanks
I purified the patchset: 1. splits the 2 race condition fixes into seperate patches. They might be the root causes of the 6 second delay. 2. improves the last patch, allowing command to be inserted between 2 QR_EC. Steffen's dmesg showed a case that if OSPM doesn't have a handler for the EC notification, we should still reserve chances for commands to be executed. Let me post the patches and write some testing requests.
(In reply to Steffen Weber from comment #60) > Created attachment 139101 [details] > HP EliteBook 840 G1 acpidump The grep result of decompiled .dsl files showed no _Q0D implemented: Method (_Q03, 0, NotSerialized) // _Qxx: EC Query Method (_Q04, 0, NotSerialized) // _Qxx: EC Query Method (_Q05, 0, NotSerialized) // _Qxx: EC Query Method (_Q06, 0, NotSerialized) // _Qxx: EC Query Method (_Q08, 0, NotSerialized) // _Qxx: EC Query Method (_Q0A, 0, NotSerialized) // _Qxx: EC Query Method (_Q09, 0, NotSerialized) // _Qxx: EC Query Method (_Q0B, 0, NotSerialized) // _Qxx: EC Query Method (_Q0C, 0, NotSerialized) // _Qxx: EC Query Method (_Q18, 0, NotSerialized) // _Qxx: EC Query Method (_Q19, 0, NotSerialized) // _Qxx: EC Query Method (_Q20, 0, NotSerialized) // _Qxx: EC Query Method (_Q21, 0, NotSerialized) // _Qxx: EC Query
Created attachment 139121 [details] [PATCH 01/11] ACPICA: Events: Fix edge-triggered GPE by disabling before acknowledging it. No content updates for the original patch.
Created attachment 139131 [details] [PATCH 02/11] ACPI/EC: Add detailed fields debugging support of EC_SC(R). Register field debugging is protected by new macro in order not to trigger regressions.
Created attachment 139141 [details] [PATCH 03/11] ACPI/EC: Fix an issue that advance_transaction() processes staled status. The race condition fix is split from original patch 4. This might be one of the root causes that have caused this 6s-delay issue.
Created attachment 139151 [details] [PATCH 04/11] ACPI/EC: Add reference counting for query handlers. An improvement to reduce useless dynamic allocation.
Created attachment 139161 [details] [PATCH 05/11] ACPI/EC: Add asynchronous command byte write support. The updated patch 4. The 1st race condition fix is removed from this patch. This is a preparation for the 2nd race condition fix.
Created attachment 139171 [details] [PATCH 06/11] ACPI/EC: Remove duplicated ec_wait_ibf0() waiter. Removing some design entropy so that the 2nd race condition fix can be obvious and easier.
Created attachment 139181 [details] [PATCH 07/11] ACPI/EC: Fix a race condition in ec_transaction_completed(). Another race condition fix. This might be one of the root causes that have caused the 6s-delay issue.
Created attachment 139191 [details] [PATCH 08/11] ACPI/EC: Move "struct transaction" to drivers/acpi/internal.h. The preparation of the last improvement.
Created attachment 139201 [details] [PATCH 09/11] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support. Unless necessary, drivers should try to utilize the interrupt context and the user process context. There should only be limited cases a driver need to create new context. This patch is improved to handle the case in the dmesg posted by Steffan. If a _Qxx method is not provided by the BIOS, we can see one line warning message and the normal command transactions will not be starving due to the unhandled EVT_SCI.
Created attachment 139211 [details] [PATCH 10/11] ACPI/EC: Enable debbuging messages. EC debugging patch need to be updated due to the updates in the patch 2.
Created attachment 139221 [details] [PATCH 11/11] ACPI/EC: Enable detailed register bit fields debugging. To utilize new debugging facility.
Here is the test request for Gareth Williams: 1. Please apply the following patches to the latest kernel that appears to bootup slowly: attachment 139121 [details] attachment 139131 [details] attachment 139141 [details] attachment 139151 [details] attachment 139161 [details] attachment 139171 [details] attachment 139181 [details] attachment 139211 [details] attachment 139221 [details] 2. Using the kernel config that can reproduce the slow bootup to build the kernel; 3. Please post dmesg here. I need to confirm if 139141 and 139181 fixes can eliminate the 6s-delay. Here is the test request for Steffen Weber: 1. Please apply the following patches to the latest kernel: attachment 139121 [details] attachment 139131 [details] attachment 139141 [details] attachment 139151 [details] attachment 139161 [details] attachment 139171 [details] attachment 139181 [details] attachment 139191 [details] attachment 139201 [details] attachment 139211 [details] attachment 139221 [details] 2. Using the kernel config that can reproduce EVT_SCI storm to build the kernel; 3. Please post dmesg here. I need to confirm if the improvement made in 139201 can effectively allowing commands to be executed along with the EVT_SCI storm. Thanks in advance.
(In reply to Lv Zheng from comment #62) > So please file a new bug of 1.5 unknown delay for your platform. > Let us stick on EC bugs in this thread. I'll do that later today, thanks! (In reply to Lv Zheng from comment #76) > Here is the test request for Steffen Weber: > 1. Please apply the following patches to the latest kernel: > attachment 139121 [details] > attachment 139131 [details] > attachment 139141 [details] > attachment 139151 [details] > attachment 139161 [details] > attachment 139171 [details] > attachment 139181 [details] > attachment 139191 [details] > attachment 139201 [details] > attachment 139211 [details] > attachment 139221 [details] > 2. Using the kernel config that can reproduce EVT_SCI storm to build the > kernel; > 3. Please post dmesg here. > I need to confirm if the improvement made in 139201 can effectively allowing > commands to be executed along with the EVT_SCI storm. I've applied the patches to Linux 3.15.0 but got the following compile error: CC drivers/acpi/ec.o drivers/acpi/ec.c: In function ‘complete_transaction’: drivers/acpi/ec.c:196:9: error: ‘struct acpi_ec’ has no member named ‘ctrl’ ec->ctrl.command == ACPI_EC_COMMAND_QUERY) { ^ drivers/acpi/ec.c:197:18: error: ‘struct acpi_ec’ has no member named ‘pending_query’ query_bit = ec->pending_query; ^ drivers/acpi/ec.c: In function ‘prepare_control_transaction’: drivers/acpi/ec.c:225:12: error: ‘struct acpi_ec’ has no member named ‘ctrl’ memset(&ec->ctrl, 0, sizeof (struct acpi_ec_transaction)); ^ drivers/acpi/ec.c:226:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’ ec->ctrl.command = command; ^ drivers/acpi/ec.c:227:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’ ec->ctrl.rdata = rdata; ^ drivers/acpi/ec.c:228:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’ ec->ctrl.rlen = rlen; ^ drivers/acpi/ec.c: In function ‘advance_transaction’: drivers/acpi/ec.c:256:10: error: ‘struct acpi_ec’ has no member named ‘ctrl’ t = &ec->ctrl; ^ drivers/acpi/ec.c:259:14: error: ‘struct acpi_ec’ has no member named ‘pending_query’ &ec->pending_query, 1); ^ scripts/Makefile.build:318: recipe for target 'drivers/acpi/ec.o' failed make[2]: *** [drivers/acpi/ec.o] Error 1 scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed make[1]: *** [drivers/acpi] Error 2 Makefile:878: recipe for target 'drivers' failed make: *** [drivers] Error 2
Created attachment 139231 [details] [PATCH 09/11] ACPI/EC: Add asynchronous QR_EC(0x84, Query Embedded Controller) support. Sorry, I made a mistake when refreshing patches with quilt. Please use this instead of attachment 139201 [details] and try again.
(In reply to Steffen Weber from comment #77) > (In reply to Lv Zheng from comment #62) > > I've applied the patches to Linux 3.15.0 but got the following compile error: > > CC drivers/acpi/ec.o > drivers/acpi/ec.c: In function ‘complete_transaction’: > drivers/acpi/ec.c:196:9: error: ‘struct acpi_ec’ has no member named ‘ctrl’ > ec->ctrl.command == ACPI_EC_COMMAND_QUERY) { > ^ > drivers/acpi/ec.c:197:18: error: ‘struct acpi_ec’ has no member named > ‘pending_query’ > query_bit = ec->pending_query; > ^ > drivers/acpi/ec.c: In function ‘prepare_control_transaction’: > drivers/acpi/ec.c:225:12: error: ‘struct acpi_ec’ has no member named ‘ctrl’ > memset(&ec->ctrl, 0, sizeof (struct acpi_ec_transaction)); > ^ > drivers/acpi/ec.c:226:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’ > ec->ctrl.command = command; > ^ > drivers/acpi/ec.c:227:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’ > ec->ctrl.rdata = rdata; > ^ > drivers/acpi/ec.c:228:4: error: ‘struct acpi_ec’ has no member named ‘ctrl’ > ec->ctrl.rlen = rlen; > ^ > drivers/acpi/ec.c: In function ‘advance_transaction’: > drivers/acpi/ec.c:256:10: error: ‘struct acpi_ec’ has no member named ‘ctrl’ > t = &ec->ctrl; > ^ > drivers/acpi/ec.c:259:14: error: ‘struct acpi_ec’ has no member named > ‘pending_query’ > &ec->pending_query, 1); > ^ > scripts/Makefile.build:318: recipe for target 'drivers/acpi/ec.o' failed > make[2]: *** [drivers/acpi/ec.o] Error 1 > scripts/Makefile.build:465: recipe for target 'drivers/acpi' failed > make[1]: *** [drivers/acpi] Error 2 > Makefile:878: recipe for target 'drivers' failed > make: *** [drivers] Error 2 Here is the updated test request: 1. Please apply the following patches to the latest kernel: attachment 139121 [details] attachment 139131 [details] attachment 139141 [details] attachment 139151 [details] attachment 139161 [details] attachment 139171 [details] attachment 139181 [details] attachment 139191 [details] attachment 139231 [details] attachment 139211 [details] attachment 139221 [details] 2. Using the kernel config that can reproduce EVT_SCI storm to build the kernel; 3. Please post dmesg here. Thanks in advance.
Created attachment 139241 [details] dmesg of Linux 3.15.0 + 11 patches The patched kernel took a long time to boot, hope it helps! :)
> So please file a new bug of 1.5 unknown delay for your platform. I've just filed bug #77681.
(In reply to Steffen Weber from comment #80) > Created attachment 139241 [details] dmesg of Linux 3.15.0 + 11 patches The > patched kernel took a long time to boot, hope it helps! :) Does the previous one - "comment 55" boot? And can you work in the booted system without seeing any siginificant "no response" period?
(In reply to Steffen Weber from comment #81) > > So please file a new bug of 1.5 unknown delay for your platform. I've just filed bug #77681. OK. I'll rebase the patches and ask you again there after finishing necessary EC work.
(In reply to Steffen Weber from comment #80) > Created attachment 139241 [details] dmesg of Linux 3.15.0 + 11 patches The > patched kernel took a long time to boot, hope it helps! :) Maybe I didn't understand you correctly. Which one do you mean in comment 55: 1. The patchset has improved the system bootup time. 2. The patchset has prevented system from response correctly. I posted 2nd 11 patches revision because I was thinking you meant 2. If you meant 1, then the query 0x0d logs could be ignored and I should revert changes back to the previous revision. :-) Could you also do such an experiment: 1. Please only apply the following patch to the tested kernel: attachment 128791 [details] This is one line patch, if you failed to apply, try to edit the drivers/acpi/ec.c file yourself. 2. Using the kernel config that can reproduce EVT_SCI storm to build the kernel; 3. Please post dmesg here.
In comment 55 I applied the 9 patches from comment 54. The kernel did boot up. IIRC it was about as fast as without the patches but I just saved dmesg and rebooted into the unpatched kernel after 1-2 minutes, so I cannot say that with certainty. Yesterday in comment 80 I applied the patches from comment 79. The kernel did boot up, but the boot process took a very long time (e.g. the rootfs was mounted after 39 seconds instead of after 6 seconds, see dmesg). So both patchsets have not improved system bootup time. The 1.5 second delay exists with both patchsets. I don't know about any other delays in my boot process that could be improved.
Created attachment 139381 [details] dmesg of Linux 3.15.0 without any patches but DEBUG enabled
(In reply to Steffen Weber from comment #86) > Created attachment 139381 [details] > dmesg of Linux 3.15.0 without any patches but DEBUG enabled Hi, I didn't see 0x0d query in this dmesg. Looks not like to be generated by the same BIOS. Could you only try these patches, without patching the last 2: attachment 139121 [details] attachment 139131 [details] attachment 139141 [details] attachment 139151 [details] attachment 139161 [details] attachment 139171 [details] attachment 139181 [details] attachment 139211 [details] attachment 139221 [details] and upload the dmsg here? Thanks in advance.
Created attachment 139401 [details] dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11
I forgot that yesterday evening while debugging a Wi-Fi problem I changed the kernel configuration slightly and enabled CONFIG_HP_WMI and CONFIG_ACPI_WMI. Maybe this had an impact?
(In reply to Steffen Weber from comment #88) > Created attachment 139401 [details] > dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11 There is nothing wrong in this patch.
(In reply to Steffen Weber from comment #85) > In comment 55 I applied the 9 patches from comment 54. The kernel did boot > up. IIRC it was about as fast as without the patches but I just saved dmesg > and rebooted into the unpatched kernel after 1-2 minutes, so I cannot say > that with certainty. The situation is: In attachment 138961 [details]: This is the last command: [ 11.302053] ACPI : EC: transaction start (cmd=0x80, addr=0x86) [ 11.302054] ACPI : EC: ~~~> TASK [ 11.302058] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 11.302059] ACPI : EC: EC_SC(W) = 0x84 [ 11.302061] ACPI : EC: transaction paused It is blocked due to the EVT_SCI=1, Q=0x0D storm. So all EC related functionality should be failed from then on. > Yesterday in comment 80 I applied the patches from comment 79. The kernel > did boot up, but the boot process took a very long time (e.g. the rootfs was > mounted after 39 seconds instead of after 6 seconds, see dmesg). But in attachment 139241 [details]: EC commands can be inserted between QR_EC (EC_SC(W) = 0x84) from this line: [ 1.842566] ACPI : EC: transaction start (cmd=0x81, addr=0xd9) [ 1.842569] ACPI : EC: ~~~> TASK [ 1.842572] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 1.842573] ACPI : EC: EC_SC(W) = 0x84 [ 1.842576] ACPI : EC: transaction paused [ 1.843555] ACPI : EC: ~~~> IRQ [ 1.843559] ACPI : EC: EC_SC(R) = 0x29 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=1 [ 1.843562] ACPI : EC: EC_DATA(R) = 0x0d [ 1.843564] ACPI : EC: no handler for query (0x d) [ 4.454375] ACPI : EC: ~~~> IRQ [ 4.454381] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 4.454382] ACPI : EC: EC_SC(W) = 0x81 [ 4.454404] ACPI : EC: transaction unpaused [ 4.455390] ACPI : EC: ~~~> IRQ [ 4.455396] ACPI : EC: EC_SC(R) = 0x28 SCI_EVT=1 BURST=0 CMD=1 IBF=0 OBF=0 [ 4.455397] ACPI : EC: EC_DATA(W) = 0xd9 [ 4.455432] ACPI : EC: ~~~> TASK [ 4.455435] ACPI : EC: EC_SC(R) = 0x22 SCI_EVT=1 BURST=0 CMD=0 IBF=1 OBF=0 [ 4.456416] ACPI : EC: ~~~> IRQ [ 4.456422] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 4.456423] ACPI : EC: EC_DATA(W) = 0x14 [ 4.456443] ACPI : EC: ~~~> TASK [ 4.456446] ACPI : EC: EC_SC(R) = 0x22 SCI_EVT=1 BURST=0 CMD=0 IBF=1 OBF=0 [ 4.457435] ACPI : EC: ~~~> IRQ [ 4.457441] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 4.457463] ACPI : EC: transaction end [ 4.457489] ACPI : EC: transaction start (cmd=0x81, addr=0xda) [ 4.457490] ACPI : EC: ~~~> TASK [ 4.457493] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0 [ 4.457494] ACPI : EC: EC_SC(W) = 0x84 [ 4.457497] ACPI : EC: transaction paused Which looks good. So I want to without applying this patchset, how old kernel survives the EVT_SCI=1 Q=0x0d storm. > So both patchsets have not improved system bootup time. The 1.5 second delay > exists with both patchsets. I don't know about any other delays in my boot > process that could be improved. Your issue is not related to EC, this patchset is supposed to fix the issue reported by Gareth. I asked you to test here, because you can improve my patchset to survive the EVT_SCI=1 Q=0x0d storm. But in attachment 139381 [details], I couldn't see the storm again. :-( Can you redo the test mentioned in the comment 84? Please try to find old configuration that can trigger EVT_SCI=1 Q=0x0d storm. You should be able to find logs like: [ 18.302676] ACPI : EC: push gpe query to the queue [ 18.302685] ACPI : EC: <--- command = 0x84 [ 18.303692] ACPI : EC: push gpe query to the queue !!! [ 18.303693] ACPI : EC: push query execution (0x d) on queue [ 18.303696] ACPI : EC: start query execution [ 18.303773] ACPI : EC: stop query execution Please make sure that there are many lines like the one marked by "!!!" and then post the useful dmesg here. Thanks in advance.
(In reply to Lv Zheng from comment #90) > (In reply to Steffen Weber from comment #88) > > Created attachment 139401 [details] > > dmesg of Linux 3.15.0 with patches 1+2+3+4+5+6+7+10+11 > > There is nothing wrong in this patch. And in order to confirm the storm was not triggered by my patches. Please use this configuration, and try the whole 11 patches again. Post the dmesg here. I'll wait and pray that an OSPM change will not force the EC firmware to flood Q=0x0d queries.
(In reply to Lv Zheng from comment #87) > (In reply to Steffen Weber from comment #86) > > Created attachment 139381 [details] > > dmesg of Linux 3.15.0 without any patches but DEBUG enabled > > Hi, I didn't see 0x0d query in this dmesg. > Looks not like to be generated by the same BIOS. > Could you only try these patches, without patching the last 2: > attachment 139121 [details] > attachment 139131 [details] > attachment 139141 [details] > attachment 139151 [details] > attachment 139161 [details] > attachment 139171 [details] > attachment 139181 [details] > attachment 139211 [details] > attachment 139221 [details] > and upload the dmsg here? Ping Gareth!!!! Could you give these patches a try? I've started review the process for these patches. If they couldn't fix your issue, your issue might be a firmware bug that cannot be fixed as we didn't find other serious EC state machine problems in the current code. Please let us know the result so that we can see the root cause clearly.
Created attachment 139421 [details] attachment-26689-0.html I'm sorry I've not replied sooner, but I didn't want to spam this bug with my apologies. I've been away from the offending HP Pavilion laptop for a few days (work related) but have been monitoring the action on webmail and/or smartphone, feeling rather redundant. I've now created a Digital Ocean VPS droplet which is currently building the kernel for me (with the patches and also the relevant .config from the Arch distro I'm running on the HP laptop). Hopefully, that will build soon and I'll be able to test it on the HP tonight (UK time). At Thursday, 12-06-2014 on 12:46 bugzilla-daemon@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=70891 --- Comment #93 from Lv Zheng --- (In reply to Lv Zheng from comment #87) > (In reply to Steffen Weber from comment #86) > > Created attachment 139381 [details] > > dmesg of Linux 3.15.0 without any patches but DEBUG enabled > > Hi, I didn't see 0x0d query in this dmesg. > Looks not like to be generated by the same BIOS. > Could you only try these patches, without patching the last 2: > attachment 139121 [details] > attachment 139131 [details] > attachment 139141 [details] > attachment 139151 [details] > attachment 139161 [details] > attachment 139171 [details] > attachment 139181 [details] > attachment 139211 [details] > attachment 139221 [details] > and upload the dmsg here? Ping Gareth!!!! Could you give these patches a try? I've started review the process for these patches. If they couldn't fix your issue, your issue might be a firmware bug that cannot be fixed as we didn't find other serious EC state machine problems in the current code. Please let us know the result so that we can see the root cause clearly.
Created attachment 139441 [details] dmesg of Linux 3.15.0 old config with ACPI DEBUG
Created attachment 139451 [details] dmesg of Linux 3.15.0 old config with all 11 patches
I don't see the SCI_EVT=0 storm anymore. Don't know why.
Created attachment 139481 [details] dmesg with kernel 3.15 and patches applied The results look good. 'ACPI: Interpreter enabled' shows up in less than half a second, which is brilliant. I'll confirm this is the result of the patches by installing a vanilla 3.15 kernel to compare timing.
Created attachment 139491 [details] dmesg with kernel 3.15 vanilla Without the patches there is a 1.1 second delay before the 'ACPI: Interpreter Enabled' message appears.
Created attachment 139501 [details] dmesg with kernel 3.15 with ec patches but no debug patches Early in this whole issue, the kernel would boot quite quickly if kernel debug was enabled. I've therefore applied the ec patches but not the last two (debug) patches. The results are promising as 'ACPI: Enable Interpreter' still shows up in under half a second. I think you've cracked it :-)
(In reply to Steffen Weber from comment #96) > Created attachment 139451 [details] > dmesg of Linux 3.15.0 old config with all 11 patches Looks great! So the 39s boot is caused by the storm not the patch.
(In reply to Steffen Weber from comment #97) > I don't see the SCI_EVT=0 storm anymore. Don't know why. OK, so if you could see this again, you could notify me using email: lv.zheng@intel.com Actually, I'm appreciated your help because this case did expose an issue in my patch. Thanks!
(In reply to Gareth Williams from comment #99) > Created attachment 139491 [details] > dmesg with kernel 3.15 vanilla > > Without the patches there is a 1.1 second delay before the 'ACPI: > Interpreter Enabled' message appears. Great! I'll add tested-by into the patches. Thanks for the reporting and testing.
Hi, Bug-fixes patches are sent to the upstream with an additional improvement done for the attachment 139181 [details] since the published patchset doesn't include the asynchronous EVT_SCI and the reference counting improvements. The improvements are going to be sent in another patchset. Marking this bug as RESOLVED.
Many thanks for all of your help/work. I'll wait for this to filter through to Fedora and Arch. I'll also anotate the downstream bug. Thanks again :-) Gareth
Closing due to the merging of this patcheset. It is now in linux-pm.git/bleeding-edge branch, and will appear in linux.git v3.16-rc5.