Bug 77431

Summary: ACPI Events not being reported to OS without intel_idle.max_cstate=0 - Notebook Clevo w350etq
Product: ACPI
Reporter: qbanin
Component: EC
Assignee: Lv Zheng (lv.zheng)
Status: CLOSED CODE_FIX
Severity: normal
CC: lenb, lv.zheng, rui.zhang, tianyu.lan
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.14.5
Subsystem:
Regression: No
Bisected commit-id:
Attachments: acpidump log
Dmesg output
Dmesg with ec.c dirty patch applied
dmesg_broken_acpi
dmesg_broken_acpi updated
dmesg_broken_acpi_after_bios_reset
dmesg_broken_acpi_3
dmesg_ec_flag_msi
ec.patch
dmesg_patched_ec
Dmesg with applied patch from #37
dmicode output
ec.patch
[PATCH] Debugging if udelay is required by this hardware
[PATCH] Debugging if BURST mode can fix this issue
The EC event polling implementation

Description qbanin 2014-06-06 18:47:51 UTC
Hi,

Device: Notebook Clevo w350etq
Kernel Linux XNOTE 3.14.5-qba5 #5 SMP PREEMPT Fri Jun 6 18:44:39 CEST 2014 x86_64 GNU/Linux (custom compilation)
Distro: Debian SID

I have a problem with ACPI support on this notebook. ACPI events (such as AC plug/unplug, brightness change, power and sleep button presses) are not reported correctly, or not at all. Sometimes this happens right after boot; sometimes events work for anywhere from a few minutes to a few hours and then stop.

I tried acpi_listen when the problem occurred - no output.

In most cases, if I run the "acpi" command to query the system manually, some of the pending ACPI events come unstuck and are printed in the acpi_listen window, at least for a few minutes.

I already tried the "clear stale EC events" patch for Samsung notebooks - no change.

Here's my DSDT (with a few errors fixed): http://pastebin.com/vQzbqgdB

qba@XNOTE:~$ lspci
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GTX 660M] (rev ff)
03:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)
04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 5289 (rev 01)
04:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0a)




qba@XNOTE:~$ dmesg |grep -i acpi
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.14.5-qba5 root=UUID=ba0f9583-91ca-45de-8b7e-c8f8d1421d80 ro i915.i915_enable_rc6=7 i915.i915_enable_fbc=1 i915.lvds_downclock=1 i915.semaphores=1 drm.vblankoffdelay=1 enable_mtrr_cleanup mtrr_spare_reg_nr=1 mtrr_gran_size=32M mtrr_chunk_size=256M processor.ignore_ppc=1 processor.ignore_tpc=1 acpi_osi=Linux init=/sbin/init i915.modeset=1 i915.semaphores=1 quiet
[    0.000000] BIOS-e820: [mem 0x00000000ca0fc000-0x00000000ca1d1fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000ca618000-0x00000000ca65afff] ACPI NVS
[    0.000000] modified: [mem 0x00000000ca0fc000-0x00000000ca1d1fff] ACPI NVS
[    0.000000] modified: [mem 0x00000000ca618000-0x00000000ca65afff] ACPI NVS
[    0.000000] ACPI: RSDP 00000000000f0490 000024 (v02 ALASKA)
[    0.000000] ACPI: XSDT 00000000ca184080 00007C (v01 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: FACP 00000000ca18d2d8 00010C (v05 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: Override [DSDT-   A M I], this is unsafe: tainting kernel
[    0.000000] ACPI: DSDT 00000000ca184190 Logical table override, new table: ffffffff81a66790
[    0.000000] ACPI: DSDT ffffffff81a66790 009149 (v02 ALASKA    A M I 00000021 INTL 20140424)
[    0.000000] ACPI: FACS 00000000ca1d0080 000040
[    0.000000] ACPI: APIC 00000000ca18d3e8 000092 (v03 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: FPDT 00000000ca18d480 000044 (v01 ALASKA    A M I 01072009 AMI  00010013)
[    0.000000] ACPI: MCFG 00000000ca18d4c8 00003C (v01 ALASKA    A M I 01072009 MSFT 00000097)
[    0.000000] ACPI: SSDT 00000000ca18d508 000EA5 (v01 TrmRef PtidDevc 00001000 INTL 20091112)
[    0.000000] ACPI: HPET 00000000ca18e3b0 000038 (v01 ALASKA    A M I 01072009 AMI. 00000005)
[    0.000000] ACPI: SSDT 00000000ca18e3e8 000315 (v01 SataRe SataTabl 00001000 INTL 20091112)
[    0.000000] ACPI: SSDT 00000000ca18e700 000926 (v01  PmRef  Cpu0Ist 00003000 INTL 20051117)
[    0.000000] ACPI: SSDT 00000000ca18f028 000A92 (v01  PmRef    CpuPm 00003000 INTL 20051117)
[    0.000000] ACPI: SSDT 00000000ca18fac0 000574 (v01  SgRef   SgTabl 00001000 INTL 20051117)
[    0.000000] ACPI: SSDT 00000000ca190038 000FAC (v01 OptRef  OptTabl 00001000 INTL 20051117)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.14.5-qba5 root=UUID=ba0f9583-91ca-45de-8b7e-c8f8d1421d80 ro i915.i915_enable_rc6=7 i915.i915_enable_fbc=1 i915.lvds_downclock=1 i915.semaphores=1 drm.vblankoffdelay=1 enable_mtrr_cleanup mtrr_spare_reg_nr=1 mtrr_gran_size=32M mtrr_chunk_size=256M processor.ignore_ppc=1 processor.ignore_tpc=1 acpi_osi=Linux init=/sbin/init i915.modeset=1 i915.semaphores=1 quiet
[    0.000044] ACPI: Core revision 20131218
[    0.008226] ACPI: All ACPI Tables successfully acquired
[    0.165286] PM: Registering ACPI NVS region [mem 0xca0fc000-0xca1d1fff] (876544 bytes)
[    0.165298] PM: Registering ACPI NVS region [mem 0xca618000-0xca65afff] (274432 bytes)
[    0.165685] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    0.165686] ACPI: bus type PCI registered
[    0.165688] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.171888] ACPI: Added _OSI(Module Device)
[    0.171890] ACPI: Added _OSI(Processor Device)
[    0.171891] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.171892] ACPI: Added _OSI(Processor Aggregator Device)
[    0.171893] ACPI: Added _OSI(Linux)
[    0.174547] ACPI: Executed 1 blocks of module-level executable AML code
[    0.191869] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query honored via cmdline
[    0.192571] ACPI: SSDT 00000000ca025018 00083B (v01  PmRef  Cpu0Cst 00003001 INTL 20051117)
[    0.192914] ACPI: Dynamic OEM Table Load:
[    0.192916] ACPI: SSDT           (null) 00083B (v01  PmRef  Cpu0Cst 00003001 INTL 20051117)
[    0.196201] ACPI: SSDT 00000000ca026a98 000303 (v01  PmRef    ApIst 00003000 INTL 20051117)
[    0.196564] ACPI: Dynamic OEM Table Load:
[    0.196566] ACPI: SSDT           (null) 000303 (v01  PmRef    ApIst 00003000 INTL 20051117)
[    0.199985] ACPI: SSDT 00000000ca027c18 000119 (v01  PmRef    ApCst 00003000 INTL 20051117)
[    0.200321] ACPI: Dynamic OEM Table Load:
[    0.200323] ACPI: SSDT           (null) 000119 (v01  PmRef    ApCst 00003000 INTL 20051117)
[    0.207500] ACPI: Interpreter enabled
[    0.207508] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20131218/hwxface-580)
[    0.207512] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20131218/hwxface-580)
[    0.207525] ACPI: (supports S0 S3 S4 S5)
[    0.207526] ACPI: Using IOAPIC for interrupt routing
[    0.207550] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.207689] ACPI: No dock devices found.
[    0.215944] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
[    0.215949] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.216147] acpi PNP0A08:00: _OSC: platform does not support [PCIeHotplug PME]
[    0.216264] acpi PNP0A08:00: _OSC: OS now controls [AER PCIeCapability]
[    0.216984] pci 0000:00:01.0: System wakeup disabled by ACPI
[    0.217272] pci 0000:00:14.0: System wakeup disabled by ACPI
[    0.217652] pci 0000:00:1a.0: System wakeup disabled by ACPI
[    0.217820] pci 0000:00:1b.0: System wakeup disabled by ACPI
[    0.217981] pci 0000:00:1c.0: System wakeup disabled by ACPI
[    0.218140] pci 0000:00:1c.2: System wakeup disabled by ACPI
[    0.218298] pci 0000:00:1c.3: System wakeup disabled by ACPI
[    0.218501] pci 0000:00:1d.0: System wakeup disabled by ACPI
[    0.219157] pci 0000:01:00.0: System wakeup disabled by ACPI
[    0.221352] pci 0000:03:00.0: System wakeup disabled by ACPI
[    0.223826] pci 0000:04:00.0: System wakeup disabled by ACPI
[    0.226087] acpi PNP0A08:00: Disabling ASPM (FADT indicates it is unsupported)
[    0.226700] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.226743] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[    0.226784] ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 10 11 12 14 15)
[    0.226824] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *10 11 12 14 15)
[    0.226865] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[    0.226904] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[    0.226944] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 10 11 12 14 15)
[    0.226989] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 *4 5 6 10 11 12 14 15)
[    0.227060] ACPI: Enabled 5 GPEs in block 00 to 3F
[    0.227087] ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[    0.227319] ACPI : EC: 0 stale EC events cleared
[    0.227508] PCI: Using ACPI for IRQ routing
[    0.233042] pnp: PnP ACPI init
[    0.233050] ACPI: bus type PNP registered
[    0.233108] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[    0.233131] pnp 00:01: Plug and Play ACPI device, IDs PNP0200 (active)
[    0.233147] pnp 00:02: Plug and Play ACPI device, IDs INT0800 (active)
[    0.233228] pnp 00:03: Plug and Play ACPI device, IDs PNP0103 (active)
[    0.233280] system 00:04: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.233311] pnp 00:05: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.233351] system 00:06: Plug and Play ACPI device, IDs INT3f0d PNP0c02 (active)
[    0.233391] system 00:07: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.233412] pnp 00:08: Plug and Play ACPI device, IDs PNP0c04 (active)
[    0.233436] pnp 00:09: Plug and Play ACPI device, IDs PNP0303 (active)
[    0.233479] pnp 00:0a: Plug and Play ACPI device, IDs ETD0403 PNP0f13 (active)
[    0.234213] system 00:0b: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.234352] system 00:0c: Plug and Play ACPI device, IDs PNP0c01 (active)
[    0.234364] pnp: PnP ACPI: found 13 devices
[    0.234365] ACPI: bus type PNP unregistered
[    0.940212] ACPI: bus type USB registered
[    0.942363] ACPI: Thermal Zone [TZ0] (42 C)
[    1.286512] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    1.286521] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    1.286534] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    1.288850] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    1.288859] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    1.288863] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    1.596251] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    1.596254] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    1.596256] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    1.597013] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[    1.597014] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[    1.597016] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[    3.261137] ACPI: Power Button [PWRB]
[    3.261171] ACPI: Sleep Button [SLPB]
[    3.261658] ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042f conflicts with OpRegion 0x0000000000000400-0x000000000000047f (\PMIO) (20131218/utaddress-258)
[    3.261666] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    3.261669] ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053f conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20131218/utaddress-258)
[    3.261673] ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053f conflicts with OpRegion 0x0000000000000500-0x000000000000055f (\_SB_.PCI0.PEG0.PEGP.GPIO) (20131218/utaddress-258)
[    3.261677] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    3.261679] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052f conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20131218/utaddress-258)
[    3.261682] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052f conflicts with OpRegion 0x0000000000000500-0x000000000000055f (\_SB_.PCI0.PEG0.PEGP.GPIO) (20131218/utaddress-258)
[    3.261686] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    3.262562] ACPI: AC Adapter [AC] (on-line)
[    3.277371] ACPI: Battery Slot [BAT] (battery present)
[    3.282286] ACPI: Lid Switch [LID0]
[    3.282332] ACPI: Power Button [PWRF]
[    3.322875] ACPI Warning: SystemIO range 0x000000000000f040-0x000000000000f05f conflicts with OpRegion 0x000000000000f040-0x000000000000f04f (\_SB_.PCI0.SBUS.SMBI) (20131218/utaddress-258)
[    3.322880] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    8.124704] input: ACPI Virtual Keyboard Device as /devices/virtual/input/input21
[    9.760925] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131218/nsarguments-95)
[    9.762930] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131218/nsarguments-95)
[   10.480514] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
[   10.480541] ACPI: Video Device [PEGP] (multi-head: yes  rom: yes  post: no)
[   10.480700] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   10.514067] acpi device:4c: registered as cooling_device9


Regards
Qba
Comment 1 Lan Tianyu 2014-06-09 07:15:47 UTC
Please provide the output of acpidump.
Comment 2 qbanin 2014-06-09 08:10:36 UTC
http://pastebin.com/hKUPnDy4

After a lot of googling and trying different boot parameters, I found out that this issue is somehow related to the CPU's C-states. I added "intel_idle.max_cstate=0" to the GRUB command line 2 days ago, and ACPI events have not gotten stuck since.

Now I'm booting my PC with "intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait" and will keep testing. I believe the downside of this solution is higher power usage, because the CPU no longer enters the C7 state?

It would be nice if this issue could be fixed by a software patch, because there is no BIOS or EC firmware upgrade available for my notebook.

Regards
Qba
Comment 3 Zhang Rui 2014-06-11 08:07:31 UTC
Please attach the acpidump to the Bugzilla report, as I cannot access the pastebin page.

I don't know why this could be related to intel_idle. Len, have you seen similar problems before?
Comment 4 qbanin 2014-06-11 08:55:50 UTC
Created attachment 139061 [details]
acpidump log
Comment 5 qbanin 2014-06-23 01:33:18 UTC
Today "my" workaround for this issue stopped working :( and I have no idea why.
Comment 6 Lan Tianyu 2014-06-25 08:05:15 UTC
Could you apply the following patch to enable the EC debug option and attach the dmesg output?

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index ad11ba4..0708d71 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -27,7 +27,7 @@
  */
 
 /* Uncomment next line to get verbose printout */
-/* #define DEBUG */
+#define DEBUG
 #define pr_fmt(fmt) "ACPI : EC: " fmt
 
 #include <linux/kernel.h>
Comment 7 qbanin 2014-06-25 22:19:04 UTC
Created attachment 140941 [details]
Dmesg output

Dmesg captured after a fresh boot + a few brightness changes and AC adapter plug/unplug. ACPI events reported correctly (so far).
Comment 8 qbanin 2014-06-25 22:43:22 UTC
Created attachment 140951 [details]
Dmesg with ec.c dirty patch applied
Comment 9 qbanin 2014-06-25 22:49:36 UTC
Regarding comment #8. 

With a debug-enabled vanilla ec.c I can change brightness and my notebook reacts to AC unplug, but the "acpi_listen" output is empty, the brightness level bar in KDE is missing, and the battery applet still shows "charging 100%" even if AC is unplugged.

3 days ago right after my previous message I applied a dirty patch to ec.c:

*** linux-3.14.8/drivers/acpi/ec.c.orig	2014-06-16 22:41:19.000000000 +0200
--- linux-3.14.8/drivers/acpi/ec.c	2014-06-26 00:34:17.489809373 +0200
***************
*** 1032,1037 ****
--- 1032,1046 ----
  	{
  	ec_clear_on_resume, "Samsung hardware", {
  	DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD.")}, NULL},
+ 	{
+ 	ec_clear_on_resume, "CLEVO hardware", {
+ 	DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
+ 	{
+ 	ec_skip_dsdt_scan, "CLEVO hardware", {
+ 	DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
+ 	{
+ 	ec_flag_msi, "CLEVO hardware", {
+ 	DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
  	{},
  };


-------------------

This "patch" solved all my ACPI-related issues (except for ec_clear_on_resume, which doesn't change anything). Forgive the mess in my reports. I hope they will help you locate the bug. :) Please see the second debug-enabled dmesg output attached to my previous message, captured with this patch applied.

Regards
Qba
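
Note on how the quirk table in the patch above behaves: in the kernel, a DMI quirk table is walked entry by entry, and every entry whose match string is found in the machine's DMI data has its callback run. That is why three entries with the same vendor match set three flags at once, and why it helps to test the flags one at a time. The following is a minimal user-space sketch of that idea, not the kernel code. struct quirk, apply_quirks(), the QUIRK_* flags and the set_* callbacks are hypothetical stand-ins; only the vendor strings are taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits standing in for the driver's quirk state. */
#define QUIRK_CLEAR_ON_RESUME 0x1
#define QUIRK_SKIP_DSDT_SCAN  0x2
#define QUIRK_FLAG_MSI        0x4

int quirk_flags; /* accumulated by the callbacks below */

/* One table entry: callback, label, and a vendor substring to match. */
struct quirk {
	int (*callback)(void);
	const char *ident;
	const char *vendor;
};

static int set_clear_on_resume(void) { quirk_flags |= QUIRK_CLEAR_ON_RESUME; return 0; }
static int set_skip_dsdt_scan(void)  { quirk_flags |= QUIRK_SKIP_DSDT_SCAN;  return 0; }
static int set_flag_msi(void)        { quirk_flags |= QUIRK_FLAG_MSI;        return 0; }

/* Table terminated by an all-NULL entry, like the "{}," in the patch. */
const struct quirk quirk_table[] = {
	{ set_clear_on_resume, "Samsung hardware", "SAMSUNG ELECTRONICS CO., LTD." },
	{ set_skip_dsdt_scan,  "CLEVO hardware",   "CLEVO CO." },
	{ set_flag_msi,        "CLEVO hardware",   "CLEVO CO." },
	{ NULL, NULL, NULL },
};

/*
 * Run the callback of every entry whose vendor substring occurs in
 * sys_vendor. A nonzero callback result stops the walk early.
 * Returns the number of entries that matched.
 */
int apply_quirks(const char *sys_vendor)
{
	int count = 0;

	for (const struct quirk *q = quirk_table; q->callback; q++) {
		if (!strstr(sys_vendor, q->vendor))
			continue;
		count++;
		if (q->callback())
			break;
	}
	return count;
}
```

Calling apply_quirks("CLEVO CO.") in this sketch matches two entries and sets both the skip-DSDT-scan and MSI flags, so a result seen with the combined patch cannot be attributed to a single quirk without further testing.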
Comment 10 Lan Tianyu 2014-06-26 02:05:49 UTC
(In reply to qbanin from comment #9)
> Regarding comment #8. 
> 
> With debug enabled vanilla ec.c I can change brightness and my notebooks
> reacts to AC unplug 

Do you mean these functions work just after enabling debug option?

> but "acpi_listen" output is empty, brightness level bar
> in KDE is missing and battery applet is still showing "charging 100%" even
> if AC is unplugged.
> 
> 3 days ago right after my previous message I applied a dirty patch to ec.c:
> 
> *** linux-3.14.8/drivers/acpi/ec.c.orig       2014-06-16 22:41:19.000000000
> +0200
> --- linux-3.14.8/drivers/acpi/ec.c    2014-06-26 00:34:17.489809373 +0200
> ***************
> *** 1032,1037 ****
> --- 1032,1046 ----
>       {
>       ec_clear_on_resume, "Samsung hardware", {
>       DMI_MATCH(DMI_SYS_VENDOR, "SAMSUNG ELECTRONICS CO., LTD.")}, NULL},
> +     {
> +     ec_clear_on_resume, "CLEVO hardware", {
> +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
> +     {
> +     ec_skip_dsdt_scan, "CLEVO hardware", {
> +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
> +     {
> +     ec_flag_msi, "CLEVO hardware", {
> +     DMI_MATCH(DMI_SYS_VENDOR, "CLEVO CO.")}, NULL},
>       {},
>   };
> 
> 
> -------------------
> 
> This "patch" solved all my ACPI related issues (except ec_clear_on_resume,
> it doesn't change anything). Forgive me the mess in my reports. I hope they
> will help you to locate the bug. :) Please see attached second debug-enabled
> dmesg output from my previous message with this patch applied.

Could you try the following patchset?

http://marc.info/?l=linux-acpi&m=140279290812659&w=2
http://marc.info/?l=linux-acpi&m=140279291012660&w=2
http://marc.info/?l=linux-acpi&m=140279292712665&w=2
http://marc.info/?l=linux-acpi&m=140279300712697&w=2
http://marc.info/?l=linux-acpi&m=140279300012696&w=2
http://marc.info/?l=linux-acpi&m=140279299612694&w=2
http://marc.info/?l=linux-acpi&m=140279299312692&w=2

> 
> Regards
> Qba
Comment 11 qbanin 2014-06-26 02:40:51 UTC
(In reply to Lan Tianyu from comment #10)

> Do you mean these functions work just after enabling debug option?

No :) By "enabling debug only" I mean debug enabled without my dirty patch.

> 
> Could you try the following patchset?
> 
> http://marc.info/?l=linux-acpi&m=140279290812659&w=2
> http://marc.info/?l=linux-acpi&m=140279291012660&w=2
> http://marc.info/?l=linux-acpi&m=140279292712665&w=2
> http://marc.info/?l=linux-acpi&m=140279300712697&w=2
> http://marc.info/?l=linux-acpi&m=140279300012696&w=2
> http://marc.info/?l=linux-acpi&m=140279299612694&w=2
> http://marc.info/?l=linux-acpi&m=140279299312692&w=2
> 
> > 
> > 

Ok, I'll try it and let you know.
Comment 12 qbanin 2014-06-26 20:07:54 UTC
Applied your patches except 6/7 which seems to be already included in 3.14.8. Everything was working fine for about 1h. Now the issue is back (no ACPI events reported, no reaction to brightness up/down buttons nor AC (un)plug).
Comment 13 Lan Tianyu 2014-06-27 02:57:55 UTC
(In reply to qbanin from comment #12)
> Applied your patches except 6/7 which seems to be already included in
> 3.14.8. Everything was working fine for about 1h. Now the issue is back (no
> ACPI events reported, no reaction to brightness up/down buttons nor AC
> (un)plug).

Do you mean it worked normally for 1 hour and then broke again?
Comment 14 qbanin 2014-06-27 08:01:05 UTC
(In reply to Lan Tianyu from comment #13)
> (In reply to qbanin from comment #12)
> > Applied your patches except 6/7 which seems to be already included in
> > 3.14.8. Everything was working fine for about 1h. Now the issue is back (no
> > ACPI events reported, no reaction to brightness up/down buttons nor AC
> > (un)plug).
> 
> Do you mean it worked normally for 1 hour and then broke again?

Yes, but it wasn't exactly 1 hour. I was checking it for about 10 mins after fresh boot and it was working fine, then rechecked after ~1h+ and it wasn't working anymore.
Comment 15 Lan Tianyu 2014-06-27 08:08:44 UTC
Could you attach the dmesg with the EC debug option enabled, captured after events stop working?

BTW, please add the kernel parameter "log_buf_len=10M" to increase the log buffer.
Comment 16 qbanin 2014-06-27 09:29:01 UTC
Created attachment 141081 [details]
dmesg_broken_acpi

This time it didn't work at all after boot.
Comment 17 Lan Tianyu 2014-07-02 06:22:09 UTC
In the log there is an EC event for the thermal zone. Did you plug/unplug AC?

[   47.439317] ACPI : EC: ===== TASK =====
[   47.439320] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   47.439321] ACPI : EC: EC_SC(W) = 0x84
[   47.440283] ACPI : EC: ===== TASK =====
[   47.440289] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   47.440292] ACPI : EC: EC_DATA(R) = 0x1c <== event number.
[   47.440294] ACPI : EC: push query execution (0x1c) on queue
[   47.440296] ACPI : EC: transaction end
[   47.440300] ACPI : EC: start query execution
[   47.440544] ACPI : EC: transaction start (cmd=0x82, addr=0x00)



            Method (_Q1C, 0, NotSerialized)  // _Qxx: EC Query
            {
                P8XH (Zero, 0x1C)
                Notify (\_TZ.TZ0, 0x80)
                Notify (\_TZ.TZ0, 0x81)
                ADJP ()
            }
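
For readers following the log excerpt above: the EC_SC(R) values are the EC status register, whose bit layout is defined by the ACPI specification (OBF is bit 0, IBF bit 1, CMD bit 3, BURST bit 4, SCI_EVT bit 5). The value written, 0x84, is the query command (QR_EC). The sketch below decodes a status byte into the same fields the debug output prints. ec_format_status() is a hypothetical helper written for illustration; the macro names follow those used in drivers/acpi/ec.c.

```c
#include <stdio.h>
#include <stddef.h>

/* EC status register bits, as laid out in the ACPI specification. */
#define ACPI_EC_FLAG_OBF   0x01 /* output buffer full */
#define ACPI_EC_FLAG_IBF   0x02 /* input buffer full */
#define ACPI_EC_FLAG_CMD   0x08 /* last write was a command byte */
#define ACPI_EC_FLAG_BURST 0x10 /* burst mode enabled */
#define ACPI_EC_FLAG_SCI   0x20 /* SCI event pending */

/*
 * Hypothetical helper: render a status byte in the same field order
 * as the debug log lines quoted in this comment.
 * Returns the snprintf() result.
 */
int ec_format_status(unsigned char status, char *buf, size_t len)
{
	return snprintf(buf, len,
			"EC_SC(R) = 0x%02x SCI_EVT=%d BURST=%d CMD=%d IBF=%d OBF=%d",
			status,
			!!(status & ACPI_EC_FLAG_SCI),
			!!(status & ACPI_EC_FLAG_BURST),
			!!(status & ACPI_EC_FLAG_CMD),
			!!(status & ACPI_EC_FLAG_IBF),
			!!(status & ACPI_EC_FLAG_OBF));
}
```

Read this way, 0x20 means an event is pending and nothing else is in flight, and 0x09 means the output buffer holds the response to a command, here the query number 0x1c.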
Comment 18 qbanin 2014-07-02 07:51:50 UTC
I think I did, but I'm not sure.
Comment 19 Lan Tianyu 2014-07-02 07:58:35 UTC
In the end, I don't see any EC event for AC. Could you double-check and attach the log?
Comment 20 Lan Tianyu 2014-07-04 06:05:48 UTC
Any update?
Comment 21 qbanin 2014-07-04 16:26:40 UTC
Created attachment 142071 [details]
dmesg_broken_acpi updated

Booted up my notebook -> logged into KDE -> tried to change brightness up/down a few times (no reaction) -> unplugged and replugged AC (no reaction) -> captured dmesg.
Comment 22 qbanin 2014-07-04 16:40:38 UTC
UPDATE: Unplugged and replugged AC twice
Comment 23 Lan Tianyu 2014-07-07 02:13:19 UTC
I see one EC event for thermal in the log and none for AC or backlight. Could you reset the BIOS? I don't know why there are no events for AC and backlight. This bug has different symptoms from the issue on the Samsung machines, because the thermal event is still delivered.
Comment 24 qbanin 2014-07-07 12:10:10 UTC
Created attachment 142271 [details]
dmesg_broken_acpi_after_bios_reset

BIOS reset + steps from comment #21. The only difference was that the brightness changed ONCE, to a lower value, about 1 min after the button press. The rest was the same (no reaction to AC or brightness up/down).
Comment 25 Lan Tianyu 2014-07-08 03:44:20 UTC
Yes, I saw some events for backlight in the log.

[  151.157294] ACPI : EC: push query execution (0x1c) on queue


   Method (_Q11, 0, NotSerialized)  // _Qxx: EC Query
            {
                If (LEqual (^^^GFX0.CDDS (0x0410), 0x1F))
                {
                    P8XH (Zero, 0x11)
                    Notify (^^^GFX0.LCD0, 0x87)   <== notify video driver.
                    If (LEqual (ECOS, 0x02))
                    {
                        Store (0xE0, ^^^^WMI.EVNT)
                        Notify (WMI, 0xD0)
                    }
                    Else
                    {
                        Add (OEM2, 0xE0, ^^^^WMI.EVNT)
                        Notify (WMI, 0xD0)
                    }
                }
            }

So far I have no idea, since the EC event is triggered by a GPE and depends entirely on hardware. Furthermore, the thermal event works. How about your previous quirk workaround from comment 9? Does it still work?
Comment 26 Lan Tianyu 2014-07-08 07:23:25 UTC
(In reply to Lan Tianyu from comment #25)
> Yes, I saw some events for backlight in the log.
> 
> [  151.157294] ACPI : EC: push query execution (0x1c) on queue
Sorry, I pasted the wrong log line. The following one is for the backlight.

[  151.159291] ACPI : EC: push query execution (0x11) on queue
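
The link between the query numbers in the log and the DSDT methods quoted in this thread is a naming rule from the ACPI specification: query value xx is handled by the control method _Qxx, with xx written as two hex digits. A minimal sketch of that mapping follows; ec_query_method_name() is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: build the name of the EC query method for a
 * query number, e.g. 0x1c -> "_Q1C". buf needs room for 5 bytes
 * including the terminating NUL. Returns the snprintf() result.
 */
int ec_query_method_name(unsigned char query, char *buf, size_t len)
{
	return snprintf(buf, len, "_Q%02X", query);
}
```

So the 0x1c query seen earlier corresponds to _Q1C (the thermal zone notification), and 0x11 corresponds to _Q11 (the backlight notification).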
Comment 27 qbanin 2014-07-08 08:12:44 UTC
(In reply to Lan Tianyu from comment #25)

> How about your
> previous quirk workaround in the comment 9? Does it still work?

Yes, it works perfectly. No issues so far.
Comment 28 Lan Tianyu 2014-07-08 08:25:45 UTC
How about adding just the ec_skip_dsdt_scan quirk on top of the new patchset from comment 11, which has been merged into the linux-pm tree?
Comment 29 qbanin 2014-07-08 09:15:23 UTC
Created attachment 142441 [details]
dmesg_broken_acpi_3

Done. Compiled the kernel with your patchset + the ec_skip_dsdt_scan quirk and performed the same testing procedure as usual. The system reacts to ~50% of events. I managed to change brightness down/up twice or so, and un(re)plug AC once. The system's reaction to ACPI events is either delayed by 15+ seconds or unregistered at all.

It looks like ec_skip_dsdt_scan is not a cure for my issue.
Comment 30 Lan Tianyu 2014-07-09 05:59:28 UTC
(In reply to qbanin from comment #29)
> Created attachment 142441 [details]
> dmesg_broken_acpi_3
> 
> Done. Compiled kernel with your patchsets + ec_skip_dsdt_scan quirk and
> performed the same testing procedure as usual. System reacts to ~50% of
> events.

Thanks. This means ec_skip_dsdt_scan does something that makes some events work.
But "ec_skip_dsdt_scan" only causes the EC device to be probed later.

> I managed to change brightness down/up twice or so, and un(re)plug AC
> once. System reaction to ACPI events is either delayed by 15+ seconds or
> unregistered at all.

What do you mean by "unregistered at all"? Were the events sent 15s later?


> 
> It looks like ec_skip_dsdt_scan is not a cure for my issue.

Could you try adding just the "ec_flag_msi" quirk?
Comment 31 qbanin 2014-07-09 08:04:51 UTC
(In reply to Lan Tianyu from comment #30)


> What do you mean by "unregistered at all"? Were the events sent 15s later?

I mean that if I press the brightness-down button, it's about 50/50 whether the screen dims after ~15s or nothing happens at all :D

> 
> Could you try adding just the "ec_flag_msi" quirk?

Will try soon and report back.
Comment 32 qbanin 2014-07-09 12:44:24 UTC
Created attachment 142591 [details]
dmesg_ec_flag_msi

Well... it looks like "ec_flag_msi" fixed my issue.
Comment 33 Lan Tianyu 2014-07-11 08:22:43 UTC
How about commenting out the following lines?

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index ff16132..23ee51e 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -281,8 +281,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
 {
        unsigned long tmp;
        int ret = 0;
-       if (EC_FLAGS_MSI)
-               udelay(ACPI_EC_MSI_UDELAY);
+//     if (EC_FLAGS_MSI)
+//             udelay(ACPI_EC_MSI_UDELAY);
        /* start transaction */
        spin_lock_irqsave(&ec->lock, tmp);
        /* following two actions should be kept atomic */
Comment 34 qbanin 2014-07-11 14:40:19 UTC
(In reply to Lan Tianyu from comment #33)
> How about commenting the following lines?
> 
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index ff16132..23ee51e 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -281,8 +281,8 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec
> *ec,
>  {
>         unsigned long tmp;
>         int ret = 0;
> -       if (EC_FLAGS_MSI)
> -               udelay(ACPI_EC_MSI_UDELAY);
> +//     if (EC_FLAGS_MSI)
> +//             udelay(ACPI_EC_MSI_UDELAY);
>         /* start transaction */
>         spin_lock_irqsave(&ec->lock, tmp);
>         /* following two actions should be kept atomic */

After commenting out these 2 lines the issue is back.
Comment 35 Lan Tianyu 2014-07-14 05:14:20 UTC
Created attachment 142891 [details]
ec.patch

Please try this patch.
Comment 36 qbanin 2014-07-14 09:23:01 UTC
Created attachment 142941 [details]
dmesg_patched_ec

Tried. Symptoms are more or less the same as in comment #31.
Comment 37 Lan Tianyu 2014-07-18 07:25:51 UTC
Thanks for testing. I found that there are no more EC interrupts once the bug
takes place. Please try the following patch.

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index a66ab65..e30bfb1 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -264,9 +264,9 @@ static int ec_poll(struct acpi_ec *ec)
                                                msecs_to_jiffies(1)))
                                        return 0;
                        }
-                       spin_lock_irqsave(&ec->lock, flags);
-                       (void)advance_transaction(ec);
-                       spin_unlock_irqrestore(&ec->lock, flags);
+//                     spin_lock_irqsave(&ec->lock, flags);
+//                     (void)advance_transaction(ec);
+//                     spin_unlock_irqrestore(&ec->lock, flags);
                } while (time_before(jiffies, delay));
                pr_debug("controller reset, restart transaction\n");
                spin_lock_irqsave(&ec->lock, flags);
Comment 38 qbanin 2014-07-18 20:31:59 UTC
This patch cannot be applied to 3.14.8:

------------
patching file drivers/acpi/ec.c
patch unexpectedly ends in middle of line
Hunk #1 FAILED at 264.
1 out of 1 hunk FAILED -- saving rejects to file drivers/acpi/ec.c.rej
------------

This function is different in 3.14.8:

static int ec_poll(struct acpi_ec *ec)
{
	unsigned long flags;
	int repeat = 5; /* number of command restarts */
	while (repeat--) {
		unsigned long delay = jiffies +
			msecs_to_jiffies(ec_delay);
		do {
			/* don't sleep with disabled interrupts */
			if (EC_FLAGS_MSI || irqs_disabled()) {
				udelay(ACPI_EC_MSI_UDELAY);
				if (ec_transaction_done(ec))
					return 0;
			} else {
				if (wait_event_timeout(ec->wait,
						ec_transaction_done(ec),
						msecs_to_jiffies(1)))
					return 0;
			}
			advance_transaction(ec, acpi_ec_read_status(ec));
		} while (time_before(jiffies, delay));
		pr_debug("controller reset, restart transaction\n");
		spin_lock_irqsave(&ec->lock, flags);
		start_transaction(ec);
		spin_unlock_irqrestore(&ec->lock, flags);
	}
	return -ETIME;
}
Comment 39 Lan Tianyu 2014-07-21 02:49:28 UTC
You can comment out the advance_transaction() line directly.

But it would be better to test the newest code from the bleeding-edge branch of the linux-pm tree.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
cd linux-pm
git checkout bleeding-edge
Comment 40 Lan Tianyu 2014-08-04 01:29:59 UTC
Hi:
    Any update?
Comment 41 qbanin 2014-08-07 18:31:47 UTC
Created attachment 145571 [details]
Dmesg with applied patch from #37

After reboot I unplugged and replugged AC twice (immediate reaction) and lowered brightness by a few steps (immediate reaction), but then my system froze for ~20s (no screen updates, no reaction to keyboard or mouse input). After 20s everything resumed, but no more ACPI events were reported to the OS. Attached is the dmesg captured after resuming from the soft lockup.
Comment 42 Lan Tianyu 2014-08-19 02:47:04 UTC
Hi, Please provide output of dmidecode.
Comment 43 qbanin 2014-08-19 02:49:05 UTC
Created attachment 147181 [details]
dmicode output
Comment 44 Lan Tianyu 2014-08-19 03:07:07 UTC
Created attachment 147191 [details]
ec.patch

Please try this patch. It adds the MSI quirk for your machine.
Comment 45 Lv Zheng 2014-08-19 04:22:35 UTC
Hi,

I checked 141081; see these log entries:
[    3.336688] ACPI : EC: transaction start (cmd=0x83, addr=0x00)
[    3.336688] ACPI : EC: ===== TASK =====
[    3.336691] ACPI : EC: EC_SC(R) = 0x10 SCI_EVT=0 BURST=1 CMD=0 IBF=0 OBF=0
[    3.336692] ACPI : EC: EC_SC(W) = 0x83
[    3.336698] ACPI : EC: ===== IRQ =====
[    3.336701] ACPI : EC: EC_SC(R) = 0x18 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=0
[    3.336706] ACPI : EC: EC_SC(R) = 0x18 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=0
[    3.336713] ACPI : EC: EC_SC(R) = 0x18 SCI_EVT=0 BURST=1 CMD=1 IBF=0 OBF=0
[    3.336714] ACPI : EC: transaction end
[    3.336753] ACPI : EC: ===== IRQ =====
[    3.336755] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    3.336758] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    3.336773] ACPI : EC: ===== IRQ =====
[    3.336777] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    3.336779] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
The last command handled using interrupt mode was a BD_EC command to disable EC BURST mode.

I also checked 142071, since these log entries:
[    3.715777] ACPI : EC: transaction start (cmd=0x80, addr=0x10)
[    3.715779] ACPI : EC: ===== TASK =====
[    3.715782] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    3.715783] ACPI : EC: EC_SC(W) = 0x80
[    3.716144] ACPI : EC: ===== TASK =====
[    3.716147] ACPI : EC: EC_SC(R) = 0x08 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=0
[    3.716148] ACPI : EC: EC_DATA(W) = 0x10
[    3.716151] ACPI : EC: ===== IRQ =====
[    3.716156] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[    3.716159] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[    3.716175] ACPI : EC: ===== IRQ =====
[    3.716180] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[    3.716183] ACPI : EC: EC_SC(R) = 0x02 SCI_EVT=0 BURST=0 CMD=0 IBF=1 OBF=0
[    3.716280] ACPI : EC: ===== IRQ =====
[    3.716283] ACPI : EC: EC_SC(R) = 0x01 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=1
[    3.716286] ACPI : EC: EC_DATA(R) = 0x05
[    3.716289] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    3.716297] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    3.716298] ACPI : EC: transaction end
[    3.716307] ACPI : EC: ===== IRQ =====
[    3.716311] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
[    3.716314] ACPI : EC: EC_SC(R) = 0x00 SCI_EVT=0 BURST=0 CMD=0 IBF=0 OBF=0
The last command handled using interrupt mode was a RD_EC to read from address 0x10.

In both cases, BURST mode has been enabled several times.

After the listed log entries, no more PE IRQ generated by the platform can be seen in the log. And since then, the SCI_EVT has never been set.
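
For readers following the EC_SC(R) lines quoted above: the flag columns are simply the status byte decoded bit by bit. Below is a minimal sketch in C, assuming the standard ACPI EC status register layout (OBF is bit 0, IBF bit 1, CMD bit 3, BURST bit 4, SCI_EVT bit 5). The macro, struct and function names are made up for illustration and are not taken from drivers/acpi/ec.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit masks of the EC status register (EC_SC), assuming the ACPI layout. */
#define EC_FLAG_OBF     0x01  /* output buffer full: data ready for the host */
#define EC_FLAG_IBF     0x02  /* input buffer full: EC has not consumed a write yet */
#define EC_FLAG_CMD     0x08  /* last byte written was a command, not data */
#define EC_FLAG_BURST   0x10  /* EC is in burst mode */
#define EC_FLAG_SCI_EVT 0x20  /* an SCI event is pending and should be queried */

/* Hypothetical helper type: one field per flag printed in the debug log. */
struct ec_status {
    int sci_evt, burst, cmd, ibf, obf;
};

/* Split a raw EC_SC value into its individual flags. */
static struct ec_status ec_decode_status(uint8_t sc)
{
    struct ec_status s;

    s.sci_evt = (sc & EC_FLAG_SCI_EVT) != 0;
    s.burst   = (sc & EC_FLAG_BURST) != 0;
    s.cmd     = (sc & EC_FLAG_CMD) != 0;
    s.ibf     = (sc & EC_FLAG_IBF) != 0;
    s.obf     = (sc & EC_FLAG_OBF) != 0;
    return s;
}

/* Render the status in the same layout as the log lines in this report. */
static int ec_format_status(uint8_t sc, char *buf, size_t len)
{
    struct ec_status s = ec_decode_status(sc);

    return snprintf(buf, len,
                    "EC_SC(R) = 0x%02x SCI_EVT=%d BURST=%d CMD=%d IBF=%d OBF=%d",
                    sc, s.sci_evt, s.burst, s.cmd, s.ibf, s.obf);
}
```

Decoding 0x18 this way gives BURST=1 and CMD=1, and 0x08 gives only CMD=1, which matches the entries before and after the BD_EC command in 141081. SCI_EVT corresponds to the value 0x20, which does not appear in the excerpts from 141081 and 142071 above.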

By using MSI quirk, we are forcing all commands to be issued in the BURST mode and there is code to handle some timing requirements for such firmware.

This quirk has 3 things implemented:
1. delaying 500us between transactions in acpi_ec_transaction_unlocked()
2. delaying 500us instead of waiting 1ms in task context before advancing transaction in ec_poll()
3. forcing every command to be issued in BURST mode in acpi_ec_space_handler()

From comment 34, I learned that with 2 and 3 and without 1, this bug cannot be fixed.
From comment 36, I learned that with 1 and without 2 and 3, this bug cannot be fixed.

So why don't we try a combination of 1 + 2 to make sure this is just a timing issue?
If it is instead fixed by forcing BURST mode, then the root cause could be something else.

Let me post a debug patch later after this comment.
We also need to check SMI_EVT flag for this bug.

Thanks and best regards
-Lv
Comment 46 Lv Zheng 2014-08-19 04:41:19 UTC
Created attachment 147201 [details]
[PATCH] Debugging if udelay is required by this hardware

This patch is generated to perform a test:
1. based on the working environment reported in comment 32
2. remove timing code

Please check if this patch can work for your platform.
Comment 47 Lv Zheng 2014-08-19 04:44:06 UTC
Created attachment 147211 [details]
[PATCH] Debugging if BURST mode can fix this issue

If 147201 cannot fix this issue, please try this patch without 147201 applied.

This patch is generated to perform a test:
1. based on the working environment reported in comment 32
2. remove burst mode code

Please check if this patch can work for your platform.
Comment 48 Lv Zheng 2014-08-19 04:56:51 UTC
It would be better if you could post dmesg for both tests.

Thanks in advance.
Comment 49 qbanin 2014-08-19 18:01:37 UTC
I'm sorry but I'm a bit confused. Which patch should I try first: 147191, 147201 or 147211, or all of them? :)
Comment 50 Lv Zheng 2014-08-20 04:21:38 UTC
(In reply to qbanin from comment #49)
> I'm sorry but I'm a bit confused. Which patch should I try in first place:
> 147191, 147201 or 147211 or all of them? :)

Hi,

Please try all of them.
But don't apply them all.
Please apply only one patch and make sure others are not applied for each try.

Thanks and best regards
-Lv
Comment 51 Lan Tianyu 2014-08-21 02:05:58 UTC
(In reply to qbanin from comment #49)
> I'm sorry but I'm a bit confused. Which patch should I try in first place:
> 147191, 147201 or 147211 or all of them? :)

Could you test 147191 separately? I would upstream this one first.
Comment 52 Lv Zheng 2014-08-21 02:57:29 UTC
Hi,

I'm afraid this problem might still need to be root caused.
We really appreciate the testing you and this platform make possible; it helps us root cause the flag_msi quirk.
The quirk seems to be wrong, as it implies there is EC firmware in the world that doesn't follow the specification and doesn't correctly implement the IBF/OBF controlled firmware<->driver communication protocol. I wonder how this can still be an ACPI product.

To root cause it, we need to:
1. check whether the bug can be fixed by the udelay() timing code alone, so I need you to try attachment 147201 [details] and attachment 147211 [details].
2. check whether the GPE or SCI is disabled. For this, could you also provide the following information:
A. When the system stops responding to the events, please do:
   # cat /sys/firmware/acpi/interrupts/gpe17
   According to your dmesg output, the GPE 0x17 is used for the EC:
     ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
                      ^^^^
   You can replace 17 with another number according to the dmesg for your boot.
B. Track if the ACPI SCI interrupt has been disabled.
   We can see irq count in the following files.
   # cat /proc/interrupts
   # cat /proc/irq/<x>/spurious
   The number of "x" can be obtained from /proc/interrupts, if you can see such an entry in /proc/interrupts for "acpi":
     9: 5954 0 IO-APIC-fasteoi acpi
     ^                         ^^^^
   Then the "x" is 9.
   You can cat these 2 files several times to see whether the "acpi" interrupt count still increases and whether the unhandled IRQ count increases after the system stops responding to the events. Please also report what you observe here by uploading 2 different outputs of these 2 files.
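
The two lookups described in A and B can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the dmesg and /proc/interrupts lines are assumed to look like the samples quoted in this comment, and the sysfs file name is assumed to use the two-digit hex GPE number (gpe17 for GPE 0x17).

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path for the EC GPE from a dmesg line such as
 *   "ACPI : EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62"
 * Returns 0 on success, -1 if the line carries no GPE number.
 */
static int ec_gpe_sysfs_path(const char *dmesg_line, char *path, size_t len)
{
    const char *p = strstr(dmesg_line, "GPE = 0x");
    unsigned int gpe;

    if (!p || sscanf(p, "GPE = 0x%x", &gpe) != 1)
        return -1;
    /* Assumption: the file is named after the hex GPE number, two digits. */
    snprintf(path, len, "/sys/firmware/acpi/interrupts/gpe%02X", gpe);
    return 0;
}

/*
 * Inspect one line of /proc/interrupts. If its last field is "acpi",
 * return the IRQ number and store the first CPU's count in *count;
 * otherwise return -1.
 */
static int acpi_irq_from_line(const char *line, unsigned long *count)
{
    size_t len = strlen(line);
    unsigned long n;
    int irq;

    /* Ignore trailing newline and blanks before looking at the last field. */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == ' '))
        len--;
    if (len < 5 || strncmp(line + len - 4, "acpi", 4) != 0)
        return -1;
    if (line[len - 5] != ' ' && line[len - 5] != '\t')
        return -1;              /* "acpi" must be a whole field */
    if (sscanf(line, " %d: %lu", &irq, &n) != 2)
        return -1;              /* e.g. "NMI:" lines have no numeric IRQ */
    *count = n;
    return irq;
}
```

Feeding each line of /proc/interrupts to acpi_irq_from_line() yields the "x" for /proc/irq/<x>/spurious; calling it again a few seconds later and comparing the two counts shows whether the "acpi" interrupt is still firing.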

Also, please provide your full name once in your reply so that we can fill in the Reported-and-tested-by field in the patch correctly. :-)

Thanks in advance and best regards
-Lv
Comment 53 Lv Zheng 2014-08-21 02:59:28 UTC
(In reply to Lan Tianyu from comment #51)
> (In reply to qbanin from comment #49)
> > I'm sorry but I'm a bit confused. Which patch should I try in first place:
> > 147191, 147201 or 147211 or all of them? :)
> 
> Could you test 147191 separately? I would upstream this one first.

Yes, please confirm this patch first so that the quirk can be upstreamed.
We hope you can still run tests for us after that, so we can root cause this issue.

Best regards
-Lv
Comment 54 Lv Zheng 2014-08-21 07:23:20 UTC
Hi,

I got some clues for the root cause of this issue.

It seems your platform is still able to handle EC events when the EC GPE is disabled:
[   47.439317] ACPI : EC: ===== TASK =====
[   47.439320] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[   47.439321] ACPI : EC: EC_SC(W) = 0x84
[   47.440283] ACPI : EC: ===== TASK =====
[   47.440289] ACPI : EC: EC_SC(R) = 0x09 SCI_EVT=0 BURST=0 CMD=1 IBF=0 OBF=1
[   47.440292] ACPI : EC: EC_DATA(R) = 0x1c
So there is no special timing requirement for this issue.

We just don't know why the GPE is disabled.

Note that in the current EC driver, event polling can only happen after an EC transaction is completed. So if your platform or the driver enters "GPE disabled" mode, and no EC command is issued, the EC driver will not be able to handle EC events again.
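
The difference between checking for events only after a transaction and polling for them independently can be illustrated with a toy model in C. This is not kernel code: the struct and all function names are hypothetical, and the model only captures the behaviour described in this comment (an event is noticed either by the GPE interrupt, at the end of a transaction, or by a periodic poll).

```c
#include <stdbool.h>

/* Toy model of the EC event path; not taken from drivers/acpi/ec.c. */
struct ec_model {
    bool gpe_enabled;    /* can the EC GPE still interrupt the CPU? */
    bool sci_evt;        /* SCI_EVT bit currently set in EC_SC */
    int  events_handled; /* number of events the "driver" has queried */
};

/* Shared path: if SCI_EVT is set, query the event and clear the flag. */
static void ec_check_event(struct ec_model *ec)
{
    if (ec->sci_evt) {
        ec->sci_evt = false;
        ec->events_handled++;
    }
}

/* Firmware raises an event; it is noticed at once only via the GPE. */
static void ec_raise_event(struct ec_model *ec)
{
    ec->sci_evt = true;
    if (ec->gpe_enabled)
        ec_check_event(ec);
}

/* Current driver behaviour: SCI_EVT is looked at when a transaction ends. */
static void ec_transaction_completed(struct ec_model *ec)
{
    ec_check_event(ec);
}

/* Proposed behaviour: a separate poller looks at SCI_EVT periodically. */
static void ec_poll_tick(struct ec_model *ec)
{
    ec_check_event(ec);
}
```

With gpe_enabled set to false, ec_raise_event() leaves the event pending until either ec_transaction_completed() or ec_poll_tick() runs. That is consistent with the original report, where running the "acpi" command (which issues EC transactions) made stuck events appear in acpi_listen.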

I think this issue can be fixed by this patchset:
https://lkml.org/lkml/2014/7/21/43
In this patchset, we have a separate thread to poll EC events in a timely manner.

Let me prepare a tarball of this patchset for you to test.

Thanks and best regards
-Lv
Comment 55 Lv Zheng 2014-08-22 04:28:44 UTC
Created attachment 147711 [details]
The EC event polling implementation

This patchset contains SCI_EVT polling mode implementation.

It's a big patchset based on some GPE API improvements.
So you need to apply all of them.
Sorry for the inconvenience.

This patchset is a revised version of the following 2 patchsets:
https://lkml.org/lkml/2014/7/14/901
https://lkml.org/lkml/2014/7/21/43
They are under internal review.

You need to apply the following patches in order:
lv-gpe01.patch
lv-gpe02.patch
lv-gpe03.patch
lv-gpe04.patch
lv-gpe05.patch
lv-gpe06.patch
lv-gpe07.patch
lv-gpe08.patch
lv-gpe09.patch
lv-gpe10.patch
lv-gpe11.patch
lv-gpe12.patch
lv-gpe13.patch
ec-event01.patch
ec-event02.patch
ec-event03.patch
ec-event04.patch
ec-event05.patch
ec-event06.patch
ec-event07.patch
ec-event08.patch
ec-event09.patch
ec-event10.patch
ec-event11.patch
ec-event12.patch
ec-event13.patch
ec-event14.patch

The ec-event15.patch and ec-event16.patch are not necessary.
If you have quilt installed, then you can:
# cd linux
# ln -s <path to uncompressed folder>/ec-patches ./patches
# quilt push -a

Please give this a try and post the dmesg output here to see if the bugs can be fixed for your system.
If it's still not fixed, please try a boot parameter "acpi.ec_poll_events=Y" and post the dmesg output here.

Note that this is just a patchset that can make your platform work. We still haven't root caused the issue. To root cause it, we need the test results mentioned in comment 52. For test 2 in comment 52, you can run the test on a kernel compiled with this patchset applied, so that more information, including the current settings of the hardware GPE register, can be dumped from /sys/firmware/acpi/interrupts/gpe17.

Thanks in advance.

Best regards
-Lv
Comment 56 Lan Tianyu 2014-08-25 05:43:21 UTC
(In reply to Lan Tianyu from comment #51)
> (In reply to qbanin from comment #49)
> > I'm sorry but I'm a bit confused. Which patch should I try in first place:
> > 147191, 147201 or 147211 or all of them? :)
> 
> Could you test 147191 separately? I would upstream this one first.

Hi Qbanin:
       Could you try this first? It's much simpler and is just like the quirk patch you have tried.
Comment 57 qbanin 2014-08-28 14:13:54 UTC
I'm sorry it took so long. 147191 works fine, just like my dirty quirk.

Will post results of the remaining patches soon.
Comment 58 qbanin 2014-08-28 14:35:19 UTC
I'm having a problem applying the patches from comment #55 on top of 3.16.1. A few hunks got rejected and the bleeding-edge kernel doesn't compile :(
Comment 59 Lv Zheng 2014-08-29 00:53:57 UTC
(In reply to qbanin from comment #58)
> I'm having a problem applying the patches from comment #55 on top of 3.16.1. A few
> hunks got rejected and the bleeding-edge kernel doesn't compile :(

OK.
I can help to rebase them on 3.16.1.
I can also merge some of them to make it easier.

Thanks for your testing.

Best regards
-Lv
Comment 60 Lv Zheng 2014-08-29 00:55:04 UTC
What's the result of the test mentioned in comment 55?
I just want to find out why the GPE is disabled.
Comment 61 Lan Tianyu 2014-08-29 03:19:20 UTC
The fix patch has been sent to the ACPI mailing list. Please help Lv test his patchset; that will be very helpful for improving the EC driver. Thanks.

https://patchwork.kernel.org/patch/4808561/
Comment 62 qbanin 2014-08-31 20:45:15 UTC
(In reply to Lan Tianyu from comment #61)
> The fix patch has been sent to the ACPI mailing list. Please help Lv test his
> patchset; that will be very helpful for improving the EC driver. Thanks.
> 
> https://patchwork.kernel.org/patch/4808561/

Not a problem, but I need a working patchset for 3.16.1. I don't have the skills to modify it myself :(
Comment 63 Lan Tianyu 2014-09-01 01:27:35 UTC
I think Lv can give you a branch so that you just need to run "git pull".

Lv, do you have such a branch?
Comment 64 Lv Zheng 2014-09-05 07:45:50 UTC
Hi,

Sorry for the delay.
Could you please try the following git repository:

# git clone https://github.com/zetalog/linux
# cd linux
# git checkout ec-next
# cp <your old kernel directory>/.config ./.config

I've merged all posted patches on top of Rafael's recent linux-pm.git/linux-next branch.
Please use this kernel with your previous kernel configuration and try again.

Thanks in advance
-Lv
Comment 65 Lv Zheng 2015-05-13 07:06:09 UTC
Could you help by trying the upstream kernel with this quirk disabled on your platform?
We have another commit to ensure the register access guarding in the wait polling mode:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9e295ac
So it should be safe now to use wait polling instead of busy polling for your platform.

Thanks and best regards
-Lv
Comment 66 Len Brown 2015-07-21 19:44:03 UTC
The quirk was added for this machine in Linux-3.17-rc4:

commit 777cb382958851c88763253fe00a26529be4c0e9
Author: Lan Tianyu <tianyu.lan@intel.com>
Date:   Fri Aug 29 10:50:08 2014 +0800

    ACPI / EC: Add msi quirk for Clevo W350etq


The quirk was removed and replaced, hopefully, by
a fixed EC driver in Linux 4.2-rc1:

commit 3174abcfea6a05aa25038156d6722b6c8876fb36
Author: Lv Zheng <lv.zheng@intel.com>
Date:   Fri May 15 14:37:11 2015 +0800

    ACPI / EC: Remove non-root-caused busy polling quirks.


Please re-open if this machine does not work with above kernels.